NLP on PDF Data

Overview

This project is a collection of modules for analyzing PDF files; each module and its function are listed below.
All modules are integrated into a Streamlit app, which exposes them as selectable options.

Local run

1. Clone the repository to a local directory
2. Install all the dependencies
```
 pip install -r requirements.txt 
```
3. Run the Streamlit app
```
 streamlit run app.py
```

Import from folder: uses PyPDF2 to extract text from the PDF files in a folder. Extraction is fast but may not be accurate.
Import from file: reads CSV or pickle files that already contain text data.
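
The two import modes could look roughly like the sketch below. It is an illustration, not the code in this repository: the function names (`extract_pdf_text`, `import_from_folder`, `import_from_file`) and the `text` column name are hypothetical, the PDF reader assumes PyPDF2 2.x or later (`PdfReader`), and the file loader uses the standard-library `csv` and `pickle` modules, which may differ from what the app uses.

```python
import csv
import pickle
from pathlib import Path


def extract_pdf_text(pdf_path):
    """Return the concatenated text of every page in one PDF."""
    # Imported here so the file-based loader below works without PyPDF2.
    from PyPDF2 import PdfReader  # assumption: PyPDF2 2.x or later

    reader = PdfReader(str(pdf_path))
    # extract_text() may yield nothing for scanned or image-only pages.
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def import_from_folder(folder):
    """Map each PDF file name in `folder` to its extracted text."""
    return {
        path.name: extract_pdf_text(path)
        for path in sorted(Path(folder).glob("*.pdf"))
    }


def import_from_file(file_path, text_column="text"):
    """Load text records from a CSV or pickle file.

    `text_column` is a hypothetical column name; adjust it to the data.
    """
    path = Path(file_path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            return [row[text_column] for row in csv.DictReader(handle)]
    if suffix in {".pkl", ".pickle"}:
        # Only unpickle files you trust: pickle can execute arbitrary code.
        with path.open("rb") as handle:
            return pickle.load(handle)
    raise ValueError(f"Unsupported file type: {suffix}")
```

For example, `import_from_folder("pdfs")` would return a dictionary keyed by file name, and `import_from_file("docs.csv")` would return the values of the `text` column as a list (both paths are placeholders). Pages that PyPDF2 cannot read come back as empty strings, which is one reason the folder import may be inaccurate.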