Extractor for NUS High Programme of Studies (POS) from PDF to CSV.
- Create a Python 3 virtual environment (I used Python 3.11, but most versions >3.8 should work)
- Install the dependencies: pip install -r requirements.txt
- Download the POS of your choice and save it in the same folder as the scripts as "POS.pdf"
- Run PDFFilter.py, which creates merged.pdf
- Open merged.pdf with Microsoft Word and save it as table.docx in the same folder
- Run CSVFromWord.py, which should generate pos.csv
- Profit
This repo includes a pos.csv generated from the POS for C2028.
Current information included: "department", "level", "sem", "code", "type", "title", "description", "mcs", "prerequisites", "preclusions", "corequisites", "hrs", "remarks"
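As a rough illustration of how the generated file could be consumed, the sketch below reads rows into dictionaries keyed by the column names listed above. Whether pos.csv starts with a header row is an assumption here, so the names are passed explicitly and a hypothetical has_header flag skips the first row if needed. The function names and the sample row are made up for the example and are not taken from the real data.

```python
import csv
import io

# Column names as listed in this README.
FIELDS = [
    "department", "level", "sem", "code", "type", "title", "description",
    "mcs", "prerequisites", "preclusions", "corequisites", "hrs", "remarks",
]


def read_modules(stream, has_header=False):
    """Return one dict per CSV row, keyed by the README's column names.

    has_header is a hypothetical switch: set it to True if the file
    begins with a header row that should be skipped.
    """
    rows = list(csv.DictReader(stream, fieldnames=FIELDS))
    if has_header and rows:
        rows = rows[1:]
    return rows


def by_department(rows, department):
    """Keep only the rows whose department matches exactly."""
    return [row for row in rows if row["department"] == department]


# Hypothetical sample row, NOT taken from the real pos.csv.
sample = io.StringIO(
    "Mathematics,1,1,XX0000,Core,Example Title,Example description,4,,,,3,\n"
)
modules = read_modules(sample)
maths = by_department(modules, "Mathematics")
```

For the real file, something like open("pos.csv", newline="", encoding="utf-8") could be passed in place of the sample stream; the encoding is also an assumption.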
PDFFilter.py keeps only the horizontal (landscape) pages, since pages are horizontal if and only if they contain useful table data. We then exploit Microsoft Word's ability to open PDFs to turn the tables into a Word document, because PDF tables are very hard to manipulate directly. Finally, CSVFromWord.py deals with the scuffed newlines and puts everything in CSV format.
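The two scripted steps described above could be sketched roughly as follows. This is not the code in PDFFilter.py or CSVFromWord.py; it is a minimal illustration that assumes the pypdf and python-docx packages, which may differ from what requirements.txt actually lists. All function names are hypothetical.

```python
import csv


def is_landscape(width, height):
    """A page counts as horizontal when it is wider than it is tall."""
    return width > height


def keep_landscape_pages(src="POS.pdf", dst="merged.pdf"):
    """Copy only the landscape pages of src into dst (assumes pypdf)."""
    from pypdf import PdfReader, PdfWriter  # assumed dependency

    writer = PdfWriter()
    for page in PdfReader(src).pages:
        box = page.mediabox
        # Simplification: any rotation flag on the page is ignored.
        if is_landscape(float(box.width), float(box.height)):
            writer.add_page(page)
    with open(dst, "wb") as handle:
        writer.write(handle)


def clean_cell(text):
    """Collapse stray newlines and repeated spaces into single spaces."""
    return " ".join(text.split())


def docx_tables_to_csv(src="table.docx", dst="pos.csv"):
    """Write every table row in src to dst (assumes python-docx)."""
    from docx import Document  # assumed dependency (python-docx)

    with open(dst, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for table in Document(src).tables:
            for row in table.rows:
                writer.writerow([clean_cell(cell.text) for cell in row.cells])
```

The third-party imports sit inside the functions so the helpers is_landscape and clean_cell can be tried on their own without installing anything. The manual Word step still happens between the two functions.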