Extract Text from a PDF file using Python
pip install PyPDF2
PyPDF2 is a pure-Python PDF library capable of splitting, merging, cropping,
and transforming the pages of PDF files. It can also extract text from pages.
Importing required modules
import PyPDF2
Creating a pdf file object
pdfFileObj = open('file location', 'rb')  # Replace 'file location' with the path to your PDF
Creating a pdf reader object
pdfReader = PyPDF2.PdfReader(pdfFileObj)
Printing number of pages in pdf file
print(len(pdfReader.pages))
Creating a page object for the first page (pages are zero-indexed)
pageObj = pdfReader.pages[0]
Extracting text from page
print(pageObj.extract_text())
Closing the pdf file object
pdfFileObj.close()
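The steps above read a single page and close the file by hand. As a minimal sketch of one way to extend them, the snippet below reads every page and lets a with-block close the file. The function names join_page_texts and extract_pdf_text are hypothetical, chosen for illustration, and the code assumes a PyPDF2 release that provides PdfReader, pages, and extract_text (recent 2.x and 3.x releases do).

```python
def join_page_texts(pages, separator="\n"):
    """Concatenate the text of every page, skipping pages with no extractable text.

    `pages` is any iterable of objects exposing extract_text(), such as
    the `pages` attribute of a PyPDF2 reader. (Hypothetical helper.)
    """
    texts = []
    for page in pages:
        # Pages without a text layer (e.g. scanned images) may yield nothing
        text = page.extract_text() or ""
        if text.strip():
            texts.append(text)
    return separator.join(texts)


def extract_pdf_text(path):
    """Open the PDF at `path` and return the text of all its pages. (Hypothetical helper.)"""
    # Imported here so join_page_texts can be used without PyPDF2 installed
    import PyPDF2

    # The with-block closes the file even if extraction raises an error
    with open(path, "rb") as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        # Pages are read while the file is still open
        return join_page_texts(reader.pages)
```

Calling extract_pdf_text('file location') with the placeholder replaced by a real path would return the combined text as one string. An empty result usually suggests the PDF has no text layer, in which case an OCR tool is needed instead.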