How To Read PDF Files In Python Using PyPDF2 Library

Reading and writing PDF files in Python is quite easy, and several libraries and packages can help with the task. In this article, I will show you how to read PDF files in Python using the PyPDF2 package.

If you are new to automation, check out our Selenium tutorial, which covers everything from the basics to advanced topics.

Official link for PyPDF2: https://pypi.org/project/PyPDF2/

Step 1- Install PyPDF2

pip install PyPDF2

Step 2- Write the code below to read the PDF

import PyPDF2

# Open the file in binary read mode; the with block closes it automatically
with open("sample.pdf", "rb") as file:
    # Pass the file object to PdfReader
    # (the older PdfFileReader name was removed in PyPDF2 3.0)
    reader = PyPDF2.PdfReader(file)

    # len(reader.pages) returns the number of pages in the PDF
    print(len(reader.pages))

    # reader.pages is indexed from 0, so index 0 is the first page
    page1 = reader.pages[0]

    # extract_text() returns the text of the page
    pdfData = page1.extract_text()

    # Print the data
    print(pdfData)

    # Index 1 is the second page
    page2 = reader.pages[1]
    print("Data from page 2", page2.extract_text())
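To read every page instead of picking pages by index, the same calls can be wrapped in a loop. The sketch below is only an illustration: it assumes a recent PyPDF2 release that provides PdfReader, reader.pages and extract_text(), and the helper names extract_all_text and label_pages are made up for this example.

```python
def extract_all_text(pdf_path):
    """Return a list with the extracted text of every page in the PDF."""
    # Imported here so label_pages below can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    with open(pdf_path, "rb") as pdf_file:
        reader = PdfReader(pdf_file)
        # Fall back to an empty string if a page yields no text.
        return [page.extract_text() or "" for page in reader.pages]


def label_pages(page_texts):
    """Prefix each page's text with its 1-based page number."""
    return [
        f"Data from page {number}: {text}"
        for number, text in enumerate(page_texts, start=1)
    ]


# Usage (needs PyPDF2 and a real file; "sample.pdf" is the placeholder
# file name used in this article):
# for line in label_pages(extract_all_text("sample.pdf")):
#     print(line)
```

Keeping the page numbering separate from the PDF reading makes the labelling easy to check with plain strings.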

Add assert to verify the PDF content

import PyPDF2

with open("sample.pdf", "rb") as file:
    reader = PyPDF2.PdfReader(file)

    # Index 1 is the second page of the PDF
    page2 = reader.pages[1]
    pdfData = page2.extract_text()

print(pdfData)

# Assert that the expected keywords appear in the text returned from the PDF
assert "boring" in pdfData
assert "Mukesh" in pdfData
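A plain assert stops at the first keyword that is missing. If you want to check several keywords and see all the missing ones at once, a small helper can do that. This is a minimal sketch that works on the already extracted text; the function names missing_keywords and assert_pdf_text_contains are hypothetical and not part of PyPDF2.

```python
def missing_keywords(text, keywords):
    """Return the keywords that do not appear in text, in the given order."""
    return [keyword for keyword in keywords if keyword not in text]


def assert_pdf_text_contains(text, keywords):
    """Fail with a message listing every missing keyword, not just the first."""
    missing = missing_keywords(text, keywords)
    assert not missing, f"Keywords not found in PDF text: {missing}"


# Usage with the text extracted earlier in this article:
# assert_pdf_text_contains(pdfData, ["boring", "Mukesh"])
```

The check is case-sensitive, just like the in operator used in the asserts above.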

I hope this post was useful to you. Keep learning.

About Mukesh Otwani

I am Mukesh Otwani, a working professional in the beautiful city of Bangalore, India. I completed my BE at RGPV University, Bhopal. I have been passionate about automation testing for a couple of years. I started with Selenium and then got the chance to work with other tools like Maven, Ant, Git, GitHub, Jenkins, Sikuli, Selenium Builder, etc.