How to Generate PDF Documents Using Python?


PDFs (Portable Document Format) are everywhere. The digital era has made them the universal language of documents. Be it an invoice, a report card, an e-book, or an airline ticket, chances are you’re looking at a PDF. Unlike Word documents, PDFs preserve layouts, look professional, and can be shared across different platforms without worrying about the format changing.

When it comes to generating PDF documents, Python libraries let you arrange text, images, and formatting elements in code. Broadly speaking, there are three approaches. You can generate a PDF from scratch, starting with a blank canvas whose layout and content are designed entirely in code. You can modify an existing PDF by changing its text, images, or annotations. Or you can populate a PDF with data from a source such as an Excel file or a CRM, using libraries that support form filling.

In this blog, you’ll learn:

  • How to create PDFs using Python
  • How to format and style your documents
  • How to generate multiple PDFs from Excel data with Python
  • How to generate multiple PDFs using CRM data
  • Troubleshooting common pitfalls
  • Best Practices to scale and optimize the process

Did you know?

Surveys indicate that over 70% of individuals express interest in learning basic automation tools like Python, emphasizing its accessibility.

If your Python skills are limited and your business needs to create more than 1,000 documents a day, relying solely on Python for document generation can be challenging. An alternative is document generation software such as Perfect Doc Studio.

Let us first look at some beginner-friendly prerequisites required to work with PDFs.

Required Libraries

1. ReportLab – This creates new PDFs from scratch, designing and generating PDFs like invoices, certificates, or reports. You can install it by running this on the terminal:

pip install reportlab

2. PyPDF2 – This aids in editing or modifying existing PDFs. PyPDF2 is your go-to library for merging, splitting, or adding watermarks to an existing PDF. You can install it by running this in the terminal:

pip install PyPDF2

3. Pandas – This aids in reading and handling Excel or CSV data (especially when looping through rows). Reading .xlsx files also requires the openpyxl package. You can install both by running this in the terminal:

pip install pandas openpyxl

Before reading further, visit Quora’s discussion on the best Python libraries for crafting PDFs for a community-driven overview.

Step-by-Step: How to Create PDFs Using Python

Let’s look at how we can create a simple PDF with some text:

Step 1: Import ReportLab (add this line at the top of your Python script)

from reportlab.pdfgen import canvas

This first step is crucial as it sets you up with the tools to create something on a blank canvas.

Step 2: Start by creating a blank canvas

c = canvas.Canvas("canvas.pdf")

This creates a blank canvas that will be written to a file named canvas.pdf. Nothing is saved to your system until you call save() in Step 4.

Step 3: Add the text you want to appear on your blank canvas

c.drawString(100, 750, "This is a sample PDF!")

The numbers in the brackets are coordinates measured in points (1 point is 1/72 of an inch) from the edges of the page.

100 is the distance from the left edge, and 750 is the distance from the bottom edge.
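ReportLab measures coordinates in points, where 72 points make one inch, and that is not always intuitive. As a rough sketch, the helpers below convert millimetres to points, flip a top-down measurement into the bottom-up y value a PDF expects, and check that a position lands on an A4 page (about 595 by 842 points, ReportLab’s default page size). The names `mm_to_points`, `from_top`, and `on_page` are made up for this illustration; ReportLab itself ships ready-made `mm`, `cm`, and `inch` constants in `reportlab.lib.units`.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4
A4_WIDTH, A4_HEIGHT = 595.28, 841.89  # A4 (210 x 297 mm) in points, rounded


def mm_to_points(mm):
    """Convert millimetres to PDF points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


def from_top(distance_from_top, page_height=A4_HEIGHT):
    """Turn a distance measured down from the top edge into a PDF y value."""
    return page_height - distance_from_top


def on_page(x, y, width=A4_WIDTH, height=A4_HEIGHT):
    """Return True if the point (x, y) lies within the page."""
    return 0 <= x <= width and 0 <= y <= height


x = mm_to_points(20)            # 20 mm in from the left edge
y = from_top(mm_to_points(20))  # 20 mm down from the top edge
print(round(x, 1), round(y, 1), on_page(x, y))
print(on_page(100, 900))        # y = 900 is above the top of an A4 page
```

The resulting `x` and `y` can be passed straight to `c.drawString(x, y, ...)`.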

Step 4: The last step is to save the PDF

c.save()

The last step finalizes the PDF and writes it to your system. Without c.save(), nothing you drew on the blank canvas will be saved.

Below is the complete code:

from reportlab.pdfgen import canvas

c = canvas.Canvas("hello.pdf")
c.drawString(100, 750, "Hello, PDF world!")
c.save()

All you have to do is run this code and check your project folder, and you’ll see a PDF file named hello.pdf

Would you be interested in discovering the process of generating Word documents with Python? Delve into the comprehensive, step-by-step guide provided in the blog.

How to Add Headings, Lines, and Shapes

The example we saw earlier was quite basic. Now, let’s go a little further and learn to format things more professionally.

How to Add a Title With setFont() and drawString()

c.setFont("Helvetica-Bold", 20)
c.drawString(100, 800, "My First PDF Document")

How to Draw a Line Using line()

c.line(100, 790, 500, 790)

How to Insert a Rectangle (Like a Border or Box) With rect()

c.rect(100, 700, 300, 100)  # (x, y, width, height)

How to Set Page Size and Orientation

from reportlab.lib.pagesizes import A4, landscape

c = canvas.Canvas("landscape_example.pdf", pagesize=landscape(A4))
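
How to Set Margins and Plan Layouts

The loop below prints y-coordinate labels down the left side of the page at 50-point intervals. These act as layout guides, so you can visually plan margins and see where a given coordinate falls.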

for y in range(100, 800, 50):
    c.drawString(50, y, f"{y}")
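The same idea can be extended into a full reference grid. The sketch below assumes `c` is a ReportLab canvas like the ones created earlier; `draw_layout_grid` is a hypothetical helper name, and the 50-point spacing and A4-sized defaults are arbitrary choices. It draws light grey guide lines across the page and labels each one with its coordinate.

```python
def draw_layout_grid(c, width=595, height=842, step=50):
    """Draw labelled guide lines every `step` points to help plan margins.

    `c` is expected to be a ReportLab canvas.
    """
    c.saveState()                        # remember current font and colours
    c.setFont("Helvetica", 6)
    c.setStrokeColorRGB(0.8, 0.8, 0.8)   # light grey so guides stay unobtrusive
    for y in range(0, int(height) + 1, step):
        c.line(0, y, width, y)           # horizontal guide
        c.drawString(2, y + 2, str(y))   # label it with its y value
    for x in range(0, int(width) + 1, step):
        c.line(x, 0, x, height)          # vertical guide
        c.drawString(x + 2, 2, str(x))   # label it with its x value
    c.restoreState()                     # put font and colours back
```

Call it right after creating the canvas while you are designing the page, and remove the call once the layout is settled.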

How to Add Multiple Pages

c.showPage()
c.drawString(100, 750, "This is page 2")

How to Add Images

c.drawImage("logo.png", 100, 600, width=120, height=60)

Remember, you’re working with a blank canvas, and you need to plot the points. The (0,0) coordinate starts at the bottom left.

How to Generate Multiple PDFs from Excel Data Using Python


If you don’t want to create documents from scratch but instead want to populate data using a data source like an Excel file, Python can also help with that.

For instance, if you have an Excel file with rows of customer names, invoice amounts, and other details, Python helps you generate a personalized PDF invoice for each row.

Below is the code:

import pandas as pd
from reportlab.pdfgen import canvas

# Step 1: Load the Excel file
df = pd.read_excel("customer_data.xlsx")  # .xlsx files need the openpyxl package installed

# Step 2: Loop through each row and generate PDF
for index, row in df.iterrows():
    name = row['Name']
    amount = row['Amount']
    due = row['Due Date']
    
    filename = f"{name}_invoice.pdf"
    c = canvas.Canvas(filename)
    
    c.setFont("Helvetica-Bold", 16)
    c.drawString(100, 750, "Invoice Document")
    c.setFont("Helvetica", 12)
    c.drawString(100, 720, f"Customer: {name}")
    c.drawString(100, 700, f"Amount Due: ${amount}")
    c.drawString(100, 680, f"Due Date: {due}")
    
    c.save()

Lines that start with # are comments: Python ignores them, and they are there only for guidance. Before running the code, change the Excel file name to match the file on your system, and make sure the column headers in the file match the ones used in the script (Name, Amount, Due Date). Since no output path is set, the PDF files are saved in the folder the script is run from.
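Two details in this script can cause trouble with real data: customer names may contain characters that are not valid in file names (such as `/`), and every PDF lands in the current folder. One possible way to handle both is sketched below. `invoice_path` and the `invoices` folder name are hypothetical choices for illustration, not part of ReportLab or pandas.

```python
import re
from pathlib import Path


def invoice_path(customer_name, output_dir="invoices"):
    """Build a file-system-safe output path for one customer's invoice."""
    # Replace every run of characters that is not a letter, digit,
    # underscore, or hyphen with a single underscore
    safe = re.sub(r"[^\w-]+", "_", str(customer_name)).strip("_")
    return Path(output_dir) / f"{safe or 'unnamed'}_invoice.pdf"


path = invoice_path("Acme / Sons Ltd.")
print(path)
```

In the loop, `filename = f"{name}_invoice.pdf"` would become `path = invoice_path(name)`, followed by `path.parent.mkdir(parents=True, exist_ok=True)` to create the folder and `canvas.Canvas(str(path))` to open the canvas.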

Here is a sample PDF output:

[Image: sample invoice]
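Values read from Excel often need formatting before they are printed: amounts may arrive as floats such as 1234.5, and pandas typically reads date cells as timestamps that print with a time component. A minimal sketch of two formatting helpers follows. The function names and the chosen formats are assumptions for illustration; adjust them to your own invoice style.

```python
from datetime import datetime


def format_amount(amount):
    """Format a number as dollars with thousands separators and two decimals."""
    return f"${float(amount):,.2f}"


def format_due_date(due):
    """Format a date as YYYY-MM-DD; any other value is returned as plain text."""
    # pandas timestamps are datetime subclasses, so this check covers them
    if isinstance(due, datetime):
        return due.strftime("%Y-%m-%d")
    return str(due)


print(format_amount(1234.5))
print(format_due_date(datetime(2025, 3, 1)))
```

In the invoice loop these would be used as `f"Amount Due: {format_amount(amount)}"` and `f"Due Date: {format_due_date(due)}"`.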

How to Generate Multiple PDFs Using CRM Data

Many businesses use CRM tools like HubSpot, Zoho, or Salesforce, where customer data can be exported as CSV files or accessed through APIs. We’ll be looking at how to turn that data into PDFs. Most CRMs allow users to export customer records as a CSV file, which can be handled in much the same way as an Excel file.

For instance, a file with customer data can be used to generate thank-you notes, onboarding documents, and more. The code below creates a welcome letter for each customer:

import pandas as pd
from reportlab.pdfgen import canvas

df = pd.read_csv("crm_export.csv")

for _, row in df.iterrows():
    name = row['Full Name']
    plan = row['Subscription Plan']
    
    c = canvas.Canvas(f"{name}_welcome_letter.pdf")
    c.setFont("Helvetica", 14)
    c.drawString(100, 750, f"Dear {name},")
    c.drawString(100, 730, f"Thank you for subscribing to the {plan} plan.")
    c.drawString(100, 710, "We're excited to have you on board!")
    c.save()

As before, change the CSV file name to match the file on your system, and check that the column headers (Full Name, Subscription Plan) match your export. Since no output path is set, the PDF files are saved in the folder the script is run from.
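CRM exports are rarely clean: some rows will have a blank name or plan, and with pandas a blank cell shows up as "nan" in the letter text and file name. As a sketch, the filter below uses Python’s built-in `csv` module to keep only complete rows. The sample data is made up, and `complete_rows` is a hypothetical helper; with pandas, `df.dropna(subset=[...])` achieves a similar effect.

```python
import csv
import io

REQUIRED = ("Full Name", "Subscription Plan")  # columns used in the letter


def complete_rows(csv_file, required=REQUIRED):
    """Yield only the rows where every required column is present and non-blank."""
    for row in csv.DictReader(csv_file):
        if all((row.get(col) or "").strip() for col in required):
            yield row


# Small in-memory sample standing in for crm_export.csv (made-up data)
sample = io.StringIO(
    "Full Name,Subscription Plan\n"
    "Ada Lovelace,Pro\n"
    ",Basic\n"
    "Alan Turing,\n"
)
rows = list(complete_rows(sample))
print([row["Full Name"] for row in rows])
```

With a real export, open the file with `open("crm_export.csv", newline="", encoding="utf-8")` and pass the file object to `complete_rows`.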

Troubleshooting Common Pitfalls and Tips for Beginners

1. Coordinates for PDFs: In PDFs, the coordinates count from the bottom left, not top left like in HTML or Word.

2. Styling Text: It takes practice to combine fonts, sizes, and spacing so that a document looks readable and professional. Until you learn more, stick to ReportLab’s built-in fonts such as Helvetica, Times-Roman, and Courier.

3. Custom Fonts: If you’re using a custom font, register it first with pdfmetrics.registerFont() and the TTFont class from reportlab.pdfbase.ttfonts.

Below is the Python code:

from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont

pdfmetrics.registerFont(TTFont('MyFont', 'MyFont.ttf'))
c.setFont("MyFont", 12)

4. PDF File Looks Empty or Missing Text: If you’ve written text but it’s not visible in the PDF, the most likely cause is incorrect coordinates. Double-check that your drawString() calls and other elements fall within (0, 0) to (595, 842) for an A4 page.

5. Overlapping Text or Content: If you use the same static coordinates for every line, the lines are drawn on top of each other. Use a variable y position instead, and decrement it after each line to move the text downwards:

y = 750
for item in items:
    c.drawString(100, y, item)
    y -= 20  # move down for next line
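A decrementing y position eventually runs off the bottom of the page. One way to handle longer lists, sketched here under the assumption that `c` is a ReportLab canvas, is to start a new page with `showPage()` whenever y drops below a bottom margin. The helper name `draw_lines` and the margin values are illustrative choices.

```python
def draw_lines(c, lines, x=100, top=750, bottom=50, step=20):
    """Draw each string on its own line, starting a new page when space runs out.

    `c` is expected to be a ReportLab canvas. Returns the number of pages used.
    """
    y = top
    pages = 1
    for text in lines:
        if y < bottom:     # no room left above the bottom margin
            c.showPage()   # finish this page; font settings reset here
            y = top
            pages += 1
        c.drawString(x, y, text)
        y -= step          # move down for the next line
    return pages
```

With these defaults, 36 lines fit on a page before a new one begins. Call `draw_lines(c, items)` before `c.save()`.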

We’ve also put together a practical video tutorial on generating Word documents directly from Excel data using Python.

Interested in learning more?

We’ve looked at some of the basics, and there is a lot more that Python can do. For instance:

  • Generate invoices with dynamic tables
  • Automate entire workflows for reporting
  • Create interactive forms with Python
  • Performance optimization with ReportLab’s canvas approach
  • Combine PDFs using PyPDF2

Are you facing challenges in generating documents? Read our blog to discover a range of tools that can be better alternatives.

Perfect Doc Studio is a practical alternative for those who do not know coding. It is especially efficient when complex document workflows involve embedding structured data, using customizable layouts, or automating multi-step generation processes. For tasks that go beyond simple templates or one-off documents, Perfect Doc Studio offers a structured environment without the overhead of writing scripts.

Python has powerful capabilities for creating documents across various business applications, and the combination of ReportLab and PyPDF2 covers a wide range of professional document generation needs. The investment in learning PDF generation pays off in the efficiency, consistency, and professional presentation of your organization’s documents.

Start with simple examples and gradually move towards complex or advanced features. However, remember to consider performance implications for large-scale applications and implement appropriate solutions.

Good luck with your coding, and may your PDFs always be perfectly formatted!

Additional Resources

ReportLab Docs: This is the official documentation for the ReportLab library, which guides users through generating PDF documents in Python. If you are interested in creating, editing, and styling PDFs, it can help you understand how features like rendering, graphics, and page layout management can be used effectively.

PyPDF2: PyPDF2 is a library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also change viewing options and protect PDF files with passwords. The guide will help you understand the library’s capabilities and functionalities.