How To Manipulate PDF Files in Linux With PDFtk

One would think that being called a “Swiss Army knife” would be the highest compliment a tool or utility could aspire to. Yet the expression is oddly insufficient when describing PDFtk, an incredibly versatile utility that can manipulate PDF files in myriad ways. According to its man page, if PDF is electronic paper, then PDFtk is an electronic staple-remover, hole-punch, binder, secret-decoder-ring, and X-Ray-glasses.

There are a number of operations you can perform on PDF files, such as merging, splitting, rotating, breaking into single pages, encrypting and decrypting, applying or removing watermarks, and compressing or decompressing. PDFtk, which stands for PDF toolkit, can manage all these tasks and more. PDFtk has been around for years and is available in the software repositories of most popular desktop distributions.

To install on an Ubuntu / Debian machine:

1. Open a terminal window and update the package lists.

$ sudo apt update

2. Install pdftk using this command.

$ sudo apt install pdftk

How to use PDFtk

The process of merging multiple PDF files into a single document is fairly straightforward with PDFtk. We only have to provide a space-separated list of files, in the order we wish to add them, followed by the keyword output and the name of the output file.

$ pdftk 1.pdf another.pdf 3.pdf yet-another.pdf fifth.pdf output combined.pdf

If all of the files are sequentially numbered or named, and in the same directory, wildcards can be used instead of typing the name of each file individually.

$ pdftk *.pdf output compilation.pdf
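
One thing to watch with wildcards: the shell, not PDFtk, expands *.pdf, and it does so in lexical order, so 10.pdf sorts before 2.pdf unless the numbers are zero-padded. Here is a minimal sketch of one workaround, assuming GNU sort with its -V (version sort) option is available and that no file name contains whitespace. The helper names natural_order and merge_sorted are hypothetical and not part of PDFtk.

```shell
# Print the given file names one per line in natural order,
# so that 2.pdf comes before 10.pdf (relies on GNU sort -V).
natural_order() {
    printf '%s\n' "$@" | sort -V
}

# Merge PDFs in natural order.
# Usage: merge_sorted output.pdf input1.pdf input2.pdf ...
# Assumption: no file name contains whitespace, because the
# sorted list is deliberately word-split into arguments.
merge_sorted() {
    out=$1
    shift
    pdftk $(natural_order "$@") output "$out"
}
```

With these defined, merge_sorted compilation.pdf *.pdf would merge 1.pdf, 2.pdf and 10.pdf in that order. If compilation.pdf already exists in the same directory, the wildcard will match it too, so it is safer to write the output elsewhere or remove the old file first.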

To use a specified range of pages from multiple PDFs, we have to make use of ‘handles’. Instead of referring to the different files directly by name, we create aliases, called ‘handles’, for them. We can then select different pages from these files, and even perform different operations on them.

Suppose we have two files of more than 100 pages each, and want only the first 20 pages of one and pages 10 to 35 of the other. In the command below, the letters A and B are the handles that identify our two input PDF files.

Each handle refers to a separate PDF file, and we can define as many as we need. Handles may only use uppercase letters, but a handle can be more than one letter long, for example A, AB, ASD.

Note that, unlike the previous examples, this command also includes the ‘cat’ operation. Page ranges are arguments to an operation such as ‘cat’, so the operation has to be named explicitly. The page ranges attached to each handle define the sequence of pages in the final output file.

$ pdftk A=first-file.pdf B=second-file.pdf cat A1-20 B10-35 output mixed-file.pdf
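
Before choosing ranges such as A1-20 or B10-35, it can help to confirm how many pages each input file actually has. The sketch below uses PDFtk's dump_data operation, which prints a metadata report that includes a NumberOfPages line. The helper name page_count is hypothetical.

```shell
# Print the number of pages in a PDF by reading the
# NumberOfPages line from pdftk's dump_data report.
page_count() {
    pdftk "$1" dump_data | awk '/^NumberOfPages:/ { print $2 }'
}
```

Calling page_count first-file.pdf prints a single number, which can be compared against the last page of the range we intend to use.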

We’ve only seen a merge operation so far, but we can easily break a PDF file into smaller parts with PDFtk by utilizing the ‘cat’ operation along with the desired page range:

$ pdftk filename.pdf cat 1-5 output fivepages.pdf
$ pdftk filename.pdf cat 1-5 10-end output removespages6-9.pdf

Making sense of the page ranges

There are many different options available when working with page ranges. Apart from using numbers, such as 1-4, 20-32, etc., we can also use keywords such as end, odd and even. A page range of 14-end means all pages from 14 to the end of the file. Similarly, the page range 14-20odd (written without a space) will only include the odd-numbered pages within the given range.

We can also prefix a page number with the letter ‘r’ to count pages from the end of the file. So r1 is the last page in the PDF file, r5 is the fifth page from the end, and a range of r1-r3 means the last three pages, starting from the end.

When working with page ranges, it’s also important to remember that for PDFtk, a page range of 1-6 isn’t the same as 6-1. While the former means the first six pages in sequence, the latter means that the same pages are assembled in reverse order, starting from the sixth.

We can easily use the page range feature to either split a PDF into smaller parts, or remove entire pages. Unfortunately, there isn’t a single command that can split a 100 page PDF file into five smaller ones with 20 pages each. But if we want to break a PDF file such that each page becomes a separate PDF file, we can do that with the ‘burst’ command.

$ pdftk compilation.pdf burst
$ ls
compilation.pdf  pg_0001.pdf  pg_0003.pdf  pg_0005.pdf
doc_data.txt     pg_0002.pdf  pg_0004.pdf  pg_0006.pdf
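
Although there is no single command for splitting a file into fixed-size chunks, a short shell loop around ‘cat’ can do the job. The following is a sketch rather than a polished tool: the function name split_every and the part_NN.pdf naming scheme are arbitrary choices, and the page count is read from the NumberOfPages line of PDFtk's dump_data report.

```shell
# Split a PDF into consecutive chunks of a fixed number of pages.
# Usage: split_every input.pdf pages_per_chunk
# Output files are named part_01.pdf, part_02.pdf, and so on.
split_every() {
    input=$1
    size=$2
    # Read the total page count from pdftk's dump_data report.
    total=$(pdftk "$input" dump_data | awk '/^NumberOfPages:/ { print $2 }')
    # Give up if the page count could not be determined.
    [ -n "$total" ] || return 1
    start=1
    part=1
    while [ "$start" -le "$total" ]; do
        end=$((start + size - 1))
        # The last chunk may be shorter than the others.
        if [ "$end" -gt "$total" ]; then
            end=$total
        fi
        pdftk "$input" cat "$start-$end" output "$(printf 'part_%02d.pdf' "$part")"
        start=$((end + 1))
        part=$((part + 1))
    done
}
```

For a 100 page file, split_every compilation.pdf 20 would run ‘cat’ five times, with the ranges 1-20, 21-40, 41-60, 61-80 and 81-100.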

How to rotate pages with PDFtk

Graphical PDF readers such as Adobe Acrobat or Evince can easily rotate a PDF file. This is especially helpful when the entire file needs to be rotated in the same direction. But what happens when only a few pages in the file require rotation? Thankfully, PDFtk provides a quick solution.

PDFtk rotates pages when a rotation keyword is appended to a page range. The keyword can be north, south, east, west, left, right or down. PDFtk also has a dedicated ‘rotate’ operation, which takes a single input PDF file and rotates just the specified pages, but the same result can be achieved with ‘cat’. In the following example, we’ve rotated the fifth page to the left, i.e., anti-clockwise, while all other pages remain unchanged. The output file, done.pdf, takes the first four pages as they are, then the fifth page rotated left, and finally the remaining pages.

$ pdftk A=filename.pdf cat A1-4 A5left A6-end output done.pdf
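
The PDFtk man page also documents a ‘rotate’ operation that rotates only the listed pages and leaves the rest, and the page order, unchanged, so the untouched ranges don't have to be spelled out. Here is a small sketch, assuming the installed PDFtk version supports ‘rotate’. The wrapper name rotate_pages is hypothetical.

```shell
# Rotate selected pages of a PDF while keeping all pages in order.
# Usage: rotate_pages input.pdf output.pdf range...
# Each range uses the same syntax as cat, for example 5left or 1-3south.
rotate_pages() {
    input=$1
    output=$2
    shift 2
    pdftk "$input" rotate "$@" output "$output"
}
```

Running rotate_pages filename.pdf done.pdf 5left should give the same result as the ‘cat’ command above.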

To rotate all the odd numbered pages in a PDF file, use the following command.

$ pdftk A=filename.pdf cat Aoddleft output odd-pages.pdf

This command creates a PDF file comprising only the odd numbered pages from the input file, with all the pages rotated in the direction specified. We can similarly rotate all the even numbered pages with the following command:

$ pdftk A=filename.pdf cat Aevenright output even-pages.pdf

To put the odd and even files back together, use the ‘shuffle’ operation, which creates the output file by picking one page at a time from each handle.

$ pdftk A=odd-pages.pdf B=even-pages.pdf shuffle A B output properlyrotated.pdf
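
Because ‘shuffle’ accepts the same page range syntax as ‘cat’, the three commands above can probably be collapsed into one that needs no intermediate files. This is a sketch based on that reading of the man page, and the function name rotate_odd_even is hypothetical.

```shell
# Rotate odd pages left and even pages right, interleaving them
# back into their original order in a single pdftk call.
# Usage: rotate_odd_even input.pdf output.pdf
rotate_odd_even() {
    pdftk A="$1" shuffle Aoddleft Aevenright output "$2"
}
```

Here the single handle A is used twice: once for its odd pages and once for its even pages, with ‘shuffle’ taking one page from each range in turn.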

How to use advanced features of PDFtk

If we’re working with sensitive data and wish to protect it from prying eyes, PDFtk can encrypt PDF files by using the user_pw command option.

$ pdftk filename.pdf output pw-protected.pdf user_pw thisisthepassword

The newly created pw-protected.pdf file cannot be opened without the password.

By default, PDFtk commands run quietly and don’t produce any output. Append verbose to the end of the command to change this behaviour.

The input_pw command option can similarly be used to provide the password if any of our input files are password protected. In the first example, only one input file is encrypted, but in the second case, both the input files require a password. This is why we’ve provided the password for each with the input_pw command option.

$ pdftk A=filename.pdf B=secure-file.pdf input_pw B=password shuffle A B output shuffled.pdf user_pw thisisthepassword
$ pdftk A=secure.pdf B=safe.pdf input_pw A=foopass B=passfoo cat A1 B1 output filename.pdf user_pw foopassfoo

PDFtk can’t strip the password from a file in place, but an easy solution is to create a new copy that’s not password protected. Once again, we must use the input_pw command option to provide the password used to encrypt the file.

$ pdftk secure.pdf input_pw password output unsafe.pdf
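
The commands in this section put passwords directly on the command line, where they can end up in the shell history. According to the man page, the keyword PROMPT can be given in place of a password so that PDFtk asks for it interactively. A minimal sketch follows; the function names are hypothetical.

```shell
# Encrypt a PDF, letting pdftk prompt for the new user password
# instead of reading it from the command line.
# Usage: encrypt_with_prompt input.pdf output.pdf
encrypt_with_prompt() {
    pdftk "$1" output "$2" user_pw PROMPT
}

# Write an unprotected copy of an encrypted PDF, letting pdftk
# prompt for the existing password.
# Usage: decrypt_with_prompt input.pdf output.pdf
decrypt_with_prompt() {
    pdftk "$1" input_pw PROMPT output "$2"
}
```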

There’s plenty more you can do with PDFtk such as applying watermarks to a PDF file, or changing its metadata, etc. Refer to the project’s man page to learn more about these and other capabilities.

Refer to the PDFtk man page for a quick introduction and examples of how best to use the different command options.

$ man pdftk

Shashank Sharma
Freelance Writer

Shashank Sharma is a freelance writer for Tom’s Hardware US, where he writes about his triumphs and joys of working with the Linux CLI.