How to Read/Extract QR Codes from PDF Files using Node.js

😡 😡 -> PDF -> Image -> 💪 QR Code -> 😎 😎

Sivaram P
3 min read · Jan 3, 2021

There comes a time in one’s career when they need to programmatically read and decode the contents of QR codes present in PDF files 😩. A few weeks ago, I reached that phase of my career. 😣

I spent a lot of time trying to find an optimal way to automate extracting QR code data from a set of PDFs using JavaScript, and there was surprisingly little helpful information out there in the wild 😟.

Alas, no Stack Overflow to the rescue for me this time! 😰

After piecing together the bits of information I found on the various steps required to automate this process, I came up with a solution that worked quite well for my use case! 😀

Let’s get started 😊

The QR Code which we will be trying to extract from our sample PDF file.

Content String: “Hello from the other side”

Required npm dependencies:

  1. pdf2pic: A utility for converting PDF to image and Base64 format.
    ** NOTE: pdf2pic needs its system dependencies (GraphicsMagick and Ghostscript) installed in order to work. Make sure to install them before using this utility. **
  2. pngjs: Simple PNG encoder/decoder for Node.js with no dependencies.
  3. jsqr: A pure JavaScript QR code reading library.

‘Nuff talk, show me the code, right? 😡… RIGHT!

We begin by passing the path of the PDF file to the fromPath method of the pdf2pic utility, along with an options object.

NOTE: pdf2pic depends on the GraphicsMagick (GM) library under the hood. For more details on the options exposed by the pdf2pic utility, visit GM’s website.

The fromPath method returns a convert function that takes pageNumber and isBase64 as its parameters. We pass 1 and true for these parameters, respectively.

pageNumber: the page number to be converted to an image
isBase64: if true, convert() will return Base64 output instead

If successful, we obtain a base64Response object that contains the Base64 image string; otherwise, an error is thrown.
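Here is a minimal sketch of this conversion step. The option values and the function name convertFirstPageToBase64 are my own illustrative assumptions, and the call follows the convert(pageNumber, isBase64) signature described above; newer pdf2pic releases may expect an options object as the second argument instead, so check the version you have installed.

```javascript
// Illustrative options (assumed values, tune them for your PDFs).
const pdf2picOptions = {
  density: 300,   // DPI used when rasterizing; higher values help with small QR codes
  format: "png",  // the next step parses the image with pngjs, so we ask for PNG
  width: 2000,    // output image width in pixels
  height: 2000,   // output image height in pixels
};

// Hypothetical helper: converts page 1 of the PDF to a Base64 image string.
async function convertFirstPageToBase64(pdfFilePath) {
  // Loaded lazily: pdf2pic (and GraphicsMagick) are only needed once a conversion runs.
  const { fromPath } = require("pdf2pic");

  const convert = fromPath(pdfFilePath, pdf2picOptions);

  // Signature as described in this article: convert(pageNumber, isBase64).
  const base64Response = await convert(1, true);

  if (!base64Response || !base64Response.base64) {
    throw new Error("PDF to image conversion failed");
  }
  return base64Response; // the Base64 image string is in base64Response.base64
}
```

Calling `await convertFirstPageToBase64("./sample.pdf")` (the path is a placeholder) should resolve to the base64Response object used in the following steps.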

Convert PDF to Base64 image string.

For this article we assume that only the first page of the PDF file contains the QR code(s). If that is not the case, the convert function returned by pdf2pic also provides a bulk method; passing -1 as its pageNumber argument converts all pages of the PDF into images and returns a base64Response[] that you can handle accordingly. The details of the bulk method can be found in the pdf2pic documentation.

// Example for bulk operation:
const base64ResponseArray = await fromPath(
  pdfFilePath,
  pdf2picOptions
).bulk(-1, true);
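One possible way to handle that array is sketched below. Both collectQrStrings and the decodePage callback are hypothetical names of mine: decodePage stands for whatever function turns one Base64 image string into a QR string and throws when no code is found. I am also assuming each response carries a base64 property like the single-page response, and falling back to the array position when it has no page property.

```javascript
// Hypothetical helper: runs a decoder over every converted page and
// collects the QR strings that were found, together with their page numbers.
function collectQrStrings(base64Responses, decodePage) {
  const results = [];

  base64Responses.forEach((response, index) => {
    // Assumption: fall back to the 1-based array position if `page` is absent.
    const page = response.page ?? index + 1;
    try {
      results.push({ page, data: decodePage(response.base64) });
    } catch (error) {
      // This page has no readable QR code; skip it and keep going.
    }
  });

  return results;
}
```

Skipping pages that fail to decode is a design choice for PDFs where only some pages carry a QR code; if every page is expected to have one, rethrowing the error would be the safer option.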

Getting back to our original script, using the Base64 image string previously generated, we create a buffer and read this buffer using pngjs’s synchronous read method to generate a PNG with the necessary metadata for the next step.
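A rough sketch of this step follows. The helper names base64ToBuffer and readPng are assumptions of mine, and since I cannot be sure whether you will hold a raw Base64 string or a full data URI at this point, the helper accepts both.

```javascript
// Accepts either a raw Base64 string or a data URI
// ("data:image/png;base64,....") and returns the decoded bytes.
function base64ToBuffer(base64OrDataUri) {
  const commaIndex = base64OrDataUri.indexOf(",");
  const isDataUri = base64OrDataUri.startsWith("data:") && commaIndex !== -1;
  const payload = isDataUri
    ? base64OrDataUri.slice(commaIndex + 1) // drop the "data:...;base64," prefix
    : base64OrDataUri;
  return Buffer.from(payload, "base64");
}

// Hypothetical helper: parses the image bytes into a PNG object.
function readPng(base64OrDataUri) {
  const { PNG } = require("pngjs"); // loaded lazily, only needed when parsing

  const buffer = base64ToBuffer(base64OrDataUri);

  // PNG.sync.read parses the whole buffer synchronously and returns an object
  // with width, height and data (a Buffer of RGBA bytes, 4 per pixel).
  return PNG.sync.read(buffer);
}
```

The width, height and data fields of the returned object are exactly what the QR decoding step needs.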

Create Buffer from dataUri and recreate PNG by reading the buffer.

We extract the QR code data from the PNG using the jsqr library by passing the PNG’s pixel data as a Uint8ClampedArray, along with its width and height, as arguments to the jsQR function.

If decoding succeeds, the QR code string is available in the data property of the returned code object. jsQR returns null when it cannot find a QR code, in which case we throw an error.
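The decoding step could look roughly like this. The function names toRgbaClampedArray and extractQrString are hypothetical, and the png argument is assumed to be the object produced by pngjs in the previous step (width, height, and data holding RGBA bytes).

```javascript
// Copies the PNG's RGBA bytes into the Uint8ClampedArray that jsQR expects.
function toRgbaClampedArray(png) {
  const expectedLength = png.width * png.height * 4; // 4 bytes (R, G, B, A) per pixel
  if (png.data.length !== expectedLength) {
    throw new Error("PNG data is not width * height * 4 bytes of RGBA");
  }
  return Uint8ClampedArray.from(png.data);
}

// Hypothetical helper: returns the string encoded in the QR code, or throws.
function extractQrString(png) {
  const jsQR = require("jsqr"); // loaded lazily, only needed when decoding

  const code = jsQR(toRgbaClampedArray(png), png.width, png.height);

  // jsQR returns null (it does not throw) when no QR code is detected.
  if (!code) {
    throw new Error("No QR code found in the image");
  }
  return code.data;
}
```

If nothing is detected on a page that clearly contains a QR code, rasterizing the PDF at a higher density or larger size in the first step is usually the first thing I would try.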

Extract QR Code related information from the PNG.
Script Output

Et voila! We have the QR Code string at our disposal. 🔥 🎆
