There comes a time in one's career when one needs to read and decode the contents of QR codes in PDF files programmatically. A few weeks ago, I reached that phase of my career.
After spending a lot of time trying to find an optimal way to automate the extraction of QR code data from a set of PDFs using JavaScript, I found surprisingly little helpful information out there in the wild.
Alas, no Stack Overflow to the rescue for me this time!
After piecing together the bits of information I found on the various steps required to automate this process, I came up with a solution that worked quite well for my use case!
Let's get started.
The QR Code which we will be trying to extract from our sample PDF file.
Required npm dependencies:
- pdf2pic: A utility for converting PDF to image and Base64 format.
  **NOTE:** pdf2pic needs its system dependencies (GraphicsMagick, plus Ghostscript for reading PDFs) in order to work. Make sure to install them before using this utility.
- pngjs: Simple PNG encoder/decoder for Node.js with no dependencies.
- jsqr: A pure JavaScript QR code reading library.
'Nuff talk, show me the code, right? ... RIGHT!
We begin by passing the path to the PDF file to the `fromPath` method of the pdf2pic utility, along with an options object that controls the conversion.
NOTE: `pdf2pic` depends on the GraphicsMagick (GM) library under the hood. For more details on the options exposed by the `pdf2pic` utility, visit GM's website.
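As a rough sketch of this first step, the snippet below builds an options object and hands it to `fromPath`. The option names (`density`, `format`, `width`, `height`) come from pdf2pic's documented options, but the values are only illustrative guesses for an A4 page, and `createConverter` is a hypothetical helper name of my own.

```javascript
// Illustrative values only; tune them for your own PDFs.
const pdf2picOptions = {
  density: 150,  // resolution (DPI) used when rasterising the page
  format: "png", // pngjs can only read PNG, so we ask for PNG output
  width: 1240,   // roughly A4 at 150 DPI (assumed page size)
  height: 1754,
};

// Hypothetical helper: returns pdf2pic's convert function for one PDF file.
function createConverter(pdfFilePath) {
  // Required lazily so pdf2pic is only loaded when a conversion is requested.
  const { fromPath } = require("pdf2pic");
  return fromPath(pdfFilePath, pdf2picOptions);
}
```

If the QR code in your PDF is small, increasing `density` (and the dimensions along with it) is probably the first thing to try.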
The `fromPath` method returns a `convert` function that takes `pageNumber` and `isBase64` as its parameters. We pass `1` and `true`, respectively, as the arguments.
- `pageNumber`: page number to be converted to an image
- `isBase64`: if true, `convert()` will return Base64 output instead
If successful, we obtain a `base64Response` object that contains the Base64 image string; otherwise an error is thrown.
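Here is a minimal sketch of that call, assuming the `convert(pageNumber, isBase64)` signature described above (newer pdf2pic releases may expect different arguments, so check the version you have installed). The helper name `convertFirstPageToBase64` is hypothetical, and I am assuming the Base64 string lives on a `base64` property of the response.

```javascript
// `convert` is the function returned by pdf2pic's fromPath(); it is passed in
// so this helper does not care how the converter was configured.
async function convertFirstPageToBase64(convert) {
  // pageNumber = 1, isBase64 = true
  const base64Response = await convert(1, true);

  // Assumption: the Base64 string is exposed as `base64` on the response.
  const base64Image = base64Response ? base64Response.base64 : undefined;
  if (!base64Image) {
    throw new Error("PDF page could not be converted to a Base64 image");
  }
  return base64Image;
}
```

Usage would look something like `const base64Image = await convertFirstPageToBase64(fromPath(pdfFilePath, pdf2picOptions));` inside an async function.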
For this article, we assume that only the first page of the PDF file contains the QR code(s). If that is not the case, the `pdf2pic` utility provides a `bulk` method on the `convert` function that accepts `-1` as an argument to the `pageNumber` parameter. This converts all the pages of the PDF into images and returns a `base64Response[]`, which you can handle accordingly. The details of the `bulk` method can be found in the documentation for the `pdf2pic` library.
    // Example of a bulk operation (run inside an async function):
    const base64ResponseArray = await fromPath(
      pdfFilePath,
      pdf2picOptions
    ).bulk(-1, true);
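One possible way to "handle accordingly" is to loop over the array and try to decode each page, recording `null` for pages where nothing was found. This is only a sketch: `decodeAllPages` and the `decodePage` callback are hypothetical names, `decodePage` stands in for the pngjs and jsqr steps described in the rest of this article, and I am assuming each response carries `page` and `base64` properties.

```javascript
// base64ResponseArray: the array returned by bulk(-1, true)
// decodePage: hypothetical callback that takes a Base64 PNG string and
//             returns the QR text, throwing if the page has no QR code
function decodeAllPages(base64ResponseArray, decodePage) {
  const results = [];
  for (const response of base64ResponseArray) {
    try {
      results.push({ page: response.page, text: decodePage(response.base64) });
    } catch (error) {
      // A page without a readable QR code is recorded, not treated as fatal.
      results.push({ page: response.page, text: null });
    }
  }
  return results;
}
```

Swallowing the error per page is a design choice that suits "find whatever QR codes exist"; if every page is expected to have one, rethrowing would be the safer option.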
Getting back to our original script: using the Base64 image string generated previously, we create a buffer and read it with `pngjs`'s synchronous `read` method to generate a PNG with the metadata needed for the next step.
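A sketch of this step might look like the following. `Buffer.from(..., "base64")` and `PNG.sync.read` are real Node.js and pngjs calls; the PNG signature check is an extra sanity guard I added (it is not required by pngjs), and the helper names `base64ToPngBuffer` and `readPngFromBase64` are hypothetical.

```javascript
// The 8-byte signature that every PNG file starts with.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Decodes the Base64 string and checks that the bytes look like a PNG.
function base64ToPngBuffer(base64Image) {
  const buffer = Buffer.from(base64Image, "base64");
  if (!buffer.subarray(0, 8).equals(PNG_SIGNATURE)) {
    throw new Error("Decoded image is not a PNG (is the pdf2pic format 'png'?)");
  }
  return buffer;
}

// Parses the PNG; the result exposes `width`, `height` and RGBA `data`.
function readPngFromBase64(base64Image) {
  // Required lazily so pngjs is only loaded when an image is parsed.
  const { PNG } = require("pngjs");
  return PNG.sync.read(base64ToPngBuffer(base64Image));
}
```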
Extract the QR code data from the PNG using the `jsqr` library by passing the PNG's pixel data as a `Uint8ClampedArray`, along with its width and height, as arguments to the `jsQR` function.
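To make the hand-off between the two libraries concrete, here is a small sketch. It assumes `png` is the object returned by `PNG.sync.read`, whose `data` holds 4 bytes (RGBA) per pixel, which is the layout `jsQR` expects. The helper names `toImageData` and `scanPng` are hypothetical, and the length check is my own addition.

```javascript
// Copies the PNG's pixel bytes into the Uint8ClampedArray that jsQR expects.
function toImageData(png) {
  const pixels = Uint8ClampedArray.from(png.data);
  // RGBA means 4 bytes per pixel; anything else would confuse the scanner.
  if (pixels.length !== png.width * png.height * 4) {
    throw new Error("Unexpected pixel data length for an RGBA image");
  }
  return pixels;
}

// Runs jsQR over the PNG; returns the code object, or null if none is found.
function scanPng(png) {
  // Required lazily so jsqr is only loaded when a scan is requested.
  const jsQR = require("jsqr");
  return jsQR(toImageData(png), png.width, png.height);
}
```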
If `jsQR` finds a QR code, the decoded string is available in the `data` property of the returned `code` object. If no QR code is found, `jsQR` returns `null`, in which case we throw an error.
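That final check can be sketched as a tiny helper. `getQrText` is a hypothetical name, and `code` is whatever `jsQR` returned.

```javascript
// Returns the decoded text, or throws if jsQR found nothing usable.
function getQrText(code) {
  // jsQR returns null when no QR code is detected in the image.
  if (!code || !code.data) {
    throw new Error("QR code text could not be extracted from the PNG image");
  }
  return code.data;
}
```

Note that this sketch also treats an empty string as a failure, which is what I wanted for my use case but may not suit yours.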
Et voilà! We have the QR code string at our disposal.
Hope you find this helpful && Happy Hacking!
The code for this article can be found in my GitHub repository.