Every detail about generating PDF of a website using Playwright

Post by
Andrew Pierno

In this tutorial, we will learn how to create a PDF using Playwright. We’ll use Playwright’s pdf method to generate a PDF of a website using just its URL as input. Many invoice generators rely on the same approach. Along the way, we’ll explore several of the options the pdf method provides.

Installing Playwright

Using yarn

yarn add playwright

Using npm

npm install playwright

Creating a PDF using Playwright

Once Playwright is installed, we can write a function that accepts a website URL as a parameter and saves the PDF to a local path.

const { chromium } = require("playwright");
const createPDF = async (websiteUrl) => {
  const browser = await chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto(websiteUrl);
  await page.pdf({ path: "my_pdf.pdf" });
  await browser.close();
};

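This code launches a Chromium browser, navigates to the website URL passed to the function, creates a PDF of the page, and saves it locally to a file named my_pdf.pdf. PDF generation is currently supported only in Chromium, which is why the example uses chromium rather than firefox or webkit.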
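Because the function above always writes to my_pdf.pdf, each call overwrites the previous file. The following is a minimal sketch of one way around that, assuming you want the file name derived from the URL. The names pdfFileNameFromUrl and createPDFForUrl are hypothetical helpers, not part of Playwright.

```javascript
const path = require("path");

// Hypothetical helper: turn a URL into a filesystem-safe PDF file name.
const pdfFileNameFromUrl = (websiteUrl) => {
  const { hostname, pathname } = new URL(websiteUrl);
  const slug = `${hostname}${pathname}`
    .replace(/[^a-zA-Z0-9]+/g, "-") // collapse unsafe characters into a dash
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
  return `${slug}.pdf`;
};

// Hypothetical variant of createPDF that returns the path it wrote to.
const createPDFForUrl = async (websiteUrl, outputDir = ".") => {
  // Loaded lazily so the naming helper can be used without Playwright.
  const { chromium } = require("playwright");
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(websiteUrl);
    const outputPath = path.join(outputDir, pdfFileNameFromUrl(websiteUrl));
    await page.pdf({ path: outputPath });
    return outputPath;
  } finally {
    // Close the browser even if navigation or rendering fails.
    await browser.close();
  }
};
```

The try/finally block is a design choice worth keeping in any variant: without it, a failed page.goto leaves a Chromium process running.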

PDF Options

The pdf method accepts the following options to help control the PDF output:

  • displayHeaderFooter: controls whether the header and footer will be displayed on the PDF. Defaults to false.
  • format: the paper format of the PDF. If set, takes priority over width and height options. Defaults to 'Letter'.
  • headerTemplate: the HTML template for the print header.
  • footerTemplate: the HTML template for the print footer.
  • height: the paper height, accepts values labeled with units.
  • landscape: controls the paper orientation. Defaults to false.
  • margin: controls the paper margins.
  • pageRanges: the page ranges to print, e.g. '1-5, 8, 11-13'. Defaults to the empty string, which prints all pages.
  • path: the file path to save the PDF to.
  • preferCSSPageSize: controls whether to give any CSS @page size declared in the page priority over what is declared in width and height or format options. Defaults to false.
  • printBackground: controls whether to print background graphics. Defaults to false.
  • scale: the scale of the webpage rendering. Defaults to 1.
  • width: the paper width, accepts values labeled with units.
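Several of these options are usually set together. As a rough sketch, assuming you want shared defaults that individual calls can override, a small helper can merge them. DEFAULT_PDF_OPTIONS and buildPdfOptions are hypothetical names, and the values are only illustrative.

```javascript
// Illustrative defaults; adjust them to your own documents.
const DEFAULT_PDF_OPTIONS = {
  format: "A4",
  printBackground: true,
  margin: { top: "20mm", right: "15mm", bottom: "20mm", left: "15mm" },
};

// Hypothetical helper: merge per-call overrides into the defaults.
// The margin object is merged key by key so that overriding one side
// does not discard the other three.
const buildPdfOptions = (overrides = {}) => ({
  ...DEFAULT_PDF_OPTIONS,
  ...overrides,
  margin: { ...DEFAULT_PDF_OPTIONS.margin, ...(overrides.margin || {}) },
});

// A landscape report limited to the first two pages, slightly scaled down.
const reportOptions = buildPdfOptions({
  path: "report.pdf",
  landscape: true,
  pageRanges: "1-2",
  scale: 0.8,
  margin: { top: "30mm" },
});

// Inside an async function with a Playwright page:
// await page.pdf(reportOptions);
```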

Let’s take a closer look at using these options to achieve some common use cases.

Adding a Header and Footer in a PDF using Playwright

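You can add a customised header and footer to your PDF by specifying headerTemplate and footerTemplate. Make sure to set displayHeaderFooter to true. Inside the templates, Chromium fills in elements that carry the classes date, title, url, pageNumber and totalPages.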

const { chromium } = require("playwright");
const generatePDF = async (websiteUrl) => {
  const browser = await chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto(websiteUrl);
  await page.pdf({
    path: "my_pdf.pdf",
    displayHeaderFooter: true,
    headerTemplate:
      '<div style="font-size: 10px; color: #666; text-align: center; width: 100%;">' +
      '<span class="date"></span></div>',
    footerTemplate:
      '<div style="font-size: 10px; color: #666; text-align: center; width: 100%;">' +
      '<span class="pageNumber"></span> of <span class="totalPages"></span></div>',
    // The header and footer are drawn inside the page margins, so leave room for them.
    margin: { top: "60px", bottom: "60px" },
    printBackground: true,
    format: "A4",
  });
  await browser.close();
};
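If several documents share the same header and footer, the template strings can be built by small functions. This is only a sketch: TEMPLATE_STYLE, buildHeaderTemplate and buildFooterTemplate are hypothetical names, and the label argument is assumed to be trusted text because it is not escaped. The class names title, date, pageNumber and totalPages are the ones Chromium fills in at print time.

```javascript
// Shared inline style; header and footer templates need an explicit
// font size to be readable.
const TEMPLATE_STYLE =
  "font-size: 10px; color: #666; text-align: center; width: 100%;";

// Hypothetical helper: header showing the document title and print date.
const buildHeaderTemplate = () =>
  `<div style="${TEMPLATE_STYLE}">` +
  `<span class="title"></span> (<span class="date"></span>)</div>`;

// Hypothetical helper: footer such as "Page 2 of 7".
// `label` is inserted as-is, so pass trusted text only.
const buildFooterTemplate = (label = "Page") =>
  `<div style="${TEMPLATE_STYLE}">${label} ` +
  `<span class="pageNumber"></span> of <span class="totalPages"></span></div>`;

// Inside an async function with a Playwright page:
// await page.pdf({
//   path: "my_pdf.pdf",
//   displayHeaderFooter: true,
//   headerTemplate: buildHeaderTemplate(),
//   footerTemplate: buildFooterTemplate(),
//   margin: { top: "60px", bottom: "60px" },
// });
```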

Converting HTML to PDF using Playwright

If you’d like to generate a PDF from an HTML file or string, load it into the page with page.setContent and then call the pdf method. This is how many invoice generators work: you define an HTML template, fill in the variables for each invoice, and render the result to PDF.

const { chromium } = require("playwright");
const htmlToPdf = async (html = "<div>ScreenshotAPI</div>") => {
  const browser = await chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.setContent(html);
  await page.pdf({
    path: "my_pdf.pdf",
  });
  await browser.close();
};
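To make the invoice idea concrete, here is a minimal sketch of a template function whose output can be passed to htmlToPdf. The invoice shape (number, customer, items) and the names escapeHtml and renderInvoiceHtml are assumptions made for illustration. A real project would more likely use a templating library.

```javascript
// Escape user-supplied text so it cannot break the markup.
const escapeHtml = (value) =>
  String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");

// Hypothetical template: one table row per item plus a total row.
const renderInvoiceHtml = ({ number, customer, items }) => {
  const rows = items
    .map(
      ({ description, amount }) =>
        `<tr><td>${escapeHtml(description)}</td><td>${amount.toFixed(2)}</td></tr>`
    )
    .join("");
  const total = items.reduce((sum, { amount }) => sum + amount, 0);
  return (
    `<h1>Invoice ${escapeHtml(number)}</h1>` +
    `<p>Billed to: ${escapeHtml(customer)}</p>` +
    `<table>${rows}<tr><td>Total</td><td>${total.toFixed(2)}</td></tr></table>`
  );
};

const invoiceHtml = renderInvoiceHtml({
  number: "INV-001",
  customer: "Acme & Co",
  items: [
    { description: "PDF export", amount: 40 },
    { description: "Support <priority>", amount: 10.5 },
  ],
});

// Inside an async function:
// await htmlToPdf(invoiceHtml);
```

Escaping matters here because customer names and item descriptions usually come from user input, and unescaped markup would end up rendered in the PDF.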

Generating a PDF and uploading it to S3

Make sure to install the AWS SDK before we begin.

Using yarn

yarn add aws-sdk

Using npm

npm install aws-sdk

Next, let’s write a function to generate a PDF and upload it to S3. The function accepts websiteUrl, bucketName and fileName as input and returns the upload result, whose Location property holds the URI of the uploaded file.

const { chromium } = require("playwright");
const AWS = require('aws-sdk');
AWS.config.update({
  accessKeyId: "YOUR_ACCESS_KEY",
  secretAccessKey: "YOUR_SECRET_KEY",
  region: "YOUR_REGION",
});
const s3 = new AWS.S3();
const generatePDFAndUpload = async (websiteUrl, bucketName, fileName) => {
  const browser = await chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto(websiteUrl);
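  // Without a path option, page.pdf() resolves to a Buffer holding the PDF.
  const pdf = await page.pdf();
  const params = {
    Bucket: bucketName,
    Key: fileName,
    Body: pdf,
    ContentType: "application/pdf", // lets browsers open the object as a PDF
  };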
  const uploadedFile = await s3.upload(params).promise();
  await browser.close();
  return uploadedFile;
};

Reliably generating PDFs of websites at scale

While it takes only a few minutes to write a function that creates PDFs with Playwright, things get complicated quickly once you try to scale, because any headless browser is resource-intensive. And we haven’t even touched on managing proxies, handling cookies, or storing and caching images.
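One partial mitigation for the resource cost is to reuse a single browser and cap how many pages render at once. The sketch below assumes a fixed limit is acceptable. mapWithConcurrency and generatePDFs are hypothetical helpers, and the sketch does nothing about proxies, cookies or caching.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight.
// Results are returned in the same order as the input.
const mapWithConcurrency = async (items, limit, worker) => {
  const results = new Array(items.length);
  let nextIndex = 0;
  const runWorker = async () => {
    // Each runner keeps pulling the next unclaimed index until none remain.
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index], index);
    }
  };
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, runWorker));
  return results;
};

// Hypothetical batch function: one browser, one context per URL.
const generatePDFs = async (websiteUrls, limit = 3) => {
  // Loaded lazily so the limiter can be used without Playwright.
  const { chromium } = require("playwright");
  const browser = await chromium.launch();
  try {
    return await mapWithConcurrency(websiteUrls, limit, async (websiteUrl) => {
      const context = await browser.newContext();
      try {
        const page = await context.newPage();
        await page.goto(websiteUrl);
        return await page.pdf(); // Buffer with the PDF bytes
      } finally {
        await context.close(); // free the page's resources right away
      }
    });
  } finally {
    await browser.close();
  }
};
```

If any URL fails, the whole batch rejects. Catching errors inside the worker and returning them alongside the results may be a better fit for long batches.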

Screenshotapi.net takes care of these problems. ScreenshotAPI provides a reliable API for taking screenshots or generating PDFs of a website at high speed. It processes a few million screenshots every month and is trusted by companies such as Crypto.com, dentsu and e.ventures, to name a few.


We hope the tutorial was helpful. Don’t forget to give screenshotapi.net a try: the 7-day free trial lets you capture up to 100 screenshots or PDFs.

©2024 ScreenshotAPI. All Rights Reserved.