Web scraping is the process of automatically extracting information from websites. It allows developers and businesses to collect data such as product prices, reviews, articles, or any other useful content from the web in a structured format. Instead of manually copying information, web scraping tools handle this task efficiently and at scale.
One of the most powerful tools available today for this purpose is Playwright. Originally designed for end-to-end testing, Playwright has grown into a versatile framework that also excels at browser automation and data extraction. With its ability to handle modern, JavaScript-heavy websites, it provides developers with an edge in scraping dynamic pages that traditional libraries struggle with.
In this guide, we’ll explore web scraping with Playwright, showing how you can use it to extract data, automate browsers, and overcome challenges such as handling single-page applications or dynamic content.
Why Use Playwright for Web Scraping?
When it comes to web scraping, many developers rely on traditional libraries like BeautifulSoup or Requests. While these tools work well for static websites, they often fall short when dealing with modern, JavaScript-heavy applications. This is where Playwright stands out.

One of the biggest advantages of Playwright is its cross-browser support. It allows you to scrape websites on Chromium, Firefox, and WebKit with a single API. This ensures your scripts can adapt to different environments and mimic real user behavior more accurately.
Another key benefit is Playwright’s ability to handle dynamic content. Many websites load data asynchronously using JavaScript, making it invisible to basic scraping libraries. Playwright can wait for elements, handle infinite scrolling, and interact with complex single-page applications, giving you access to the full content of a site.
Additionally, Playwright makes it easy to run headless scraping, where the browser operates without a visible window, making the process faster and more resource-efficient. Combined with Playwright's browser automation capabilities, you can simulate user actions such as clicking, typing, or logging in before extracting data.
In short, Playwright combines the flexibility of automation with the reliability needed for modern web scraping tasks.
Getting Started with Playwright Web Scraping
To begin web scraping with Playwright, you first need to set up your environment. The recommended stack is Node.js + Playwright + Visual Studio Code (VS Code). For a detailed installation tutorial, you can also check out the Playwright setup guide for beginners.
Step 1: Install Node.js
Download and install Node.js from the Node.js official website. During installation, make sure to add Node to your system path so you can run commands from the terminal. To confirm it is installed correctly, open a terminal and run:
node -v
npm -v
Step 2: Install Visual Studio Code
Download VS Code from its official site and install it. Once installed, open VS Code and create a project folder where you will write your scraping scripts.
Step 3: Initialize a Playwright Project
Open the integrated terminal in VS Code (usually via View > Terminal or Ctrl + `, the backtick key). In your project folder, run:
npm init playwright@latest
This sets up Playwright with the test runner, example files, and browser binaries. It is the easiest way to start both testing and scraping projects.
Step 4: Run Playwright in Headless Mode
Playwright can run browsers in headless mode (no UI) or visual mode (with UI).
- Headless mode: Faster and uses fewer system resources. Ideal for scraping large amounts of data.
- Headed mode: Opens the browser so you can see what is happening. Useful for debugging or learning how Playwright interacts with the page.
You can enable headless mode in the scraping script when launching the browser:
const browser = await chromium.launch({ headless: true });
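For debugging, you can switch to headed mode in the same call. The optional slowMo launch option (a standard Playwright setting) slows each action by the given number of milliseconds so you can watch the browser work:

// Headed mode for debugging: opens a visible browser window.
// slowMo is optional; 250ms per action is just an example value.
const browser = await chromium.launch({ headless: false, slowMo: 250 });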
Step 5: Write Your First Scraping Script
Create a file named scraper.spec.js in your project folder. Add the following code:
const { test, chromium } = require('@playwright/test');

test('Scrape Example Website', async () => {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('your-page-url'); // Replace with your page URL.

  const title = await page.title();
  console.log('Page Title:', title);

  const heading = await page.textContent('h1');
  console.log('Heading:', heading);

  await browser.close();
});
Step 6: Run the Script in VS Code
Since this uses the Playwright test runner, run your script using:
npx playwright test scraper.spec.js
The output in the terminal will display the page title and main heading, indicating that your scraping script is working correctly.
This setup provides a quick start for web scraping with Playwright using JavaScript, the Playwright test runner, and VS Code as your development environment.
Handling Dynamic Content and SPAs
Modern websites often rely on JavaScript frameworks like React, Angular, and Vue to load and update content dynamically. Instead of serving static HTML, these applications render elements on the client side, making it challenging for traditional scrapers to capture the data.
Playwright makes handling dynamic content straightforward: you can interact with elements that appear only after the initial page load. It provides methods to wait for elements, ensuring your script proceeds only once the desired content is visible. For example, you can use page.waitForSelector() to pause execution until a specific element appears.
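As a minimal sketch, here is a wait before reading a dynamically rendered element. The .price selector is a hypothetical placeholder; adjust it to whatever your target page renders:

// Pause until the element exists in the DOM, then read its text.
await page.waitForSelector('.price', { timeout: 10000 });
const price = await page.textContent('.price');
console.log('Price:', price);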
Another common challenge is infinite scrolling, where content loads continuously as the user scrolls down. Playwright allows you to simulate user actions like scrolling, clicking “Load More” buttons, or navigating through virtual pages. This makes it possible to extract complete datasets that would otherwise remain hidden.
When it comes to scraping single-page applications with Playwright, the framework truly shines. SPAs rely on client-side routing and dynamically injected components, but Playwright can handle these transitions seamlessly. By waiting for the right network requests or DOM updates, you can reliably capture content even from complex, JavaScript-heavy sites.
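For instance, a common SPA pattern is to trigger a client-side navigation and wait for the API response that feeds it before touching the DOM. A sketch under assumed names (the /api/products endpoint, the route link, and the selectors are all hypothetical):

// Click a client-side route link and wait for the JSON request the SPA makes.
await Promise.all([
  page.waitForResponse(resp => resp.url().includes('/api/products') && resp.ok()),
  page.click('a[href="/products"]'), // hypothetical route link
]);
const names = await page.$$eval('.product-name', els => els.map(el => el.textContent.trim()));
console.log(names);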
Example: Handling Infinite Scroll in Playwright (JavaScript)
// scraper.spec.js
const { test } = require('@playwright/test');

test('Scraping infinite scroll with Playwright', async ({ page }) => {
  await page.goto('your-infinite-scroll-page-url'); // Replace with your page URL.

  let previousHeight;
  for (let i = 0; i < 5; i++) { // Adjust loop count as needed
    previousHeight = await page.evaluate('document.body.scrollHeight');
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await page.waitForTimeout(2000); // wait for new content to load
    const newHeight = await page.evaluate('document.body.scrollHeight');
    if (newHeight === previousHeight) break; // stop if no more content
  }

  const items = await page.$$eval('.item-selector', elements =>
    elements.map(el => el.textContent.trim())
  );
  console.log(items);
});
In this script, Playwright scrolls down the page, waits for new content, and collects data until no more items load.
Playwright Web Scraping Examples
To make this tutorial practical, we will use a local HTML file. You can download Scraping.html here and save it to your D: drive. This ensures you can experiment safely without depending on external websites.
Once downloaded, open it in your browser or keep it stored locally for running Playwright tests. Below are three Playwright web scraping examples that demonstrate how to extract product data, capture blog elements, and scrape behind authentication.
Example 1: Extracting Text or Product Data
In the Scraping.html file, the Products section lists items with a title, description, and price. Using Playwright, you can extract this information as structured data.
const { test, chromium } = require('@playwright/test');

test('Extract product data', async () => {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('file:///D:/Scraping.html');

  const products = await page.$$eval('.product', items => {
    return items.map(item => ({
      title: item.querySelector('.product-title').innerText.trim(),
      description: item.querySelector('.product-description').innerText.trim(),
      price: item.querySelector('.product-price').innerText.trim()
    }));
  });

  console.log(products);
  await browser.close();
});
This will return structured product data such as:
[
  {
    title: 'Dell XPS 13 Laptop',
    description: '13-inch display, Intel i7, 16GB RAM, 512GB SSD',
    price: '$1200'
  },
  {
    title: 'iPhone 15 Pro',
    description: '6.1-inch OLED display, A17 chip, 256GB storage',
    price: '$999'
  },
  {
    title: 'Sony WH-1000XM5 Headphones',
    description: 'Noise-canceling, 30-hour battery, wireless Bluetooth',
    price: '$349'
  }
]
Example 2: Capturing Links or Images
The Blog section in the local file includes blog posts with links and images. You can capture the article titles, their links, and even the image alt text for more detailed Playwright data extraction.
const { test, chromium } = require('@playwright/test');

test('Extract blog links and images', async () => {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('file:///D:/Scraping.html');

  const blogPosts = await page.$$eval('.blog-post', posts => {
    return posts.map(post => ({
      link: post.querySelector('.blog-link').href,
      title: post.querySelector('.blog-link').innerText.trim(),
      image: {
        src: post.querySelector('img').src,
        alt: post.querySelector('img').alt
      }
    }));
  });

  console.log(blogPosts);
  await browser.close();
});
This lets you extract not only blog titles and links but also the image alternative text, which is useful for accessibility and SEO analysis.
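As a quick follow-up, the same array can drive a simple accessibility check. A minimal sketch that flags posts whose image has no alt text, building on the blogPosts variable from the script above:

// Flag posts whose image is missing alternative text.
const missingAlt = blogPosts.filter(post => !post.image.alt);
if (missingAlt.length > 0) {
  console.log('Posts missing alt text:', missingAlt.map(p => p.title));
}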
Example 3: Automating Login and Scraping Behind Authentication
The Login section in the local file demonstrates how to log in before accessing hidden content. Playwright makes it possible to fill out forms and then scrape data behind authentication.
const { test, chromium } = require('@playwright/test');

test('Login and scrape dashboard', async () => {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('file:///D:/Scraping.html');

  // Fill in the demo credentials and submit the form
  await page.fill('#username', 'demoUser');
  await page.fill('#password', 'demoPass');
  await page.click('button[type="submit"]');

  // Wait for the protected section to appear before reading it
  await page.waitForSelector('#dashboard', { timeout: 5000 });
  const dashboardContent = await page.$eval('#dashboard', el => el.innerText.trim());
  console.log('Dashboard content:', dashboardContent);

  await browser.close();
});
Once the login is successful, the script extracts the text inside the hidden section, showing how Playwright can be used for scraping single-page applications or login-gated content.
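If you need several scraping runs behind the same login, Playwright can also persist the authenticated session with storageState and reuse it later, so the login steps run only once. A minimal sketch (state.json is just an example path):

// After logging in (as above), save cookies and localStorage to disk.
const context = page.context();
await context.storageState({ path: 'state.json' });

// In a later run, start a context that is already authenticated.
const authedContext = await browser.newContext({ storageState: 'state.json' });
const authedPage = await authedContext.newPage();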
Extending Your Scraping with Excel Integration
A real-world scraping project often requires saving the collected data in a structured format for further use. With Playwright, you can easily extend your scraping by integrating with Excel. This way, instead of printing results in the console, your script directly writes the extracted information into a spreadsheet.
For a more advanced use case, refer to this guide on Playwright parameterized tests in JavaScript. It demonstrates how to read input data from Excel and write test results back, which can also be applied here to manage URLs and store scraped results.
In our example below, we are only writing scraped data into Excel using the exceljs package.
Install ExcelJS
Before running the script, you need to install the ExcelJS library:
npm install exceljs
Code Example
Here is a complete example using Playwright with ExcelJS:
// local-scraper.spec.js
const { test } = require('@playwright/test');
const ExcelJS = require('exceljs');

test('Scrape product data from local HTML and save to Excel', async ({ page }) => {
  // Prepare output Excel
  const outWorkbook = new ExcelJS.Workbook();
  const outSheet = outWorkbook.addWorksheet('Scraped Data');

  // Add headers
  outSheet.addRow(['Page Title', 'Product Title', 'Description', 'Price']);

  // Load local HTML file
  const filePath = 'file:///D:/Scraping.html';
  await page.goto(filePath);

  // Extract page title
  const pageTitle = await page.title();

  // Extract product data
  const products = await page.locator('.product').all();
  for (const product of products) {
    const title = await product.locator('.product-title').innerText();
    const description = await product.locator('.product-description').innerText();
    const price = await product.locator('.product-price').innerText();

    // Save each product as a new row
    outSheet.addRow([pageTitle, title.trim(), description.trim(), price.trim()]);
  }

  // Save Excel file
  await outWorkbook.xlsx.writeFile('D:\\scraped-data.xlsx');
  console.log('Data saved to D:\\scraped-data.xlsx');
});

With this approach, Playwright not only automates the scraping process but also ensures your data is neatly structured in Excel. This makes it simple to integrate with reporting tools, perform analysis, or share results with your team.
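The reverse direction works just as well: if your input URLs live in a spreadsheet, ExcelJS can read them before scraping starts. A small sketch, assuming a workbook at D:/urls.xlsx with URLs in the first column of a sheet named 'URLs' (both names are hypothetical); it runs inside the same async test body and reuses the ExcelJS import from the script above:

// Read a list of URLs from an existing workbook (hypothetical file/sheet names).
const inWorkbook = new ExcelJS.Workbook();
await inWorkbook.xlsx.readFile('D:/urls.xlsx');
const inSheet = inWorkbook.getWorksheet('URLs');

const urls = [];
inSheet.eachRow((row, rowNumber) => {
  if (rowNumber > 1) urls.push(row.getCell(1).value); // skip the header row
});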
With these Playwright web scraping examples, you now have a practical starting point to perform Playwright data extraction on product listings, blog posts, and authenticated pages.
Comparing Playwright with Other Tools
When exploring automation frameworks, it is helpful to see how Playwright compares to other popular scraping tools.
Playwright vs Selenium Web Scraping
Selenium has been around for a long time and is widely used in testing and automation. However, Playwright offers a faster and more modern approach. Playwright is easier to set up, supports multiple browsers out of the box, and handles modern JavaScript-heavy sites with less effort. In terms of speed and reliability, Playwright often outperforms Selenium when dealing with dynamic content and single-page applications.
Playwright vs Puppeteer Web Scraping
Puppeteer and Playwright share many similarities, as both were designed with modern web automation in mind. The key difference is that Playwright supports cross-browser automation, while Puppeteer mainly focuses on Chromium-based browsers. Performance is quite similar, but Playwright offers more flexibility when scraping across Chrome, Firefox, and WebKit. If your project requires multi-browser testing or advanced scraping, Playwright usually provides a broader range of features.
Quick Comparison Table
| Feature | Playwright | Selenium | Puppeteer |
| --- | --- | --- | --- |
| Ease of Setup | Simple, modern setup | More complex, requires drivers | Simple setup for Chromium |
| Browser Support | Chromium, Firefox, WebKit | Chrome, Firefox, Edge, Safari (via drivers) | Mainly Chromium |
| Speed & Performance | Very fast with modern async API | Slower due to legacy architecture | Fast, but limited to Chromium |
| Dynamic Content Handling | Excellent, built-in waits and auto-handling | Requires explicit waits | Good built-in waits, but Chromium-only |
| Best Use Case | Cross-browser scraping and testing | Legacy projects, wide ecosystem | Chromium-only scraping and testing |
Responsible Web Scraping Practices
While this Playwright web scraping tutorial shows you how to extract data from websites, it is important to follow safe and lawful practices. Web scraping should always be done responsibly and within legal boundaries.
Educational Purpose Only
This article is intended for educational purposes. The examples shown here use a local HTML file and should not be applied directly to real websites without proper authorization.
Follow Website Policies
- Always check and respect a site’s terms of service and robots.txt file before scraping.
- Avoid scraping copyrighted or sensitive personal information without permission.
- Use official APIs whenever available, as they are safer and often provide structured data.
Respectful Scraping
- Do not overload servers with too many requests. Use throttling and rate limiting (see the sketch after this list).
- Scrape only public data and avoid collecting sensitive or personal information. Always use the extracted data responsibly and ethically.
- Comply with data protection regulations such as GDPR or CCPA if applicable.
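A polite crawl can be as simple as a fixed pause between page visits. A minimal sketch (the urls array and the two-second delay are placeholders to tune for the target site):

// Visit each URL with a fixed delay so the server is never hammered.
for (const url of urls) {
  await page.goto(url);
  // ...extract the data you need here...
  await page.waitForTimeout(2000); // throttle: pause 2 seconds between requests
}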
Disclaimer
This tutorial does not promote or encourage unlawful data scraping. You are solely responsible for ensuring that your scraping activities comply with all applicable laws, website policies, and ethical standards.
Conclusion
Web scraping with Playwright is a powerful way to extract structured data from websites and local files. With its ability to handle dynamic pages, automate logins, and even export results into Excel, Playwright makes data extraction more accessible for developers.
However, it is important to use Playwright responsibly. Always follow ethical guidelines, respect website terms, and ensure compliance with legal standards when performing data scraping.
If you want to go deeper, check out related tutorials on Playwright parameterized tests in JavaScript and other advanced guides that cover selectors, automation strategies, and test reporting. These will help you not only scrape data but also build reliable automation frameworks.

Hi, I’m Aravind — a seasoned Automation Test Engineer with over 17 years of hands-on experience in the software testing industry. I specialize in tech troubleshooting and tools like Selenium, Playwright, Appium, JMeter, and Excel automation. Through this blog, I share practical tutorials, expert tips, and real-world insights to help testers and developers improve their automation skills. In addition to software testing, I also explore tech trends and user-focused topics, including Snapchat guides, codeless test automation, and more.