Automation Testing Weekly #01

Welcome to Issue #1!

Big Playwright release this month, some genuinely good migration war stories, and the AI testing space is finally separating into things that work vs things that looked good in the demo.

Glad you’re here.

Build reliable tests. Ship with confidence. 🙂

NEWS

locator.drop() finally makes drag-and-drop clean cross-browser. tracing.startHar() turns HAR recording into a proper tracing API. ARIA snapshots now carry bounding boxes for AI agents. And test.abort() lets you kill a test from inside a fixture or route handler, which is genuinely useful for guardrail enforcement. Heads-up: Currents had a brief compatibility break on 1.60. It’s patched now, but check before upgrading.

I found this one worth bookmarking. Currents put together a clear map of where things actually stand: MCP server, planner, generator, healer agent loop, artifact-first debugging. Cuts through a lot of vendor noise. Also flags something important: full MCP runs burn ~114K tokens per task. The new @playwright/cli brings that to ~27K.

The selenium-webdriver package sits at ~2.1M downloads on npm. ThoughtWorks moved Playwright to “Adopt.” Adobe, ING, and NASA have all documented migrations. What I find more interesting than the download numbers: teams are now sharing what actually breaks during migration, not just why to bother.

AUTOMATION

A setup job counts tests, computes shards, and outputs JSON that downstream jobs consume. No more “we added 80 tests and nobody updated the matrix.” One thing to remember: unless fullyParallel is enabled, sharding splits files, not individual tests, so uneven file sizes will eat your speedup.
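Here is a minimal sketch of the "compute shards" step, not the linked article's actual implementation. The function name, the 40-tests-per-shard target, and the cap of 10 shards are all assumptions for illustration; the test count itself could come from something like the output of npx playwright test --list.

```typescript
// Hypothetical helper: the name and default values are illustrative assumptions.
interface ShardMatrix {
  shardIndex: number[]; // 1-based shard numbers, e.g. [1, 2, 3]
  shardTotal: number[]; // single-element array so it fits a CI matrix
}

function computeShardMatrix(
  testCount: number,
  testsPerShard: number = 40, // assumed target, tune for your suite
  maxShards: number = 10 // assumed cap on parallel CI jobs
): ShardMatrix {
  if (!Number.isInteger(testCount) || testCount < 0) {
    throw new RangeError("testCount must be a non-negative integer");
  }
  if (testsPerShard < 1 || maxShards < 1) {
    throw new RangeError("testsPerShard and maxShards must be at least 1");
  }
  // At least one shard, at most maxShards.
  const total = Math.min(maxShards, Math.max(1, Math.ceil(testCount / testsPerShard)));
  const shardIndex = Array.from({ length: total }, (_, i) => i + 1);
  return { shardIndex, shardTotal: [total] };
}

// The setup job would print this JSON for downstream jobs to consume.
console.log(JSON.stringify(computeShardMatrix(200)));
// prints {"shardIndex":[1,2,3,4,5],"shardTotal":[5]}
```

Each downstream job can then pass its pair to Playwright as --shard=<shardIndex>/<shardTotal>. Counting tests is only a proxy for runtime, so the caveat about uneven file sizes still applies.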

Teams fail not because Playwright is weak but because they swap frameworks without changing how QA and eng collaborate or how CI feedback flows. And writing “Selenium-style Playwright” with explicit waits and PageFactory patterns leaves you with a worse suite than what you started with. I’ve seen this happen more than once.

CircleCI’s MCP server has a find_flaky_tests tool that pulls historical CI failure patterns into your AI assistant. Run a few builds, ask what’s flaky and why, get fix suggestions in context. Less tab-switching, faster diagnosis.

TOOLS & GITHUB

40+ browser tools via accessibility tree snapshots. LLMs read ~200 tokens per page state instead of processing screenshots. Works with VS Code Copilot, Cursor, Claude Desktop, Claude Code. One config block and it’s running.

Kudos to Jay Yeluru for putting this together. Copilot (or Claude) inspects a live DOM through MCP, writes tests from scratch, self-heals broken locators. Comes with POM, typed test data, and a working GitHub Actions pipeline. Good scaffold even if you swap the AI.

Analyze → plan → configure → model → implement, chained as five MCP tools. The decomposition pattern is the useful part, cleaner than prompt-and-paste for anything beyond a single test file.

COMMUNITY INSIGHT

A recurring theme this month: test infra treated as second-class. Reproducible environments plus monitoring add up to roughly 20% fewer false failures, according to EdgeDelta’s data. GitLab’s public handbook has an actual triage model worth borrowing: it auto-classifies failures as Flaky, Master-broken, or Unclear and routes them to EMs.

Built on 40K+ survey responses and 50+ interviews. Better signal than most year-preview stuff, especially the part on where AI testing ROI is real vs still aspirational. Worth your time if you’re making H2 tooling decisions.

Started this discussion on r/QualityAssurance this week after a conversation with a connection growing her QA career. The core point: testing features is not the same as testing products. Replies from mid-level testers reflecting on their own turning points are worth reading. If you’ve had a similar moment, drop it in the thread.

PRACTICAL TIP

Add failOnFlakyTests: !!process.env.CI to playwright.config.ts. Playwright marks a test flaky when it passes on retry after an initial failure. With this flag, a flaky test fails the build instead of letting it quietly go green. Pair with retries: 2 on CI. Shipped in v1.52. Almost nobody has turned it on, which I still find surprising.
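For reference, a minimal config with both settings might look like the sketch below. It assumes Playwright v1.52 or later and that your CI provider sets the CI environment variable; merge the two options into your existing config rather than replacing it.

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry only on CI, so a test that passes on a second attempt is reported as flaky.
  retries: process.env.CI ? 2 : 0,
  // On CI, fail the whole run if any test was flaky (failed first, passed on retry).
  failOnFlakyTests: !!process.env.CI,
});
```

Local runs stay unchanged: no retries and no flaky-test failures unless CI is set.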

VIDEOS

Live panel, practitioners not vendors. The part on where AI maintenance ROI is actually showing up vs where it’s still a demo is worth it alone. Skip to ~18 min past the opener.

Good counterpoint to the AI-fixes-everything wave. Aditya Challa explains why LLM-generated tests fail on mobile specifically. Deterministic reliability issues most AI tooling wasn’t built to handle.

And Finally…

Spent time this week reviewing a team’s “AI-generated” test suite. Half the tests had hardcoded waits and no assertions worth trusting. AI is only as good as the engineer reviewing its output. We’re not at the “just ship it” stage yet. Maybe we never will be. 😅

Aravind, QA Automation Engineer & Technical Blogger
Aravind is a QA Automation Engineer and technical blogger specializing in Playwright, Selenium, and AI in software testing. He shares practical tutorials to help QA professionals improve their automation skills.
