Bombadil and the Return of Serious QA: Why Property-Based Testing for Web UIs Matters Again
Bombadil brings property-based testing to web UIs. Here’s why that matters for SaaS teams, conversion paths, and any product that can’t afford flaky frontend regressions.

Hacker News has a way of resurfacing ideas right when the industry is tired enough to listen.
This time it was Bombadil, a property based testing approach for web UIs from Antithesis. If you skimmed the thread and moved on, fair. Most “new testing thing” posts look like developer toy projects or a rebrand of something we already do.
But Bombadil felt like a signal, not a novelty.
Because frontend quality is still one of the most expensive, least disciplined, and quietly revenue destructive parts of modern software. We ship more UI than ever. We rely on more browsers, more devices, more JS, more third party scripts, more personalization, more experiments. And then we try to “test it” with a pile of brittle end to end scripts that break because a button moved 6 pixels.
Property based testing, applied to web UIs, flips the framing. Instead of “replay this exact user path and hope it covers enough”, it asks:
What must always be true, no matter what the user clicks, types, resizes, or navigates?
And that is the kind of question teams that own revenue pages should be asking more often.
If you’re a technical SEO, a SaaS product lead, a frontend lead, or the person who gets paged when the signup flow breaks at 2am, this isn’t just dev tooling news. This is a better mental model for protecting conversion, retention, and trust.
Let’s unpack it in plain English, then get practical.
Why Bombadil showing up now matters
Frontend QA has been stuck in an awkward loop for years:
- Unit tests feel clean but don’t protect user journeys.
- Snapshot tests catch changes but often punish legitimate UI iteration.
- Scripted E2E tests give confidence until they become a flaky time sink.
- Manual QA is still the last line of defense, which means it’s also the first thing cut when shipping pressure hits.
Meanwhile, the UI itself has become the product for a lot of SaaS. Your “app” is a web surface. Your onboarding is a funnel. Your pricing page is a decision engine. Your editor is the differentiator.
A flaky UI doesn’t always explode loudly. It leaks.
A dropdown fails to open on iOS. A form validation message blocks submission. A modal traps focus and users bounce. A dashboard loads but filters silently do nothing. A checkout works except when someone uses autofill.
That’s the stuff that turns into:
- “Paid spend went up but signups didn’t”
- “SEO traffic is growing but trials are flat”
- “Mobile conversion is weirdly low”
- “Support tickets mention ‘it’s broken’ but we can’t reproduce”
So when something like Bombadil pops up and says, effectively, “stop scripting paths, start asserting invariants and explore the UI space”, people perk up. Because it targets the painful part of UI testing, the combinatorial explosion of states and interactions.
And it does it in a way that lines up with how real users behave. Chaotically. Not like your happy path test.
Property based testing, in normal human terms
Property based testing starts with a simple shift:
Instead of writing tests like, “click this button, then expect this text”, you write tests like, “no matter what sequence of actions happens, these properties should hold”.
A property is an invariant. A rule. A “this must never break” statement.
For web UIs, properties tend to be things like:
- A user can always reach the signup success state if the inputs are valid.
- Submitting the form twice doesn’t create two accounts.
- The checkout total always equals line items plus tax minus discounts.
- The page never gets stuck in an unresponsive loading state.
- A modal does not trap the user without an exit.
- Navigation does not produce a blank screen.
- After login, authenticated pages never show logged out content.
- A filter never returns items that don’t match the filter criteria.
- Any error state is visible and recoverable, not silent.
Then you let the test system generate many different sequences of actions and inputs, trying to falsify the property.
It’s less “replay this script” and more “search for a counterexample”.
That search aspect is the whole point.
And when it finds a failure, a good property based system will also try to shrink the failing case into the smallest reproducible sequence. Instead of “it broke after 38 steps”, you get “it breaks if you do these 4 steps”.
That’s gold for debugging.
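The generate, falsify, shrink loop is easy to sketch in plain Python. This is a toy, not how Bombadil works: the “UI” is a hypothetical counter model with a planted bug (reset silently stops working once the value reaches 3), and the property is “any action sequence ending in reset leaves the counter at 0”.

```python
import random

# Toy "UI" model with a planted bug: reset is ignored once value >= 3.
def run(actions):
    value = 0
    for a in actions:
        if a == "inc":
            value += 1
        elif a == "reset":
            value = 0 if value < 3 else value  # the planted bug
    return value

def violates(actions):
    # Property: any sequence ending in "reset" leaves the counter at 0.
    return bool(actions) and actions[-1] == "reset" and run(actions) != 0

def shrink(actions):
    # Greedily drop actions while the failure still reproduces.
    changed = True
    while changed:
        changed = False
        for i in range(len(actions)):
            candidate = actions[:i] + actions[i + 1:]
            if violates(candidate):
                actions = candidate
                changed = True
                break
    return actions

def search(seed=0, runs=1000, max_len=20):
    rng = random.Random(seed)
    for _ in range(runs):
        actions = [rng.choice(["inc", "reset"])
                   for _ in range(rng.randint(1, max_len))]
        if violates(actions):
            return shrink(actions)
    return None

print(search())  # prints ['inc', 'inc', 'inc', 'reset'], the minimal repro
```

Real tools generate far smarter action sequences and smarter shrinks, but the shape is the same: generate chaos, try to falsify the invariant, then minimize the counterexample into a 4 step bug report instead of a 38 step one.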
How this differs from scripted E2E tests (and why your team feels the pain)
A typical E2E test suite is a set of narrow stories:
- Go to /pricing
- Click Start Trial
- Fill email and password
- Submit
- Assert redirect to /app
- Assert welcome message
That’s fine. It covers something. It also has a few problems that compound over time:
1. It assumes a stable UI structure
E2E scripts are tightly coupled to selectors, DOM layout, and copy. You change the button label, or add a step, and the test fails. Not because the product is broken. Because the script is.
2. It’s path based, not state based
You only test what you wrote down. Real users wander. They go back. They open multiple tabs. They type weird things. They paste passwords with spaces. They rotate their phone mid flow.
3. It becomes slow and expensive
You end up with “the suite” that takes 45 minutes. People stop running it locally. CI times creep up. Flakiness enters. Confidence drops.
4. It’s bad at discovering unknown unknowns
Scripted tests confirm what you already think is important. Property based tests are better at finding the thing you didn’t predict, the weird state combination.
To be clear, this is not “throw away E2E”. It’s “stop expecting E2E scripts to do the job of exploration”.
They’re different tools.
How this differs from snapshot tests (which many teams misuse)
Snapshot tests usually capture “the rendered output looks like this”.
They can be useful for stable UI components. But on real production surfaces, snapshots often turn into a noise generator:
- Someone updates a dependency and 80 snapshots change.
- A timestamp or randomized ID alters output.
- A minor style tweak creates huge diffs.
- People rubber stamp updates without reading them.
More importantly, snapshots answer the question:
“Did the UI change?”
They do not answer:
“Did the UI still behave correctly?”
Property based testing is behavior first. And that’s why it maps better to funnels, flows, and interactive pages.
Snapshots can still be part of the story, but they’re not the story.
The part technical SEOs should care about: UI bugs are ranking and revenue bugs
Technical SEO teams usually think in terms of crawlability, indexing, performance, templates, schema, internal linking, and content quality.
But here’s the messy reality: modern SEO traffic doesn’t land on static pages anymore. It lands on:
- JavaScript heavy pages with client side rendering
- Interactive pricing calculators
- Personalized landing experiences
- Signup flows embedded into the marketing site
- “Free tools” pages that act like lead gen
- Web apps that double as content hubs
When those surfaces break, you get indirect SEO damage:
- Increased pogo sticking, lower engagement signals
- Higher bounce, lower conversion, worse CAC to LTV math
- Slower pages due to error retries or stuck loading states
- Accessibility regressions that reduce usable audience
- Broken internal navigation that reduces discovery
Also, “AI search” and assistant driven discovery are pushing more users into fewer clicks. When they arrive, the page has to work. There’s less patience. Less context. Less forgiveness.
If you’re working on AI visibility and agentic discovery, this ties in tightly with the idea of resilient web surfaces. (Related: Sitefire and AI visibility agentic web SEO digs into that landscape.)
Concrete business failures property based UI testing can catch
Let’s make this less abstract. Here are failures that are common, expensive, and often missed until customers complain.
Broken signup flows (the classic)
- Form submits, spinner shows forever
- Email verification state fails to render
- Password rules differ between client and server
- The “Continue” button becomes disabled after a validation error and never re enables
- Safari autofill inserts values but the UI doesn’t register them as “typed”, so it won’t submit
A property might be:
For any valid email and password, there exists a sequence of actions that results in an account created state, and the UI never enters a terminal loading state.
That’s a mouthful, but it’s basically: valid users should always be able to sign up, and the UI should never dead end them.
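That property can be encoded before you even have browser automation, against an in-memory model of the flow. Everything below is hypothetical: the model, the action names, and the planted bug, which mirrors the “Continue button never re enables after a validation error” failure from the list above.

```python
import random

# Hypothetical signup form model. States: "idle", "error", "success".
# Planted bug: after a validation error, submit is never re enabled.
class SignupForm:
    def __init__(self):
        self.state = "idle"
        self.submit_enabled = True

    def submit(self, valid):
        if not self.submit_enabled or self.state == "success":
            return
        self.state = "success" if valid else "error"
        if self.state == "error":
            self.submit_enabled = False  # the bug: never re enabled

    def fix_input(self):
        # Editing should clear the error AND re enable submit,
        # but the planted bug forgets the second half.
        if self.state == "error":
            self.state = "idle"

def find_counterexample(runs=300, steps=10, seed=0):
    rng = random.Random(seed)
    for _ in range(runs):
        form, trace = SignupForm(), []
        for _ in range(steps):
            action = rng.choice(["submit_valid", "submit_invalid", "fix_input"])
            trace.append(action)
            if action == "submit_valid":
                form.submit(valid=True)
            elif action == "submit_invalid":
                form.submit(valid=False)
            else:
                form.fix_input()
        # Property: after fixing the input, a valid submit must succeed.
        form.fix_input()
        form.submit(valid=True)
        if form.state != "success":
            return trace
    return None

print(find_counterexample())  # a trace with an invalid submit before a valid one
```

A scripted happy path test (fill valid email, submit, assert success) passes forever against this model. The random explorer finds the dead end in its first few runs, because real users make a mistake first.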
Flaky checkout behavior
- Totals miscalculate for a rare coupon combination
- Currency formatting breaks in a locale
- Payment error happens and the UI loses cart state
- Back button causes duplicate charge attempt
- A/B test variant hides a required field on mobile widths
A property might be:
The amount charged equals the displayed total, and total equals item sum plus tax minus discounts, across all cart modifications and coupon applications.
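Here’s the same idea sketched against a toy cart model. The bug is invented but representative: a coupon is clamped to the subtotal once, at apply time, and never re clamped when items are removed, so the property “the displayed total is never negative” can be falsified.

```python
import random
from decimal import Decimal

TAX_RATE = Decimal("0.10")  # illustrative flat tax rate

# Toy cart model. Planted bug: removing items does not re clamp
# an already applied coupon, so the discount can exceed the subtotal.
class Cart:
    def __init__(self):
        self.items = []
        self.discount = Decimal("0")

    def add(self, price):
        self.items.append(price)

    def remove(self):
        if self.items:
            self.items.pop()

    def apply_coupon(self, amount):
        self.discount = min(amount, self.subtotal())  # clamped only once

    def subtotal(self):
        return sum(self.items, Decimal("0"))

    def displayed_total(self):
        return self.subtotal() + self.subtotal() * TAX_RATE - self.discount

def find_counterexample(runs=500, steps=15, seed=0):
    rng = random.Random(seed)
    for _ in range(runs):
        cart, trace = Cart(), []
        for _ in range(steps):
            op = rng.choice(["add", "remove", "coupon"])
            trace.append(op)
            if op == "add":
                cart.add(Decimal(rng.randint(1, 100)))
            elif op == "remove":
                cart.remove()
            else:
                cart.apply_coupon(Decimal(rng.randint(0, 100)))
            if cart.displayed_total() < 0:  # property: total never negative
                return trace
    return None

print(find_counterexample())  # e.g. a short trace like add, coupon, remove
```

No one writes the scripted test “apply a coupon, then remove the item, then look at the total”. The random explorer stumbles into it almost immediately.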
Hidden mobile regressions
This is the one teams consistently miss because everyone tests on desktop.
- Sticky header covers a “Submit” button
- Bottom sheet cannot be dismissed
- Keyboard overlaps form fields
- Scroll locks and traps the user
- Tap targets become too small after a CSS change
A property might be:
At any viewport size in an allowed range, primary actions remain reachable and actionable, and the UI never traps scroll/focus.
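The geometric version of that property can even be checked exhaustively rather than sampled, if you have a layout model to query. Everything in this sketch is invented: the header heights, the 360px wrap point, and the button offset stand in for whatever your real layout reports.

```python
# Hypothetical layout model: a sticky header whose nav wraps at narrow
# widths (doubling its height), plus a submit button near the top.
HEADER_BASE = 64
BUTTON_TOP = 90  # y-offset of the submit button in this toy model

def header_height(viewport_width):
    # Planted bug: below 360px the nav wraps and the header doubles.
    return HEADER_BASE * 2 if viewport_width < 360 else HEADER_BASE

def submit_reachable(viewport_width):
    # Property: the sticky header never covers the submit button.
    return BUTTON_TOP >= header_height(viewport_width)

# Sweep the whole allowed viewport range instead of spot checking.
failures = [w for w in range(320, 1281) if not submit_reachable(w)]
print(failures[:3], "...", len(failures), "failing widths")
```

Testing at 375px and 1280px, the two widths everyone spot checks, misses this entirely. Sweeping the allowed range finds the whole failing band.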
Conversion leakage from “minor” UI errors
- A tooltip overlays the CTA for one segment
- A cookie banner blocks the form in one region
- A third party script delays interactivity just long enough for people to bounce
- A client side exception prevents analytics and also breaks a component, but only for returning users with cached data
A property might be:
The UI remains functional and the primary journey remains possible even if third party scripts fail or are delayed.
This is a big one. Third party fragility is real. Testing for graceful degradation is not optional anymore.
So what is Bombadil actually doing in this space?
At a high level, Bombadil aims to bring property based testing style exploration to web UIs. Think: generate event sequences, explore states, assert invariants, find minimal repro steps.
The interesting part isn’t the name. It’s the direction.
It suggests we might finally treat web UI behavior as something you can fuzz, search, and systematically break, instead of only scripting happy paths.
And if you’re an operator, the exciting part is not “we can test more”. It’s “we can discover failures earlier, with fewer hand written scripts, and with better bug reports”.
That changes the economics.
Where property based testing fits in a modern QA stack (without pretending it replaces everything)
Most teams don’t need a new religion. They need a layered system that catches different classes of failure.
Here’s a stack that tends to work well for revenue critical web surfaces.
1. Unit tests for pure logic
Keep them boring. Keep them fast.
- Pricing calculations
- Formatting and parsing
- Validation rules
- State reducers
- Permissions logic
These are still the cheapest tests you’ll ever run.
2. Component tests for UI contracts
This is where snapshots can help if you’re disciplined, but behavior assertions are better.
- Button disabled states
- Error rendering
- Loading states
- Accessibility attributes
- Keyboard navigation on components
3. Property based tests for interactive flows and invariants
This is the new muscle.
Use it for:
- Signup and onboarding flow invariants
- Checkout/cart invariants
- Navigation invariants across routes
- “Never dead end the user” rules
- “Errors are visible and recoverable” rules
- “Auth state is consistent” rules
Don’t start with 40 properties. Start with 5 that map to money.
The flows that pay your salaries.
4. Scripted E2E tests for a small set of critical smoke paths
Yes, keep some scripts. Just stop letting them metastasize.
Use E2E scripts for:
- A sanity pass on production like environments
- A few “can we still sign up and log in” checks
- Canary checks post deploy
If you have 300 E2E tests and half are flaky, you’re not safer. You’re slower.
5. Visual regression tests for layout and styling breakage
Visual diffs are good at catching:
- Spacing and alignment regressions
- Hidden elements
- Overlaps
- Font and color shifts
- Mobile layout breakage
They’re also good for marketing pages where copy and layout matter more than interactivity.
But keep them scoped. If everything is visually tested, people stop looking at diffs.
6. AI assisted testing and QA workflows
This is where things get practical for teams without infinite QA headcount.
AI can help by:
- Generating test ideas from product specs and tickets
- Turning bug reports into reproducible steps
- Suggesting properties to encode as invariants
- Scanning PRs for risky UI changes
- Reviewing analytics and session replays for patterns
If you want a concrete example of where AI helps and where it doesn’t, the piece teaching Claude QA: mobile app AI testing workflows is a good companion. Different surface, same problem. You’re trying to scale QA thinking without scaling chaos.
7. Human review, but targeted and respected
Manual QA should not be “click around randomly before launch”.
It should be:
- Exploratory testing on new features
- Reviewing edge cases
- Accessibility checks
- Sanity checks on real devices
- Reviewing the small number of high risk changes
Property based testing actually makes human QA better, because it catches the dumb hidden dead ends early, freeing humans to focus on nuance.
A practical “how to start” plan for web teams
If you’re curious but also busy, here’s a reasonable rollout that won’t set your roadmap on fire.
Step 1: Pick one revenue critical flow
Examples:
- Signup to activation
- Trial to paid upgrade
- Checkout
- Lead capture on a high traffic landing page
Choose the one where breakage hurts within hours.
Step 2: Define 3 to 7 properties in plain English first
Literally write them in a doc.
- “A valid user can always complete signup.”
- “The UI never spins forever.”
- “Errors are shown and the user can recover.”
- “Back button does not break the flow.”
- “On mobile widths, the submit button is reachable.”
If you can’t say the property simply, it’s probably not a good first property.
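One low ceremony way to keep the plain-English doc and the executable checks in sync is to pair each sentence with a predicate over whatever state your tests can observe. The snapshot shape below is hypothetical; yours would come from your own instrumentation.

```python
# Each property pairs its plain-English statement with a predicate
# over an observed UI state snapshot. The snapshot keys are invented.
PROPERTIES = [
    ("The UI never spins forever",
     lambda s: not (s["loading"] and s["elapsed_ms"] > 10_000)),
    ("Errors are shown and the user can recover",
     lambda s: not s["error"] or (s["error_visible"] and s["retry_enabled"])),
    ("On mobile widths, the submit button is reachable",
     lambda s: s["viewport_width"] >= 760 or s["submit_visible"]),
]

def check(snapshot):
    # Return every property the snapshot violates, by its English name.
    return [name for name, holds in PROPERTIES if not holds(snapshot)]

snapshot = {
    "loading": True, "elapsed_ms": 30_000,
    "error": False, "error_visible": False, "retry_enabled": False,
    "viewport_width": 390, "submit_visible": True,
}
print(check(snapshot))  # prints ['The UI never spins forever']
```

The payoff is that a failing run reports the violated property in the same words as the doc, so the bug ticket and the spec stay aligned.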
Step 3: Instrument the product so tests can observe state
Property based tests need signals.
- Unique identifiers for important UI states
- Deterministic ways to know if you’re logged in
- Hooks to observe network failures
- Logs that can be correlated to test runs
This is where teams often stumble. Not because the testing idea is bad, but because the UI isn’t observable.
Make it observable.
Step 4: Run it in CI, but also run it in a nightly chaos mode
The real wins come when you let it explore deeper than “fast CI budget”.
- CI: smaller run, quick feedback
- Nightly: longer exploration, more weird sequences, more seed diversity
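One common way to implement that split is to drive the exploration budget from an environment variable, so the same suite runs shallow in CI and deep overnight. `EXPLORE_MODE` and the numbers here are invented; tune them to your own runtime budget.

```python
import os

# Budget profiles for the same property suite. Numbers are illustrative.
BUDGETS = {
    "ci":      {"runs": 50,   "max_steps": 20,  "seeds": 1},
    "nightly": {"runs": 5000, "max_steps": 200, "seeds": 10},
}

def exploration_budget():
    # EXPLORE_MODE is a hypothetical variable; unknown values fall
    # back to the fast CI profile so a typo never blocks a build.
    mode = os.environ.get("EXPLORE_MODE", "ci")
    return BUDGETS.get(mode, BUDGETS["ci"])

print(exploration_budget())
```

The point of the `seeds` knob is variety: the nightly run replays the search from multiple random seeds, so it explores different corners of the state space instead of the same one ten times.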
Step 5: When it finds failures, feed them into your release workflow
A discovered bug that doesn’t change behavior is just a cool demo.
You want:
- Minimal repro steps
- Severity tags (blocks signup, blocks checkout, etc.)
- A clear owner (frontend, backend, experiment platform)
- A regression test added as a property or an example case
And yes, you still need good release hygiene. Staging environments that reflect prod, canary deploys, feature flag discipline, rollback plans.
This connects to SEO.software more than it seems
If you’re building content heavy dashboards, editors, publishing workflows, or “AI automation” surfaces, UI reliability isn’t an internal quality metric. It’s part of the product promise.
SEO.software’s pitch is basically: ship more, faster, with automation. That only works if the UI is trustworthy, because operators are making decisions, scheduling jobs, reviewing outputs, publishing content. A flaky UI there doesn’t just annoy someone. It creates operational risk.
Also, if you’re using SEO.software to scale content and revenue pages, the same principle applies outward. Your pages have to work. Your funnels have to work. Your interactive tools have to work.
If you want a subtle but practical next step, treat UI reliability as part of your growth stack the same way you treat technical SEO checks and on page audits. The platform already lives in that “systematize what used to be manual” world. Testing is in the same category.
You don’t need to boil the ocean. You just need to stop letting frontend correctness be vibes.
You can check out SEO.software if you want the automation side of the equation, then bring the same discipline to the surfaces that convert the traffic you earn.
Treat UI reliability as a growth function, not an engineering afterthought
The older I get in software, the more I think this is the real divide.
Some teams treat QA like a cost center. Something you do after “real work”. Something that slows shipping.
Other teams treat reliability like a growth function. Because it is.
- Reliable signup flows compound marketing spend.
- Reliable checkout flows compound pricing experiments.
- Reliable dashboards compound retention.
- Reliable landing pages compound SEO.
Property based testing for web UIs is not magic. But it’s a return to serious thinking. Define what must always be true. Then try to break it, systematically, before your users do.
Bombadil showing up on Hacker News wasn’t just a cool link. It was a reminder.
We can do better than brittle scripts and hope.
And if your UI is where your revenue happens, doing better is not optional.