How to Attract AI Bots to Your Open Source Project: What It Reveals About AI Discovery and GEO

A satirical post about attracting AI bots exposes something real: AI systems are already discovering, interpreting, and acting on public web content.

March 23, 2026
13 min read

A couple of days ago I read a fresh post that’s framed as satire. It’s basically a “how to” guide for maintainers who want to attract AI bots to their open source repo.

The advice is intentionally bad.

Publish vague issues. Remove safeguards. Make it easy for agents to open PRs, run tasks, and “help” without asking. The kind of thing you read and laugh at, then wince a bit, because the joke lands a little too cleanly.

Here’s the post if you want the original context: How to attract AI bots to your open source project.

And the reason it’s doing well already is telling. Live search is surfacing it with a featured snippet and direct results, even though it’s brand new.

So yeah, it’s funny. But it’s not just funny.

Under the joke is a very real shift: AI systems are already crawling, interpreting, summarizing, and in some cases taking action on public software, docs, and websites. And they don’t “read” like humans read. They parse. They extract. They classify. They decide whether something looks actionable, reliable, and safe enough to use.

Which is exactly where GEO comes in.

The punchline is “attract bots”. The real story is “AI discovery is changing”

The satire works because it flips a familiar worldview.

Old worldview: your repo and docs are for humans, and bots are mostly passive. They index pages. Maybe they scrape. They don’t really do anything.

New worldview: some bots are not passive. They’re agents. They browse. They evaluate intent. They follow instructions. They open issues, draft PRs, call tools, run workflows. Not always, not safely, not perfectly. But enough that “machine-facing UX” is now a thing.

And once you accept that, a bunch of stuff changes:

  • A GitHub issue is not just a message to a maintainer. It’s a machine-readable task spec.
  • A README is not just onboarding. It’s an agent prompt.
  • A changelog is not just history. It’s a risk signal.
  • A docs page is not just education. It’s a retrieval target for answer engines.

This is also why the “featured snippet” moment matters. Classic search is already blending into answer extraction, and answer extraction is already feeding LLM interfaces. So you get this weird loop where:

  1. A post is written as a joke.
  2. Search systems extract it because it matches an emerging query pattern.
  3. AI systems ingest that extraction and amplify the framing.
  4. Teams see the traction and copy the pattern.

That’s AI discovery in 2026. Not a librarian. More like a panel of impatient interns who skim, highlight, and sometimes… ship.

What the satire gets right (even while pretending to be wrong)

Let’s pull the “real” lessons out of the joke, without adopting the cursed advice.

1. Vague requests are surprisingly machine-friendly

A vague issue like “Improve performance” sounds useless to a human because it’s underspecified.

To an agent, it can be an invitation: permission to do something broad. Search for hotspots, try micro-optimizations, propose dependency upgrades, refactor something. It fills in the blanks, because that’s literally what LLMs do.

So the satire is poking at a real tension: ambiguity creates space for machines to act.

For maintainers, that’s dangerous. For marketers and SEO strategists, it’s illuminating. Because it’s the same mechanism behind why AI answers sometimes confidently fill gaps on your site with implied context. If your page is vague, models will “complete” it.

2. Public backlogs are becoming instruction surfaces

Open issues, discussions, PR templates, contributing guides. These used to be community scaffolding.

Now they’re also “agent interfaces”.

If you publish labels like good first issue or help wanted and you provide reproduction steps and acceptance criteria, you’re not just helping humans. You’re creating a structured task environment that an agent can pick up.

That can be good, if you control it. It can also be chaos, if you don’t.

3. Removing safeguards increases agent throughput (but it’s still a bad idea)

The joke about weakening safeguards is grim because it’s technically true. Less friction means more automated action.

But the real takeaway isn’t “remove friction”. It’s: your friction points are signals.

They tell humans and machines where risk exists. They shape behavior. They act like guardrails and metadata.

The equivalent on the web side is stuff like disclaimers, policy pages, canonical sources, citations, and clear “do not do X” sections. These are not just legal padding. They’re interpretability anchors.

4. The repo that looks easiest to act on will get acted on

Agents prioritize “paths to completion”.

If a repo has:

  • clear setup steps
  • a test command
  • a tight issue template
  • labeled tasks
  • explicit acceptance criteria

…it’s easier to take action on.

Same with websites. The site that’s easiest to extract from and cite is the one that gets cited. GEO is partly about content quality, sure, but it’s also about machine-operational clarity.

If this idea is new in your org, you’ll like this framing: Generative Engine Optimization (GEO): get cited by AI.

AI crawlers do not behave like classic search bots. They behave like readers with objectives

Classic SEO trained us to think about:

  • crawlability
  • indexation
  • ranking signals
  • snippets
  • backlinks

Still important. But incomplete.

Now you also have:

  • retrieval systems that chunk your page
  • embeddings that match by meaning, not keywords
  • answer engines that synthesize across sources
  • agents that look for “next steps” and “how to”

Which means the question is no longer only “can Google crawl this”.

It’s also:

  • Can a model extract the right part of this page without losing nuance?
  • Will it interpret this as authoritative or risky?
  • Does it contain quotable, bounded statements?
  • Does it present steps and constraints clearly enough that an agent won’t freestyle?

This is why some GEO playbooks look a bit like “write docs for machines”. Because that’s basically what you’re doing.

If you want a tactical overview of being cited in AI answers, this is worth reading: GEO playbook for getting cited in AI answers.

The documentation shift: from “help humans” to “help humans and be safely machine-readable”

You don’t need to turn docs into robotic schema soup. (Please don’t.) But you do need to accept that docs are now dual-use:

  • Human consumption
  • Machine extraction and action

So what does “machine-readable” actually mean in practice?

It usually means:

  • explicit definitions
  • tight scope
  • consistent formatting
  • minimal implied context
  • clear boundaries

It’s not about sounding formal. It’s about making it hard to misinterpret.

A simple test you can run

Pick any page in your docs and ask:

  1. If an LLM only saw two paragraphs of this page, would it still understand the core point?
  2. If it only saw a single bullet list, would it mislead someone?
  3. If it extracted one sentence, would that sentence still be true without the surrounding nuance?

If the answer is “no”… you have an extraction problem, not just a writing problem.
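To make step 1 of the test concrete, here’s a minimal Python sketch that simulates what a model sees if only the opening chunks of a page survive extraction. The sample page text is invented for illustration.

```python
# Minimal sketch of the "first two paragraphs" test: simulate what a model
# sees if only the opening chunks of a page survive extraction.
# The sample page below is invented for illustration.

def first_paragraphs(page_text: str, n: int = 2) -> str:
    """Return the first n non-empty paragraphs of a page."""
    paragraphs = [p.strip() for p in page_text.split("\n\n") if p.strip()]
    return "\n\n".join(paragraphs[:n])

page = (
    "GEO is the practice of structuring content so AI systems "
    "can extract and cite it accurately.\n\n"
    "It matters because answer engines quote chunks, not whole pages.\n\n"
    "Later sections cover implementation details."
)

excerpt = first_paragraphs(page)
print(excerpt)  # read this back: does the core point survive on its own?
```

If the excerpt that comes back doesn’t state your core point, neither will the AI answer that quotes it.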

Practical recommendations for open source repos (that also map cleanly to GEO)

Let’s talk repo mechanics for a second, because the patterns are useful even if you’re not an OSS maintainer.

1. Write issues like tasks, not vibes

A lot of issue pages are vague because humans can ask follow-up questions.

Agents can’t. Or rather, they can, but they often won’t.

A strong issue template should include:

  • Goal: what “done” means in one sentence
  • Context: why it matters
  • Constraints: what not to change
  • Repro steps: if it’s a bug
  • Acceptance criteria: bullet list
  • Links to relevant code/docs
  • Security note: any sensitive areas or “never do this” guidance

This helps humans. It also reduces agent hallucination. And that’s the big theme: reduce ambiguous space where machines invent.
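As a hypothetical example of an issue written against that template (every detail below, from the endpoint name to the numbers, is invented):

```markdown
## Goal
Reduce p95 latency of the /search endpoint below 300 ms.

## Context
Search latency regressed after the 2.4 release; users report slow autocomplete.

## Constraints
Do not change the public API. Do not bump dependencies.

## Repro steps
1. Seed the dev database with the sample fixtures.
2. Run the load test script against /search.

## Acceptance criteria
- p95 latency under 300 ms in the CI benchmark
- no changes outside the search module

## Security note
The query parser handles untrusted input; never relax its validation.
```

Notice how little room that leaves for a machine (or a tired human) to improvise.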

2. Add “machine obvious” labels and states

Labels are not just organization. They’re routing.

Consider labels like:

  • needs spec
  • blocked
  • security-sensitive
  • safe-to-automate
  • good first issue (but be careful, agents love this one)
  • needs human decision

If you run an open repo, you can think of this as a kind of access control layer. Not permissions, but clarity.

3. Make CONTRIBUTING.md explicit about automation

Most contributing guides speak to humans. Add a section that speaks to agents, without actually calling it that.

Example topics:

  • PRs that touch auth, billing, or permissions require maintainer review
  • no dependency bumps unless requested
  • never run network calls in tests
  • how to handle secrets
  • what evidence is required in PR descriptions

Not because “robots will obey”, but because these documents become part of the retrieval context that tools use when deciding what’s allowed.

4. Treat README as a prompt, because it is one

Agents use README content the way humans use it, but faster and with less patience.

Make sure your README has:

  • what the project does (not a slogan)
  • who it’s for
  • how to install
  • how to run tests
  • basic usage examples
  • known limitations
  • links to canonical docs

Also, avoid burying critical constraints in a long narrative paragraph. Put them in bullets. Make them extractable.

Now pivot back to websites: your marketing site is also being “used” by AI systems

Everything above has a web equivalent.

Your landing page is an issue template. Your docs are a contributing guide. Your blog posts are a backlog of claims.

And AI discovery systems are deciding:

  • Is this page worth citing?
  • Is it safe to recommend?
  • Is it current?
  • Is it specific?
  • Does it conflict with other sources?
  • Does it look like it’s trying to manipulate the model?

This last one matters more than people admit. Over-optimized, overly promotional content often reads like spam to humans and models.

If you’re worried about detection, not just citation, this is relevant: Google detect AI content signals. Not because “AI content is bad”, but because low-effort patterns are becoming a trust smell.

GEO-minded teams should redesign content for extraction, citation, and safe action

Here’s the practical part: what do you do Monday morning?

1. Create “quotable blocks” on purpose

If you want to get cited, you need compact sections that can be lifted with minimal distortion.

Patterns that work:

  • short definitions
  • step-by-step lists
  • tables of comparisons
  • clear “when to use vs not use”
  • “common mistakes” bullets
  • mini FAQs with direct answers

This is not gimmicky. It’s helping the model not screw up your meaning.

And yes, it also helps featured snippets. Same shape.
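For instance, a quotable block might look like this (a hypothetical example, written so it stays true even when lifted on its own):

```markdown
**What is a canonical guide page?**
A canonical guide page is the single page on a site that fully defines one
concept, and that all related posts link back to.

- Use it when: several posts overlap on the same topic.
- Skip it when: the concept is already covered inside a broader guide.
```

Everything in that block survives being quoted in isolation. That’s the bar.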

2. Make canonical sources painfully clear

AI systems do source selection. If your site has ten overlapping posts that contradict each other, you’re making it harder.

Do:

  • one canonical “guide” page per concept
  • strong internal links from related posts
  • visible “last updated” dates when it matters
  • consistent terminology

This also ties into E-E-A-T style trust. If you’re building credibility signals, this is useful: E-E-A-T AI signals to improve.

3. Think in “retrieval chunks”, not just page-level SEO

LLMs rarely use your whole page. They use chunks.

So structure matters:

  • descriptive H2s that stand alone
  • short paragraphs
  • lists with clear nouns
  • avoid pronouns that require context (this, that, it) in key definitions
  • avoid clever section titles that mean nothing out of context

You’re basically writing so a section can survive being copied into a completely different interface.
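As a rough sketch of why standalone H2s matter, here’s how a naive retrieval pipeline might chunk a markdown page by heading. Real systems also split by token count and overlap; this is illustrative only, and the sample document is invented.

```python
# Rough sketch of heading-based chunking, similar in spirit to what many
# retrieval pipelines do. Real systems also split by token count; this
# only splits on markdown H2 lines and is illustrative.

def chunk_by_h2(markdown_text: str) -> dict[str, str]:
    """Split a markdown document into {heading: body} chunks at each '## ' line."""
    chunks: dict[str, str] = {}
    current, lines = None, []
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            if current is not None:
                chunks[current] = "\n".join(lines).strip()
            current, lines = line[3:].strip(), []
        elif current is not None:
            lines.append(line)
    if current is not None:
        chunks[current] = "\n".join(lines).strip()
    return chunks

doc = (
    "## How to install the CLI\n"
    "Run the installer, then verify the version.\n"
    "\n"
    "## Troubleshooting failed installs\n"
    "Check PATH and permissions first.\n"
)

sections = chunk_by_h2(doc)
```

The point of descriptive H2s is that every key in that dict still makes sense with zero surrounding context. “Troubleshooting failed installs” survives; “More thoughts” doesn’t.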

4. Adopt llms.txt-style thinking, but don’t treat it like robots.txt

People keep looking for a single file that “solves” AI visibility.

It doesn’t. But the mindset is right: create a machine-friendly map of what matters.

If you’re exploring this, read: llms.txt for GEO, not robots.txt.

The bigger point is you should curate:

  • your best pages to cite
  • your policy pages
  • your definitions and terminology
  • your docs hubs
  • your dataset and research pages if you have them
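A minimal sketch following the proposed llms.txt shape (an H1, a blockquote summary, then linked sections); the company name and URLs below are placeholders:

```markdown
# Example Co

> Example Co builds an AI-assisted SEO platform. Start with the guides below.

## Docs
- [GEO guide](https://example.com/docs/geo): canonical definition and playbook
- [Editorial policy](https://example.com/policy): how content is reviewed and updated

## Optional
- [Blog archive](https://example.com/blog): historical posts, lower priority
```

The file matters less than the exercise: forcing yourself to name the handful of pages you actually want models to read first.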

5. Align content with agent workflows, not just human reading

Agents and assistants often operate like:

  • user asks a question
  • assistant suggests a plan
  • assistant recommends tools
  • assistant provides steps
  • user asks follow-ups

So build pages that support this chain:

  • “what it is”
  • “how it works”
  • “how to implement”
  • “checklist”
  • “templates”
  • “troubleshooting”
  • “examples”

And if you’re in SEO and content automation, it helps to show the workflow end to end. Not just features. A good reference point: AI SEO content workflow that ranks.

Machine-readable signals that actually help (without turning your site into a robot brochure)

A quick hit list that tends to matter for AI discovery and citation:

  • Schema where it fits (Organization, Product, Article, FAQPage, HowTo)
  • Clean HTML hierarchy (real headings, real lists)
  • Fast load and stable rendering (yes, still matters)
  • Clear authorship (about pages, author bios, editorial policy)
  • Citations and external references when you claim facts
  • Consistent internal linking to canonical pages
  • Transparent updates when things change

And, quietly, one of the biggest: don’t hide the lede. If your page takes 600 words to say what it does, models may never reach the point.
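As one example from that list, FAQPage markup is a small JSON-LD block; the question and answer below are placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is Generative Engine Optimization (GEO)?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "GEO is the practice of structuring content so AI systems can extract, interpret, and cite it accurately."
      }
    }
  ]
}
```

Only add it where the page genuinely is an FAQ; schema that contradicts the visible content is a trust smell, not a signal.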

Where SEO.software fits in this (and why this is not just an “OSS maintainer” conversation)

The joke post is about GitHub. But the exact same dynamics are hammering marketing teams right now.

AI answers are compressing the top of funnel. Google AI Overviews, AI Mode, ChatGPT browsing, Perplexity citations. Users get “good enough” without clicking. Or they click only when they trust the source.

If you haven’t felt that yet, you will. And the response is not “write more content”. It’s “write content that gets selected”.

That’s basically the promise of GEO.

If you want to go deeper on the broader traffic impact and what to do about it, this pairs well: Google AI summaries killing website traffic: how to fight back.

And if you’re building an AI assisted content engine and you want it to produce pages that are structured for ranking and citation, that’s the lane SEO Software plays in. Research, write, optimize, publish. But more importantly, standardize the structure so you’re not reinventing “machine-readable clarity” every time a new writer or model touches your site.

(Also, if you’re experimenting with agentic browsing and “AI visibility” as a concept, this is adjacent and worth having open: Sitefire and agentic web SEO.)

A quick checklist: “Are we ready to be interpreted by agents?”

If you want something you can paste into a doc and argue over with your team, here:

  • Do our key pages have one-paragraph definitions that stand alone?
  • Do we have canonical pages for our main concepts, or 12 overlapping posts?
  • Are our headings specific, or cute?
  • Are “how to” steps explicit enough that an assistant won’t invent missing steps?
  • Do we clearly mark constraints, limitations, and “do not do this” areas?
  • Do we show evidence, examples, and citations where it matters?
  • Are we building pages that can be chunked and still make sense?
  • Do we have a curated map of what we want models to read first?

If you can’t answer these confidently, you’re not behind on “SEO”. You’re behind on interpretability.

Wrap up: stop writing only for humans and only for bots. Write for extraction, citation, and safe action

The satire post is funny because it’s basically saying: “want bots to show up? make your project easy for them to operate on.”

The serious version for websites is: want AI systems to cite you and recommend you? make your content easy to extract, hard to misinterpret, and clearly authoritative.

That’s GEO. Not a trick. More like a formatting and clarity tax we all pay now.

If you want help operationalizing this across a real content pipeline, not just a one-off guide, take a look at SEO Software. The whole point is to ship content that’s rank-ready, but also increasingly, agent-readable. Because classic search bots are not the only audience anymore.

Frequently Asked Questions

What does the satirical post about attracting AI bots actually say?

The satire humorously suggests intentionally bad practices like publishing vague issues and removing safeguards to attract AI bots to open source repos. Underneath the joke, it highlights a real shift where AI systems actively interact with public software and docs, not just passively indexing but taking actions like opening PRs and running workflows.

How are AI bots changing from passive indexers to active agents?

AI bots are evolving from passive indexers to active agents that browse, evaluate intent, follow instructions, open issues, draft PRs, call tools, and run workflows. This shift means repositories need to consider 'machine-facing UX' to effectively engage with these intelligent agents.

Why are vague requests surprisingly machine-friendly?

Vague requests like 'Improve performance' invite AI agents to interpret and fill in the blanks broadly, which aligns with how LLMs operate. While this ambiguity can encourage machine action, it poses risks for maintainers as agents may take unintended or unsafe actions without clear guidance.

How do public backlogs function as agent interfaces?

Public backlogs, including open issues, discussions, PR templates, and contributing guides, serve as structured task environments or 'agent interfaces.' Labels like 'good first issue' or 'help wanted' combined with detailed reproduction steps and acceptance criteria help both humans and AI agents understand tasks clearly, facilitating effective collaboration or automation.

Why are safeguards still important if they slow down AI agents?

Removing safeguards reduces friction, allowing more automated actions by AI agents. However, safeguards act as signals indicating risk areas and shape behavior by serving as guardrails and metadata. They enhance interpretability for both humans and machines through disclaimers, policies, citations, and clear instructions on prohibited actions.

What is GEO and why does it matter here?

GEO focuses on enhancing machine-operational clarity by making content easier for AI systems to extract, interpret, and cite. This includes clear setup steps, test commands, tight issue templates, labeled tasks, and explicit acceptance criteria. GEO helps projects get cited by AI-driven answer engines and supports effective collaboration with agent-based systems.
