How to Attract AI Bots to Your Open Source Project: What It Reveals About AI Discovery and GEO
A satirical post about attracting AI bots exposes something real: AI systems are already discovering, interpreting, and acting on public web content.

A couple of days ago I read a fresh post that’s framed as satire. It’s basically a “how to” guide for maintainers who want to attract AI bots to their open source repo.
The advice is intentionally bad.
Publish vague issues. Remove safeguards. Make it easy for agents to open PRs, run tasks, and “help” without asking. It’s the kind of thing you read and laugh at, then wince a bit, because the joke lands a little too cleanly.
Here’s the post if you want the original context: How to attract AI bots to your open source project.
And the reason it’s doing well already is telling. Live search is surfacing it with a featured snippet and direct results, even though it’s brand new.
So yeah, it’s funny. But it’s not just funny.
Under the joke is a very real shift: AI systems are already crawling, interpreting, summarizing, and in some cases taking action on public software, docs, and websites. And they don’t “read” like humans read. They parse. They extract. They classify. They decide whether something looks actionable, reliable, and safe enough to use.
Which is exactly where GEO comes in.
The punchline is “attract bots”. The real story is “AI discovery is changing”
The satire works because it flips a familiar worldview.
Old worldview: your repo and docs are for humans, and bots are mostly passive. They index pages. Maybe they scrape. They don’t really do anything.
New worldview: some bots are not passive. They’re agents. They browse. They evaluate intent. They follow instructions. They open issues, draft PRs, call tools, run workflows. Not always, not safely, not perfectly. But enough that “machine-facing UX” is now a thing.
And once you accept that, a bunch of stuff changes:
- A GitHub issue is not just a message to a maintainer. It’s a machine-readable task spec.
- A README is not just onboarding. It’s an agent prompt.
- A changelog is not just history. It’s a risk signal.
- A docs page is not just education. It’s a retrieval target for answer engines.
This is also why the “featured snippet” moment matters. Classic search is already blending into answer extraction, and answer extraction is already feeding LLM interfaces. So you get this weird loop where:
- A post is written as a joke.
- Search systems extract it because it matches an emerging query pattern.
- AI systems ingest that extraction and amplify the framing.
- Teams see the traction and copy the pattern.
That’s AI discovery in 2026. Not a librarian. More like a panel of impatient interns who skim, highlight, and sometimes… ship.
What the satire gets right (even while pretending to be wrong)
Let’s pull the “real” lessons out of the joke, without adopting the cursed advice.
1. Vague requests are surprisingly machine-friendly
A vague issue like “Improve performance” sounds useless to a human because it’s underspecified.
To an agent, it can be an invitation. It can interpret it as permission to do something broad. Search for hotspots, try micro-optimizations, propose dependency upgrades, refactor something. It fills in blanks because that’s literally what LLMs do.
So the satire is poking at a real tension: ambiguity creates space for machines to act.
For maintainers, that’s dangerous. For marketers and SEO strategists, it’s illuminating. Because it’s the same mechanism behind why AI answers sometimes confidently fill gaps on your site with implied context. If your page is vague, models will “complete” it.
2. Public backlogs are becoming instruction surfaces
Open issues, discussions, PR templates, contributing guides. These used to be community scaffolding.
Now they’re also “agent interfaces”.
If you publish labels like good first issue or help wanted and you provide reproduction steps and acceptance criteria, you’re not just helping humans. You’re creating a structured task environment that an agent can pick up.
That can be good, if you control it. It can also be chaos, if you don’t.
3. Removing safeguards increases agent throughput (but it’s still a bad idea)
The joke about weakening safeguards is grim because it’s technically true. Less friction means more automated action.
But the real takeaway isn’t “remove friction”. It’s: your friction points are signals.
They tell humans and machines where risk exists. They shape behavior. They act like guardrails and metadata.
The equivalent on the web side is stuff like disclaimers, policy pages, canonical sources, citations, and clear “do not do X” sections. These are not just legal padding. They’re interpretability anchors.
4. The repo that looks easiest to act on will get acted on
Agents prioritize “paths to completion”.
If a repo has:
- clear setup steps
- a test command
- a tight issue template
- labeled tasks
- explicit acceptance criteria
…it’s easier to take action on.
Same with websites. The site that’s easiest to extract from and cite is the one that gets cited. GEO is partly about content quality, sure, but it’s also about machine-operational clarity.
If this idea is new in your org, you’ll like this framing: Generative Engine Optimization (GEO): get cited by AI.
AI crawlers do not behave like classic search bots. They behave like readers with objectives
Classic SEO trained us to think about:
- crawlability
- indexation
- ranking signals
- snippets
- backlinks
Still important. But incomplete.
Now you also have:
- retrieval systems that chunk your page
- embeddings that match by meaning, not keywords
- answer engines that synthesize across sources
- agents that look for “next steps” and “how to”
Which means the question is no longer only “can Google crawl this”.
It’s also:
- Can a model extract the right part of this page without losing nuance?
- Will it interpret this as authoritative or risky?
- Does it contain quotable, bounded statements?
- Does it present steps and constraints clearly enough that an agent won’t freestyle?
This is why some GEO playbooks look a bit like “write docs for machines”. Because that’s basically what you’re doing.
If you want a tactical overview of being cited in AI answers, this is worth reading: GEO playbook for getting cited in AI answers.
The documentation shift: from “help humans” to “help humans and be safely machine-readable”
You don’t need to turn docs into robotic schema soup. (Please don’t.) But you do need to accept that docs are now dual-use:
- Human consumption
- Machine extraction and action
So what does “machine-readable” actually mean in practice?
It usually means:
- explicit definitions
- tight scope
- consistent formatting
- minimal implied context
- clear boundaries
It’s not about sounding formal. It’s about making it hard to misinterpret.
A simple test you can run
Pick any page in your docs and ask:
- If an LLM only saw two paragraphs of this page, would it still understand the core point?
- If it only saw a single bullet list, would it mislead someone?
- If it extracted one sentence, would that sentence still be true without the surrounding nuance?
If the answer is “no”… you have an extraction problem, not just a writing problem.
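You can even smoke-test part of this mechanically. Below is a rough heuristic sketch, not an LLM evaluation: it just flags chunks that open with a dangling pronoun, one common way an extracted passage loses its referent. The sample chunks are made up for illustration.

```python
import re

# Pronouns that usually point at context outside the chunk.
CONTEXT_PRONOUNS = re.compile(r"^(this|that|it|these|those|they)\b", re.IGNORECASE)

def extraction_risks(chunks):
    """Return the indexes of chunks that likely depend on missing context."""
    risky = []
    for i, chunk in enumerate(chunks):
        first_sentence = chunk.strip().split(".")[0]
        # A chunk that opens with a bare pronoun loses its referent when lifted.
        if CONTEXT_PRONOUNS.match(first_sentence.strip()):
            risky.append(i)
    return risky

chunks = [
    "GEO is the practice of structuring content so AI systems cite it accurately.",
    "This makes it much easier.",  # opens with a dangling pronoun
]
print(extraction_risks(chunks))  # → [1]
```

A real check would need semantic judgment, but even this crude pass catches a surprising number of sections that would mislead when quoted alone.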
Practical recommendations for open source repos (that also map cleanly to GEO)
Let’s talk repo mechanics for a second, because the patterns are useful even if you’re not an OSS maintainer.
1. Write issues like tasks, not vibes
A lot of issue pages are vague because humans can ask follow-up questions.
Agents can’t. Or rather, they can, but they often won’t.
A strong issue template should include:
- Goal: what “done” means in one sentence
- Context: why it matters
- Constraints: what not to change
- Repro steps: if it’s a bug
- Acceptance criteria: bullet list
- Links to relevant code/docs
- Security note: any sensitive areas or “never do this” guidance
This helps humans. It also reduces agent hallucination. And that’s the big theme: reduce ambiguous space where machines invent.
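Concretely, a filled-in issue along those lines might look like this. Every project detail below is hypothetical; the point is the shape, not the content.

```markdown
## Goal
p95 latency of `/search` drops below 300 ms (done = the CI benchmark passes).

## Context
Search latency doubled after the indexing change; large workspaces time out.

## Constraints
Do not change the public API. No dependency bumps. Do not touch auth code.

## Repro steps
1. Seed a large dataset locally
2. Run the search benchmark suite

## Acceptance criteria
- Benchmark reports p95 < 300 ms
- All existing tests pass

## Security note
`src/auth/` is out of scope. Never log query contents.
```

Notice how little room that leaves for an agent (or a new contributor) to invent scope.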
2. Add “machine obvious” labels and states
Labels are not just organization. They’re routing.
Consider labels like:
- `needs spec`
- `blocked`
- `security-sensitive`
- `safe-to-automate`
- `good first issue` (but be careful, agents love this one)
- `needs human decision`
If you run an open repo, you can think of this as a kind of access control layer. Not permissions, but clarity.
3. Make CONTRIBUTING.md explicit about automation
Most contributing guides speak to humans. Add a section that speaks to agents, without actually calling it that.
Example topics:
- PRs that touch auth, billing, or permissions require maintainer review
- no dependency bumps unless requested
- never run network calls in tests
- how to handle secrets
- what evidence is required in PR descriptions
Not because “robots will obey”, but because these documents become part of the retrieval context that tools use when deciding what’s allowed.
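A short automation-facing section in CONTRIBUTING.md might read like this. This is an illustrative sketch, not a standard; the paths and rules are placeholders.

```markdown
## Notes for automated contributors

- PRs touching `auth/`, `billing/`, or permission checks always require maintainer review.
- Do not bump dependencies unless an issue explicitly requests it.
- Tests must not make network calls; use local fixtures only.
- Never commit secrets; configuration comes from environment variables.
- Every PR description must state: what changed, why, and how it was verified.
```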
4. Treat README as a prompt, because it is one
Agents use README content the way humans use it, but faster and with less patience.
Make sure your README has:
- what the project does (not a slogan)
- who it’s for
- how to install
- how to run tests
- basic usage examples
- known limitations
- links to canonical docs
Also, avoid burying critical constraints in a long narrative paragraph. Put them in bullets. Make them extractable.
Now pivot back to websites: your marketing site is also being “used” by AI systems
Everything above has a web equivalent.
Your landing page is an issue template. Your docs are a contributing guide. Your blog posts are a backlog of claims.
And AI discovery systems are deciding:
- Is this page worth citing?
- Is it safe to recommend?
- Is it current?
- Is it specific?
- Does it conflict with other sources?
- Does it look like it’s trying to manipulate the model?
This last one matters more than people admit. Over-optimized, overly promotional content often reads like spam to humans and models.
If you’re worried about detection, not just citation, this is relevant: Google detect AI content signals. Not because “AI content is bad”, but because low-effort patterns are becoming a trust smell.
GEO-minded teams should redesign content for extraction, citation, and safe action
Here’s the practical part. What do you do Monday morning?
1. Create “quotable blocks” on purpose
If you want to get cited, you need compact sections that can be lifted with minimal distortion.
Patterns that work:
- short definitions
- step-by-step lists
- tables of comparisons
- clear “when to use vs not use”
- “common mistakes” bullets
- mini FAQs with direct answers
This is not gimmicky. It’s helping the model not screw up your meaning.
And yes, it also helps featured snippets. Same shape.
2. Make canonical sources painfully clear
AI systems do source selection. If your site has ten overlapping posts that contradict each other, you’re making it harder.
Do:
- one canonical “guide” page per concept
- strong internal links from related posts
- visible “last updated” dates when it matters
- consistent terminology
This also ties into E-E-A-T style trust. If you’re building credibility signals, this is useful: E-E-A-T AI signals to improve.
3. Think in “retrieval chunks”, not just page-level SEO
LLMs rarely use your whole page. They use chunks.
So structure matters:
- descriptive H2s that stand alone
- short paragraphs
- lists with clear nouns
- avoid pronouns that require context (this, that, it) in key definitions
- avoid clever section titles that mean nothing out of context
You’re basically writing so a section can survive being copied into a completely different interface.
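To make the “descriptive H2s that stand alone” point concrete, here is roughly how a naive retrieval pipeline might chunk a page at H2 boundaries. This is a simplification I’m assuming for illustration; real pipelines also cap chunk sizes, overlap chunks, and attach page-level metadata.

```python
def chunk_by_h2(markdown_text):
    """Split a markdown page into (heading, body) chunks at H2 boundaries."""
    chunks = []
    current_heading, current_lines = None, []
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            # Close out the previous chunk before starting a new one.
            if current_heading is not None or current_lines:
                chunks.append((current_heading, "\n".join(current_lines).strip()))
            current_heading = line[3:].strip()
            current_lines = []
        else:
            current_lines.append(line)
    chunks.append((current_heading, "\n".join(current_lines).strip()))
    return chunks

page = """Intro paragraph.

## What Retrieval Chunking Is
Pages are split into sections; each section is embedded and matched separately.

## Why Headings Matter
The heading travels with the chunk, so it must make sense out of context.
"""
for heading, body in chunk_by_h2(page):
    print(heading, "->", body[:40])
```

Once you see that the H2 becomes the chunk’s label in someone else’s interface, “cute” section titles stop looking harmless.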
4. Adopt llms.txt style thinking, but don’t treat it like robots.txt
People keep looking for a single file that “solves” AI visibility.
It doesn’t. But the mindset is right: create a machine-friendly map of what matters.
If you’re exploring this, read: llms.txt for GEO, not robots.txt.
The bigger point is you should curate:
- your best pages to cite
- your policy pages
- your definitions and terminology
- your docs hubs
- your dataset and research pages if you have them
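If you do go the llms.txt route, the proposal uses plain markdown: an H1, a short summary, then curated link sections. A minimal hypothetical example (every URL below is a placeholder):

```markdown
# Acme Analytics

> Acme Analytics is a self-serve product analytics tool. These are the
> canonical pages we want AI systems to read first.

## Docs
- [Getting started](https://example.com/docs/getting-started): install and first query
- [Glossary](https://example.com/docs/glossary): canonical definitions of our terms

## Policies
- [Editorial policy](https://example.com/editorial-policy): how we review and update content
```

The file itself is less important than the curation exercise it forces.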
5. Align content with agent workflows, not just human reading
Agents and assistants often operate like:
- user asks a question
- assistant suggests a plan
- assistant recommends tools
- assistant provides steps
- user asks follow-ups
So build pages that support this chain:
- “what it is”
- “how it works”
- “how to implement”
- “checklist”
- “templates”
- “troubleshooting”
- “examples”
And if you’re in SEO and content automation, it helps to show the workflow end to end. Not just features. A good reference point: AI SEO content workflow that ranks.
Machine-readable signals that actually help (without turning your site into a robot brochure)
A quick hit list that tends to matter for AI discovery and citation:
- Schema where it fits (Organization, Product, Article, FAQPage, HowTo)
- Clean HTML hierarchy (real headings, real lists)
- Fast load and stable rendering (yes, still matters)
- Clear authorship (about pages, author bios, editorial policy)
- Citations and external references when you claim facts
- Consistent internal linking to canonical pages
- Transparent updates when things change
And, quietly, one of the biggest: don’t hide the lede. If your page takes 600 words to say what it does, models may never reach the point.
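For the schema item on that list, the common pattern is a JSON-LD block in the page head. A minimal Article example, with placeholder values throughout:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "What GEO Is and Why It Matters",
  "datePublished": "2025-01-15",
  "dateModified": "2025-06-01",
  "author": {
    "@type": "Person",
    "name": "Jane Example",
    "url": "https://example.com/about/jane"
  }
}
</script>
```

The `dateModified` and `author` fields pull double duty: they are also the “transparent updates” and “clear authorship” signals from the same list.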
Where SEO.software fits in this (and why this is not just an “OSS maintainer” conversation)
The joke post is about GitHub. But the exact same dynamics are hammering marketing teams right now.
AI answers are compressing the top of funnel. Google AI Overviews, AI Mode, ChatGPT browsing, Perplexity citations. Users get “good enough” without clicking. Or they click only when they trust the source.
If you haven’t felt that yet, you will. And the response is not “write more content”. It’s “write content that gets selected”.
That’s basically the promise of GEO.
If you want to go deeper on the broader traffic impact and what to do about it, this pairs well: Google AI summaries killing website traffic: how to fight back.
And if you’re building an AI-assisted content engine and you want it to produce pages that are structured for ranking and citation, that’s the lane SEO Software plays in. Research, write, optimize, publish. But more importantly, standardize the structure so you’re not reinventing “machine-readable clarity” every time a new writer or model touches your site.
(Also, if you’re experimenting with agentic browsing and “AI visibility” as a concept, this is adjacent and worth having open: Sitefire and agentic web SEO.)
A quick checklist: “Are we ready to be interpreted by agents?”
If you want something you can paste into a doc and argue over with your team, here:
- Do our key pages have one-paragraph definitions that stand alone?
- Do we have canonical pages for our main concepts, or 12 overlapping posts?
- Are our headings specific, or cute?
- Are “how to” steps explicit enough that an assistant won’t invent missing steps?
- Do we clearly mark constraints, limitations, and “do not do this” areas?
- Do we show evidence, examples, and citations where it matters?
- Are we building pages that can be chunked and still make sense?
- Do we have a curated map of what we want models to read first?
If you can’t answer these confidently, you’re not behind on “SEO”. You’re behind on interpretability.
Wrap up: stop writing only for humans and only for bots. Write for extraction, citation, and safe action
The satire post is funny because it’s basically saying: “want bots to show up? make your project easy for them to operate on.”
The serious version for websites is: want AI systems to cite you and recommend you? make your content easy to extract, hard to misinterpret, and clearly authoritative.
That’s GEO. Not a trick. More like a formatting and clarity tax we all pay now.
If you want help operationalizing this across a real content pipeline, not just a one-off guide, take a look at SEO Software. The whole point is to ship content that’s rank-ready, but also increasingly, agent-readable. Because classic search bots are not the only audience anymore.