How it works
Submitting a repo runs five steps: parse the URL, insert a placeholder row, fetch a snapshot, stream the wiki from a language model, and persist the result. Each step is small and visible from the UI.
1. Parse the URL
The form on the home page posts to a server action (submitRepo). The first thing it does is normalize the input — github.com/owner/repo, https://github.com/owner/repo, and owner/repo all reduce to the same { owner, name, url } shape. Anything that doesn't parse redirects you back to /?error=invalid_url with an inline message.
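A minimal sketch of that normalization, assuming a parseRepoUrl helper whose name and regex are illustrative rather than the project's actual code:

```ts
// Hypothetical sketch: normalize the three accepted input forms
// into one { owner, name, url } shape, or null if nothing matches.
type RepoRef = { owner: string; name: string; url: string };

function parseRepoUrl(input: string): RepoRef | null {
  const trimmed = input.trim().replace(/\/+$/, "");
  // Accept "owner/repo", "github.com/owner/repo", and full https URLs.
  const match = trimmed.match(
    /^(?:https?:\/\/)?(?:www\.)?(?:github\.com\/)?([\w.-]+)\/([\w.-]+?)(?:\.git)?$/
  );
  if (!match) return null;
  const [, owner, name] = match;
  return { owner, name, url: `https://github.com/${owner}/${name}` };
}
```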
If you ticked "Add to public Library", the action then checks whether a listed wiki for that repo already exists. If so, you're sent straight to the existing slug — no duplicate generation, no duplicate cost.
2. Insert a placeholder, redirect with ?stream=1
A new row is inserted in the wikis table with empty content and the chosen listed flag. The action then redirects to the slug URL with ?stream=1 appended. That query flag is the only signal the page needs to know it should start a generation; without it, the page just renders whatever is already in the database.
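The rest of the action, through the redirect, could look roughly like this, reusing the parseRepoUrl sketch above; the field names, slug scheme, and db helper are placeholders, not the project's actual code:

```ts
// Hypothetical sketch of the tail of the submitRepo server action.
"use server";

import { redirect } from "next/navigation";

declare const db: { insertWiki(row: Record<string, unknown>): Promise<void> };

export async function submitRepo(formData: FormData) {
  const parsed = parseRepoUrl(String(formData.get("repo") ?? ""));
  if (!parsed) redirect("/?error=invalid_url");

  const listed = formData.get("listed") === "on";
  // ...dedup check for an existing listed wiki would go here...

  // Insert the placeholder row: empty content plus the chosen listed flag.
  const slug = `${parsed.owner}-${parsed.name}`; // illustrative slug scheme
  await db.insertWiki({ slug, ...parsed, content: "", listed });

  // ?stream=1 is the only signal the page needs to start a generation.
  redirect(`/w/${slug}?stream=1`);
}
```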
3. Fetch a repo snapshot
When the streaming page mounts, it POSTs to /api/generate. The handler looks up the wiki by slug, then calls fetchRepoSnapshot() in src/lib/github.ts. That function:
- Reads repo metadata (default branch, stars, topics, primary language).
- Walks the full git tree at the default branch's tip commit.
- Filters out anything under noisy directories — node_modules, dist, build, .next, target, vendor, .cache, and friends.
- Scores remaining files with a small heuristic (see below).
- Greedily picks the top-scoring files until it hits the cap: 40 files, 120 KB total, 12 KB per file. Anything larger is skipped, not truncated. (A selection sketch follows this list.)
- Fetches the blob contents in parallel.
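Under those caps, the greedy selection could look like this minimal sketch; the constants come from the numbers above, while the types and the skip-versus-stop behavior on the total budget are assumptions:

```ts
// Hypothetical sketch of the selection loop inside fetchRepoSnapshot().
const MAX_FILES = 40;
const MAX_TOTAL_BYTES = 120 * 1024;
const MAX_FILE_BYTES = 12 * 1024;

type TreeEntry = { path: string; size: number; score: number };

function pickFiles(entries: TreeEntry[]): TreeEntry[] {
  const picked: TreeEntry[] = [];
  let total = 0;
  // Highest score first.
  for (const entry of [...entries].sort((a, b) => b.score - a.score)) {
    if (picked.length >= MAX_FILES) break;
    if (entry.size > MAX_FILE_BYTES) continue; // skipped, never truncated
    if (total + entry.size > MAX_TOTAL_BYTES) continue; // keep looking for smaller files
    picked.push(entry);
    total += entry.size;
  }
  return picked;
}
```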
The file scoring heuristic
sticky doesn't embed the repo or do any retrieval — it just picks files it thinks a reader would want a summary of. The rules are intentionally simple, and are sketched in code after the list:
- Always include READMEs and well-known config files (package.json, pyproject.toml, Cargo.toml, go.mod, tsconfig.json, Dockerfile, etc.).
- Reward source code in canonical directories — src/, lib/, app/, pkg/, internal/, cmd/.
- Reward index.* files. Penalize tests, specs, and .d.ts.
- Penalize depth — shallow files describe the project; deep files describe leaves.
- Only consider files whose extension looks readable to a language model (most popular source extensions plus Markdown, JSON, YAML, TOML).
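Expressed as code, the heuristic might look like the sketch below. The weights are invented for illustration; only the rules themselves come from the list above, and the readable-extension filter is omitted:

```ts
// Hypothetical scoring sketch: higher means "fetch this file".
const ALWAYS_INCLUDE = new Set([
  "package.json", "pyproject.toml", "Cargo.toml",
  "go.mod", "tsconfig.json", "Dockerfile",
]);
const CANONICAL_DIRS = ["src/", "lib/", "app/", "pkg/", "internal/", "cmd/"];

function scoreFile(path: string): number {
  const base = path.split("/").pop() ?? path;
  if (/^readme/i.test(base) || ALWAYS_INCLUDE.has(base)) return 100;

  let score = 0;
  if (CANONICAL_DIRS.some((d) => path.startsWith(d))) score += 10;
  if (/^index\./.test(base)) score += 5;
  if (/\.(test|spec)\.|\.d\.ts$/.test(base)) score -= 20;
  // Shallow files describe the project; deep files describe leaves.
  score -= path.split("/").length * 2;
  return score;
}
```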
This isn't pretending to be an architecturally sound retrieval system. It's a guess at "what a smart reader would skim first" — and it works surprisingly well for repos under a few thousand files. Bigger repos get less coverage; see Limits & costs.
4. Stream the wiki
The snapshot is rendered into a single prompt and handed to streamText from the Vercel AI SDK. The model is anthropic/claude-sonnet-4-6, called through the Vercel AI Gateway — no provider keys live in the app; the function authenticates with Vercel's OIDC token.
The system prompt fixes the output shape (H1, summary line, Overview, Architecture, Key Modules, Getting Started, Notable Details) and the tone (sentence case, no emoji, no marketing words like "powerful" or "robust"). The user prompt is the repo snapshot — metadata, the truncated file tree, and the selected file contents, each prefixed with a path header.
Tokens stream straight to the client over a plain text/plain response and render through react-markdown with remark-gfm. You watch the wiki appear top to bottom as it's written.
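Put together, the core of the /api/generate handler is roughly the following sketch against the Vercel AI SDK. WIKI_SYSTEM_PROMPT, renderSnapshotPrompt, persistWiki, the db helper, and the fetchRepoSnapshot signature are all assumed names:

```ts
// Hypothetical sketch of the /api/generate route handler.
import { streamText } from "ai";
import { fetchRepoSnapshot } from "@/lib/github"; // real module; signature assumed

declare const db: { findWikiBySlug(slug: string): Promise<{ owner: string; name: string }> };
declare const WIKI_SYSTEM_PROMPT: string;                   // fixed output shape and tone
declare function renderSnapshotPrompt(snapshot: unknown): string;
declare function persistWiki(slug: string, text: string, snapshot: unknown): Promise<void>;

export async function POST(req: Request) {
  const { slug } = await req.json();
  const wiki = await db.findWikiBySlug(slug);
  const snapshot = await fetchRepoSnapshot(wiki); // step 3 above

  const result = streamText({
    // Gateway model id, resolved through the Vercel AI Gateway; no provider keys.
    model: "anthropic/claude-sonnet-4-6",
    system: WIKI_SYSTEM_PROMPT,
    prompt: renderSnapshotPrompt(snapshot), // metadata + tree + file contents
    onFinish: async ({ text }) => {
      await persistWiki(slug, text, snapshot); // step 5, sketched below
    },
  });

  // Tokens reach the client as a plain text/plain stream.
  return result.toTextStreamResponse();
}
```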
5. Persist on finish
When the model finishes, the route's onFinish callback writes the full Markdown back to the same wiki row — along with the extracted summary (the first non-heading line), the default branch, the commit SHA the snapshot was taken at, and the model identifier.
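A sketch of that persistence step, assuming the column names and a db helper; only the summary rule (the first non-heading line) comes from the text above:

```ts
// Hypothetical sketch of the onFinish persistence logic.
declare const db: {
  updateWiki(slug: string, fields: Record<string, unknown>): Promise<void>;
};

// The first non-empty line that isn't a heading becomes the summary.
function extractSummary(markdown: string): string {
  for (const line of markdown.split("\n")) {
    const trimmed = line.trim();
    if (trimmed && !trimmed.startsWith("#")) return trimmed;
  }
  return "";
}

async function persistWiki(
  slug: string,
  markdown: string,
  snapshot: { branch: string; sha: string }
) {
  await db.updateWiki(slug, {
    content: markdown,
    summary: extractSummary(markdown),
    defaultBranch: snapshot.branch, // branch the snapshot was taken from
    commitSha: snapshot.sha,        // exact tree state the wiki describes
    model: "anthropic/claude-sonnet-4-6",
  });
}
```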
After that, the slug URL is permanently cacheable: hitting /w/[slug] with no ?stream=1 reads from Neon and renders the saved Markdown directly. There's no "regenerate" button yet — the stored commitSha is there so we can add one later without losing the version it was generated against.
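The cached read path is then a plain server component, roughly like this sketch; the db helper and the exact route shape are assumptions:

```tsx
// Hypothetical sketch of /w/[slug]/page.tsx on the cached path.
import ReactMarkdown from "react-markdown";
import remarkGfm from "remark-gfm";

declare const db: {
  findWikiBySlug(slug: string): Promise<{ content: string }>;
};

export default async function WikiPage({ params }: { params: { slug: string } }) {
  // No ?stream=1: render whatever is already saved in Neon.
  const wiki = await db.findWikiBySlug(params.slug);
  return <ReactMarkdown remarkPlugins={[remarkGfm]}>{wiki.content}</ReactMarkdown>;
}
```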
Next: Limits & costs — what fits, what doesn't, and what each generation costs in practice.