Capper started because of a TikTok meme. I wanted to make a funny photo of a friend using a reference image, found a tool called Yapper that did it, paid for the SaaS to make one picture, and thought: I could build this and undercut them.
That was the entire business case. No market research. No TAM/SAM/SOM slide. One funny photo.
Yapper is comprehensive and complex — a full-featured image generation platform with extensive tooling. I didn't want to compete on features. I wanted to make funny photos with reference images, and I wanted an API I could call from my terminal. The rest happened because the tool turned out to be useful for more than memes.
Capper (getcapper.com) is an AI image generation tool with a web UI and a REST API. The honest description: it's a convoluted Gemini wrapper. And that's fine.
What it does:
- Text-to-image generation via Google Gemini (Pro tier) or Cloudflare Workers AI FLUX (free tier).
- Reference image support: upload a photo, describe what you want, and get a result that incorporates the reference.
- Video generation via Google Veo.
- Permanent, shareable URLs for every generated image (no expiry, no presigning).
- An API designed for programmatic use, not just the web UI.
The Pro tier exposes two models with intentionally playful names: nano-banana (fast, for iteration) and nano-banana-pro (best quality, for final output). The naming is part of not taking this too seriously.
Invite-only. Manual admin approval. Not because of artificial scarcity — because I haven't prioritized growth while other projects take precedence. It's a tool I use daily that happens to have a front door.
This is where Capper earns its keep. Not as a product with paying users, but as infrastructure that shows up everywhere in my work.
Capper runs as a slash command in my terminal. /capper-generate "gold and gunmetal OG card for 1MR LLC" and the image appears. The API has an llm.txt endpoint — a plain-text spec that AI agents can read to make correct API calls without hallucinating model IDs. Every image on this website's social cards was generated this way.
When ThreadMuse generates social media content batches, each piece gets a branded image. The image generation worker calls Capper's API — Claude extracts a headline from the content, derives a visual aesthetic from the creator's brief, and Capper generates the image. My product is a customer of my other product.
A client's AI blog system uses Capper to generate featured images for every published post. The pipeline runs on the client's hardware, calls the Capper API, and uploads the result to their Shopify store.
A client described how they wanted to add a yacht listing import feature to their PHP broker website. Instead of spending a weekend learning Figma, I prompted Capper to generate the mockup I needed. Same with LinkedIn article images. It's faster than any design tool for "good enough" visuals.
Making funny photos of friends. Still works. Still the best use case.
| Layer | Technology | Role |
|---|---|---|
| Runtime | Bun + Next.js 16 | App shell and API |
| AI (Pro) | Google Gemini (Flash + Pro) | Image generation |
| AI (Free) | Cloudflare Workers AI FLUX | Free tier fallback |
| AI (Video) | Google Veo 3.1 | Video generation |
| Storage | Cloudflare R2 | Permanent image hosting |
| Database | Neon PostgreSQL + Drizzle ORM | Users, images, history |
| Rate Limiting | Upstash Redis | Per-key sliding window |
| Payments | LemonSqueezy | Subscription billing |
| Auth | NextAuth v5 | Magic link + password |
| Deploy | Vercel | Hosting |
The API lifecycle for a Pro generation request:
1. Bearer token validated (SHA-256 hash lookup; raw keys are never stored).
2. User status checked (must be APPROVED).
3. Rate limit applied (10 req/min per key via Upstash sliding window).
4. Optional reference image processed (Sharp: resize, strip EXIF, normalize).
5. Generation dispatched in parallel (one Gemini call per requested image, up to 4).
6. Results converted to PNG, uploaded to Cloudflare R2.
7. Metadata saved to PostgreSQL (prompt, model, aspect ratio, URL).
8. If DB save fails, R2 object is deleted (no orphans).
9. Response: permanent public URLs at images.getcapper.com.
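Steps 1 through 3 of that lifecycle can be sketched in a few dozen lines. This is a minimal illustration, not Capper's actual code: the `keysByHash` map stands in for the database table, `SlidingWindowLimiter` is an in-memory sliding-window log rather than the Upstash client, and the names `gateRequest` and `registerKey` as well as the 401/403/429 status codes are assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

type UserStatus = "APPROVED" | "PENDING";
type KeyRecord = { userId: string; status: UserStatus };

// Hypothetical stand-in for the API-key table: only SHA-256 digests are stored.
const keysByHash = new Map<string, KeyRecord>();

function hashKey(rawKey: string): string {
  return createHash("sha256").update(rawKey, "utf8").digest("hex");
}

function registerKey(rawKey: string, record: KeyRecord): void {
  keysByHash.set(hashKey(rawKey), record);
}

// In-memory sliding-window log (assumption: the real limiter lives in Redis).
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }
}

const limiter = new SlidingWindowLimiter(10, 60_000); // 10 requests per minute

type Gate = { ok: true; userId: string } | { ok: false; status: 401 | 403 | 429 };

function gateRequest(authorization: string | undefined, now: number = Date.now()): Gate {
  if (!authorization || !authorization.startsWith("Bearer ")) return { ok: false, status: 401 };
  const digest = hashKey(authorization.slice("Bearer ".length).trim());
  const record = keysByHash.get(digest); // step 1: lookup by hash, never by raw key
  if (!record) return { ok: false, status: 401 };
  if (record.status !== "APPROVED") return { ok: false, status: 403 }; // step 2
  if (!limiter.allow(digest, now)) return { ok: false, status: 429 }; // step 3
  return { ok: true, userId: record.userId };
}
```

Keying the limiter on the digest rather than the raw token means the raw key never needs to leave the request handler, which is the same property the hash lookup gives the database.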
Every URL is permanent. No signed URLs, no expiry tokens. Generate it once, share the link forever.
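The parallel dispatch, upload, and rollback steps of the lifecycle can be sketched as follows. Everything here is illustrative: `PipelineDeps`, `generateBatch`, and the `<uuid>.png` key format are hypothetical, and the storage and database calls are injected as plain functions so the example runs without Gemini, R2, or PostgreSQL. The sketch also rolls back on a failed upload, which goes slightly beyond the DB-failure case described above.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical dependencies, injected so the sketch runs without network access.
interface PipelineDeps {
  generate(prompt: string): Promise<Uint8Array>; // one model call, returns PNG bytes
  upload(key: string, png: Uint8Array): Promise<void>; // object storage put
  remove(key: string): Promise<void>; // object storage delete
  saveMetadata(rows: { prompt: string; url: string }[]): Promise<void>;
}

const PUBLIC_BASE = "https://images.getcapper.com";
const MAX_IMAGES = 4;

async function generateBatch(
  prompt: string,
  count: number,
  deps: PipelineDeps,
): Promise<string[]> {
  if (!Number.isInteger(count) || count < 1 || count > MAX_IMAGES) {
    throw new RangeError(`count must be an integer from 1 to ${MAX_IMAGES}`);
  }

  // One generation call per requested image, all started before any is awaited.
  const images = await Promise.all(
    Array.from({ length: count }, () => deps.generate(prompt)),
  );

  const uploaded: { key: string; url: string }[] = [];
  try {
    for (const png of images) {
      const key = `${randomUUID()}.png`; // key format is an assumption
      await deps.upload(key, png);
      uploaded.push({ key, url: `${PUBLIC_BASE}/${key}` });
    }
    await deps.saveMetadata(uploaded.map(({ url }) => ({ prompt, url })));
  } catch (error) {
    // Roll back: delete whatever reached storage so no orphaned objects remain.
    await Promise.allSettled(uploaded.map(({ key }) => deps.remove(key)));
    throw error;
  }

  // The URL is just base + key: no signature, no expiry parameter.
  return uploaded.map(({ url }) => url);
}
```

`Promise.allSettled` is used for the cleanup so that one failed delete does not mask the original error or stop the remaining deletes.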
The most interesting technical decision in Capper is the llm.txt endpoint. GET https://getcapper.com/llm.txt returns a plain-text API specification — curl examples, parameter tables, model IDs, error codes — formatted for AI agents to consume.
When Claude Code loads the Capper skill, it fetches this file and has a complete, working API reference. No documentation site to browse, no hallucinated model names, no outdated SDK wrappers. One HTTP GET, complete context, correct code on the first try.
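As a rough sketch of that flow, an agent harness could fetch the spec and place it verbatim ahead of the task. The function names `loadApiSpec` and `buildAgentContext` and the wording of the injected instruction are assumptions for illustration; how the Capper skill actually wires this up is not shown in this post. Only the URL comes from the text above.

```typescript
const SPEC_URL = "https://getcapper.com/llm.txt";

// Minimal shape of what we need from fetch, so a fake can be injected in tests.
type TextResponse = { ok: boolean; status: number; text(): Promise<string> };
type FetchLike = (url: string) => Promise<TextResponse>;

async function loadApiSpec(fetchImpl: FetchLike = fetch): Promise<string> {
  const response = await fetchImpl(SPEC_URL);
  if (!response.ok) {
    throw new Error(`llm.txt request failed with HTTP ${response.status}`);
  }
  return (await response.text()).trim();
}

// Put the spec in front of the task so the model reads real model IDs and
// parameters instead of guessing them.
function buildAgentContext(spec: string, task: string): string {
  return [
    `API reference, fetched verbatim from ${SPEC_URL}:`,
    spec,
    "",
    "Use only the endpoints, parameters, and model IDs listed above.",
    `Task: ${task}`,
  ].join("\n");
}
```

Fetching at load time rather than bundling a copy means the agent always sees the spec that matches the deployed API.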
This is how Capper functions as development infrastructure rather than just a product. The API isn't an afterthought bolted onto a web app — it's the primary interface, and the web UI is a convenience layer on top.
This site, ThreadMuse, BlogBot, client work, and personal use — all calling the same API.
Capper is an AI image generation tool with a web UI and REST API, powered by Google Gemini and Cloudflare Workers AI. It generates images from text prompts with optional reference images, stores them permanently on Cloudflare R2, and provides shareable URLs that never expire.
Capper is designed as development infrastructure, not just a consumer product. The API is the primary interface, with an llm.txt endpoint that AI agents can read for complete API specifications. Every generated image gets a permanent, shareable URL. It's invite-only and focused on practical utility over feature comprehensiveness.
Three models across two tiers. Pro tier uses Google Gemini (nano-banana for fast iteration, nano-banana-pro for best quality). Free tier uses Cloudflare Workers AI FLUX for basic generation. Video generation uses Google Veo 3.1.
llm.txt is a plain-text API specification served at getcapper.com/llm.txt. It contains curl examples, parameter tables, model IDs, and error codes formatted for AI agents to consume. When an AI coding assistant fetches this file, it can make correct API calls without documentation browsing or hallucinated parameters.
Yes. The REST API accepts Bearer token authentication and returns JSON with permanent image URLs. It supports text prompts, reference images (base64), multiple aspect ratios, and batch generation of up to 4 images per request on Pro tier.
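A client-side request might be assembled like this. Treat it as a sketch under loud assumptions: the JSON field names (`prompt`, `model`, `aspectRatio`, `count`, `referenceImage`) and the helper `buildGenerateRequest` are placeholders, and no endpoint path is given here. The authoritative parameter names and URL are whatever llm.txt lists. Only the Bearer scheme, base64 reference images, the two model IDs, and the limit of 4 images come from the description above.

```typescript
import { Buffer } from "node:buffer";

type Model = "nano-banana" | "nano-banana-pro";

// Hypothetical option names; check llm.txt for the real parameter table.
interface GenerateOptions {
  prompt: string;
  model: Model;
  aspectRatio?: string;
  count?: number;
  referenceImage?: Uint8Array;
}

function buildGenerateRequest(apiKey: string, options: GenerateOptions) {
  const count = options.count ?? 1;
  if (!Number.isInteger(count) || count < 1 || count > 4) {
    throw new RangeError("count must be an integer from 1 to 4");
  }

  const body: Record<string, unknown> = {
    prompt: options.prompt,
    model: options.model,
    count,
  };
  if (options.aspectRatio) body.aspectRatio = options.aspectRatio;
  if (options.referenceImage) {
    // Reference images travel as base64 text inside the JSON body.
    body.referenceImage = Buffer.from(options.referenceImage).toString("base64");
  }

  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}
```

The returned object has the shape `fetch` expects as its second argument, so a call would look like `await fetch(endpointFromLlmTxt, buildGenerateRequest(key, { prompt, model: "nano-banana" }))`, where `endpointFromLlmTxt` is the URL taken from the spec.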
Capper is primarily a tool used across the builder's own projects — content generation pipelines, client work, development workflows. The invite system exists because growth hasn't been prioritized while other projects take precedence, not as an artificial scarcity mechanism.