[004] / SELECTED WORK

·PROJECT

MULTI MODEL

Ask every AI at once. Side-by-side comparison of GPT, Claude, Gemini, DeepSeek, and more — with blind mode, synthesis, and a credit system.

ROLE:
Solo developer
TIMELINE:
April 2025 – Present
STATUS:
Ongoing
STACK:
Next.js 15, TypeScript, Neon Postgres

A web app that runs the same prompt against multiple LLMs simultaneously and surfaces the differences. Built solo to scratch my own itch: I was constantly tabbing between ChatGPT and Claude to compare answers, and I wanted one place to do it with bias removed.

[METRICS]

0+
MODELS SUPPORTED
3
PARALLEL RESPONSES
Solo
BUILD
Ongoing
STATUS

[STACK]

  • Next.js 15
  • TypeScript
  • Neon Postgres
  • Drizzle ORM
  • OpenRouter
  • Neon Auth (Better Auth)
  • Vercel

[01]

THE PROBLEM

Every serious AI user has a tab problem. GPT for one thing, Claude for another, Gemini open for a third opinion — and no easy way to compare them without copying a prompt three times and mentally reconciling the outputs. I wanted a single interface that ran the same prompt against multiple models simultaneously, with a blind mode so I could evaluate quality before I knew which model produced it.

[02]

WHAT I BUILT

A Next.js 15 app (App Router) that connects to OpenRouter's unified API: one key, access to every major LLM. Users pick up to three models, type a prompt once, and see responses stream in side by side. Blind mode hides model labels during reading so evaluation is unanchored from brand. A "combine" action makes one additional model call that synthesizes the responses into a single answer. One-click analyses (pick a winner, TL;DR, key differences) run across the full response set. Under the hood: Neon Postgres + Drizzle ORM for chat history and user state, Neon Auth (Better Auth, hosted) for sessions, and a credit system with pre-debit + reconcile billing so credits are reserved before streaming starts and settled against actual usage afterward.
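The fan-out step described above could look roughly like the sketch below. It is not the app's actual code: `StreamModel`, `PanelResult`, and `fanOut` are hypothetical names, and the streaming call is injected as a function so the sketch stays independent of any particular OpenRouter client. The point it illustrates is that each model call catches its own errors, so one failing model does not blank the other panels.

```typescript
// Hypothetical shape of one streamed model call: it receives a model ID and a
// prompt, calls onToken for every chunk, and resolves when the stream ends.
type StreamModel = (
  model: string,
  prompt: string,
  onToken: (token: string) => void,
) => Promise<void>;

interface PanelResult {
  model: string;
  text: string;
  error?: string; // set only when that model's stream failed
}

const MAX_MODELS = 3; // "up to three models" per the write-up

// Run the same prompt against every selected model concurrently.
async function fanOut(
  models: string[],
  prompt: string,
  streamModel: StreamModel,
  onUpdate: (index: number, textSoFar: string) => void = () => {},
): Promise<PanelResult[]> {
  if (models.length === 0 || models.length > MAX_MODELS) {
    throw new Error(`Pick between 1 and ${MAX_MODELS} models`);
  }
  return Promise.all(
    models.map(async (model, index): Promise<PanelResult> => {
      let text = "";
      try {
        await streamModel(model, prompt, (token) => {
          text += token;
          onUpdate(index, text); // lets the UI repaint one panel at a time
        });
        return { model, text };
      } catch (err) {
        // Keep whatever streamed before the failure and report the error.
        const error = err instanceof Error ? err.message : String(err);
        return { model, text, error };
      }
    }),
  );
}
```

In a real route handler, `streamModel` would wrap the provider's streaming endpoint and `onUpdate` would push partial text to the client; both are left abstract here.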

[03]

THE BILLING MODEL

LLM costs are variable and hard to cap cleanly on streaming responses: you don't know the final token count until the stream ends. I solved this with a pre-debit + reconcile pattern. Each request pre-debits ~40% of the worst-case token cost and writes a `pending_charges` row, then reconciles against actual usage when the stream's final chunk arrives with real token counts. A Vercel cron job runs every 15 minutes and sweeps any rows orphaned by dropped connections, a safety net for requests that are killed (SIGKILL) before they can reconcile. Anonymous users get 7 free trial messages, tracked by an opaque cookie, before being prompted to sign in.
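A minimal in-memory sketch of the pre-debit + reconcile flow follows. The real system stores this in Postgres; here a `Map` stands in for the `pending_charges` table. `CreditLedger`, its method names, and the rounding rule are hypothetical. Two behaviors are assumptions the write-up does not specify: orphaned rows are treated as stale after 15 minutes, and the sweep keeps the pre-debit as the final charge rather than refunding it.

```typescript
interface PendingCharge {
  userId: string;
  preDebited: number; // credits already taken from the balance
  createdAt: number; // epoch milliseconds
}

const PRE_DEBIT_RATIO = 0.4; // "~40% of the worst-case token cost"
const ORPHAN_AGE_MS = 15 * 60 * 1000; // assumed staleness threshold

class CreditLedger {
  private balances = new Map<string, number>();
  private pending = new Map<number, PendingCharge>(); // stand-in for pending_charges
  private nextId = 1;

  deposit(userId: string, credits: number): void {
    this.balances.set(userId, this.balanceOf(userId) + credits);
  }

  balanceOf(userId: string): number {
    return this.balances.get(userId) ?? 0;
  }

  pendingCount(): number {
    return this.pending.size;
  }

  // Step 1: reserve a share of the worst case before the stream starts.
  preDebit(userId: string, worstCaseCost: number, now: number): number {
    const hold = Math.ceil(worstCaseCost * PRE_DEBIT_RATIO);
    if (this.balanceOf(userId) < hold) throw new Error("Insufficient credits");
    this.balances.set(userId, this.balanceOf(userId) - hold);
    const id = this.nextId++;
    this.pending.set(id, { userId, preDebited: hold, createdAt: now });
    return id;
  }

  // Step 2: settle against real usage once the final chunk reports token counts.
  reconcile(chargeId: number, actualCost: number): void {
    const charge = this.pending.get(chargeId);
    if (!charge) return; // already settled or swept, so a repeat call is a no-op
    // Positive delta charges the remainder; negative delta refunds the excess.
    // Overdraft handling is not specified in the write-up and is not modeled here.
    const delta = actualCost - charge.preDebited;
    this.balances.set(charge.userId, this.balanceOf(charge.userId) - delta);
    this.pending.delete(chargeId);
  }

  // Step 3: periodic sweep. Assumption: stale rows keep the pre-debit as final.
  sweepOrphans(now: number): number {
    let swept = 0;
    this.pending.forEach((charge, id) => {
      if (now - charge.createdAt >= ORPHAN_AGE_MS) {
        this.pending.delete(id);
        swept++;
      }
    });
    return swept;
  }
}
```

In a database-backed version, each step would need to run in a transaction so the balance update and the pending row change together.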

[04]

THE BLIND MODE INSIGHT

This was the most interesting design decision. The first version showed model names next to every response — and I noticed I was reading Claude answers more charitably than DeepSeek ones before I'd even finished the first sentence. Blind mode inverts the flow: you read, you pick a preferred answer, then the labels reveal. It doesn't change the content, but it completely changes how you engage with it. The feature is simple to implement and hard to replicate with separate tabs.
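One way to sketch the read, pick, reveal flow is shown below. This is an illustration under assumptions, not the app's implementation: `createBlindRound`, the "Response A/B/C" labels, and the shuffled panel order are all hypothetical choices. The idea it demonstrates is that the data handed to the UI carries no model names at all, and the mapping only comes back once a pick is made.

```typescript
interface ModelResponse {
  model: string;
  text: string;
}

interface BlindPanel {
  label: string; // neutral label such as "Response A"
  text: string;
}

// Fisher-Yates shuffle; the random source is injectable so tests can be deterministic.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

function createBlindRound(
  responses: readonly ModelResponse[],
  random: () => number = Math.random,
) {
  // Shuffle so panel position does not leak which model is which.
  const order = shuffle(responses, random);
  const panels: BlindPanel[] = order.map((response, i) => ({
    label: `Response ${String.fromCharCode(65 + i)}`, // A, B, C
    text: response.text,
  }));

  return {
    panels, // safe to render: contains no model names
    // Model names are only returned after the reader commits to a pick.
    reveal(pickedLabel: string) {
      const index = panels.findIndex((panel) => panel.label === pickedLabel);
      if (index === -1) throw new Error(`Unknown label: ${pickedLabel}`);
      return {
        winner: order[index].model,
        mapping: panels.map((panel, i) => ({
          label: panel.label,
          model: order[i].model,
        })),
      };
    },
  };
}
```

Keeping the label-to-model mapping inside the closure means the rendering code cannot show a model name by accident before a pick exists.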

[05]

WHAT I'D DO DIFFERENTLY

The credit system was architected correctly but implemented later than it should've been — I added it after the core chat was working, which meant retrofitting the pre-debit logic around an existing streaming architecture. Building the billing model first, or at least alongside the stream handler, would've saved a refactor pass. The data model for pending charges is clean now, but it took two iterations to get there.
