Adding AI to Existing Apps With OpenRouter — One Endpoint, Many Models
One Endpoint, Many Models
In 2023, adding AI to an app meant signing up with OpenAI, copying their SDK, and writing code against their specific API. That worked fine until you wanted to compare GPT-4 against Claude — now you needed Anthropic's SDK, a separate billing account, different request shapes, different streaming protocols, and a parallel code path. Add Gemini and you had three SDKs, three accounts, three sets of pricing pages, three rate-limit policies. The complexity grew faster than the value.
OpenRouter is the answer to that mess. It is a single API endpoint that proxies to every major model from every major provider — OpenAI, Anthropic, Google, Mistral, Meta, DeepSeek, xAI, and dozens more. You pay OpenRouter; OpenRouter pays the providers. You write your code against OpenAI's chat completion format (the de facto standard), and you change models by changing one string.
This article covers what OpenRouter is, why the pattern won, and how to add AI to an existing application without locking yourself into a single provider.
What OpenRouter Actually Does
OpenRouter is a thin proxy with three jobs:
- Translate between the OpenAI-compatible chat completion format and whatever each underlying provider actually wants. The OpenAI format is the lingua franca; you write requests in that shape; OpenRouter handles the translation per-provider behind the scenes.
- Route requests to the cheapest or fastest available endpoint for the model you asked for. Some models are served by multiple providers (LLaMA 3 by Together AI, Anyscale, and others), and OpenRouter picks the best one available at request time.
- Bill under one account. You add credits or set up monthly billing; you never deal with the underlying provider's billing.
That is the whole product. The simplicity is the value.
A Concrete Request
A complete request to OpenRouter, in plain HTTP:
```bash
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-3.5-sonnet",
    "messages": [
      {"role": "system", "content": "You are a helpful coding tutor."},
      {"role": "user", "content": "Explain async/await in JavaScript in two sentences."}
    ]
  }'
```
The response comes back in the OpenAI shape:
```json
{
  "id": "gen-1745678901-abc",
  "model": "anthropic/claude-3.5-sonnet",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "async/await is syntax for working with Promises that lets you write asynchronous code that reads top-to-bottom like synchronous code. An async function always returns a Promise, and await pauses execution until the Promise it is given resolves."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 51,
    "total_tokens": 79
  }
}
```
Change `model` to `openai/gpt-4o-mini` and send the exact same request: you get an equivalent response from a completely different provider. Change it to `google/gemini-2.5-flash`, same again. The portability is real.
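To see the same portability in code rather than curl, here is a minimal sketch. The `ask` helper is a hypothetical name (not part of any SDK), and the top-level `await` assumes an ES-module context; only the endpoint URL and model strings come from OpenRouter:

```js
// Hypothetical helper: one code path for every provider.
// Only the model string varies between calls.
async function ask(model, messages) {
  const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model, messages }),
  })
  return (await res.json()).choices[0].message.content
}

const messages = [{ role: 'user', content: 'Say hello in five words.' }]
await ask('anthropic/claude-3.5-sonnet', messages) // Anthropic
await ask('openai/gpt-4o-mini', messages)          // OpenAI
await ask('google/gemini-2.5-flash', messages)     // Google
```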
The Models Worth Knowing in 2026
OpenRouter's catalog has hundreds of models. Most of them are noise. The ones worth memorizing for everyday work:
| Model | Provider | Good at | Cost (in/out per 1M tokens) |
|---|---|---|---|
| `openai/gpt-4o-mini` | OpenAI | Cheap general-purpose | $0.15 / $0.60 |
| `openai/gpt-4o` | OpenAI | High-quality general | $5.00 / $20.00 |
| `anthropic/claude-3.5-haiku` | Anthropic | Cheap, fast, good writing | $1.00 / $5.00 |
| `anthropic/claude-3.5-sonnet` | Anthropic | Coding, careful reasoning | $3.00 / $15.00 |
| `google/gemini-2.5-flash` | Google | Cheap, huge context | $0.30 / $2.50 |
| `google/gemini-3-flash-preview` | Google | Latest, thinking mode | $0.50 / $3.00 |
| `meta-llama/llama-3.3-70b` | Meta (open) | Self-hostable backup | varies |
| `deepseek/deepseek-chat` | DeepSeek | Cheap reasoning | $0.27 / $1.10 |
A practical rule of thumb: start with a cheap model (gpt-4o-mini, gemini-2.5-flash, or claude-3.5-haiku). If the output is good enough, you are done. If it is not, upgrade to a more expensive model and measure the lift. Most production AI features run on models you would have called "small" two years ago.
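To make those prices concrete, here is the arithmetic for a single request, using the `gpt-4o-mini` rates from the table and the token counts from the earlier example response:

```js
// Price per 1M tokens for openai/gpt-4o-mini, taken from the table above.
const PRICE = { input: 0.15, output: 0.6 }

// The usage object from the example response earlier in this article.
const usage = { prompt_tokens: 28, completion_tokens: 51 }

const cost =
  (usage.prompt_tokens * PRICE.input + usage.completion_tokens * PRICE.output) / 1e6
console.log(cost) // ≈ $0.0000348 — tens of thousands of such requests per dollar
```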
Adding AI to an Existing App
Suppose you have an existing Node.js backend and you want to add a "summarize this article" endpoint. The full implementation is short:
```js
// server.js
import express from 'express'

const app = express()
app.use(express.json())

app.post('/api/summarize', async (req, res) => {
  const { article } = req.body
  if (!article || article.length < 100) {
    return res.status(400).json({ error: 'article too short' })
  }

  const response = await fetch(
    'https://openrouter.ai/api/v1/chat/completions',
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
        'Content-Type': 'application/json',
        'HTTP-Referer': 'https://example.com',
        'X-Title': 'Article Summarizer',
      },
      body: JSON.stringify({
        model: 'google/gemini-2.5-flash',
        messages: [
          {
            role: 'system',
            content: 'You summarize articles in 3 short bullets.',
          },
          { role: 'user', content: article },
        ],
        max_tokens: 200,
      }),
    }
  )

  const data = await response.json()
  if (data.error) {
    return res.status(500).json({ error: data.error.message })
  }

  res.json({
    summary: data.choices[0].message.content,
    usage: data.usage,
  })
})

app.listen(3000)
```
Forty lines. The only secret on the server is `OPENROUTER_API_KEY`. Switching from Gemini Flash to GPT-4o is a one-line change to the model string. You are not coupled to any provider's SDK.
The two non-obvious headers (`HTTP-Referer` and `X-Title`) are optional but recommended: OpenRouter uses them to attribute traffic to your app in its public leaderboards and analytics.
What Lives Where
If you have read the Frontend vs Backend article, the placement decisions for AI features are obvious. The OpenRouter API key is a secret. Therefore the call lives on the backend. The frontend calls your API; your API calls OpenRouter.
Two anti-patterns to avoid:
- Calling OpenRouter from the frontend. The API key would be visible in the browser, anyone could steal it, and your monthly bill becomes someone else's free entertainment. There is no scenario where this is the right choice in production.
- Letting users dictate the model. If you accept `model` from the request body and pass it through to OpenRouter, a user can pick the most expensive model in the catalog and you get the bill. Pin the model server-side; if you need user choice, expose a small named-tier interface ("fast" / "balanced" / "premium"), as in the sketch below.
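A minimal sketch of that named-tier interface, reusing the Express app from earlier. The tier names and the `/api/generate` route are illustrative; the model strings are real OpenRouter IDs:

```js
// Server-side tier map: clients pick a tier name, never a raw model string.
const MODEL_TIERS = {
  fast: 'openai/gpt-4o-mini',
  balanced: 'anthropic/claude-3.5-sonnet',
  premium: 'openai/gpt-4o',
}

app.post('/api/generate', async (req, res) => {
  // Unknown or missing tiers fall back to the cheapest model.
  const model = MODEL_TIERS[req.body.tier] ?? MODEL_TIERS.fast
  // ...call OpenRouter with `model`, exactly as in the summarize endpoint...
})
```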
Streaming for Better UX
The example above waits for the full response before returning anything. For longer outputs (a 500-word essay, a generated email), this means the user stares at a spinner for several seconds. Almost every modern AI UI streams the response token-by-token instead.
OpenRouter supports the OpenAI-compatible streaming format. Add `stream: true` to the request body and the response becomes a `text/event-stream` of partial chunks:
```js
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers, // same Authorization and Content-Type headers as before
  body: JSON.stringify({ model, messages, stream: true }),
})

const reader = response.body.getReader()
const decoder = new TextDecoder()
let buffer = ''
while (true) {
  const { value, done } = await reader.read()
  if (done) break
  buffer += decoder.decode(value, { stream: true })
  const lines = buffer.split('\n')
  buffer = lines.pop() // keep any incomplete line for the next read
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue
    const payload = line.slice(6)
    if (payload === '[DONE]') continue
    const delta = JSON.parse(payload).choices[0]?.delta?.content
    if (delta) res.write(delta) // forward each token to the client as it arrives
  }
}
```
The frontend reads the same stream and displays tokens as they arrive. The full implementation, including the SSE parsing on both ends, runs about a hundred lines; most production AI apps do this, and the UX difference is real.
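On the browser side, one common design is for your backend to parse the SSE (as above) and forward plain text chunks, which the frontend reads with the same reader pattern. A sketch under that assumption; the `#summary` element and the plain-text forwarding are illustrative choices, not part of OpenRouter:

```js
// Browser-side: read the streamed response from your own backend
// and append each chunk to the page as it arrives.
const res = await fetch('/api/summarize', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ article }),
})

const reader = res.body.getReader()
const decoder = new TextDecoder()
const output = document.querySelector('#summary') // assumed target element
while (true) {
  const { value, done } = await reader.read()
  if (done) break
  output.textContent += decoder.decode(value, { stream: true })
}
```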
Cost Control — A Real Concern
A poorly designed AI feature can cost real money quickly. A few practical guardrails every production system should have:
- Set `max_tokens`. Without it, a runaway model can produce thousands of output tokens for a tiny prompt. Output is the expensive side.
- Validate input length. Cap how big a prompt the user can send. A user who pastes a 50,000-word document is using 50x the tokens of a normal request.
- Rate-limit per user. A single user firing requests in a loop can run up your bill faster than you can patch.
- Log usage in your database. Store the `usage` object from every response so you can charge users per call later, or just see who is costing you what (a minimal sketch follows this list).
- Set monthly billing caps in the OpenRouter dashboard. A hard ceiling means a bug or attack cannot bankrupt you overnight.
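As an example of the logging guardrail, here is a minimal sketch. The `db` client (node-postgres style) and the `ai_usage` table schema are assumptions; `model` and `data` are the variables from the summarize endpoint above:

```js
// After each successful OpenRouter call, record who spent what.
await db.query(
  `INSERT INTO ai_usage (user_id, model, prompt_tokens, completion_tokens, created_at)
   VALUES ($1, $2, $3, $4, NOW())`,
  [req.user.id, model, data.usage.prompt_tokens, data.usage.completion_tokens]
)
```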
Most production AI features cost a few cents per user per month. Without these guardrails, the same features can cost dollars per call when something goes wrong. The guardrails are not premature optimization — they are the difference between a feature you can ship and a feature that becomes a finance problem.
Open Models, Self-Hosting, and the Long View
OpenRouter also serves open models — LLaMA, Mistral, DeepSeek, Qwen — through commodity hosts at lower margins. For most apps, paying OpenRouter to serve them is fine. For high-volume use cases or specific privacy requirements, you can self-host the same open models on a GPU server and remove OpenRouter from the loop entirely. The code does not change much; you point the same requests at your own endpoint instead of OpenRouter's.
The ecosystem is moving toward an OpenAI-compatible interface as the universal standard. Servers like vLLM, llama.cpp's HTTP server, and Ollama all expose this interface. So even if you decide to self-host later, the code you write today against OpenRouter ports cleanly. That portability is the whole point.
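Because the interface is the same, the switch is mostly a base-URL change. A sketch assuming a local Ollama instance, which serves an OpenAI-compatible API on port 11434 by default; the `LLM_BASE_URL` variable is an illustrative convention, not a standard:

```js
// Point the same request at OpenRouter or at your own OpenAI-compatible server.
const BASE_URL = process.env.LLM_BASE_URL ?? 'https://openrouter.ai/api/v1'
// e.g. LLM_BASE_URL=http://localhost:11434/v1 for a local Ollama instance

const response = await fetch(`${BASE_URL}/chat/completions`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.LLM_API_KEY ?? 'ollama'}`, // Ollama ignores the key
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'llama3.3', // a local model name rather than an OpenRouter ID
    messages: [{ role: 'user', content: 'Hello' }],
  }),
})
```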
Where This Fits
Lesson 16 of the ABCsteps curriculum has you wire AI into your existing project. OpenRouter is the gateway the curriculum uses, for exactly the reasons in this article: one API, many models, no vendor lock-in. With the mental model here, the lesson's steps will feel like assembling familiar parts. By the end, you will have an application that uses an LLM where it adds value, and you will know how to swap the underlying model without rewriting the application.