ABCsteps lesson path
AI Products Are API Systems
Deconstruct AI app architecture and make a first model API call while separating product experience from backend mechanics. Build one artifact, keep one review trail, and make the work easy to inspect later.
- Lesson: 11
- Time: 45 min
- Access: public lesson
Learning objective
Understand AI products as frontends, servers, providers, and costs.
Lab outcome
Trace an AI request from interface to provider and back.
Module milestone
Build a small full-stack leaderboard with persistent data.
Lesson proof workflow
Read, build, then review the evidence.
- Step 1 · Read: Start with Model API calls before touching tools.
- Step 2 · Build: Build toward tracing an AI request from interface to provider and back.
- Step 3 · Review: Review the evidence using Cost awareness.
Toolchain
AI products are ordinary systems wrapped around model APIs.
These are the practical surfaces used in this lesson. Learn the habit first, then connect it to the wider engineering ecosystem.
Understand provider calls, costs, and responses.
Keep secrets and provider calls off the public client.
Read the structured payload sent to the model.
Proof of work
Leave one inspectable trail from this lesson.
The useful output is not a passive note. It is a small artifact another person can inspect: a working file, a command result, a commit, a screenshot, a README note, or a demo link.
Lesson lab: Trace an AI request from interface to provider and back.
Tool and platform logos are ecosystem references only: no affiliation, endorsement, interview access, hiring promise, salary promise, or placement guarantee.
Build
Produce the artifact
Complete the lab and keep the result visible: Trace an AI request from interface to provider and back.
Record
Save review evidence
Capture what changed, what broke, and how Model API calls became clearer through the work.
Explain
Write the vocabulary
Use your own words for Secret boundaries and Cost awareness; this is what makes the lesson inspectable later.
Skills companies recognize
Translate the lesson into inspectable work language.
This lesson turns one small lab into the language a learner can use in a README, demo note, or technical conversation. The point is not to collect logos; the point is to explain work clearly enough that another engineer can inspect it.
Where this skill appears
AI product teams need engineers who understand the system around the model, not only the chat interface.
Ecosystem references
Platform and company logos are ecosystem references only: no affiliation, endorsement, interview access, hiring preference, salary outcome, or placement guarantee.
README line
Name the artifact
Lab proof: Trace an AI request from interface to provider and back. Connect it to Model API calls so the result reads like work, not a passive note.
Review line
Explain the stack
Use OpenAI, Node.js, JSON to explain Secret boundaries and what changed between the first attempt and the inspected result.
Conversation line
Answer with evidence
If a team asks about Cost awareness, use this proof line: Show the model request path, the server boundary, and where secrets, cost, and response handling live.
Proof translation
Skill signal
Model API calls is the market word. The lesson makes it visible through a small working artifact.
Proof artifact
The inspectable artifact is: Trace an AI request from interface to provider and back.
Interview answer
Use Secret boundaries and Cost awareness to explain what changed, what failed, and how you verified it.
Paid guidance
Read publicly. Upgrade when guidance will help you finish.
This lesson remains part of the public written syllabus. Paid help is online-only and human-led: video walkthroughs as they roll out, live class context, WhatsApp Q&A, and project review around the same work.
No account wall, automated checkout, or placement promise is introduced here. Enrollment stays human-led by WhatsApp or call, and the useful proof remains the learner's own artifact.
Public
Written lesson stays open
Read the prepare and review material for lesson 11 on the public site before buying anything.
Recorded
Recorded and live guidance clarify the work
Paid guidance can add founder-led video walkthroughs as they roll out and live online class context; the teaching explains the work, but does not replace the written lesson.
Human
Questions use real context
When stuck, useful guidance starts from the route, error, screenshot, repo fragment, and the lab artifact: Trace an AI request from interface to provider and back.
Phase 1 · Briefing
Lesson briefing
Before You Study (5 mins)
Lesson focus: ChatGPT, Claude, Gemini, Perplexity, Cursor, GitHub Copilot — every AI product you've used in 2026 looks magical from the outside and is, on the inside, the same kind of system you've already been building. A frontend (the website or app), a backend (a server you control), and a model API (someone else's GPU farm running an LLM). Today we open the box. By the end, you will have written your own ten-line client that talks directly to a real LLM API and gets back a response — proving that the "magic" is mostly architecture.
What you should have ready:
- Lessons 06–10 complete — you have a containerized, tunneled, JSON-aware, API-fluent base
- A free API key from one of: Google AI Studio (Gemini), Groq, OpenAI, or Anthropic
- Node.js installed (`node --version` should print something)
- About 60 minutes
- An honest curiosity about how the apps you use every day actually work
The Concept
A modern AI product has four layers, and once you can see them, you can never un-see them:
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ 1. Frontend │ ◀▶ │ 2. Backend │ ◀▶ │ 3. Model API │ ◀▶ │ 4. The Model │
│ (chat UI) │ │ (your server) │ │ (provider's API) │ │ (the AI itself) │
└──────────────────┘ └──────────────────┘ └──────────────────┘ └──────────────────┘
Browser / Where YOUR The HTTP The actual
mobile app business logic contract LLM running
(HTML / JSX) + auth + history the provider on someone's
+ rate limits exposes GPUs
+ prompt building
Layer 1 (Frontend) is what the user sees: the input box, the streaming response, the chat history sidebar. Layer 4 (the Model) is a neural network with billions of parameters running on a cluster of GPUs in a datacenter. Everything magical about AI products lives in Layer 4. Everything controllable about AI products lives in Layer 2 and in the requests it sends through Layer 3.
This separation is what makes the entire LLM-app industry possible. OpenAI, Anthropic, Google, and Meta train and operate the models (Layer 4). Every other company in the AI app space (Cursor, Perplexity, Notion AI, GitHub Copilot, the AI features in Photoshop) builds Layers 1 and 2, calls the provider's API (Layer 3), and pays the model providers for inference at Layer 4. Pricing is mostly per token: a few cents per million tokens for cheap models, a few dollars per million for the frontier ones.
The provider API contract is just an HTTP endpoint. Every major LLM provider exposes a similar shape, although paths, auth headers, and field names differ. The example below follows the OpenAI-style format; the model name is provider-specific, for example "gpt-4", "claude-3-5-sonnet", or "gemini-2.0-flash":
POST /v1/chat/completions (OpenAI / Groq)
POST /v1/messages (Anthropic)
POST /v1beta/models/... (Google Gemini)
Headers:
  Authorization: Bearer YOUR_API_KEY   (OpenAI / Groq; Anthropic uses x-api-key, Gemini uses x-goog-api-key or a ?key= query parameter)
  Content-Type: application/json
Body (JSON):
  {
    "model": "gpt-4",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain Docker in one sentence." }
    ],
    "temperature": 0.7,
    "max_tokens": 200
  }
Response (JSON):
  {
    "id": "...",
    "choices": [{ "message": { "content": "Docker is a tool for..." } }],
    "usage": { "prompt_tokens": 12, "completion_tokens": 18, "total_tokens": 30 }
  }
That's the entire contract. Three fields you'll meet in every API:
- `model`: which model you want (cheap and fast vs. slow and smart)
- `messages`: the conversation history; you send the whole history every call (LLM APIs are stateless)
- `temperature`: how creative the model is (0 = near-deterministic, 1+ = wild)
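The contract above can be sketched as a plain `fetch` call. This is a minimal sketch that assumes an OpenAI-style endpoint and Node 18+ (global `fetch`); the helper names `buildChatRequest` and `chat` are hypothetical, and other providers differ in URL, auth header, and field names.

```javascript
// Build the pieces of an OpenAI-style chat request without sending it.
// buildChatRequest is a hypothetical helper name, not part of any SDK.
function buildChatRequest({ apiKey, model, messages, temperature = 0.7, maxTokens = 200 }) {
  return {
    // Assumed endpoint: OpenAI's chat completions URL. Other hosts use a different base URL.
    url: 'https://api.openai.com/v1/chat/completions',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ model, messages, temperature, max_tokens: maxTokens }),
    },
  }
}

// Send the request and return the text of the first choice.
// fetchImpl is injectable so the function can be exercised without a network.
async function chat(options, fetchImpl = fetch) {
  const { url, init } = buildChatRequest(options)
  const response = await fetchImpl(url, init)
  if (!response.ok) throw new Error(`Provider returned HTTP ${response.status}`)
  const data = await response.json()
  return data.choices[0].message.content
}
```

Keeping the request builder separate from the network call makes the payload easy to log and inspect, which is the habit this lesson is after.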
The crucial implication: LLM APIs are stateless. The server forgets you between requests. To have a conversation, your backend (Layer 2) keeps the message history and sends the whole array every time. ChatGPT's "memory" of the last thing you said is your application's memory, not the model's.
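Statelessness is easiest to see in code. The sketch below keeps the history in the application and resends all of it on every turn. `createConversation` and `callModel` are hypothetical names; `callModel` stands in for whatever function actually calls your provider.

```javascript
// Keep conversation state in the application, because the provider API does not.
// callModel is any async function that takes a messages array and returns reply text.
function createConversation(systemPrompt, callModel) {
  const messages = [{ role: 'system', content: systemPrompt }]
  return {
    messages,
    // Append the user turn, send the WHOLE history, then store the reply.
    async send(userText) {
      messages.push({ role: 'user', content: userText })
      const reply = await callModel([...messages])
      messages.push({ role: 'assistant', content: reply })
      return reply
    },
  }
}
```

Because the array grows with every turn, so does the input token count of each call. Long conversations cost more per message than short ones.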
Two more concepts you'll use forever:
- Tokens — how the API charges you. A token is roughly 3-4 characters of English text. "Hello world" is 2 tokens. A 1000-word response is ~1300 tokens. Costs are quoted per million tokens (input and output priced separately).
- Streaming vs. non-streaming: non-streaming returns the full response after the model finishes generating; streaming returns each token as soon as it's produced (the typewriter effect in ChatGPT). Streaming does not make generation faster, but the user sees the first words sooner, so prefer it for interactive interfaces.
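The token arithmetic can be sketched in a few lines. The 4-characters-per-token figure is the rough heuristic from the text, not a real tokenizer, and any prices you pass in are placeholders; check your provider's pricing page for actual numbers.

```javascript
// Rough token estimate: about 4 characters per token for English text.
// This is a heuristic; the provider's own tokenizer gives the real count.
function estimateTokens(text) {
  return Math.ceil(text.length / 4)
}

// Cost of one call, with input and output priced separately per million tokens.
// The result is in whatever currency the prices are quoted in.
function estimateCost({ inputTokens, outputTokens, inputPricePerMillion, outputPricePerMillion }) {
  return (inputTokens * inputPricePerMillion + outputTokens * outputPricePerMillion) / 1_000_000
}
```

For a real bill, use the token counts the provider reports in the response rather than the character estimate.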
Once you understand "frontend → my server → provider API → model," every AI product becomes legible. Cursor is a VS Code-based editor (Layer 1) talking to Cursor's backend (Layer 2), which calls a model provider's API such as Anthropic's Claude API (Layer 3), which runs the model (Layer 4). Perplexity is a chat UI (Layer 1) talking to their backend (Layer 2), which calls a search API and an LLM API (Layer 3) running models (Layer 4). The architecture is universal.
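Layer 2 can be sketched as a single handler function. The frontend sends only the user's text; the server validates it, builds the prompt, and attaches the API key. `handleChat`, `callProvider`, and the size limit are hypothetical names and values chosen for illustration, not any framework's API.

```javascript
// Layer 2 sketch: the browser never sees the API key or the system prompt.
const MAX_INPUT_CHARS = 2000 // illustrative limit, tune for your product

async function handleChat(requestBody, { apiKey, callProvider }) {
  const userText = String(requestBody?.message ?? '').trim()
  if (!userText) return { status: 400, body: { error: 'message is required' } }
  if (userText.length > MAX_INPUT_CHARS) return { status: 413, body: { error: 'message too long' } }

  // Prompt building happens server-side, so the client cannot change it.
  const messages = [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: userText },
  ]
  const reply = await callProvider({ apiKey, messages })

  // Only the reply goes back to the browser; the key stays on the server.
  return { status: 200, body: { reply } }
}
```

This is also the natural place for auth, rate limits, logging, and history storage, since every request passes through it.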
Quick Concepts
| Term | Simple Meaning |
|---|---|
| LLM | Large Language Model — the actual AI (GPT-4, Gemini, Claude, etc.) |
| API key | The secret string that authenticates your server to the model provider |
| Token | The unit the API charges you in — ~3-4 chars of English text |
| Prompt | The text you send to the model; same as content in a messages array |
| System prompt | A special first message that sets the model's role and constraints |
| Temperature | How random the model's output is (0 = nearly the same answer every time) |
| Streaming | Receiving the response token-by-token as it generates |
What We Will Build
By the end of this lesson, you will have done these specific things:
1. Picked a provider and got an API key. I recommend Google AI Studio: Gemini has a free tier, and no credit card is required for starter usage. Save the key to your `.env` file (NEVER commit it):

       echo "GEMINI_API_KEY=your_actual_key_here" >> .env
       echo ".env" >> .gitignore

2. Wrote your first ten-line client in Node.js using only `fetch` (no SDK). The file uses ES module syntax and the `dotenv` package, so install `dotenv` first, and if Node complains about `import`, add `"type": "module"` to `package.json`:

       import 'dotenv/config' // loads .env into process.env

       const apiKey = process.env.GEMINI_API_KEY
       const url = `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=${apiKey}`

       const response = await fetch(url, {
         method: 'POST',
         headers: { 'Content-Type': 'application/json' },
         body: JSON.stringify({
           contents: [{ parts: [{ text: 'Tell me one engineering joke about Docker.' }] }]
         })
       })
       const data = await response.json()
       // The reply text sits inside the first candidate
       console.log(data.candidates[0].content.parts[0].text)

3. Ran it with `node ai.js` and watched a real LLM response print to your terminal. Noticed the latency (a real measurement, not a vibe).
4. Inspected the cost: look at `usageMetadata` in the Gemini response (`usage` on OpenAI-style APIs), or at your provider's dashboard. A single call costs a fraction of a paisa; running 1,000 calls might cost ₹2-5.
5. Tweaked the prompt three times to feel how the same model produces different output for different prompts:
   - Same prompt, ran 3 times: observed variation.
   - Same prompt with `temperature: 0`: observed near-identical output.
   - Added a system prompt ("You are a senior Linux kernel developer who hates Docker. Answer in 2 sentences."): observed the persona change.
6. Made it a 30-line CLI tool that takes a prompt as a command-line argument: `node ai.js "explain idempotency in 3 sentences"`. This becomes a Swiss-army knife you'll use for the rest of your life.
7. Tried streaming (advanced; most providers offer a streaming endpoint). Watched the typewriter effect appear in your terminal.
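The streaming step can be sketched as follows, assuming Gemini's server-sent-events variant of the endpoint (`streamGenerateContent?alt=sse`) and Node 18+. The function names are hypothetical, and the chunk shape is assumed to match the non-streaming response; verify both against the current provider documentation.

```javascript
// Pull the text out of one SSE line of the form `data: {...json...}`.
// Returns '' for blank lines, comments, or chunks without text.
function textFromSseLine(line) {
  if (!line.startsWith('data:')) return ''
  const payload = line.slice(5).trim()
  if (!payload || payload === '[DONE]') return '' // [DONE] is an OpenAI-style sentinel
  const chunk = JSON.parse(payload)
  return chunk.candidates?.[0]?.content?.parts?.[0]?.text ?? ''
}

// Stream a response and hand each piece of text to onText as it arrives.
async function streamPrompt(prompt, apiKey, onText = (t) => process.stdout.write(t)) {
  const url = `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:streamGenerateContent?alt=sse&key=${apiKey}`
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
  })
  if (!response.ok) throw new Error(`Provider returned HTTP ${response.status}`)

  const decoder = new TextDecoder()
  let buffer = ''
  for await (const bytes of response.body) {
    buffer += decoder.decode(bytes, { stream: true })
    const lines = buffer.split('\n')
    buffer = lines.pop() // keep the last, possibly incomplete, line for the next chunk
    for (const line of lines) onText(textFromSseLine(line.trim()))
  }
  if (buffer.trim()) onText(textFromSseLine(buffer.trim()))
}
```

The buffering matters because network chunks do not line up with event boundaries; a JSON payload can arrive split across two reads.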
Step 6 is the lasting takeaway. Once you have a 30-line CLI that talks to an LLM, you have what most AI products fundamentally are, minus the UI polish.
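A sketch of that CLI, under stated assumptions: Node 18+, `GEMINI_API_KEY` already in the environment, and the `systemInstruction` and `generationConfig` request fields as I understand the Gemini REST API. The helper names (`buildGeminiBody`, `extractText`, `ask`) are hypothetical.

```javascript
// ai.js sketch. Usage: node ai.js "explain idempotency in 3 sentences"
function buildGeminiBody(prompt, { system, temperature } = {}) {
  const body = { contents: [{ role: 'user', parts: [{ text: prompt }] }] }
  if (system) body.systemInstruction = { parts: [{ text: system }] }
  if (temperature !== undefined) body.generationConfig = { temperature }
  return body
}

// Fail loudly when the response has no text (for example, an error payload).
function extractText(data) {
  const text = data?.candidates?.[0]?.content?.parts?.[0]?.text
  if (typeof text !== 'string') throw new Error('No text in response: ' + JSON.stringify(data))
  return text
}

async function ask(prompt, options = {}) {
  const apiKey = process.env.GEMINI_API_KEY
  if (!apiKey) throw new Error('Set GEMINI_API_KEY first')
  const url = `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=${apiKey}`
  const started = Date.now()
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildGeminiBody(prompt, options)),
  })
  const data = await response.json()
  if (!response.ok) throw new Error(`HTTP ${response.status}: ${JSON.stringify(data)}`)
  // Latency and token usage go to stderr so stdout stays pipeable.
  console.error(`latency: ${Date.now() - started} ms`, data.usageMetadata ?? '')
  return extractText(data)
}

// Run only when a prompt is passed on the command line.
const cliPrompt = process.argv.slice(2).join(' ')
if (cliPrompt) {
  ask(cliPrompt, { temperature: 0 })
    .then((text) => console.log(text))
    .catch((err) => {
      console.error(err.message)
      process.exitCode = 1
    })
}
```

On recent Node versions the key can be loaded without `dotenv` by running `node --env-file=.env ai.js "your prompt"`. Passing a `system` option to `ask` reproduces the persona experiment from Step 5.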
Think About
Before studying, consider:
- ChatGPT.com is a website. The "intelligence" of ChatGPT does not live in the web page; it lives in OpenAI's inference datacenters. The website is a thin client over an API. What does this tell you about who captures the value in the AI economy? (Hint: model providers, infra companies, and the apps that find unique workflows.)
- Every prompt you send to ChatGPT travels: your browser → OpenAI's web server → their backend → their model API → their inference cluster → back to you. That's at minimum 5 hops. What does this say about latency, privacy, and reliability? Where would you cache, log, or rate-limit if you were running it?
By the End
After this lesson, you'll:
- ✅ Have an API key and know how to keep it out of source control
- ✅ Have written a 10-line LLM client in Node.js using just `fetch`
- ✅ Have made a real model API call and seen the response in your terminal
- ✅ Understand the four-layer architecture of every AI product
- ✅ Know what tokens, temperature, and system prompts are
- ✅ Have a 30-line CLI tool you'll use for the rest of your life
- ✅ Be ready for Lesson 12 — putting your own backend between the user and the model
The curtain is drawn. The wizard is just an HTTP endpoint. 🎭
Next lesson · 12
Frontend and Backend: The Full Picture
Map the browser, server, and database responsibilities before building a leaderboard system.