ABCsteps lesson path

AI Products Are API Systems

Deconstruct AI app architecture and make a first model API call while separating product experience from backend mechanics. Build one artifact, keep one review trail, and make the work easy to inspect later.

Lesson: 11; Module C
Time: 45 min; read, build, review
Access: public lesson; no account wall


Learning objective

Understand AI products as frontends, servers, providers, and costs.

Lab outcome

Trace an AI request from interface to provider and back.

Module milestone

Build a small full-stack leaderboard with persistent data.

Lesson proof workflow

Read, build, then review the evidence.

Step 1 · Read: Start with Model API calls before touching tools.
Step 2 · Build: Build toward: Trace an AI request from interface to provider and back.
Step 3 · Review: Review the evidence using Cost awareness.

Toolchain

AI products are ordinary systems wrapped around model APIs.

These are the practical surfaces used in this lesson. Learn the habit first, then connect it to the wider engineering ecosystem.

OpenAI · Model API

Understand provider calls, costs, and responses.

Node.js · Server boundary

Keep secrets and provider calls off the public client.

JSON · Request shape

Read the structured payload sent to the model.

Market vocabulary

Companies do not hire because a logo appears on a page. They screen for evidence that you can explain the work. AI product teams need engineers who understand the system around the model, not only the chat interface.

Model API calls · Secret boundaries · Cost awareness · AI request flow

Skill signal

Model API calls is the market word; OpenAI, Node.js, and JSON make that word visible through actual work.

Team language

AI product engineering teams may use different internal stacks, but they still look for Secret boundaries explained clearly.

Proof answer

Show the model request path, the server boundary, and where secrets, cost, and response handling live.

Team surfaces

AI product engineering · Backend platform · Applied ML tooling

Search terms to recognize

Model API calls and Secret boundaries are the words to recognize in job posts, project briefs, and technical interviews.

Portfolio proof

Show the model request path, the server boundary, and where secrets, cost, and response handling live. Explain how you used OpenAI, Node.js, and JSON in your own words instead of only listing a certificate.

Team surface

The same pattern shows up around AI product engineering, Backend platform, and Applied ML tooling even when each company uses a different internal stack.

Industry context

These team surfaces are learning context, not a promise of access. Platform and company logos here are ecosystem references only: no affiliation, endorsement, interview access, hiring promise, salary promise, or placement guarantee.

Proof of work

Leave one inspectable trail from this lesson.

The useful output is not a passive note. It is a small artifact another person can inspect: a working file, a command result, a commit, a screenshot, a README note, or a demo link.

Lesson lab: Trace an AI request from interface to provider and back.

Build

Produce the artifact

Complete the lab and keep the result visible: Trace an AI request from interface to provider and back.

Record

Save review evidence

Capture what changed, what broke, and how Model API calls became clearer through the work.

Explain

Write the vocabulary

Use your own words for Secret boundaries and Cost awareness; this is what makes the lesson inspectable later.

Skills companies recognize

Translate the lesson into inspectable work language.

This lesson turns one small lab into the language a learner can use in a README, demo note, or technical conversation. The point is not to collect logos; the point is to explain work clearly enough that another engineer can inspect it.

Where this skill appears

AI product teams need engineers who understand the system around the model, not only the chat interface.

AI product engineering · Backend platform · Applied ML tooling

README line

Name the artifact

Lab proof: Trace an AI request from interface to provider and back. Connect it to Model API calls so the result reads like work, not a passive note.

Review line

Explain the stack

Use OpenAI, Node.js, and JSON to explain Secret boundaries and what changed between the first attempt and the inspected result.

Conversation line

Answer with evidence

If a team asks about Cost awareness, use this proof line: Show the model request path, the server boundary, and where secrets, cost, and response handling live.

Proof translation

Skill signal

Model API calls is the market word. The lesson makes it visible through a small working artifact.

Proof artifact

The inspectable artifact is: Trace an AI request from interface to provider and back.

Interview answer

Use Secret boundaries and Cost awareness to explain what changed, what failed, and how you verified it.

Paid guidance

Read publicly. Upgrade when guidance will help you finish.

This lesson remains part of the public written syllabus. Paid help is online-only and human-led: video walkthroughs as they roll out, live class context, WhatsApp Q&A, and project review around the same work.

No account wall, automated checkout, or placement promise is introduced here. Enrollment stays human-led by WhatsApp or call, and the useful proof remains the learner's own artifact.

Public

Written lesson stays open

Read the prepare and review material for lesson 11 on the public site before buying anything.

Recorded

Recorded and live guidance clarify the work

Paid guidance can add founder-led video walkthroughs as they roll out and live online class context; the teaching explains the work, but does not replace the written lesson.

Human

Questions use real context

When stuck, useful guidance starts from the route, error, screenshot, repo fragment, and the lab artifact: Trace an AI request from interface to provider and back.

Stay public · Continue the syllabus: Use the next lesson when the written path is enough and no human guidance is needed.

Recorded · Job-Ready Track: Use recorded walkthroughs, study pack, WhatsApp Q&A, and final review when self-reading needs a guided layer.

Live · Live Cohort: Use cohort only when scheduled online classes, peer pressure, and live Q&A would change consistency.

Private · 1:1 Mentorship: Use mentorship only when a real project, career move, or technical decision needs private founder review.

Professional · Architecture Review: Use architecture review for a specific codebase, stack, vendor, or deployment decision; it is not a beginner class.

Institution · Workshop: Use workshops when a college, school, bootcamp, or team needs a shared AI engineering class.

Phase 1 · Briefing

Lesson briefing

Before You Study (5 mins)

Lesson focus: ChatGPT, Claude, Gemini, Perplexity, Cursor, GitHub Copilot — every AI product you've used in 2026 looks magical from the outside and is, on the inside, the same kind of system you've already been building. A frontend (the website or app), a backend (a server you control), and a model API (someone else's GPU farm running an LLM). Today we open the box. By the end, you will have written your own ten-line client that talks directly to a real LLM API and gets back a response — proving that the "magic" is mostly architecture.

What you should have ready:

Lessons 06–10 complete — you have a containerized, tunneled, JSON-aware, API-fluent base
An API key from one of: Google AI Studio (Gemini), Groq, OpenAI, or Anthropic (Gemini and Groq offer free tiers; OpenAI and Anthropic typically require paid credit)
Node.js installed (node --version should print something)
About 60 minutes
An honest curiosity about how the apps you use every day actually work

The Concept

A modern AI product has four layers, and once you can see them, you can never un-see them:

text

   ┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
   │   1. Frontend    │ ◀▶ │   2. Backend     │ ◀▶ │   3. Model API   │ ◀▶ │   4. The Model   │
   │   (chat UI)      │    │ (your server)    │    │ (provider's API) │    │ (the AI itself)  │
   └──────────────────┘    └──────────────────┘    └──────────────────┘    └──────────────────┘
        Browser /              Where YOUR              The HTTP                  The actual
        mobile app            business logic           contract                  LLM running
        (HTML / JSX)          + auth + history         the provider              on someone's
                              + rate limits            exposes                   GPUs
                              + prompt building

Layer 1 (Frontend) is what the user sees: the input box, the streaming response, the chat history sidebar. Layer 4 (the Model) is a very large neural network, typically billions of parameters or more, running on a cluster of GPUs in a datacenter. Everything magical about AI products lives in Layer 4. Everything controllable about AI products lives in Layers 2 and 3.

This separation is what makes the entire LLM-app industry possible. OpenAI, Anthropic, Google, and Meta train and operate the models (Layer 4). Every other company in the AI app space — Cursor, Perplexity, Notion AI, GitHub Copilot, the AI features in Photoshop — builds Layers 1, 2, and 3 and pays the model providers for inference at Layer 4. Mostly per-token: a few cents per million tokens for cheap models, a few dollars for the frontier ones.

The provider API contract is just an HTTP endpoint. The major LLM providers expose broadly similar shapes. The example below follows the OpenAI-style chat completions format, which Groq also accepts; Anthropic and Gemini use the same idea with different header and field names (Gemini, for instance, sends `contents` with `parts` and reports token counts under `usageMetadata`):

text

POST /v1/chat/completions    (OpenAI / Groq)
POST /v1/messages            (Anthropic)
POST /v1beta/models/...      (Google Gemini)

Headers:
  Authorization: Bearer YOUR_API_KEY    (OpenAI / Groq; Anthropic uses x-api-key, Gemini uses x-goog-api-key or ?key=)
  Content-Type: application/json

Body (JSON):
  {
    "model": "gpt-4",
    "messages": [
      { "role": "system",    "content": "You are a helpful assistant." },
      { "role": "user",      "content": "Explain Docker in one sentence." }
    ],
    "temperature": 0.7,
    "max_tokens": 200
  }

Response (JSON):
  {
    "id": "...",
    "choices": [{ "message": { "content": "Docker is a tool for..." }}],
    "usage": { "prompt_tokens": 12, "completion_tokens": 18 }
  }

That's the entire contract. Three fields you'll meet, under one name or another, in every API:

model: which model you want (cheap+fast vs slow+smart)
messages: the conversation history; you send the whole history every call (LLM APIs are stateless)
temperature: how random the model's output is (0 = nearly deterministic, 1+ = wild)

The crucial implication: LLM APIs are stateless. The server forgets you between requests. To have a conversation, your backend (Layer 2) keeps the message history and sends the whole array every time. ChatGPT's "memory" of the last thing you said is your application's memory, not the model's.
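
The statelessness described above can be sketched in a few lines. This is a minimal illustration, not a provider SDK: `createConversation`, `toRequestBody`, and the model name `example-model` are hypothetical names chosen for this example, and the message shape assumes the OpenAI-style `messages` array shown earlier.

```javascript
// Sketch: the backend (Layer 2) owns the conversation memory, not the model.
function createConversation(systemPrompt) {
  const messages = [{ role: 'system', content: systemPrompt }]
  return {
    addUser(content) { messages.push({ role: 'user', content }) },
    addAssistant(content) { messages.push({ role: 'assistant', content }) },
    // Build the body for the next call: the full history is sent every time.
    toRequestBody(model) {
      return { model, messages: messages.map((m) => ({ ...m })) }
    },
  }
}

const chat = createConversation('You are a helpful assistant.')
chat.addUser('Explain Docker in one sentence.')
chat.addAssistant('Docker packages an app with its dependencies into a container.')
chat.addUser('Now explain it to a five-year-old.')

// 'example-model' is a placeholder, not a real model name.
const nextBody = chat.toRequestBody('example-model')
console.log(nextBody.messages.length) // 4: system + user + assistant + user
```

The second user question only makes sense to the model because the first exchange travels with it in the same request.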

Two more concepts you'll use forever:

Tokens — how the API charges you. A token is roughly 3-4 characters of English text. "Hello world" is 2 tokens. A 1000-word response is ~1300 tokens. Costs are quoted per million tokens (input and output priced separately).
Streaming vs. non-streaming: non-streaming returns the full response after the model finishes generating; streaming returns each token as soon as it's produced (the typewriter effect in ChatGPT). Streaming does not make generation faster, but the first words arrive sooner, so use it when perceived latency matters.
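
To make streaming less abstract, here is a rough sketch of how a client might pull text out of a streamed response. It assumes the provider sends Server-Sent Events, one `data: {json}` line per event, in the Gemini response shape (`candidates[0].content.parts[0].text`); other providers use different field names. `parseSseBuffer` is a hypothetical helper, and the two chunks are simulated rather than read from the network.

```javascript
// Sketch: extract generated text from Server-Sent Events as chunks arrive.
function parseSseBuffer(buffer) {
  const lines = buffer.split('\n')
  const rest = lines.pop() // the last piece may be an incomplete line; keep it for the next chunk
  const pieces = []
  for (const line of lines) {
    const trimmed = line.trim()
    if (!trimmed.startsWith('data:')) continue // skip blank separator lines
    const payload = trimmed.slice('data:'.length).trim()
    if (payload === '' || payload === '[DONE]') continue
    const event = JSON.parse(payload)
    const text = event.candidates?.[0]?.content?.parts?.[0]?.text
    if (text) pieces.push(text)
  }
  return { text: pieces.join(''), rest }
}

// Simulated network chunks; the second event is split across them on purpose.
const chunk1 = 'data: {"candidates":[{"content":{"parts":[{"text":"Docker "}]}}]}\n\ndata: {"candid'
const chunk2 = 'ates":[{"content":{"parts":[{"text":"ships containers."}]}}]}\n\n'

let pending = ''
let streamed = ''
for (const chunk of [chunk1, chunk2]) {
  const { text, rest } = parseSseBuffer(pending + chunk)
  streamed += text // a real CLI would write this to the terminal immediately
  pending = rest
}
console.log(streamed)
```

Keeping the unfinished tail in `pending` matters because network chunks do not respect event boundaries.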

Once you understand "frontend → my server → provider API → model," every AI product becomes legible. Cursor is a VS Code-based editor (Layer 1) talking to Cursor's backend (Layer 2), which calls a provider API such as Anthropic's Claude API (Layer 3), which runs the model (Layer 4). Perplexity is a chat UI (Layer 1) talking to their backend (Layer 2), which calls a search API and an LLM API (Layer 3) running models (Layer 4). The architecture is universal.

Quick Concepts

Term	Simple Meaning
LLM	Large Language Model — the actual AI (GPT-4, Gemini, Claude, etc.)
API key	The secret string that authenticates your server to the model provider
Token	The unit the API charges you in — ~3-4 chars of English text
Prompt	The text you send to the model; same as `content` in a `messages` array
System prompt	A special first message that sets the model's role and constraints
Temperature	How random the model's output is (0 = nearly the same answer every time)
Streaming	Receiving the response token-by-token as it generates
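
The Token row can be turned into a back-of-the-envelope cost estimate. This sketch uses the rough 4-characters-per-token rule of thumb, which is only an approximation (real tokenizers differ by model), and the prices in `examplePrice` are HYPOTHETICAL numbers for illustration, not any provider's actual rates.

```javascript
// Rough heuristic: about 4 characters of English text per token.
function estimateTokens(text) {
  return Math.ceil(text.length / 4)
}

// Prices are quoted per million tokens, with input and output priced separately.
function estimateCostUsd(inputTokens, outputTokens, pricePerMillion) {
  const total = inputTokens * pricePerMillion.input + outputTokens * pricePerMillion.output
  return total / 1_000_000
}

// HYPOTHETICAL prices in USD per million tokens; check the provider's pricing page.
const examplePrice = { input: 0.1, output: 0.4 }

const promptTokens = estimateTokens('Explain idempotency in three sentences.')
// Estimate 1,000 calls, each with this prompt and roughly 300 output tokens.
const thousandCalls = estimateCostUsd(promptTokens * 1000, 300 * 1000, examplePrice)
console.log(`~${promptTokens} prompt tokens, ~$${thousandCalls.toFixed(3)} per 1,000 calls`)
```

For real numbers, read the token counts the API reports in its response instead of estimating from character length.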

What We Will Build

By the end of this lesson, you will have done these specific things:

Picked a provider and got an API key. I recommend Google AI Studio — Gemini has a free tier, no credit card required for starter usage. Save the key to your .env file (NEVER commit it):
bash
```
echo "GEMINI_API_KEY=your_actual_key_here" >> .env
echo ".env" >> .gitignore
```

Wrote your first ten-line client in Node.js using only fetch (no SDK):

javascript

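// Save as ai.js. Needs `npm install dotenv` and Node 18+ (global fetch).
// `import` and top-level await need ES modules: set "type": "module" in package.json,
// or name the file ai.mjs, unless your Node version detects ES modules automatically.
import 'dotenv/config' // loads .env into process.env

const apiKey = process.env.GEMINI_API_KEY
// Gemini accepts the key as a query parameter; most other providers expect a header.
const url = `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=${apiKey}`

// POST the prompt as JSON; Gemini wraps the text in contents -> parts.
const response = await fetch(url, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    contents: [{ parts: [{ text: 'Tell me one engineering joke about Docker.' }] }]
  })
})
const data = await response.json()
// The generated text sits inside the first candidate.
console.log(data.candidates[0].content.parts[0].text)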

Ran it with node ai.js and watched a real LLM response print to your terminal. Noticed the latency (a real measurement, not a vibe).
Inspected the cost: look at usageMetadata in the Gemini response (usage in OpenAI-style responses) or at your provider's dashboard. A single call costs a small fraction of a paisa; running 1,000 calls might cost ₹2-5, depending on the model and response length.
Tweaked the prompt three times to feel how the same model produces different output for different prompts:
- Same prompt, ran 3 times — observed variation.
- Same prompt with temperature: 0 — observed near-identical output.
- Added a system prompt ("You are a senior Linux kernel developer who hates Docker. Answer in 2 sentences.") — observed the persona change.
Made it a 30-line CLI tool that takes a prompt as a command-line argument: node ai.js "explain idempotency in 3 sentences". This becomes a Swiss-army knife you'll use for the rest of your life.
Tried streaming (advanced — most providers offer a streaming endpoint). Watched the typewriter effect appear in your terminal.

Step 6 (the CLI tool) is the lasting takeaway. Once you have a 30-line CLI that talks to an LLM, you have what most AI products fundamentally are, minus the UI polish.
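
The three prompt experiments in step 5 differ only in the JSON body. As a sketch, the helper below builds that body for Gemini's generateContent endpoint. `buildGeminiBody` is a hypothetical name, and the `system_instruction` field follows Google's REST examples as best I know them; treat the exact field names as an assumption and confirm them against the current API reference.

```javascript
// Sketch: build the request body for the three experiments (plain, temperature 0, persona).
function buildGeminiBody({ prompt, system, temperature }) {
  const body = { contents: [{ role: 'user', parts: [{ text: prompt }] }] }
  if (system) {
    // Assumed field name: Gemini takes the system prompt outside `contents`.
    body.system_instruction = { parts: [{ text: system }] }
  }
  if (temperature !== undefined) {
    body.generationConfig = { temperature }
  }
  return body
}

const question = 'Tell me one engineering joke about Docker.'

const plainBody = buildGeminiBody({ prompt: question })
const steadyBody = buildGeminiBody({ prompt: question, temperature: 0 })
const personaBody = buildGeminiBody({
  prompt: question,
  system: 'You are a senior Linux kernel developer who hates Docker. Answer in 2 sentences.',
})

console.log(JSON.stringify(personaBody, null, 2))
```

Each body would go into `JSON.stringify(...)` as the `body` of the same `fetch` call; nothing else in the client changes.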

Think About

Before studying, consider:

ChatGPT is a website. The "intelligence" of ChatGPT lives nowhere in the pages served from chatgpt.com; it lives on OpenAI's inference servers. The website is a thin client over an API. What does this tell you about who the value goes to in the AI economy? (Hint: model providers, infra companies, and the apps that find unique workflows.)
Every prompt you send to ChatGPT travels: your browser → OpenAI's web server → their backend → their model API → their inference cluster → back to you. That's five systems and at least four network hops each way. What does this say about latency, privacy, and reliability? Where would you cache, log, or rate-limit if you were running it?

By the End

After this lesson, you'll:

✅ Have an API key and know how to keep it out of source control
✅ Have written a 10-line LLM client in Node.js using just fetch
✅ Have made a real model API call and seen the response in your terminal
✅ Understand the four-layer architecture of every AI product
✅ Know what tokens, temperature, and system prompts are
✅ Have a 30-line CLI tool you'll use for the rest of your life
✅ Be ready for Lesson 12 — putting your own backend between the user and the model

The curtain is pulled back. The wizard is just an HTTP endpoint. 🎭

Phase 2 · Debrief

Review and recall

What We Learned

Every AI product has four layers: Frontend (UI) → Backend (your server) → Model API (provider's HTTP contract) → Model (the LLM itself). The "magic" lives in Layer 4; the "control" lives in Layers 2 and 3.
The model provider API is just HTTP. A POST request with JSON, an API key in a header (or, for Gemini, a query parameter), and a JSON response. You can call it with curl. You can call it with 10 lines of fetch.
LLM APIs are stateless. Every request is independent. To have a conversation, your application keeps the history and sends the whole messages array every time. The model has no memory of you between calls.
Tokens are the unit of cost. A token is ~3-4 characters of English. Pricing is quoted per million tokens; input and output are priced separately; cheap models (Gemini Flash, GPT-4o-mini) are 10-100× cheaper than frontier models.
System prompts shape behavior. A first message with role: "system" (or its provider equivalent) sets the persona, constraints, and tone for every following user message in the conversation.
API keys are secrets. They go in .env, .env goes in .gitignore, and the moment one ends up on GitHub by accident, you rotate it.
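
One practical consequence of the last point: this lesson's Gemini URL carries the key as a `?key=` query parameter, so printing or logging the URL leaks the secret. A small sketch of a guard, using Node's built-in `URL` class; `redactKey` is a hypothetical helper name and the key shown is a fake placeholder.

```javascript
// Sketch: hide the API key before a URL is logged or pasted into a bug report.
function redactKey(rawUrl) {
  const url = new URL(rawUrl)
  if (url.searchParams.has('key')) {
    url.searchParams.set('key', 'REDACTED')
  }
  return url.toString()
}

// 'not-a-real-key' is a placeholder, never a real credential.
const unsafeUrl =
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=not-a-real-key'

const safeUrl = redactKey(unsafeUrl)
console.log(safeUrl)
```

Sending the key in a request header instead of the URL avoids the problem at the source, which is one reason most providers use headers.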

Commands We Used

bash

# Get an API key from one of the major providers (free tiers exist on some)
# - https://aistudio.google.com/apikey       (Gemini — recommended for this lesson)
# - https://console.groq.com/keys             (Groq — fast Llama models)
# - https://platform.openai.com/api-keys      (OpenAI — GPT models)
# - https://console.anthropic.com             (Anthropic — Claude models)

# Save the key to .env (NEVER commit this file)
echo "GEMINI_API_KEY=your_actual_key_here" >> .env
echo ".env" >> .gitignore

# Install the one dependency we need (loads .env into process.env)
npm install dotenv

# Run the client
node ai.js "Tell me what idempotency means in three sentences."

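# Test the same call from the terminal with curl (no Node needed).
# The shell does not read .env by itself, so load it first (bash/zsh).
source .env
# jq only pretty-prints the JSON; drop "| jq ." if it is not installed.
curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{ "contents": [{ "parts": [{ "text": "Hello!" }]}] }' | jq .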

The 30-line CLI tool — ai.js:

javascript

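// ai.js: a small CLI that sends one prompt to Gemini and prints the reply.
// Needs Node 18+ (global fetch) and ES modules: "type": "module" in package.json, or rename to ai.mjs.
import 'dotenv/config' // loads .env into process.env

// Fail fast if the key is missing.
const apiKey = process.env.GEMINI_API_KEY
if (!apiKey) {
  console.error('Set GEMINI_API_KEY in .env first.')
  process.exit(1)
}

// Everything after the script name is the prompt.
const prompt = process.argv.slice(2).join(' ')
if (!prompt) {
  console.error('Usage: node ai.js "your question here"')
  process.exit(1)
}

const url = `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=${apiKey}`

const response = await fetch(url, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    contents: [{
      parts: [{
        // The persona is prepended to the user text here rather than sent as a separate system instruction.
        text: `You are a senior software engineer. Answer concisely (3-4 sentences).\n\nQuestion: ${prompt}`
      }]
    }],
    // Low temperature for steadier answers; cap the response length.
    generationConfig: { temperature: 0.4, maxOutputTokens: 300 }
  })
})

// Surface HTTP errors (bad key, rate limit) instead of failing later on a missing field.
if (!response.ok) {
  console.error(`API error ${response.status}:`, await response.text())
  process.exit(1)
}

const data = await response.json()
// Optional chaining guards against an empty or blocked response.
const text = data.candidates?.[0]?.content?.parts?.[0]?.text || '(no response)'
console.log('\n' + text + '\n')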

The overall fetch shape is the same for OpenAI, Anthropic, and Groq: a POST with a JSON body. What differs by provider is the URL, the auth header, and the body schema. Provider SDKs (openai, @anthropic-ai/sdk, @google/generative-ai) wrap this for ergonomics; under the hood, it's the same HTTP call.
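
To show how small those per-provider differences are, here is a sketch that builds the URL, headers, and body for three providers from the same inputs. `buildProviderRequest` is a hypothetical helper, the `max_tokens` value is arbitrary, and the endpoint and header details reflect the public documentation as I understand it; verify them against each provider's current API reference before relying on them.

```javascript
// Sketch: same prompt, three providers. Only URL, auth header, and body schema differ.
function buildProviderRequest(provider, { apiKey, model, prompt }) {
  switch (provider) {
    case 'openai':
      return {
        url: 'https://api.openai.com/v1/chat/completions',
        headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
        body: { model, messages: [{ role: 'user', content: prompt }] },
      }
    case 'anthropic':
      return {
        url: 'https://api.anthropic.com/v1/messages',
        headers: {
          'Content-Type': 'application/json',
          'x-api-key': apiKey,
          'anthropic-version': '2023-06-01',
        },
        // Anthropic requires max_tokens; 300 is an arbitrary choice here.
        body: { model, max_tokens: 300, messages: [{ role: 'user', content: prompt }] },
      }
    case 'gemini':
      return {
        url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`,
        headers: { 'Content-Type': 'application/json', 'x-goog-api-key': apiKey },
        body: { contents: [{ parts: [{ text: prompt }] }] },
      }
    default:
      throw new Error(`Unknown provider: ${provider}`)
  }
}

// 'FAKE_KEY' and 'example-model' are placeholders for illustration.
for (const provider of ['openai', 'anthropic', 'gemini']) {
  const req = buildProviderRequest(provider, { apiKey: 'FAKE_KEY', model: 'example-model', prompt: 'Hello!' })
  console.log(provider, req.url, Object.keys(req.headers).join(', '))
}
```

Any of the returned objects can be passed to the same call: `fetch(req.url, { method: 'POST', headers: req.headers, body: JSON.stringify(req.body) })`.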

Why It Matters

The single most valuable shift this lesson produces is demystification. Once you have made an actual LLM API call from your own laptop and watched the response come back, "AI" stops being a word you nod at and becomes an ingredient you compose with — like a database, a CDN, or a payment processor. That shift is what unlocks Module D, where you'll add real AI features to a real product.

The deeper professional insight: the model is the commodity; the system around the model is the product. Anyone with $10 can call GPT-5 or Gemini Pro from curl. The companies that win in the AI app market do so because of:

Workflow integration — Cursor isn't a chat UI; it's a code editor with the model deeply wired into the file system and the diff view.
Domain context — Perplexity beats raw GPT for search queries because it injects fresh search results into the context window.
Production-grade reliability — handling rate limits, fallbacks across providers, streaming, caching, observability, billing.
Trust and governance — enterprise customers care which model, which datacenter, what data retention, which compliance certifications.

Layer 4 is rented. Layers 1, 2, and 3 are where engineering judgment compounds.

What this lesson glossed over but you'll meet:

Context-window limits: every model has a maximum number of tokens it can process in one request, prompt and response combined. Some frontier models in 2026 advertise 1M+ tokens; budget models are typically 8K-128K. Choosing the right window for the task is part of the craft.
Function calling / tool use — modern APIs let the model output a structured "I want to call this function with these args" response, which your backend executes and feeds back. This is how AI agents work.
Embeddings + RAG (Retrieval-Augmented Generation) — for chatting with your own documents, you embed text into vectors, store them in a vector DB, retrieve the relevant chunks at query time, and stuff them in the prompt. We'll touch this in Module D.
Vision and multi-modal — most flagship models in 2026 accept images, audio, and video alongside text. Same API, different parts types.
Cost optimization — caching deterministic prompts, choosing the smallest model that works, batching requests, streaming for latency rather than throughput.
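
Context limits and cost both push toward sending less history. One simple approach, sketched below, keeps the system message and as many of the most recent messages as fit a token budget. This is an illustrative strategy, not how any particular product does it: `trimHistory` is a hypothetical helper and the 4-characters-per-token estimate is a rough stand-in for a real tokenizer.

```javascript
// Rough stand-in for a tokenizer: about 4 characters per token.
function estimateTokens(text) {
  return Math.ceil(text.length / 4)
}

// Keep the system message plus the newest messages that fit within maxTokens.
function trimHistory(messages, maxTokens) {
  const hasSystem = messages[0]?.role === 'system'
  const system = hasSystem ? messages[0] : null
  const rest = hasSystem ? messages.slice(1) : messages
  let budget = maxTokens - (system ? estimateTokens(system.content) : 0)
  const kept = []
  // Walk backwards so the most recent turns are kept first.
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content)
    if (cost > budget) break
    kept.unshift(rest[i])
    budget -= cost
  }
  return system ? [system, ...kept] : kept
}

const history = [
  { role: 'system', content: 'Be brief.' },
  { role: 'user', content: 'a'.repeat(40) },
  { role: 'assistant', content: 'b'.repeat(40) },
  { role: 'user', content: 'c'.repeat(40) },
]

const trimmed = trimHistory(history, 25)
console.log(trimmed.map((m) => m.role).join(' > '))
```

Dropping old turns loses information, so real systems often summarize older history instead of discarding it outright.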

Want the full architectural story including how AI products handle authentication, rate limiting, and the API economy? Read What is an API and How Do APIs Actually Work. Today's lesson made one specific kind of API call; the article zooms out to the protocol every modern API speaks.

Checklist

I got an API key from a real provider and saved it to .env
.env is in my .gitignore (this is non-negotiable)
I wrote ai.js and ran it with node ai.js
I got a real LLM response printed in my terminal
I tweaked the prompt and observed different output
I added a system prompt (persona) and observed behavior change
I saw the token usage block in the response (usageMetadata for Gemini) and understand tokens are the cost unit
My CLI accepts a prompt as a command-line argument

Quick Quiz

Q1: What lives in Layer 4 (the Model) of an AI product?

A) The HTML and CSS of the chat UI
B) The actual LLM running on the provider's GPUs ✓
C) Your business logic and authentication

Q2: Why do LLM APIs require you to send the entire conversation history every call?

A) To make sure you read the docs
B) Because LLM APIs are stateless — the server forgets you between requests ✓
C) Because the model can only think about the latest message anyway

Q3: What is a token, in LLM API pricing?

A) A login session credential
B) Roughly 3-4 characters of English text — the unit the API charges you in ✓
C) A piece of jewelry given on completion

Q4: Where does the "magic" of AI products actually live?

A) The frontend chat interface
B) Your backend's business logic
C) The trained model running on the provider's GPUs ✓

Q5: What's a "system prompt"?

A) An automatic prompt the OS injects
B) A first message that sets the model's persona, constraints, and tone for the conversation ✓
C) A backup prompt sent if the user's prompt fails

Bonus Challenge

Extend ai.js to maintain conversation history across calls. Save the full messages array to a JSON file after each response; load it at the start of every call so subsequent prompts have memory of previous ones. You will have built the absolute simplest version of ChatGPT in about 50 lines of Node.js — no UI, no auth, no streaming, but conceptually the same shape. From here, every AI feature you'll add in Module D is composition on top of this primitive.

Next Lesson

Lesson 12: Frontend + Backend — The Full Picture

You'll learn: Today you talked to an LLM API from your terminal. Tomorrow we put a backend of your own between the user and the model API — so the user never touches your API key, you can rate-limit them, and the system starts looking like a real product. The shape of every full-stack web application, made concrete.

The wizard is now visible. Tomorrow you start building your own. ⚡

Next lesson · 12

Frontend and Backend: The Full Picture

Map the browser, server, and database responsibilities before building a leaderboard system.