Post 1 was Python variables.
Post 100 is GPT-4.
You've come from writing your first for loop to understanding transformer architectures, building neural networks from scratch, fine-tuning LLMs with LoRA, building RAG pipelines, and creating chatbots with memory.
This final post puts it all together. The OpenAI API is how most people actually ship AI products. Chat completions, function calling, streaming, vision, embeddings. Everything you need to build something real.
Let's finish strong.
What You'll Learn Here
- API setup and authentication
- Chat completions: the core pattern
- System prompts: controlling model behavior
- Function calling: giving LLMs tools
- Streaming: responses that appear word by word
- Vision: analyzing images with GPT-4V
- Embeddings via API: fast, high quality
- Token counting and cost management
- Rate limits and error handling
- A complete project: an AI assistant with tools
Setup
pip install openai tiktoken
import openai
import os
# Set your API key
# Option 1: environment variable (recommended)
# export OPENAI_API_KEY='sk-...'
# Option 2: set directly (never commit this to git)
# openai.api_key = 'sk-...'
client = openai.OpenAI() # reads OPENAI_API_KEY from environment
# Test the connection
models = client.models.list()
print("Connected to OpenAI API")
print(f"Available models (sample): {[m.id for m in list(models)[:5]]}")
Chat Completions: The Core Pattern
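Every chat interaction goes through the same API call, client.chat.completions.create, whichever model you pick. Messages are a list of role-content pairs with the roles system, user, and assistant (plus tool, which appears in the function calling section).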
# Simplest possible call
response = client.chat.completions.create(
    model="gpt-4o-mini",  # fast and cheap for most tasks
    messages=[
        {"role": "user", "content": "What is machine learning in one sentence?"}
    ]
)
print(response.choices[0].message.content)
print(f"\nTokens used: {response.usage.total_tokens}")
print(f"Model: {response.model}")
Output:
Machine learning is a field of AI where computers learn patterns from data to make predictions or decisions without being explicitly programmed for each task.
Tokens used: 52
Model: gpt-4o-mini
# With system prompt and multiple turns
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "You are a senior ML engineer who explains concepts clearly and concisely. Use analogies when helpful. Never use jargon without explaining it."
        },
        {
            "role": "user",
            "content": "Explain overfitting."
        }
    ],
    temperature=0.7,        # randomness (0=most predictable, 2=very random)
    max_tokens=300,         # cap the response length
    top_p=0.95,             # nucleus sampling
    frequency_penalty=0.1,  # penalize repeated tokens
    presence_penalty=0.1,   # penalize already-mentioned topics
)
print(response.choices[0].message.content)
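The Chat Completions API keeps no state between calls, so a multi-turn conversation means resending the whole message list every time. Here is a minimal sketch of that bookkeeping with no network call. `Conversation` and its `add` method are hypothetical helpers for illustration, not part of the openai package.

```python
from typing import Dict, List

VALID_ROLES = {"system", "user", "assistant"}


class Conversation:
    """Keeps the role/content list in the shape the chat endpoint expects."""

    def __init__(self, system_prompt: str) -> None:
        self.messages: List[Dict[str, str]] = [
            {"role": "system", "content": system_prompt}
        ]

    def add(self, role: str, content: str) -> None:
        if role not in VALID_ROLES:
            raise ValueError(f"Unknown role: {role}")
        self.messages.append({"role": role, "content": content})


convo = Conversation("You are a concise ML tutor.")
convo.add("user", "Explain overfitting.")
# After a real API call you would store the reply so the next request sees it:
# convo.add("assistant", response.choices[0].message.content)
convo.add("assistant", "Overfitting is memorizing the training data.")
convo.add("user", "How do I detect it?")

print([m["role"] for m in convo.messages])
```

Passing `convo.messages` as the `messages` argument on the next call is what gives the model its "memory" of earlier turns.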
System Prompts: Controlling Model Behavior
The system prompt is the single most powerful tool for shaping GPT-4's behavior.
system_prompts = {
    "concise_explainer": """
You explain technical concepts in 3 sentences or less.
Always use a concrete real-world example.
Never use bullet points.
""",
    "code_reviewer": """
You are a senior Python engineer doing code review.
Point out bugs, style issues, and performance problems.
Format your response as:
BUGS: (list any bugs)
STYLE: (list style issues)
PERFORMANCE: (list performance concerns)
SUGGESTIONS: (overall recommendation)
""",
    "socratic_tutor": """
You are a Socratic tutor. Never give direct answers.
Instead, guide the student with questions that help them discover the answer themselves.
When they get something right, affirm it and ask a deeper follow-up question.
""",
    "strict_json": """
You always respond in valid JSON format only.
No markdown. No explanation. Just raw JSON.
Never include anything outside the JSON structure.
"""
}
# Test the code reviewer persona
code_to_review = """
def get_user_data(user_ids):
    results = []
    for id in user_ids:
        data = database.query(f"SELECT * FROM users WHERE id = {id}")
        results.append(data)
    return results
"""
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompts["code_reviewer"]},
        {"role": "user", "content": f"Review this code:\n```python{code_to_review}```"}
    ]
)
print(response.choices[0].message.content)
Structured Output: JSON Mode
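When you need JSON you can parse reliably, set response_format to {"type": "json_object"}. JSON mode makes the model emit syntactically valid JSON (as long as the reply is not cut off by a token limit), but it does not enforce any particular set of keys, and the API expects the word "JSON" to appear somewhere in your messages.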
import json
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "Extract structured information from the text. Return JSON only."
        },
        {
            "role": "user",
            "content": """
Extract the following from this text:
- person_name
- job_title
- company
- key_skills (list)
Text: "Sarah Chen is a Senior ML Engineer at Anthropic. She specializes in
transformer architectures, reinforcement learning from human feedback, and
large-scale distributed training."
"""
        }
    ],
    response_format={"type": "json_object"}  # forces JSON output
)
data = json.loads(response.choices[0].message.content)
print(json.dumps(data, indent=2))
Output:
{
"person_name": "Sarah Chen",
"job_title": "Senior ML Engineer",
"company": "Anthropic",
"key_skills": [
"transformer architectures",
"reinforcement learning from human feedback",
"large-scale distributed training"
]
}
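Because JSON mode does not check which keys come back, it is worth validating the parsed result before using it. The sketch below shows one way to do that with the standard library only. `parse_model_json` is a hypothetical helper, and the raw string stands in for `response.choices[0].message.content`.

```python
import json
from typing import Any, Dict, Iterable


def parse_model_json(raw: str, required_keys: Iterable[str]) -> Dict[str, Any]:
    """Parse a model reply as JSON and check that the expected keys exist."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"Missing keys: {missing}")
    return data


# Stand-in for a real model reply
raw_reply = '{"person_name": "Sarah Chen", "job_title": "Senior ML Engineer"}'
parsed = parse_model_json(raw_reply, ["person_name", "job_title"])
print(parsed["person_name"])
```

Raising a `ValueError` gives the calling code a single exception type to catch, for example to retry the request with a stricter prompt.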
Function Calling: Giving GPT-4 Tools
Function calling lets the model request tools (functions) that you define. The model decides when to call a function and with what arguments. You execute it and send results back.
This is how AI agents work.
import json
# Define tools the model can use
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'London' or 'Tokyo'"
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature units"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_documents",
            "description": "Search the company knowledge base for relevant documents",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    },
                    "max_results": {
                        "type": "integer",
                        "description": "Maximum number of results to return",
                        "default": 3
                    }
                },
                "required": ["query"]
            }
        }
    }
]
# Mock function implementations
def get_weather(city: str, units: str = "celsius") -> dict:
    # In production: call a real weather API
    return {
        "city": city,
        "temperature": 18,
        "units": units,
        "condition": "partly cloudy",
        "humidity": "65%"
    }

def search_documents(query: str, max_results: int = 3) -> list:
    # In production: call your vector database
    return [
        {"title": f"Document about {query}", "snippet": f"Relevant content for: {query}", "score": 0.92},
        {"title": f"Guide to {query}", "snippet": f"Comprehensive overview of {query}", "score": 0.87},
    ][:max_results]

# Dispatch function calls
def execute_function(name: str, arguments: dict):
    if name == "get_weather":
        return get_weather(**arguments)
    elif name == "search_documents":
        return search_documents(**arguments)
    else:
        return {"error": f"Unknown function: {name}"}
# Full function calling loop
def agent_chat(user_message: str) -> str:
    messages = [
        {"role": "system", "content": "You are a helpful assistant with access to weather data and a document search tool. Use tools when needed."},
        {"role": "user", "content": user_message}
    ]
    while True:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=tools,
            tool_choice="auto"  # model decides when to use tools
        )
        message = response.choices[0].message
        # If no tool calls: we have the final answer
        if not message.tool_calls:
            return message.content
        # Process tool calls
        messages.append(message)  # add assistant message with tool_calls
        for tool_call in message.tool_calls:
            function_name = tool_call.function.name
            arguments = json.loads(tool_call.function.arguments)
            print(f" [Tool call] {function_name}({arguments})")
            result = execute_function(function_name, arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result)
            })
# Test it
queries = [
"What's the weather like in Tokyo right now?",
"Search for documents about machine learning best practices.",
"What's the weather in Paris and London? Compare them.",
]
for query in queries:
    print(f"\nUser: {query}")
    answer = agent_chat(query)
    print(f"Bot: {answer}")
Output:
User: What's the weather like in Tokyo right now?
[Tool call] get_weather({'city': 'Tokyo', 'units': 'celsius'})
Bot: The current weather in Tokyo is 18°C and partly cloudy, with humidity at 65%.
User: Search for documents about machine learning best practices.
[Tool call] search_documents({'query': 'machine learning best practices', 'max_results': 3})
Bot: I found 2 relevant documents about machine learning best practices...
User: What's the weather in Paris and London? Compare them.
[Tool call] get_weather({'city': 'Paris', 'units': 'celsius'})
[Tool call] get_weather({'city': 'London', 'units': 'celsius'})
Bot: Both Paris and London currently show 18°C with partly cloudy conditions...
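The model writes the tool arguments as a JSON string, so they can be malformed or name parameters your function does not accept. Here is a minimal sketch of a more defensive dispatcher that always returns a JSON string the model can read. `TOOL_REGISTRY` and `safe_execute` are hypothetical names, and `get_weather` is redefined only so the block runs on its own.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str, units: str = "celsius") -> dict:
    # Stand-in for the mock above so this block is self-contained
    return {"city": city, "temperature": 18, "units": units}


# Map tool names to callables instead of an if/elif chain
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def safe_execute(name: str, raw_arguments: str) -> str:
    """Run a tool and always return a JSON string, even on failure."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"Unknown function: {name}"})
    try:
        arguments = json.loads(raw_arguments)
        return json.dumps(func(**arguments))
    except (json.JSONDecodeError, TypeError) as exc:
        # Bad JSON, or wrong/missing parameters: report it back to the model
        return json.dumps({"error": f"Bad arguments for {name}: {exc}"})


print(safe_execute("get_weather", '{"city": "Tokyo"}'))
print(safe_execute("get_weather", '{"town": "Tokyo"}'))
print(safe_execute("get_stock_price", "{}"))
```

Returning the error as the tool result, rather than raising, lets the model see what went wrong and try the call again with corrected arguments.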
Streaming: Word-by-Word Responses
Instead of waiting for the full response, stream it token by token. Makes the UI feel much faster.
import sys
def stream_response(user_message: str, system: str = "You are a helpful assistant."):
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user_message}
        ],
        stream=True  # enable streaming
    )
    full_response = ""
    print("Bot: ", end="", flush=True)
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            token = chunk.choices[0].delta.content
            print(token, end="", flush=True)
            full_response += token
    print()  # newline at end
    return full_response
response = stream_response("Explain gradient descent in 3 sentences.")
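Not every streamed chunk carries text: the last chunk usually has no content, and a chunk can arrive with an empty choices list (for example when usage reporting is turned on for the stream). The sketch below shows the accumulation logic against fake chunks, so it runs without an API key. `make_chunk` and `collect_stream` are hypothetical helpers that only mimic the attribute path used in the loop above.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def make_chunk(text: Optional[str]) -> SimpleNamespace:
    """Build a fake chunk exposing chunk.choices[0].delta.content."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_stream(chunks: Iterable) -> str:
    """Join streamed deltas, skipping chunks that carry no text."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # chunk without any choices
            continue
        token = chunk.choices[0].delta.content
        if token:
            parts.append(token)
    return "".join(parts)


fake_stream = [
    make_chunk("Gradient "),
    make_chunk("descent"),
    make_chunk(None),             # closing chunk with no content
    SimpleNamespace(choices=[]),  # chunk with an empty choices list
]
print(collect_stream(fake_stream))
```

Collecting parts in a list and joining once at the end avoids repeated string concatenation on long responses.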
Vision: Analyze Images With GPT-4V
import base64
from pathlib import Path
def encode_image(image_path: str) -> str:
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def analyze_image(image_path: str, question: str) -> str:
    base64_image = encode_image(image_path)
    response = client.chat.completions.create(
        model="gpt-4o",  # needs a vision-capable model such as gpt-4o
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            # the data URL below assumes a JPEG file
                            "url": f"data:image/jpeg;base64,{base64_image}",
                            "detail": "high"  # or "low" for faster/cheaper
                        }
                    },
                    {
                        "type": "text",
                        "text": question
                    }
                ]
            }
        ],
        max_tokens=300
    )
    return response.choices[0].message.content

# Also works with image URLs directly
def analyze_image_url(url: str, question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": url}},
                    {"type": "text", "text": question}
                ]
            }
        ]
    )
    return response.choices[0].message.content
# Example
# result = analyze_image_url(
# "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Eiffel_Tower.jpg/640px-Eiffel_Tower.jpg",
# "What is in this image? Describe it in detail."
# )
print("Vision API ready - pass an image path or URL with your question")
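The base64 path above hard-codes image/jpeg in the data URL. Here is a small sketch of building the data URL with the MIME type guessed from the file name instead. `to_data_url` is a hypothetical helper, and the demo writes placeholder bytes rather than a real image.

```python
import base64
import mimetypes
import os
import tempfile


def to_data_url(image_path: str) -> str:
    """Encode a local image as a data URL, guessing the MIME type from the name."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {image_path}")
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Demo with a throwaway file; the bytes are a placeholder, not a real image
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chart.png")
    with open(path, "wb") as f:
        f.write(b"fake-png-bytes")
    url = to_data_url(path)

print(url.split(",")[0])
```

The returned string can be used as the `url` value inside the `image_url` content part.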
Embeddings via API
# High-quality embeddings from OpenAI
def get_embedding(text: str, model: str = "text-embedding-3-small") -> list:
    response = client.embeddings.create(
        input=text,
        model=model
    )
    return response.data[0].embedding

def get_embeddings_batch(texts: list, model: str = "text-embedding-3-small") -> list:
    response = client.embeddings.create(
        input=texts,
        model=model
    )
    return [item.embedding for item in response.data]

# Available embedding models
embedding_models = {
    "text-embedding-3-small": {"dim": 1536, "cost": "$0.02/1M tokens", "quality": "good"},
    "text-embedding-3-large": {"dim": 3072, "cost": "$0.13/1M tokens", "quality": "best"},
    "text-embedding-ada-002": {"dim": 1536, "cost": "$0.10/1M tokens", "quality": "older"},
}
print("OpenAI Embedding Models:")
for model, info in embedding_models.items():
    print(f" {model}: dim={info['dim']}, cost={info['cost']}, quality={info['quality']}")
# Example usage
# embedding = get_embedding("Machine learning is fascinating.")
# print(f"Embedding dimension: {len(embedding)}")
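Embeddings are usually compared with cosine similarity. The sketch below ranks documents against a query in pure Python. The 3-dimensional vectors are toy values standing in for real embeddings, and `cosine_similarity` and `rank_by_similarity` are illustrative helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1 = same direction)."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float], doc_vecs: List[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]
ranking = rank_by_similarity(query, docs)
print([index for index, _ in ranking])  # [1, 0, 2]
```

For more than a few thousand documents, a vector database or a NumPy matrix product will be much faster than this loop.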
Token Counting and Cost Management
import tiktoken
def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

def count_message_tokens(messages: list, model: str = "gpt-4o-mini") -> int:
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    tokens_per_message = 3  # every message has role + content overhead
    tokens_per_name = 1
    total = 0
    for message in messages:
        total += tokens_per_message
        for key, value in message.items():
            total += len(encoding.encode(str(value)))
            if key == "name":
                total += tokens_per_name
    total += 3  # every reply primed with <|start|>assistant<|message|>
    return total
# Pricing (as of 2024, check openai.com/pricing for current rates)
pricing = {
    "gpt-4o": {"input": 5.00, "output": 15.00},  # per 1M tokens
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},
    "text-embedding-3-small": {"input": 0.02, "output": 0},
}

def estimate_cost(n_input_tokens: int, n_output_tokens: int, model: str) -> float:
    if model not in pricing:
        return 0
    p = pricing[model]
    cost_in = (n_input_tokens / 1_000_000) * p["input"]
    cost_out = (n_output_tokens / 1_000_000) * p["output"]
    return cost_in + cost_out
# Example: estimate cost before making a call
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain transformer architecture in detail."}
]
model = "gpt-4o-mini"
input_tokens = count_message_tokens(messages, model)
estimated_output = 300 # estimate based on max_tokens
cost = estimate_cost(input_tokens, estimated_output, model)
print(f"Input tokens: {input_tokens}")
print(f"Estimated output: {estimated_output}")
print(f"Estimated cost: ${cost:.6f}")
# Track actual usage across calls
class UsageTracker:
    def __init__(self):
        self.total_input_tokens = 0
        self.total_output_tokens = 0
        self.total_calls = 0
        self.model_usage = {}

    def track(self, response, model: str):
        usage = response.usage
        self.total_input_tokens += usage.prompt_tokens
        self.total_output_tokens += usage.completion_tokens
        self.total_calls += 1
        if model not in self.model_usage:
            self.model_usage[model] = {'input': 0, 'output': 0, 'calls': 0}
        self.model_usage[model]['input'] += usage.prompt_tokens
        self.model_usage[model]['output'] += usage.completion_tokens
        self.model_usage[model]['calls'] += 1

    def report(self):
        print(f"Total API calls: {self.total_calls}")
        print(f"Total input tokens: {self.total_input_tokens:,}")
        print(f"Total output tokens: {self.total_output_tokens:,}")
        # Models missing from the pricing table contribute 0 to this total
        total_cost = sum(
            estimate_cost(info['input'], info['output'], model)
            for model, info in self.model_usage.items()
        )
        print(f"Estimated cost: ${total_cost:.4f}")
tracker = UsageTracker()
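Counting tokens is most useful when you act on the count. Here is a rough sketch of trimming a conversation to a token budget, keeping the system message and the most recent turns. `trim_to_budget` is a hypothetical helper, and `rough_token_count` assumes about 4 characters per token, which is only a rule of thumb for English text; a tiktoken-based counter can be passed in as `count` for real numbers.

```python
from typing import Callable, Dict, List


def rough_token_count(text: str) -> int:
    # Assumption: roughly 4 characters per token for English text
    return max(1, len(text) // 4)


def trim_to_budget(
    messages: List[Dict[str, str]],
    budget: int,
    count: Callable[[str], int] = rough_token_count,
) -> List[Dict[str, str]]:
    """Keep system messages plus the newest messages that fit in the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(count(m["content"]) for m in system)
    kept = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = count(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + list(reversed(kept))


history = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "a" * 400},       # ~100 tokens, oldest
    {"role": "assistant", "content": "b" * 40},   # ~10 tokens
    {"role": "user", "content": "c" * 40},        # ~10 tokens, newest
]
trimmed = trim_to_budget(history, budget=30)
print([m["content"][:1] for m in trimmed])  # ['Y', 'b', 'c']
```

This version ignores the per-message overhead that the counter above adds, so leave some headroom in the budget.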
Error Handling and Rate Limits
import time
import random
from openai import RateLimitError, APIError, APIConnectionError
def robust_api_call(messages: list, model: str = "gpt-4o-mini",
                    max_retries: int = 3, **kwargs) -> str:
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                **kwargs
            )
            return response.choices[0].message.content
        except RateLimitError:
            if attempt < max_retries - 1:
                # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
                wait_time = (2 ** attempt) + random.uniform(0, 1)
                print(f"Rate limited. Retrying in {wait_time:.1f}s...")
                time.sleep(wait_time)
            else:
                raise
        except APIConnectionError:
            if attempt < max_retries - 1:
                print(f"Connection error. Retrying... ({attempt+1}/{max_retries})")
                time.sleep(2)
            else:
                raise
        except APIError as e:
            # Not every APIError carries an HTTP status, so read it defensively
            status_code = getattr(e, "status_code", None)
            if status_code == 500 and attempt < max_retries - 1:
                print("Server error. Retrying...")
                time.sleep(1)
            else:
                raise
    raise Exception("Max retries exceeded")
# Usage
# result = robust_api_call(
# messages=[{"role": "user", "content": "Hello"}],
# model="gpt-4o-mini",
# temperature=0.7
# )
print("Robust API call function ready with exponential backoff")
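The backoff pattern itself does not depend on OpenAI. Below is a generic sketch that can be exercised with a fake flaky function and an injected sleep, so it runs instantly and without a network. `retry_with_backoff` and its parameters are hypothetical names for illustration.

```python
import random
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retriable: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return func()
        except retriable:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 1s of jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    raise RuntimeError("max_retries must be at least 1")


# Demo: fails twice, then succeeds; delays are recorded instead of slept
attempts = {"count": 0}
delays: List[float] = []


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = retry_with_backoff(flaky, (ConnectionError,), sleep=delays.append)
print(result, attempts["count"], len(delays))  # ok 3 2
```

The jitter spreads retries out so that many clients hitting a rate limit at once do not all retry at the same moment.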
Complete Project: An AI Assistant With Tools and Memory
Bringing it all together. A production-ready assistant that remembers, uses tools, and handles errors.
import json
import openai
import time
from typing import List, Optional
from collections import deque
class AIAssistant:
    def __init__(
        self,
        name: str = "Assistant",
        system_prompt: str = "You are a helpful AI assistant.",
        model: str = "gpt-4o-mini",
        max_history: int = 20,
        tools: Optional[list] = None,
        temperature: float = 0.7
    ):
        self.client = openai.OpenAI()
        self.name = name
        self.model = model
        self.temperature = temperature
        self.tools = tools or []
        # Sliding window: the oldest messages fall off once max_history is reached
        self.history = deque(maxlen=max_history)
        self.system = system_prompt
        self.usage = {"calls": 0, "tokens": 0}

    def _get_messages(self) -> list:
        return [{"role": "system", "content": self.system}] + list(self.history)

    def _execute_tool(self, name: str, args: dict) -> str:
        # Override this method to add your own tools
        return json.dumps({"error": f"Tool '{name}' not implemented"})

    def chat(self, user_message: str, stream: bool = False) -> str:
        self.history.append({"role": "user", "content": user_message})
        kwargs = {
            "model": self.model,
            "messages": self._get_messages(),
            "temperature": self.temperature,
        }
        if self.tools:
            kwargs["tools"] = self.tools
            kwargs["tool_choice"] = "auto"
        # Only stream on the tool-free path; the tool loop below reads
        # response.choices, which a streamed response does not provide.
        if stream and not self.tools:
            kwargs["stream"] = True
        # Handle function calling loop
        while True:
            if stream and not self.tools:
                # Stream without tools
                response_text = ""
                stream_resp = self.client.chat.completions.create(**kwargs)
                print(f"{self.name}: ", end="", flush=True)
                for chunk in stream_resp:
                    if chunk.choices[0].delta.content:
                        token = chunk.choices[0].delta.content
                        print(token, end="", flush=True)
                        response_text += token
                print()
                self.history.append({"role": "assistant", "content": response_text})
                return response_text
            # Non-streaming or tools
            response = self.client.chat.completions.create(**kwargs)
            message = response.choices[0].message
            self.usage["calls"] += 1
            self.usage["tokens"] += response.usage.total_tokens
            if not message.tool_calls:
                self.history.append({"role": "assistant", "content": message.content})
                return message.content
            # Process tool calls
            self.history.append(message)
            for tool_call in message.tool_calls:
                fn_name = tool_call.function.name
                fn_args = json.loads(tool_call.function.arguments)
                result = self._execute_tool(fn_name, fn_args)
                self.history.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result
                })
            # Rebuild the request so the next call sees the tool results
            kwargs["messages"] = self._get_messages()

    def summarize_history(self) -> str:
        if not self.history:
            return "No conversation history."
        summary_prompt = "Summarize this conversation in 2-3 sentences:\n\n"
        for msg in self.history:
            if isinstance(msg, dict) and msg.get('role') in ['user', 'assistant']:
                role = msg['role'].title()
                content = msg.get('content', '')
                if content:
                    summary_prompt += f"{role}: {content[:100]}...\n"
        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": summary_prompt}],
            max_tokens=150
        )
        return response.choices[0].message.content

    def clear(self):
        self.history.clear()

    def stats(self) -> dict:
        return {
            "name": self.name,
            "model": self.model,
            "calls": self.usage["calls"],
            "tokens": self.usage["tokens"],
            "history": len(self.history),
        }
# Concrete assistant with tools
class MLTutorAssistant(AIAssistant):
    def __init__(self):
        super().__init__(
            name="ML Tutor",
            system_prompt="""You are an expert ML tutor who has read 'How Machines Learn: A Complete Guide from Zero to AI Engineer'.
You teach clearly with concrete examples and code snippets.
You remember what the student has learned so far in this session.
When relevant, reference concepts from earlier in the conversation.""",
            model="gpt-4o-mini",
            max_history=20,
            tools=[
                {
                    "type": "function",
                    "function": {
                        "name": "get_code_example",
                        "description": "Get a working Python code example for an ML concept",
                        "parameters": {
                            "type": "object",
                            "properties": {
                                "concept": {"type": "string", "description": "The ML concept to get code for"},
                                "difficulty": {"type": "string", "enum": ["beginner", "intermediate", "advanced"]}
                            },
                            "required": ["concept"]
                        }
                    }
                }
            ]
        )

    def _execute_tool(self, name: str, args: dict) -> str:
        if name == "get_code_example":
            concept = args.get("concept", "")
            difficulty = args.get("difficulty", "beginner")
            # In production: pull from a real code database
            example = {
                "concept": concept,
                "difficulty": difficulty,
                "code": f"# Example: {concept}\nimport sklearn\n# ... working code here",
                "explanation": f"This code demonstrates {concept} at {difficulty} level."
            }
            return json.dumps(example)
        return json.dumps({"error": f"Unknown tool: {name}"})
# Demo the complete assistant
print("=" * 60)
print("ML Tutor Assistant Demo")
print("=" * 60)
tutor = MLTutorAssistant()
demo_questions = [
    "What is overfitting and how do I detect it?",
    "Can you give me a code example for that?",
    "How is this related to the bias-variance tradeoff?",
]
for question in demo_questions:
    print(f"\nStudent: {question}")
    answer = tutor.chat(question)
    print(f"Tutor: {answer[:200]}...")
print(f"\n{tutor.stats()}")
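One caveat with a fixed-size sliding window: it can cut a conversation in the middle of a tool exchange, leaving a tool result whose matching assistant tool call has already been dropped, and the API rejects tool messages that do not follow a message with tool calls. Here is a small sketch of one possible guard, which starts the window at a user message. `safe_window` and `_role` are hypothetical helpers, not part of the assistant class above.

```python
from typing import Any, List, Optional


def _role(message: Any) -> Optional[str]:
    """Read the role from a dict or from an SDK message object."""
    if isinstance(message, dict):
        return message.get("role")
    return getattr(message, "role", None)


def safe_window(history: List[Any], max_messages: int) -> List[Any]:
    """Return the newest messages, trimmed so the window starts at a user turn."""
    if max_messages <= 0:
        return []
    window = list(history)[-max_messages:]
    # Drop leading tool/assistant messages that lost their context
    while window and _role(window[0]) != "user":
        window.pop(0)
    return window


history = [
    {"role": "user", "content": "Weather in Tokyo?"},
    {"role": "assistant", "content": None, "tool_calls": ["..."]},
    {"role": "tool", "tool_call_id": "call_1", "content": "{}"},
    {"role": "assistant", "content": "18 degrees and cloudy."},
    {"role": "user", "content": "And tomorrow?"},
    {"role": "assistant", "content": "I only have current data."},
]
print([m["role"] for m in safe_window(history, 4)])  # ['user', 'assistant']
```

The trade-off is that the window can end up shorter than `max_messages`, but every request built from it is well-formed.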
Cost Optimization Tips
cost_tips = {
    "1. Model selection": {
        "tip": "Use gpt-4o-mini for most tasks. Only upgrade to gpt-4o when quality is genuinely insufficient.",
        "saving": "95% cost reduction vs gpt-4o"
    },
    "2. Prompt caching": {
        "tip": "OpenAI automatically caches prompts > 1024 tokens. Long system prompts get cached.",
        "saving": "50% discount on cached tokens"
    },
    "3. Batch API": {
        "tip": "Use the Batch API for tasks that don't need real-time responses.",
        "saving": "50% discount vs real-time"
    },
    "4. Token counting": {
        "tip": "Count tokens before sending. Trim unnecessary context. Remove long system prompts when not needed.",
        "saving": "10-40% depending on your prompts"
    },
    "5. Max tokens": {
        "tip": "Always set max_tokens. Without it, the model can generate very long responses.",
        "saving": "Prevents runaway costs"
    },
    "6. Temperature for deterministic tasks": {
        "tip": "Use temperature=0 for classification, extraction, formatting. Near-deterministic output is more consistent, so identical requests can be served from your own response cache.",
        "saving": "Fewer repeated calls when you cache responses"
    },
    "7. Local models for testing": {
        "tip": "Use Ollama or llama.cpp during development. Only hit the API when testing production behavior.",
        "saving": "90%+ during development"
    },
}
print("Cost Optimization Tips:")
for tip_name, info in cost_tips.items():
    print(f"\n{tip_name}")
    print(f" Tip: {info['tip']}")
    print(f" Saving: {info['saving']}")
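For deterministic tasks, an application-level response cache means identical requests are only paid for once. A minimal in-memory sketch follows. `CachedCaller` and `cache_key` are hypothetical names, and `fake_model` stands in for a function that would wrap the real API call.

```python
import hashlib
import json
from typing import Callable, Dict, List


def cache_key(model: str, messages: List[dict]) -> str:
    """Stable hash of the request; sort_keys makes dict key order irrelevant."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedCaller:
    """Wraps a call_model(model, messages) -> str function with a dict cache."""

    def __init__(self, call_model: Callable[[str, List[dict]], str]) -> None:
        self.call_model = call_model
        self.cache: Dict[str, str] = {}
        self.hits = 0

    def __call__(self, model: str, messages: List[dict]) -> str:
        key = cache_key(model, messages)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        result = self.call_model(model, messages)
        self.cache[key] = result
        return result


# Demo with a fake model function that counts how often it is really called
real_calls = {"count": 0}


def fake_model(model: str, messages: List[dict]) -> str:
    real_calls["count"] += 1
    return f"label-for:{messages[-1]['content']}"


classify = CachedCaller(fake_model)
request = [{"role": "user", "content": "Is this spam?"}]
first = classify("gpt-4o-mini", request)
second = classify("gpt-4o-mini", request)
print(first == second, real_calls["count"], classify.hits)  # True 1 1
```

This only makes sense when the same input should always give the same answer, so it fits classification and extraction better than open-ended chat.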
The Complete Journey: 100 Posts in One View
PHASE 1: Python That Actually Works (Posts 1-10)
Variables, functions, OOP, error handling, file I/O
PHASE 2: Math for ML (Posts 11-20)
Linear algebra, calculus, probability, statistics
PHASE 3: Data Wrangling Tools (Posts 27-39)
NumPy, Pandas, Matplotlib, Seaborn, EDA
PHASE 4: SQL for Data (Posts 40-45)
SELECT, JOINs, window functions, Python + SQL
PHASE 5: Dev Tools (Posts 46-50)
Git, GitHub, Jupyter, Colab, virtual environments
PHASE 6: Machine Learning Core (Posts 51-71)
Linear/logistic regression, trees, XGBoost, SVM,
KNN, Naive Bayes, evaluation metrics, clustering,
PCA, feature engineering, hyperparameter tuning
PHASE 7: Deep Learning (Posts 72-86)
Neural networks, backprop, PyTorch, training loops,
GPUs, CNNs, transfer learning, RNNs, autoencoders, GANs
PHASE 8: NLP and LLMs (Posts 87-100)
Text preprocessing, tokenization, embeddings, attention,
transformers, BERT, GPT, HuggingFace, fine-tuning,
LoRA, vector search, RAG, chatbots, OpenAI API
What You Can Build Now
After 100 posts, you can build:
- Classification systems for any domain
- Regression models for prediction problems
- Document intelligence pipelines with RAG
- Custom chatbots with memory and tools
- Image classification with CNNs
- Fine-tuned domain models with LoRA
- Semantic search engines
- End-to-end ML pipelines with proper evaluation
The fundamentals don't change. Models come and go. APIs change. Architectures evolve. But gradient descent, overfitting, precision vs recall, the training loop, attention mechanisms: these ideas will still matter in 10 years.
You now understand them. Not just how to use them. Why they work.
Quick Cheat Sheet: OpenAI API
| Task | Code |
|---|---|
| Basic chat | client.chat.completions.create(model=..., messages=[...]) |
| System prompt | {"role": "system", "content": "..."} in messages |
| JSON output | response_format={"type": "json_object"} |
| Function calling | tools=[...], tool_choice="auto" |
| Streaming | stream=True, iterate over chunks |
| Vision | Add {"type": "image_url", "image_url": {"url": "..."}} to content |
| Embeddings | client.embeddings.create(input=text, model="text-embedding-3-small") |
| Count tokens | tiktoken.encoding_for_model(model).encode(text) |
| Cost estimate | tokens / 1M * price_per_million |
| Retry on error | Catch RateLimitError, exponential backoff |
Practice Challenges
Level 1:
Build a simple Q&A bot using the OpenAI API. Give it a custom system prompt that defines a persona. Test it with 10 questions and evaluate response quality.
Level 2:
Add function calling to the assistant. Define at least two tools: one that retrieves weather and one that searches Wikipedia. Verify the model correctly decides when to call each tool.
Level 3:
Build a complete AI assistant that combines: a custom system prompt, conversation memory (sliding window), RAG (ChromaDB with 20+ documents), at least 2 function tools, streaming output, and usage/cost tracking. Deploy it as a simple command-line chatbot.
References
- OpenAI API reference
- OpenAI Cookbook (examples)
- tiktoken: token counting
- OpenAI pricing
- OpenAI function calling guide
This Is Post 100.
You started from zero. You learned Python, math, data wrangling, machine learning, deep learning, and large language models. You built classifiers, regressors, neural networks, transformers, RAG pipelines, and chatbots.
One hundred posts. One complete journey.
The field will keep moving. New architectures will appear. Better models will ship. Benchmarks will fall. But the person who understands why things work is never left behind by what comes next.
That's you now.
Go build something.













