Building Your First AI Chatbot: From API Call to Conversational Memory

As a software engineer diving into AI, I had this assumption that chatbots were fundamentally different from traditional applications. They seemed almost magical in how they maintained context and responded intelligently. Then I built my first one, and I realized something crucial: a chatbot is just a conversation where you're manually managing state, the same way you'd manage a shopping cart or user session in a web app. The "AI" part is just one API call. Everything else is good old-fashioned software engineering.
This realization changed everything for me. Building intelligent applications isn't about learning an entirely new paradigm. It's about understanding how to orchestrate LLM APIs within the patterns you already know. The earlier you start, the more natural this becomes.
So let's build one together.
What You'll Actually Build
By the end of this tutorial, you'll have a working CLI chatbot that can:
Remember the entire conversation context
Handle follow-up questions intelligently
Manage its own memory budget to avoid crashes
Run indefinitely without hitting API limits
Here's what a conversation with your finished chatbot will look like:
You: What's the capital of France?
Bot: The capital of France is Paris.
You: What's the population of that city?
Bot: Paris has a population of approximately 2.2 million people within the city limits,
and about 12 million in the greater metropolitan area.
You: Tell me one famous landmark there.
Bot: The Eiffel Tower is perhaps the most iconic landmark in Paris, standing 330 meters
tall and attracting millions of visitors each year.
Notice how the bot doesn't need you to repeat "Paris" in your follow-up questions. It remembers. That's the magic we're building.
Prerequisites
You'll need:
Python 3.8+ installed
Basic Python comfort (functions, loops, lists)
An OpenAI API Key from platform.openai.com
If you can write a for loop and understand what a dictionary is, you're ready.
Phase 1: Setting Up Your Workspace
Good engineering habits start before you write a single line of code. We're setting up a clean environment so your project doesn't pollute your system Python or accidentally expose your API keys.
Create a Virtual Environment
Think of a virtual environment as a clean room for your project. Any libraries you install here stay here, isolated from other projects.
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
You'll know it worked when you see (venv) appear in your terminal prompt.
Install Your Dependencies
We need three libraries:
pip install openai python-dotenv tiktoken
What each does:
openai: The official client for making API calls
python-dotenv: Loads environment variables from a file (keeps secrets out of code)
tiktoken: Counts tokens so we don't exceed the model's memory limit
Secure Your API Key
Create a file named .env in your project folder:
OPENAI_API_KEY=your_actual_key_here
Why this matters: Hardcoding API keys in your code is like leaving your house key under the doormat. Anyone who sees your code (GitHub, colleagues, your future self) sees your key. Environment variables keep secrets separate from logic.
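One extra habit worth adopting (this helper is my own addition, not part of the tutorial's scripts): fail fast with a clear message when the key is missing, instead of getting a confusing authentication error later.

```python
import os

def require_api_key(name="OPENAI_API_KEY"):
    """Return the API key from the environment, or fail with a clear message."""
    key = os.getenv(name)
    if not key:
        raise RuntimeError(
            f"{name} is not set. Create a .env file with {name}=... "
            "and call load_dotenv() before initializing the client."
        )
    return key
```

Then initialize the client with `client = OpenAI(api_key=require_api_key())` after `load_dotenv()`, and a missing `.env` file tells you exactly what went wrong.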
Phase 2: Your First Conversation (The Handshake)
Before we build a chatbot, let's prove the connection works. This is the "Hello World" of AI engineering.
Understanding Statelessness
Here's the thing about LLM APIs: they have no memory. Every API call is like meeting someone with amnesia. You say "Hi, I'm Alex," they respond, then immediately forget you exist. If you want them to remember, you have to remind them of the entire conversation every single time.
Think of it like texting someone who can only see one message at a time. To continue a conversation, you'd have to screenshot your entire text thread and send it with each new message. That's exactly what we'll be doing programmatically.
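You can see the "screenshot the entire thread" idea without calling any API at all. Here's a toy sketch (pure Python, the helper name is mine) of what the request payload looks like on each turn:

```python
def build_payload(history, new_user_message):
    """Each turn, the request contains the FULL history plus the new message."""
    return history + [{"role": "user", "content": new_user_message}]

history = []

# Turn 1: the payload is just one message
payload = build_payload(history, "Hi, I'm Alex")
print(len(payload))  # 1 message sent

# Pretend the model replied; record both sides of the exchange
history = payload + [{"role": "assistant", "content": "Hi Alex!"}]

# Turn 2: the payload now carries the whole thread
payload = build_payload(history, "What's my name?")
print(len(payload))  # 3 messages sent
```

Every turn, the payload grows. That's the whole trick behind "memory."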
But first, let's make sure we can send a single message.
The Code
Create hello_ai.py:
import os
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables from .env file
load_dotenv()

# Initialize the OpenAI client with your API key
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Make a single call to the LLM
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Explain the difference between an AI Engineer and a Software Engineer in one sentence."}
    ]
)

print(response.choices[0].message.content)
Run it:
python hello_ai.py
You should see a thoughtful one-sentence response appear in your terminal.
What Just Happened?
Let's break down that API call:
model="gpt-4o-mini": We're using GPT-4o Mini, which is fast and cost-effective for learning. Think of models like engine sizes. Bigger models (GPT-4) are more capable but slower and pricier. Smaller models (GPT-4o Mini) are perfect for most tasks.
messages=[...]: This list is your conversation history. Right now it only has one message with "role": "user". The role tells the model who's speaking. Later, we'll add "assistant" roles for the bot's responses.
response.choices[0].message.content: The API returns a bunch of metadata, but we only care about the actual text response, which lives here.
Common Error You Might Hit:
openai.AuthenticationError: Incorrect API key provided
This means your .env file isn't loaded or your key is wrong. Double-check the key and make sure .env is in the same folder as your script.
Phase 3: Adding Memory (Making It Stateful)
Now comes the interesting part. A single API call is neat, but it's not a conversation. Let's build actual conversational memory.
The Mental Model
Imagine you're in a courtroom. Every time someone speaks, a court reporter writes it down. When it's your turn to respond, you can reference the entire transcript. That transcript is your messages list.
Here's the pattern:
User says something → add it to the transcript
Send the entire transcript to the model
Model responds → add its response to the transcript
Repeat
The model doesn't actually remember anything. We're just showing it the full conversation history every time so it can pretend it does.
The Code
Create chatbot.py:
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# This list is our conversation memory
messages = []

print("Chatbot ready! Type 'quit' or 'exit' to end the conversation.\n")

while True:
    user_input = input("You: ")

    # Exit condition
    if user_input.lower() in ["quit", "exit"]:
        print("Goodbye!")
        break

    # 1. Add user message to our conversation history
    messages.append({"role": "user", "content": user_input})

    # 2. Send the ENTIRE history to the model
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )

    # 3. Extract the response
    bot_response = completion.choices[0].message.content
    print(f"Bot: {bot_response}\n")

    # 4. Add the bot's response to history so it remembers what it said
    messages.append({"role": "assistant", "content": bot_response})
Run it:
python chatbot.py
Try this conversation:
You: My name is Jordan
Bot: Nice to meet you, Jordan! How can I help you today?
You: What's my name?
Bot: Your name is Jordan.
It remembers! Not because the model is storing your name, but because we're sending the entire conversation back with each request.
Why This Works
Every time you send a message, the messages list grows:
# After first exchange:
messages = [
    {"role": "user", "content": "My name is Jordan"},
    {"role": "assistant", "content": "Nice to meet you, Jordan! How can I help you today?"}
]

# After second exchange:
messages = [
    {"role": "user", "content": "My name is Jordan"},
    {"role": "assistant", "content": "Nice to meet you, Jordan! How can I help you today?"},
    {"role": "user", "content": "What's my name?"},
    {"role": "assistant", "content": "Your name is Jordan."}
]
The model sees the full context and can reference anything that came before.
The Problem We Just Created
Try having a really long conversation. Ask it to tell you a story, then ask follow-up questions, then ask for another story. Keep going.
Eventually, you might hit an error:
openai.BadRequestError: This model's maximum context length is 128000 tokens...
We've run out of memory. Our conversation got too long, and the model can't process it anymore. This is where Phase 4 comes in.
Phase 4: Managing the Token Budget
Every LLM has a context window, a limit to how much text it can process at once. For GPT-4o Mini, that's around 128K tokens (roughly 96,000 words). Once your conversation history exceeds that, the API rejects your request.
What's a Token?
Think of tokens as syllables. Short words like "cat" are one token. Longer words like "artificial" might be two or three. The model doesn't count words or characters, it counts tokens.
Why? Because the model's internal architecture processes language in these chunks. Understanding this is crucial because you're billed by tokens and limited by tokens.
The Budget Analogy
Imagine your conversation history is a suitcase, and the context window is the weight limit. Every message you add makes it heavier. Eventually, the airline (the API) says "this is too heavy, you can't board."
Your options:
Stop adding stuff (end the conversation)
Remove old stuff to make room (truncate history)
We're going with option 2. We'll implement a "sliding window" that automatically drops the oldest messages when we approach the limit.
Counting Tokens
First, we need a way to measure how many tokens are in our history:
import tiktoken

def count_tokens(messages, model="gpt-4o-mini"):
    """
    Count the number of tokens in a list of messages.

    This is an approximation based on OpenAI's token counting logic.
    Each message has some formatting overhead (about 4 tokens), plus
    the actual content tokens.
    """
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")

    num_tokens = 0
    for message in messages:
        num_tokens += 4  # Every message has formatting overhead
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
    return num_tokens
What's happening here:
tiktoken.encoding_for_model() loads the tokenizer for our specific model
We loop through each message and encode its content
We add 4 tokens per message for formatting (this is how OpenAI structures the conversation internally)
Implementing the Sliding Window
Now we modify our chatbot to check token count before each API call:
import os
from dotenv import load_dotenv
from openai import OpenAI
import tiktoken

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Token budget (leave some room for the response)
MAX_TOKENS = 4096

def count_tokens(messages, model="gpt-4o-mini"):
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    num_tokens = 0
    for message in messages:
        num_tokens += 4
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
    return num_tokens

messages = []

print("Chatbot ready! Type 'quit' or 'exit' to end the conversation.\n")

while True:
    user_input = input("You: ")

    if user_input.lower() in ["quit", "exit"]:
        print("Goodbye!")
        break

    # Add user message
    messages.append({"role": "user", "content": user_input})

    # Truncate oldest messages if we're over budget
    while count_tokens(messages) > MAX_TOKENS:
        if len(messages) > 1:
            removed = messages.pop(0)  # Remove the oldest message
            print("[System] Conversation history truncated to fit memory budget.")
        else:
            # If we only have one message and it's too long, we have a problem
            print("[System] Warning: Single message exceeds token limit.")
            break

    # Make the API call
    try:
        completion = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
        )
        bot_response = completion.choices[0].message.content
        print(f"Bot: {bot_response}\n")

        # Add bot response to history
        messages.append({"role": "assistant", "content": bot_response})
    except Exception as e:
        print(f"[Error] {e}")
        # Remove the last user message since we couldn't process it
        messages.pop()
What This Solves (and What It Doesn't)
What we fixed:
The chatbot can now run indefinitely
It won't crash when conversations get long
You have explicit control over memory management
The tradeoff:
The bot will eventually "forget" the beginning of very long conversations
When you see [System] Conversation history truncated..., the bot just lost its oldest memory
This is actually how most production chatbots work. They use strategies like:
Summarizing old conversation parts
Keeping only the most recent N messages
Storing important facts in a separate database
For now, our simple sliding window is sufficient and teaches the core concept: AI memory isn't magic, it's just data you choose to send.
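For reference, the "keep only the most recent N messages" strategy mentioned above is even simpler than token counting. A sketch (this helper is my own, not part of the scripts above):

```python
def keep_recent(messages, n=10):
    """Keep only the last n messages, preserving a leading system prompt if present."""
    if messages and messages[0].get("role") == "system":
        return [messages[0]] + messages[1:][-n:]
    return messages[-n:]

# Example: a 20-message history trimmed to the 5 most recent
history = [{"role": "user", "content": str(i)} for i in range(20)]
trimmed = keep_recent(history, n=5)
print(len(trimmed))  # 5
```

It's cruder than token counting (messages vary wildly in length), but it's predictable and requires no tokenizer.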
What You've Actually Learned
Let's step back. You've just built a functional chatbot, but more importantly, you've learned the fundamental patterns of AI engineering:
1. State Management Is Your Responsibility
The model is stateless. You maintain state by passing conversation history. This is conceptually identical to managing user sessions in web apps or game state in game development.
2. Tokens Are Your Currency
Every API call costs tokens (literally, in dollars, but also in context limits). Learning to count, budget, and optimize token usage is core to AI engineering.
3. Constraints Drive Architecture
The context window limit forced us to implement truncation logic. In production, you'll face similar constraints (latency, cost, rate limits) that shape your technical decisions.
4. The AI Part Is Small
Notice how much of this code is traditional software engineering: loops, conditionals, list manipulation, error handling. The actual "AI" is one function call. This realization is liberating. You already have most of the skills you need.
Common Issues I Hit (And You Might Too)
"My bot gives different answers to the same question"
LLMs are non-deterministic by default. Add temperature=0 to your API call for consistent responses:
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=0  # Makes responses more deterministic
)
"I'm getting rate limit errors"
OpenAI has rate limits (requests per minute). For learning, you'll rarely hit them. If you do, add a small delay:
import time
time.sleep(1) # Wait 1 second between requests
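If you hit rate limits more often, the standard fix is a retry with exponential backoff. A generic sketch (the helper name and delay values are my choices, not OpenAI's API):

```python
import time

def call_with_retry(fn, max_retries=3, base_delay=1.0):
    """Call fn(); on failure, wait base_delay, 2x, 4x, ... then re-raise."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            time.sleep(base_delay * (2 ** attempt))
```

You'd wrap the API call like `completion = call_with_retry(lambda: client.chat.completions.create(model="gpt-4o-mini", messages=messages))`.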
"The bot's responses are too long/short"
Control response length with max_tokens:
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    max_tokens=150  # Limit response length
)
Where to Go From Here
This chatbot is functional, but basic. Here are the natural next steps in your AI engineering journey:
Add a System Prompt: Give your bot a personality or specific instructions
Implement Streaming: Show responses word-by-word instead of waiting for completion
Add Memory Retrieval: Store facts in a vector database for long-term memory
Build a Web Interface: Move from CLI to a proper chat UI
Function Calling: Let your bot take actions (search the web, query databases)
Each of these builds on the foundation you just created. The pattern remains the same: manage state, respect constraints, orchestrate the API.
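To give a taste of the first item: a system prompt is just one more message at the front of the list, using the "system" role from the same Chat Completions format we've been using all along (the persona text here is my own example):

```python
# Seed the history with a system message BEFORE the chat loop starts
messages = [
    {
        "role": "system",
        "content": "You are a concise assistant who answers in at most two sentences.",
    }
]

# The rest of the chatbot loop stays the same: user and assistant
# messages get appended after the system message.
messages.append({"role": "user", "content": "What's the capital of France?"})
```

One caveat if you combine this with the sliding window: make sure truncation never pops the system message (pop index 1 instead of 0 when the first message is a system prompt).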
Final Thoughts
When I started learning AI engineering, I thought I needed to understand transformers, attention mechanisms, and backpropagation before I could build anything useful. That's like thinking you need to understand internal combustion engines before you can drive a car.
You can absolutely dive deep into the theory later (and I recommend it). But for building intelligent applications, you need to understand APIs, state management, and token budgets. Everything else is optimization.
The fact that you can build a working chatbot in under 100 lines of Python should tell you something: the barrier to entry for AI engineering is lower than you think. The hard part isn't the AI. It's understanding how to integrate it into reliable, scalable systems.
That's the skill I'm building, one project at a time. And now, so are you.
Keep building. Keep learning in public. The proof of work speaks for itself.
Questions or improvements? I'm learning in public, which means I'm wrong sometimes. If you spot an issue or have a better approach, let me know. That's how we all get better.
PS: If you are interested in going through the program that I am currently going through to become an AI engineer in 90 days, then join the waitlist here.