
Building Your First AI Chatbot: From API Call to Conversational Memory


As a software engineer diving into AI, I had this assumption that chatbots were fundamentally different from traditional applications. They seemed almost magical in how they maintained context and responded intelligently. Then I built my first one, and I realized something crucial: a chatbot is just a conversation where you're manually managing state, the same way you'd manage a shopping cart or user session in a web app. The "AI" part is just one API call. Everything else is good old-fashioned software engineering.

This realization changed everything for me. Building intelligent applications isn't about learning an entirely new paradigm. It's about understanding how to orchestrate LLM APIs within the patterns you already know. The earlier you start, the more natural this becomes.

So let's build one together.

What You'll Actually Build

By the end of this tutorial, you'll have a working CLI chatbot that can:

  • Remember the entire conversation context

  • Handle follow-up questions intelligently

  • Manage its own memory budget to avoid crashes

  • Run indefinitely without crashing into the model's context limit

Here's what a conversation with your finished chatbot will look like:

You: What's the capital of France?
Bot: The capital of France is Paris.

You: What's the population of that city?
Bot: Paris has a population of approximately 2.2 million people within the city limits, 
     and about 12 million in the greater metropolitan area.

You: Tell me one famous landmark there.
Bot: The Eiffel Tower is perhaps the most iconic landmark in Paris, standing 330 meters 
     tall and attracting millions of visitors each year.

Notice how the bot doesn't need you to repeat "Paris" in your follow-up questions. It remembers. That's the magic we're building.

Prerequisites

You'll need:

  • Python 3.8+ installed

  • Basic Python comfort (functions, loops, lists)

  • An OpenAI API Key from platform.openai.com

If you can write a for loop and understand what a dictionary is, you're ready.


Phase 1: Setting Up Your Workspace

Good engineering habits start before you write a single line of code. We're setting up a clean environment so your project doesn't pollute your system Python or accidentally expose your API keys.

Create a Virtual Environment

Think of a virtual environment as a clean room for your project. Any libraries you install here stay here, isolated from other projects.

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

You'll know it worked when you see (venv) appear in your terminal prompt.

Install Your Dependencies

We need three libraries:

pip install openai python-dotenv tiktoken

What each does:

  • openai: The official client for making API calls

  • python-dotenv: Loads environment variables from a file (keeps secrets out of code)

  • tiktoken: Counts tokens so we don't exceed the model's memory limit

Secure Your API Key

Create a file named .env in your project folder:

OPENAI_API_KEY=your_actual_key_here

Why this matters: Hardcoding API keys in your code is like leaving your house key under the doormat. Anyone who sees your code (GitHub, colleagues, your future self) sees your key. Environment variables keep secrets separate from logic.


Phase 2: Your First Conversation (The Handshake)

Before we build a chatbot, let's prove the connection works. This is the "Hello World" of AI engineering.

Understanding Statelessness

Here's the thing about LLM APIs: they have no memory. Every API call is like meeting someone with amnesia. You say "Hi, I'm Alex," they respond, then immediately forget you exist. If you want them to remember, you have to remind them of the entire conversation every single time.

Think of it like texting someone who can only see one message at a time. To continue a conversation, you'd have to screenshot your entire text thread and send it with each new message. That's exactly what we'll be doing programmatically.
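You can see the "screenshot the thread" idea without making a single API call. Below, two hypothetical request payloads are built as plain dicts: one that forgets the earlier exchange and one that resends it (the assistant reply shown is invented for illustration):

```python
# Call 1: introduces a name. No API call is made here; these are just
# the payloads we WOULD send.
payload_one = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hi, I'm Alex."}],
}

# Call 2 WITHOUT history: the model has no way to know who "I" is.
payload_two_forgetful = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What's my name?"}],
}

# Call 2 WITH history: we "screenshot the thread" by resending everything,
# including the assistant's earlier (invented) reply.
payload_two_with_memory = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "user", "content": "Hi, I'm Alex."},
        {"role": "assistant", "content": "Nice to meet you, Alex!"},
        {"role": "user", "content": "What's my name?"},
    ],
}

print(len(payload_two_forgetful["messages"]))    # 1 message: no context
print(len(payload_two_with_memory["messages"]))  # 3 messages: full thread
```

The only difference between "forgetful" and "remembers" is what we choose to put in `messages`.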

But first, let's make sure we can send a single message.

The Code

Create hello_ai.py:

import os
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables from .env file
load_dotenv()

# Initialize the OpenAI client with your API key
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Make a single call to the LLM
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Explain the difference between an AI Engineer and a Software Engineer in one sentence."}
    ]
)

print(response.choices[0].message.content)

Run it:

python hello_ai.py

You should see a thoughtful one-sentence response appear in your terminal.

What Just Happened?

Let's break down that API call:

  1. model="gpt-4o-mini": We're using GPT-4o Mini, which is fast and cost-effective for learning. Think of models like engine sizes. Bigger models (GPT-4) are more capable but slower and pricier. Smaller models (GPT-4o Mini) are perfect for most tasks.

  2. messages=[...]: This list is your conversation history. Right now it only has one message with "role": "user". The role tells the model who's speaking. Later, we'll add "assistant" roles for the bot's responses.

  3. response.choices[0].message.content: The API returns a bunch of metadata, but we only care about the actual text response, which lives here.

Common Error You Might Hit:

openai.AuthenticationError: Incorrect API key provided

This means your .env file isn't loaded or your key is wrong. Double-check the key and make sure .env is in the same folder as your script.
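One way to catch this earlier is to fail fast when the key is missing, before any network call. A small sketch (the helper name and error message are my own, not part of the openai library):

```python
import os

def require_api_key(env_var="OPENAI_API_KEY"):
    """Fail fast with a readable hint instead of a cryptic auth error later."""
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Check that your .env file exists "
            "and that load_dotenv() ran before this check."
        )
    return key
```

Call `require_api_key()` right after `load_dotenv()` and pass its return value to `OpenAI(api_key=...)`.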


Phase 3: Adding Memory (Making It Stateful)

Now comes the interesting part. A single API call is neat, but it's not a conversation. Let's build actual conversational memory.

The Mental Model

Imagine you're in a courtroom. Every time someone speaks, a court reporter writes it down. When it's your turn to respond, you can reference the entire transcript. That transcript is your messages list.

Here's the pattern:

  1. User says something → add it to the transcript

  2. Send the entire transcript to the model

  3. Model responds → add its response to the transcript

  4. Repeat

The model doesn't actually remember anything. We're just showing it the full conversation history every time so it can pretend it does.

The Code

Create chatbot.py:

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# This list is our conversation memory
messages = []

print("Chatbot ready! Type 'quit' or 'exit' to end the conversation.\n")

while True:
    user_input = input("You: ")

    # Exit condition
    if user_input.lower() in ["quit", "exit"]:
        print("Goodbye!")
        break

    # 1. Add user message to our conversation history
    messages.append({"role": "user", "content": user_input})

    # 2. Send the ENTIRE history to the model
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    )

    # 3. Extract the response
    bot_response = completion.choices[0].message.content
    print(f"Bot: {bot_response}\n")

    # 4. Add the bot's response to history so it remembers what it said
    messages.append({"role": "assistant", "content": bot_response})

Run it:

python chatbot.py

Try this conversation:

You: My name is Jordan
Bot: Nice to meet you, Jordan! How can I help you today?

You: What's my name?
Bot: Your name is Jordan.

It remembers! Not because the model is storing your name, but because we're sending the entire conversation back with each request.

Why This Works

Every time you send a message, the messages list grows:

# After first exchange:
messages = [
    {"role": "user", "content": "My name is Jordan"},
    {"role": "assistant", "content": "Nice to meet you, Jordan! How can I help you today?"}
]

# After second exchange:
messages = [
    {"role": "user", "content": "My name is Jordan"},
    {"role": "assistant", "content": "Nice to meet you, Jordan! How can I help you today?"},
    {"role": "user", "content": "What's my name?"},
    {"role": "assistant", "content": "Your name is Jordan."}
]

The model sees the full context and can reference anything that came before.

The Problem We Just Created

Try having a really long conversation. Ask it to tell you a story, then ask follow-up questions, then ask for another story. Keep going.

Eventually, you might hit an error:

openai.BadRequestError: This model's maximum context length is 128000 tokens...

We've run out of memory. Our conversation got too long, and the model can't process it anymore. This is where Phase 4 comes in.


Phase 4: Managing the Token Budget

Every LLM has a context window, a limit to how much text it can process at once. For GPT-4o Mini, that's around 128K tokens (roughly 96,000 words). Once your conversation history exceeds that, the API rejects your request.

What's a Token?

Think of tokens as syllables. Short words like "cat" are one token. Longer words like "artificial" might be two or three. The model doesn't count words or characters, it counts tokens.

Why? Because the model's internal architecture processes language in these chunks. Understanding this is crucial because you're billed by tokens and limited by tokens.
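You don't need a tokenizer to get a feel for the scale. A common rule of thumb for English is roughly 4 characters per token, which you can sketch in one line (this is a crude estimate only; tiktoken, used below, gives real counts):

```python
def estimate_tokens(text):
    """Very rough heuristic: ~4 characters per token for English text.
    Use a real tokenizer like tiktoken when accuracy matters."""
    return max(1, len(text) // 4)

print(estimate_tokens("cat"))         # 1: short words are about one token
print(estimate_tokens("artificial"))  # 2: longer words span multiple tokens
```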

The Budget Analogy

Imagine your conversation history is a suitcase, and the context window is the weight limit. Every message you add makes it heavier. Eventually, the airline (API) says "this is too heavy, you can't board."

Your options:

  1. Stop adding stuff (end the conversation)

  2. Remove old stuff to make room (truncate history)

We're going with option 2. We'll implement a "sliding window" that automatically drops the oldest messages when we approach the limit.
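Stripped of token counting, the sliding window is just a while loop that pops from the front of the list. A minimal sketch using message count as a stand-in for the token budget (the cutoff of 6 is arbitrary):

```python
def truncate_history(messages, max_messages=6):
    """Drop the oldest messages until the history fits the budget.
    Here the 'budget' is a simple message count; the real chatbot
    measures tokens instead."""
    trimmed = list(messages)  # copy so the caller's list is untouched
    while len(trimmed) > max_messages:
        trimmed.pop(0)  # the oldest message leaves the suitcase first
    return trimmed

history = [{"role": "user", "content": f"message {i}"} for i in range(10)]
print(len(truncate_history(history)))  # 6 (the four oldest were dropped)
```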

Counting Tokens

First, we need a way to measure how many tokens are in our history:

import tiktoken

def count_tokens(messages, model="gpt-4o-mini"):
    """
    Count the number of tokens in a list of messages.

    This is an approximation based on OpenAI's token counting logic.
    Each message has some formatting overhead (about 4 tokens), plus
    the actual content tokens.
    """
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")

    num_tokens = 0
    for message in messages:
        num_tokens += 4  # Every message has formatting overhead
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))

    return num_tokens

What's happening here:

  • tiktoken.encoding_for_model() loads the tokenizer for our specific model

  • We loop through each message and encode its content

  • We add 4 tokens per message for formatting (this is how OpenAI structures the conversation internally)

Implementing the Sliding Window

Now we modify our chatbot to check token count before each API call:

import os
from dotenv import load_dotenv
from openai import OpenAI
import tiktoken

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Token budget (leave some room for the response)
MAX_TOKENS = 4096

def count_tokens(messages, model="gpt-4o-mini"):
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")

    num_tokens = 0
    for message in messages:
        num_tokens += 4
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))

    return num_tokens

messages = []

print("Chatbot ready! Type 'quit' or 'exit' to end the conversation.\n")

while True:
    user_input = input("You: ")

    if user_input.lower() in ["quit", "exit"]:
        print("Goodbye!")
        break

    # Add user message
    messages.append({"role": "user", "content": user_input})

    # Truncate oldest messages if we're over budget
    while count_tokens(messages) > MAX_TOKENS:
        if len(messages) > 1:
            removed = messages.pop(0)  # Remove the oldest message
            print("[System] Conversation history truncated to fit memory budget.")
        else:
            # If we only have one message and it's too long, we have a problem
            print("[System] Warning: Single message exceeds token limit.")
            break

    # Make the API call
    try:
        completion = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
        )

        bot_response = completion.choices[0].message.content
        print(f"Bot: {bot_response}\n")

        # Add bot response to history
        messages.append({"role": "assistant", "content": bot_response})

    except Exception as e:
        print(f"[Error] {e}")
        # Remove the last user message since we couldn't process it
        messages.pop()

What This Solves (and What It Doesn't)

What we fixed:

  • The chatbot can now run indefinitely

  • It won't crash when conversations get long

  • You have explicit control over memory management

The tradeoff:

  • The bot will eventually "forget" the beginning of very long conversations

  • When you see [System] Conversation history truncated..., the bot just lost its oldest memory

This is actually how most production chatbots work. They use strategies like:

  • Summarizing old conversation parts

  • Keeping only the most recent N messages

  • Storing important facts in a separate database

For now, our simple sliding window is sufficient and teaches the core concept: AI memory isn't magic, it's just data you choose to send.
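As one concrete flavor of the "keep only the most recent N messages" strategy: once you give your bot a system prompt, you'll usually want truncation to pin that prompt in place rather than drop it. A hypothetical sketch (the function name and cutoff are my own choices):

```python
def truncate_keep_system(messages, max_recent=4):
    """Keep a leading system message (if any) plus the most recent turns."""
    if messages and messages[0]["role"] == "system":
        return messages[:1] + messages[1:][-max_recent:]
    return messages[-max_recent:]
```

Swapping this in for the plain `pop(0)` loop means the bot can forget old small talk without ever forgetting its standing instructions.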


What You've Actually Learned

Let's step back. You've just built a functional chatbot, but more importantly, you've learned the fundamental patterns of AI engineering:

1. State Management Is Your Responsibility

The model is stateless. You maintain state by passing conversation history. This is conceptually identical to managing user sessions in web apps or game state in game development.

2. Tokens Are Your Currency

Every API call costs tokens (literally, in dollars, but also in context limits). Learning to count, budget, and optimize token usage is core to AI engineering.

3. Constraints Drive Architecture

The context window limit forced us to implement truncation logic. In production, you'll face similar constraints (latency, cost, rate limits) that shape your technical decisions.

4. The AI Part Is Small

Notice how much of this code is traditional software engineering: loops, conditionals, list manipulation, error handling. The actual "AI" is one function call. This realization is liberating. You already have most of the skills you need.


Common Issues I Hit (And You Might Too)

"My bot gives different answers to the same question"
LLMs are non-deterministic by default. Add temperature=0 to your API call for consistent responses:

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=0  # Makes responses more deterministic
)

"I'm getting rate limit errors"
OpenAI has rate limits (requests per minute). For learning, you'll rarely hit them. If you do, add a small delay:

import time
time.sleep(1)  # Wait 1 second between requests
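For something sturdier than a fixed sleep, a small retry-with-backoff helper can wrap any call. This is a generic sketch (the doubling delays and attempt count are arbitrary choices, not OpenAI recommendations, and real code would catch the specific rate-limit exception rather than bare Exception):

```python
import time

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Retry a callable with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller handle it
            time.sleep(base_delay * (2 ** attempt))
```

You'd use it by wrapping the API call in a lambda, e.g. `with_retries(lambda: client.chat.completions.create(model="gpt-4o-mini", messages=messages))`.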

"The bot's responses are too long/short"
Control response length with max_tokens:

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    max_tokens=150  # Limit response length
)

Where to Go From Here

This chatbot is functional, but basic. Here are the natural next steps in your AI engineering journey:

  1. Add a System Prompt: Give your bot a personality or specific instructions

  2. Implement Streaming: Show responses word-by-word instead of waiting for completion

  3. Add Memory Retrieval: Store facts in a vector database for long-term memory

  4. Build a Web Interface: Move from CLI to a proper chat UI

  5. Function Calling: Let your bot take actions (search the web, query databases)
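To give a flavor of step 1: a system prompt is just one more message at the front of the list, with the role "system". The persona text here is invented for illustration:

```python
# A hypothetical persona; the model treats "system" messages as standing
# instructions that apply to the whole conversation.
messages = [
    {"role": "system",
     "content": "You are a concise assistant who answers in at most two sentences."},
]

# The chat loop then appends user/assistant turns after it, exactly as before.
messages.append({"role": "user", "content": "What's the capital of France?"})
print(messages[0]["role"])  # system
```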

Each of these builds on the foundation you just created. The pattern remains the same: manage state, respect constraints, orchestrate the API.


Final Thoughts

When I started learning AI engineering, I thought I needed to understand transformers, attention mechanisms, and backpropagation before I could build anything useful. That's like thinking you need to understand internal combustion engines before you can drive a car.

You can absolutely dive deep into the theory later (and I recommend it). But for building intelligent applications, you need to understand APIs, state management, and token budgets. Everything else is optimization.

The fact that you can build a working chatbot in under 100 lines of Python should tell you something: the barrier to entry for AI engineering is lower than you think. The hard part isn't the AI. It's understanding how to integrate it into reliable, scalable systems.

That's the skill I'm building, one project at a time. And now, so are you.

Keep building. Keep learning in public. The proof of work speaks for itself.


Questions or improvements? I'm learning in public, which means I'm wrong sometimes. If you spot an issue or have a better approach, let me know. That's how we all get better.

PS: If you are interested in going through the program that I am currently going through to become an AI engineer in 90 days, then join the waitlist here.

Dr. Ehoneah Obed

Software engineer writing about systems: in code, in learning, in life. I reverse-engineer complex problems into frameworks. Pharmacist → SWE → Founder.