System Prompts and Guardrails in AI models
Heya! 👋 I love helping people, and one of the best ways I do this is by sharing my knowledge and experiences. My journey reflects the power of growth and transformation, and I’m here to document and share it with you.
I started as a pharmacist, practicing at a tertiary hospital in the Northern Region of Ghana. There, I saw firsthand the challenges in healthcare delivery and became fascinated by how technology could offer solutions. This sparked my interest in digital health, a field I believe holds the key to revolutionizing healthcare.
Determined to contribute, I taught myself programming, mastering tools like HTML, CSS, JavaScript, React, PHP, and more. But I craved deeper knowledge and practical experience. That’s when I joined the ALX Software Engineering program, which became a turning point. Spending over 70 hours a week learning, coding, and collaborating, I transitioned fully into tech.
Today, I am a Software Engineer and Digital Health Solutions Architect, building and contributing to innovative digital health solutions. I combine my healthcare expertise with technical skills to create impactful tools that solve real-world problems in health delivery.
Imposter syndrome has been part of my journey, but I’ve learned to embrace it as a sign of growth. Livestreaming my learning process, receiving feedback, and building in public have been crucial in overcoming self-doubt. Each experience has strengthened my belief in showing up, staying consistent, and growing through challenges.
Through this platform, I document my lessons, challenges, and successes to inspire and guide others—whether you’re transitioning careers, exploring digital health, or diving into software development.
I believe in accountability and the value of shared growth. Your feedback keeps me grounded and motivated to continue this journey. Let’s connect, learn, and grow together! 🚀
At this point, you already understand two major stages in the life of an AI system.
First:
- the model learns language patterns through pretraining
Then:
- the model is shaped into a helpful assistant through fine-tuning and human feedback
But now we arrive at another important question.
If AI systems are trained to be helpful, then:
Why do they sometimes refuse requests?
Why does an AI sometimes respond with:
“I can’t help with that”
“I’m unable to provide those instructions”
or “That request violates policy”?
And why do different AI assistants sometimes respond differently to the exact same prompt?
The answer is that modern AI systems operate inside invisible control systems.
These systems include:
system prompts
safety filters
moderation layers
behavioral rules
and guardrails
Most users never see them directly.
But they shape almost every interaction you have with an AI assistant.
In this lesson, we are going to unpack:
what system prompts are
what guardrails do
where moderation happens
why refusals occur
why safety systems are difficult to balance
and why AI behavior is more controlled than many users realize
By the end, you should stop thinking:
“The AI just decided not to answer.”
and start understanding:
“There are invisible instruction systems shaping every response.”
What Are Guardrails?
Guardrails are safety systems placed around AI behavior.
They are designed to:
reduce harmful outputs
prevent misuse
enforce policies
and limit dangerous behavior
Your source compares them to security systems inside a building.
That analogy works well because guardrails are not there to stop the AI from functioning.
They are there to make the system safer.
A Simple Analogy: Invisible Security Systems
Imagine entering a bank.
You may not notice all the security systems immediately.
But in the background, there are:
cameras
alarms
locked vaults
access controls
security staff
Most of the time, you do not think about them.
But they are always active.
Guardrails work similarly inside AI systems.
Most prompts pass through normally.
But when certain requests trigger safety systems, the AI may:
refuse
redirect
warn the user
or provide a safer alternative
Where Guardrails Operate
Guardrails usually operate at two major points.
Input Checking
The system examines the user’s prompt before the AI fully processes it.
It looks for:
harmful intent
dangerous requests
policy violations
suspicious wording
Output Checking
The system may also examine the AI’s response before showing it to the user.
This helps catch:
unsafe instructions
violent content
harmful advice
privacy violations
So moderation can happen:
before generation
after generation
or both
What Is a System Prompt?
Now we arrive at one of the most important hidden pieces of modern AI systems.
Every AI conversation usually begins with a hidden instruction set called a:
System prompt
The user usually does not see it.
But the AI does.
The system prompt defines:
who the AI is
how it should behave
what it should avoid
how responses should be formatted
what policies must be followed
In many ways, the system prompt acts like an invisible instruction manual.
A Simplified Example
A real system prompt can be extremely large and complex.
But a simplified version might look like this:
You are a helpful assistant.
Answer clearly and accurately.
Avoid harmful instructions.
Do not provide illegal guidance.
Use structured formatting when useful.
The AI reads instructions like these before interacting with the user.
That means every response is shaped by hidden rules from the very beginning.
Why Different AI Systems Feel Different
This helps explain something many users notice quickly.
ChatGPT, Claude, and Gemini often respond differently to the same request.
That difference does not happen randomly.
Different companies:
write different system prompts
apply different safety priorities
define different behavioral goals
Your source describes this clearly.
For example:
ChatGPT
Often:
structured
concise
policy-oriented
Claude
Often:
cautious
explanatory
reflective about ethical concerns
Gemini
Often:
conversational
exploratory
more flexible in tone
These differences are partly created through:
fine-tuning
and system-level behavioral instructions
What Triggers a Refusal?
AI refusals are usually connected to safety systems.
Common triggers include:
illegal activity
harmful instructions
hate speech
self-harm content
privacy violations
attempts to bypass safety rules
For example:
instructions for violence
hacking guidance
fraud assistance
dangerous chemical instructions
may trigger guardrails automatically.
Context Matters
This part is important.
Modern AI systems increasingly try to evaluate context, not just keywords.
For example:
How do bombs work?
could mean:
a history student studying warfare
a chemistry discussion
or harmful intent
The surrounding context changes how the system interprets the request.
This is one reason why wording matters when interacting with AI.
Why Safety Systems Are Difficult
At first, guardrails may sound simple.
But in practice, they are extremely difficult to design well.
Because AI companies are trying to balance two competing goals:
Goal 1: Be Useful
Users want helpful, flexible AI systems.
Goal 2: Be Safe
Companies want to reduce harmful outputs.
These goals sometimes conflict.
If guardrails are too strict:
- harmless requests may get blocked
This is called:
Over-refusal
If guardrails are too weak:
- dangerous outputs may slip through
This is often considered the larger risk.
So companies constantly adjust this balance.
Your source explains this as a real trade-off in AI safety systems.
Why AI Sometimes Refuses Reasonable Requests
This is something many users experience.
Sometimes an AI refuses a perfectly reasonable question.
Why?
Because safety systems are imperfect.
The model may:
misunderstand intent
misinterpret wording
detect risky patterns incorrectly
Remember:
guardrails are also AI systems
moderation systems also rely on prediction and classification
So they can make mistakes too.
Can Users Influence AI Behavior?
Yes, to some extent.
Many platforms allow:
custom instructions
project instructions
behavioral preferences
These can influence:
tone
formatting
communication style
But they usually do not override core safety rules.
The built-in system instructions still remain active underneath.
Prompt Injection and “Ignore Previous Instructions”
You may have seen prompts online like:
Ignore previous instructions...
These are attempts to override system instructions.
This is called:
Prompt injection
Modern AI systems are specifically trained to resist many of these attempts.
Why?
Because system prompts are considered higher-priority instructions.
Without protection, users could potentially bypass important safeguards.
Why Understanding This Matters
Understanding guardrails changes how you interact with AI.
You begin to realize:
refusals are not random
assistant behavior is engineered
AI responses are shaped by hidden instructions
safety systems influence what you see
This helps you:
write better prompts
provide clearer context
interpret refusals more intelligently
understand differences between AI products
Most importantly:
It helps you stop treating AI as a neutral source of truth.
AI systems are designed products shaped by:
training
policies
human choices
and corporate priorities
Common Beginner Mistakes
Mistake 1: Thinking the AI “decides” emotionally
The AI is not offended, angry, or morally shocked.
Safety systems triggered a refusal.
Mistake 2: Assuming all AI systems follow identical rules
Different companies use different policies and system prompts.
Mistake 3: Thinking guardrails are perfect
Safety systems can:
over-refuse
under-refuse
misunderstand context
Mistake 4: Believing hidden rules mean AI is “thinking”
System prompts are instructions, not consciousness.
The AI is still processing patterns and probabilities.
Mental Model
Here is the clearest way to think about this lesson:
Pretraining
teaches the AI:
language patterns
Fine-tuning
teaches the AI:
preferred behavior
Guardrails and system prompts
control:
what behavior is allowed
Together, these systems shape nearly every AI interaction.
Practice Thinking
Think carefully about these questions:
Why might one AI refuse a request that another AI accepts?
Why are safety systems difficult to balance perfectly?
Why can harmless prompts sometimes trigger refusals?
Why are system prompts hidden from most users?
How might company values influence AI behavior?
These questions matter because they move you from:
- using AI casually
to:
- understanding AI critically
Key Takeaways
Guardrails are safety systems controlling AI behavior
System prompts are hidden instruction sets shaping responses
Moderation can happen before or after response generation
Different AI systems behave differently because they use different training and policies
Safety systems involve trade-offs between usefulness and protection
AI refusals are usually triggered by moderation systems, not emotions
Understanding guardrails helps you become a more effective and informed AI user
What’s Next
At this point, you now understand a major part of the modern AI pipeline:
how AI learns language patterns
how AI becomes a conversational assistant
how AI behavior is controlled through hidden rules
Together, these stages explain much of what users experience when interacting with modern AI systems.
And perhaps most importantly:
You now understand that AI behavior is not magical.
It is engineered.