System Prompts and Guardrails in AI models

UpdatedMay 26, 2026

Heya! 👋 I love helping people, and one of the best ways I do this is by sharing my knowledge and experiences. My journey reflects the power of growth and transformation, and I’m here to document and share it with you.

I started as a pharmacist, practicing at a tertiary hospital in the Northern Region of Ghana. There, I saw firsthand the challenges in healthcare delivery and became fascinated by how technology could offer solutions. This sparked my interest in digital health, a field I believe holds the key to revolutionizing healthcare.

Determined to contribute, I taught myself programming, mastering tools like HTML, CSS, JavaScript, React, PHP, and more. But I craved deeper knowledge and practical experience. That’s when I joined the ALX Software Engineering program, which became a turning point. Spending over 70 hours a week learning, coding, and collaborating, I transitioned fully into tech.

Today, I am a Software Engineer and Digital Health Solutions Architect, building and contributing to innovative digital health solutions. I combine my healthcare expertise with technical skills to create impactful tools that solve real-world problems in health delivery.

Imposter syndrome has been part of my journey, but I’ve learned to embrace it as a sign of growth. Livestreaming my learning process, receiving feedback, and building in public have been crucial in overcoming self-doubt. Each experience has strengthened my belief in showing up, staying consistent, and growing through challenges.

Through this platform, I document my lessons, challenges, and successes to inspire and guide others—whether you’re transitioning careers, exploring digital health, or diving into software development.

I believe in accountability and the value of shared growth. Your feedback keeps me grounded and motivated to continue this journey. Let’s connect, learn, and grow together! 🚀

Comments

Join the discussion

No comments yet. Be the first to comment.

AI Engineering

Part 8 of 12

Up next

How AI actually processes your prompt under the hood

One of the biggest mistakes people make when using AI is assuming the AI reads like a human. It does not. Humans: infer meaning read between the lines fill in missing context guess intentions AI

What Are Guardrails?

Guardrails are safety systems placed around AI behavior.

They are designed to:

reduce harmful outputs
prevent misuse
enforce policies
and limit dangerous behavior

Your source compares them to security systems inside a building.

That analogy works well because guardrails are not there to stop the AI from functioning.

They are there to make the system safer.

A Simple Analogy: Invisible Security Systems

Imagine entering a bank.

You may not notice all the security systems immediately.

But in the background, there are:

cameras
alarms
locked vaults
access controls
security staff

Most of the time, you do not think about them.

But they are always active.

Guardrails work similarly inside AI systems.

Most prompts pass through normally.

But when certain requests trigger safety systems, the AI may:

refuse
redirect
warn the user
or provide a safer alternative

Where Guardrails Operate

Guardrails usually operate at two major points.

Input Checking

The system examines the user’s prompt before the AI fully processes it.

It looks for:

harmful intent
dangerous requests
policy violations
suspicious wording

Output Checking

The system may also examine the AI’s response before showing it to the user.

This helps catch:

unsafe instructions
violent content
harmful advice
privacy violations

So moderation can happen:

before generation
after generation
or both

What Is a System Prompt?

Now we arrive at one of the most important hidden pieces of modern AI systems.

Every AI conversation usually begins with a hidden instruction set called a:

System prompt

The user usually does not see it.

But the AI does.

The system prompt defines:

who the AI is
how it should behave
what it should avoid
how responses should be formatted
what policies must be followed

In many ways, the system prompt acts like an invisible instruction manual.

A Simplified Example

A real system prompt can be extremely large and complex.

But a simplified version might look like this:

You are a helpful assistant.

Answer clearly and accurately.

Avoid harmful instructions.

Do not provide illegal guidance.

Use structured formatting when useful.

The AI reads instructions like these before interacting with the user.

That means every response is shaped by hidden rules from the very beginning.

Why Different AI Systems Feel Different

This helps explain something many users notice quickly.

ChatGPT, Claude, and Gemini often respond differently to the same request.

That difference does not happen randomly.

Different companies:

write different system prompts
apply different safety priorities
define different behavioral goals

Your source describes this clearly.

For example:

ChatGPT

Often:

structured
concise
policy-oriented

Claude

Often:

cautious
explanatory
reflective about ethical concerns

Gemini

Often:

conversational
exploratory
more flexible in tone

These differences are partly created through:

fine-tuning
and system-level behavioral instructions

What Triggers a Refusal?

AI refusals are usually connected to safety systems.

Common triggers include:

illegal activity
harmful instructions
hate speech
self-harm content
privacy violations
attempts to bypass safety rules

For example:

instructions for violence
hacking guidance
fraud assistance
dangerous chemical instructions

may trigger guardrails automatically.

Context Matters

This part is important.

Modern AI systems increasingly try to evaluate context, not just keywords.

For example:

How do bombs work?

could mean:

a history student studying warfare
a chemistry discussion
or harmful intent

The surrounding context changes how the system interprets the request.

This is one reason why wording matters when interacting with AI.

Why Safety Systems Are Difficult

At first, guardrails may sound simple.

But in practice, they are extremely difficult to design well.

Because AI companies are trying to balance two competing goals:

Goal 1: Be Useful

Users want helpful, flexible AI systems.

Goal 2: Be Safe

Companies want to reduce harmful outputs.

These goals sometimes conflict.

If guardrails are too strict:

harmless requests may get blocked

This is called:

Over-refusal

If guardrails are too weak:

dangerous outputs may slip through

This is often considered the larger risk.

So companies constantly adjust this balance.

Your source explains this as a real trade-off in AI safety systems.

Why AI Sometimes Refuses Reasonable Requests

This is something many users experience.

Sometimes an AI refuses a perfectly reasonable question.

Why?

Because safety systems are imperfect.

The model may:

misunderstand intent
misinterpret wording
detect risky patterns incorrectly

Remember:

guardrails are also AI systems
moderation systems also rely on prediction and classification

So they can make mistakes too.

Can Users Influence AI Behavior?

Yes, to some extent.

Many platforms allow:

custom instructions
project instructions
behavioral preferences

These can influence:

tone
formatting
communication style

But they usually do not override core safety rules.

The built-in system instructions still remain active underneath.

Prompt Injection and “Ignore Previous Instructions”

You may have seen prompts online like:

Ignore previous instructions...

These are attempts to override system instructions.

This is called:

Prompt injection

Modern AI systems are specifically trained to resist many of these attempts.

Why?

Because system prompts are considered higher-priority instructions.

Without protection, users could potentially bypass important safeguards.

Why Understanding This Matters

Understanding guardrails changes how you interact with AI.

You begin to realize:

refusals are not random
assistant behavior is engineered
AI responses are shaped by hidden instructions
safety systems influence what you see

This helps you:

write better prompts
provide clearer context
interpret refusals more intelligently
understand differences between AI products

Most importantly:

It helps you stop treating AI as a neutral source of truth.

AI systems are designed products shaped by:

training
policies
human choices
and corporate priorities

Common Beginner Mistakes

Mistake 1: Thinking the AI “decides” emotionally

The AI is not offended, angry, or morally shocked.

Safety systems triggered a refusal.

Mistake 2: Assuming all AI systems follow identical rules

Different companies use different policies and system prompts.

Mistake 3: Thinking guardrails are perfect

Safety systems can:

over-refuse
under-refuse
misunderstand context

Mistake 4: Believing hidden rules mean AI is “thinking”

System prompts are instructions, not consciousness.

The AI is still processing patterns and probabilities.

Mental Model

Here is the clearest way to think about this lesson:

Pretraining

teaches the AI:

language patterns

Fine-tuning

teaches the AI:

preferred behavior

Guardrails and system prompts

control:

what behavior is allowed

Together, these systems shape nearly every AI interaction.

Practice Thinking

Think carefully about these questions:

Why might one AI refuse a request that another AI accepts?
Why are safety systems difficult to balance perfectly?
Why can harmless prompts sometimes trigger refusals?
Why are system prompts hidden from most users?
How might company values influence AI behavior?

These questions matter because they move you from:

using AI casually

to:

understanding AI critically

Key Takeaways

Guardrails are safety systems controlling AI behavior
System prompts are hidden instruction sets shaping responses
Moderation can happen before or after response generation
Different AI systems behave differently because they use different training and policies
Safety systems involve trade-offs between usefulness and protection
AI refusals are usually triggered by moderation systems, not emotions
Understanding guardrails helps you become a more effective and informed AI user

What’s Next

At this point, you now understand a major part of the modern AI pipeline:

how AI learns language patterns
how AI becomes a conversational assistant
how AI behavior is controlled through hidden rules

Together, these stages explain much of what users experience when interacting with modern AI systems.

And perhaps most importantly:

You now understand that AI behavior is not magical.

It is engineered.

Command Palette

Comments

AI Engineering

How AI actually processes your prompt under the hood

More from this blog

What Are Guardrails?

A Simple Analogy: Invisible Security Systems

Where Guardrails Operate

Input Checking

Output Checking

What Is a System Prompt?

A Simplified Example

Why Different AI Systems Feel Different

ChatGPT

Claude

Gemini

What Triggers a Refusal?

Context Matters

Why Safety Systems Are Difficult

Goal 1: Be Useful

Goal 2: Be Safe

Why AI Sometimes Refuses Reasonable Requests

Can Users Influence AI Behavior?

Prompt Injection and “Ignore Previous Instructions”

Why Understanding This Matters

Common Beginner Mistakes

Mistake 1: Thinking the AI “decides” emotionally

Mistake 2: Assuming all AI systems follow identical rules

Mistake 3: Thinking guardrails are perfect

Mistake 4: Believing hidden rules mean AI is “thinking”

Mental Model

Pretraining

Fine-tuning

Guardrails and system prompts

Practice Thinking

Key Takeaways

What’s Next