Skip to main content

Command Palette

Search for a command to run...

System Prompts and Guardrails in AI models

Updated
9 min read
D

Heya! 👋 I love helping people, and one of the best ways I do this is by sharing my knowledge and experiences. My journey reflects the power of growth and transformation, and I’m here to document and share it with you.

I started as a pharmacist, practicing at a tertiary hospital in the Northern Region of Ghana. There, I saw firsthand the challenges in healthcare delivery and became fascinated by how technology could offer solutions. This sparked my interest in digital health, a field I believe holds the key to revolutionizing healthcare.

Determined to contribute, I taught myself programming, mastering tools like HTML, CSS, JavaScript, React, PHP, and more. But I craved deeper knowledge and practical experience. That’s when I joined the ALX Software Engineering program, which became a turning point. Spending over 70 hours a week learning, coding, and collaborating, I transitioned fully into tech.

Today, I am a Software Engineer and Digital Health Solutions Architect, building and contributing to innovative digital health solutions. I combine my healthcare expertise with technical skills to create impactful tools that solve real-world problems in health delivery.

Imposter syndrome has been part of my journey, but I’ve learned to embrace it as a sign of growth. Livestreaming my learning process, receiving feedback, and building in public have been crucial in overcoming self-doubt. Each experience has strengthened my belief in showing up, staying consistent, and growing through challenges.

Through this platform, I document my lessons, challenges, and successes to inspire and guide others—whether you’re transitioning careers, exploring digital health, or diving into software development.

I believe in accountability and the value of shared growth. Your feedback keeps me grounded and motivated to continue this journey. Let’s connect, learn, and grow together! 🚀

At this point, you already understand two major stages in the life of an AI system.

First:

  • the model learns language patterns through pretraining

Then:

  • the model is shaped into a helpful assistant through fine-tuning and human feedback

But now we arrive at another important question.

If AI systems are trained to be helpful, then:

Why do they sometimes refuse requests?

Why does an AI sometimes respond with:

  • “I can’t help with that”

  • “I’m unable to provide those instructions”

  • or “That request violates policy”?

And why do different AI assistants sometimes respond differently to the exact same prompt?

The answer is that modern AI systems operate inside invisible control systems.

These systems include:

  • system prompts

  • safety filters

  • moderation layers

  • behavioral rules

  • and guardrails

Most users never see them directly.

But they shape almost every interaction you have with an AI assistant.

In this lesson, we are going to unpack:

  • what system prompts are

  • what guardrails do

  • where moderation happens

  • why refusals occur

  • why safety systems are difficult to balance

  • and why AI behavior is more controlled than many users realize

By the end, you should stop thinking:

“The AI just decided not to answer.”

and start understanding:

“There are invisible instruction systems shaping every response.”


What Are Guardrails?

Guardrails are safety systems placed around AI behavior.

They are designed to:

  • reduce harmful outputs

  • prevent misuse

  • enforce policies

  • and limit dangerous behavior

Your source compares them to security systems inside a building.

That analogy works well because guardrails are not there to stop the AI from functioning.

They are there to make the system safer.


A Simple Analogy: Invisible Security Systems

Imagine entering a bank.

You may not notice all the security systems immediately.

But in the background, there are:

  • cameras

  • alarms

  • locked vaults

  • access controls

  • security staff

Most of the time, you do not think about them.

But they are always active.

Guardrails work similarly inside AI systems.

Most prompts pass through normally.

But when certain requests trigger safety systems, the AI may:

  • refuse

  • redirect

  • warn the user

  • or provide a safer alternative


Where Guardrails Operate

Guardrails usually operate at two major points.

Input Checking

The system examines the user’s prompt before the AI fully processes it.

It looks for:

  • harmful intent

  • dangerous requests

  • policy violations

  • suspicious wording


Output Checking

The system may also examine the AI’s response before showing it to the user.

This helps catch:

  • unsafe instructions

  • violent content

  • harmful advice

  • privacy violations

So moderation can happen:

  • before generation

  • after generation

  • or both


What Is a System Prompt?

Now we arrive at one of the most important hidden pieces of modern AI systems.

Every AI conversation usually begins with a hidden instruction set called a:

System prompt

The user usually does not see it.

But the AI does.

The system prompt defines:

  • who the AI is

  • how it should behave

  • what it should avoid

  • how responses should be formatted

  • what policies must be followed

In many ways, the system prompt acts like an invisible instruction manual.


A Simplified Example

A real system prompt can be extremely large and complex.

But a simplified version might look like this:

You are a helpful assistant.

Answer clearly and accurately.

Avoid harmful instructions.

Do not provide illegal guidance.

Use structured formatting when useful.

The AI reads instructions like these before interacting with the user.

That means every response is shaped by hidden rules from the very beginning.


Why Different AI Systems Feel Different

This helps explain something many users notice quickly.

ChatGPT, Claude, and Gemini often respond differently to the same request.

That difference does not happen randomly.

Different companies:

  • write different system prompts

  • apply different safety priorities

  • define different behavioral goals

Your source describes this clearly.

For example:

ChatGPT

Often:

  • structured

  • concise

  • policy-oriented


Claude

Often:

  • cautious

  • explanatory

  • reflective about ethical concerns


Gemini

Often:

  • conversational

  • exploratory

  • more flexible in tone


These differences are partly created through:

  • fine-tuning

  • and system-level behavioral instructions


What Triggers a Refusal?

AI refusals are usually connected to safety systems.

Common triggers include:

  • illegal activity

  • harmful instructions

  • hate speech

  • self-harm content

  • privacy violations

  • attempts to bypass safety rules

For example:

  • instructions for violence

  • hacking guidance

  • fraud assistance

  • dangerous chemical instructions

may trigger guardrails automatically.


Context Matters

This part is important.

Modern AI systems increasingly try to evaluate context, not just keywords.

For example:

How do bombs work?

could mean:

  • a history student studying warfare

  • a chemistry discussion

  • or harmful intent

The surrounding context changes how the system interprets the request.

This is one reason why wording matters when interacting with AI.


Why Safety Systems Are Difficult

At first, guardrails may sound simple.

But in practice, they are extremely difficult to design well.

Because AI companies are trying to balance two competing goals:

Goal 1: Be Useful

Users want helpful, flexible AI systems.


Goal 2: Be Safe

Companies want to reduce harmful outputs.


These goals sometimes conflict.

If guardrails are too strict:

  • harmless requests may get blocked

This is called:

Over-refusal


If guardrails are too weak:

  • dangerous outputs may slip through

This is often considered the larger risk.

So companies constantly adjust this balance.

Your source explains this as a real trade-off in AI safety systems.


Why AI Sometimes Refuses Reasonable Requests

This is something many users experience.

Sometimes an AI refuses a perfectly reasonable question.

Why?

Because safety systems are imperfect.

The model may:

  • misunderstand intent

  • misinterpret wording

  • detect risky patterns incorrectly

Remember:

  • guardrails are also AI systems

  • moderation systems also rely on prediction and classification

So they can make mistakes too.


Can Users Influence AI Behavior?

Yes, to some extent.

Many platforms allow:

  • custom instructions

  • project instructions

  • behavioral preferences

These can influence:

  • tone

  • formatting

  • communication style

But they usually do not override core safety rules.

The built-in system instructions still remain active underneath.


Prompt Injection and “Ignore Previous Instructions”

You may have seen prompts online like:

Ignore previous instructions...

These are attempts to override system instructions.

This is called:

Prompt injection

Modern AI systems are specifically trained to resist many of these attempts.

Why?

Because system prompts are considered higher-priority instructions.

Without protection, users could potentially bypass important safeguards.


Why Understanding This Matters

Understanding guardrails changes how you interact with AI.

You begin to realize:

  • refusals are not random

  • assistant behavior is engineered

  • AI responses are shaped by hidden instructions

  • safety systems influence what you see

This helps you:

  • write better prompts

  • provide clearer context

  • interpret refusals more intelligently

  • understand differences between AI products

Most importantly:

It helps you stop treating AI as a neutral source of truth.

AI systems are designed products shaped by:

  • training

  • policies

  • human choices

  • and corporate priorities


Common Beginner Mistakes

Mistake 1: Thinking the AI “decides” emotionally

The AI is not offended, angry, or morally shocked.

Safety systems triggered a refusal.


Mistake 2: Assuming all AI systems follow identical rules

Different companies use different policies and system prompts.


Mistake 3: Thinking guardrails are perfect

Safety systems can:

  • over-refuse

  • under-refuse

  • misunderstand context


Mistake 4: Believing hidden rules mean AI is “thinking”

System prompts are instructions, not consciousness.

The AI is still processing patterns and probabilities.


Mental Model

Here is the clearest way to think about this lesson:

Pretraining

teaches the AI:

language patterns

Fine-tuning

teaches the AI:

preferred behavior

Guardrails and system prompts

control:

what behavior is allowed

Together, these systems shape nearly every AI interaction.


Practice Thinking

Think carefully about these questions:

  1. Why might one AI refuse a request that another AI accepts?

  2. Why are safety systems difficult to balance perfectly?

  3. Why can harmless prompts sometimes trigger refusals?

  4. Why are system prompts hidden from most users?

  5. How might company values influence AI behavior?

These questions matter because they move you from:

  • using AI casually

to:

  • understanding AI critically

Key Takeaways

  • Guardrails are safety systems controlling AI behavior

  • System prompts are hidden instruction sets shaping responses

  • Moderation can happen before or after response generation

  • Different AI systems behave differently because they use different training and policies

  • Safety systems involve trade-offs between usefulness and protection

  • AI refusals are usually triggered by moderation systems, not emotions

  • Understanding guardrails helps you become a more effective and informed AI user


What’s Next

At this point, you now understand a major part of the modern AI pipeline:

  • how AI learns language patterns

  • how AI becomes a conversational assistant

  • how AI behavior is controlled through hidden rules

Together, these stages explain much of what users experience when interacting with modern AI systems.

And perhaps most importantly:

You now understand that AI behavior is not magical.

It is engineered.