
NeMo Guardrails Rail Types Explained: The Five Layers of LLM Defense

March 28, 2026 · Engineering Team · Technical Deep-Dive

NVIDIA's NeMo Guardrails doesn't treat safety as a single checkpoint. It decomposes guardrail enforcement into five distinct rail types, each operating at a different point in the LLM interaction pipeline. Understanding where each rail fires — and what it can and cannot catch — is the difference between a system that looks safe in demos and one that holds up in production.

The Five Rail Types

Every interaction with an LLM-powered application follows a pipeline: user input arrives, context may be retrieved, the model reasons and responds, and optionally takes actions. NeMo Guardrails places enforcement points at each stage:

| Rail Type | When It Fires | What It Inspects | Primary Purpose |
| --- | --- | --- | --- |
| Input | Before the LLM | User message | Content safety, jailbreak detection, topic control, PII masking |
| Retrieval | During the RAG pipeline | Retrieved documents/chunks | Document filtering, chunk validation, relevance checking |
| Dialog | During conversation flow | Conversation state | Flow control, guided conversations, topic steering |
| Execution | When the agent calls tools | Action inputs/outputs | Tool call validation, action parameter checking |
| Output | After the LLM responds | Generated response | Response filtering, fact checking, sensitive data removal |

Input Rails: The First Line of Defense

Input rails evaluate the user's message before it reaches the LLM. They are the most commonly deployed rail type and handle the highest volume of threats.

What Input Rails Catch

  • Jailbreak attempts — Prompts designed to make the model ignore its system instructions ("Ignore all previous instructions and...")
  • Toxic or harmful content — Hate speech, threats, explicit content
  • Off-topic requests — Questions outside the application's intended scope
  • PII in user input — Social security numbers, credit cards, phone numbers that should be masked before the model sees them
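The checks above can be sketched as a small pre-model filter. This is an illustrative sketch, not NeMo's built-in detectors: the jailbreak phrases, PII regexes, and mask format are all assumptions for the example, and production systems would use trained classifiers instead of keyword lists.

```python
import re

# Illustrative jailbreak phrases; a real deployment would use a trained classifier.
JAILBREAK_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"pretend you have no restrictions", re.IGNORECASE),
]

# Simple PII patterns (US SSN, 16-digit card numbers) -- assumptions for this sketch.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def input_rail(message: str) -> tuple[bool, str]:
    """Return (allowed, sanitized_message): block jailbreaks, mask PII."""
    for pattern in JAILBREAK_PATTERNS:
        if pattern.search(message):
            return False, ""          # reject before the model ever sees it
    for label, pattern in PII_PATTERNS.items():
        message = pattern.sub(f"[{label} REDACTED]", message)
    return True, message
```

For example, `input_rail("My SSN is 123-45-6789")` returns the message with the SSN replaced by `[SSN REDACTED]`, so the model only ever sees the masked text.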

Colang Implementation Pattern

define user ask about unrelated topic
  "What's the weather like?"
  "Can you write me a poem?"
  "Tell me a joke"

define bot refuse and redirect
  "I'm designed to help with [your domain]. How can I assist you with that?"

define flow
  user ask about unrelated topic
  bot refuse and redirect

Pattern- and classifier-based input rails are fast because they operate on a single short message; input rails that run an LLM self-check cost more. Most NeMo deployments start here.

What Input Rails Cannot Catch

Input rails see the user message in isolation. They cannot detect:

  • Multi-turn manipulation (requires dialog rails)
  • Problems with retrieved context (requires retrieval rails)
  • Hallucinations in the model response (requires output rails)

Retrieval Rails: Defending the RAG Pipeline

Retrieval-Augmented Generation (RAG) is the most common pattern for grounding LLMs in organizational data. Retrieval rails operate between the vector search and the model prompt, validating the documents that will become the model's context.

Why Retrieved Documents Need Guardrails

  • Poisoned data — If an attacker can inject content into your document store, the model will treat it as authoritative
  • Stale information — Outdated documents can lead the model to give incorrect answers
  • Irrelevant chunks — Low-relevance retrieved documents dilute response quality and can confuse the model
  • Cross-domain leakage — In multi-tenant RAG systems, retrieval rails ensure documents from one tenant don't leak into another's context

Practical Example

A retrieval rail might enforce: "Retrieved chunks must have a relevance score above 0.8, must have been updated within the last 12 months, and must not contain content tagged as 'internal-only' when the user is external."

This is especially important in enterprise RAG where the document corpus may contain mixed-sensitivity content.
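That policy can be expressed as a predicate over retrieved chunks. The field names below (`score`, `updated`, `tags`) are assumptions about the chunk metadata schema for this sketch, not a NeMo API, and 12 months is approximated as 365 days.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Chunk:
    text: str
    score: float                      # retriever relevance score
    updated: datetime                 # last modification time
    tags: set[str] = field(default_factory=set)

def retrieval_rail(chunks: list[Chunk], user_is_external: bool,
                   now: datetime) -> list[Chunk]:
    """Keep only the chunks that satisfy the retrieval policy."""
    max_age = timedelta(days=365)     # "last 12 months", approximated
    kept = []
    for chunk in chunks:
        if chunk.score <= 0.8:
            continue                  # below relevance threshold
        if now - chunk.updated > max_age:
            continue                  # stale document
        if user_is_external and "internal-only" in chunk.tags:
            continue                  # sensitivity gate
        kept.append(chunk)
    return kept
```

Everything the rail drops simply never enters the model's context, which is the point: the model cannot be misled by a chunk it never sees.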

Dialog Rails: Controlling Conversation Flow

Dialog rails operate at the conversation level, not individual messages. They define permissible conversation flows and detect when an interaction is veering off the intended path.

When Dialog Rails Matter Most

  • Multi-step workflows — Guiding users through structured processes (e.g., loan applications, insurance claims) where the model must follow a specific sequence
  • Escalation detection — Recognizing when a conversation should be handed off to a human agent
  • Multi-turn jailbreaks — Attacks that spread manipulation across several messages, incrementally shifting the model's behavior
  • Guided conversations — Ensuring the AI collects required information before providing a recommendation

Colang Flow Definition

define bot explain steps required
  "I need to collect some details first before I can process your claim."

define flow insurance claim
  # Step 1: Collect incident details
  bot ask for incident date
  user provide incident date

  # Step 2: Verify coverage
  bot confirm coverage type
  user confirm coverage

  # Step 3: Process
  bot provide claim reference number

define flow
  # If the user tries to skip steps:
  user ask for claim number
  bot explain steps required

Dialog rails are the most complex to implement because they require modeling expected conversation flows. But they catch threats that no single-message rail can detect.
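Outside Colang, the same step-gating idea can be sketched as a small state machine. The step names mirror the claim flow above and are purely illustrative; the sketch only shows the core property, that a later step is refused until every earlier step has completed.

```python
# Required order of the claim workflow; a dialog rail refuses to jump ahead.
CLAIM_STEPS = ["incident_date", "coverage_confirmed", "claim_number_issued"]

class ClaimDialog:
    def __init__(self) -> None:
        self.completed: list[str] = []

    def request(self, step: str) -> str:
        """Allow a step only when all earlier steps are complete."""
        idx = CLAIM_STEPS.index(step)
        if self.completed != CLAIM_STEPS[:idx]:
            return "I need to collect some details first before I can process your claim."
        self.completed.append(step)
        return f"ok: {step}"
```

A user who asks for the claim number on turn one gets the redirect; the same request after the earlier steps succeeds. That per-conversation state is exactly what single-message rails lack.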

Execution Rails: Securing Agent Actions

As LLMs evolve from chatbots to agents that can call APIs, query databases, and modify state, execution rails become the critical safety layer. They validate action inputs before execution and inspect outputs before they are used.

What Execution Rails Enforce

  • Action allowlisting — Only pre-approved tools can be called. An LLM cannot decide to call an arbitrary endpoint
  • Parameter validation — Check that tool call arguments are within acceptable bounds (e.g., a query limit that prevents SELECT * on a million-row table)
  • Output sanitization — Validate tool responses before they re-enter the LLM context (prevents injection via tool outputs)
  • Rate limiting and cost controls — Prevent agents from making excessive API calls or triggering expensive operations
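The first two controls above can be sketched as a gate in front of the tool dispatcher. The tool names and the parameter bounds are assumptions invented for this example, not part of NeMo's API.

```python
# Allowlisted tools and per-parameter bounds; names are illustrative assumptions.
ALLOWED_TOOLS = {
    "search_orders": {"limit": range(1, 101)},   # cap result-set size
    "get_weather": {},
}

def execution_rail(tool: str, params: dict) -> None:
    """Raise if the proposed tool call violates the execution policy."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allowlisted: {tool}")
    for name, bounds in ALLOWED_TOOLS[tool].items():
        if name in params and params[name] not in bounds:
            raise ValueError(f"parameter out of bounds: {name}={params[name]}")
```

Because the rail raises before dispatch, a prompt-injected request for an unlisted tool, or for `search_orders` with an absurd `limit`, never reaches the backend at all.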

The Agentic Security Problem

Without execution rails, a common attack pattern is:

  1. Attacker manipulates the model (via prompt injection) to call an internal tool
  2. The tool returns sensitive data
  3. The model includes that data in its response to the attacker

Execution rails break this chain by validating both the call intent and the returned data. This is NVIDIA's answer to the growing concern around agentic AI security.
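The "returned data" half of that defense can be sketched as a scan over tool output before it re-enters the context. The marker phrases here are crude heuristic assumptions; real systems use stronger injection detectors, but the shape is the same: suspicious tool output is withheld, not forwarded.

```python
import re

# Heuristic injection markers in tool output; assumptions for this sketch.
INJECTION_MARKERS = re.compile(
    r"(ignore (all )?previous instructions|you are now|system prompt)",
    re.IGNORECASE,
)

def sanitize_tool_output(output: str) -> str:
    """Withhold instruction-like tool results before the LLM sees them."""
    if INJECTION_MARKERS.search(output):
        # Replace rather than forward: the model never sees the injected text.
        return "[tool output withheld: possible prompt injection]"
    return output
```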

Output Rails: The Final Checkpoint

Output rails inspect the model's generated response before it reaches the user. They are the last line of defense and catch issues that originate inside the model itself.

What Output Rails Catch

  • Hallucinated facts — Cross-reference claims against a knowledge base or fact-checking system
  • Leaked system prompts — Detect when the model is revealing its own instructions
  • PII in responses — The LLM might generate or echo sensitive data even if input rails didn't detect it
  • Toxic or biased content — The model may generate problematic content even from a clean prompt
  • Format violations — Ensure responses conform to expected structure (JSON schema, required fields, etc.)
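Two of the cheaper checks above, prompt-leak detection and format validation, can be sketched as a post-model filter. The leak markers are illustrative assumptions; fact-checking and toxicity classification would sit behind the same interface but typically need a second model call.

```python
import json

# Illustrative leak markers; a real rail would use a tuned detector.
LEAK_MARKERS = ("system prompt", "my instructions are")

def output_rail(response: str, require_json: bool = False) -> tuple[bool, str]:
    """Return (allowed, reason) for a generated response."""
    lowered = response.lower()
    for marker in LEAK_MARKERS:
        if marker in lowered:
            return False, "possible system prompt leak"
    if require_json:
        try:
            json.loads(response)      # structural check against expected format
        except ValueError:
            return False, "response is not valid JSON"
    return True, "ok"
```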

Output Rails Are Expensive

Unlike input rails that operate on a short user message, output rails must process the model's full response. In some implementations, they invoke a second LLM call for fact-checking or content classification. This adds latency and cost. Design output rails to be selective — apply heavy checks only when the stakes justify it.
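One way to make output rails selective is to gate the expensive check on a cheap trigger, for instance only fact-checking responses that make numeric or citation-like claims. The trigger heuristic below is an assumption for this sketch, and `expensive_fact_check` is a hypothetical hook standing in for a second LLM call.

```python
import re

# Cheap trigger: numeric or sourced claims warrant the costly check.
NEEDS_FACT_CHECK = re.compile(r"\d|according to|studies show", re.IGNORECASE)

def checked_response(response: str, expensive_fact_check) -> str:
    """Run the costly check only when the cheap trigger fires."""
    if NEEDS_FACT_CHECK.search(response):
        return expensive_fact_check(response)
    return response
```

Responses with no factual surface skip the second call entirely, so the 100ms-2s cost is paid only where it buys something.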

Use Case × Rail Type Matrix

Not every application needs all five rail types. Here is which rails matter most for common use cases:

| Use Case | Input | Retrieval | Dialog | Execution | Output |
| --- | --- | --- | --- | --- | --- |
| Content Safety | Yes | | | | Yes |
| Jailbreak Protection | Yes | | | | |
| Topic Control | Yes | | Yes | | |
| PII Detection | Yes | Yes | | | Yes |
| RAG / Knowledge Base | | Yes | | | Yes |
| Agentic Security | | | | Yes | |
| Custom / Enterprise | Yes | Yes | Yes | Yes | Yes |

Deployment Considerations

Library vs. Microservice

NeMo Guardrails can run as an in-process Python library or as a standalone microservice. The rail configurations are portable between the two modes:

  • Library mode — Lower latency, simpler deployment. Good for single-service architectures
  • Microservice mode — Independent scaling, centralized policy management, shared across multiple applications. Better for enterprise environments with multiple LLM services

Performance Impact by Rail Type

  • Input rails — Typically 10-50ms (pattern matching and classification, no LLM call needed)
  • Retrieval rails — 5-20ms (metadata checks on already-retrieved documents)
  • Dialog rails — 20-100ms (conversation state evaluation)
  • Execution rails — 10-50ms (parameter validation, allowlist checks)
  • Output rails — 100ms-2s (may require LLM-based fact-checking, depending on configuration)

Start Simple, Layer Up

Most teams should start with input and output rails. Add retrieval rails when you implement RAG. Add dialog rails when you have multi-step workflows. Add execution rails when you give your LLM tool access. Deploying all five from day one without justification adds complexity and latency you may not need.

Building a Defense-in-Depth Stack

NeMo's five rail types are a taxonomy, not a religion. In practice, many organizations combine NeMo with other tools from the Open AI Guardrails Registry:

  • NeMo for conversation-level policy enforcement (dialog and topic control)
  • LLM Guard for high-speed PII scanning (input and output filtering)
  • Hexarch Guardrails for budget and rate enforcement (execution-level controls)
  • Custom rules for domain-specific compliance requirements

The key insight is that each rail type solves a different problem in the pipeline. No single tool covers all five. The best production systems compose multiple frameworks, each handling the layer it does best.

References

  • NVIDIA NeMo Guardrails Documentation — Guardrail Types: Input, Output, Retrieval, Dialog, Execution Rails
  • NVIDIA NeMo Guardrails — Use Cases: Content Safety, Jailbreak Protection, Topic Control, PII Detection, Agentic Security