Exploring Agent Auth

Why I built this

AI agents are calling tools on behalf of real users: booking flights, querying databases, managing infrastructure. Most demos skip past a critical question. Who is the agent acting as, and what is it allowed to do?

Auth for agentic systems is all over the place. Some frameworks pass a single API key for everything. Others embed user tokens in prompts. A few forward OAuth tokens properly but skip fine-grained policy checks. I wanted to work through these approaches side by side, starting from the worst patterns I’ve seen in production and ending with ones that actually work.

The repo contains eight runnable notebooks, each implementing a different auth strategy for the same agent performing the same tasks. The only variable is how identity and authorization flow through the system.

The progression

Each pattern fixes a specific failure of the previous one. Here’s the comparison table:

#	Pattern	Where authz lives	Tool sees real user	Crypto proof	Least privilege	Audit trail
01	No auth	nowhere	no	no	no	no
02	Shared API key	service	no	no	no	partial
03	User API key in prompt	service	yes (string)	no	no	partial
04	User API key via tool arg	service	yes (string)	no	no	yes
05	OAuth token forwarding	service	yes (JWT)	yes	no	yes
06	Scoped token + OPA	OPA policy	yes (JWT)	yes	yes	yes
07	On-behalf-of flow	IdP + OPA	yes (delegated JWT)	yes	yes	yes
08	Token-bound tool credentials	IdP + OPA	yes (bound JWT)	yes	yes	yes

Pattern 01 is the “just make it work” baseline with no auth at all. By pattern 08, you have cryptographically bound, least-privilege tokens with full audit trails and policy-as-code authorization. Each step exposes exactly one new class of vulnerability that the next pattern resolves.

How it works

Every notebook uses the same setup:

Three users: Alice (admin), Bob (regular user), Dave (read-only), each with different roles in Keycloak
One agent: an OpenAI-powered agent that can call three tools (read data, write data, manage users)
Same prompts: each notebook runs the same sequence of requests, so you can compare how different auth strategies handle identical scenarios

The agent tries to act across privilege boundaries. In the early patterns, Dave (read-only) can escalate to admin-level operations because nothing is actually checking authorization. By the later patterns, each user is correctly constrained to their role, with cryptographic proof of identity flowing through every tool call.
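The check that the early patterns lack can be sketched as a deny-by-default allowlist from role to tool. This is an illustration only, not code from the repo: the role names, the tool names (`read_data`, `write_data`, `manage_users`), and both mappings are assumptions derived from the setup described above.

```python
# Hypothetical role-to-tool allowlist; names are illustrative, not from the repo.
ROLE_TOOLS: dict[str, set[str]] = {
    "admin": {"read_data", "write_data", "manage_users"},
    "user": {"read_data", "write_data"},
    "read_only": {"read_data"},
}

# Assumed role assignments matching the three users described above.
USER_ROLES: dict[str, str] = {"alice": "admin", "bob": "user", "dave": "read_only"}


def is_tool_allowed(user_id: str, tool_name: str) -> bool:
    """Return True only if the user's role explicitly grants the tool."""
    role = USER_ROLES.get(user_id)
    # Unknown users and unknown roles fall through to an empty allowlist.
    return tool_name in ROLE_TOOLS.get(role, set())
```

With a gate like this in front of every tool call, Dave's attempt to reach `manage_users` is refused no matter how the prompt is phrased, because the decision never passes through the model.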

Each notebook ends with a “What went wrong” section that names the vulnerability the next pattern fixes. Read them in order and the reasoning behind each layer of auth infrastructure becomes obvious.

The plugin architecture

The core design decision: each auth pattern is a self-contained plugin with exactly two auth files, plus the notebook that walks through it.

patterns/p01_service_credential/
├── mcp_auth.py      # How the MCP server adds auth to outbound requests
├── service_auth.py  # How the service extracts identity from inbound requests
└── notebook.ipynb   # Teaching narrative

This repeats for all eight patterns. The framework handles agent wiring, MCP plumbing, and service scaffolding. Each pattern only owns its auth logic.

The MCP side: `AuthHandler`

Every pattern’s mcp_auth.py subclasses a two-method interface:

class AuthHandler:
    async def prepare_request(self, user_context, headers):
        """Add auth credentials to outbound request headers."""
        return headers

    async def before_tool_call(self, user_context, tool_name):
        """Pre-call authorization gate. Return True to proceed."""
        return True

Pattern 1 just adds an API key header. Pattern 5 forwards the user’s JWT as a Bearer token. Pattern 6 calls Keycloak’s token exchange endpoint to narrow the audience before forwarding. The interface stays the same; only the auth logic changes.
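To make the interface concrete, here is a minimal sketch of two handlers in that style. The class names `SharedKeyHandler` and `BearerForwardHandler` are hypothetical, and the shape of `user_context` (a dict with an `access_token` key) is an assumption; the repo's actual handlers may differ.

```python
import asyncio


class AuthHandler:
    """Base interface, reproduced from the definition above."""

    async def prepare_request(self, user_context, headers):
        return headers

    async def before_tool_call(self, user_context, tool_name):
        return True


class SharedKeyHandler(AuthHandler):
    """Hypothetical: attach one service-wide API key, ignoring the user."""

    def __init__(self, api_key: str):
        self.api_key = api_key

    async def prepare_request(self, user_context, headers):
        # Build a new dict so the caller's headers are not mutated.
        return {**headers, "X-API-Key": self.api_key}


class BearerForwardHandler(AuthHandler):
    """Hypothetical: forward the user's own JWT as a Bearer token."""

    async def prepare_request(self, user_context, headers):
        # Assumes user_context is a dict carrying the token under "access_token".
        token = user_context["access_token"]
        return {**headers, "Authorization": f"Bearer {token}"}

    async def before_tool_call(self, user_context, tool_name):
        # Refuse to call any tool when there is no token to forward.
        return bool(user_context.get("access_token"))


async def demo():
    ctx = {"user_id": "alice", "access_token": "example.jwt.value"}
    shared = await SharedKeyHandler("shared_secret").prepare_request(ctx, {})
    bearer = await BearerForwardHandler().prepare_request(ctx, {})
    return shared, bearer
```

Run it with `asyncio.run(demo())` from a script, or `await demo()` in a notebook cell. The two handlers differ only in which header they set, which is the point of keeping the interface this small.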

The service side: `Identity`

Each pattern’s service_auth.py exports identity extraction functions that return a single dataclass:

from dataclasses import dataclass
from typing import Any


@dataclass
class Identity:
    method: str   # none, api_key, string_id, jwt, scoped_jwt
    user_id: str | None = None
    claims: dict[str, Any] | None = None
    raw_token: str | None = None

The framework provides reusable extractors in auth_presets.py for common patterns: API key validation, unverified JWT decoding, JWKS-validated JWT verification, OPA integration. Patterns either use these directly or customize them.

Runtime wiring

PatternRunner dynamically loads each pattern’s two files at runtime:

runner = PatternRunner("p01_service_credential")
await runner.start()
await runner.run_as("alice", "What are my expenses?")

It imports the pattern’s auth_handler and get_identity functions, then injects them into the MCP server and FastAPI service factories. No central registry. No plugin manager. Just dynamic import and dependency injection.
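The dynamic import step can be sketched with the standard library's `importlib`. The helper names `load_pattern_module` and `load_pattern` are hypothetical, and the assumption that each file exposes `auth_handler` and `get_identity` as module-level names is taken from the description above; the real `PatternRunner` may be organized differently.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_pattern_module(patterns_dir: Path, pattern: str, filename: str) -> ModuleType:
    """Import patterns_dir/<pattern>/<filename> as a standalone module."""
    path = patterns_dir / pattern / filename
    # A per-pattern module name keeps two patterns' files from colliding.
    name = f"{pattern}_{path.stem}"
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing, as the importlib documentation recommends.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


def load_pattern(patterns_dir: Path, pattern: str):
    """Return (auth_handler, get_identity) for one pattern directory."""
    mcp = load_pattern_module(patterns_dir, pattern, "mcp_auth.py")
    svc = load_pattern_module(patterns_dir, pattern, "service_auth.py")
    return mcp.auth_handler, svc.get_identity
```

The returned pair is all a runner would need to hand to the MCP server and service factories, which is why no registry or plugin manager is required.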

Why isolation over abstraction

In production you’d have a single flexible auth layer that supports multiple strategies. I deliberately didn’t do that here. Each pattern has its own auth code on both the MCP side and the service side, so you can read exactly what happens at each boundary without tracing through abstractions.

The call flow for every pattern is the same:

Agent → MCP Server → auth_handler.prepare_request() → FastAPI Service → get_identity() → filter/authorize → respond

Pattern 1’s prepare_request adds X-API-Key: shared_secret. Pattern 7’s is identical to pattern 5 (just forwards the JWT), because all the complexity moved to the service side where OPA evaluates relationship-based policies against JWT claims. You can see exactly where the auth boundary shifted by diffing two files.

What the service sees

Each notebook also prints what the service actually received. In pattern 1, the service sees Identity(method="api_key", user_id=None): it knows a valid API key was used, but has no idea who the user is. By pattern 7, it sees Identity(method="jwt", user_id="alice", claims={"role": "employee", "department": "engineering", "reports_to": "bob"}) and can enforce per-resource policies: Alice can read her own expenses, Bob can approve expenses for his reports, Dave can only read platform-wide documents.
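The per-resource rules named here can be written out as a small deny-by-default function. In the repo these decisions are made by OPA policies written in Rego; the Python below is only a stand-in to show the logic. The `authorize` function, the resource dict shape, and the `REPORTS_TO` table are all assumptions for illustration.

```python
from typing import Any

# Assumed directory data; in the text, Alice's reports_to claim names Bob.
REPORTS_TO: dict[str, str] = {"alice": "bob"}


def authorize(user_id: str, action: str, resource: dict[str, Any]) -> bool:
    """Deny by default; allow only the three cases named in the text."""
    kind = resource.get("kind")
    owner = resource.get("owner")
    if kind == "expense" and action == "read":
        # Users may read only their own expenses.
        return owner == user_id
    if kind == "expense" and action == "approve":
        # Only the owner's manager may approve.
        return owner is not None and REPORTS_TO.get(owner) == user_id
    if kind == "document" and action == "read":
        # Anyone may read platform-wide documents, and nothing narrower.
        return resource.get("scope") == "platform"
    return False
```

None of these checks is possible in pattern 1, where `user_id` is `None`: the rules all depend on knowing who the caller is.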

Tech stack

Keycloak: IdP for user auth, role assignment, and token issuance. Runs in Docker with pre-configured realms, clients, and users.
OPA: policy engine for fine-grained authz decisions. Policies in Rego, evaluating JWT claims against resource permissions.
FastAPI: the tool services the agent calls. Each endpoint validates tokens and enforces the pattern’s auth strategy.
Docker Compose: orchestrates everything (Keycloak, OPA, services) so the full stack comes up with one command.
OpenAI: LLM for the agent’s reasoning and tool-calling.
Jupyter: each pattern is a notebook with inline explanations, runnable cells, and output comparisons.

Everything runs locally. No cloud accounts needed. You just need an OpenAI API key.

Getting started

git clone https://github.com/The-CarL/exploring-agent-auth.git
cd exploring-agent-auth

# Start the infrastructure
docker compose up -d

# Install Python dependencies
uv sync

# Verify everything is running
jupyter lab
# Open and run notebook 00-verify-setup.ipynb

Once the setup notebook passes, work through 01 through 08 in order. Each one builds on the previous pattern.

What’s next

I’m writing a companion blog post for carloperottino.com covering the design decisions behind each pattern.

The repo will also get new patterns over time: RAG with auth context, multi-agent delegation chains, and hardware-bound credentials.