
The 12-Factor Agent: Building Reliable LLM Applications Without the Magic

10 min read · By Brandon J. Redmond
AI Agents · LLM Applications · Software Architecture · Production AI · Agent Design Patterns


We've all been there. You spin up a new agent, wire it to your favorite framework, and within hours you're at 70-80% functionality. The CEO is excited. The team doubles. And then... reality hits.

That last 20% becomes a debugging nightmare. You're seven layers deep in a call stack, trying to reverse-engineer how prompts get built and why your agent keeps calling the wrong API in an infinite loop. Eventually, you either throw it all away and start from scratch, or worse—you realize this wasn't even a good problem for agents in the first place.

The DevOps Agent That Wasn't

Personal confession time: My first agent was a DevOps automation bot. "Here's my Makefile," I told it. "Go build the project." Two hours later, after adding increasingly specific instructions to the prompt, I had essentially written a bash script in English. The realization? I could have written the actual bash script in 90 seconds.

🎯 The Problem with Agent Development Today

After talking to 100+ founders, builders, and engineers about their agent development experiences, I noticed something striking: most production agents aren't that agentic at all. They're mostly deterministic software with small, carefully controlled LLM interactions.

The teams succeeding weren't doing greenfield rewrites. They were applying small, modular patterns—patterns that didn't have names or definitions—to their existing code. And here's the kicker: you don't need an AI background to implement these patterns. This is software engineering, plain and simple.

📋 Introducing the 12 Factors

Just as Heroku's 12-factor app defined what it meant to build cloud-native applications over a decade ago, we need a similar framework for agent development. These 12 factors represent the core patterns I've seen working in production across dozens of successful agent implementations.

Factor 1: JSON Extraction is Your Superpower 🔄

The most magical thing LLMs can do has nothing to do with loops, tools, or complex orchestration. It's this:
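
In Python, with a hypothetical call_llm wrapper standing in for whatever model client you use and an intent schema invented for illustration, it looks roughly like this:

```python
import json

def call_llm(prompt: str) -> str:
    """Wrap whatever model client you use here (OpenAI, Anthropic, a local model...)."""
    raise NotImplementedError

def extract_intent(user_message: str) -> dict:
    """Turn a natural-language request into JSON the rest of your code can route on."""
    prompt = (
        "Convert the user's request into JSON with the keys "
        '"intent" (string) and "args" (object). Respond with JSON only.\n\n'
        f"User request: {user_message}"
    )
    return json.loads(call_llm(prompt))

# "Deploy backend 1.2.3 to production" might come back as:
# {"intent": "deploy", "args": {"service": "backend", "version": "1.2.3", "env": "production"}}
```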

That's it. That's the magic. Everything else—what you do with that JSON—is just regular software engineering.

Factor 2: Own Your Prompts 📝

Those prompt abstractions that promise to handle everything for you? They'll get you to 80%. But when you need that last 20% of quality, you'll end up writing every token by hand.

Why this matters: LLMs are pure functions. Tokens in, tokens out. The only thing determining your agent's reliability is the quality of tokens you get out, and the only thing determining that (besides the model itself) is being careful about what tokens you put in.
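
Owning your prompts can be as unglamorous as a plain template function that lives in your repo. A sketch, with the deployment scenario and field names made up for illustration:

```python
def build_deploy_prompt(pr_title: str, changed_files: list[str], test_summary: str) -> str:
    """Every token is hand-written, versioned, and diffable like any other code."""
    files = "\n".join(f"- {name}" for name in changed_files)
    return f"""You are the deployment assistant for our backend services.

Pull request: {pr_title}
Changed files:
{files}
Test results: {test_summary}

Decide the next step and respond with JSON only:
{{"next_step": "deploy" | "wait_for_human" | "abort", "reason": "<one sentence>"}}"""
```

When quality plateaus, you tune this string directly instead of fighting an abstraction for control of it.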

Factor 3: Context Windows Need Active Management 🪟

The naive approach: Keep appending to the context window until the LLM says it's done. The problem? This breaks down fast with longer workflows.
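
One way to manage it actively: keep recent events verbatim and compress everything older. A rough sketch, again assuming a hypothetical call_llm helper:

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError  # wrap your model client here

def build_context(events: list[str], max_recent: int = 10) -> str:
    """Keep recent events verbatim and compress everything older into a summary."""
    if len(events) <= max_recent:
        return "\n".join(events)
    older, recent = events[:-max_recent], events[-max_recent:]
    summary = call_llm(
        "Summarize these earlier workflow events in a few bullet points:\n" + "\n".join(older)
    )
    return (
        "Earlier events (summarized):\n" + summary
        + "\n\nRecent events:\n" + "\n".join(recent)
    )
```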

Factor 4: "Tool Use" is Harmful (The Abstraction, Not the Concept) 🔧

Controversial Take Alert

The term "tool use" makes us think of agents as magical entities interacting with the world. In reality, it's just JSON going into a switch statement.
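
For example, a deploy "tool" might be nothing more than this, where the helpers are stand-ins for your real code and Python's match plays the role of the switch:

```python
def deploy(service: str, version: str) -> str:
    return f"deployed {service} {version}"   # your real deploy tooling goes here

def list_git_tags() -> str:
    return "v1.2.3, v1.2.2"                  # e.g. shell out to `git tag`

def dispatch(step: dict) -> str:
    """The 'tool call' is just JSON routed through a switch; there's no magic in between."""
    match step["intent"]:
        case "deploy":
            return deploy(**step["args"])
        case "list_git_tags":
            return list_git_tags()
        case _:
            raise ValueError(f"unknown intent: {step['intent']}")
```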

Factor 5: Small, Focused Agents Beat Monoliths 🎯

The pattern that actually works in production: mostly deterministic workflows with small agent loops handling specific decisions.

Real example from production: Our deployment bot at HumanLayer is 90% deterministic code. The agent only handles:

  1. Deciding deployment order based on PR content
  2. Formatting notifications for human approval
  3. Handling rollback decisions if tests fail

Each micro-agent handles 3-10 steps max. Manageable context. Clear responsibilities. Actually debuggable.
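
A sketch of that shape, with the helpers reduced to placeholders and call_llm wrapping whatever model client you use: the pipeline is ordinary code, and the model only gets one narrowly scoped decision.

```python
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # wrap your model client here

def run_tests() -> None:
    print("tests passed")  # your real CI call goes here

def deploy(service: str) -> None:
    print(f"deployed {service}")  # your real deploy tooling goes here

def notify_team(order: list[str]) -> None:
    print(f"shipped in order: {order}")  # Slack, email, etc.

def decide_deploy_order(pr_description: str, services: list[str]) -> list[str]:
    """The one LLM decision in the pipeline: which service ships first."""
    prompt = (
        f"Given this PR description:\n{pr_description}\n\n"
        f"Order these services for deployment and reply with a JSON array only: {services}"
    )
    return json.loads(call_llm(prompt))

def release(pr_description: str, services: list[str]) -> None:
    run_tests()                                             # deterministic
    order = decide_deploy_order(pr_description, services)   # small agent decision
    for service in order:
        deploy(service)                                     # deterministic
    notify_team(order)                                      # deterministic
```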

Factor 6: Own Your Control Flow 🔄

Stop letting frameworks hide your control flow. An agent is just:

  • A prompt (instructions for selecting next step)
  • A switch statement (routing JSON to code)
  • A context builder (managing what the LLM sees)
  • A loop with exit conditions
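
Put those four pieces together and the whole thing fits on one screen. A minimal sketch, with call_llm standing in for your model client and the intent schema invented for illustration:

```python
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # wrap your model client here

def build_context(events: list[dict]) -> str:               # the context builder
    return "\n".join(json.dumps(event) for event in events)

def execute(step: dict) -> str:                              # the switch statement
    if step["intent"] == "done":
        return step["args"].get("summary", "done")
    return f"ran {step['intent']} with {step['args']}"       # route to your real code here

def run_agent(task: str, max_steps: int = 10) -> str:        # the loop with exit conditions
    events: list[dict] = [{"type": "task", "content": task}]
    for _ in range(max_steps):
        prompt = (                                           # the prompt that picks the next step
            "You are a small workflow agent. Given the events so far, reply with JSON only: "
            '{"intent": "...", "args": {...}}. Use intent "done" when you are finished.\n\n'
            + build_context(events)
        )
        step = json.loads(call_llm(prompt))
        result = execute(step)
        if step["intent"] == "done":
            return result
        events.append({"type": "result", "content": result})
    raise RuntimeError("agent did not finish within max_steps")
```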

Factor 7: Agents Should Be Stateless 🏗️

Your agent shouldn't manage state—your application should. This enables pause/resume, better testing, and actual production reliability.
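
In practice that means exposing a step function: state in, new state out, and the application decides where the state lives and when the next step runs. A sketch under those assumptions:

```python
import json
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # wrap your model client here

@dataclass
class AgentState:
    """Owned by the application, not the agent: serialize it anywhere (DB, queue, file)."""
    task: str
    events: list[dict] = field(default_factory=list)
    done: bool = False

def agent_step(state: AgentState) -> AgentState:
    """State in, new state out. Pausing is just 'stop calling this function for a while'."""
    prompt = (
        'Reply with JSON only: {"intent": "...", "args": {...}}.\n'
        f"Task: {state.task}\nEvents so far: {json.dumps(state.events)}"
    )
    step = json.loads(call_llm(prompt))
    new_events = state.events + [{"type": "step", "content": step}]
    return AgentState(task=state.task, events=new_events, done=step["intent"] == "done")
```

Because the state is plain data, pause/resume is "persist it now, load it later," and a test is "construct a state, assert on the next one."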

Factor 8: Contact Humans as First-Class Operations 👥

Don't treat human interaction as an afterthought. Make it a first-class part of your agent's vocabulary.
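
One way to do that is to make asking a human just another intent the model can emit, handled by the same switch as everything else. The notification function below is a placeholder for your real channel:

```python
def request_human_approval(question: str) -> None:
    """Send the question wherever your team already works (Slack, email, a ticket system)."""
    print(f"[needs human] {question}")   # swap in your real notification channel

def execute(step: dict) -> str:
    match step["intent"]:
        case "request_human_approval":
            request_human_approval(step["args"]["question"])
            return "waiting_on_human"    # the app pauses here and resumes when the reply arrives
        case "deploy":
            return f"deployed {step['args']['service']}"
        case _:
            raise ValueError(step["intent"])
```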

Factor 9: Meet Users Where They Are 📍

Nobody wants seven browser tabs open for different agents. Let them interact where they already work.

Factor 10: Explicit Error Handling 🚨

Don't blindly append errors to context. Process them intelligently.
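
In practice that can be as simple as compressing the error, counting consecutive failures, and escalating to a human instead of looping forever. A sketch of that idea:

```python
def summarize_error(exc: Exception) -> str:
    """Keep the signal, drop the 400-line stack trace before it reaches the context window."""
    return f"{type(exc).__name__}: {str(exc)[:200]}"

def run_step(step, execute, events, consecutive_errors: int = 0, max_errors: int = 3) -> int:
    """Returns the updated consecutive-error count; the caller decides what to do with it."""
    try:
        result = execute(step)
        events.append({"type": "result", "content": result})
        return 0                                            # success resets the counter
    except Exception as exc:
        events.append({"type": "error", "content": summarize_error(exc)})
        if consecutive_errors + 1 >= max_errors:
            events.append({"type": "escalation", "content": "asking a human for help"})
        return consecutive_errors + 1
```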

Factor 11: Separate Business State from Execution State 💼

Your agent needs to track two different types of state:

  • Execution state: where the loop is right now (current step, retry counts, whether it's waiting on a human)
  • Business state: what the workflow has actually produced (the messages, approvals, and results your users care about)
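
Keeping them separate can be as simple as two plain data structures you can persist, inspect, and reset independently (the field names here are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class ExecutionState:
    """How far the loop has gotten; only the agent runtime cares about this."""
    current_step: str = "start"
    retries: int = 0
    waiting_on_human: bool = False

@dataclass
class BusinessState:
    """What the workflow has actually produced; your product and your users care about this."""
    messages: list[dict] = field(default_factory=list)
    approvals: list[str] = field(default_factory=list)
    results: list[dict] = field(default_factory=list)
```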

Factor 12: Find the Bleeding Edge 🚀

The NotebookLM Principle

"Find something that is right at the boundary of what the model can do reliably—that it can't get right all the time—and figure out how to get it right reliably anyway. You will have created something magical."

This means:

  • Push the model to its limits
  • Engineer reliability into your system
  • Create value that others can't easily replicate

🛠️ Putting It All Together: A Production Example

Here's how these factors combine in a real customer service agent:
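
The talk walks through its own demo; the sketch below is an illustrative stand-in that shows how the factors can line up in one file, with call_llm and the support intents invented for the example:

```python
import json
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # wrap your model client here

@dataclass
class TicketState:                           # Factors 7 and 11: state lives in the app, not the agent
    ticket: str
    events: list[dict] = field(default_factory=list)

def build_prompt(state: TicketState) -> str:                 # Factor 2: every token hand-written
    return (
        "You are a customer support workflow agent.\n"
        f"Ticket: {state.ticket}\n"
        f"Events so far: {json.dumps(state.events[-10:])}\n"  # Factor 3: bounded context
        'Reply with JSON only: {"intent": "lookup_order" | "refund" | '
        '"ask_human_approval" | "reply_to_customer" | "done", "args": {...}}'
    )

def execute(step: dict) -> str:                               # Factors 4 and 6: JSON into a switch
    match step["intent"]:
        case "lookup_order":
            return "order 1234: shipped yesterday"            # call your order API here
        case "refund":
            return "refund requires approval"                 # deterministic guardrail
        case "ask_human_approval":
            return "waiting_on_human"                         # Factor 8: humans are first-class
        case "reply_to_customer":
            return f"sent: {step['args']['message']}"
        case "done":
            return step["args"].get("summary", "done")
        case _:
            raise ValueError(step["intent"])

def handle_ticket(state: TicketState, max_steps: int = 8) -> TicketState:  # Factor 6: own the loop
    for _ in range(max_steps):
        step = json.loads(call_llm(build_prompt(state)))      # Factor 1: JSON extraction
        result = execute(step)
        state.events.append({"step": step, "result": result})
        if step["intent"] == "done" or result == "waiting_on_human":
            break
    return state
```

Every factor shows up as ordinary code you can read, step through, and test.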

🎬 The Path Forward

Agents are software. You already know how to build software. The magic isn't in the frameworks—it's in thoughtful application of engineering principles to a new domain.

Start small. Pick one factor that resonates with your current challenges. Implement it. See the improvement. Then add another.

Most importantly, remember: the teams succeeding with agents aren't the ones with the most complex frameworks. They're the ones who understand that agents are just software, and software engineering principles still apply.

📚 Resources and Next Steps

Ready to Build Better Agents?

→ Check out the 12-Factor Agents GitHub repo for detailed documentation

→ Join the discussion about production agent patterns

→ Start with Factor 1 (JSON extraction) and Factor 2 (own your prompts)

→ Measure your improvements—track reliability metrics before and after

Remember: Every production agent started as an impressive demo that hit the 70% wall. The difference between the demos and the production systems? The builders who pushed through understood these patterns.

Now it's your turn. Go build something that actually works.

📺 Watch the Original Talk

This post is based on Dex Horthy's excellent presentation on 12-Factor Agents. Watch the full talk for additional insights, live demos, and Q&A with the audience:

12-Factor Agents: Patterns of Reliable LLM Applications

Dex Horthy from HumanLayer shares patterns from interviewing 100+ builders about what makes production agents actually work.

Watch on YouTube →