Choosing AI Agent Architecture for Enterprise Systems: Shallow vs ReAct vs Deep
Last Updated on February 3, 2026 by Editorial Team
Author(s): Mandar Panse
Originally published on Towards AI.
Understanding different execution patterns in modern LLM-powered agents
Important note: These aren’t “types of AI agents” in the classical sense (like reflex agents, goal-based agents, etc.). Instead, these are architectural patterns — different ways of implementing how LLM-powered agents process and respond to requests. Think of them as execution strategies rather than fundamental agent categories.
Everyone’s talking about AI agents right now. But here’s what nobody tells you: LLM-based agents don’t all work the same way.
There are basically three execution patterns, and picking the wrong one will either blow your budget, piss off your users with slow responses, or give you garbage outputs. Sometimes all three.
So let’s talk about Shallow Processing, ReAct (Reasoning + Acting), and Deep Reasoning — the three main ways LLM agents can be architected. Not the marketing BS version — the version that matters when you’re actually building something.
Pattern #1: Shallow Processing — The Speed Demons
Shallow agents are dead simple. You give them input, they spit out an answer. That’s it.
These agents can still have RAG integrated into them: retrieve once, then generate.
No iteration. No “hmm, let me reconsider that.” Just straight through processing.
The flow looks like this: input → model → output.

That’s literally it. One pass through the model and you’re done.
When do we actually use these?
Honestly? More than you’d think. When you need to:
- Generate a quick email response
- Classify customer support tickets
- Summarize meeting notes
- Answer straightforward questions
They’re fast (typically 1–2 seconds), they’re cheap, and for most everyday use cases (I’d put it around 70%) they work fine.
Where they fall apart:
They can’t call tools on their own. Beyond a single retrieval step, they can’t look stuff up. And they can’t verify their own answers.
If the model hallucinates, you’re getting that hallucination delivered with complete confidence. I’ve seen these confidently tell users their account balance is $5,000 when it’s actually $50. Not great.
They also can’t break down complex problems. Ask a shallow agent to “research market trends and create a competitive analysis” and you’ll get… something. Will it be good? Probably not. Will it be based on actual current data? Definitely not.
The code is boring:
import os

from groq import Groq

# Read the API key from the environment instead of hard-coding it
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

# One-shot response: a single model call
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
)
print(response.choices[0].message.content)
# That's it. No loops, no tools, just input → output
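A shallow agent with RAG is still a single pass: retrieve once, put the context in the prompt, generate once. Here is a minimal sketch of that shape, assuming a toy in-memory keyword retriever. `DOCS`, `retrieve`, and `build_messages` are hypothetical names for illustration and are not part of the Groq SDK. The resulting `messages` list would go into the same `client.chat.completions.create` call as above.

```python
# Hypothetical in-memory "knowledge base"; a real system would use a vector store.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Support hours are 9am to 5pm on weekdays.",
    "Premium accounts include priority support.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank docs by how many query words they share (toy scoring, assumption)."""
    words = set(query.lower().split())
    scored = [(len(words & set(d.lower().rstrip(".").split())), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]

def build_messages(query: str, docs: list[str]) -> list[dict]:
    """One retrieval step, then one prompt. No loop, no second lookup."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return [
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": query},
    ]

messages = build_messages("When are refunds processed?", DOCS)
print(messages[0]["content"])
```

If the one retrieval misses the relevant document, nothing in this flow notices or retries. That is the limitation the next section comes back to.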
Pattern #2: ReAct (Reasoning + Acting): Now We’re Talking
This is where it gets interesting. ReAct = Reasoning + Acting.
These agents can actually DO stuff. They think, take an action, see what happens, then think again. It’s a loop.
Here’s the pattern: think → act → observe, repeated until the agent has what it needs.

This changes everything. The agent isn’t just guessing, it’s actually looking things up.
Real example: A customer service agent needed to answer complex questions about loan accounts.
Shallow approach (single-pass with RAG):
- Query → Retrieve account docs → Generate answer → Done
- Problem: If the retrieved docs don’t have payment history, the agent might guess or give incomplete answers
ReAct approach (iterative with tools):
- Thinks: “I need account details”
- Calls loan_status_api(account_id)
- Observes: Loan is active, but no payment info
- Thinks: “I need payment history to give a complete answer”
- Calls payment_history_api(account_id)
- Observes: Last payment was late
- Thinks: “Now I can give an accurate, complete answer”
- Responds with verified, comprehensive information
Result: Error rates dropped significantly because the agent could verify information across multiple sources instead of relying on a single retrieval step.
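The loan example above can be sketched as a small think → act → observe loop. This is a sketch under stated assumptions, not a production agent: the two tool functions are stubs with made-up return values, and `scripted_policy` is a hypothetical stand-in for the LLM call that would normally decide the next step.

```python
from typing import Callable

# Stubbed tools standing in for real APIs (return values are made up).
def loan_status_api(account_id: str) -> dict:
    return {"account_id": account_id, "status": "active"}

def payment_history_api(account_id: str) -> dict:
    return {"account_id": account_id, "last_payment": "late"}

TOOLS: dict[str, Callable[[str], dict]] = {
    "loan_status_api": loan_status_api,
    "payment_history_api": payment_history_api,
}

def scripted_policy(observations: dict) -> tuple[str, str]:
    """Stand-in for the LLM: returns (thought, action). A real agent prompts a model here."""
    if "loan_status_api" not in observations:
        return "I need account details", "loan_status_api"
    if "payment_history_api" not in observations:
        return "I need payment history to give a complete answer", "payment_history_api"
    return "Now I can give an accurate, complete answer", "finish"

def react_loop(account_id: str, max_steps: int = 5) -> tuple[str, list[str]]:
    observations: dict[str, dict] = {}
    trace: list[str] = []
    for _ in range(max_steps):
        thought, action = scripted_policy(observations)        # think
        trace.append(f"Thought: {thought}")
        if action == "finish":
            status = observations["loan_status_api"]["status"]
            last = observations["payment_history_api"]["last_payment"]
            return f"Loan is {status}; last payment was {last}.", trace
        observations[action] = TOOLS[action](account_id)       # act
        trace.append(f"Observation: {observations[action]}")   # observe
    return "Could not finish within the step limit.", trace

answer, trace = react_loop("ACC-123")
print(answer)
```

The key difference from the single-pass version is that each observation feeds the next decision, so a missing piece of data triggers another lookup instead of a guess.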
The tool-use thing is huge:
Give these agents access to:
- Web search
- Database queries
- API calls
- Code execution
- File systems
And they’ll figure out when and how to use them. The LLM decides “I need this tool now” and calls it. Then it sees the result and decides what to do next.
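On the application side, "the LLM calls a tool" usually means the model emits a tool name plus JSON arguments, and your code runs the matching function and feeds the result back. Below is a minimal sketch of that dispatch step. The schema follows the OpenAI-style function-calling format, which I'm assuming your provider accepts. `loan_status_api` is a stub, and `REGISTRY` and `dispatch` are hypothetical names.

```python
import json

def loan_status_api(account_id: str) -> dict:
    """Stub tool; a real one would query a database or API."""
    return {"account_id": account_id, "status": "active"}

# Tool description in the OpenAI-style function-calling format (assumed here).
TOOL_SPECS = [{
    "type": "function",
    "function": {
        "name": "loan_status_api",
        "description": "Look up the current status of a loan account.",
        "parameters": {
            "type": "object",
            "properties": {"account_id": {"type": "string"}},
            "required": ["account_id"],
        },
    },
}]

REGISTRY = {"loan_status_api": loan_status_api}

def dispatch(name: str, arguments_json: str) -> str:
    """Run the requested tool and return a JSON string to feed back as the observation."""
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)
        return json.dumps(REGISTRY[name](**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong argument names: report it instead of crashing.
        return json.dumps({"error": str(exc)})

# Simulated tool call, shaped like what a model might emit.
print(dispatch("loan_status_api", '{"account_id": "ACC-123"}'))
```

Returning errors as observations instead of raising lets the agent see that a call failed and try something else on the next step.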
Where do we use ReAct?
- Research tasks (search → read → synthesize)
- Data analysis (query DB → analyze → visualize)
- Customer support (look up account → check status → respond)
- Anything needing real-time data
The catch:
They’re slower. Each tool call adds latency. And if the agent makes a bad decision about which tool to use, it can go down rabbit holes.
In practice, ReAct agents can get stuck in loops: search → not satisfied → search again → still not satisfied → search again → eventually time out. You need guardrails, such as a cap on the number of steps.
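Two cheap guardrails cover most runaway loops: a hard step limit and a check for the same action being repeated. Here is a minimal sketch. `run_with_guardrails` and its parameters are hypothetical, and `next_action` stands in for whatever function asks the model for its next step.

```python
from typing import Callable

def run_with_guardrails(
    next_action: Callable[[list[str]], str],
    max_steps: int = 6,
    max_repeats: int = 2,
) -> tuple[str, list[str]]:
    """Stop on 'finish', on too many steps, or when one action is repeated too often."""
    history: list[str] = []
    for _ in range(max_steps):
        action = next_action(history)
        if action == "finish":
            return "finished", history
        if history.count(action) >= max_repeats:
            return "stopped: repeated action", history
        history.append(action)
    return "stopped: step limit", history

# A stuck agent that keeps issuing the same search, whatever it has already tried.
def stuck_agent(history: list[str]) -> str:
    return "search('market trends')"

status, history = run_with_guardrails(stuck_agent)
print(status, len(history))
```

A wall-clock timeout and a token budget are natural additions, but even these two checks turn an endless loop into a clean, loggable failure.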
Pattern #3: Deep Reasoning — The Slow Thinkers
OK, these are wild. Models like OpenAI’s o1, o3, or DeepSeek R1.
They don’t just answer your question. They think. Like, really think. For a long time. Behind the scenes.
What actually happens:

Your question
↓
[Model generates 10,000+ hidden reasoning tokens]
├─ "What if I try approach A?"
├─ "Wait, that won't work because…"
├─ "Let me try approach B"
├─ "Hmm, edge case here…"
├─ "Actually, let me reconsider…"
├─ "OK, comparing all options…"
└─ "Final answer is X because Y"
↓
You see: "The answer is X"
With OpenAI’s o-series models you never see that raw internal reasoning (DeepSeek R1, by contrast, does return its reasoning trace). Either way, it’s happening: sometimes thousands of tokens of “thinking” before you get a single sentence of output.
When this actually matters:
I tested o1 and GPT-4 on a gnarly system design problem: designing a fault-tolerant payment processor.
GPT-4 gave me a decent but generic answer. Mentioned the usual stuff: microservices, load balancers, database replication. Fine.
o1 thought for ~45 seconds, then gave me an answer that considered:
- Specific failure modes I hadn’t thought of
- Trade-offs between different consensus algorithms
- Cost implications of the architecture
- Why certain “obvious” solutions would actually break under load
It was noticeably better. Not just more words but actually deeper reasoning.
Where deep reasoning shines:
- Complex coding problems (competitive programming level)
- Math that requires multiple proof steps
- System design with lots of trade-offs
- Anything where “thinking harder” actually helps
On hard coding and math benchmarks, reasoning models have been reported to hit 70–90% accuracy on problems where regular models get 10–15%. That’s not a small difference.
The brutal trade-offs:
Speed: You’re waiting 30–60 seconds for responses. Sometimes longer.
Cost: All that hidden reasoning uses tokens. Lots of them. Your bill goes up fast.
Overkill: For “What’s the weather?” you’re wasting 99% of the capability.
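To see why the bill goes up fast, here is the arithmetic as a small sketch. It assumes hidden reasoning tokens are billed at the output-token rate, which is how this is commonly priced. The prices below are placeholders I made up for illustration, not real vendor pricing, and `request_cost` is a hypothetical helper.

```python
def request_cost(input_tokens: int, reasoning_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request; hidden reasoning tokens are charged like output tokens."""
    billed_output = reasoning_tokens + output_tokens
    return (input_tokens * input_price_per_m + billed_output * output_price_per_m) / 1_000_000

# Placeholder prices in dollars per million tokens, NOT real vendor pricing.
IN_PRICE, OUT_PRICE = 1.0, 4.0

# Same prompt and same visible answer; only the hidden reasoning differs.
shallow = request_cost(500, 0, 300, IN_PRICE, OUT_PRICE)
deep = request_cost(500, 10_000, 300, IN_PRICE, OUT_PRICE)
print(f"shallow: ${shallow:.4f}, deep: ${deep:.4f}, ratio: {deep / shallow:.1f}x")
```

With these placeholder numbers, 10,000 reasoning tokens make an otherwise identical request roughly 25 times more expensive, and that is before accounting for reasoning models usually having higher per-token prices.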
My honest take:
Do not use deep reasoning agents for most stuff. They’re too slow and expensive for production systems where you need sub-second response times.
But for:
- Initial system design
- Debugging really tricky bugs
- Code review on complex PRs
- Anything where I’d normally spend an hour thinking
They’re worth it. It’s like having a really smart colleague who thinks deeply but talks slowly.
Quick Comparison: Execution Patterns at a Glance

Use shallow by default. Upgrade to ReAct when you need tools. Use deep reasoning sparingly for stuff that’s actually hard.
Which One Should You Actually Use?
Here’s what we should do in practice:
95% of the time: Shallow or ReAct
Most problems don’t need deep reasoning. They need speed and the ability to look stuff up.
- Customer support bot? ReAct with database access.
- Content generation? Shallow.
- Research assistant? ReAct with web search.
- Quick classification? Shallow.
5% of the time: Deep reasoning
When something is genuinely complex:
- System architecture decisions
- Complex debugging
- Algorithm design
- Code that needs to be bulletproof
In production systems:
I recommend a tiered approach:
- Shallow agent handles 80% of requests (fast, cheap)
- ReAct agent handles stuff needing tools (medium speed/cost)
- Deep reasoning for the roughly 5% of cases that are actually complex
Auto-route based on the question complexity. Works pretty well.
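Auto-routing can start out very simple. The sketch below uses keyword matching to pick a tier. The keyword lists and the `route` function are assumptions for illustration only; a real router might use a small classifier model or a cheap shallow LLM call instead.

```python
# Toy keyword lists (assumptions): tune these, or replace them with a classifier.
TOOL_HINTS = ("account", "balance", "status", "latest", "today", "search", "look up")
HARD_HINTS = ("design", "architecture", "debug", "prove", "algorithm", "trade-off")

def route(question: str) -> str:
    """Return 'deep', 'react', or 'shallow' based on toy keyword matching."""
    q = question.lower()
    if any(hint in q for hint in HARD_HINTS):
        return "deep"      # genuinely complex: pay for slow, expensive reasoning
    if any(hint in q for hint in TOOL_HINTS):
        return "react"     # needs live data: use the tool loop
    return "shallow"       # default: fast and cheap

for q in [
    "Summarize these meeting notes",
    "What is the status of my loan account?",
    "Design a fault-tolerant payment processor",
]:
    print(route(q), "<-", q)
```

Defaulting to the shallow tier keeps the common case cheap. Misrouted requests can be escalated to the next tier when the cheaper one fails or reports low confidence.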
The real insight:
Different problems need different types of thinking. Sounds obvious, but most people just throw GPT-4 at everything and wonder why their bill is insane or their users are frustrated with slow responses.
Match the pattern to the job. Shallow processing is underrated. Deep reasoning is overused. ReAct is the sweet spot for most production use cases.
That’s it. No fluff about “the future of AI” or “revolutionary paradigms.” Just three different execution patterns for LLM agents, when to use each one, and why it matters.
If you’re building AI systems, you need to know this stuff. The architectural pattern you choose will determine your costs, your latency, and whether your agent actually works.
Questions? Hit me up in the comments. I probably check them more than I should.
Published via Towards AI
Note: Article content contains the views of the contributing authors and not Towards AI.