Choosing AI Agent Architecture for Enterprise Systems: Shallow vs ReAct vs Deep

Last Updated on February 3, 2026 by Editorial Team

Author(s): Mandar Panse

Originally published on Towards AI.

Understanding different execution patterns in modern LLM-powered agents

Important note: These aren’t “types of AI agents” in the classical sense (like reflex agents, goal-based agents, etc.). Instead, these are architectural patterns — different ways of implementing how LLM-powered agents process and respond to requests. Think of them as execution strategies rather than fundamental agent categories.

Everyone’s talking about AI agents right now. But here’s what nobody tells you: LLM-based agents don’t all work the same way.

There are basically three execution patterns, and picking the wrong one will either blow your budget, piss off your users with slow responses, or give you garbage outputs. Sometimes all three.

So let’s talk about Shallow Processing, ReAct (Reasoning + Acting), and Deep Reasoning — the three main ways LLM agents can be architected. Not the marketing BS version — the version that matters when you’re actually building something.

Pattern #1: Shallow Processing — The Speed Demons

Shallow agents are dead simple. You give them input, they spit out an answer. That’s it.

A shallow agent can still have RAG integrated into it.

No iteration. No “hmm, let me reconsider that.” Just straight through processing.

The flow looks like this:

Input → LLM (one pass, optionally with retrieved context) → Output

That’s literally it. One pass through the model and you’re done.

When do we actually use these:

Honestly? More than you’d think. When you need to:

Generate a quick email response
Classify customer support tickets
Summarize meeting notes
Answer straightforward questions

They’re fast (like 1–2 seconds), they’re cheap, and for 70% of use cases, they work fine.

Where they fall apart:

They can’t call tools. Can’t look anything up beyond a single retrieval pass. Can’t verify their own answers.

If the model hallucinates, you’re getting that hallucination delivered with complete confidence. I’ve seen these confidently tell users their account balance is $5,000 when it’s actually $50. Not great.

They also can’t break down complex problems. Ask a shallow agent to “research market trends and create a competitive analysis” and you’ll get… something. Will it be good? Probably not. Will it be based on actual current data? Definitely not.

The code is boring:

import os

from groq import Groq

# Read the API key from the environment rather than hard-coding it
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

# One-shot response
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
)
print(response.choices[0].message.content)
# That's it. No loops, no tools, just input → output
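Since a shallow agent can still sit behind RAG, here is a minimal sketch of what that single pass might look like. Everything in it is a hypothetical stand-in for illustration: the `DOCS` list, the keyword-overlap `retrieve` helper, and `build_messages` are not from any library, and a real system would likely use a vector store. The point is only that retrieval happens once, before the model call, and nothing loops.

```python
# Hypothetical in-memory "knowledge base"; a real system would use a vector store.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Premium plans include priority support.",
    "Password resets are done from the account settings page.",
]

def retrieve(query, docs, k=1):
    """Naive keyword-overlap retrieval (illustrative only)."""
    q_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_messages(query, docs=DOCS):
    """Retrieve once, then build a single prompt. No loop, no tools."""
    context = "\n".join(retrieve(query, docs))
    return [
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": query},
    ]

messages = build_messages("How long do refunds take?")
# These messages would then go into one chat.completions.create(...) call.
```

If the retrieved context is missing something, the model has no second chance to go and fetch it, which is exactly the limitation described next.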

Pattern #2: ReAct (Reasoning + Acting): Now We’re Talking

This is where it gets interesting. ReAct = Reasoning + Acting.

These agents can actually DO stuff. They think, take an action, see what happens, then think again. It’s a loop.

Here’s the pattern: Think → Act (call a tool) → Observe the result → repeat until there’s enough to answer.

This changes everything. The agent isn’t just guessing, it’s actually looking things up.

Real example: A customer service agent needed to answer complex questions about loan accounts.

Shallow approach (single-pass with RAG):

Query → Retrieve account docs → Generate answer → Done
Problem: If the retrieved docs don’t have payment history, the agent might guess or give incomplete answers

ReAct approach (iterative with tools):

Thinks: “I need account details”
Calls loan_status_api(account_id)
Observes: Loan is active, but no payment info
Thinks: “Need payment history to give complete answer”
Calls payment_history_api(account_id)
Observes: Last payment was late
Thinks: “Now I can give an accurate, complete answer.”
Responds with verified, comprehensive information

Result: Error rates dropped significantly because the agent could verify information across multiple sources instead of relying on a single retrieval step.
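The loan walkthrough above can be sketched as a small think/act/observe loop. This is a toy under stated assumptions: `loan_status_api` and `payment_history_api` are stubs named after the calls in the example, and `decide` is a scripted stand-in for the LLM's reasoning step, so the loop runs without a model or network access.

```python
def loan_status_api(account_id):
    # Stub standing in for a real loan-status service.
    return {"account_id": account_id, "status": "active"}

def payment_history_api(account_id):
    # Stub standing in for a real payment-history service.
    return {"account_id": account_id, "last_payment": "late"}

TOOLS = {
    "loan_status_api": loan_status_api,
    "payment_history_api": payment_history_api,
}

def decide(observations):
    """Scripted stand-in for the LLM's 'think' step: pick the next action."""
    if "loan_status_api" not in observations:
        return ("call", "loan_status_api")
    if "payment_history_api" not in observations:
        return ("call", "payment_history_api")
    return ("answer", None)

def react(account_id, max_steps=5):
    observations = {}
    for _ in range(max_steps):
        kind, tool = decide(observations)  # Think
        if kind == "answer":
            status = observations["loan_status_api"]["status"]
            last = observations["payment_history_api"]["last_payment"]
            return f"Loan is {status}; last payment was {last}."
        observations[tool] = TOOLS[tool](account_id)  # Act, then observe
    return "Could not complete within the step limit."

print(react("ACC-123"))
```

In a real agent, `decide` would be a model call that sees the observations so far and returns either a tool request or a final answer; the surrounding loop stays roughly the same.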

The tool-use thing is huge:

Give these agents access to:

Web search
Database queries
API calls
Code execution
File systems

And they’ll figure out when and how to use them. The LLM decides “I need this tool now” and calls it. Then it sees the result and decides what to do next.
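The "model decides, your code executes" split can be sketched as a small registry plus a dispatcher. The shape of a tool call here (a name plus JSON-encoded arguments) mirrors what OpenAI-style chat APIs typically return, but the `TOOLS` registry, `run_tool_call`, and both stub tools are hypothetical names made up for this example.

```python
import json

def web_search(query):
    # Stub: a real implementation would call a search API.
    return f"results for {query!r}"

def run_sql(sql):
    # Stub: a real implementation would query a database.
    return [{"rows": 0, "sql": sql}]

# Registry mapping tool names (what the model emits) to Python callables.
TOOLS = {"web_search": web_search, "run_sql": run_sql}

def run_tool_call(name, arguments_json):
    """Execute one model-requested tool call and return an observation string."""
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    try:
        args = json.loads(arguments_json)
        return str(TOOLS[name](**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Hand the error back as an observation so the model can correct itself.
        return f"error: bad arguments for {name}: {exc}"

observation = run_tool_call("web_search", '{"query": "LLM agents"}')
```

Returning errors as observations instead of raising is a design choice: it gives the model a chance to fix a malformed call on its next step rather than crashing the whole request.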

Where do we use ReAct:

Research tasks (search → read → synthesize)
Data analysis (query DB → analyze → visualize)
Customer support (look up account → check status → respond)
Anything needing real-time data

The catch:

They’re slower. Each tool call adds latency. And if the agent makes a bad decision about which tool to use, it can go down rabbit holes.

ReAct agents can get stuck in loops: search → not satisfied → search again → still not satisfied → search again → eventually time out. You need guardrails.
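Two cheap guardrails cover most of these loops: a hard cap on steps, and a check for the same action being issued over and over. The sketch below is one possible shape, not a standard API; `guarded_loop`, `max_steps`, and `max_repeats` are names invented for this example, and `next_action` stands in for the model's decision step.

```python
def guarded_loop(next_action, max_steps=6, max_repeats=2):
    """Run an agent loop, stopping on a step cap or on repeated identical actions.

    next_action(history) returns (tool_name, tool_input), or None when done.
    """
    history = []
    for _ in range(max_steps):
        action = next_action(history)
        if action is None:
            return "done", history
        if history.count(action) >= max_repeats:
            # Same tool with the same input again: assume the agent is stuck.
            return "stopped: repeated action", history
        history.append(action)
    return "stopped: step limit", history

# A stuck agent that keeps issuing the same search.
def stuck(history):
    return ("search", "market trends")

status, history = guarded_loop(stuck)
print(status, len(history))
```

A wall-clock timeout and a token budget are natural additions, but even these two checks turn "eventually time out" into a fast, explainable stop.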

Pattern #3: Deep Reasoning — The Slow Thinkers

OK, these are wild. Models like OpenAI’s o1, o3, or DeepSeek R1.

They don’t just answer your question. They think. Like, really think. For a long time. Behind the scenes.

What actually happens:

Your question
↓
[Model generates 10,000+ hidden reasoning tokens]
├─ "What if I try approach A?"
├─ "Wait, that won't work because…"
├─ "Let me try approach B"
├─ "Hmm, edge case here…"
├─ "Actually, let me reconsider…"
├─ "OK, comparing all options…"
└─ "Final answer is X because Y"
↓
You see: "The answer is X"

You never see all that internal reasoning. But it’s happening. Sometimes thousands of tokens of “thinking” before you get a single sentence of output.

When this actually matters:

I tested o1 and GPT-4 on a gnarly system design problem: designing a fault-tolerant payment processor.

GPT-4 gave me a decent but generic answer. Mentioned the usual stuff: microservices, load balancers, database replication. Fine.

o1 thought for ~45 seconds, then gave me an answer that considered:

Specific failure modes I hadn’t thought of
Trade-offs between different consensus algorithms
Cost implications of the architecture
Why certain “obvious” solutions would actually break under load

It was noticeably better. Not just more words but actually deeper reasoning.

Where deep reasoning shines:

Complex coding problems (competitive programming level)
Math that requires multiple proof steps
System design with lots of trade-offs
Anything where “thinking harder” actually helps

On coding benchmarks, these models hit 70–90% accuracy on problems where regular models get 10–15%. That’s not a small difference.

The brutal trade-offs:

Speed: You’re waiting 30–60 seconds for responses. Sometimes longer.

Cost: All that hidden reasoning uses tokens. Lots of them. Your bill goes up fast.

Overkill: For “What’s the weather?” you’re wasting 99% of the capability.
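To make the cost point concrete, here is some back-of-the-envelope arithmetic. The prices are placeholders, not any provider's real rates, and the sketch assumes hidden reasoning tokens are billed at the output-token rate; check your provider's pricing page before relying on either. The 10,000 reasoning tokens match the figure used in the diagram earlier.

```python
def request_cost(input_tokens, output_tokens, reasoning_tokens=0,
                 price_in_per_m=1.0, price_out_per_m=4.0):
    """Estimate cost in dollars. Prices are placeholders, per million tokens.

    Assumes reasoning tokens are billed at the output rate.
    """
    billed_output = output_tokens + reasoning_tokens
    total = input_tokens * price_in_per_m + billed_output * price_out_per_m
    return total / 1_000_000

plain = request_cost(input_tokens=500, output_tokens=300)
deep = request_cost(input_tokens=500, output_tokens=300, reasoning_tokens=10_000)
print(f"plain: ${plain:.4f}, deep: ${deep:.4f}, ratio: {deep / plain:.1f}x")
```

Even at identical per-token prices, the same visible answer costs over twenty times more once the hidden reasoning is counted. Reasoning models are often priced higher per token as well, which would widen the gap further.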

My honest take:

Do not use deep reasoning agents for most stuff. They’re too slow and expensive for production systems where you need sub-second response times.

But for:

Initial system design
Debugging really tricky bugs
Code review on complex PRs
Anything where I’d normally spend an hour thinking

They’re worth it. It’s like having a really smart colleague who thinks deeply but talks slowly.

Quick Comparison: Execution Patterns at a Glance

Use shallow by default. Upgrade to ReAct when you need tools. Use deep reasoning sparingly for stuff that’s actually hard.

Which One Should You Actually Use?

Here’s what we should do in practice:

95% of the time: Shallow or ReAct

Most problems don’t need deep reasoning. They need speed and the ability to look stuff up.

Customer support bot? ReAct with database access.
Content generation? Shallow.
Research assistant? ReAct with web search.
Quick classification? Shallow.

5% of the time: Deep reasoning

When something is genuinely complex:

System architecture decisions
Complex debugging
Algorithm design
Code that needs to be bulletproof

In production systems:

I recommend a tiered approach:

Shallow agent handles 80% of requests (fast, cheap)
ReAct agent handles stuff needing tools (medium speed/cost)
Deep reasoning for the 1% of cases that are actually complex

Auto-route based on the question complexity. Works pretty well.
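Auto-routing can start out very simple. The sketch below uses crude keyword rules as a placeholder for whatever classifier you would actually use (a small model, embeddings, or a cheap LLM call). The hint lists and the `route` function are hypothetical and would need tuning against real traffic.

```python
# Hypothetical keyword hints; tune these against real traffic.
TOOL_HINTS = ("account", "balance", "order", "latest", "today", "search", "look up")
HARD_HINTS = ("design", "architecture", "debug", "prove", "algorithm", "trade-off")

def route(question):
    """Pick a tier with crude keyword rules (a placeholder for a real classifier)."""
    q = question.lower()
    if any(hint in q for hint in HARD_HINTS):
        return "deep"
    if any(hint in q for hint in TOOL_HINTS):
        return "react"
    return "shallow"

for q in [
    "Summarize these meeting notes",
    "What is the balance on my account?",
    "Design a fault-tolerant payment processor",
]:
    print(route(q), "<-", q)
```

Defaulting to the shallow tier and escalating only on a positive signal keeps the cheap path as the common path, which matches the tiering described above.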

The real insight:

Different problems need different types of thinking. Sounds obvious, but most people just throw GPT-4 at everything and wonder why their bill is insane or their users are frustrated with slow responses.

Match the pattern to the job. Shallow processing is underrated. Deep reasoning is overused. ReAct is the sweet spot for most production use cases.

That’s it. No fluff about “the future of AI” or “revolutionary paradigms.” Just three different execution patterns for LLM agents, when to use each one, and why it matters.

If you’re building AI systems, you need to know this stuff. The architectural pattern you choose will determine your costs, your latency, and whether your agent actually works.

Questions? Hit me up in the comments. I probably check them more than I should.


Published via Towards AI


Note: Article content contains the views of the contributing authors and not Towards AI.
