Choosing AI Agent Architecture for Enterprise Systems: Shallow vs ReAct vs Deep

Last Updated on February 3, 2026 by Editorial Team

Author(s): Mandar Panse

Originally published on Towards AI.

Understanding different execution patterns in modern LLM-powered agents

Important note: These aren’t “types of AI agents” in the classical sense (like reflex agents, goal-based agents, etc.). Instead, these are architectural patterns — different ways of implementing how LLM-powered agents process and respond to requests. Think of them as execution strategies rather than fundamental agent categories.

Everyone’s talking about AI agents right now. But here’s what nobody tells you: LLM-based agents don’t all work the same way.

There are basically three execution patterns, and picking the wrong one will either blow your budget, piss off your users with slow responses, or give you garbage outputs. Sometimes all three.

So let’s talk about Shallow Processing, ReAct (Reasoning + Acting), and Deep Reasoning — the three main ways LLM agents can be architected. Not the marketing BS version — the version that matters when you’re actually building something.

Pattern #1: Shallow Processing — The Speed Demons

Shallow agents are dead simple. You give them input, they spit out an answer. That’s it.

These agents can have RAG integrated into them; a single retrieval step before generation still counts as one pass.

No iteration. No “hmm, let me reconsider that.” Just straight-through processing.

The flow looks like this:

[Figure: shallow processing flow. Input → LLM (single pass) → Output]

That’s literally it. One pass through the model and you’re done.

When do we actually use these:

Honestly? More than you’d think. When you need to:

  • Generate a quick email response
  • Classify customer support tickets
  • Summarize meeting notes
  • Answer straightforward questions

They’re fast (like 1–2 seconds), they’re cheap, and for 70% of use cases, they work fine.

Where they fall apart:

They can’t use tools. Beyond a fixed retrieval step, they can’t look stuff up. Can’t verify their own answers.

If the model hallucinates, you’re getting that hallucination delivered with complete confidence. I’ve seen these confidently tell users their account balance is $5,000 when it’s actually $50. Not great.

They also can’t break down complex problems. Ask a shallow agent to “research market trends and create a competitive analysis” and you’ll get… something. Will it be good? Probably not. Will it be based on actual current data? Definitely not.

The code is boring:

import os

from groq import Groq

# Reads the API key from the environment; set GROQ_API_KEY before running
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

# One-shot response: a single request, a single completion
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
)
print(response.choices[0].message.content)
# That's it. No loops, no tools, just input → output
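Since a shallow agent can still include retrieval, here is a minimal sketch of the single-pass RAG variant. Everything in it is an assumption for illustration: `DOCS`, `retrieve`, and `answer_once` are hypothetical names, the keyword-overlap ranking is a toy stand-in for a real vector search, and `llm` is any callable that takes a prompt and returns text (for example, a thin wrapper around the Groq call above).

```python
from typing import Callable

# Hypothetical in-memory knowledge base; a real system would use a vector store.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Support hours are 9am to 5pm on weekdays.",
    "Passwords can be reset from the account settings page.",
]


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank docs by how many query words they share (toy stand-in for vector search)."""
    words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(words & set(d.lower().split())), reverse=True)
    return ranked[:k]


def answer_once(question: str, llm: Callable[[str], str]) -> str:
    """Single pass: retrieve once, build one prompt, call the model once."""
    context = "\n".join(retrieve(question, DOCS))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)


if __name__ == "__main__":
    # Stub model that echoes the prompt, so the sketch runs without an API key.
    print(answer_once("When are refunds processed?", llm=lambda p: p))
```

The model is called exactly once, so if the retrieved context is missing something, nothing in this flow notices or goes back for more.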

Pattern #2: ReAct (Reasoning + Acting): Now We’re Talking

This is where it gets interesting. ReAct = Reasoning + Acting.

These agents can actually DO stuff. They think, take an action, see what happens, then think again. It’s a loop.

Here’s the pattern:

[Figure: ReAct pattern. Thought → Action → Observation, repeated until the agent has enough to answer]

This changes everything. The agent isn’t just guessing, it’s actually looking things up.

Real example: A customer service agent needed to answer complex questions about loan accounts.

Shallow approach (single-pass with RAG):

  • Query → Retrieve account docs → Generate answer → Done
  • Problem: If the retrieved docs don’t have payment history, the agent might guess or give incomplete answers

ReAct approach (iterative with tools):

  • Thinks: “I need account details”
  • Calls loan_status_api(account_id)
  • Observes: Loan is active, but no payment info
  • Thinks: “Need payment history to give complete answer”
  • Calls payment_history_api(account_id)
  • Observes: Last payment was late
  • Thinks: “Now I can give an accurate, complete answer.”
  • Responds with verified, comprehensive information

Result: Error rates dropped significantly because the agent could verify information across multiple sources instead of relying on a single retrieval step.
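The loan walkthrough above can be sketched as a small think → act → observe loop. This is an illustration under assumptions, not the production agent: the tool names come from the example, but their return values are made up, and `scripted_policy` is a hypothetical stand-in for the LLM's reasoning step so the sketch runs without a model.

```python
from typing import Callable


# Stubbed tools: names come from the example above, return values are made up.
def loan_status_api(account_id: str) -> dict:
    return {"account_id": account_id, "status": "active"}


def payment_history_api(account_id: str) -> dict:
    return {"account_id": account_id, "last_payment": "late"}


TOOLS: dict[str, Callable[[str], dict]] = {
    "loan_status_api": loan_status_api,
    "payment_history_api": payment_history_api,
}


def scripted_policy(observations: dict) -> tuple[str, str]:
    """Stand-in for the LLM's 'think' step: pick the next action from what is known."""
    if "loan_status_api" not in observations:
        return ("call", "loan_status_api")      # Thought: I need account details
    if "payment_history_api" not in observations:
        return ("call", "payment_history_api")  # Thought: I need payment history
    status = observations["loan_status_api"]["status"]
    last = observations["payment_history_api"]["last_payment"]
    return ("answer", f"Loan is {status}; last payment was {last}.")


def react(account_id: str, policy=scripted_policy, max_steps: int = 5):
    """Think -> act -> observe until the policy answers or the step budget runs out."""
    observations: dict = {}
    trace: list[str] = []
    for _ in range(max_steps):
        kind, value = policy(observations)
        if kind == "answer":
            return value, trace
        observations[value] = TOOLS[value](account_id)  # act, then record the observation
        trace.append(value)
    return "Could not complete within the step budget.", trace


if __name__ == "__main__":
    answer, trace = react("ACC-123")
    print(trace)
    print(answer)
```

In a real agent the policy would be a model call that sees the observations so far; the loop structure stays the same.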

The tool-use thing is huge:

Give these agents access to:

  • Web search
  • Database queries
  • API calls
  • Code execution
  • File systems

And they’ll figure out when and how to use them. The LLM decides “I need this tool now” and calls it. Then it sees the result and decides what to do next.
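One way to wire up "the LLM decides and calls a tool" is a registry plus a dispatcher, sketched below. The JSON shape `{"tool": ..., "args": {...}}` is an assumed format for this example, not any vendor's function-calling schema, and `web_search` and `run_sql` are hypothetical stubs.

```python
import json
from typing import Any, Callable


def web_search(query: str) -> str:
    return f"(stub) top result for {query!r}"


def run_sql(sql: str) -> str:
    return f"(stub) rows for {sql!r}"


# Registry of tools the model may request; the descriptions would go into the prompt.
REGISTRY: dict[str, tuple[Callable[..., str], str]] = {
    "web_search": (web_search, "Search the web. Args: query"),
    "run_sql": (run_sql, "Run a read-only SQL query. Args: sql"),
}


def dispatch(model_output: str) -> str:
    """Parse the model's tool request and run it, returning an observation string."""
    try:
        request: dict[str, Any] = json.loads(model_output)
        name, args = request["tool"], request.get("args", {})
    except (json.JSONDecodeError, KeyError, TypeError):
        return "error: tool request was not valid JSON with a 'tool' field"
    if name not in REGISTRY:
        return f"error: unknown tool {name!r}"
    func, _description = REGISTRY[name]
    try:
        return func(**args)
    except TypeError as exc:  # wrong or missing arguments
        return f"error: bad arguments for {name}: {exc}"
```

Errors are returned as observations rather than raised, so the model sees what went wrong on its next step and can correct the call.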

Where do we use ReAct:

  • Research tasks (search → read → synthesize)
  • Data analysis (query DB → analyze → visualize)
  • Customer support (look up account → check status → respond)
  • Anything needing real-time data

The catch:

They’re slower. Each tool call adds latency. And if the agent makes a bad decision about which tool to use, it can go down rabbit holes.

In practice, ReAct agents can get stuck in loops: search → not satisfied → search again → still not satisfied → search again → eventually time out. You need guardrails, such as a cap on the number of steps and an overall timeout.
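A minimal sketch of such guardrails, under assumptions: `run_with_guardrails` and its parameters are hypothetical names, and `next_action` stands in for the model choosing its next step. It stops on a step cap, on the same action being repeated too often, or on a wall-clock timeout.

```python
import time
from typing import Callable, Optional


def run_with_guardrails(
    next_action: Callable[[list[str]], Optional[str]],
    max_steps: int = 6,
    max_repeats: int = 2,
    timeout_s: float = 10.0,
) -> tuple[str, list[str]]:
    """Run an agent loop; stop on step cap, repeated identical actions, or timeout.

    next_action receives the action history and returns the next action string,
    or None when the agent is ready to answer.
    """
    history: list[str] = []
    deadline = time.monotonic() + timeout_s
    for _ in range(max_steps):
        if time.monotonic() > deadline:
            return "stopped: timeout", history
        action = next_action(history)
        if action is None:
            return "done", history
        if history.count(action) >= max_repeats:
            return "stopped: repeated action", history
        history.append(action)  # a real loop would execute the tool here
    return "stopped: step limit", history
```

For example, an agent that keeps issuing the same search is cut off after two attempts: `run_with_guardrails(lambda h: "search: market trends")` returns the status `"stopped: repeated action"`.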

Pattern #3: Deep Reasoning — The Slow Thinkers

OK, these are wild. Models like OpenAI’s o1, o3, or DeepSeek R1.

They don’t just answer your question. They think. Like, really think. For a long time. Behind the scenes.

What actually happens:

Your question

[Model generates 10,000+ hidden reasoning tokens]
├─ "What if I try approach A?"
├─ "Wait, that won't work because…"
├─ "Let me try approach B"
├─ "Hmm, edge case here…"
├─ "Actually, let me reconsider…"
├─ "OK, comparing all options…"
└─ "Final answer is X because Y"

You see: "The answer is X"

With o1 and o3 you never see that raw internal reasoning (DeepSeek R1, by contrast, returns its reasoning alongside the answer). Either way, it’s happening: sometimes thousands of tokens of “thinking” before you get a single sentence of output.

When this actually matters:

I tested o1 and GPT-4 on a gnarly system design problem: designing a fault-tolerant payment processor.

GPT-4 gave me a decent but generic answer. Mentioned the usual stuff: microservices, load balancers, database replication. Fine.

o1 thought for ~45 seconds, then gave me an answer that considered:

  • Specific failure modes I hadn’t thought of
  • Trade-offs between different consensus algorithms
  • Cost implications of the architecture
  • Why certain “obvious” solutions would actually break under load

It was noticeably better. Not just more words but actually deeper reasoning.

Where deep reasoning shines:

  • Complex coding problems (competitive programming level)
  • Math that requires multiple proof steps
  • System design with lots of trade-offs
  • Anything where “thinking harder” actually helps

On some hard coding and math benchmarks, published results show these models scoring roughly 70–90% on problems where non-reasoning models score around 10–15%. That’s not a small difference.

The brutal trade-offs:

Speed: You’re waiting 30–60 seconds for responses. Sometimes longer.

Cost: All that hidden reasoning uses tokens. Lots of them. Your bill goes up fast.
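To make the cost point concrete, here is a back-of-the-envelope sketch. It assumes hidden reasoning tokens are billed at the output-token rate, and the prices are placeholders chosen for easy arithmetic, not real vendor pricing. The same placeholder price is used for both requests so the difference comes only from the reasoning tokens.

```python
def request_cost(input_tokens: int, visible_output_tokens: int,
                 reasoning_tokens: int, price_in_per_m: float,
                 price_out_per_m: float) -> float:
    """Cost in dollars, assuming reasoning tokens are billed at the output rate."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * price_in_per_m + billed_output * price_out_per_m) / 1_000_000


# Placeholder prices in dollars per million tokens, NOT real vendor pricing.
PRICE_IN, PRICE_OUT = 1.00, 4.00

shallow = request_cost(500, 300, 0, PRICE_IN, PRICE_OUT)
deep = request_cost(500, 300, 10_000, PRICE_IN, PRICE_OUT)
print(f"shallow: ${shallow:.4f}  deep: ${deep:.4f}  ratio: {deep / shallow:.1f}x")
```

With these placeholder numbers, the same 300-token visible answer costs about 24.5 times more once 10,000 reasoning tokens are added. Reasoning models often carry a higher per-token price as well, which would widen the gap further.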

Overkill: For “What’s the weather?” you’re wasting 99% of the capability.

My honest take:

Do not use deep reasoning agents for most stuff. They’re too slow and expensive for production systems where you need sub-second response times.

But for:

  • Initial system design
  • Debugging really tricky bugs
  • Code review on complex PRs
  • Anything where I’d normally spend an hour thinking

They’re worth it. It’s like having a really smart colleague who thinks deeply but talks slowly.

Quick Comparison: Execution Patterns at a Glance

Use shallow by default. Upgrade to ReAct when you need tools. Use deep reasoning sparingly for stuff that’s actually hard.

Which One Should You Actually Use?

Here’s what we should do in practice:

95% of the time: Shallow or ReAct

Most problems don’t need deep reasoning. They need speed and the ability to look stuff up.

  • Customer support bot? ReAct with database access.
  • Content generation? Shallow.
  • Research assistant? ReAct with web search.
  • Quick classification? Shallow.

5% of the time: Deep reasoning

When something is genuinely complex:

  • System architecture decisions
  • Complex debugging
  • Algorithm design
  • Code that needs to be bulletproof

In production systems:

I recommend a tiered approach:

  1. Shallow agent handles 80% of requests (fast, cheap)
  2. ReAct agent handles stuff needing tools (medium speed/cost)
  3. Deep reasoning for the small fraction of cases (the roughly 5% above) that are actually complex

Auto-route based on the question complexity. Works pretty well.
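A minimal sketch of such a router, assuming a simple keyword heuristic. The keyword sets and the `route` function are hypothetical; a real router might use a small classifier model or a cheap LLM call instead.

```python
import re

# Hypothetical keyword lists; tune or replace with a classifier for real traffic.
TOOL_HINTS = {"account", "balance", "status", "latest", "today", "order", "search"}
DEEP_HINTS = {"design", "architecture", "debug", "prove", "algorithm", "trade-offs"}


def route(question: str) -> str:
    """Return 'deep', 'react', or 'shallow' for a question using keyword heuristics."""
    words = set(re.findall(r"[a-z\-]+", question.lower()))
    if words & DEEP_HINTS:
        return "deep"     # genuinely hard: accept the latency and cost
    if words & TOOL_HINTS:
        return "react"    # needs live data or an action
    return "shallow"      # default: fast and cheap
```

Defaulting to the shallow tier keeps the common case fast and cheap; only questions that match a hint pay for tools or extended reasoning.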

The real insight:

Different problems need different types of thinking. Sounds obvious, but most people just throw GPT-4 at everything and wonder why their bill is insane or their users are frustrated with slow responses.

Match the pattern to the job. Shallow processing is underrated. Deep reasoning is overused. ReAct is the sweet spot for most production use cases.

That’s it. No fluff about “the future of AI” or “revolutionary paradigms.” Just three different execution patterns for LLM agents, when to use each one, and why it matters.

If you’re building AI systems, you need to know this stuff. The architectural pattern you choose will determine your costs, your latency, and whether your agent actually works.

Questions? Hit me up in the comments. I probably check them more than I should.

Published via Towards AI


Note: Article content contains the views of the contributing authors and not Towards AI.