One Question, a Number and a Paragraph

Last Updated on June 18, 2026 by Editorial Team

Author(s): venkatesh babu sekar

Originally published on Towards AI.

One question often needs a number AND a paragraph. Here is the design for answering both at once, and an honest map of what ships today versus what is still roadmap.

Part 4 of 4 on building a conversational analytics engine. ~10 min read.

Every system I build eventually hits the same wall, and it is always the same shape.

The structured side gets good. Users ask “how many orders did Bike World place last quarter” and get a correct, access-controlled number (that was Parts 2 and 3). Then someone asks:

“What did the Q3 board deck say about churn, and how does that compare to the actual numbers?”

That question has two halves. “What did the deck say” is a paragraph from a PDF. “The actual numbers” is a SQL query. A system that does only one half answers half the question, which in practice means it answers the wrong question.

The interesting work is not retrieving documents, and it is not generating SQL. Both are reasonably well-trodden. The interesting work is tying them together so one answer draws on both, with one set of permissions and one trail of provenance.

I am going to be candid about the line between built and designed, because I chose honesty over a clean story for this series.

What ships vs. what does not, up front:

  • The structured pipeline (Parts 2 and 3): ships.
  • The document retrieval stack in this article: built, and works as retrieval.
  • The fusion of the two (one question fanning out to both, joined on a shared entity): designed and partly scaffolded. I will mark exactly where the code stops.

TL;DR

  • Three knowledge graphs: Domain (schema), Subject (entities, the shared hub), Lexical (documents). Identity is a bridge, not a fourth graph.
  • Built today: hybrid retrieval (vector + keyword, fused with Reciprocal Rank Fusion) and fail-closed document RBAC.
  • The design: a question is classified structured / rag / hybrid / clarify; a hybrid question runs both pipelines and joins them on the subject graph.
  • The honest gap: the cross-graph links that make fusion work exist as scaffolding with test-only callers. Wiring them is real work, not research.

Three graphs, and a bridge that is not a graph

The fusion design rests on the three-graph substrate, so a quick recap. (Part 2 noted that every node carries a layer label precisely so these can coexist in one store.)

  • Domain graph: the schema and its meaning. Tables, columns, join paths, metrics. The map of the structured world.
  • Subject graph: the canonical entities. “Bike World” resolved to a real key, with source links. The thing from Part 3 that turns a name into a key.
  • Lexical graph: the documents. Each one broken into chunks, with the structure of which chunk came from where.

And then identity, which I deliberately do not call a fourth graph. Identity is the bridge: it produces the access scope that gates every read across all three graphs. Calling it a graph would imply another silo to query. It is better understood as the function that wraps every query.
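Here is a minimal sketch of that idea, assuming a single in-memory store. The names `Node`, `Scope`, `GraphStore`, and the `acl_tag` attribute are hypothetical illustrations, not the author's code. Every node carries a layer label, so the three graphs share one store. Identity never appears as a node: it is the scope argument that every read is required to pass through.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical layer labels mirroring the three graphs described above.
LAYERS = {"domain", "subject", "lexical"}

@dataclass(frozen=True)
class Node:
    id: str
    layer: str    # which graph this node belongs to
    acl_tag: str  # hypothetical attribute the access scope checks

@dataclass(frozen=True)
class Scope:
    readable_tags: FrozenSet[str]

    def allows(self, node: Node) -> bool:
        return node.acl_tag in self.readable_tags

class GraphStore:
    """One store for all three graphs; the layer label keeps them apart."""

    def __init__(self) -> None:
        self._nodes: list[Node] = []

    def add(self, node: Node) -> None:
        if node.layer not in LAYERS:
            raise ValueError(f"unknown layer: {node.layer}")
        self._nodes.append(node)

    def read(self, layer: str, scope: Scope) -> list[Node]:
        # Identity is not a fourth graph: it is the scope that wraps every read.
        return [n for n in self._nodes if n.layer == layer and scope.allows(n)]
```

Because `read` cannot be called without a scope, no query against any of the three layers can skip the access check.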

Figure: Two-pipeline fusion

How to read it (the one idea that matters): look at the middle. The subject graph is the shared hub. The structured pipeline reads Domain plus Subject. The RAG pipeline reads Lexical plus Subject. Subject is the overlap, so when both halves come back, a given customer is the same entity object in both. That shared identity is what lets me join a database row to a document passage.

The join key for structured-plus-unstructured fusion is a resolved entity. Hold onto that. It is the whole trick, and it is where the honest gap lives.
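A small illustration of why the resolved entity is the join key. This sketch assumes both legs have already resolved their mentions through the subject graph. The key format `cust:1042`, the field names, and the row values are hypothetical toy data. The deck spells the customer differently from the database, but the join still succeeds, because it matches on the key and never on the display name.

```python
from collections import defaultdict

# Illustrative toy outputs of the two legs. Both are assumed to have already
# resolved their mentions to canonical keys through the subject graph;
# the key format "cust:1042" is hypothetical.
sql_rows = [
    {"entity_id": "cust:1042", "name": "Bike World", "churned_q3": 3},
    {"entity_id": "cust:2077", "name": "Trail Co", "churned_q3": 0},
]
passages = [
    # The deck spells the name differently; the resolved key still matches.
    {"entity_id": "cust:1042", "mention": "BikeWorld Inc.", "doc": "q3_board_deck.pdf"},
]

def join_on_entity(rows, chunks):
    """Pair each SQL row with every passage about the same resolved entity."""
    by_entity = defaultdict(list)
    for chunk in chunks:
        by_entity[chunk["entity_id"]].append(chunk)
    # Join on the canonical key, never on the display name.
    return [(row, by_entity.get(row["entity_id"], [])) for row in rows]
```

If either side skipped resolution and joined on names, "Bike World" and "BikeWorld Inc." would never meet.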

What is actually built: the retrieval stack

Let me ground this in running code before the vision.

A document goes through a pipeline: load, chunk, embed, and index two ways.

  • Chunking: character-based, sentence-aware, around 500 characters with a small overlap so a sentence split across a boundary is not lost.
  • Embedding: the same model as the rest of the system, OpenAI text-embedding-3-small, into 1536-dimension vectors, cosine distance. Vectors go to Qdrant.
  • The second index: the chunk text also goes into a BM25 keyword index. This is the part people forget.
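A minimal sketch of the chunking step, assuming a regex sentence splitter and a 50-character overlap. The article only says "a small overlap", so the exact size, the function name `chunk_text`, and the splitting rule are all assumptions. Sentences are packed into a chunk until the next one would exceed the budget. Each new chunk starts with the tail of the previous one, and a single sentence longer than the budget is hard-split by characters.

```python
import re

def chunk_text(text: str, max_chars: int = 500, overlap: int = 50) -> list[str]:
    """Character-budgeted, sentence-aware chunking with a small tail overlap.

    Hypothetical sketch: sentences are packed until the next one would push
    the chunk past max_chars. A new chunk starts with the last `overlap`
    characters of the previous chunk when that still fits in the budget.
    """
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Hard-split any single sentence that alone exceeds the budget.
    pieces = []
    for s in sentences:
        if len(s) <= max_chars:
            pieces.append(s)
        else:
            pieces.extend(s[i:i + max_chars] for i in range(0, len(s), max_chars))
    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate
            continue
        chunks.append(current)
        tail = current[-overlap:] if overlap else ""
        joined = f"{tail} {piece}".strip()
        # Keep the overlap only if it still fits within the budget.
        current = joined if len(joined) <= max_chars else piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be embedded (for Qdrant) and tokenized (for BM25). The overlap is what keeps context that straddles a boundary retrievable from either side.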

Why hybrid and not pure vector search?

  • Vector search is great at meaning, weak on specifics. It knows “churn” and “attrition” are neighbors (good) but shrugs at an exact product code (bad).
  • BM25 is the opposite: literal, exact, unbothered by synonyms.
  • Running both and fusing covers each one’s blind spot.
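To make the "literal, exact" half concrete, here is a self-contained Okapi BM25 scorer. It is a sketch, not the system's index. The defaults `k1=1.5` and `b=0.75` are common textbook values, the IDF is the always-positive Lucene-style variant, and the tokenizer is a simple assumption that keeps hyphenated codes intact. A production system would use a library or search engine instead.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Keep hyphenated codes like "sku-4471" as single tokens.
    return re.findall(r"[a-z0-9-]+", text.lower())

class BM25:
    """Minimal Okapi BM25 over an in-memory list of chunk texts."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lens) / len(docs) if docs else 0.0
        df = Counter(term for tf in self.tfs for term in tf)
        n = len(docs)
        # Lucene-style IDF, which never goes negative.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], self.lens[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue  # exact match or nothing: no credit for synonyms
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def rank(self, query: str) -> list[int]:
        scores = [self.score(query, i) for i in range(len(self.tfs))]
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

With the chunks "customer attrition rose in Q3", "product code SKU-4471 discontinued", and "churn and retention overview", a query for "SKU-4471" lands squarely on the second chunk. A query for "attrition" gives the churn chunk a score of zero. That zero is the blind spot the vector index covers.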

Figure: RAG retrieval stack

How to read it:

  • Reciprocal Rank Fusion is the merge, and it is almost embarrassingly simple: each result's score is the sum of weight / (k + rank) over every list it appears in, with k set to 60. You never have to make a cosine score and a BM25 score comparable (they are not). You just trust the orderings.
  • A chunk that ranks high in both lists rises to the top.
  • The optional reranker uses an LLM as a judge, scoring each candidate. It is off by default, because hybrid retrieval is usually good enough and the judge costs a model call per chunk. Pay for it only when you decide you need it.
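The fusion step fits in a dozen lines. This is a sketch of weighted RRF with k = 60, with ranks counted from 1 as in the usual formulation. The function name and the equal default weights are assumptions. The optional LLM reranker is not shown.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse ranked lists of chunk ids.

    score(d) = sum over lists i of w_i / (k + rank_i(d)), with rank starting
    at 1. Only positions are used, so raw cosine and BM25 scores never need
    to be made comparable.
    """
    weights = weights if weights is not None else [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need one weight per ranking")
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["c3", "c1", "c7"]   # e.g. from the vector index
keyword_hits = ["c1", "c9", "c3"]  # e.g. from BM25
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Here `fused` is `["c1", "c3", "c9", "c7"]`. `c1` (1/62 + 1/61) edges out `c3` (1/61 + 1/63) because it ranks high in both lists. Chunks that appear in only one list fall below both.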

Shipped vs. next: the stack above runs, but document ingestion is not wired into startup. A fresh process has an empty index until something explicitly ingests a corpus. So “the RAG pipeline works” is true for the machinery and not-yet-true for a loaded corpus. I would rather you know that than assume documents that are not there.

Security for documents is its own problem

Access control for structured data (Part 3) was about rows and columns. For documents it is about which files you may see. A leaked paragraph from a board deck is every bit as bad as a leaked salary row.

The model reuses the same access-scope object from the structured side. That reuse is the point: one notion of “who may see what,” applied to two very different shapes of data.

After retrieval, before any chunk is returned:

  • Look up the document the chunk came from.
  • Find its file path and derive the folder.
  • Ask the scope whether this user may read that folder. If yes, the chunk passes; if not, it is dropped.

The failure behavior is the part I care about. If the system cannot find the document’s metadata for a chunk (a cold start, an unloaded doc), it does not serve the chunk and hope. It drops it. The default on uncertainty is deny, not allow. Same fail-closed instinct as the SQL side, and, as there, no language model anywhere in the decision.
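A minimal sketch of that post-retrieval filter, assuming exact folder matching (no inheritance from parent folders) and a plain dict from document id to file path standing in for the metadata store. `Chunk`, `AccessScope`, and `filter_chunks` are hypothetical names. The key detail is the `None` branch: missing metadata means the chunk is dropped, not served.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    doc_id: str
    text: str

@dataclass(frozen=True)
class AccessScope:
    readable_folders: FrozenSet[str]

    def may_read_folder(self, folder: str) -> bool:
        # Assumption: exact folder match, no inheritance from parent folders.
        return folder in self.readable_folders

def filter_chunks(chunks, doc_paths: dict, scope: AccessScope) -> list:
    """Drop every chunk the user may not read. Unknown metadata means deny."""
    allowed = []
    for chunk in chunks:
        path: Optional[str] = doc_paths.get(chunk.doc_id)
        if path is None:
            continue  # fail closed: no metadata (cold start, unloaded doc)
        folder = str(PurePosixPath(path).parent)
        if scope.may_read_folder(folder):
            allowed.append(chunk)
    return allowed
```

As on the SQL side, this decision is plain set membership. No model is consulted.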

The fusion design, and the honest gap in the middle

Now the part that is mostly design. A question is classified twice: first by which capability it needs, then by which data path it takes (structured, rag, hybrid, or clarify).

A hybrid question (the churn example) runs both pipelines:

  • The structured leg plans and generates SQL exactly as in Part 3, reading Domain and Subject, and produces the actual churn numbers, scoped to what the user may see.
  • The unstructured leg retrieves the relevant passages, reading Lexical and Subject, scoped by the same access decision.
  • Then the two fuse on the subject graph: because both legs resolved their entities through the same canonical hub, the customer in the SQL row and the customer in the deck passage are the same entity, and that shared identity is the join.

The result is one envelope: the rows and the SQL, the document passages with citations, the caveats from both pipelines, a confidence, and a trace id.
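A sketch of the routing and the envelope under stated assumptions. The classifier itself is not shown, and the two legs are hypothetical callables with the signatures in the docstring. The hybrid "fusion" here is the simplest possible version: keep only passages about entities the SQL rows mention. Taking the minimum of the two leg confidences is a conservative choice of mine, not something the design specifies.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum

class Route(Enum):
    STRUCTURED = "structured"
    RAG = "rag"
    HYBRID = "hybrid"
    CLARIFY = "clarify"

@dataclass
class Envelope:
    rows: list
    sql: str
    passages: list  # each passage carries its document id for citation
    caveats: list
    confidence: float
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

def answer(route, structured_leg, rag_leg):
    """Run the leg(s) the route calls for and merge results into one envelope.

    Hypothetical leg signatures:
      structured_leg() -> (rows, sql, caveats, confidence)
      rag_leg()        -> (passages, caveats, confidence)
    """
    if route is Route.CLARIFY:
        raise ValueError("route is 'clarify': ask the user before querying")
    rows, sql, passages, caveats, confidences = [], "", [], [], []
    if route in (Route.STRUCTURED, Route.HYBRID):
        rows, sql, s_caveats, s_conf = structured_leg()
        caveats.extend(s_caveats)
        confidences.append(s_conf)
    if route in (Route.RAG, Route.HYBRID):
        passages, r_caveats, r_conf = rag_leg()
        caveats.extend(r_caveats)
        confidences.append(r_conf)
    if route is Route.HYBRID:
        # Fuse on the subject graph: keep passages about entities in the rows.
        entity_ids = {row["entity_id"] for row in rows}
        passages = [p for p in passages if p["entity_id"] in entity_ids]
    # Assumption: the answer is only as confident as its weakest leg.
    return Envelope(rows, sql, passages, caveats, min(confidences))
```

Because both legs return into the same `Envelope`, caveats and provenance from each pipeline survive into the single answer instead of being summarized away.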

I find the symmetry satisfying. The structured tool reads Domain plus Subject. The RAG tool reads Lexical plus Subject. Subject is the shared cache in the middle, and the hybrid join happens there. Two pipelines that look completely different turn out to be mirror images that meet at the entity hub.

Now the gap, stated plainly, because it is the center of this article:

  • The two cross-graph links that fusion depends on (entity-to-table, entity-to-chunk) are defined but not alive. The schema exists. The methods exist. But today they have only test callers, and nothing in the running system creates the document-chunk nodes they would point at.
  • The lexical graph as currently coded is an in-memory structure that nothing populates in production, separate from the Qdrant index the live retrieval actually uses.

So there are effectively two unstructured stacks right now: the working retrieval path (Qdrant plus BM25), and a graph-shaped representation of documents that is scaffolding, not yet load-bearing.
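To make "wire the link methods" concrete, here is a hypothetical in-memory sketch of the two cross-graph edge types. None of these names come from the article's codebase. It shows the dependency order the gap implies: an entity-to-chunk link is refused until ingestion has created the chunk node. Storing the Qdrant point id on the chunk node is one assumed way to connect the graph-shaped representation to the index that retrieval actually uses.

```python
from collections import defaultdict

class CrossGraphLinks:
    """Hypothetical sketch of entity-to-table and entity-to-chunk links."""

    def __init__(self) -> None:
        self.chunk_nodes = {}  # chunk_id -> (doc_id, vector-index point id)
        self.entity_to_chunks = defaultdict(set)
        self.entity_to_tables = defaultdict(set)

    def add_chunk_node(self, chunk_id: str, doc_id: str, point_id: str) -> None:
        # Ingestion would call this; today nothing in the running system does.
        self.chunk_nodes[chunk_id] = (doc_id, point_id)

    def link_entity_to_chunk(self, entity_id: str, chunk_id: str) -> None:
        if chunk_id not in self.chunk_nodes:
            raise KeyError(f"no chunk node {chunk_id!r}; ingest must create it first")
        self.entity_to_chunks[entity_id].add(chunk_id)

    def link_entity_to_table(self, entity_id: str, table: str) -> None:
        self.entity_to_tables[entity_id].add(table)

    def neighbors(self, entity_id: str):
        """Tables and chunks reachable from one entity: the hybrid join's inputs."""
        return (sorted(self.entity_to_tables.get(entity_id, ())),
                sorted(self.entity_to_chunks.get(entity_id, ())))
```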

Figure: Built vs. designed ledger

How to read it: green is real and running. Yellow is designed and scoped. “The subject graph is the shared cache across structured and RAG” is a true description of the design and a false description of the current wiring, and the difference between those two sentences is the work: create the chunk nodes, wire the link methods, stand up the fusion. None of it is research. A reader deciding whether I understand my own system is better served by this ledger than by a demo video.

Static documents and volatile documents

One more design distinction, because it changes the architecture more than it looks.

  • Static documents change rarely: last quarter’s board deck, a signed contract. Index once. That is exactly the Qdrant-plus-BM25 path that runs today.
  • Volatile documents change constantly: a wiki, a ticketing system. Re-embedding the world on every change is the wrong model. The better approach is to not copy them at all and query the source in place at question time, the same “data does not move” principle as the structured side.

The volatile path is the cleaner design, and it is not built. It is an open decision, deliberately left as a seam rather than filled with code I would have to rip out later. I would rather name the seam than pretend the static path covers a case it does not.
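One way to keep that seam open is a small interface both paths could satisfy. This is a sketch under loud assumptions: `DocumentSource`, `StaticIndexSource`, and `LiveSource` are hypothetical names. The static path's term-overlap search is a toy stand-in for the real hybrid retrieval, and `fetch` represents whatever search call a wiki or ticketing system exposes.

```python
from typing import Callable, List, Protocol

class DocumentSource(Protocol):
    """Hypothetical seam: anything that can return passages for a question."""

    def search(self, query: str, limit: int) -> List[str]: ...

class StaticIndexSource:
    """Index-once path: chunks were embedded and indexed ahead of time."""

    def __init__(self, indexed_chunks: List[str]) -> None:
        self._chunks = indexed_chunks

    def search(self, query: str, limit: int) -> List[str]:
        # Toy term overlap standing in for the real vector + BM25 retrieval.
        terms = set(query.lower().split())
        hits = [c for c in self._chunks if terms & set(c.lower().split())]
        return hits[:limit]

class LiveSource:
    """Query-in-place path: ask the source system at question time, copy nothing."""

    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        self._fetch = fetch  # e.g. a wiki or ticketing system's own search call

    def search(self, query: str, limit: int) -> List[str]:
        return self._fetch(query)[:limit]

def retrieve(source: DocumentSource, query: str) -> List[str]:
    return source.search(query, limit=3)
```

Callers depend only on `DocumentSource`, so a live path can be added later without touching the static one.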

One access decision for two pipelines

The thread I most want to tie off is access control, because it is where the fusion design is most opinionated.

In a system with two retrieval pipelines, the dangerous failure mode is two access-control implementations that drift apart. So the design is a single decision both pipelines honor in their own idiom:

  • One component computes and signs the effective scope (the intersection of what the user’s identity grants and what this query is allowed to reach).
  • Every pipeline verifies the signature and applies the scope locally, without phoning home per check.
  • The structured pipeline applies it by compiling row filters and column masks into the SQL syntax tree (Part 3). The RAG pipeline applies it by filtering folders and chunks. Same decision, two enforcement styles.
  • It fails closed: an unreachable policy component means serve only your own ungated data and deny anything shared; an invalid scope is a denial, never a bypass.
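A minimal sketch of "compute and sign once, verify locally, fail closed", using HMAC-SHA256 from the Python standard library. The shared demo key, the token format, and the function names are assumptions. A real deployment would need managed keys and expiry, and the "serve only your own ungated data" fallback for an unreachable policy component is not modeled. Every verification failure returns the empty scope, so a bad token can only ever deny.

```python
import hashlib
import hmac
import json

DEMO_KEY = b"demo-only-key"  # hypothetical; use managed, rotated keys in practice

def mint_scope(identity_grants: set, query_reach: set, key: bytes = DEMO_KEY) -> str:
    """Effective scope = what identity grants AND what this query may reach, signed."""
    payload = json.dumps({"folders": sorted(identity_grants & query_reach)},
                         separators=(",", ":"))
    sig = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"

def verify_scope(token: str, key: bytes = DEMO_KEY) -> frozenset:
    """Verify locally with no call home. Any failure yields the empty scope (deny)."""
    try:
        payload, sig = token.rsplit(".", 1)  # hex signature contains no dots
    except (AttributeError, ValueError):
        return frozenset()
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return frozenset()
    try:
        return frozenset(json.loads(payload)["folders"])
    except (ValueError, KeyError, TypeError):
        return frozenset()
```

The structured pipeline would compile the verified folders (or their row-level equivalent) into the SQL tree. The RAG pipeline would pass the same frozenset to its folder filter.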

Shipped vs. next: what runs today is the per-pipeline enforcement (the SQL AST rewriting from Part 3, the folder filter above), each correct on its own. The single signed scope that formally unifies them is roadmap. But I built both enforcers against the same scope object on purpose, so unifying them later is a consolidation, not a rewrite. The shape was chosen in advance to make the future cheap.

Walking the blended question end to end

Here is the churn question as the target design answers it, with built and designed labeled as I go.

  1. The question arrives with the user’s identity. The bridge mints a signed scope. (designed)
  2. The classifier returns hybrid. Both legs start. (designed)
  3. Each resolves its entities through the same subject graph and the cascade from Part 3, so both legs talk about the same canonical entities. (cascade built; resolving the RAG leg through it, designed)
  4. The structured leg reads schema from the in-memory cache, builds SQL with graph joins, compiles the scope into the WHERE, and runs in place for the actual churn numbers. (built)
  5. The unstructured leg retrieves the deck passages by hybrid search and filters by the scope. (retrieval built; scope-as-signed-pass, designed)
  6. The two fuse on the shared entities. One answer: the numbers, the passages, citations to both, caveats, confidence, a trace id. (fusion designed)

One question, two pipelines, three graphs, one access decision, one answer.

Key takeaways

  • Make the entity hub shared before you make the pipelines smart. Fusion is tractable only because both sides resolve through the same canonical entities. If your SQL side and your document side invent their own notion of “customer,” you can never join them, and you find out late.
  • Use hybrid retrieval, and fuse by rank, not by score. Vector and keyword search fail in opposite directions; RRF combines them without the false precision of normalizing incomparable scores.
  • Reuse one access-scope object across every retrieval shape, even if you enforce it differently in each. The unification can come later; the shared object cannot.
  • Be candid about the seams. Honest scaffolding is more credible than a polished illusion. The edges are where someone can tell whether you actually built the thing.

That is the series. A map (Part 2) that lets a model reason about a database it has never seen. A conversation (Part 3) that turns a sentence into a safe answer. And a reach (this part) past the database into documents, joined at the entity hub, governed by one access decision.

The hard problem from Part 1 was never that the model was not smart enough. It was that the parts that matter most (joins, cardinality, security, fusion) are not language problems, and you do not solve them by making the model smarter. You solve them with structure. Thanks for reading all four.

Back to the start: Part 1, “Text-to-SQL Looks Solved. It Isn’t.”; Part 2, “Your LLM Needs a Map”; Part 3, “Never Let the LLM Write the Joins”.
