Part 13 — Design the Recommender System
Author(s): Utkarsh Mittal Originally published on Towards AI. Part 12: https://medium.com/p/75cf0a345156 The article explains how to design a production recommender system using a real end-to-end scenario and concrete latency, data, and training considerations. It …
Part 12 - The 80GB Wall: GPU Infrastructure and Scheduling, Worked End to End
Author(s): Utkarsh Mittal Originally published on Towards AI. Our running example, fixed for the whole article. Part 11: https://pub.towardsai.com/ml-systems-design-series-retrieval-augmented-generation-rag-why-your-llm-doesnt-know-about-00e885bdbea9?source=friends_link&sk=55c086d99d3f6b7dfadd3d7c5226b4e0 The article walks through how GPU infrastructure, scheduling, and memory constraints determine the design of large-model training and inference systems. Starting from a …
Machine Learning System Design - The Model Serving Triangle, With One Forward Pass Flowing Through Every Trade-off (Part 3)
Author(s): Utkarsh Mittal Originally published on Towards AI. Part 1: https://pub.towardsai.com/the-ml-system-design-interview-with-numbers-flowing-through-every-stage-part-1-a77888339297?source=friends_link&sk=9064640f37c84a131ef24b1126bc0cf9 Three pieces of memory math that every candidate must have memorized. This article discusses the complexities and trade-offs of …
The L1 Loss Gradient, Explained From Scratch
Author(s): Utkarsh Mittal Originally published on Towards AI. A complete, step-by-step walkthrough of how gradient descent works with absolute-value loss — with diagrams you can actually follow. If you’ve ever read a deep learning tutorial and hit a derivative that seems to …
Agentic RAG & Semantic Caching: Building Smarter Enterprise Knowledge Systems
Author(s): Utkarsh Mittal Originally published on Towards AI. Section 1: The Rise (and Limitations) of RAG. Enterprise data is messy. It lives in Slack threads, Google Drive folders, SharePoint libraries, spreadsheets buried three levels deep in someone’s OneDrive, and meeting transcripts that …
Inside the Forward Pass: Pre-Fill, Decode, and the GPU Economics of Serving Large Language Models
Author(s): Utkarsh Mittal Originally published on Towards AI. Why Inference Is the Endgame. Pre-training a frontier large language model typically consumes somewhere between 15 trillion and 30 trillion tokens. That sounds like an enormous number, until you do the arithmetic on …
Understanding XGBoost: A Deep Dive into the Algorithm
Author(s): Utkarsh Mittal Originally published on Towards AI. Introduction: XGBoost (Extreme Gradient Boosting) has become the go-to algorithm for winning machine learning competitions and solving real-world prediction problems. But what makes it so powerful? In this comprehensive tutorial, we’ll unpack the mathematical …
Understanding Gradient Boosted Trees: The Foundation of XGBoost
Author(s): Utkarsh Mittal Originally published on Towards AI. Gradient Boosted Trees have revolutionized machine learning, powering some of the most successful algorithms in data science. Before diving into the complexities of XGBoost, it’s essential …
RoPE (Rotary Position Embeddings): A Detailed Example
Author(s): Utkarsh Mittal Originally published on Towards AI. In transformer models, knowing the order of tokens is essential — even though the model processes tokens in parallel. Traditional positional embeddings rely on a fixed “lookup table” (learned for positions up to a …