RAG from Scratch [Part 2]: Loading — The Step Everyone Skips and Everyone Regrets
Author(s): Sumit Vedpathak Originally published on Towards AI. Series 2 of 5. The article argues that most RAG failures begin at the ingestion/loading stage rather than later steps …
Self-Hosting Airflow at Home: Automating Stock Price Data Collection
Author(s): FS Stance Originally published on Towards AI. One of the main goals of creating my home lab is to gain a deeper understanding of Machine Learning Operations (MLOps) and how to productionalize …
Connections, Roles, and Warehouses: Getting CoCo Desktop Production-Ready from Day One
Author(s): Satish Kumar Originally published on Towards AI. Snowflake CoCo Desktop | Part 1 of 8. There’s a moment every data engineer hits when first opening Snowflake’s CoCo Desktop: the welcome screen …
From 90 Minutes to 35: How We Achieved 60% Performance Gains in PySpark with One Simple Change
Author(s): Shree Kavya Originally published on Towards AI. We reduced our PySpark batch processing time from 90 minutes to 35–45 minutes (a 50–60% improvement) by replacing …
What Nobody Tells You About Putting LLMs Inside Your Data Pipeline
Author(s): Sunil kumar Reddy Originally published on Towards AI. A practitioner’s honest account, written from financial data engineering, of what breaks, what surprises you, and what six months of …
The Modern Data Stack Is Broken — Here’s How to Fix It With AI, Governance, and Real Architecture
Author(s): Sunil kumar Reddy Originally published on Towards AI. There’s a painful gap between how most data talks are given and how data actually …
The Complete SQL Guide: Concepts, Queries & Practice
Author(s): Muaaz Originally published on Towards AI. SQL (Structured Query Language) is a language used to interact with relational databases that store structured data. It is primarily used to perform operations such as querying, …
Part 20: Data Manipulation in Multi-Dimensional Aggregation
Author(s): Raj kumar Originally published on Towards AI. When financial analysts need to segment customer profitability across product lines and regions, or when risk managers aggregate exposure metrics across multiple hierarchies, they rely on advanced grouping techniques that go far beyond basic …
Your Postcode Is Deciding Your Care. I Built a Pipeline to Prove It.
Author(s): Yusuf Ismail Originally published on Towards AI. Picture this. It’s 2 am. You’re on a trolley in a hospital corridor. Not a ward. A corridor. Fluorescent lights, the smell of disinfectant, the sound of a ward that’s full somewhere behind a …
Part 19: Data Manipulation in Statistical Profiling
Author(s): Raj kumar Originally published on Towards AI. Statistical profiling sits at the intersection of data validation and analytical insight. In banking operations, descriptive statistics are not academic exercises. They are diagnostic tools that surface anomalies in payment flows, quantify credit portfolio …
Building ML in the Dark: A Survival Guide for the Solo Practitioner
Author(s): Yuval Mehta Originally published on Towards AI. No GPU cluster. No data team. No ML platform. Here’s what actually ships. Most ML content is written for teams that have things. A labelled dataset. An MLOps platform. …
Part 16: Data Manipulation in Data Validation and Quality Control
Author(s): Raj kumar Originally published on Towards AI. Data quality issues are the silent killers of production systems. A single malformed record can crash your pipeline. A gradual drift in data distributions can slowly degrade model performance. Missing values that sneak through …
Part 9: Data Manipulation in Data Merging and Joins
Author(s): Raj kumar Originally published on Towards AI. Every analysis that combines data from multiple sources faces the same fundamental question: how should these datasets align? Which records match? What happens when they don’t? These aren’t just technical decisions. They shape what …
Part 6: Data Manipulation in String and Text Processing
Author(s): Raj kumar Originally published on Towards AI. If you’ve ever worked with real-world data, you know the struggle. Names come in all caps when they should be title case. Email addresses have trailing spaces. Phone numbers show up in a dozen …
Why AI in CRM Fails Without a Warehouse-First Architecture
Author(s): Clarencer R. Mercer Originally published on Towards AI. When Model Accuracy Is Not Enough In Part 1 of this series, we explored how a warehouse-first composable CDP restores architectural control to modern CRM systems. In Part 2, we examined the Identity …