Token Waste: The Silent Tax on Every AI Tool

Last Updated on May 27, 2026 by Editorial Team

Author(s): Sudiksha Acharya

Originally published on Towards AI.

ChatGPT, Claude, and Gemini all charge per token through their APIs, and all three bills are silently inflated by how most people write prompts. Here’s the research, the real cost, and a free tool that helps fix it.

What every major AI charges per million tokens:

*Sources: Anthropic docs, Google Gemini API pricing page, published pricing guides. Prices as of May 2026.*

The AI pricing model is deceptively simple: you pay per token. What most engineers don’t realize is that a significant portion of the tokens they’re paying for carry zero informational value. They’re filler. Hedging. Politeness. Context the model doesn’t need. Format instructions that were never given, so the model guesses and generates twice as much output as necessary.

This isn’t a theoretical problem. OpenAI’s own CEO confirmed it publicly. A research paper on arXiv quantified it. And across every major AI platform (ChatGPT, Claude, Gemini) the same structural waste compounds silently in production pipelines.

The evidence: all three platforms, real numbers

At the Stripe Sessions 2024 conference, someone asked Sam Altman how much it costs OpenAI when users say “please” and “thank you” to ChatGPT. His answer: “tens of millions of dollars” in compute costs. [1] He called it “money well spent.” But the number reveals something important — even the most benign form of prompt inefficiency, at billions of queries a day, becomes a material cost.

An arXiv paper went further. [2] The researchers found that polite phrasing doesn’t just add input tokens; it systematically inflates output tokens, because the model mirrors the tone of the input. Output tokens cost more than input tokens on every platform. The study estimated that this linguistic effect alone generates up to $11 million per month in extra revenue for OpenAI, paid directly out of users’ bills.

“Every time you say ‘please’ to ChatGPT — that’s like a penny or something. When you compound that over billions of users, it’s tens of millions of dollars of compute.”

— Sam Altman, CEO of OpenAI, Stripe Sessions 2024 [1]

Claude users face the same dynamic, with higher stakes. Claude Sonnet output tokens cost $15 per million, five times its $3-per-million input rate. [3] Anthropic’s own documentation for Claude Code estimates average developer costs at $100–200 per month, with “large variance depending on how many instances users are running.” [3] How prompts are written adds to that variance.

Gemini adds another wrinkle. Gemini 2.5 Pro raises its rates for prompts above 200,000 tokens: input doubles from $1.25 to $2.50 per million tokens, and output rises from $10 to $15 per million. [4] Context bloat, one of the most common prompting mistakes, triggers that cliff automatically. Teams injecting full files instead of relevant excerpts can cross it without knowing.
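The effect of that threshold can be sketched in a few lines of Python. This is a minimal illustration that uses only the Gemini 2.5 Pro rates quoted above. The function name gemini_25_pro_cost is hypothetical, and the sketch assumes the higher rates apply to the whole request once the prompt exceeds 200,000 tokens, so check the official pricing page [4] before relying on it.

```python
THRESHOLD_TOKENS = 200_000

# (input $/1M tokens, output $/1M tokens), as quoted in this article
STANDARD_RATES = (1.25, 10.00)
LONG_CONTEXT_RATES = (2.50, 15.00)


def gemini_25_pro_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request under the two-tier pricing."""
    # Assumption: above the threshold, the higher rates apply to the
    # entire request, not only to the tokens past 200,000.
    if prompt_tokens > THRESHOLD_TOKENS:
        input_rate, output_rate = LONG_CONTEXT_RATES
    else:
        input_rate, output_rate = STANDARD_RATES
    return (prompt_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    below = gemini_25_pro_cost(190_000, 2_000)
    above = gemini_25_pro_cost(210_000, 2_000)
    print(f"190K-token prompt: ${below:.4f}")
    print(f"210K-token prompt: ${above:.4f}")
```

Under these assumptions, the 210K-token request costs a little over twice as much as the 190K-token one, even though the prompt is only about 10% longer.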

The compounding problem at scale

Politeness is the headline case because it’s relatable. But it’s actually one of the smaller waste drivers. The bigger ones — vague scope, missing format instructions, over-injected files — generate far more token waste per call, and they affect output tokens, not just input.

Consider a team running 10 million API calls per day on Claude Sonnet. Average prompt: 200 tokens. If 40% is unnecessary filler, that’s 80 wasted tokens per call. The math:

200 tokens × 40% waste = 80 wasted tokens per call

10M calls/day × 80 tokens = 800M wasted tokens/day

800M tokens × $3/1M = $2,400/day wasted on input alone

→ $876,000 per year, one team, illustrative example
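The same arithmetic can be written as a small Python function, which makes it easy to plug in your own volumes. This is only a sketch of the illustrative calculation above. The function name annual_input_waste_usd and its parameter names are hypothetical, and it counts input tokens only.

```python
def annual_input_waste_usd(
    calls_per_day: int,
    prompt_tokens: int,
    waste_fraction: float,
    input_price_per_million: float,
    days_per_year: int = 365,
) -> float:
    """Estimate yearly spend on wasted input tokens (output tokens ignored)."""
    wasted_tokens_per_call = prompt_tokens * waste_fraction
    wasted_tokens_per_day = calls_per_day * wasted_tokens_per_call
    daily_cost = wasted_tokens_per_day * input_price_per_million / 1_000_000
    return daily_cost * days_per_year


if __name__ == "__main__":
    # Figures from the illustrative example: 10M calls/day, 200-token prompts,
    # 40% filler, $3 per million input tokens.
    yearly = annual_input_waste_usd(10_000_000, 200, 0.40, 3.00)
    print(f"Estimated annual input waste: ${yearly:,.0f}")
```

Because the estimate is linear in every input, halving the waste fraction or the call volume halves the result.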

Well-structured prompts can reduce token usage by 40–70% without a noticeable loss in output quality, according to two industry write-ups. [5,6]

The deeper problem is invisible feedback. Engineers send prompts, get responses, and move on. There is no signal telling them whether their prompt was efficient. Only 51% of organizations can confidently evaluate the ROI of their AI spend, per CloudZero’s 2025 State of AI Costs report. [7] Waste accumulates because it’s never visible.

prompt-coach: the feedback loop that was missing

I built prompt-coach as a Claude skill to close exactly this gap. It silently analyzes every prompt you send, scores it across five dimensions grounded in Anthropic’s prompt engineering framework, and appends a one-line coaching note after every response, without interrupting your answer. No commands and no per-message setup: once installed, it runs on every message automatically.

GitHub: prompt-coach — Open source · MIT license

After every response, you see one line: the exact waste, the exact fix, and the dimension you violated. Over sessions, the patterns that inflate bills at scale stop being invisible defaults and become conscious choices.

The 5 principles it scores — with real diffs

Each prompt is scored across five dimensions (20 points each): Clarity, Concision, Context, Structure, and Specificity — mapped to Anthropic’s official prompt engineering framework. [8] The same principles apply across ChatGPT, Claude, and Gemini.

01 — Clarity: Start with an imperative verb

02 — Specificity: Scope your output format

03 — Context: Inject only what changes the answer

04 — Structure: Use XML tags for multi-part prompts

05 — Specificity: State done criteria upfront
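To make these principles concrete, here is a minimal Python sketch of a prompt builder that applies them: an imperative task, only the context that matters, an explicit output format, and done criteria, each wrapped in its own XML tag. The function build_prompt and the tag names (task, context, output_format, done_criteria) are hypothetical choices for illustration. They are not part of prompt-coach or of any official schema.

```python
def build_prompt(task: str, context: str, output_format: str, done_criteria: str) -> str:
    """Assemble a multi-part prompt with one XML-style tag per section."""
    sections = {
        "task": task,                    # start with an imperative verb
        "context": context,              # only what changes the answer
        "output_format": output_format,  # scope the shape of the output
        "done_criteria": done_criteria,  # say when the model should stop
    }
    parts = []
    for tag, body in sections.items():
        body = body.strip()
        if body:  # skip empty sections instead of sending empty tags
            parts.append(f"<{tag}>\n{body}\n</{tag}>")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(
        build_prompt(
            task="Summarize the changelog excerpt.",
            context="v2.1 adds retry logic; v2.0 removed the legacy client.",
            output_format="Three bullet points, one sentence each.",
            done_criteria="Stop after the third bullet.",
        )
    )
```

Keeping each section separate also makes it easier to see which part of a prompt is carrying unnecessary tokens.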

The Live Dashboard

Type show dashboard at any point and prompt-coach renders a full interactive session breakdown: score trend across every prompt, tokens used vs. optimal, your top recurring issues, and a prompt engineering (PE) scorecard across all five dimensions.

The dashboard pulls real data from your session. Every number is calculated from your actual prompts — not estimates. It shows you exactly where your tokens are going, which habits are costing the most, and how your score is trending across the conversation.

Install prompt-coach in 60 seconds

prompt-coach is a Claude skill. Install it once in a Project and it coaches every conversation automatically. It’s open source and free.

# Claude.ai (recommended)

1. Projects → New Project

2. Paste SKILL.md contents into Project Instructions

3. Every conversation in that project is coached automatically

# Claude Code

unzip prompt-coach.zip -d ~/.claude/skills/

# Restart Claude Code — loads automatically

Type show dashboard for a full session breakdown: score trend, top issues, and a prompt engineering scorecard. Type ? after any coaching line for a full rewrite with token counts explained.

The bottom line

Token waste is not a ChatGPT problem. It’s not a Claude problem. It’s not a Gemini problem. It’s a prompting problem — and it compounds the same way across every platform that charges per token. The fix isn’t switching models. It’s learning to write better prompts, and getting a feedback loop that makes the waste visible every time it happens.

That’s what prompt-coach does. One line after every response. No interruption. No configuration. Free.

References

[1] Sam Altman on polite prompts costing OpenAI “tens of millions of dollars” — Stripe Sessions 2024. Reported by LiveNOW from FOX, The Express Tribune, April 2025.

[2] Cost Transparency of Enterprise AI Adoption — arXiv:2511.11761, November 2025. Study on polite prompts inflating output tokens; estimates up to $11M/month in additional OpenAI revenue.

[3] Claude Code Costs — Anthropic Documentation. Primary source for Claude Sonnet pricing and Claude Code average developer costs.

[4] Gemini Developer API Pricing — Google. Primary source for Gemini 2.5 Pro pricing and the 200K token billing cliff.

[5] Optimizing Prompts to Reduce Token Usage and Costs — InventiveHQ, January 2026. Documents 40–70% token reduction from optimized prompts.

[6] Semantic Prompt Engineering: Cut AI Token Waste 60–74% — CostLayer, April 2026.

[7] The State of AI Costs 2025 — CloudZero, May 2025. Average monthly enterprise AI spend was $62,964 in 2024; only 51% of organizations can confidently evaluate ROI.

[8] Anthropic Prompt Engineering Overview — Official Anthropic documentation.


Note: Article content contains the views of the contributing authors and not Towards AI.
