
Empirical Research · June 2026

The Hidden Cost

Empirical analysis of file upload token consumption in Claude LLM conversation sessions. The cost isn't in what you say — it's in what you upload.

99%+ of session tokens
5 file types tested
8 figures
84% savings with caching
"Most people are optimizing their prompts while the real cost is quietly sitting in the attachments tab."
Medium PDF tokens/turn: 14,800
Avg. message tokens/turn: 25
Rfile at 20 turns: 98.8%
Cost savings w/ caching @ T=20: 84.2%

Large language models such as Anthropic's Claude operate on a context window: a fixed-size buffer of tokens that holds every piece of information the model processes in a single inference call. When users upload files to a Claude session, those files are tokenized and injected into this context window, where they persist for the entire conversation and are counted again as input tokens on every turn. This paper presents an empirical investigation into the token cost of file uploads relative to total session token consumption. Across five file types (PDF, DOCX, XLSX, plain text, and source code), four conversation lengths (5, 10, 20, and 40 turns), and controlled prompt sizes, we measure and report the proportion of tokens attributable to uploaded content versus user-generated messages. Our findings confirm that uploaded files account for approximately 95–99% of cumulative token usage in typical multi-turn sessions. These results carry significant financial implications for both API users and SaaS consumers, and motivate a set of session design recommendations (prompt caching, selective context injection, and document pre-summarization) to minimize unnecessary token expenditure.

$$R_{\text{file}} = \frac{T_{\text{file}} \times N_{\text{turns}}}{T_{\text{total}}}$$

$T_{\text{file}}$ = token count of the uploaded file (constant per turn)
$N_{\text{turns}}$ = number of conversation turns
$T_{\text{total}}$ = cumulative input tokens for the session
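As a rough illustration of the definition above, the sketch below evaluates the ratio under a simplifying assumption that is not part of the paper's measurement setup: every turn carries the same file tokens plus a fixed number of other input tokens (`other_tokens_per_turn`, a hypothetical parameter standing in for messages and any system prompt). Under that assumption the turn count cancels out of the ratio. The example values are the medium-PDF and average-message figures quoted above; because the sketch ignores system prompts and accumulated conversation history, it should not be expected to reproduce the measured 98.8% exactly.

```python
def file_token_ratio(t_file: int, n_turns: int, other_tokens_per_turn: int) -> float:
    """R_file = (T_file * N_turns) / T_total under a constant-per-turn model."""
    if n_turns <= 0:
        raise ValueError("n_turns must be positive")
    file_tokens = t_file * n_turns
    # Assumption: non-file input is the same size on every turn.
    t_total = file_tokens + other_tokens_per_turn * n_turns
    return file_tokens / t_total


# Medium PDF (14,800 tokens/turn) with 25-token messages, at two session lengths.
ratio_5 = file_token_ratio(14_800, 5, 25)
ratio_40 = file_token_ratio(14_800, 40, 25)
print(f"{ratio_5:.4f} {ratio_40:.4f}")
```

In this simplified model both session lengths give the same ratio, roughly 0.998, which is one way to see why the proportion can stay flat while absolute token counts keep growing.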

Eight figures. Full empirical record.

Figure 1
Token Counts by File Type and Size
Token counts across Small, Medium, and Large files for all five types. PDFs and DOCX files consistently carry the heaviest token loads — a large PDF reaches 38.5K tokens per turn alone.
Figure 2
The Compounding Effect Across Turns
Cumulative token growth for Small, Medium, and Large PDFs across 44 turns. The gap between file tokens (orange) and message tokens (dashed) widens relentlessly with every turn.
Figure 3
Rfile Heatmap: File Token Proportion
Proportion of session input tokens attributable to the uploaded file, across all file types and conversation lengths. Near-uniform deep red confirms file dominance exceeds 99% in every scenario tested.
Figure 4
Token Composition: Stacked Area View
Stacked area chart for a medium PDF over 44 turns. File tokens (orange) occupy virtually the entire stack — user messages and system prompts form an invisible sliver at the bottom.
Figure 5
Session Cost Breakdown in USD
Left: dollar cost per component in a 20-turn session. Right: pie chart of input token cost distribution. File tokens account for over 99% of all input spend.
Figure 6
Rfile by File Size: All Types
File token proportion as a function of file size for all five types at 10 turns. All lines cluster tightly above 99%, confirming that even small files dominate once attached.
Figure 7
Full Cost Scaling Grid
Session input cost (USD) heatmap across all file types, sizes, and turn counts up to 80 turns. Large files in long sessions reveal how costs compound rapidly in production.
Figure 8
Prompt Caching: Cost Savings
Top: cumulative cost with vs without prompt caching. Bottom: percentage savings curve. By turn 20, caching delivers 84.2% cost reduction — the single highest-impact optimization available.

Key Findings

Finding 01
Files dominate, not prompts
Across every file type and conversation length tested, uploaded files account for 95–99% of all cumulative input tokens. The wording of your prompt contributes at most a few percent of the token bill.
Finding 02
The ratio is stable, not growing
Rfile converges to a near-constant value regardless of turn count, because both file tokens and message tokens scale linearly. The dominance is structural, not compounding.
Finding 03
PDFs cost the most per turn
A large PDF reaches 38,500 tokens per turn. A Python source file of equivalent "size" costs only 14,200 tokens per turn. File format matters: PDFs carry markup overhead that inflates token counts significantly.
Finding 04
Prompt caching saves 84% at T=20
Using Anthropic's prompt caching API, the file prefix is billed at the cache-read rate (~10× cheaper) from turn 2 onward. In a 20-turn session, this translates to an 84.2% cost reduction on file tokens alone.
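The arithmetic behind a figure like this can be sketched as follows. The sketch assumes the first turn pays a cache-write premium of 1.25× the base input rate and every later turn pays 0.10× (the "~10× cheaper" read rate), and that the cache never expires between turns. The 1.25× write multiplier is an assumption not stated in this paper, and `caching_savings` is a hypothetical helper, not part of any SDK.

```python
def caching_savings(n_turns: int, write_multiplier: float = 1.25,
                    read_multiplier: float = 0.10) -> float:
    """Fractional saving on file-token cost from caching the file prefix.

    Costs are expressed in units of "one uncached pass over the file",
    so the result does not depend on file size or on the base price.
    """
    if n_turns < 1:
        raise ValueError("n_turns must be at least 1")
    uncached = float(n_turns)  # full price on every turn
    # Assumption: one cache write on turn 1, cache reads on all later turns.
    cached = write_multiplier + (n_turns - 1) * read_multiplier
    return 1.0 - cached / uncached


savings_20 = caching_savings(20)
print(f"{savings_20:.2%}")
```

Under these assumptions the 20-turn saving comes out at about 0.8425, in line with the 84.2% reported here. For a single-turn session the same function returns a negative value, since the write premium is paid without any later reads to offset it.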
Finding 05
The 70% claim is conservative
The commonly cited "70% from uploads" figure underestimates reality. It only holds for very short files paired with unusually verbose user messages. In standard developer usage, 99% is closer to the truth.
Finding 06
Cost modeling validates the concern
At Claude Sonnet 4 pricing ($3/M input tokens), a single 20-turn session with a medium PDF costs ~$0.93 — 95.9% of which is attributable to the file. Multiply by thousands of users and the numbers become serious.
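A back-of-the-envelope check of these numbers, assuming the "medium PDF" here is the 14,800-tokens-per-turn file quoted in the summary statistics and that the file is billed at the full input rate on every turn (no caching). The helper name is hypothetical.

```python
def file_input_cost_usd(t_file: int, n_turns: int,
                        usd_per_million_input: float) -> float:
    """Uncached input cost of re-sending one file on every turn."""
    return t_file * n_turns * usd_per_million_input / 1_000_000


# Assumed inputs: 14,800 tokens/turn, 20 turns, $3 per million input tokens.
file_cost = file_input_cost_usd(14_800, 20, 3.0)

# If the file is 95.9% of the session cost, the implied session total is:
implied_total = file_cost / 0.959

print(f"file: ${file_cost:.3f}, implied session total: ${implied_total:.3f}")
```

The file portion works out to about $0.89, and dividing by the stated 95.9% share gives a total of roughly $0.926, which is consistent with the ~$0.93 quoted above.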
Kidus Yohannes
AI/ML Engineer and researcher based in Addis Ababa, Ethiopia. Focused on practical LLM deployment, cost optimization, and agentic systems. This paper grew out of direct observation while building Claude-powered products at scale.
AI/ML Engineering · LLM Research · Addis Ababa