The Hidden
Cost

Empirical analysis of file upload token consumption in Claude LLM conversation sessions. The cost isn't in what you say — it's in what you upload.

99%+ of session tokens

5 file types tested

8 figures

84% savings with caching

Read the paper ↓ scroll

Abstract

Large language models such as Anthropic's Claude operate on a context window — a fixed-size buffer of tokens that constitutes every piece of information the model processes per inference call. When users upload files to a Claude session, those files are tokenized and injected into this context window, where they persist for the entire conversation. This paper presents an empirical investigation into the token cost of file uploads relative to total session token consumption. Across five file types (PDF, DOCX, XLSX, plain text, and source code), four conversation lengths (5, 10, 20, and 40 turns), and controlled prompt sizes, we measure and report the proportion of tokens attributable to uploaded content versus user-generated messages. Our findings confirm that uploaded files account for approximately 95–99% of cumulative token usage in typical multi-turn sessions. These results carry significant financial implications for both API users and SaaS consumers, and motivate a set of session design recommendations — including prompt caching, selective context injection, and document pre-summarization — to minimize unnecessary token expenditure.

Research Figures

Eight figures. Full empirical record.

Figure 1

Token Counts by File Type and Size

Token counts across Small, Medium, and Large files for all five types. PDFs and DOCX files consistently carry the heaviest token loads — a large PDF reaches 38.5K tokens per turn alone.

Figure 2

The Compounding Effect Across Turns

Cumulative token growth for Small, Medium, and Large PDFs across 44 turns. The gap between file tokens (orange) and message tokens (dashed) widens relentlessly with every turn.

Figure 3

Rₑ Heatmap: File Token Proportion

Proportion of session input tokens attributable to the uploaded file, across all file types and conversation lengths. Near-uniform deep red confirms file dominance exceeds 99% in every scenario tested.

Figure 4

Token Composition: Stacked Area View

Stacked area chart for a medium PDF over 44 turns. File tokens (orange) occupy virtually the entire stack — user messages and system prompts form an invisible sliver at the bottom.

Figure 5

Session Cost Breakdown in USD

Left: dollar cost per component in a 20-turn session. Right: pie chart of input token cost distribution. File tokens account for over 99% of all input spend.

Figure 6

Rₑ by File Size — All Types

File token proportion as a function of file size for all five types at 10 turns. All lines cluster tightly above 99%, confirming that even small files dominate once attached.

Figure 7

Full Cost Scaling Grid

Session input cost (USD) heatmap across all file types, sizes, and turn counts up to 80 turns. Large files in long sessions reveal how costs compound rapidly in production.

Figure 8

Prompt Caching: Cost Savings

Top: cumulative cost with vs without prompt caching. Bottom: percentage savings curve. By turn 20, caching delivers 84.2% cost reduction — the single highest-impact optimization available.

Key Findings

Finding 01

Files dominate, not prompts

Across every file type and conversation length tested, uploaded files account for 95–99% of all cumulative input tokens. Your prompt is statistically irrelevant to the token bill.

Finding 02

The ratio is stable, not growing

R_file converges to a near-constant value regardless of turn count, because both file tokens and message tokens scale linearly. The dominance is structural, not compounding.

Finding 03

PDFs cost the most per turn

A large PDF reaches 38,500 tokens per turn. A Python source file of equivalent "size" costs only 14,200. File format matters: PDFs carry markup overhead that inflates token counts significantly.

Finding 04

Prompt caching saves 84% at T=20

Using Anthropic's prompt caching API, the file prefix is billed at the cache-read rate (~10× cheaper) from turn 2 onward. At a 20-turn session, this translates to 84.2% cost reduction on file tokens alone.

Finding 05

The 70% claim is conservative

The commonly cited "70% from uploads" figure underestimates reality. It only holds for very short files paired with unusually verbose user messages. In standard developer usage, 99% is closer to the truth.

Finding 06

Cost modeling validates the concern

At Claude Sonnet 4 pricing ($3/M input tokens), a single 20-turn session with a medium PDF costs ~$0.93 — 95.9% of which is attributable to the file. Multiply by thousands of users and the numbers become serious.

The HiddenCost

Eight figures. Full empirical record.

Key Findings

The Hidden
Cost