How to fix: prompt caching not working / cache_read_input_tokens is zero
Cause
Something in the cached prefix changes every request — a timestamp, UUID, unsorted JSON, or a varying tool set — invalidating the cache.
The fix
- 1Diff the rendered prompt bytes between two requests to find what differs in the prefix.
- 2Move volatile content (timestamps, per-request IDs, the varying question) to the end, after the last cache breakpoint.
- 3Serialize JSON deterministically (sorted keys) and keep the tool list stable and ordered.
- 4Ensure the cached prefix exceeds the model’s minimum cacheable length — short prefixes silently won’t cache.
- 5Verify with `cache_read_input_tokens` in the usage response once fixed.
Prevent it
Freeze the system prompt and tool list, inject dynamic context later in the messages, and audit for silent cache invalidators.
Frequently asked questions
What causes “prompt caching not working / cache_read_input_tokens is zero”?
Something in the cached prefix changes every request — a timestamp, UUID, unsorted JSON, or a varying tool set — invalidating the cache.
How do I prevent “prompt caching not working / cache_read_input_tokens is zero” from recurring?
Freeze the system prompt and tool list, inject dynamic context later in the messages, and audit for silent cache invalidators.