modelcost

Calculate LLM API call costs from token usage using price catalogs from multiple sources.

Supported pricing sources:

litellm (default)
openrouter
tokencost

Install

python -m pip install modelcost

CLI

The default command calculates cost, so you can omit the cost subcommand.

# Default (cost)
modelcost gpt-4o 1000 500

# Explicit cost (optional)
modelcost cost gpt-4o 1000 500

# All sources in one run
modelcost --source all gpt-4o 1000 500

# JSON output
modelcost --json gpt-4o 1000 500

Cached and reasoning tokens

Many LLM APIs bill cached input tokens and reasoning output tokens at different rates than regular tokens. Pass the counts as optional flags; each defaults to 0, so existing invocations are unaffected.

# Cached input tokens (served from prompt cache)
modelcost gpt-4o 1000 500 --cached-input-tokens 200

# Cache creation tokens (first-time cache writes)
modelcost gpt-4o 1000 500 --cache-creation-input-tokens 100

# Reasoning tokens (subset of output_tokens, e.g. o1/R1 thinking)
modelcost deepseek/deepseek-r1 2000 5000 --reasoning-tokens 3000

# All together
modelcost gpt-4.1-mini 1000 500 \
  --cached-input-tokens 200 \
  --cache-creation-input-tokens 100 \
  --reasoning-tokens 150
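
For scripted use, the same command lines can be assembled in Python. The helper below is a sketch and not part of modelcost: `build_argv` is a hypothetical name, and it uses only the flags shown above.

```python
from typing import List


def build_argv(
    model: str,
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
    cache_creation_input_tokens: int = 0,
    reasoning_tokens: int = 0,
    json_output: bool = False,
) -> List[str]:
    """Build a modelcost command line (hypothetical helper, not part of the package)."""
    argv = ["modelcost"]
    if json_output:
        argv.append("--json")
    argv += [model, str(input_tokens), str(output_tokens)]
    # The optional counts default to 0, so only non-zero values need a flag.
    optional = {
        "--cached-input-tokens": cached_input_tokens,
        "--cache-creation-input-tokens": cache_creation_input_tokens,
        "--reasoning-tokens": reasoning_tokens,
    }
    for flag, value in optional.items():
        if value:
            argv += [flag, str(value)]
    return argv


argv = build_argv("gpt-4.1-mini", 1000, 500, cached_input_tokens=200, reasoning_tokens=150)
print(" ".join(argv))
# prints: modelcost gpt-4.1-mini 1000 500 --cached-input-tokens 200 --reasoning-tokens 150
```

With the CLI installed, the resulting list can be passed to `subprocess.run(argv, capture_output=True, text=True, check=True)`.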

List models

modelcost models
modelcost models --source openrouter
modelcost models --filter gpt
modelcost models --json

Help

modelcost --help
modelcost models --help

Library

from modelcost.calculator import calculate_cost, list_models

# Basic usage (backward compatible)
result = calculate_cost("gpt-4o", 1000, 500)

for source in result.available_sources:
    print(f"{source.source}: ${source.total_cost_usd:.6f}")

# With cached and reasoning tokens
result = calculate_cost(
    "gpt-4.1-mini",
    input_tokens=1000,
    output_tokens=500,
    cached_input_tokens=200,
    cache_creation_input_tokens=100,
    reasoning_tokens=150,
)

s = result.sources[0]
print(f"${s.total_cost_usd:.6f}")
print(f"  cache read:  ${s.price_per_million_cache_read}/M")
print(f"  reasoning:   ${s.price_per_million_reasoning}/M")

models = list_models("openrouter")

Output details

calculate_cost() returns a CostResult with:

model, input_tokens, output_tokens
cached_input_tokens, cache_creation_input_tokens, reasoning_tokens (0 when not used)
sources: list of SourceCost objects
available_sources: only sources with prices found

Each SourceCost includes:

source
total_cost_usd
price_per_million_input, price_per_million_output
price_per_million_cache_read, price_per_million_cache_creation, price_per_million_reasoning (present only when the source has specific pricing for these)
error (set when the source is unavailable)
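
Because the optional rate fields may be missing and `error` may be set, code that prints a SourceCost should guard both cases. The sketch below assumes a missing rate shows up either as an absent attribute or as `None`, and uses `SimpleNamespace` stand-ins instead of real SourceCost objects so it runs without network access. `describe_source` is a hypothetical helper name.

```python
from types import SimpleNamespace

# Labels paired with the optional SourceCost fields listed above.
OPTIONAL_RATES = (
    ("cache read", "price_per_million_cache_read"),
    ("cache creation", "price_per_million_cache_creation"),
    ("reasoning", "price_per_million_reasoning"),
)


def describe_source(s) -> str:
    """Format one SourceCost-like object (hypothetical helper)."""
    if s.error:
        return f"{s.source}: unavailable ({s.error})"
    parts = [f"{s.source}: ${s.total_cost_usd:.6f}"]
    for label, attr in OPTIONAL_RATES:
        # Assumption: a rate the source lacks is either absent or None.
        rate = getattr(s, attr, None)
        if rate is not None:
            parts.append(f"{label} ${rate}/M")
    return ", ".join(parts)


# Stand-ins for SourceCost objects; the second entry is an invented example.
ok = SimpleNamespace(source="litellm", total_cost_usd=0.001148, error=None,
                     price_per_million_cache_read=0.1)
missing = SimpleNamespace(source="tokencost", total_cost_usd=None,
                          error="model not found")

print(describe_source(ok))       # litellm: $0.001148, cache read $0.1/M
print(describe_source(missing))  # tokencost: unavailable (model not found)
```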

Cost formula

All subset tokens (cached_input_tokens, cache_creation_input_tokens, reasoning_tokens) are treated as subsets of their parent total and are clamped accordingly:

text_input  = input_tokens  - cached_input - cache_creation
text_output = output_tokens - reasoning_tokens

total = text_input     * input_rate
      + cached_input   * cache_read_rate    (fallback: input_rate)
      + cache_creation * cache_creation_rate (fallback: input_rate)
      + text_output    * output_rate
      + reasoning      * reasoning_rate      (fallback: output_rate)

This matches how most APIs report usage: input_tokens and output_tokens are totals that already include cached and reasoning tokens, and the detail fields are subsets of those totals. When a specific rate is missing, the base rate for that category is used as the fallback.
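
The formula can be checked by hand with a small standalone function. This is a sketch, not modelcost's implementation: `estimate_cost_usd` is a hypothetical name, rates are in USD per million tokens, and the clamping order (cached first, then cache creation against what remains) is an assumption, since the exact rule is not spelled out above.

```python
from typing import Optional


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate: float,
    output_rate: float,
    *,
    cached_input: int = 0,
    cache_creation: int = 0,
    reasoning: int = 0,
    cache_read_rate: Optional[float] = None,
    cache_creation_rate: Optional[float] = None,
    reasoning_rate: Optional[float] = None,
) -> float:
    """Apply the cost formula above; all rates are USD per million tokens."""
    # Clamp each subset to its parent total (assumed clamping order).
    cached_input = max(0, min(cached_input, input_tokens))
    cache_creation = max(0, min(cache_creation, input_tokens - cached_input))
    reasoning = max(0, min(reasoning, output_tokens))

    text_input = input_tokens - cached_input - cache_creation
    text_output = output_tokens - reasoning

    # Fall back to the base rate of the category when a specific rate is missing.
    if cache_read_rate is None:
        cache_read_rate = input_rate
    if cache_creation_rate is None:
        cache_creation_rate = input_rate
    if reasoning_rate is None:
        reasoning_rate = output_rate

    per_million = (
        text_input * input_rate
        + cached_input * cache_read_rate
        + cache_creation * cache_creation_rate
        + text_output * output_rate
        + reasoning * reasoning_rate
    )
    return per_million / 1_000_000


# 1000 input (200 cached, 100 cache creation) and 500 output (150 reasoning),
# with input 0.4, output 1.6, cache read 0.1, cache creation 0.48, reasoning 1.6.
cost = estimate_cost_usd(
    1000, 500, 0.4, 1.6,
    cached_input=200, cache_creation=100, reasoning=150,
    cache_read_rate=0.1, cache_creation_rate=0.48, reasoning_rate=1.6,
)
print(f"{cost:.6f}")  # 0.001148
```

Term by term, that is 700 × 0.4 + 200 × 0.1 + 100 × 0.48 + 350 × 1.6 + 150 × 1.6 = 1148, divided by one million.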

JSON output

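--json / result.to_dict() includes cached_input_tokens, cache_creation_input_tokens, and reasoning_tokens only when they are non-zero:

{
  "model": "gpt-4.1-mini",
  "input_tokens": 1000,
  "output_tokens": 500,
  "cached_input_tokens": 200,
  "cache_creation_input_tokens": 100,
  "reasoning_tokens": 150,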
  "costs": [
    {
      "source": "litellm",
      "total_cost_usd": 0.001148,
      "price_per_million_input": 0.4,
      "price_per_million_output": 1.6,
      "price_per_million_cache_read": 0.1,
      "price_per_million_cache_creation": 0.48,
      "price_per_million_reasoning": 1.6,
      "error": null
    }
  ]
}
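
Since the optional token fields are omitted when zero, consumers of the JSON should read them with a default. The sketch below parses a trimmed payload with the same keys as the example above. `summarize` is a hypothetical helper, and the second `costs` entry is an invented illustration of an unavailable source, not captured output.

```python
import json

OPTIONAL_TOKEN_FIELDS = (
    "cached_input_tokens",
    "cache_creation_input_tokens",
    "reasoning_tokens",
)


def summarize(data: dict) -> dict:
    """Normalize token counts and pick the cheapest source without an error."""
    # Fields that are absent from the JSON are treated as 0.
    tokens = {name: data.get(name, 0) for name in OPTIONAL_TOKEN_FIELDS}
    available = [c for c in data.get("costs", []) if c.get("error") is None]
    cheapest = min(available, key=lambda c: c["total_cost_usd"], default=None)
    return {
        "tokens": tokens,
        "cheapest": cheapest["source"] if cheapest else None,
    }


payload = json.loads("""
{
  "model": "gpt-4.1-mini",
  "input_tokens": 1000,
  "output_tokens": 500,
  "cached_input_tokens": 200,
  "reasoning_tokens": 150,
  "costs": [
    {"source": "litellm", "total_cost_usd": 0.001148, "error": null},
    {"source": "tokencost", "total_cost_usd": null, "error": "model not found"}
  ]
}
""")

summary = summarize(payload)
print(summary["tokens"]["cache_creation_input_tokens"])  # 0
print(summary["cheapest"])                               # litellm
```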

Caching

openrouter responses are cached in ~/.modelcost_cache.json for 1 hour.
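
A one-hour expiry like this is commonly implemented as a freshness check before reusing a cached file. The sketch below shows a generic check based on file modification time. It is an assumption for illustration only: how modelcost stores and validates its cache is not described here, and `is_fresh` is a hypothetical name. The demo uses a temporary file rather than the real cache path.

```python
import os
import tempfile
import time
from typing import Optional

ONE_HOUR = 3600  # seconds


def is_fresh(path: str, ttl_seconds: int = ONE_HOUR, now: Optional[float] = None) -> bool:
    """Return True if the file exists and was modified less than ttl_seconds ago."""
    current = time.time() if now is None else now
    try:
        age = current - os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file counts as stale.
        return False
    return age < ttl_seconds


# Demo on a throwaway file; the real cache file is left untouched.
with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as handle:
    demo_path = handle.name

print(is_fresh(demo_path))                                  # True
print(is_fresh(demo_path, now=time.time() + 2 * ONE_HOUR))  # False
os.remove(demo_path)
print(is_fresh(demo_path))                                  # False
```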

Notes

Prices are fetched at runtime from the upstream catalogs.
If a model is missing in a source, that source is marked as unavailable.
When --source all is used, network sources are fetched in parallel.
tokencost does not expose cache or reasoning pricing, so those tokens are billed at its base input and output rates. With source="all", its cost may therefore be higher than litellm's for calls that include cached or reasoning tokens.
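
The parallel fetch and the "marked as unavailable" behavior described in these notes can be pictured with a thread pool that records a per-source error instead of failing the whole run. This is a sketch under assumptions, not modelcost's code: `fetch_all` and `make_stub` are hypothetical names, and the stub catalogs hold placeholder numbers, not real prices.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def make_stub(catalog: Dict[str, float]) -> Callable[[str], float]:
    """Return a fake fetcher that looks a model up in an in-memory catalog."""
    def fetch(model: str) -> float:
        if model not in catalog:
            raise LookupError(f"{model} not found")
        return catalog[model]
    return fetch


def fetch_all(fetchers: Dict[str, Callable[[str], float]], model: str) -> Dict[str, dict]:
    """Query every source concurrently; a failing source gets an error entry."""
    def run(name: str) -> dict:
        try:
            return {"source": name, "price": fetchers[name](model), "error": None}
        except Exception as exc:
            # One failing source must not abort the others.
            return {"source": name, "price": None, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        results = list(pool.map(run, fetchers))
    return {r["source"]: r for r in results}


# Placeholder catalogs: source_b deliberately lacks the model.
fetchers = {
    "source_a": make_stub({"gpt-4.1-mini": 0.4}),
    "source_b": make_stub({}),
}
results = fetch_all(fetchers, "gpt-4.1-mini")
print(results["source_a"]["price"])  # 0.4
print(results["source_b"]["error"])  # gpt-4.1-mini not found
```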
