Token Counting
Eneo tracks token usage for every LLM request to provide accurate insights into AI consumption across the platform.
How It Works
Every request to a language model records two values:
- Input tokens — the prompt, system instructions, conversation history, and any tool definitions sent to the model
- Output tokens — the response generated by the model, including any internal reasoning
These counts are aggregated and displayed in the Insights and Admin dashboards.
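The per-request recording and aggregation described above can be sketched as follows. The names here are illustrative, not Eneo's actual schema:

```python
# Minimal sketch of per-request token recording: each LLM request stores
# an input and an output token count, which the dashboards later sum.
from dataclasses import dataclass

@dataclass
class RequestUsage:
    input_tokens: int   # prompt + system instructions + history + tool definitions
    output_tokens: int  # generated response, including any internal reasoning

requests = [RequestUsage(320, 85), RequestUsage(1250, 410)]
total_in = sum(r.input_tokens for r in requests)
total_out = sum(r.output_tokens for r in requests)
print(total_in, total_out)  # 1570 495
```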
Accurate Counting With Major Providers
When using hosted providers such as OpenAI, Anthropic, or Azure, Eneo receives the actual token counts directly from the provider’s API response. This is the most accurate method because:
- The provider uses its own tokenizer, purpose-built for its models
- All overhead is included — system prompt formatting, tool schemas, and internal message framing
- Reasoning tokens (used by models like OpenAI o-series or Claude with extended thinking) are captured
This works for both streaming and non-streaming requests, and token usage is correctly accumulated even when the model makes multiple rounds of tool calls.
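Accumulation across multiple tool-call rounds can be sketched like this. The `usage` shape (`prompt_tokens`/`completion_tokens`) mirrors the OpenAI-style usage object that LiteLLM normalizes provider responses to; the response objects below are stand-ins, not real API calls:

```python
# Sketch: summing provider-reported usage over every round of a request,
# so a request that makes several tool calls is counted in full.
from types import SimpleNamespace

def accumulate_usage(responses):
    """Sum prompt/completion tokens over all rounds of one request."""
    total_in = total_out = 0
    for r in responses:
        usage = getattr(r, "usage", None)
        if usage is None:
            continue  # this round reported no usage
        total_in += usage.prompt_tokens
        total_out += usage.completion_tokens
    return total_in, total_out

# Two rounds: an initial tool call, then the final answer.
rounds = [
    SimpleNamespace(usage=SimpleNamespace(prompt_tokens=120, completion_tokens=15)),
    SimpleNamespace(usage=SimpleNamespace(prompt_tokens=150, completion_tokens=80)),
]
print(accumulate_usage(rounds))  # (270, 95)
```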
Estimation for Self-Hosted Models
Self-hosted models (such as those running on vLLM or Ollama) do not always return token usage data. When accurate counts are unavailable, Eneo falls back to a local estimation using the cl100k_base tokenizer.
This estimation has some known limitations:
- It may over- or undercount tokens since different models use different tokenizers
- Formatting overhead and tool definitions added by the provider are not included in the estimate
- Reasoning tokens that do not appear in the response text are not captured
As a result, token counts for self-hosted models should be treated as approximations rather than exact values.
Data Flow
LLM Provider
      │
      ▼
Eneo Backend (via LiteLLM)
      │
      ├── Provider returns usage ──► Actual token counts recorded
      │
      └── No usage available ──────► Local estimation used
      │
      ▼
Insights / Admin Dashboard
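The branch in the diagram above can be sketched as a simple fallback: prefer provider-reported counts, and estimate only when they are absent. Both function names and the chars-per-token heuristic are illustrative stand-ins, not Eneo's actual code:

```python
# Sketch of the usage-recording decision: provider counts when available,
# local estimation otherwise.
def estimate_tokens(text: str) -> int:
    # Crude stand-in for the cl100k_base estimator: ~4 characters per token.
    return max(1, len(text) // 4)

def record_usage(provider_usage, prompt_text, response_text):
    """Return (input_tokens, output_tokens, source)."""
    if provider_usage is not None:
        return (provider_usage["prompt_tokens"],
                provider_usage["completion_tokens"],
                "provider")
    return (estimate_tokens(prompt_text),
            estimate_tokens(response_text),
            "estimated")

print(record_usage({"prompt_tokens": 200, "completion_tokens": 50}, "", ""))
print(record_usage(None, "a" * 40, "b" * 20))  # (10, 5, 'estimated')
```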