Rela AI Docs

AI Usage

AI service cost tracking, platform markup, and optimization tips.

When to use this

Refer to this page when you want to understand how much your AI operations cost, how the final price is calculated, or when you need to reduce costs without losing functionality.

Real-world example: Your WhatsApp agent is costing more than expected. You check this page and discover the agent has max_context_messages set to 30, sending the entire conversation history in every request. By reducing it to 10, you lower input token consumption by 60% without significantly affecting response quality.


What is tracked

The platform records the usage of each AI service, enabling detailed cost control per tenant.

Gemini LLM

Input and output tokens are recorded for each interaction with the language models:

Service                            Base price   Unit
gemini-3.1-pro-preview (input)     $2.00        per million tokens
gemini-3.1-pro-preview (output)    $12.00       per million tokens
gemini-3-flash-preview (input)     $0.30        per million tokens
gemini-3-flash-preview (output)    $2.50        per million tokens
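Token charges follow directly from the table above. The sketch below shows the arithmetic; the helper name and dictionary are illustrative, not a platform API, and the prices are copied from the table:

```python
# Base provider cost of one LLM interaction, using the per-million-token
# prices listed above. Illustrative helper, not a platform API.
PRICES = {  # (input, output) in USD per million tokens
    "gemini-3.1-pro-preview": (2.00, 12.00),
    "gemini-3-flash-preview": (0.30, 2.50),
}

def base_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Provider cost in USD, before the platform markup is applied."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

For example, `base_cost("gemini-3-flash-preview", 1000, 500)` returns $0.00155: $0.0003 for the input tokens plus $0.00125 for the output tokens.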

Postmark Email

Service          Base price   Unit
Postmark Email   $0.0013      per email sent

Brave Search

Service        Base price   Unit
Brave Search   $0.005       per search

For more details on web search, see Web Search.

Platform markup

The final cost charged to the tenant is calculated by applying a markup on the provider's actual cost:

Final cost = Actual cost x Markup
  • Default markup: 2.5x
  • The markup is configurable per tenant
  • The actual cost is stored separately as an internal reference

Example

If an interaction with Gemini consumes 1,000 input tokens with gemini-3-flash-preview:

  • Actual cost: 1,000 / 1,000,000 x $0.30 = $0.0003
  • Final cost: $0.0003 x 2.5 = $0.00075
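The same arithmetic in code (a minimal sketch: the 2.5x default markup and the $0.30 input price come from this page; the function name is ours):

```python
# Final tenant cost = actual provider cost x markup (default 2.5x).
DEFAULT_MARKUP = 2.5
FLASH_INPUT_PRICE = 0.30  # USD per million input tokens

def final_cost(input_tokens: int, markup: float = DEFAULT_MARKUP) -> float:
    actual = input_tokens / 1_000_000 * FLASH_INPUT_PRICE
    return actual * markup

# 1,000 input tokens -> actual $0.0003, final $0.00075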

Monthly summary

The usage summary is available at Dashboard > Administration > Usage. It shows:

  • Monthly total: aggregated cost across all services
  • Breakdown by service: usage separated by channel and operation type
    • WhatsApp
    • Email
    • Extractions
    • Web search
    • Other services

Optimization tips

Reduce tokens per interaction

Reducing the max_context_messages parameter in the agent configuration limits the number of previous messages included in each LLM request. Less context means fewer input tokens.
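As a rough illustration, assuming an average of about 80 tokens per message (an assumed figure for the sketch, not a platform statistic), trimming the window from 30 to 10 messages cuts the history portion of each request by about two thirds:

```python
# Rough estimate of input-token savings from lowering max_context_messages.
# AVG_TOKENS_PER_MESSAGE is an assumption for illustration only.
AVG_TOKENS_PER_MESSAGE = 80

def context_tokens(max_context_messages: int) -> int:
    """Approximate history tokens included in each LLM request."""
    return max_context_messages * AVG_TOKENS_PER_MESSAGE

before = context_tokens(30)   # 2400 history tokens per request
after = context_tokens(10)    # 800 history tokens per request
savings = 1 - after / before  # ~67% fewer context tokens per request
```

The actual savings on total input tokens are smaller, since the system prompt and the current message are sent regardless of the context window.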

Use exact search instead of semantic

When querying collections does not require natural language understanding, using exact (field-based) search instead of semantic search avoids the cost of generating embeddings.

Limit web search domains

Configuring allowed_domains in the web search tool reduces unnecessary searches and keeps results focused on relevant sources.
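A hypothetical configuration might look like the following; only the `allowed_domains` parameter is named on this page, and the surrounding structure and domain values are illustrative:

```python
# Hypothetical web-search tool configuration. Only "allowed_domains" comes
# from this page; the rest of the structure is an illustrative assumption.
web_search_tool = {
    "enabled": True,
    "allowed_domains": [
        "docs.example.com",     # keep results focused on trusted sources
        "support.example.com",
    ],
}
```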

Choose the right model

Not all tasks need the most powerful model. To reduce costs:

  • Use gemini-3-flash-preview for simple queries, short answers, and classification tasks. It is roughly 6.7x cheaper on input and 4.8x cheaper on output than the pro model.
  • Reserve gemini-3.1-pro-preview for complex tasks that require advanced reasoning, long document analysis, or detailed report generation.
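To see the difference in practice, the sketch below prices the same interaction on both models using the table above (the token counts are an arbitrary example; the helper is ours, not a platform API):

```python
# Compare the base provider cost of one identical interaction on each model,
# using the per-million-token prices from the pricing table above.
PRICES = {  # (input, output) in USD per million tokens
    "gemini-3.1-pro-preview": (2.00, 12.00),
    "gemini-3-flash-preview": (0.30, 2.50),
}

def base_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

flash = base_cost("gemini-3-flash-preview", 2_000, 500)  # $0.00185
pro = base_cost("gemini-3.1-pro-preview", 2_000, 500)    # $0.01000
# pro costs ~5.4x more than flash for this particular token mix
```

The exact ratio depends on the input/output mix, so measure against your own traffic before switching models.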

Limit context messages

A max_context_messages value between 10 and 15 is generally sufficient for most agents. Higher values significantly increase input tokens without proportionally improving response quality.

Write concise prompts

Long prompts with redundant instructions generate unnecessary costs on every interaction. Review your system prompts periodically to eliminate repetitions and keep them direct.

