AI Usage
AI service cost tracking, platform markup, and optimization tips.
When to use this
Refer to this page when you want to understand how much your AI operations cost, how the final price is calculated, or when you need to reduce costs without losing functionality.
Real-world example: Your WhatsApp agent is costing more than expected. You check this page and discover the agent has max_context_messages set to 30, sending the entire conversation history in every request. By reducing it to 10, you cut input token consumption by roughly 60% without significantly affecting response quality.
What is tracked
The platform records the usage of each AI service, enabling detailed cost control per tenant.
Gemini LLM
Input and output tokens are recorded for each interaction with the language models:
| Service | Base price | Unit |
|---|---|---|
| gemini-3.1-pro-preview (input) | $2.00 | per million tokens |
| gemini-3.1-pro-preview (output) | $12.00 | per million tokens |
| gemini-3-flash-preview (input) | $0.30 | per million tokens |
| gemini-3-flash-preview (output) | $2.50 | per million tokens |
Postmark Email
| Service | Base price | Unit |
|---|---|---|
| Postmark Email | $0.0013 | per email sent |
Brave Search
| Service | Base price | Unit |
|---|---|---|
| Brave Search | $0.005 | per search |
For more details on web search, see Web Search.
Platform markup
The final cost charged to the tenant is calculated by applying a markup on the provider's actual cost:
Final cost = Actual cost x Markup
- Default markup: 2.5x
- The markup is configurable per tenant
- The actual cost is stored separately as an internal reference
Example
If an interaction with Gemini consumes 1,000 input tokens with gemini-3-flash-preview:
- Actual cost: 1,000 / 1,000,000 x $0.30 = $0.0003
- Final cost: $0.0003 x 2.5 = $0.00075
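The calculation above can be sketched in a few lines. The prices and the 2.5x default markup come from this page; the function name and constant names are illustrative:

```python
# Cost billed to the tenant = actual provider cost x markup.
FLASH_INPUT_PRICE_PER_MTOK = 0.30  # gemini-3-flash-preview, input ($/1M tokens)
DEFAULT_MARKUP = 2.5               # platform default, configurable per tenant

def final_cost(tokens: int, price_per_million: float,
               markup: float = DEFAULT_MARKUP) -> float:
    """Return the final cost charged to the tenant for a token count."""
    actual = tokens / 1_000_000 * price_per_million  # provider cost
    return actual * markup

# The worked example: 1,000 input tokens with gemini-3-flash-preview.
print(final_cost(1_000, FLASH_INPUT_PRICE_PER_MTOK))  # 0.00075
```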
Monthly summary
The usage summary is available at Dashboard > Administration > Usage. It shows:
- Monthly total: aggregated cost across all services
- Breakdown by service: usage separated by channel and operation type
  - Extractions
  - Web search
  - Other services
Optimization tips
Reduce tokens per interaction
Reducing the max_context_messages parameter in the agent configuration limits the number of previous messages included in each LLM request. Less context means fewer input tokens.
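To see why this matters, here is a back-of-the-envelope sketch of how max_context_messages drives input-token volume per request. The average tokens per message and the fixed system-prompt overhead are illustrative assumptions, not platform values:

```python
# Assumed averages for illustration only.
AVG_TOKENS_PER_MESSAGE = 80   # typical chat message, assumed
SYSTEM_PROMPT_TOKENS = 400    # fixed overhead per request, assumed

def input_tokens_per_request(max_context_messages: int) -> int:
    """Estimate input tokens for one LLM request at a given context size."""
    return SYSTEM_PROMPT_TOKENS + max_context_messages * AVG_TOKENS_PER_MESSAGE

before = input_tokens_per_request(30)  # 2800
after = input_tokens_per_request(10)   # 1200
print(f"reduction: {1 - after / before:.0%}")  # reduction: 57%
```

Under these assumptions, dropping from 30 to 10 context messages removes well over half the input tokens on every request, which is consistent with the real-world example at the top of this page.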
Use exact search instead of semantic
When querying collections does not require natural language understanding, using exact (field-based) search instead of semantic search avoids the cost of generating embeddings.
Limit web search domains
Configuring allowed_domains in the web search tool reduces unnecessary searches and keeps results focused on relevant sources.
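As a sketch, a domain-restricted web search tool configuration might look like this. Only the allowed_domains key is the parameter described above; the rest of the shape and the domain names are hypothetical:

```python
# Hypothetical tool configuration; shape is illustrative, not the
# platform's exact schema.
web_search_config = {
    "tool": "web_search",
    "allowed_domains": [
        "example-regulator.gov",       # placeholder domain
        "manufacturer.example.com",    # placeholder domain
    ],
}
# Restricting domains keeps each $0.005 search focused on relevant
# sources instead of the open web.
```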
Choose the right model
Not all tasks need the most powerful model. To reduce costs:
- Use gemini-3-flash-preview for simple queries, short answers, and classification tasks. Per the pricing table above, it is roughly 6.7x cheaper on input and 4.8x cheaper on output than the pro model.
- Reserve gemini-3.1-pro-preview for complex tasks that require advanced reasoning, long document analysis, or detailed report generation.
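A minimal routing sketch based on this guidance: only the two model names come from this page; the task labels and function are illustrative assumptions.

```python
# Route cheap, simple task types to the flash model; everything else
# goes to the pro model. Task labels are illustrative.
CHEAP_TASKS = {"classification", "short_answer", "simple_query"}

def pick_model(task_type: str) -> str:
    """Choose the cheapest model that fits the task type."""
    if task_type in CHEAP_TASKS:
        return "gemini-3-flash-preview"
    return "gemini-3.1-pro-preview"

print(pick_model("classification"))     # gemini-3-flash-preview
print(pick_model("report_generation"))  # gemini-3.1-pro-preview
```

Even a coarse split like this compounds quickly, since the per-token price gap applies to every interaction.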
Limit context messages
A max_context_messages value between 10 and 15 is generally sufficient for most agents. Higher values significantly increase input tokens without proportionally improving response quality.
Write concise prompts
Long prompts with redundant instructions generate unnecessary costs on every interaction. Review your system prompts periodically to eliminate repetitions and keep them direct.
See also
- Billing & Subscriptions — Plans, usage limits, and overages
- Web Search — Cost per search and domain configuration
- The Dashboard — Monthly cost summary in the main panel
- Create a WhatsApp Agent — Where to configure max_context_messages
- Create an Email Agent — Email agent configuration and its consumption
Web Search
Web search allows the agent to query up-to-date information from the internet during a conversation — market prices, regulations, manufacturer technical specs, or any data not stored in company collections.
Privacy and Personal Data
How Rela AI protects your organization's data, portability and deletion rights under GDPR and local laws, and complete isolation between organizations.