Rela AI Docs

AI Usage

AI service cost tracking, platform markup, and optimization tips.

When to use this

Refer to this page when you want to understand how much your AI operations cost, how the final price is calculated, or when you need to reduce costs without losing functionality.

Real-world example: Your WhatsApp agent is costing more than expected. You check this page and discover the agent has max_context_messages set to 30, sending the entire conversation history in every request. By reducing it to 10, you lower input token consumption by 60% without significantly affecting response quality.


What is tracked

The platform records the usage of each AI service, enabling detailed cost control per tenant.

Gemini LLM

Input and output tokens are recorded for each interaction with the language models:

Service                            Base price   Unit
gemini-3.1-pro-preview (input)     $2.00        per million tokens
gemini-3.1-pro-preview (output)    $12.00       per million tokens
gemini-3-flash-preview (input)     $0.30        per million tokens
gemini-3-flash-preview (output)    $2.50        per million tokens
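Token charges follow directly from the table above. The sketch below shows the arithmetic; the helper name and dictionary are illustrative, not a platform API, and the prices are copied from the table:

```python
# Base provider cost of one LLM interaction, using the per-million-token
# prices listed above. Illustrative helper, not a platform API.
PRICES = {  # (input, output) in USD per million tokens
    "gemini-3.1-pro-preview": (2.00, 12.00),
    "gemini-3-flash-preview": (0.30, 2.50),
}

def base_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Provider cost in USD, before the platform markup is applied."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

For example, `base_cost("gemini-3-flash-preview", 1000, 500)` returns $0.00155: $0.0003 for the input tokens plus $0.00125 for the output tokens.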

Postmark Email

Service          Base price   Unit
Postmark Email   $0.0013      per email sent

Brave Search

Service        Base price   Unit
Brave Search   $0.005       per search

For more details on web search, see Web Search.

Platform markup

The final cost charged to the tenant is calculated by applying a markup on the provider's actual cost:

Final cost = Actual cost x Markup
  • Default markup: 2.5x
  • The markup is configurable per tenant
  • The actual cost is stored separately as an internal reference

Example

If an interaction with Gemini consumes 1,000 input tokens with gemini-3-flash-preview:

  • Actual cost: 1,000 / 1,000,000 x $0.30 = $0.0003
  • Final cost: $0.0003 x 2.5 = $0.00075
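The same arithmetic in code (a minimal sketch: the 2.5x default markup and the $0.30 input price come from this page; the function name is ours):

```python
# Final tenant cost = actual provider cost x markup (default 2.5x).
DEFAULT_MARKUP = 2.5
FLASH_INPUT_PRICE = 0.30  # USD per million input tokens

def final_cost(input_tokens: int, markup: float = DEFAULT_MARKUP) -> float:
    actual = input_tokens / 1_000_000 * FLASH_INPUT_PRICE
    return actual * markup

# 1,000 input tokens -> actual $0.0003, final $0.00075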

Monthly summary

The usage summary is available at Dashboard > Administration > Usage. It shows:

  • Monthly total: aggregated cost across all services
  • Breakdown by service: usage separated by channel and operation type
    • WhatsApp
    • Email
    • Extractions
    • Web search
    • Other services

Optimization tips

Reduce tokens per interaction

Reducing the max_context_messages parameter in the agent configuration limits the number of previous messages included in each LLM request. Less context means fewer input tokens.
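As a rough illustration, assuming an average of about 80 tokens per message (an assumed figure for the sketch, not a platform statistic), trimming the window from 30 to 10 messages cuts the history portion of each request by about two thirds:

```python
# Rough estimate of input-token savings from lowering max_context_messages.
# AVG_TOKENS_PER_MESSAGE is an assumption for illustration only.
AVG_TOKENS_PER_MESSAGE = 80

def context_tokens(max_context_messages: int) -> int:
    """Approximate history tokens included in each LLM request."""
    return max_context_messages * AVG_TOKENS_PER_MESSAGE

before = context_tokens(30)   # 2400 history tokens per request
after = context_tokens(10)    # 800 history tokens per request
savings = 1 - after / before  # ~67% fewer context tokens per request
```

The actual savings on total input tokens are smaller, since the system prompt and the current message are sent regardless of the context window.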

Use exact search instead of semantic

When querying collections does not require natural language understanding, using exact (field-based) search instead of semantic search avoids the cost of generating embeddings.

Limit web search domains

Configuring allowed_domains in the web search tool reduces unnecessary searches and keeps results focused on relevant sources.
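A hypothetical configuration might look like the following; only the `allowed_domains` parameter is named on this page, and the surrounding structure and domain values are illustrative:

```python
# Hypothetical web-search tool configuration. Only "allowed_domains" comes
# from this page; the rest of the structure is an illustrative assumption.
web_search_tool = {
    "enabled": True,
    "allowed_domains": [
        "docs.example.com",     # keep results focused on trusted sources
        "support.example.com",
    ],
}
```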

Choose the right model

Not all tasks need the most powerful model. To reduce costs:

  • Use gemini-3-flash-preview for simple queries, short answers, and classification tasks. It is roughly 6.7x cheaper on input and 4.8x cheaper on output than the pro model.
  • Reserve gemini-3.1-pro-preview for complex tasks that require advanced reasoning, long document analysis, or detailed report generation.
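To see the difference in practice, the sketch below prices the same interaction on both models using the table above (the token counts are an arbitrary example; the helper is ours, not a platform API):

```python
# Compare the base provider cost of one identical interaction on each model,
# using the per-million-token prices from the pricing table above.
PRICES = {  # (input, output) in USD per million tokens
    "gemini-3.1-pro-preview": (2.00, 12.00),
    "gemini-3-flash-preview": (0.30, 2.50),
}

def base_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

flash = base_cost("gemini-3-flash-preview", 2_000, 500)  # $0.00185
pro = base_cost("gemini-3.1-pro-preview", 2_000, 500)    # $0.01000
# pro costs ~5.4x more than flash for this particular token mix
```

The exact ratio depends on the input/output mix, so measure against your own traffic before switching models.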

Limit context messages

A max_context_messages value between 10 and 15 is generally sufficient for most agents. Higher values significantly increase input tokens without proportionally improving response quality.

Write concise prompts

Long prompts with redundant instructions generate unnecessary costs on every interaction. Review your system prompts periodically to eliminate repetitions and keep them direct.

