# Bring Your Own LLM Guide

This page provides recommendations for configuring your LLM and outlines the cost calculation methodology for the AskAI feature. For general information on AskAI, please refer to [the dedicated documentation](https://docs.vectice.com/introduction/readme/askai).

## Bring your own LLM

Vectice supports integration with any LLM of your choice as long as they meet minimum performance requirements (see requirements below). Whether you use commercial APIs or host models privately, Vectice can connect seamlessly. Our team is available to assist with advanced customization needs.

**Note:** Vectice does not require a **multimodal** LLM. All current features operate using **text-only** models.

### Benefits of Using Your Own LLM

Bringing your own LLM allows to:

* **Preserve data confidentiality** by processing prompts in a secure, private environment
* **Control costs** by selecting the model and infrastructure that fit your usage profile
* **Meet compliance and IT policies** with regional hosting or on-premise setups
* **Fine-tune behavior** using domain-specific data and enterprise instructions
* **Ensure long-term flexibility** across proprietary and open-source model providers

### Best Practices for Your LLM Deployment

To ensure a **smooth experience** and prevent service interruptions in Vectice, especially under concurrent usage, we recommend:

**Token Throughput**

* Support a **throughput of 450,000 tokens per minute**
* This baseline ensures stable usage for \~5 to 10 active users
* Applies to both input and output tokens

**Recommended Model Capabilities**

* At a minimum, your LLM should perform on par with **GPT-4o-mini** for work summarization and code analysis.
* For more demanding use cases like long form document generation and automated compliance checking, it should match or exceed **GPT-4o** in reasoning and content generation.

### Azure OpenAI Configuration for GPT-4o (Recommended)

If using **GPT-4o or GPT-4o-mini** on **Azure**, configure content filtering to avoid macro execution errors:

1. Go to **Safety + Security → Content Filters**
2. Create a dedicated filter for Vectice usage
3. **Disable the following input filters**:
   * Prompt shields for jailbreak attacks
   * Prompt shields for indirect attacks
4. Associate the filter with your GPT-4o or 4o-mini deployment

### Cost Calculation per User

Standard usage for a user actively documenting model development or validation is about 1.3 Millions tokens per month. For reference, this is the current pricing for LLMs (April 2025).

<table><thead><tr><th width="183.8148193359375"></th><th width="80.59259033203125">GPT 4o</th><th width="121.333251953125">GPT 4o-mini</th><th width="122.0740966796875">Claude 3.7 Sonnet</th><th width="119.851806640625">LLaMA 70B</th><th>Mixtral 8x22B+</th><th></th></tr></thead><tbody><tr><td>Cost per Million token</td><td>$ 5</td><td>$ 0.6</td><td>$ 3</td><td>$ 0.72</td><td>$ 2</td><td></td></tr><tr><td>Monthly Estimated Cost per User</td><td>&#x3C; $ 7</td><td>&#x3C; $ 1</td><td>&#x3C; $ 4</td><td>&#x3C; $ 1</td><td>&#x3C; $ 3</td><td></td></tr></tbody></table>

For the latest pricing, refer to [OpenAI’s API pricing page](https://openai.com/api/pricing/) and [Amazon's Bedrock pricing page](https://aws.amazon.com/bedrock/pricing/).

### Common LLMs Used with Vectice

Enterprises typically integrate the following models based on their deployment strategy and governance needs. Underlying LLMs can be switched at any time without disruption to Vectice users:

<table><thead><tr><th>Model</th><th width="175.48150634765625">Provider</th><th width="163.2962646484375">Parameters</th><th>Deployment Mode</th></tr></thead><tbody><tr><td>GPT-4o / GPT 4.1</td><td>OpenAI (Azure)</td><td>N/A</td><td>Cloud (Azure-hosted) / Self-hosted</td></tr><tr><td>Claude 3.7+ Sonnet, Claude 4.0</td><td>Anthropic</td><td>N/A</td><td>Cloud (Amazon Bedrock) / Self-hosted</td></tr><tr><td>LLaMA 4+ Scout ⚠️</td><td>Meta</td><td>17B</td><td>Cloud (Amazon Bedrock) / Self-hosted</td></tr><tr><td>gpt-oss-20b</td><td>Open AI</td><td>20B</td><td>Cloud (Amazon Bedrock) / Self-hosted</td></tr></tbody></table>

*⚠️ Some advanced features, such as Automated Compliance Checking and Automated Report Generation with AskAI, may experience degraded performance.*

Need help evaluating which LLM works best for your use case? Reach out to our team at **<support@vectice.com>**.
