Bring Your Own LLM Guide
This page provides recommendations for configuring your LLM and outlines the cost calculation methodology for the AskAI feature. For general information on AskAI, please refer to the dedicated documentation.
Bring your own LLM
Vectice supports integration with any LLM of your choice, as long as it meets the minimum performance requirements outlined below. Whether you use commercial APIs or host models privately, Vectice can connect seamlessly. Our team is available to assist with advanced customization needs.
Note: Vectice does not require a multimodal LLM. All current features operate using text-only models.
Benefits of Using Your Own LLM
Bringing your own LLM allows you to:
- Preserve data confidentiality by processing prompts in a secure, private environment
- Control costs by selecting the model and infrastructure that fit your usage profile
- Meet compliance and IT policies with regional hosting or on-premise setups
- Fine-tune behavior using domain-specific data and enterprise instructions
- Ensure long-term flexibility across proprietary and open-source model providers
Best Practices for Your LLM Deployment
To ensure a smooth experience and prevent service interruptions in Vectice, especially under concurrent usage, we recommend:
Token Throughput
- Support a throughput of at least 450,000 tokens per minute
- This baseline ensures stable usage for ~5 to 10 active users (see the back-of-the-envelope check below)
- The figure covers input and output tokens combined
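To put the baseline in perspective, here is a minimal sketch of what 450,000 tokens per minute means per user and per request. The ~6,000 tokens-per-request figure is an illustrative assumption, not a Vectice-published number.

```python
# Back-of-the-envelope check of the 450,000 tokens/minute baseline.
TOTAL_TPM = 450_000  # recommended throughput (input + output tokens per minute)

for active_users in (5, 10):
    per_user_tpm = TOTAL_TPM / active_users
    print(f"{active_users} active users -> ~{per_user_tpm:,.0f} tokens/min each")

# Assumption: a single AskAI request consumes ~6,000 tokens (prompt + completion).
TOKENS_PER_REQUEST = 6_000
print(f"~{TOTAL_TPM / TOKENS_PER_REQUEST:.0f} requests/min sustained across all users")
```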
Recommended Model Capabilities
Your LLM should match or exceed the performance of GPT-4o-mini on reasoning, summarization, and code analysis tasks.
Minimum equivalent model size: 7B–8B parameters trained on high-quality data (e.g., LLaMA 3.2+ 8B+, Mixtral 8x22B)
Azure OpenAI Configuration for GPT-4o (Recommended)
If using GPT-4o or GPT-4o-mini on Azure, configure content filtering to avoid macro execution errors:
1. Go to Safety + Security → Content Filters
2. Create a dedicated filter for Vectice usage
3. Disable the following input filters:
   - Prompt shields for jailbreak attacks
   - Prompt shields for indirect attacks
4. Associate the filter with your GPT-4o or GPT-4o-mini deployment
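If you prefer to script this configuration, the sketch below creates a dedicated content-filter (RAI) policy on the Azure OpenAI resource through the Azure management REST API. The policy name is hypothetical, and the API version and exact filter names are assumptions drawn from Azure's ARM examples; verify them against the current Azure REST API reference before use.

```python
# Hedged sketch: create a "vectice-askai" content filter policy with the two
# prompt-shield input filters disabled, then attach it to your deployment
# (step 4 above) in the portal or via the same management API.
import requests
from azure.identity import DefaultAzureCredential  # pip install azure-identity

SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
ACCOUNT = "<azure-openai-account>"
POLICY_NAME = "vectice-askai"  # hypothetical policy name

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
url = (
    "https://management.azure.com"
    f"/subscriptions/{SUBSCRIPTION}/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.CognitiveServices/accounts/{ACCOUNT}"
    f"/raiPolicies/{POLICY_NAME}?api-version=2024-10-01"  # assumed API version
)
body = {
    "properties": {
        "basePolicyName": "Microsoft.Default",  # inherit the default filters
        "contentFilters": [
            # Filter names assumed from Azure ARM samples; double-check them.
            {"name": "Jailbreak", "enabled": False, "blocking": False, "source": "Prompt"},
            {"name": "Indirect Attack", "enabled": False, "blocking": False, "source": "Prompt"},
        ],
    }
}
resp = requests.put(url, json=body, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
print("Created content filter policy:", resp.json().get("name"))
```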
Cost Calculation per User
Standard usage for a user actively documenting model development or validation is about 1.3 million tokens per month. For reference, the table below shows LLM pricing as of March 2025.
| Cost per Million Tokens | Monthly Estimated Cost per User |
| --- | --- |
| $10.00 | $13 |
| $0.60 | < $1 |
| $15.00 | $19 |
| $0.72 | < $1 |
| $0.70 | < $1 |
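The monthly estimates follow directly from the usage figure: cost = (monthly tokens ÷ 1,000,000) × price per million tokens. The short sketch below reproduces the table's numbers (the table rounds to whole dollars).

```python
# Reproduce the monthly cost estimates: ~1.3M tokens/month at March 2025 prices.
MONTHLY_TOKENS = 1_300_000

for price_per_million in (10.00, 0.60, 15.00, 0.72, 0.70):
    cost = MONTHLY_TOKENS / 1_000_000 * price_per_million
    label = f"${cost:.2f}" if cost >= 1 else "< $1"
    print(f"${price_per_million:.2f}/M tokens -> {label} per user per month")
```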
For the latest pricing, refer to OpenAI’s API pricing page and Amazon’s Bedrock pricing page.
Common LLMs Used with Vectice
Enterprises typically integrate the following models based on their deployment strategy and governance needs:
| Model | Provider | Model Size | Deployment Options |
| --- | --- | --- | --- |
| GPT-4o / GPT-4o-mini | OpenAI (Azure) | N/A | Cloud (Azure-hosted) / Self-hosted |
| Claude 3 Haiku / Sonnet | Anthropic | N/A | Cloud (Amazon Bedrock) / Self-hosted |
| LLaMA 3.2+ 8B / 70B | Meta | 8B / 70B | Cloud (Amazon Bedrock) / Self-hosted |
| Mixtral 8x7B / 8x22B | Mistral AI | 8x7B / 8x22B | Cloud (Amazon Bedrock) / Self-hosted |
Need help evaluating which LLM works best for your use case? Reach out to our team at support@vectice.com.