Bring Your Own LLM Guide

This page provides recommendations for configuring your LLM and outlines the cost calculation methodology for the AskAI feature. For general information on AskAI, please refer to the dedicated documentation.

Bring your own LLM

Vectice supports integration with any LLM of your choice, as long as it meets the minimum performance requirements outlined below. Whether you use commercial APIs or host models privately, Vectice can connect seamlessly. Our team is available to assist with advanced customization needs.

Note: Vectice does not require a multimodal LLM. All current features operate using text-only models.

Benefits of Using Your Own LLM

Bringing your own LLM allows you to:

  • Preserve data confidentiality by processing prompts in a secure, private environment

  • Control costs by selecting the model and infrastructure that fit your usage profile

  • Meet compliance and IT policies with regional hosting or on-premise setups

  • Fine-tune behavior using domain-specific data and enterprise instructions

  • Ensure long-term flexibility across proprietary and open-source model providers

Best Practices for Your LLM Deployment

To ensure a smooth experience and prevent service interruptions in Vectice, especially under concurrent usage, we recommend:

Token Throughput

  • Support a throughput of 450,000 tokens per minute

  • This baseline ensures stable usage for roughly 5 to 10 concurrent active users (see the sizing sketch after this list)

  • Applies to both input and output tokens
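
As a rough sizing check, the baseline above translates into a per-user token budget. A minimal sketch, using only the 450,000 tokens-per-minute figure from this list:

```python
# Rough capacity sizing for a shared LLM deployment behind Vectice.
# The only input is the recommended baseline above; adjust for your workload.

BASELINE_TPM = 450_000  # recommended throughput (input + output tokens per minute)

def per_user_budget(active_users: int, tpm: int = BASELINE_TPM) -> int:
    """Tokens per minute available to each user if load is spread evenly."""
    return tpm // active_users

for users in (5, 10):
    print(f"{users:>2} active users -> ~{per_user_budget(users):,} tokens/min each")
# Output:
#  5 active users -> ~90,000 tokens/min each
# 10 active users -> ~45,000 tokens/min each
```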

Recommended Model Capabilities

  • Your LLM should match or exceed the performance of GPT-4o-mini on reasoning, summarization, and code analysis tasks.

  • Minimum equivalent model size: 7B–8B parameters trained on high-quality data (e.g., LLaMA 3.2+ at 8B or larger, Mixtral 8x22B)

If using GPT-4o or GPT-4o-mini on Azure, configure content filtering to avoid macro execution errors:

  1. Go to Safety + Security → Content Filters

  2. Create a dedicated filter for Vectice usage

  3. Disable the following input filters:

    • Prompt shields for jailbreak attacks

    • Prompt shields for indirect attacks

  4. Associate the filter with your GPT-4o or 4o-mini deployment
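
Once the filter is associated, you can sanity-check the deployment with a minimal chat completion. This is a sketch using the official openai Python SDK; the endpoint, API key, API version, and deployment name are placeholders to replace with your own values.

```python
# Minimal smoke test for an Azure OpenAI deployment after associating
# the Vectice content filter. Endpoint, key, and deployment name are
# placeholders -- substitute your own values.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="<your-gpt-4o-mini-deployment>",  # deployment name, not model family
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
)
print(response.choices[0].message.content)
```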

Cost Calculation per User

Standard usage for a user actively documenting model development or validation is about 1.3 million tokens per month. For reference, the table below shows LLM pricing as of March 2025.

Model                    Cost per Million Tokens   Monthly Estimated Cost per User
GPT-4o                   $10                       $13
GPT-4o-mini              $0.60                     < $1
Claude 3 Haiku/Sonnet    $15                       $19
LLaMA 70B                $0.72                     < $1
Mixtral 8x7B             $0.70                     < $1
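
The monthly estimates are simply the per-million-token price multiplied by the ~1.3 million tokens of standard monthly usage (the table rounds to whole dollars). A quick sketch reproducing them:

```python
# Reproduce the monthly cost estimates above:
# monthly cost = price per million tokens x ~1.3M tokens of standard usage.
MONTHLY_TOKENS_M = 1.3  # millions of tokens per active user per month

price_per_million = {  # March 2025 figures from the table above
    "GPT-4o": 10.00,
    "GPT-4o-mini": 0.60,
    "Claude 3 Haiku/Sonnet": 15.00,
    "LLaMA 70B": 0.72,
    "Mixtral 8x7B": 0.70,
}

for model, price in price_per_million.items():
    print(f"{model:<22} ${price * MONTHLY_TOKENS_M:.2f}/user/month")
```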

For the latest pricing, refer to OpenAI’s API pricing page and Amazon Bedrock’s pricing page.

Common LLMs Used with Vectice

Enterprises typically integrate the following models based on their deployment strategy and governance needs:

Model                     Provider         Parameters     Deployment Mode
GPT-4o / GPT-4o-mini      OpenAI (Azure)   N/A            Cloud (Azure-hosted) / Self-hosted
Claude 3 Haiku / Sonnet   Anthropic        N/A            Cloud (Amazon Bedrock) / Self-hosted
LLaMA 3.2+ 8B / 70B       Meta             8B / 70B       Cloud (Amazon Bedrock) / Self-hosted
Mixtral 8x7B / 8x22B      Mistral AI       8x7B / 8x22B   Cloud (Amazon Bedrock) / Self-hosted
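
For the Bedrock-hosted options, a minimal invocation looks like the sketch below. It uses boto3's Converse API; the region and model ID are assumptions for illustration, so substitute whichever model and region you actually deploy.

```python
# Minimal sketch: calling a Bedrock-hosted model (Claude 3 Haiku here) with
# boto3's Converse API. Region and model ID are assumptions -- substitute
# the model and region you actually deploy.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")  # assumed region

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example Bedrock model ID
    messages=[{"role": "user", "content": [{"text": "Reply with the single word: ok"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```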

Need help evaluating which LLM works best for your use case? Reach out to our team at [email protected].