Bring Your Own LLM Guide

This page provides recommendations for configuring your LLM and outlines the cost calculation methodology for the AskAI feature. For general information on AskAI, please refer to the dedicated documentation.

Bring your own LLM

Vectice supports integration with any LLM of your choice, as long as it meets the minimum performance requirements outlined below. Whether you use commercial APIs or host models privately, Vectice can connect seamlessly. Our team is available to assist with advanced customization needs.

Note: Vectice does not require a multimodal LLM. All current features operate using text-only models.

Benefits of Using Your Own LLM

Bringing your own LLM allows you to:

  • Preserve data confidentiality by processing prompts in a secure, private environment

  • Control costs by selecting the model and infrastructure that fit your usage profile

  • Meet compliance and IT policies with regional hosting or on-premise setups

  • Fine-tune behavior using domain-specific data and enterprise instructions

  • Ensure long-term flexibility across proprietary and open-source model providers

Best Practices for Your LLM Deployment

To ensure a smooth experience and prevent service interruptions in Vectice, especially under concurrent usage, we recommend:

Token Throughput

  • Support a throughput of 450,000 tokens per minute

  • This baseline ensures stable usage for roughly 5 to 10 concurrent active users (see the sizing sketch after this list)

  • Applies to both input and output tokens
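
As a rough sizing check, the baseline above translates into a per-user token budget. A minimal sketch, using only the 450,000 tokens-per-minute figure from this list:

```python
# Rough capacity sizing for a shared LLM deployment behind Vectice.
# The only input is the recommended baseline above; adjust for your workload.

BASELINE_TPM = 450_000  # recommended throughput (input + output tokens per minute)

def per_user_budget(active_users: int, tpm: int = BASELINE_TPM) -> int:
    """Tokens per minute available to each user if load is spread evenly."""
    return tpm // active_users

for users in (5, 10):
    print(f"{users:>2} active users -> ~{per_user_budget(users):,} tokens/min each")
# Output:
#  5 active users -> ~90,000 tokens/min each
# 10 active users -> ~45,000 tokens/min each
```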

Recommended Model Capabilities

  • Your LLM should match or exceed the performance of GPT-4o-mini on reasoning, summarization, and code analysis tasks.

  • Minimum equivalent model size: 7B–8B parameters trained on high-quality data (e.g., LLaMA 3.2+ at 8B or larger, Mixtral 8x22B)

If using GPT-4o or GPT-4o-mini on Azure, configure content filtering to avoid macro execution errors:

  1. Go to Safety + Security → Content Filters

  2. Create a dedicated filter for Vectice usage

  3. Disable the following input filters:

    • Prompt shields for jailbreak attacks

    • Prompt shields for indirect attacks

  4. Associate the filter with your GPT-4o or 4o-mini deployment
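
Once the filter is associated, you can sanity-check the deployment with a minimal chat completion. This is a sketch using the official openai Python SDK; the endpoint, API key, API version, and deployment name are placeholders to replace with your own values.

```python
# Minimal smoke test for an Azure OpenAI deployment after associating
# the Vectice content filter. Endpoint, key, and deployment name are
# placeholders -- substitute your own values.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="<your-gpt-4o-mini-deployment>",  # deployment name, not model family
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
)
print(response.choices[0].message.content)
```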

Cost Calculation per User

Standard usage for a user actively documenting model development or validation is about 1.3 million tokens per month. For reference, the table below shows LLM pricing as of March 2025.

Model                    Cost per Million Tokens   Monthly Estimated Cost per User
GPT-4o                   $10                       $13
GPT-4o-mini              $0.60                     < $1
Claude 3 Haiku/Sonnet    $15                       $19
LLaMA 70B                $0.72                     < $1
Mixtral 8x7B             $0.70                     < $1
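
The monthly estimates are simply the per-million-token price multiplied by the ~1.3 million tokens of standard monthly usage (the table rounds to whole dollars). A quick sketch reproducing them:

```python
# Reproduce the monthly cost estimates above:
# monthly cost = price per million tokens x ~1.3M tokens of standard usage.
MONTHLY_TOKENS_M = 1.3  # millions of tokens per active user per month

price_per_million = {  # March 2025 figures from the table above
    "GPT-4o": 10.00,
    "GPT-4o-mini": 0.60,
    "Claude 3 Haiku/Sonnet": 15.00,
    "LLaMA 70B": 0.72,
    "Mixtral 8x7B": 0.70,
}

for model, price in price_per_million.items():
    print(f"{model:<22} ${price * MONTHLY_TOKENS_M:.2f}/user/month")
```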

For the latest pricing, refer to OpenAI’s API pricing page and Amazon Bedrock’s pricing page.

Common LLMs Used with Vectice

Enterprises typically integrate the following models based on their deployment strategy and governance needs:

Model                     Provider         Parameters     Deployment Mode
GPT-4o / GPT-4o-mini      OpenAI (Azure)   N/A            Cloud (Azure-hosted) / Self-hosted
Claude 3 Haiku / Sonnet   Anthropic        N/A            Cloud (Amazon Bedrock) / Self-hosted
LLaMA 3.2+ 8B / 70B       Meta             8B / 70B       Cloud (Amazon Bedrock) / Self-hosted
Mixtral 8x7B / 8x22B      Mistral AI       8x7B / 8x22B   Cloud (Amazon Bedrock) / Self-hosted
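
For the Bedrock-hosted options, a minimal invocation looks like the sketch below. It uses boto3's Converse API; the region and model ID are assumptions for illustration, so substitute whichever model and region you actually deploy.

```python
# Minimal sketch: calling a Bedrock-hosted model (Claude 3 Haiku here) with
# boto3's Converse API. Region and model ID are assumptions -- substitute
# the model and region you actually deploy.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")  # assumed region

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example Bedrock model ID
    messages=[{"role": "user", "content": [{"text": "Reply with the single word: ok"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```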

Need help evaluating which LLM works best for your use case? Reach out to our team at [email protected].