Text Embedding Models

Model rankings updated May 2026 based on real usage data.

Embedding models convert text into dense vector representations, enabling semantic search, retrieval-augmented generation (RAG), clustering, and similarity matching. OpenRouter provides access to leading embedding models through a single API gateway, so you can test models and compare performance and pricing without managing multiple provider integrations.

Whether you're building a knowledge base, powering search across documents, or feeding context into an LLM pipeline, these are the most popular embedding models available on OpenRouter today.
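The similarity matching these models enable comes down to simple vector math: two texts are related when the cosine of the angle between their embedding vectors is close to 1. A minimal illustration with made-up 4-dimensional vectors (real models return hundreds to thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|): 1.0 means same direction, ~0.0 means unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for illustration only
query = [0.9, 0.1, 0.0, 0.2]
doc_related = [0.8, 0.2, 0.1, 0.3]
doc_unrelated = [0.0, 0.1, 0.9, 0.0]

print(cosine_similarity(query, doc_related) > cosine_similarity(query, doc_unrelated))  # True
```

Semantic search and RAG retrieval are this comparison repeated across a corpus: embed every document once, embed the query at request time, and rank documents by cosine similarity.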

Embedding Models on OpenRouter


Qwen: Qwen3 Embedding 8B

76.6B tokens

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model. The Qwen3 Embedding series represents significant advancements in multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

by qwen · 32K context · $0.01/M input tokens · $0/M output tokens
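Requests go through OpenRouter's OpenAI-compatible gateway, so switching between the models on this page is a one-line model-slug change. A sketch of an embeddings call using only the standard library; the endpoint path and the exact response shape are assumptions based on the OpenAI-style API, so check the API Reference before relying on them:

```python
import json
import urllib.request

# Assumed OpenAI-compatible embeddings endpoint; verify against the API Reference
OPENROUTER_URL = "https://openrouter.ai/api/v1/embeddings"

def build_embedding_request(texts, model="qwen/qwen3-embedding-8b"):
    """Build an OpenAI-style embeddings payload for OpenRouter."""
    return {"model": model, "input": texts}

def fetch_embeddings(api_key, texts, model="qwen/qwen3-embedding-8b"):
    payload = build_embedding_request(texts, model)
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry one {"embedding": [...]} object per input
    return [item["embedding"] for item in body["data"]]
```

Comparing a different model then means passing, say, `model="openai/text-embedding-3-small"` instead; the request and response handling stay identical.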

OpenAI: Text Embedding 3 Small

64B tokens

text-embedding-3-small is OpenAI's improved, more performant version of the ada embedding model. Embeddings are a numerical representation of text that can be used to measure the relatedness between two pieces of text. Embeddings are useful for search, clustering, recommendations, anomaly detection, and classification tasks.

by openai · 8K context · $0.02/M input tokens · $0/M output tokens

BAAI: bge-m3

10.6B tokens

The bge-m3 embedding model encodes sentences, paragraphs, and long documents into a 1024-dimensional dense vector space, delivering high-quality semantic embeddings optimized for multilingual retrieval, semantic search, and large-context applications.

by baai · 8K context · $0.01/M input tokens · $0/M output tokens

Qwen: Qwen3 Embedding 4B

9.83B tokens

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model. The Qwen3 Embedding series represents significant advancements in multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

by qwen · 33K context · $0.02/M input tokens · $0/M output tokens

Google: Gemini Embedding 001

8.81B tokens

gemini-embedding-001 provides a unified, cutting-edge experience across domains including science, legal, finance, and coding. This embedding model has consistently held a top spot on the Massive Text Embedding Benchmark (MTEB) Multilingual leaderboard since its experimental launch in March.

by google · 20K context · $0.15/M input tokens · $0/M output tokens

OpenAI: Text Embedding 3 Large

8.17B tokens

text-embedding-3-large is OpenAI's most capable embedding model for both English and non-English tasks. Embeddings are a numerical representation of text that can be used to measure the relatedness between two pieces of text. Embeddings are useful for search, clustering, recommendations, anomaly detection, and classification tasks.

by openai · 8K context · $0.13/M input tokens · $0/M output tokens

Google: Gemini Embedding 2 Preview

2.18B tokens

Gemini Embedding 2 Preview is Google's first multimodal embedding model. It currently supports mapping text and images into a unified vector space for semantic search and retrieval-augmented generation (RAG). It accepts input context up to 8,192 tokens and offers flexible output dimensions from 128 to 3,072 (recommended: 768, 1,536, or 3,072). Designed for cross-modal similarity, it lets you embed a text query and retrieve the most relevant images, or vice versa, making it well-suited for multimodal search, recommendation, and document understanding pipelines.

by google · 8K context · $0.20/M input tokens · $0/M output tokens · $6.50/M audio tokens
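Flexible output dimensions like the 128–3,072 range above are typically delivered by Matryoshka-style models, where a prefix of the full vector is itself a usable embedding. When a provider returns the full vector, a shorter one can be recovered client-side by truncating and renormalizing. This is a sketch under that assumption; whether truncation preserves quality depends on the model having been trained for it:

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale to unit length,
    so cosine similarity over the shortened vectors still behaves as expected."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("truncated vector has zero norm")
    return [x / norm for x in head]

full = [0.5, -0.3, 0.8, 0.1, 0.0, 0.2]   # stand-in for a 3,072-dim embedding
short = truncate_embedding(full, 3)       # 3 components, unit length
```

Smaller dimensions cut vector-database storage and search cost roughly in proportion, which is why the recommended sizes step down from the 3,072 maximum.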

NVIDIA: Llama Nemotron Embed VL 1B V2 (free)

1.99B tokens

The Llama Nemotron Embed VL 1B V2 embedding model is optimized for multimodal question-answering retrieval. The model can embed 'documents' in the form of image, text, or image and text combined. Documents can be retrieved given a user query in text form. The model supports images containing text, tables, charts, and infographics.

by nvidia · 131K context · $0/M input tokens · $0/M output tokens

Sentence Transformers: all-MiniLM-L6-v2

1.32B tokens

The all-MiniLM-L6-v2 embedding model maps sentences and short paragraphs into a 384-dimensional dense vector space, enabling high-quality semantic representations that are ideal for downstream tasks such as information retrieval, clustering, similarity scoring, and text ranking.

by sentence-transformers · 512 context · $0.005/M input tokens · $0/M output tokens

Perplexity: Embed V1 0.6B

1.16B tokens

pplx-embed-v1-0.6B is one of Perplexity's state-of-the-art text embedding models, built for real-world, web-scale retrieval. It is optimized for standard dense text retrieval, with the 0.6B-parameter model targeting lightweight, low-latency embedding generation.

by perplexity · 32K context · $0.004/M input tokens · $0/M output tokens