
Alibaba Open-Sources Zvec: An Embedded Vector Database Bringing SQLite-like Simplicity and High-Performance On-Device RAG to Edge Applications

Alibaba Tongyi Lab’s research team has released Zvec, an open-source, in-process vector database aimed at edge and on-device retrieval workloads. It is positioned as ‘the SQLite of vector databases’ because it runs as a library inside your application and does not require any external service or daemon. It is designed for retrieval-augmented generation (RAG), semantic search, and agent workloads that must run locally on laptops, mobile devices, or other constrained edge hardware.

The main idea is simple. Many applications now need vector search and metadata filtering but do not want to run a separate vector database service. Traditional server-style systems are too heavy for desktop tools, mobile apps, or command-line utilities. An embedded engine that works like SQLite, but for embeddings, fills this gap.

Why does embedded vector search matter for RAG?

RAG and semantic search pipelines need more than a bare index. They need vectors, scalar fields, full CRUD, and safe persistence, because local knowledge bases change as files, notes, and project state change.
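Keeping a local index in step with a changing corpus is mostly bookkeeping. Here is a minimal sketch, assuming you record a content hash for each document when you embed it: comparing those hashes with the current files tells you which IDs to upsert and which to delete. The helpers `content_hash` and `plan_sync` are hypothetical and not part of Zvec.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content (SHA-256 hex digest)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(indexed: dict[str, str], current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare recorded fingerprints with the current corpus (hypothetical helper).

    indexed: doc_id -> hash recorded when the document was last embedded
    current: doc_id -> current text on disk
    Returns (ids_to_upsert, ids_to_delete).
    """
    # New documents (no recorded hash) and edited documents (hash differs) need re-embedding.
    to_upsert = sorted(
        doc_id for doc_id, text in current.items()
        if indexed.get(doc_id) != content_hash(text)
    )
    # Documents that disappeared from disk must be removed from the index.
    to_delete = sorted(doc_id for doc_id in indexed if doc_id not in current)
    return to_upsert, to_delete


# Example: since the last sync, one note was edited, one added, one removed.
indexed = {
    "notes/a.md": content_hash("alpha"),
    "notes/b.md": content_hash("beta"),
    "notes/c.md": content_hash("gamma"),
}
current = {
    "notes/a.md": "alpha",          # unchanged
    "notes/b.md": "beta, revised",  # edited
    "notes/d.md": "delta",          # new
}
upserts, deletes = plan_sync(indexed, current)
print(upserts, deletes)  # ['notes/b.md', 'notes/d.md'] ['notes/c.md']
```

The resulting ID lists map directly onto the update and delete operations that an embedded engine with full CRUD exposes.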

Index libraries such as Faiss provide nearest neighbor search but do not handle scalar storage, crash recovery, or hybrid queries, so you end up building your own storage and consistency layer. Embedded extensions such as DuckDB-VSS add vector search to DuckDB but expose fewer indexing and scaling options and weaker resource control in edge scenarios. Service-based systems like Milvus or managed vector clouds require network calls and separate deployments, which are often overkill for on-device tools.
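To make the "build your own storage layer" point concrete, here is a minimal sketch of the scalar side table you would maintain next to a bare index. It uses Python's built-in `sqlite3` module; the table layout and the helpers `upsert`, `delete`, and `allowed_ids` are hypothetical, and the ANN index itself is left out.

```python
import sqlite3

# Hypothetical side table kept next to a bare ANN index (for example one built
# with Faiss). The index only knows integer row IDs; scalar fields, deletes and
# durability live here and must be kept in step with the index by hand.
conn = sqlite3.connect(":memory:")  # pass a file path instead to persist on disk
conn.execute(
    "CREATE TABLE docs (row_id INTEGER PRIMARY KEY, doc_id TEXT, author TEXT, created INTEGER)"
)


def upsert(row_id: int, doc_id: str, author: str, created: int) -> None:
    # 'with conn' runs the write in a transaction: commit on success, rollback on error.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO docs VALUES (?, ?, ?, ?)",
            (row_id, doc_id, author, created),
        )
    # The caller must also add or replace the vector for row_id in the ANN index.


def delete(row_id: int) -> None:
    with conn:
        conn.execute("DELETE FROM docs WHERE row_id = ?", (row_id,))
    # The caller must also remove (or mask) row_id in the ANN index; skipping this
    # step is the kind of consistency gap an integrated engine is meant to close.


def allowed_ids(author: str, since: int) -> set[int]:
    """Row IDs that pass a scalar filter, to intersect with ANN candidates."""
    rows = conn.execute(
        "SELECT row_id FROM docs WHERE author = ? AND created >= ?", (author, since)
    )
    return {row[0] for row in rows}


upsert(0, "doc_1", "alice", 100)
upsert(1, "doc_2", "bob", 200)
upsert(2, "doc_3", "alice", 300)
delete(2)
print(allowed_ids("alice", 0))  # {0}
```

Every write now touches two stores, and any crash between the two leaves them out of sync; that coordination burden is what an embedded, vector-native engine takes over.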

Zvec claims to be particularly well suited to local, on-device workloads. It provides a vector-native engine with persistence, resource management, and RAG-oriented features, packaged as a lightweight library.

Core architecture: in-process and vector-native

Zvec runs as an embedded library. You install it with pip install zvec and open collections directly in your Python process. There is no external server or RPC layer: you define schemas, insert documents, and run queries through the Python API.

The engine is built on Proxima, Alibaba Group’s high-performance, production-grade, battle-tested vector search engine. Zvec wraps Proxima with a simple API and embedded runtime. This project is released under the Apache 2.0 license.

Current support includes Python 3.10 to 3.12 on Linux x86_64, Linux ARM64, and macOS ARM64.

The design principles are clear:

  • Embedded, in-process execution
  • Vector-native indexing and storage
  • Production-ready persistence and crash safety

This makes it well suited to edge devices, desktop applications, and zero-ops deployments.

The quickstart documentation shows a short path from installation to query.

  1. Install the package:
    pip install zvec
  2. Define a CollectionSchema with one or more vector fields and optional scalar fields.
  3. Call create_and_open to create or open a collection on disk.
  4. Insert Doc objects that contain an ID, vectors, and scalar attributes.
  5. Build an index and run a VectorQuery to retrieve the nearest neighbors.

Example:

```python
import zvec

# Define collection schema
schema = zvec.CollectionSchema(
    name="example",
    vectors=zvec.VectorSchema("embedding", zvec.DataType.VECTOR_FP32, 4),
)

# Create collection
collection = zvec.create_and_open(path="./zvec_example", schema=schema)

# Insert documents
collection.insert([
    zvec.Doc(id="doc_1", vectors={"embedding": [0.1, 0.2, 0.3, 0.4]}),
    zvec.Doc(id="doc_2", vectors={"embedding": [0.2, 0.3, 0.4, 0.1]}),
])

# Search by vector similarity
results = collection.query(
    zvec.VectorQuery("embedding", vector=[0.4, 0.3, 0.3, 0.1]),
    topk=10
)

# Results: list of {'id': str, 'score': float, ...}, sorted by relevance
print(results)
```

The results are returned as dictionaries containing matching IDs and scores. This is sufficient to build a local semantic search or RAG retrieval layer on top of any embedding model.
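Approximate indexes trade some accuracy for speed, so a small exact baseline is useful for checking what a query should return. The sketch below scores every vector from the quickstart by cosine similarity and computes recall@k. Cosine is an assumption here, since the example above does not state which metric the collection uses, and the helper names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_topk(docs: dict[str, list[float]], query: list[float], k: int) -> list[tuple[str, float]]:
    """Exhaustive nearest-neighbor search: score every vector, keep the best k."""
    scored = [(doc_id, cosine(vec, query)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def recall_at_k(approx_ids: list[str], exact_ids: list[str]) -> float:
    """Fraction of the exact top-k that the approximate search also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Same toy vectors and query as the quickstart.
docs = {
    "doc_1": [0.1, 0.2, 0.3, 0.4],
    "doc_2": [0.2, 0.3, 0.4, 0.1],
}
query = [0.4, 0.3, 0.3, 0.1]
exact = exact_topk(docs, query, k=10)
print(exact)  # doc_2 ranks first (cosine ~0.93) ahead of doc_1 (~0.71)
```

To compare against the engine's output, pass the returned IDs as `approx_ids`; how to read the ID from each result object (key or attribute) is an assumption to check against the Zvec documentation.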

Performance: VectorDBBench with 8,000+ QPS

Zvec is optimized for high throughput and low latency on CPUs. It uses multithreading, cache-friendly memory layouts, SIMD instructions, and CPU prefetching.

In VectorDBBench on the Cohere 10M dataset, with the same hardware and memory, Zvec reports over 8,000 QPS. That is more than 2× the previous leaderboard #1, ZillizCloud, and Zvec also reports significantly shorter index build times in a similar setup.

These numbers suggest that an embedded library can reach cloud-level throughput for high-volume concurrent search, provided your workload and hardware resemble the benchmark conditions.
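Throughput and per-query latency are different quantities. Little's law (L = λW) links them: at a sustained throughput λ with L queries in flight on average, each query spends W = L/λ in the system. The sketch below applies that identity to the reported 8,000 QPS; the concurrency levels are hypothetical, because the reported figure does not include them.

```python
def mean_latency_s(qps: float, concurrency: int) -> float:
    """Little's law (L = lambda * W) rearranged to W = L / lambda.

    concurrency: average number of queries in flight (L)
    qps: sustained throughput in queries per second (lambda)
    Returns the implied mean time each query spends in the system (W), in seconds.
    """
    return concurrency / qps


# Hypothetical concurrency levels; only the 8,000 QPS figure comes from the article.
for clients in (1, 8, 32):
    print(f"{clients:>2} in flight -> {mean_latency_s(8_000, clients) * 1000:.3f} ms mean latency")
# prints 0.125 ms, 1.000 ms and 4.000 ms respectively
```

The same throughput is compatible with very different latencies, so when comparing engines for an interactive on-device assistant, check latency at the concurrency you actually expect.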

RAG Capabilities: CRUD, hybrid search, fusion, reranking

The feature set is tailored to RAG and agentic retrieval.

Zvec supports:

  • Complete CRUD on documents, so a local knowledge base can evolve over time.
  • Schema evolution to adjust index strategies and fields.
  • Multi-vector retrieval for queries that combine several embedding channels.
  • A built-in reranker that supports weighted fusion and Reciprocal Rank Fusion (RRF).
  • Scalar-vector hybrid search that pushes scalar filters down into index execution, with optional inverted indexes for scalar attributes.
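The two fusion strategies in this list can be written in a few lines. This is a generic sketch of Reciprocal Rank Fusion and a min-max-normalized weighted merge, not Zvec's reranker API; k = 60 is the constant commonly used for RRF, and Zvec's defaults and normalization may differ.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d), ranks from 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def weighted_merge(results: list[dict[str, float]], weights: list[float]) -> list[tuple[str, float]]:
    """Min-max normalize each channel's scores to [0, 1], then take a weighted sum."""
    fused: dict[str, float] = defaultdict(float)
    for channel, weight in zip(results, weights):
        lo, hi = min(channel.values()), max(channel.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for doc_id, score in channel.items():
            fused[doc_id] += weight * (score - lo) / span
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Two hypothetical embedding channels ranking the same corpus (dicts kept in rank order).
text_hits = {"a": 0.92, "b": 0.85, "c": 0.40}
image_hits = {"c": 0.88, "a": 0.70, "d": 0.65}

print(rrf([list(text_hits), list(image_hits)]))                  # order: a, c, b, d
print(weighted_merge([text_hits, image_hits], weights=[0.7, 0.3]))  # order: a, b, c, d
```

Documents missing from a channel simply contribute nothing from it. RRF uses only ranks, which makes it robust when channels score on incomparable scales; the weighted merge lets you express an explicit preference for one channel, which is why the two orders above differ.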

Together, these features let you build on-device assistants that combine semantic retrieval, filters such as user, time, or type, and multiple embedding models, all within one embedded engine.
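Why push filters into the search rather than applying them afterward? The brute-force sketch below contrasts the two orders on a toy corpus: post-filtering the top-k can leave fewer than k results, while filtering first always ranks only eligible documents. A real ANN index applies the filter during index traversal rather than by exhaustive scoring, so treat this as an illustration of the semantics, not of how Zvec implements it; the `Note` class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Note:
    """Hypothetical record: one vector plus the scalar fields used for filtering."""
    doc_id: str
    vector: list[float]
    user: str
    kind: str
    ts: int  # creation time, e.g. a Unix timestamp


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search_prefilter(notes, query, k, predicate):
    """Filter on scalar fields first, then rank only the eligible notes."""
    eligible = [n for n in notes if predicate(n)]
    eligible.sort(key=lambda n: cosine(n.vector, query), reverse=True)
    return [n.doc_id for n in eligible[:k]]


def search_postfilter(notes, query, k, predicate):
    """Rank everything, keep the top k, then filter; may return fewer than k hits."""
    ranked = sorted(notes, key=lambda n: cosine(n.vector, query), reverse=True)[:k]
    return [n.doc_id for n in ranked if predicate(n)]


notes = [
    Note("n1", [1.0, 0.0], "alice", "note", 100),
    Note("n2", [0.9, 0.1], "bob", "note", 200),
    Note("n3", [0.8, 0.2], "bob", "note", 300),
    Note("n4", [0.1, 0.9], "alice", "note", 400),
]
query = [1.0, 0.0]


def alice_notes_since_100(n: Note) -> bool:
    return n.user == "alice" and n.kind == "note" and n.ts >= 100


print(search_prefilter(notes, query, 2, alice_notes_since_100))   # ['n1', 'n4']
print(search_postfilter(notes, query, 2, alice_notes_since_100))  # ['n1']
```

Bob's notes crowd Alice's second match out of the unfiltered top 2, so the post-filtered answer comes back short; evaluating the filter inside the search avoids that.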

Key Takeaways

  • Zvec is an embedded, in-process vector database positioned as a ‘vector SQLite database’ for on-device and edge RAG deployments.
  • Built on Proxima, Alibaba’s high-performance, production-grade, battle-tested vector search engine, and released under Apache 2.0 with Python support on Linux x86_64, Linux ARM64, and macOS ARM64.
  • Zvec reports >8,000 QPS on VectorDBBench with the Cohere 10M dataset, more than 2× the previous leaderboard #1 (ZillizCloud), while also reducing index build time.
  • The engine provides explicit resource control with 64 MB streaming writes, an optional mmap mode, an experimental memory_limit_mb, and configurable concurrency via optimize_threads and query_threads for CPU control.
  • Zvec is RAG-ready, with full CRUD, schema evolution, multi-vector retrieval, built-in reranking (weighted fusion and RRF), and scalar-vector hybrid search with optional inverted indexes. Its ecosystem roadmap targets LangChain, LlamaIndex, DuckDB, PostgreSQL, and deployment on real devices.
