pgvector

Last Updated : 16 Dec, 2025

pgvector is an open-source PostgreSQL extension that brings native vector similarity search directly into the relational database. It lets you store, index and query high-dimensional embeddings, like those from language or image models, without relying on a separate vector database.

  • Seamlessly integrates AI‑powered search and recommendations into Postgres
  • Supports similarity metrics such as cosine, inner product and L2 distance
  • Ideal for building retrieval, ranking and personalization systems

Key features

  • Native PostgreSQL Integration: Works as a standard Postgres extension, letting you perform vector search using familiar SQL syntax without external dependencies.
  • Efficient Vector Storage: Supports fixed-dimension vector data types for storing embeddings from text, image, or audio models directly in database tables.
  • Multiple Similarity Metrics: Enables vector comparisons using cosine similarity, Euclidean (L2) distance, and inner product for flexible semantic search.
  • Indexing for Fast Search: Provides IVFFlat and HNSW indexing options for approximate nearest neighbor (ANN) retrieval, ensuring efficient search across millions of vectors.
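
To make the three metrics concrete, here is a minimal pure-Python sketch of what each one computes. The function names are ours, chosen for illustration. The operator mapping in the comments (`<->`, `<#>`, `<=>`) follows the pgvector README, where `<#>` returns the negative inner product so that smaller values always mean "more similar".

```python
import math

def l2_distance(a, b):
    """Euclidean distance: what pgvector's <-> operator returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    """Negative dot product: what pgvector's <#> operator returns."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity: what pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
print(l2_distance(a, b))
print(negative_inner_product(a, b))
print(cosine_distance(a, b))
```

All three are written as distances, so an `ORDER BY ... ASC` query returns the closest matches first regardless of which metric is used.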

Implementation

Let's see how pgvector can be implemented.

Step 1: Install Packages

We will:

  • Install PostgreSQL and its contrib modules.
  • Install postgresql-server-dev-14, which provides postgres.h, required for building pgvector.
  • Install the compilers and libraries needed for the build.
bash
!apt-get update -qq
!apt-get install -y postgresql postgresql-contrib postgresql-server-dev-14 git make gcc libpq-dev

Step 2: Start PostgreSQL and Set Password

We will:

  • Start the PostgreSQL service.
  • Set a password for the postgres user.
bash
!service postgresql start
!sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'password';"


Step 3: Build pgvector

Now we will:

  • Clone the official pgvector repository.
  • Compile the extension from source using the PostgreSQL dev headers.
  • Install the built files into the PostgreSQL extension directory.
bash
!git clone https://github.com/pgvector/pgvector.git
%cd pgvector
!make
!make install
%cd /content

Step 4: Enable the pgvector Extension

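We will start PostgreSQL again to make sure the service is running, then enable the pgvector extension in the default postgres database. Extensions are enabled per database, so this command must be repeated in any other database that needs vector support.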

bash
!service postgresql start
!sudo -u postgres psql -c "CREATE EXTENSION IF NOT EXISTS vector;"


Step 5: Create Table and Perform Similarity Query

Here we will:

  • Create a documents table with a vector column of dimension 3.
  • Insert three example embeddings.
  • Use the <-> operator to compute the Euclidean distance between the stored vectors and a query vector.
  • Order the results by distance to find the most similar entries.
bash
!sudo -u postgres psql -c "CREATE TABLE IF NOT EXISTS documents (id SERIAL PRIMARY KEY, content TEXT, embedding vector(3)); INSERT INTO documents (content, embedding) VALUES ('AI in healthcare', '[0.11, 0.45, 0.33]'), ('Machine learning', '[0.12, 0.44, 0.34]'), ('Cooking recipes', '[0.87, 0.13, 0.55]'); SELECT id, content, embedding <-> '[0.10,0.46,0.32]' AS distance FROM documents ORDER BY distance LIMIT 3;"
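
The ordering this query produces can be checked by hand. The sketch below recomputes the same Euclidean distances in plain Python, using the three rows and the query vector from the SQL above. It is only a sanity check. pgvector stores components as 4-byte floats, so the distances it reports may differ in the last digits.

```python
import math

# Same rows and query vector as in the SQL statement above
documents = [
    (1, "AI in healthcare", [0.11, 0.45, 0.33]),
    (2, "Machine learning", [0.12, 0.44, 0.34]),
    (3, "Cooking recipes", [0.87, 0.13, 0.55]),
]
query = [0.10, 0.46, 0.32]

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Equivalent of: ORDER BY embedding <-> query
ranked = sorted(
    ((doc_id, content, l2(embedding, query)) for doc_id, content, embedding in documents),
    key=lambda row: row[2],
)
for doc_id, content, distance in ranked:
    print(doc_id, content, round(distance, 4))
```

The two AI-related rows come first because their embeddings differ from the query by only 0.01 to 0.02 per component, while the cooking row is far away in every dimension.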


Step 6: Fix Authentication

The PostgreSQL package installed above uses peer authentication by default for local socket connections as the postgres user. We'll switch it to password authentication.

  • Set the password for the postgres user first, while peer authentication still allows local access without one.
  • Edit pg_hba.conf to allow password-based access.
  • Restart PostgreSQL to apply the change.
bash
!sudo -u postgres psql -c "ALTER USER postgres WITH PASSWORD 'password';"
!sudo sed -i 's/local\s*all\s*postgres\s*peer/local all postgres md5/' /etc/postgresql/14/main/pg_hba.conf
!service postgresql restart


Step 7: Connect to PostgreSQL

We will connect to PostgreSQL from Python:

  • Install the Python bindings for PostgreSQL and pgvector.
  • Connect with a password using psycopg2.
  • Register pgvector's custom vector data type.
  • Perform a similarity search using a query embedding.
Python
!pip install psycopg2-binary pgvector


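import numpy as np
import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect(
    "dbname=postgres user=postgres password=password host=localhost port=5432")
# Register the vector type so NumPy arrays are adapted to pgvector values
register_vector(conn)

# Pass the query as a NumPy array: psycopg2 sends a plain Python list as a
# numeric array, which the <-> operator does not accept without a cast
query_embedding = np.array([0.10, 0.46, 0.32])

cur = conn.cursor()
cur.execute(
    "SELECT id, content FROM documents ORDER BY embedding <-> %s LIMIT 2",
    (query_embedding,)
)
print(cur.fetchall())

cur.close()
conn.close()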

Output:

[(1, 'AI in healthcare'), (2, 'Machine learning')]
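
Another way to pass the query embedding is as text. pgvector accepts vectors written as a literal such as '[1,2,3]', so a string parameter can be cast with ::vector inside the query. The helper below is a small sketch of that idea. `to_vector_literal` is a hypothetical name of our own, not part of pgvector or psycopg2.

```python
def to_vector_literal(values):
    """Format numbers in pgvector's text input form, e.g. '[0.1,0.46,0.32]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

# The explicit ::vector cast tells PostgreSQL how to interpret the string parameter
sql = "SELECT id, content FROM documents ORDER BY embedding <-> %s::vector LIMIT 2"
params = (to_vector_literal([0.10, 0.46, 0.32]),)

# With an open psycopg2 cursor this would be run as: cur.execute(sql, params)
print(params[0])  # prints [0.1,0.46,0.32]
```

This form needs no type registration on the Python side, which can be convenient in scripts that only read from the table.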

pgvector vs. Other Vector Databases

Let's compare pgvector with other vector databases such as FAISS, ChromaDB and Milvus.

Feature | pgvector | FAISS | ChromaDB | Milvus
--- | --- | --- | --- | ---
Base | PostgreSQL extension | Standalone C++ library | Python-native DB | Distributed vector DB
Index Types | IVFFlat, HNSW | Flat, IVFFlat, HNSW | HNSW | IVF, HNSW, DiskANN
Persistence | PostgreSQL-backed | In-memory / Disk | Local / Persistent | Highly persistent
Deployment | Built into Postgres | Separate | Local or cloud | Cluster-based
Ease of Use | Easy (SQL) | Medium | Easy | Complex
Best For | Integrating AI search into existing Postgres DBs | Research / Local apps | Light-weight AI prototypes | Large-scale enterprise search

Applications

  • Semantic Search: Find documents, FAQs or images based on meaning rather than keywords.
  • Recommendation Systems: Suggest similar products, songs or movies based on embeddings.
  • Chatbot Memory (RAG): Store and retrieve context embeddings for large language models.
  • Image and Audio Similarity: Compare multimedia embeddings for similarity detection.
  • Hybrid Search: Combine traditional SQL filters with vector search for context-aware retrieval.
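
As a rough illustration of hybrid search, the sketch below builds one parameterized query that first filters rows with an ordinary SQL condition and then ranks the survivors by vector distance. It reuses the documents table from Step 5. The function name `build_hybrid_query` and the keyword filter on the content column are assumptions made for this example.

```python
def build_hybrid_query(keyword, query_embedding, limit=5):
    """Return (sql, params): filter by keyword, then rank by L2 distance."""
    sql = (
        "SELECT id, content FROM documents "
        "WHERE content ILIKE %s "             # traditional SQL filter
        "ORDER BY embedding <-> %s::vector "  # vector similarity ranking
        "LIMIT %s"
    )
    # pgvector text form, e.g. '[0.1,0.46,0.32]', cast to vector in the query
    vector_literal = "[" + ",".join(str(float(v)) for v in query_embedding) + "]"
    params = (f"%{keyword}%", vector_literal, limit)
    return sql, params

sql, params = build_hybrid_query("learning", [0.10, 0.46, 0.32], limit=2)
print(sql)
print(params)
```

With an open psycopg2 cursor, the pair would be run as `cur.execute(sql, params)`. Because both conditions live in one statement, PostgreSQL plans the filter and the ranking together.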

Advantages

  • Seamless Integration: Works directly within PostgreSQL, i.e., no separate DB needed.
  • Flexible Indexing: Supports both exact and approximate search (IVFFlat, HNSW).
  • ACID-Compliant: Inherits all Postgres transaction and consistency guarantees.
  • Unified Querying: Combine SQL and vector search in one query (hybrid search).
  • Lightweight Setup: Easy to integrate into existing infrastructure.
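
To show what the indexing options look like in practice, here is a small sketch that generates the CREATE INDEX statements for both index types. The operator class names and the `m`, `ef_construction` and `lists` options follow the pgvector README. The helper functions themselves are hypothetical, and the default values are only starting points. Without any index, pgvector falls back to an exact sequential scan.

```python
# Operator classes select the distance metric the index is built for
OPCLASSES = {
    "l2": "vector_l2_ops",
    "inner_product": "vector_ip_ops",
    "cosine": "vector_cosine_ops",
}

def hnsw_index_sql(table, column, metric="l2", m=16, ef_construction=64):
    """Build a CREATE INDEX statement for an HNSW index."""
    # table and column are interpolated directly: pass trusted identifiers only
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} {OPCLASSES[metric]}) "
        f"WITH (m = {int(m)}, ef_construction = {int(ef_construction)});"
    )

def ivfflat_index_sql(table, column, metric="l2", lists=100):
    """Build a CREATE INDEX statement for an IVFFlat index."""
    return (
        f"CREATE INDEX ON {table} USING ivfflat ({column} {OPCLASSES[metric]}) "
        f"WITH (lists = {int(lists)});"
    )

print(hnsw_index_sql("documents", "embedding"))
print(ivfflat_index_sql("documents", "embedding", metric="cosine"))
```

The operator class has to match the operator used in queries: an index built with vector_l2_ops serves `<->` queries, while `<=>` queries need vector_cosine_ops.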

Limitations

  • Performance at Scale: Slower than dedicated vector databases for billions of vectors.
  • No GPU Acceleration: Relies on CPU-based operations only.
  • Storage Overhead: Large vectors can increase DB size quickly.
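
To get a feel for the storage overhead, the estimate below uses the figure given in the pgvector README: each vector takes 4 * dimensions + 8 bytes. The 1536-dimension, one-million-row workload is an assumed example, and the result excludes row headers, other columns and any indexes, so real tables will be larger.

```python
def vector_storage_bytes(num_vectors, dimensions):
    """Estimate raw vector storage: 4 bytes per dimension plus 8 bytes overhead."""
    bytes_per_vector = 4 * dimensions + 8
    return num_vectors * bytes_per_vector

# Assumed workload: one million 1536-dimensional embeddings
total = vector_storage_bytes(1_000_000, 1536)
print(f"{total / 1e9:.2f} GB")  # prints 6.15 GB
```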