Giving Drupal's Search "Superpowers": My Adventure with Vector Databases and Milvus

19 June 2025

Hello everyone! If you've been following my latest adventures, you know I've been immersed in the exciting world of Artificial Intelligence, experimenting with Docker Model Runner and, more recently, creating my own AI provider for Drupal (ai_provider_docker). Well, my next challenge to strengthen that module led me to a concept that's fundamental to modern AI: Embeddings!

To properly implement embeddings in my Drupal provider, I found myself needing to thoroughly investigate a very particular type of database: Vector Databases. Honestly, they sounded a bit intimidating at first, but I've discovered they are incredibly powerful tools for handling AI data.

What is a Vector Database (very briefly)?

Imagine you have millions of documents, images, or audio files. A traditional database search matches exact values or keywords. A vector database, on the other hand, can capture the "meaning" or "context" of that data because it stores each item as a "vector" (essentially, a long list of numbers that represents its characteristics). Items with similar meaning end up with vectors that are close to each other, which allows for much smarter searches, like "find me documents that talk about a similar topic to this one," even if they don't use the exact same words. It's like going from searching by exact matches to searching by "similar ideas."
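To make "searching by similar ideas" a bit more concrete, here is a minimal Python sketch of the comparison a vector database performs at its core: ranking items by cosine similarity to a query vector. The three-number vectors and the titles are hand-made for illustration (they are not output from any model), and the function names are my own.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Illustrative 3-dimensional "embeddings"; real models produce hundreds of dimensions.
documents = {
    "How to bake sourdough bread": [0.9, 0.1, 0.0],
    "Beginner's guide to baking": [0.8, 0.2, 0.1],
    "Tuning PHP-FPM for Drupal": [0.0, 0.1, 0.9],
}


def rank(query_vector, docs):
    """Return document titles ordered from most to least similar to the query."""
    return sorted(
        docs,
        key=lambda title: cosine_similarity(query_vector, docs[title]),
        reverse=True,
    )


# A query vector that points in roughly the same direction as the baking documents.
query = [0.85, 0.15, 0.05]
ranked = rank(query, documents)
for title in ranked:
    print(f"{cosine_similarity(query, documents[title]):.3f}  {title}")
```

The two baking titles score close to 1 and the PHP-FPM one scores close to 0, even though no keywords are compared at all. A real vector database does the same kind of comparison, but with indexes that avoid scanning every vector.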

Milvus: My Choice to Start

In my exploration exercise, I decided on Milvus (milvusdb/milvus). Why Milvus? It seemed like a robust and flexible option to start understanding how these databases work in practice. Milvus doesn't work alone; in this setup it relies on etcd (for metadata storage and service coordination) and MinIO (object storage for persisted data such as vectors and index files). I also added Attu, a web interface from Zilliz for managing Milvus. All of this sounds complex, but the magic is that they work together to handle millions of vectors efficiently.

To get my local Milvus instance up and running for testing, I use a docker-compose.yml file. I'm still one of those people who prefer to configure development environments by hand and create my own versions, but if you're using tools like DDEV, Lando, or others, that's perfectly fine; they often come with similar setups pre-configured.

Here's the simplified docker-compose.yml I use locally. It brings up the Drupal stack (PHP, Nginx, MariaDB) alongside Milvus, its dependencies (etcd and MinIO), and Attu:

services:

  # Drupal container
  drupal:
    build:
      context: ./docker/php
    volumes:
      - ./app:/var/www/app:delegated
    working_dir: /var/www/app
    healthcheck:
      test: ["CMD", "php-fpm", "-t"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - backend

  # Nginx container
  nginx:
    image: nginx:latest
    ports:
      - 80:80
    volumes:
      - ./docker/nginx/nginx.conf:/etc/nginx/conf.d/default.conf
      - ./app:/var/www/app:delegated
    depends_on:
      - drupal
    networks:
      - backend

  # MariaDB container
  mariadb:
    image: mariadb:10.11
    ports:
      - 3306:3306
    restart: always
    command:
      - --disable-log-bin
      - --innodb-buffer-pool-size=256M
      - --max-connections=200
      # Isolation level is a server option, not an image environment variable
      - --transaction-isolation=READ-COMMITTED
    stop_grace_period: 30s
    environment:
      MYSQL_DATABASE: ${MYSQL_DATABASE}
      MYSQL_USER: ${MYSQL_USER}
      MYSQL_PASSWORD: ${MYSQL_PASSWORD}
      MYSQL_ALLOW_EMPTY_PASSWORD: 1
    volumes:
      - mariadb-data:/var/lib/mysql:delegated
    networks:
      - backend

  # Key-value store
  etcd:
    image: quay.io/coreos/etcd:v3.5.0
    container_name: etcd
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
    volumes:
      - ./etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    networks:
      - backend

  # Object storage
  minio:
    image: minio/minio:RELEASE.2020-12-03T00-03-10Z
    container_name: minio
    environment:
      - MINIO_ACCESS_KEY=minioadmin
      - MINIO_SECRET_KEY=minioadmin
    volumes:
      - ./minio-data:/minio_data
    command: minio server /minio_data
    healthcheck:
      test: [ "CMD", "curl", "-f", "http://localhost:9000/minio/health/live" ]
      interval: 30s
      timeout: 20s
      retries: 3
    networks:
      - backend
    ports:
      - "9000:9000"

  # Vector database
  milvus:
    image: milvusdb/milvus:v2.4.8
    container_name: milvus
    command: [ "milvus", "run", "standalone" ]
    environment:
      - ETCD_ENDPOINTS=etcd:2379
      - MINIO_ADDRESS=minio:9000
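    ports:
      # 19530: client (gRPC) port; 9091: health and metrics port in Milvus 2.x
      - "19530:19530"
      - "9091:9091"
    volumes:
      # Milvus 2.x keeps its local data under /var/lib/milvus
      - ./milvus-data:/var/lib/milvus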
    depends_on:
      - etcd
      - minio
    networks:
      - backend

  # Milvus UI
  attu:
    container_name: attu
    image: zilliz/attu:v2.5.11
    environment:
      MILVUS_URL: milvus:19530
    ports:
      - "3000:3000"
    depends_on:
      - milvus
    networks:
      - backend

volumes:
  mariadb-data:

networks:
  backend:

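To get this running, save it as docker-compose.yml in a directory that also contains the pieces it references: the ./docker/php build context, ./docker/nginx/nginx.conf, the ./app folder, and a .env file defining MYSQL_DATABASE, MYSQL_USER, and MYSQL_PASSWORD. Then run docker compose up -d in your terminal. This spins up all the necessary components: Milvus listens on port 19530 and the Attu interface is available at http://localhost:3000.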
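A quick way to confirm the containers are actually reachable is to test the host ports published in the compose file. The following is a minimal sketch using only Python's standard library; the function names are my own, and "localhost" assumes you run it on the same machine as Docker. It only checks that something accepts TCP connections on each port, not that the service behind it is healthy.

```python
import socket

# Host ports published by the compose file above.
SERVICES = {
    "milvus": 19530,
    "minio": 9000,
    "attu": 3000,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services, host="localhost"):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, reachable in check_services(SERVICES).items():
        print(f"{name}: {'listening' if reachable else 'not reachable'}")
```

If Milvus shows as not reachable right after startup, give it a little time and check docker compose logs milvus; it needs etcd and MinIO to be up before it starts accepting connections.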

Integrating Milvus and "Superpowers" into Drupal

My main goal was to see how I could inject data from Drupal into a vector database and, more importantly, understand which specific use cases it could be useful for. I wanted to give our beloved Drupal the ability to perform searches beyond keywords, empowering its search engine with AI "superpowers."

To achieve this integration, I'm using the Drupal Search API AI module (https://www.drupal.org/project/search_api_ai). This module is the perfect starting point because it allows us to create a "server" that points at our vector database and an "index" that sends our data from Drupal to Milvus. Additionally, I found a contributed module that facilitates the connection with Milvus: ai_vdb_provider_milvus (https://www.drupal.org/project/ai_vdb_provider_milvus). This module is key to bridging Drupal with Milvus!

After a good amount of configuration and testing, I can happily confirm that the embedding implementation in my contributed module (ai_provider_docker) worked! This means I can now take text in Drupal, convert it into an embedding (that numerical vector I mentioned earlier) using a local AI model, and then store that embedding in Milvus for advanced searches.
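The flow described above (text, then embedding, then storage, then similarity search) can be sketched in a few lines of Python. To keep it runnable without any services, everything here is a stand-in of my own: toy_embed is a hashed bag-of-words function that only captures word overlap (a real embedding model captures meaning), and InMemoryVectorStore is a hypothetical placeholder for a Milvus collection. It is not how ai_provider_docker or ai_vdb_provider_milvus are implemented.

```python
import hashlib
import math

DIMENSIONS = 256  # arbitrary size for this toy example


def toy_embed(text, dimensions=DIMENSIONS):
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    vector = [0.0] * dimensions
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dimensions
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database collection."""

    def __init__(self):
        self._rows = []  # each row: (doc_id, vector, original text)

    def insert(self, doc_id, text):
        """Embed the text and store the vector next to its source."""
        self._rows.append((doc_id, toy_embed(text), text))

    def search(self, query, limit=3):
        """Return up to `limit` rows as (score, doc_id, text), best match first."""
        query_vector = toy_embed(query)
        scored = [
            # Vectors are unit length, so the dot product is the cosine similarity.
            (sum(q * v for q, v in zip(query_vector, vector)), doc_id, text)
            for doc_id, vector, text in self._rows
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:limit]


store = InMemoryVectorStore()
store.insert(1, "Installing Drupal with Docker Compose")
store.insert(2, "Vector databases store embeddings for similarity search")
store.insert(3, "Recipe for chocolate cake")

results = store.search("similarity search with embeddings", limit=2)
for score, doc_id, text in results:
    print(f"{score:.3f}  #{doc_id}  {text}")
```

In the real setup, the embedding step is handled by the local AI model behind the provider, and the insert and search steps are handled by Milvus through Search API; the sketch only shows the shape of the data moving between them.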

What's Next...

This is just a first glimpse of what I'm doing with vector databases. My next goal is to dedicate a full article to detailing how to implement a complete search engine from Drupal using Search API AI and, of course, leveraging this new capability of embeddings and Milvus.

I hope this small introduction to vector databases and Milvus has been as exciting for you as it has been for me! It's a field with enormous potential to improve how we interact with information on our websites. Let's keep learning!