How Vector Databases Work in KingbotGPT

University webpages and related documents are first scraped and converted into embeddings (numeric vectors that capture the meaning of the text) by an embedding model. These embeddings are then stored in a vector database. For KingbotGPT, we use Chroma because it is open-source, lightweight, and well suited to storing and retrieving library-specific data.
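The ingestion step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not KingbotGPT's actual pipeline: `embed` is a hypothetical stand-in (a hashed bag-of-words) for a real embedding model, and the chunk size, overlap, and record layout are assumptions. Each record mirrors the `ids`, `embeddings`, `documents`, and `metadatas` fields that Chroma's `collection.add` accepts.

```python
import hashlib
import math
import re

DIM = 64  # hypothetical embedding size; real embedding models use hundreds of dimensions

def embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (hypothetical chunking parameters)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

store: list[dict] = []  # in-memory stand-in for a Chroma collection

def ingest(url: str, title: str, text: str) -> None:
    """Chunk one scraped page, embed each chunk, and store it with its metadata."""
    for i, piece in enumerate(chunk(text)):
        store.append({
            "id": f"{url}#{i}",
            "embedding": embed(piece),
            "document": piece,
            "metadata": {"title": title, "link": url, "chunk": i},
        })
```

In a real deployment, `embed` would call the embedding model and the records would be written to a Chroma collection instead of an in-memory list.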

1. Indexing and Storage

Once embeddings are added to Chroma, the database organizes them in an index so that searches stay efficient even across thousands or millions of vectors.
Common indexing techniques in vector databases include:

  • HNSW (Hierarchical Navigable Small World graphs): builds a layered graph that links each vector to its near neighbors, so a search can hop quickly toward the closest matches.

  • IVF (Inverted File Index): groups vectors into clusters so the database searches only the most relevant clusters.

  • PQ (Product Quantization): compresses vectors into compact codes to save memory and speed up search at scale.

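Chroma's default index is HNSW; IVF and PQ are more typical of other vector search libraries such as FAISS. In every case, indexing happens in the background: developers only need to insert embeddings, and Chroma builds and maintains the index automatically.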
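To make the cluster idea behind IVF concrete, here is a toy sketch in plain Python. It illustrates IVF only and is not how Chroma indexes data (Chroma uses HNSW): vectors are grouped with a few rounds of k-means, and a query scans only the `nprobe` clusters whose centroids are closest. The function names and parameters are hypothetical.

```python
import random

def dist2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(vectors, centroids):
    """Return one inverted list (of vector indices) per centroid."""
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: dist2(v, centroids[c]))
        lists[nearest].append(idx)
    return lists

def build_ivf(vectors, n_clusters=2, iters=10, seed=0):
    """Toy IVF index: k-means centroids plus inverted lists of member indices."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    for _ in range(iters):
        lists = assign(vectors, centroids)
        for c, members in enumerate(lists):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [
                    sum(vectors[i][d] for i in members) / len(members)
                    for d in range(len(vectors[0]))
                ]
    return centroids, assign(vectors, centroids)

def search_ivf(query, vectors, centroids, lists, k=3, nprobe=1):
    """Scan only the nprobe closest clusters, then rank their members exactly."""
    probe = sorted(range(len(centroids)), key=lambda c: dist2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return sorted(candidates, key=lambda i: dist2(query, vectors[i]))[:k]
```

Raising `nprobe` scans more clusters, which improves the chance of finding the true nearest neighbors at the cost of more distance computations.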

2. Similarity Search

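When a student asks a question, the query is converted into an embedding with the same embedding model used for the stored documents. Chroma then compares this query embedding with the stored embeddings and retrieves the most similar ones. This process is called nearest neighbor search; with an HNSW index it is approximate nearest neighbor (ANN) search, which trades a small amount of accuracy for much faster lookups. Results are ranked by semantic similarity rather than by exact keyword matches.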
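Exact nearest neighbor search can be written as a brute-force scan: score every stored embedding against the query and keep the top k. The sketch below uses cosine similarity and is an illustration, not Chroma's code. Chroma's HNSW index approximates this result without scoring every vector, and it reports distances (smaller means closer) rather than similarity scores, with the metric chosen per collection.

```python
import heapq
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_neighbors(query: list[float], embeddings: list[list[float]], k: int = 3) -> list[tuple[int, float]]:
    """Exhaustive k-nearest-neighbor search; returns (index, similarity) pairs, best first."""
    scored = ((i, cosine_similarity(query, e)) for i, e in enumerate(embeddings))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

For example, `nearest_neighbors([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]], k=2)` ranks index 0 first and index 2 second, because `[0.9, 0.1]` points almost the same way as the query.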

3. Retrieval Process

After the most relevant embeddings are identified, Chroma returns the associated text and metadata (such as the document title, link, or section). These retrieved chunks are then passed to the language model as context for its answer.
In KingbotGPT, this retrieval step grounds answers in library webpages, LibGuides, and university resources rather than relying only on the language model's training data.
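The hand-off from retrieval to the language model can be sketched as follows. This assumes the result shape returned by Chroma's `collection.query` (a dict whose `documents`, `metadatas`, and `distances` entries hold one list per query); the helper names and the prompt wording are hypothetical, not KingbotGPT's actual template.

```python
def hits_from_query_result(result: dict) -> list[dict]:
    """Flatten the first query's results from a Chroma-style query() result dict."""
    return [
        {"document": doc, "metadata": meta, "distance": dist}
        for doc, meta, dist in zip(result["documents"][0], result["metadatas"][0], result["distances"][0])
    ]

def build_prompt(question: str, hits: list[dict]) -> str:
    """Assemble retrieved chunks and their metadata into a grounded prompt (hypothetical template)."""
    sources = []
    for n, hit in enumerate(hits, start=1):
        meta = hit["metadata"]
        sources.append(f"[{n}] {meta['title']} ({meta['link']})\n{hit['document']}")
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their [number]. If the sources do not contain the answer, say so.\n\n"
        "Sources:\n" + "\n\n".join(sources) + f"\n\nQuestion: {question}\nAnswer:"
    )
```

Keeping the title and link next to each chunk lets the model cite the page a student can open to verify the answer.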

4. Updates in KingbotGPT

To keep information current, KingbotGPT refreshes Chroma weekly. New webpages, LibGuides, and university documents are scraped, converted into embeddings, and added to the database, so that answers reflect the most up-to-date resources available at the King Library and San José State University.
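One way to plan such a refresh is to compare content hashes, so only new or changed chunks are re-embedded and chunks from removed pages are dropped. This is a hypothetical sketch, not KingbotGPT's documented procedure; the resulting id lists could then be passed to Chroma's `collection.upsert` and `collection.delete`.

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_refresh(stored: dict[str, str], scraped: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare stored {chunk_id: hash} with freshly scraped {chunk_id: text}.

    Returns (ids to re-embed and upsert, ids to delete).
    """
    to_upsert = sorted(cid for cid, text in scraped.items() if stored.get(cid) != content_hash(text))
    to_delete = sorted(cid for cid in stored if cid not in scraped)
    return to_upsert, to_delete
```

Skipping unchanged chunks keeps the weekly job cheap, since embedding calls usually dominate its cost.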


Supporting Links