Introduction to Vector Databases
A vector database is a type of database designed to store and search information based on meaning rather than exact keywords. Instead of saving data only as plain text or keywords, it saves each item as a vector, also called an embedding: a list of numbers that represents its meaning.
Embeddings are numerical representations of words, sentences, or documents created by machine learning models. Each embedding is a long list of real numbers that captures the context and relationships of the data in a high-dimensional space. This allows concepts with similar meanings, like “book” and “novel”, to be stored close together, while unrelated concepts, like “book” and “airplane”, are stored far apart.
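To make "close together" concrete, here is a minimal sketch in Python using made-up four-dimensional vectors; real embeddings typically have hundreds or thousands of dimensions, and cosine similarity is one common way to measure how close two of them are.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = similar, near 0.0 = unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings", invented for illustration only;
# real models produce vectors with hundreds or thousands of dimensions.
book     = np.array([0.9, 0.8, 0.1, 0.0])
novel    = np.array([0.8, 0.9, 0.2, 0.1])
airplane = np.array([0.1, 0.0, 0.9, 0.8])

print(cosine_similarity(book, novel))     # high: "book" and "novel" are related
print(cosine_similarity(book, airplane))  # low:  "book" and "airplane" are not
```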
This makes vector databases especially powerful for tasks like:
- Semantic search: finding documents that are conceptually related, not just keyword-matched (see the sketch after this list).
- Recommendation systems: suggesting items based on meaning (like articles, research papers, or books).
- AI memory: storing and retrieving relevant information for chatbots or AI agents.
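To show the semantic-search case concretely, here is a minimal sketch using the open-source Chroma library (`chromadb`), one of many vector databases. The collection name and catalog entries are invented for illustration, and Chroma downloads its default embedding model on first use, so treat this as a sketch rather than a production setup.

```python
import chromadb

# In-memory client; Chroma's default embedding model turns each text into a vector.
client = chromadb.Client()
collection = client.create_collection(name="library_demo")  # name chosen for this example

# A few invented catalog entries -- note that none of them contain the word "tragedies".
collection.add(
    ids=["1", "2", "3"],
    documents=[
        "Hamlet, a play by William Shakespeare about the Prince of Denmark.",
        "An introduction to aircraft design and aerodynamics.",
        "Macbeth, Shakespeare's play about ambition and its downfall.",
    ],
)

# Semantic search: the query is embedded and compared against the stored vectors.
results = collection.query(query_texts=["Shakespeare tragedies"], n_results=2)
print(results["documents"])  # the two Shakespeare plays should rank closest
```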

Figure: An overview of vector embeddings and databases
Why does this matter to us?
Large Language Models (LLMs) like ChatGPT are powerful, but their built-in knowledge is fixed at training time and their working memory is limited. They don’t automatically know about your library’s holdings, course readings, or specialized local data.
Vector databases solve this by:
- Acting as a knowledge base.
- Letting the model “look up” relevant information on demand.
- Making searches more intuitive (e.g., searching for “Shakespeare tragedies” retrieves Hamlet even if “tragedy” isn’t in the catalog description).
Traditional search tools rely on keyword matching, but with vector search in KingbotGPT, students could ask questions in natural language and receive answers drawn from both library resources and university websites. This bridges the gap between how students think and phrase their questions and how information is stored.
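The “look up” step described above is commonly implemented as retrieval: embed the question, find the closest stored passages, and pass them to the model as context. The sketch below shows the shape of that flow with a placeholder embed() function and invented library passages; it illustrates the general pattern, not KingbotGPT’s actual implementation.

```python
import numpy as np

def embed(text):
    """Placeholder embedding: a real system would call an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(8)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented knowledge base: passages stored alongside their embeddings.
passages = [
    "The library's 24-hour study room is on the second floor.",
    "Course reserves can be checked out at the main desk for two hours.",
    "Interlibrary loan requests usually arrive within five business days.",
]
index = [(p, embed(p)) for p in passages]

def lookup(question, k=2):
    """Return the k stored passages whose embeddings are closest to the question's."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Retrieved passages become context that an LLM can answer from.
question = "How long can I keep a course reserve?"
context = "\n".join(lookup(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```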