Challenges and Considerations

While vector databases provide powerful capabilities for semantic search and AI-driven applications, they also come with challenges that need to be understood in both technical and library contexts.

1. Privacy and Security

Libraries handle sensitive information, from student data to access logs. If embeddings are generated from private or licensed content, that information must be protected. Cloud-based vector databases may raise concerns about where and how data is stored.

2. Cost and Scalability

  • Cost: Managed services can become expensive as the number of embeddings grows.
  • Scalability: Even open-source tools like Chroma need careful planning if millions of documents are stored. Indexing and refreshing large datasets require processing power and storage space.
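
To make the storage side of this concrete, here is a rough back-of-the-envelope sketch. It assumes each embedding is stored as 32-bit floats (4 bytes per value) and uses a hypothetical overhead factor of 1.5 to stand in for index structures and metadata; the function name, the factor, and the example dimension of 384 are illustrative assumptions, not figures from any particular product.

```python
def estimate_index_size_gb(
    num_chunks: int,
    dimensions: int,
    bytes_per_value: int = 4,      # assumption: float32 vectors
    overhead_factor: float = 1.5,  # assumption: placeholder for index + metadata overhead
) -> float:
    """Roughly estimate storage for a collection of embeddings, in GiB."""
    raw_bytes = num_chunks * dimensions * bytes_per_value
    return raw_bytes * overhead_factor / 1024**3


if __name__ == "__main__":
    # Illustrative only: one million text chunks with 384-dimensional vectors.
    size = estimate_index_size_gb(num_chunks=1_000_000, dimensions=384)
    print(f"Estimated size: {size:.2f} GiB")
```

The real footprint depends on the embedding model, the index type, and how much document text and metadata is stored next to each vector, so treat the result as a planning aid rather than a measurement.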

3. Bias in Embeddings

Embeddings inherit biases from the language models that generate them. This means a vector search might unintentionally prioritize some topics or viewpoints over others. For academic libraries committed to balanced access, this is an important consideration.

4. Data Freshness

Vector databases do not automatically stay up to date. They need to be refreshed whenever the underlying content changes.

  • For KingbotGPT, we address this by scraping new webpages and updating Chroma on a weekly basis.
  • Without regular updates, users might receive outdated or incomplete results.
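
One way to keep a refresh like this cheap is to re-embed only the pages that actually changed. The sketch below is a minimal illustration under stated assumptions, not a description of KingbotGPT's actual pipeline: it assumes we keep a SHA-256 hash of each page's text from the previous run, and the names `content_hash`, `plan_refresh`, and the example URLs are hypothetical.

```python
from __future__ import annotations

import hashlib


def content_hash(text: str) -> str:
    """Return a stable fingerprint of a page's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_refresh(
    scraped: dict[str, str],        # url -> freshly scraped page text
    stored_hashes: dict[str, str],  # url -> hash saved during the previous run
) -> tuple[list[str], list[str]]:
    """Return (urls_to_upsert, urls_to_delete) for this refresh cycle."""
    # New pages and pages whose text changed need to be embedded again.
    to_upsert = [
        url for url, text in scraped.items()
        if stored_hashes.get(url) != content_hash(text)
    ]
    # Pages that no longer exist on the site should be removed from the index.
    to_delete = [url for url in stored_hashes if url not in scraped]
    return to_upsert, to_delete


if __name__ == "__main__":
    previous = {
        "https://example.edu/hours": content_hash("Open 9-5"),
        "https://example.edu/old-page": content_hash("Retired content"),
    }
    current = {
        "https://example.edu/hours": "Open 9-7",       # changed
        "https://example.edu/new-page": "New service",  # added
    }
    upsert, delete = plan_refresh(current, previous)
    print("Re-embed:", upsert)
    print("Remove:", delete)
```

With Chroma, the first list could then be passed to `collection.upsert` and the second to `collection.delete`, using the URL (or a chunk ID derived from it) as the record ID. Splitting pages into chunks would need an extra step to track which chunk IDs belong to each page.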

5. Integration Complexity

Although inserting embeddings is simple, connecting a vector database to library systems, catalogs, and discovery layers requires technical expertise. Middleware like LlamaIndex helps, but integration is still a non-trivial challenge for many institutions.

By understanding these challenges, universities and companies can:

  • Make informed choices about whether to use self-hosted or cloud-hosted vector databases.
  • Plan for sustainable workflows that keep content updated.
  • Address issues of privacy, equity, and representation when applying AI tools to knowledge discovery.

Learn More

If you’d like to explore the challenges and considerations around vector databases further, the following resources provide accessible insights from industry and practitioner perspectives: