Vector Databases
Vector databases store vector representations of data, known as vector embeddings. Data such as audio recordings and images is converted into vectors of numerical values that encode the object and its features. These representations make it easier to find similarities between an object and a large set of unstructured data. Vector databases are most useful for semantic search, recommendation engines, and anomaly detection.
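To make "finding similarities" concrete, here is a minimal sketch of ranking items by cosine similarity, one common way to compare embeddings. The three-dimensional vectors and item names are made up for illustration; real embeddings come from a model and typically have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, invented for this example.
embeddings = {
    "cat photo": [0.9, 0.1, 0.0],
    "dog photo": [0.8, 0.3, 0.1],
    "jazz recording": [0.0, 0.2, 0.9],
}

def most_similar(query, items):
    """Return item names ordered from most to least similar to the query."""
    return sorted(
        items,
        key=lambda name: cosine_similarity(query, items[name]),
        reverse=True,
    )

# A query vector that points in roughly the same direction as the two photos.
ranking = most_similar([1.0, 0.2, 0.0], embeddings)
print(ranking)
```

With these toy values the two photos rank ahead of the audio clip, because their vectors point in nearly the same direction as the query.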
Unstructured data is growing exponentially, and we are all part of a huge unstructured data workforce. This blog post is unstructured data; your visit here produces unstructured and semi-structured data with every web interaction, as does every photo you take or email you send. The global datasphere will grow to 165 zettabytes by 2025, and about 80% of that will be unstructured. At the same time, the rising demand for AI is vastly outpacing existing infrastructure. Around 90% of machine learning research results fail to reach production because of a lack of tools.
Thankfully, there’s a new generation of tools that let developers work with unstructured data in the form of vector embeddings, which are deep representations of objects obtained from a neural network model. A vector database, also known as a vector similarity search engine or approximate nearest neighbour (ANN) search database, is a database designed to store, manage, and search high-dimensional data with an additional payload.
On Unstructured Data, Vector Databases, New AI Age, and Our Seed Round, Qdrant
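As a rough illustration of "store, manage, and search high-dimensional data with an additional payload", the sketch below keeps vectors next to a metadata dictionary and filters on that metadata during search. The names Point, ToyVectorStore, upsert, search, and must_match are hypothetical and are not the API of Qdrant or any other product. The search is also an exhaustive scan by Euclidean distance; real vector databases use approximate nearest neighbour indexes to avoid comparing the query against every stored vector.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Point:
    """One stored item: an id, its embedding, and arbitrary metadata (the payload)."""
    id: int
    vector: list
    payload: dict = field(default_factory=dict)

class ToyVectorStore:
    """Hypothetical in-memory store; exact search only, no ANN index."""

    def __init__(self):
        self._points = {}

    def upsert(self, point):
        # Insert a new point or overwrite an existing one with the same id.
        self._points[point.id] = point

    def search(self, query, limit=3, must_match=None):
        # Keep only points whose payload contains every requested key/value pair.
        must_match = must_match or {}
        candidates = [
            p for p in self._points.values()
            if all(p.payload.get(k) == v for k, v in must_match.items())
        ]
        # Rank the remaining points by Euclidean distance to the query vector.
        candidates.sort(key=lambda p: math.dist(query, p.vector))
        return candidates[:limit]

store = ToyVectorStore()
store.upsert(Point(1, [0.1, 0.9], {"type": "image"}))
store.upsert(Point(2, [0.2, 0.8], {"type": "audio"}))
store.upsert(Point(3, [0.9, 0.1], {"type": "image"}))

nearest = store.search([0.1, 0.85], limit=2)
nearest_images = store.search([0.1, 0.85], limit=2, must_match={"type": "image"})
```

The payload filter is what separates this from a bare similarity index: the same query can be restricted to, say, images only, without re-embedding anything.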
Funding
Oblivious, a framework to build apps that live inside of enclaves, which are useful for processing confidential data, raised €5.35m in Seed funding.
Qdrant, a vector database and vector similarity search engine, raised $7.5m in Seed funding.
Fluree, an open-source semantic graph database, raised $10m in Series A funding.
Groundlight, a startup combining natural language and computer vision, raised $10m in Seed funding.
Ditto, a cross-platform peer-to-peer database that allows apps to sync with or without internet connectivity, raised $45m in Series A funding.
Weaviate, an open-source vector database allowing teams to store data objects and vector embeddings from ML models, raised $50m in Series B funding.
Semgrep, an open-source static analysis engine for finding bugs and vulnerabilities in code, raised $53m in Series C funding.
CoreWeave, a GPU-focused cloud provider, raised $221m in Series B funding.