Vector databases have been on the rise as a long-term storage option for large language models. The most common use case is retrieval-augmented generation (RAG), in which the prompt for the large language model is populated with relevant context extracted from the vector database. You can also perform unstructured search (unstructured in the sense that the inputs don't follow a schema, per se) over content such as images and videos.

Representation Learning

But there are other interesting use cases I have encountered during my involvement in SemaDB as part of a project at Semafind. Their potential lies in representation learning, a concept that has been around for a long time. The modern version of the story involves neural networks learning dense vector representations of their inputs. The process of converting inputs into these vectors, often using such neural networks, is colloquially referred to as embedding. What we can embed is not restricted to text, images or videos. We can even embed structured data that has a clear schema and relationships.

Detecting Diseased Animals from Behaviour

Imagine a world where we can identify health problems in livestock early on, simply by analysing their movement patterns. This is possible by converting animal movement data into dense vectors. These vectors encapsulate the essence of their behaviour. A vector database then allows us to quickly compare these movement vectors to those of animals with known diseases. This could lead to early interventions and better outcomes for the animals.

Predicting Patient Episodes in Hospitals

Healthcare is ripe for disruption by vector-based technologies. By converting patient data (temperature, blood pressure, etc.) into dense vectors at regular intervals, we create a time-series representation of their health. This data, stored in a vector database, empowers lightning-fast retrieval of similar health patterns from the patient’s own history. This allows us to potentially anticipate chronic episodes, personalize treatment plans, and improve outcomes.
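As a rough sketch of that idea, the windows of vitals below (the field names, values and window size are all hypothetical) are flattened into normalised vectors and the current window is compared against the patient's own history by cosine similarity, which is essentially what the vector database would do at scale:

```python
import numpy as np

def embed_vitals(window):
    """Flatten a window of (temperature, blood pressure, heart rate)
    readings into a single dense vector and normalise it for cosine search."""
    v = np.asarray(window, dtype=float).ravel()
    return v / (np.linalg.norm(v) + 1e-9)

def most_similar(query, history):
    """Return the index of the historical window closest to the query."""
    sims = [float(query @ embed_vitals(w)) for w in history]
    return int(np.argmax(sims))

# Hypothetical 3-reading windows of (temp °C, systolic BP, heart rate).
history = [
    [(36.8, 120, 72), (36.9, 118, 70), (36.7, 121, 71)],   # stable period
    [(38.5, 140, 95), (38.9, 145, 99), (39.1, 150, 104)],  # past episode
]
current = [(38.4, 141, 96), (38.8, 144, 100), (39.0, 149, 103)]
match = most_similar(embed_vitals(current), history)
print(match)  # index of the most similar past window
```

A real system would embed richer, messier records (free-text notes, lab results) with a learned model rather than raw concatenation, but the retrieval step stays the same.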

Spotting Shopping Trends in E-commerce

E-commerce platforms are no strangers to image and text search, but vector databases offer a fascinating new dimension. We can construct vectors representing a multifaceted snapshot of shopper behaviour: item categories, purchase frequency, colours, and more. Searching these vectors reveals broader trends (such as early Christmas shopping) and their similarity to past events. This empowers retailers to make more accurate predictions and better understand their customer base.
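A minimal sketch of such a behaviour snapshot, assuming a small fixed category list (the categories and features here are hypothetical, and a real pipeline would likely learn the embedding instead of hand-crafting it):

```python
import numpy as np

CATEGORIES = ["toys", "electronics", "decorations", "clothing"]  # hypothetical

def shopper_vector(category_counts, visit_frequency, avg_basket_value):
    """Concatenate per-category purchase counts with numeric behaviour
    features into one dense vector, normalised for cosine search."""
    counts = np.array([category_counts.get(c, 0) for c in CATEGORIES], dtype=float)
    extras = np.array([visit_frequency, avg_basket_value], dtype=float)
    v = np.concatenate([counts, extras])
    return v / (np.linalg.norm(v) + 1e-9)

# A shopper buying toys and decorations in November: an early-Christmas signal.
november_shopper = shopper_vector(
    {"toys": 5, "decorations": 3}, visit_frequency=4, avg_basket_value=40.0
)
print(november_shopper.shape)  # (6,)
```

Stored in a vector database, snapshots like this can be searched to find cohorts of shoppers whose behaviour resembles a known past trend.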

The Power of Representations

Why not just have a neural network predict the desired outcome directly? Let it predict whether an animal is diseased or whether a patient is going to have an episode. In that case you have to train the model to do one thing; at best you can get a handful of predictions from a single model. When we instead leverage dense vector representations, we can search for similar events, animals or behaviours without committing to that original outcome. For example, if animal behaviour changes over time, we can compare current behaviour to previous behaviour rather than watch the original predictor slowly drift away. In other words, we are interested in the representation of something rather than a single outcome, because one representation can serve many downstream tasks.

Embedding Anything and Everything

There are many methods of learning dense vector representations of things. One of the most common and simplest is the auto-encoder. The idea is to have a neural network predict its own input while forcing the data through a smaller vector bottleneck.

In effect, it acts as a lossy compression of the original data. We can nudge it to give weight to different features to ensure what we care about isn't lost, but ultimately this compressed vector is a summary of our potentially large, messy input. We then use the vector database to work with these vectors: mainly to index them and search them quickly.
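As an illustration, a tiny linear auto-encoder with a 2-dimensional bottleneck can be trained with nothing but NumPy; the dimensions, data and learning rate below are arbitrary choices for the sketch, and real auto-encoders add non-linearities and deeper layers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples of 8 features that secretly live in 2 dimensions,
# so a 2-dimensional bottleneck can reconstruct them well.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))

# Linear encoder (8 -> 2) and decoder (2 -> 8) weights.
W_enc = rng.normal(scale=0.1, size=(8, 2))
W_dec = rng.normal(scale=0.1, size=(2, 8))

def mse():
    """Mean squared reconstruction error over the whole dataset."""
    return float(np.mean(((X @ W_enc) @ W_dec - X) ** 2))

before = mse()
lr = 0.01
for _ in range(1000):
    Z = X @ W_enc            # bottleneck: the dense vector representation
    err = (Z @ W_dec) - X    # reconstruction error drives learning
    grad_out = 2 * err / len(X)
    g_dec = Z.T @ grad_out                 # gradient w.r.t. decoder weights
    g_enc = X.T @ (grad_out @ W_dec.T)     # gradient w.r.t. encoder weights
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
after = mse()
print(before, after)  # reconstruction error shrinks as it trains

# Each row of this matrix is a 2-d embedding we could store in a vector DB.
embeddings = X @ W_enc
```

The decoder is thrown away at serving time; only the encoder is needed to turn new inputs into vectors for indexing and search.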


If you have a project that could make use of a vector database but you're not sure where to start, feel free to reach out.