Thursday 

Room 4 - Level 4 

13:40 - 14:40 

(UTC±00

Workshop (60 min)

An Introduction to Vector Databases

Advances in machine learning have brought new opportunities but also new challenges to the field of databases. Today's users have gotten used to natural language search and robust recommendation systems. They expect to get what they search for without needing to remember the exact keyword.

Database
Big Data
Machine Learning

To tackle these challenges, we need two things: machine learning models that produce vectors capturing the semantic similarities relevant to the task at hand, and a way to efficiently store and retrieve large amounts of vector and non-vector data. Scaling machine learning models to work reliably in a production environment is hard, and implementing efficient vector search while keeping the full real-time CRUD support expected from databases is even harder.

Vector search engines are changing the game of search. With the support of many amazing models, they enable us to search through vast amounts of data (text, images, audio, video, etc.) in tens of milliseconds. Think: you could ask Wikipedia (containing 30m+ paragraphs) any question in natural language, and the vector search engine will give you an answer in no time. Now couple it with a multilingual Large Language Model, and you could ask your question in any (supported) language and search across documents written in multiple languages.
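To make the idea concrete, here is a minimal sketch of similarity search over vectors, written in plain Python. It is an illustration only, not the code from the session demo: the three-dimensional "embeddings", the document titles, and the helper names `cosine_similarity` and `search` are all made up for this example. A real system would get its vectors from an embedding model, and a production engine would typically use an approximate nearest-neighbor index rather than the exhaustive scan shown here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-written vectors standing in for model-generated embeddings.
documents = {
    "How to train a puppy": [0.9, 0.1, 0.0],
    "Caring for a new kitten": [0.8, 0.3, 0.1],
    "Intro to stock markets": [0.0, 0.2, 0.9],
}

def search(query_vector, k=2):
    """Brute-force search: score every document, return the k most similar titles."""
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in documents.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]

# Pretend this is the embedding of a query such as "pet care tips".
query = [1.0, 0.1, 0.0]
results = search(query)
print(results)
```

The query shares no keywords with the stored titles, yet the two pet-related documents rank first because their vectors point in a similar direction. The scan compares the query against every stored vector, which is fine for three documents but is exactly the cost that dedicated vector indexes are designed to avoid at the scale of millions of items.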

Join me for this session to learn more about vector embeddings, how vector search engines work, why they are so fast, and how they could help you take your search to production.
I will also take you through a full live-coding demo, showing you all the steps from setup to query.

Zain Hasan

Zain Hasan is a Senior Developer Advocate at Weaviate, an open-source vector database. He is an engineer and data scientist by training who pursued his undergraduate and graduate work at the University of Toronto, building artificially intelligent assistive technologies. He then founded his own company, developing a digital health platform that leveraged machine learning to remotely monitor chronically ill patients. More recently, he practiced as a consultant senior data scientist in Toronto. He is passionate about open-source software, education, community, and machine learning, and has delivered workshops and talks at multiple events and conferences.