Embedding models are at the core of search, recommendation, and retrieval-augmented generation (RAG) systems, transforming data into meaningful representations. We can adapt state-of-the-art large language models (LLMs) into embedding models that generate high-quality embeddings, but deploying these models in large-scale applications presents significant challenges.
This talk explores the end-to-end lifecycle of embedding systems, including:
- Leveraging LLMs for high-quality embeddings and adapting them for domain-specific use cases using contrastive learning.
- Designing custom architectures optimized for use-case-specific serving requirements.
- Distilling large embedding models into smaller, production-friendly sizes.
- Serving embeddings efficiently with optimization strategies like variable batch sizes and post-training quantization.
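As a rough illustration of the contrastive-learning step in the list above, the sketch below computes an in-batch InfoNCE-style loss, one common contrastive objective. It is not necessarily the objective covered in the talk. The function names, the temperature value, and the pairing convention (documents[i] is the positive for queries[i], and every other document in the batch acts as a negative) are assumptions made for this example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def in_batch_contrastive_loss(queries, documents, temperature=0.05):
    """Mean InfoNCE-style loss over a batch of (query, document) embedding pairs.

    Assumption: documents[i] is the positive match for queries[i]; all other
    documents in the batch are treated as negatives.
    """
    if not queries or len(queries) != len(documents):
        raise ValueError("queries and documents must be non-empty and equal in length")

    q = [l2_normalize(v) for v in queries]
    d = [l2_normalize(v) for v in documents]

    total = 0.0
    for i, q_i in enumerate(q):
        # Temperature-scaled cosine similarity of this query to every document.
        logits = [dot(q_i, d_j) / temperature for d_j in d]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(z - max_logit) for z in logits)
        )
        # Cross-entropy with the matching document as the correct class.
        total += log_denominator - logits[i]
    return total / len(q)


# Correctly paired embeddings should score a much lower loss than mismatched ones.
aligned_loss = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                         [[1.0, 0.0], [0.0, 1.0]])
misaligned_loss = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                            [[0.0, 1.0], [1.0, 0.0]])
```

In actual training this loss would be computed on the model's output embeddings inside an automatic-differentiation framework, so that gradients pull matching pairs together and push non-matching pairs apart. The pure-Python version here only shows the arithmetic.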
Attendees will leave with practical strategies for scaling embedding models from research to production, ensuring high performance and efficiency in real-world applications: retrieving the best-matching documents, passages, or images; de-duplicating data; generating personalized recommendations; clustering content; and grounding GenAI responses with RAG.
Speaker
Sahil Dua
Senior Software Engineer, Machine Learning @Google, Stanford AI, Co-Author of “The Kubernetes Workshop”, Open-Source Enthusiast
Sahil Dua is a Tech Lead focused on developing and adapting large language models (LLMs), with expertise in representation learning. He oversees the full LLM lifecycle, from designing data pipelines and model architectures to optimizing models for highly efficient serving. Before joining Google, Sahil worked on the ML platform at Booking.com, scaling machine learning model development and deployment.
A co-author of the book “The Kubernetes Workshop” and an open-source enthusiast, Sahil has contributed to projects like Git, Pandas, and Linguist. A frequent speaker at global conferences, he shares insights on AI, machine learning, and tech innovation.