Building Embedding Models for Large-Scale Real-World Applications

Disclaimer: This summary has been generated by AI. It is experimental, and feedback is welcomed. Please reach out to info@qconlondon.com with any comments or concerns.

The presentation Building Embedding Models for Large-Scale Real-World Applications by Sahil Dua covers the fundamentals of embedding models and offers practical insights into deploying them in large-scale applications. Here is a structured summary of the key points from the presentation:

  • Introduction to Embedding Models:
    • Embedding models transform data into meaningful vector representations, useful in various applications like search, recommendations, and retrieval-augmented generation (RAG).
    • Embeddings for similar inputs are close to each other in the vector space, while different inputs are far apart.
  • Applications:
    • Retrieving the best matching documents, passages, images, or videos from vast data collections.
    • Generating personalized recommendations based on user preferences.
    • RAG applications that ground language model outputs for improved factual accuracy.
  • Model Lifecycle:
    • Designing architectures that cater to specific serving requirements.
    • Distilling large models into smaller, efficient production models.
    • Optimizing model serving through tools like post-training quantization.
  • Challenges and Solutions:
    • Addressing query latency and document retrieval efficiency using dynamic batching and model quantization.
    • Utilizing techniques like contrastive learning to train models effectively.
  • Practical Strategies: The presentation also covered strategies for transitioning embedding models from research to production while ensuring high performance and scalability.
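The core idea in the summary above — that embeddings for similar inputs land close together in vector space — can be sketched with cosine similarity. This is a minimal NumPy illustration with made-up 4-dimensional vectors; real embedding models emit hundreds or thousands of dimensions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated ones."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings for a query and two documents.
query = np.array([0.9, 0.1, 0.0, 0.2])
relevant_doc = np.array([0.8, 0.2, 0.1, 0.3])   # semantically close to the query
unrelated_doc = np.array([0.0, 0.1, 0.9, 0.0])  # semantically distant

# Retrieval ranks documents by similarity to the query.
assert cosine_similarity(query, relevant_doc) > cosine_similarity(query, unrelated_doc)
```

In a retrieval system, the same comparison runs against millions of pre-computed document embeddings, typically via an approximate nearest-neighbor index rather than a brute-force loop.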

This is the end of the AI-generated content.


Embedding models are at the core of search, recommendation, and retrieval-augmented generation (RAG) systems, transforming data into meaningful representations. We can adapt state-of-the-art large language models (LLMs) into embedding models that generate high-quality embeddings, but deploying these models in large-scale applications presents significant challenges.

This talk explores the end-to-end lifecycle of embedding systems, including:

  • Leveraging LLMs for high-quality embeddings and adapting them for domain-specific use cases using contrastive learning.
  • Designing custom architectures optimized for use-case specific serving requirements.
  • Distilling large embedding models into smaller, production-friendly sizes.
  • Serving embeddings efficiently with optimization strategies like variable batch sizes and post-training quantization.
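The contrastive-learning bullet above can be made concrete with an in-batch InfoNCE-style loss, where each query's positive is the document at the same batch index and every other document acts as a negative. This is a NumPy sketch of the general technique, not the speaker's implementation; the temperature value is an assumption:

```python
import numpy as np

def info_nce_loss(queries: np.ndarray, docs: np.ndarray, temperature: float = 0.05) -> float:
    """In-batch contrastive loss over (batch, dim) query and document embeddings."""
    # L2-normalize so dot products are cosine similarities.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    logits = q @ d.T / temperature                 # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy against the diagonal: pull matched pairs together, push others apart.
    return float(-np.mean(np.diag(log_probs)))

# Perfectly aligned pairs yield a near-zero loss; mismatched pairs score much higher.
aligned = np.eye(8)
random_docs = np.random.default_rng(0).standard_normal((8, 8))
assert info_nce_loss(aligned, aligned) < info_nce_loss(aligned, random_docs)
```

Training on many such batches teaches the model to place matching query-document pairs near each other, which is what makes the retrieval applications listed above work.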

Attendees will leave with practical strategies for scaling embedding models from research to production, ensuring high performance and efficiency in real-world applications such as retrieving the best-matching documents, passages, or images; data de-duplication; personalized recommendations; content clustering; and grounding GenAI responses using a RAG approach.
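Post-training quantization, one of the serving optimizations mentioned above, can be illustrated with a simple symmetric int8 scheme: store each embedding as 8-bit integers plus one float scale, cutting storage 4x at the cost of a small reconstruction error. This is a hedged NumPy sketch; production serving stacks use library-provided quantizers:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map float32 values to int8 with a single symmetric scale factor."""
    scale = float(np.max(np.abs(x))) / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values from the int8 codes."""
    return q.astype(np.float32) * scale

embedding = np.random.default_rng(0).standard_normal(768).astype(np.float32)
q, scale = quantize_int8(embedding)
recovered = dequantize(q, scale)

# 4x smaller storage; rounding error is bounded by half a quantization step.
assert q.nbytes == embedding.nbytes // 4
```

At serving time the quantized vectors are compared directly (or dequantized on the fly), trading a small accuracy loss for much lower memory and bandwidth per document.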


Speaker

Sahil Dua

Senior Software Engineer, Machine Learning @Google, Stanford AI, Co-Author of “The Kubernetes Workshop”, Open-Source Enthusiast

Sahil Dua is a Tech Lead focused on developing and adapting large language models (LLMs), with expertise in representation learning. He oversees the full LLM lifecycle, from designing data pipelines and model architectures to optimizing models for highly efficient serving. Before Google, Sahil worked on the ML platform at Booking.com to scale machine learning model development and deployment.


A co-author of “The Kubernetes Workshop” book and an open-source enthusiast, Sahil has contributed to projects like Git, Pandas, and Linguist. As a frequent speaker at global conferences, he shares insights on AI, machine learning, and tech innovation, inspiring professionals across the industry.


Date

Tuesday Apr 8 / 03:55PM BST ( 50 minutes )

Location

Churchill (Ground Fl.)

Topics

AI/ML, Embedding Models, RAG

Slides

Slides are not available


From the same track

Session AI/ML

Deploy MultiModal RAG Systems with vLLM

Tuesday Apr 8 / 10:35AM BST

While text-based RAG systems have been everywhere in the last year and a half, there is so much more than text data. Images, audio, and documents often need to be processed together to provide meaningful insights, yet most RAG implementations focus solely on text.

Speaker image - Stephen Batifol

Stephen Batifol

Developer Advocate @Zilliz, Founding Member of the MLOps Community Berlin, Previously Machine Learning Engineer @Wolt, and Data Scientist @Brevo

Session AI/ML

How to Unlock Insights and Enable Discovery Within Petabytes of Autonomous Driving Data

Tuesday Apr 8 / 05:05PM BST

For autonomous vehicle companies, finding valuable insights within millions of hours of video data is essential yet challenging.

Speaker image - Kyra Mozley

Kyra Mozley

ML Engineer @Wayve, Previously Security & AI PhD Candidate @Royal Holloway University

Session AI/ML

AI for Food Image Generation in Production: How & Why

Tuesday Apr 8 / 01:35PM BST

In this talk, we will conduct a technical overview of a client-facing Food Image Generation solution developed at Delivery Hero.

Speaker image - Iaroslav  Amerkhanov

Iaroslav Amerkhanov

Senior Data Scientist @Delivery Hero, Founder of T4lky, Creator & Host of EPAM Podcast, Speaker

Session AI/ML

Foundation Models for Ranking: Challenges, Successes, and Lessons Learned

Tuesday Apr 8 / 02:45PM BST

Recommender systems are an integral part of most products nowadays and are often a key driver of discovery for users of the product.

Speaker image - Moumita Bhattacharya

Moumita Bhattacharya

Senior Research Scientist @Netflix, Previously @Etsy

Session

Unconference: AI and ML for Software Engineers

Tuesday Apr 8 / 11:45AM BST