Abstract

Self-hosted language models are going to power the next generation of applications in critical industries like financial services, healthcare, and defence. Self-hosting LLMs, as opposed to using API-based models, comes with its own host of challenges: as well as solving business problems, engineers need to wrestle with the intricacies of model inference, deployment, and infrastructure. In this talk we will discuss best practices in model optimisation, serving, and monitoring, with practical tips and real case studies.

Interview:

What's the focus of your work these days?

At TitanML, our focus is on making Generative AI applications easier to develop, deploy, and serve. Much of our recent work has gone into making it easier to build applications that combine RAG with JSON-constrained outputs.

What's the motivation for your talk at QCon London 2024?

Almost every business is trying to build and deploy LLM applications at the moment; however, very few of them have successfully got these applications into production. Our teams are experts in deploying and serving LLM apps, so we have a lot of tips and tricks to help other developers avoid common pitfalls.

How would you describe your main persona and target audience for this session?

This session is for those working with, or thinking of building with, Generative AI, especially self-hosted open-source AI. It is not a 'code-along' session; however, there may be some technical concepts.

Is there anything specific that you'd like people to walk away with after watching your session?

I want attendees to realize that deploying LLMs within their own environment is a viable option and is not as scary as it might appear!

Speaker

Meryem Arik

Co-Founder and CEO @Doubleword (Previously TitanML), Recognized as a Technology Leader in Forbes 30 Under 30, Recovering Physicist

Meryem is the Co-founder and CEO of Doubleword (previously TitanML), a self-hosted AI inference platform empowering enterprise teams to deploy domain-specific or custom models in their private environment. An alumna of Oxford University, Meryem studied Theoretical Physics and Philosophy. She frequently speaks at leading conferences, including TEDx and QCon, sharing insights on inference technology and enterprise AI. Meryem has been recognized as a Forbes 30 Under 30 honoree for her contributions to the AI field.

From the same track

Session AI/ML

Retrieval-Augmented Generation (RAG) Patterns and Best Practices

Monday Apr 8 / 10:35AM BST

The rise of LLMs that coherently use language has led to an appetite to ground the generation of these models in facts and private collections of data.

Jay Alammar

Director & Engineering Fellow @Cohere & Co-Author of "Hands-On Large Language Models"

Session AI/ML

Reach Next-Level Autonomy with LLM-Based AI Agents

Monday Apr 8 / 01:35PM BST

Generative AI has emerged rapidly since the release of ChatGPT, yet the industry is still at its very early stage with unclear prospects and potential.

Tingyi Li

Enterprise Solutions Architect @AWS

Session AI/ML

LLM and Generative AI for Sensitive Data - Navigating Security, Responsibility, and Pitfalls in Highly Regulated Industries

Monday Apr 8 / 02:45PM BST

As large language models (LLMs) become more prevalent in highly regulated industries, dealing with sensitive data and ensuring the security and ethical design of machine learning (ML) models is paramount.

Stefania Chaplin

Founder & CEO @DevStefOps, Previously Solutions Architect @GitLab, AWS Certified Security - Specialty

Azhir Mahmood

Research Scientist @PhysicsX

Session AI/ML

How Green is Green: LLMs to Understand Climate Disclosure at Scale

Monday Apr 8 / 05:05PM BST

Assessment of the validity of climate finance claims requires a system that can handle significant variation in language, format, and structure present in climate and financial reporting documentation, and knowledge of the domain-specific language of climate science and finance.

Leo Browning

First ML Engineer @ClimateAligned

Session AI/ML

The AI Revolution Will Not Be Monopolized: How Open-Source Beats Economies of Scale, Even for LLMs

Monday Apr 8 / 03:55PM BST

With the latest advancements in Natural Language Processing and Large Language Models (LLMs), and big companies like OpenAI dominating the space, many people wonder: Are we heading further into a black box era with larger and larger models, obscured behind APIs controlled by big t

Ines Montani

Co-Founder & CEO @Explosion, Core Developer of spaCy

Navigating LLM Deployment: Tips, Tricks, and Techniques

