Abstract
Self-hosted language models are going to power the next generation of applications in critical industries like financial services, healthcare, and defence. Self-hosting LLMs, as opposed to using API-based models, comes with its own host of challenges: as well as solving business problems, engineers need to wrestle with the intricacies of model inference, deployment, and infrastructure. In this talk we are going to discuss best practices in model optimisation, serving, and monitoring, with practical tips and real case studies.
Interview:
What's the focus of your work these days?
At TitanML, our focus is on making Generative AI applications easier to develop, deploy, and serve. Much of our recent work has gone into making it easier to build applications that involve both RAG and JSON-constrained outputs.
What's the motivation for your talk at QCon London 2024?
Almost every business is trying to build and deploy LLM applications at the moment; however, very few of them have successfully got these applications into production. Our teams are experts in deploying and serving LLM apps, so we have a lot of tips and tricks to help other developers avoid common pitfalls.
How would you describe your main persona and target audience for this session?
This session is for those working with, or thinking of building with, Generative AI, especially self-hosted open-source AI. It is not a 'code-along' session; however, there may be some technical concepts.
Is there anything specific that you'd like people to walk away with after watching your session?
I want attendees to realize that deploying LLMs within their own environment is a viable option and is not as scary as it might appear!
Speaker
Meryem Arik
Co-Founder and CEO @Doubleword (Previously TitanML), Recognized as a Technology Leader in Forbes 30 Under 30, Recovering Physicist
Meryem is the Co-founder and CEO of Doubleword (previously TitanML), a self-hosted AI inference platform empowering enterprise teams to deploy domain-specific or custom models in their private environment. An alumna of Oxford University, Meryem studied Theoretical Physics and Philosophy. She frequently speaks at leading conferences, including TEDx and QCon, sharing insights on inference technology and enterprise AI. Meryem has been recognized as a Forbes 30 Under 30 honoree for her contributions to the AI field.