The AI Revolution Will Not Be Monopolized: How Open-Source Beats Economies of Scale, Even for LLMs

With the latest advancements in Natural Language Processing and Large Language Models (LLMs), and big companies like OpenAI dominating the space, many people wonder: Are we heading further into a black box era with larger and larger models, obscured behind APIs controlled by big tech monopolies?

I don’t think so, and in this talk, I’ll show you why. I’ll dive deeper into the open-source model ecosystem, some common misconceptions about use cases for LLMs in industry, practical real-world examples, and how basic principles of software development and best practices such as modularity, testability and flexibility still apply. LLMs are a great new tool in our toolkits, but the end goal remains to create a system that does what you want it to do. Explicit is still better than implicit, and composable building blocks still beat huge black boxes.
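To make the idea of composable building blocks concrete, here is a minimal sketch (my own illustration, not taken from the talk) of an explicit, inspectable spaCy pipeline in which each component can be tested and swapped independently:

```python
# Minimal sketch of a composable NLP pipeline with spaCy (illustrative only).
# Each component is explicit and can be inspected, tested or replaced on its own.
import spacy

nlp = spacy.blank("en")                 # start from an empty English pipeline
ruler = nlp.add_pipe("entity_ruler")    # add a rule-based entity component
ruler.add_patterns([
    {"label": "ORG", "pattern": "Explosion"},
    {"label": "PRODUCT", "pattern": "spaCy"},
])

doc = nlp("Explosion maintains spaCy, an open-source NLP library.")
print([(ent.text, ent.label_) for ent in doc.ents])
# Expected: [('Explosion', 'ORG'), ('spaCy', 'PRODUCT')]
```

Because each step is a named component in the pipeline, it can later be replaced by a statistical model without changing the rest of the system.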

Interview:

What's the focus of your work these days?

I'm the co-founder and CEO of Explosion and a core developer of spaCy, a popular open-source library for Natural Language Processing in Python, and Prodigy, a modern annotation and data development tool for machine learning.

Most of my time is spent making it easier for developers from all kinds of backgrounds to use the latest developments in Natural Language Processing, and helping teams develop high-quality datasets efficiently and build modular, transparent natural language understanding applications.

What's the motivation for your talk at QCon London 2024?

There's a lot of talk about Large Language Models, and I believe some people are wrong about how exactly they will transform the way we build AI systems. As ideas develop, we’re seeing more and more ways to use compute efficiently, producing AI systems that are cheaper to run and easier to control.

I want to share some practical approaches that you can apply today. If you’re trying to build a system that does a particular thing, you don’t need to transform your request into arbitrary language and call into the largest model that understands arbitrary language the best. The people developing those models are telling that story, but the rest of us aren’t obliged to believe them.
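As a hypothetical illustration of that point (not from the interview): if the "particular thing" is extracting dates from text, a small pretrained pipeline can do it directly, without prompting a large general-purpose model. The sketch below assumes the en_core_web_sm model has been downloaded via `python -m spacy download en_core_web_sm`:

```python
# Illustrative sketch: solving a narrow task with a small pretrained pipeline
# instead of sending the request to the largest available general-purpose LLM.
import spacy

nlp = spacy.load("en_core_web_sm")      # small pretrained English pipeline
doc = nlp("The invoice is due on 15 March 2024, thirty days after delivery.")

dates = [ent.text for ent in doc.ents if ent.label_ == "DATE"]
print(dates)  # likely output: ['15 March 2024', 'thirty days']
```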

How would you describe your main persona and target audience for this session?

ML / NLP Engineers

Is there anything specific that you'd like people to walk away with after watching your session?

The future won't just consist of larger and larger black-box models from big tech companies, and you don't need to spend tons of money to join in. Best practices from software development still apply, and there's so much you can do right now to build better, more transparent AI systems.


Speaker

Ines Montani

Co-Founder & CEO @Explosion, Core Developer of spaCy

Ines Montani is a developer specializing in tools for AI and NLP technology. She’s the co-founder and CEO of Explosion and a core developer of spaCy, a popular open-source library for Natural Language Processing in Python, and Prodigy, a modern annotation tool for creating training data for machine learning models.


Date

Monday Apr 8 / 03:55PM BST (50 minutes)

Location

Whittle (3rd Fl.)

Topics

AI/ML, LLM, Open Source


From the same track

Session AI/ML

Retrieval-Augmented Generation (RAG) Patterns and Best Practices

Monday Apr 8 / 10:35AM BST

The rise of LLMs that coherently use language has led to an appetite to ground the generation of these models in facts and private collections of data.


Jay Alammar

Director & Engineering Fellow @Cohere & Co-Author of "Hands-On Large Language Models"

Session AI/ML

Navigating LLM Deployment: Tips, Tricks, and Techniques

Monday Apr 8 / 11:45AM BST

Self-hosted Language Models are going to power the next generation of applications in critical industries like financial services, healthcare, and defence.


Meryem Arik

Co-Founder @TitanML

Session AI/ML

Reach Next-Level Autonomy with LLM-Based AI Agents

Monday Apr 8 / 01:35PM BST

Generative AI has emerged rapidly since the release of ChatGPT, yet the industry is still at a very early stage, with unclear prospects and potential.


Tingyi Li

Enterprise Solutions Architect @AWS

Session AI/ML

LLM and Generative AI for Sensitive Data - Navigating Security, Responsibility, and Pitfalls in Highly Regulated Industries

Monday Apr 8 / 02:45PM BST

As large language models (LLMs) become more prevalent in highly regulated industries, dealing with sensitive data and ensuring the security and ethical design of machine learning (ML) models are paramount.


Stefania Chaplin

Solutions Architect @GitLab


Azhir Mahmood

Research Scientist @PhysicsX

Session AI/ML

How Green is Green: LLMs to Understand Climate Disclosure at Scale

Monday Apr 8 / 05:05PM BST

Assessing the validity of climate finance claims requires a system that can handle the significant variation in language, format, and structure found in climate and financial reporting documentation, as well as knowledge of the domain-specific language of climate science and finance.


Leo Browning

First ML Engineer @ClimateAligned