Running ChatGPT or any other large language model locally on your device is a dream for many. But before that dream becomes reality, there are quite a few hurdles to clear. Some, like hardware limitations or the quality of available models, are out of our hands. Others, like how the model is prepared, delivered, and executed on the device, are problems we can tackle.
And if local LLMs are your goal, I’ve got good news—MLC might be exactly what you need.
What is MLC?
MLC stands for Machine Learning Compilation. At first glance, you might think it just compiles the model, but it actually goes further: it also compiles the runtime. That caught me off guard at first. I assumed only the model got optimized, but when you think about it, shipping a slimmed-down model alongside a full, generic runtime with every utility function included wouldn't make much sense.
And since we’re talking about mobile and edge devices, every bit of optimization counts. But MLC doesn’t stop at mobile—it targets any environment that can run C++.
How does it work?
MLC analyzes your model's architecture, figures out which functions are needed to run it, and then compiles both an optimized version of the model and a custom runtime. Because that runtime is written in C++, it can run in just about any environment.
In layman’s terms, imagine following a cooking recipe. Before you start cooking, you grab the ingredients from the fridge. Now imagine the “fridge” is a massive library of functions, and the “recipe” is your model’s architecture.
And let’s be honest—sometimes recipes call for super complex ingredients, fancy techniques, or rare tools. But we don’t always need those. We just want the end result. That’s what MLC does—it finds all the “we don’t need that” parts and simplifies them.
In the end, we’re left with clean ingredients and a streamlined recipe, and we’re ready to cook (aka: run inference). And the best part? Now that recipe works in any kitchen.
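If you prefer code over cooking, here's a toy sketch of the same idea (an illustration only, not MLC's actual API): pick just the operations your model's "recipe" calls for out of a much larger library, and ship only those.

// Toy illustration: a big "fridge" of ops, of which a given model only needs a few.
const opLibrary: Record<string, (x: number) => number> = {
  relu: (x) => Math.max(0, x),
  sigmoid: (x) => 1 / (1 + Math.exp(-x)),
  silu: (x) => x / (1 + Math.exp(-x)),
  // ...hundreds more in a real framework
};

// The "recipe": the ops this particular model's architecture actually uses.
const modelOps = ['relu', 'silu'];

// "Compiling the runtime" boils down to shipping only what the recipe needs.
const runtime = Object.fromEntries(modelOps.map((name) => [name, opLibrary[name]]));

Real MLC works at a much lower level (kernels, memory planning, hardware-specific codegen), but the principle is the same: nothing you don't use ends up on the device.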
What does this look like in practice?
Let’s say you want to run a quantized 3B LLM on your Android phone.
With MLC, you would:
- Convert the model (e.g., LLaMA or Mistral) into the correct format using MLC’s model converter,
- Compile the runtime for your target platform (e.g., Android, Web, macOS),
- Bundle both into your app and use MLC’s C++ or JavaScript bindings to call it.
In the end, you get something like this in your mobile code when using our react-native-ai library, integrated with ai-sdk:
import { streamText } from 'ai';

const { textStream } = streamText({
  model: getModel(modelId), // model handle exposed by react-native-ai
  temperature: 0.6,
  messages,
});
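From there, consuming the stream is standard AI SDK usage. A minimal sketch (the setResponse state setter is just a placeholder for however your app renders the text):

// Append each chunk of generated text to the UI as it streams off the device
for await (const chunk of textStream) {
  setResponse((prev) => prev + chunk);
}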

Can it run on your device?
One of the most common questions I’ve seen on social media (and heard from my colleagues) is: “Will this library let me run an LLM on my low-end device?”
Short answer: it’s not really about the library—it’s about the model.
Let’s jump back into the kitchen for a second.
You’re at home, cooking a meal. Suddenly, someone asks you to feed an army. No matter how good your tools or prep are, there’s a hard limit to what your kitchen can handle. It’s the same with LLMs.
When you see a “3B”, “7B”, or “70B” model, that number is billions of parameters. Think of cooking for a small party (3B), a packed club (7B), or an entire stadium (70B).
MLC helps create the best conditions—smart practices, runtime optimization—to run a model on your device. But some models are just too big. They’ll run slow, stutter, or even crash your device. That’s reality.
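To get a feel for the numbers, here's a rough back-of-the-envelope estimate of how much memory just the weights need (it ignores the KV cache and runtime overhead, so treat it as a lower bound):

// Weights only: parameters × bits per weight ÷ 8 bits per byte
function estimateWeightMemoryGB(billionsOfParams: number, bitsPerWeight: number): number {
  return (billionsOfParams * 1e9 * (bitsPerWeight / 8)) / 1e9;
}

estimateWeightMemoryGB(3, 4);  // ~1.5 GB: a 4-bit quantized 3B model fits on many recent phones
estimateWeightMemoryGB(70, 4); // ~35 GB: no phone is serving that stadium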
Still, if you stay within your limits… your kitchen can start to feel Michelin-star-worthy.

Why do we like MLC?
There are other frameworks that aim to bring machine learning to edge devices—like TensorFlow Lite, ONNX Runtime, or Core ML (on Apple devices). These solutions also offer optimizations, and they often come with strong backing from big tech companies.
However, MLC stands out in a few key ways:
- It compiles not just the model, but also the runtime, making the result more lightweight and optimized for your specific target.
- It’s designed to work cross-platform, not just on Android or iOS. If a platform can run C++, it can run MLC.
- It’s completely open-source, meaning you can dig into the internals or customize it however you want.
While the ecosystem around MLC is still growing, it brings a unique blend of portability, performance, and flexibility that’s hard to find in other tools.
Right now, it’s focused on LLMs. But at the end of the day, all models—whether vision, audio, or analysis—are just math ops stacked into layers. So maybe, in the future, this same framework could power every kind of model.
And that’s something worth getting excited about.