Intelligent Middleware

Route each prompt to the perfect expert.

l3mcore acts as the central brain between your users and your AI models. It analyzes what each request needs and routes the conversation to the ideal model in milliseconds, whether in the cloud or on your own local servers.

$ curl -sSL https://raw.githubusercontent.com/lemoelink/l3mcore/refs/heads/master/setup.sh | bash


Natively compatible with

OpenAI API, Ollama, Groq, Open WebUI, Local Models (ONNX), AnythingLLM, Llama.cpp, Custom APIs

"Translate this to Japanese..." → Local Model (Llama 3)

"Debug this Python script..." → Expert (Qwen Coder API)

Massive Cost Savings

Don't use GPT-4 to reply to a simple "Hello". l3mcore sends easy tasks to free local models and reserves expensive APIs for complex tasks.

Total Privacy

Automatically routes prompts containing sensitive information (medical data, internal source code) to your local models, so they never leave for the cloud.

Drop-in Integration

100% compatible with the OpenAI and Ollama APIs. Change a single URL in your current app or client and you get smart routing without touching the rest of your code.

Watch it in action

Here we see the console and Open WebUI. We are using 4 experts: 1 local ONNX model (Malbec), 1 on Ollama, and 2 API calls with Groq.

Why l3mcore?

Designed for speed, privacy and maximum flexibility in production.

Extreme Efficiency

The core is so optimized that in real stress tests with 15 active experts it consumes only 1.5 GB of RAM. Forget about bottlenecks and server overhead.

Memory Usage (15 Experts): 1.5 GB

Audited Security

l3mcore is open source, so its code is fully open to inspection. It guards against Path Traversal and SSRF, and automatically obfuscates sensitive data in logs to prevent leaks (Zero Data Leak).

Multi-Backend System

Unify all your AI sources. Connect local models, CPU inference and the most powerful APIs on the market in a single proxy.

Ollama (Local GPU/CPU)
ONNX (Local CPU RAM)
OpenAI / Groq / Anthropic

Plugin System

Extend l3mcore's capabilities to fit your needs. Discover, download and create custom modules in our Plugin Directory.

Semantic Routing

100% local decision engine. Instantly understands the real context of each message and selects the appropriate model using vector mathematics.
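As a minimal sketch of the idea (not l3mcore's actual implementation): each expert is described by a vector, the incoming prompt is embedded into the same space, and the router picks the expert whose vector has the highest cosine similarity. The three-dimensional vectors, the expert names and the function names below are made up for illustration; a real router would get its vectors from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as "no match"
    return dot / (norm_a * norm_b)

def route(prompt_vector, experts):
    """Return the name of the expert whose vector is closest to the prompt."""
    return max(experts, key=lambda name: cosine_similarity(prompt_vector, experts[name]))

# Hypothetical expert profiles (toy 3-dimensional embeddings).
EXPERTS = {
    "local-translation": [0.9, 0.1, 0.0],
    "code-expert": [0.1, 0.9, 0.2],
    "general-chat": [0.3, 0.3, 0.9],
}

# Toy embedding standing in for a prompt such as "Debug this Python script...".
prompt_vector = [0.2, 0.8, 0.1]
chosen = route(prompt_vector, EXPERTS)
print(chosen)  # code-expert
```

Because only the angle between vectors matters, the length of a prompt's embedding does not bias the choice; only its direction does.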

Drop-in Integration

You don't have to learn anything new. Keep using the OpenAI SDK.

Before (Direct to OpenAI)

      from openai import OpenAI

      # Connected to commercial cloud
      client = OpenAI(
          api_key="sk-proj-...",
          base_url="https://api.openai.com/v1"
      )

      response = client.chat.completions.create(
          model="gpt-4o",
          messages=[{"role": "user", "content": "Hello"}]
      )

After (Using l3mcore)

      from openai import OpenAI

      # Connected to your local smart router
      client = OpenAI(
          api_key="lm-...",
          base_url="http://localhost:11435/v1"
      )

      response = client.chat.completions.create(
          model="auto", # <-- l3mcore chooses the ideal expert
          messages=[{"role": "user", "content": "Hello"}]
      )

Frequently Asked Questions

Answers to common questions, before you have to ask.

Do I need a powerful graphics card (GPU) to use ONNX?

No. l3mcore's ONNXRunner is designed to run inference for small models on the CPU, loading them directly into system RAM. It is optimized to run well on modest hardware.

Can I connect to Anthropic or Gemini instead of OpenAI?

Yes. l3mcore speaks the OpenAI dialect (/v1/chat/completions). You can connect other providers through a proxy that translates their API (such as LiteLLM), or directly use those that are already supported natively (such as Groq or Together).

How many experts can I add?

Practically unlimited. The router compares embedding vectors using cosine similarity, which is very fast. Having 50 or 100 experts adds only a few extra milliseconds to the decision phase, which is imperceptible to a human user.
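To see why the decision cost grows so slowly, note that each routing decision is one similarity computation per expert. A rough sketch, assuming expert vectors are normalized once at startup so that cosine similarity reduces to a plain dot product; the 384-dimension size matches E5-small embeddings, while the random vectors and function names are illustrative stand-ins, not l3mcore internals:

```python
import math
import random

DIM = 384  # embedding size of E5-small; the vectors below are random stand-ins

def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def best_expert(prompt_unit, expert_units):
    """Linear scan: one dot product per expert, return the index of the best match."""
    scores = [sum(p * e for p, e in zip(prompt_unit, expert)) for expert in expert_units]
    return max(range(len(scores)), key=scores.__getitem__)

rng = random.Random(0)

# 100 hypothetical experts, normalized once up front.
expert_units = [normalize([rng.gauss(0, 1) for _ in range(DIM)]) for _ in range(100)]

# A slightly noisy copy of expert 42's vector should be routed back to expert 42.
prompt = normalize([x + rng.gauss(0, 0.01) for x in expert_units[42]])
winner = best_expert(prompt, expert_units)
print(winner)
```

With 100 experts this scan is 100 × 384 = 38,400 multiply-adds per decision, and the work grows linearly with the number of experts.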

Can I use a custom routing model?

Yes. By default l3mcore uses fast Hugging Face models such as E5-small, but you can configure your own model or routing algorithm on the backend to tailor the decision logic to your exact needs.

What happens if my server runs out of RAM?

For local models (ONNX), l3mcore implements an LRU (Least Recently Used) cache. You can, for example, cap the number of models loaded at any one time at 2. When a third model is requested, l3mcore automatically evicts from memory the model that has gone unused the longest.
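The eviction behavior described above can be sketched as follows. This is an illustration of the LRU technique in general, not l3mcore's code: the ModelCache class, its max_loaded parameter and the stand-in loader are all hypothetical names.

```python
from collections import OrderedDict

class ModelCache:
    """Keeps at most `max_loaded` models in memory, evicting the least recently used."""

    def __init__(self, max_loaded, loader):
        if max_loaded < 1:
            raise ValueError("max_loaded must be at least 1")
        self.max_loaded = max_loaded
        self.loader = loader          # callable that loads a model by name
        self._models = OrderedDict()  # least recently used first, most recent last
        self.evicted = []             # eviction log, kept only for illustration

    def get(self, name):
        if name in self._models:
            self._models.move_to_end(name)  # mark as most recently used
            return self._models[name]
        if len(self._models) >= self.max_loaded:
            old_name, _ = self._models.popitem(last=False)  # drop the oldest entry
            self.evicted.append(old_name)
        model = self.loader(name)
        self._models[name] = model
        return model

    def loaded(self):
        """Names of the models currently in memory, oldest first."""
        return list(self._models)

# Stand-in loader; a real one would read an ONNX file into RAM.
cache = ModelCache(max_loaded=2, loader=lambda name: f"<model {name}>")
cache.get("model-a")
cache.get("model-b")
cache.get("model-a")   # model-a becomes the most recently used
cache.get("model-c")   # cache is full, so model-b is evicted
print(cache.loaded())  # ['model-a', 'model-c']
```

Note that model-b is evicted rather than model-a, even though model-a was loaded first: what counts is the most recent use, not the load order.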

Ready to optimize your AI?

Install l3mcore in less than 1 minute and start saving time, money, and resources in your Artificial Intelligence infrastructure.

Get Started Now