Intelligent Middleware

Route each prompt to Perfect Expert.

LEMoE acts as the central brain between your users and Artificial Intelligence. Analyze what you need and redirect the conversation to the ideal model in milliseconds, whether in the cloud or on your own local servers.

lemoe

Why use LEMoE?

Designed for speed, privacy and maximum flexibility.

Smart Routing

100% local semantic decision engine. Understand the real context of each message instantly and select the right model without sending your data to the cloud.

Extreme Efficiency

Super optimized systems. In real stress tests with 15 experts available in the system, the kernel consumes only 1.5GB RAM.

Audited Security

Being from open source and auditable, we guarantee transparency. Prevents Path Traversal, SSRF and obfuscates sensitive logs automatically.

Multi-Backend

Connect local Ollama models, ultra-light inference in RAM (ONNX), Llama.cpp and external APIs (Groq, OpenAI) into a single central system.

See all features

How magic works

A solid architecture that decides in milliseconds.

Frontend (UI)
"command to start nginx on port 80"
LEMoE Router
Vectorization E5 + Softmax (Score: 0.98)
External API (OpenAI Compatible)
Legal Expert / Copywriter
Local ONNX (T5)
DevOps Expert (malbec)
Local Ollama
Python programmer

Solving Real Problems

How LEMoE fits into your infrastructure.

AI switchboard

A single bot that routes customer questions to specialized models (legal, support, shipping) in milliseconds.

Zero Data Leak

It keeps your code and secrets on secure local servers, while pushing only trivial queries to the public cloud.

Smart Routing

Save thousands of dollars by submitting easy tasks to local free models and using premium APIs only when necessary.

Business Scale

For the user, there is only one "model". All the complexity of orchestrating 15 or 100 experts behind them is 100% invisible to them.

Explore Use Cases

Pricing Plans

Open License. Ready to adapt to your Artificial Intelligence adoption level.

🟢 Community

Free / Self-hosted

Target audience: Solo developers, students, and very small startups (1-5 employees).

  • Internal use exclusively (Non-commercial)
  • Full source code on GitHub
  • Community support
Download Code
RECOMMENDED

🟣 Coming Soon

Commercial

Target audience: Agencies, SMBs, and large corporations wanting to use LEMoE commercially.

  • Legal commercial use permit
  • Priority support / direct access to creator
  • Consulting, Onboarding, and SLA
Contact

Frequently Asked Questions

We resolve typical doubts before you have them.

Do I need a powerful graphics card (GPU) to use ONNX? +
No. LEMoE's ONNXRunner is designed to run small model inference on CPU by loading them directly into system RAM. In fact, it is so optimized that it works perfectly on modest hardware.
Can I connect to Anthropic or Gemini instead of OpenAI? +
LEMoE speaks the universal dialect of OpenAI (/v1/chat/completions). You can connect third-party APIs without problem using proxies that translate the API (like LiteLLM) or directly use those that are already supported natively (like Groq, Together, etc.).
How many experts can I put in? +
Practically unlimited. The router compares mathematical vectors using cosine similarity ultra-fast. Having 50 or 100 experts will only add a few extra milliseconds to the decision phase, being imperceptible to the human user.
Can I use a custom routing model? +
Yes. Although by default LEMoE uses HuggingFace fast models like E5-small, you can configure your own model or routing algorithm on the backend to tailor the decision logic to your exact needs.
What happens if my server runs out of RAM? +
For local models (ONNX), LEMoE implements a system of LRU cache (Least Recently Used). You can limit, for example, that there are only 2 models loaded at a time. When the third party is called, LEMoE automatically evicts the model that has not been used the longest from memory.