The guts of LEMoE

Find out what makes this middleware so special.

Smart Routing

At the heart of LEMoE is a local machine-learning engine built on dense HuggingFace models. Instead of just searching for keywords, LEMoE converts your message into a vector and compares it, in milliseconds, with the semantic space of each configured expert. If the model is unsure, an advanced fuzzy-matching algorithm steps in as a backup.

"Write a script in Python"
python_programmer: 0.95
legal_advisor: 0.12
copywriter: 0.08
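
The routing idea can be sketched in a few lines. This is only an illustration under stated assumptions, not LEMoE's actual code: the expert descriptions, the 0.3 threshold, and the function names are hypothetical, and a bag-of-words vector stands in for the dense embedding model so the example runs with the standard library alone.

```python
import difflib
import math
from collections import Counter

# Hypothetical expert descriptions; the real configuration format is not shown here.
EXPERTS = {
    "python_programmer": "write python script code function program debug",
    "legal_advisor": "contract law legal clause liability court",
    "copywriter": "slogan advertising copy marketing headline brand",
}


def embed(text):
    """Toy stand-in for a dense embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuzzy_score(message, description):
    """Best character-level similarity between any message word and any description word."""
    return max(
        difflib.SequenceMatcher(None, word, keyword).ratio()
        for word in message.lower().split()
        for keyword in description.split()
    )


def route(message, threshold=0.3):
    """Return (expert, method). Falls back to fuzzy matching when similarity is low."""
    query = embed(message)
    scores = {name: cosine(query, embed(desc)) for name, desc in EXPERTS.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best, "semantic"
    # Low confidence: compare words character by character, which tolerates typos.
    fuzzy = {name: fuzzy_score(message, desc) for name, desc in EXPERTS.items()}
    return max(fuzzy, key=fuzzy.get), "fuzzy"
```

Calling `route("Write a script in Python")` selects `python_programmer` through the semantic path, while a misspelled message such as `"pyhton scrpt"` shares no exact words with any expert and is resolved by the fuzzy fallback instead.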

Integrated Multi-Backend and Extreme Efficiency

Why limit yourself to just one technology? LEMoE is agnostic to the underlying engine. It can load a local ONNX model into RAM, using an LRU cache for instant responses, fire a REST request to Ollama, or send the traffic to external APIs. The system is highly optimized: in stress tests with 15 experts available, the core consumes only 1.5 GB of RAM.

LEMoE Core -> External API | RAM (ONNX) | Ollama
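
As a rough sketch of how an LRU cache can bound memory while several backends coexist, consider the following. Everything here is an assumption made for illustration: the `ModelCache` class, the `EXPERT_BACKENDS` table, the model names, and `fake_loader` (which stands in for creating a real ONNX session) are hypothetical and do not come from LEMoE.

```python
from collections import OrderedDict


class ModelCache:
    """Keep at most `capacity` loaded models, evicting the least recently used."""

    def __init__(self, loader, capacity=2):
        self._loader = loader
        self._capacity = capacity
        self._models = OrderedDict()

    def get(self, name):
        if name in self._models:
            # Cache hit: mark as most recently used.
            self._models.move_to_end(name)
            return self._models[name]
        model = self._loader(name)
        self._models[name] = model
        if len(self._models) > self._capacity:
            # Drop the entry that has gone unused the longest.
            self._models.popitem(last=False)
        return model

    def loaded(self):
        """Names of cached models, least recently used first."""
        return list(self._models)


def fake_loader(name):
    """Stand-in for loading an ONNX model into RAM (assumption for this sketch)."""
    return f"<session for {name}>"


# Hypothetical expert table: each expert declares which backend serves it.
EXPERT_BACKENDS = {
    "python_programmer": {"backend": "onnx", "model": "coder.onnx"},
    "legal_advisor": {"backend": "ollama", "model": "legal-model"},
    "copywriter": {"backend": "api", "model": "remote-model"},
}


def dispatch(expert, cache):
    """Pick the backend for an expert; only the ONNX path touches the cache."""
    config = EXPERT_BACKENDS[expert]
    if config["backend"] == "onnx":
        return "onnx", cache.get(config["model"])
    # A real router would issue a REST request to Ollama or an external API here.
    return config["backend"], config["model"]
```

With a capacity of 2, requesting models `a`, `b`, `a`, `c` in that order evicts `b`, because `a` was touched more recently. Only local models occupy the cache, so remote backends add no memory pressure in this sketch.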

Security by Design (Open Source)

Because LEMoE is open source and auditable, we guarantee total transparency. In exposed environments, LEMoE acts as a firewall that intercepts and blocks Path Traversal and SSRF attacks, preventing the injection of arbitrary paths. It also enforces a maximum payload size and obfuscates any sensitive information in the logs to ensure regulatory compliance.

FIREWALL
{"role": "user", "content": "Hola"}
"model": "../../etc/shadow"
BLOCKED!
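
The checks described above could look roughly like this. It is a minimal sketch, not LEMoE's real filter: the size limit, the host allow-list, the model-name pattern, and the secret pattern are all assumptions chosen for the example.

```python
import json
import re
from urllib.parse import urlparse

MAX_PAYLOAD_BYTES = 1_000_000                       # hypothetical limit
ALLOWED_BACKEND_HOSTS = {"localhost", "127.0.0.1"}  # hypothetical allow-list
MODEL_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._:-]{0,127}")
SECRET = re.compile(r"(sk-|Bearer\s+)[A-Za-z0-9._-]+")


def check_request(raw_body):
    """Return (allowed, reason) for a raw JSON request body given as bytes."""
    if len(raw_body) > MAX_PAYLOAD_BYTES:
        return False, "payload too large"
    try:
        body = json.loads(raw_body)
    except ValueError:
        return False, "invalid JSON"
    if not isinstance(body, dict):
        return False, "invalid JSON"
    model = body.get("model", "")
    # Reject anything that could be read as a path: slashes, "..", odd characters.
    if not isinstance(model, str) or ".." in model or not MODEL_NAME.fullmatch(model):
        return False, "invalid model name"
    return True, "ok"


def is_allowed_backend_url(url):
    """SSRF guard: only forward to http(s) URLs whose host is on the allow-list."""
    parsed = urlparse(url)
    return parsed.scheme in {"http", "https"} and parsed.hostname in ALLOWED_BACKEND_HOSTS


def redact(text):
    """Mask token-like strings before they are written to a log."""
    return SECRET.sub(lambda match: match.group(1) + "***", text)
```

A body whose `model` field is `"../../etc/shadow"` is rejected with the reason `invalid model name`, and a backend URL pointing at an unlisted host such as `169.254.169.254` fails the allow-list check. Using an allow-list rather than a block-list means unknown hosts are refused by default.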

100% Compatible API

You don't have to reprogram your clients. LEMoE exposes a Flask server whose main endpoint mimics the industry standard. Whether you use a frontend, AnythingLLM, or the official Python library itself, you just need to change the connection URL to http://localhost:11435. LEMoE will act as a transparent translator.

Frontend -> POST /v1/chat/completions -> LEMoE
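
For a client without any SDK, the request can be built with the Python standard library alone. This sketch assumes the endpoint accepts the common chat-completions JSON shape (a `model` plus a list of `messages`). The helper name `build_chat_request` and the model value `"auto"` are hypothetical placeholders, not documented LEMoE values.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "auto" is a placeholder model name used only for this example.
request = build_chat_request("http://localhost:11435", "auto", "Write a script in Python")

# Sending it requires a running LEMoE server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Pointing the same code at a different compatible server only means changing the base URL, which is the property the section describes.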