The guts of LEMoE
Find out what makes this middleware so special.
Smart Routing
At the heart of LEMoE is a local machine-learning engine built on dense HuggingFace models. Instead of just searching for keywords, LEMoE converts your message into a mathematical vector and compares it, in milliseconds, with the semantic space of each configured expert. If the model is unsure, an advanced fuzzy-matching algorithm steps in as a backup.
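To make the idea concrete, here is a minimal sketch of embedding-based routing with a fuzzy fallback. It is not LEMoE's actual code: the expert descriptions, the `route` function, and the threshold values are assumptions made for illustration, and a toy bag-of-words vector stands in for a real HuggingFace embedding model so the example runs with the standard library only.

```python
import difflib
import math
from collections import Counter

# Hypothetical expert descriptions; LEMoE's real configuration format
# is not shown in this text.
EXPERTS = {
    "code": "python code function bug debug programming",
    "cooking": "recipe cook bake ingredients kitchen food",
}


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(message, threshold=0.2):
    """Return (expert, method); method is 'semantic', 'fuzzy' or 'none'."""
    query = embed(message)
    scores = {name: cosine(query, embed(desc)) for name, desc in EXPERTS.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best, "semantic"

    # Fallback (assumed behaviour): fuzzy-match each word of the message
    # against the experts' keywords to tolerate typos.
    vocab = {word: name for name, desc in EXPERTS.items() for word in desc.split()}
    for word in message.lower().split():
        close = difflib.get_close_matches(word, list(vocab), n=1, cutoff=0.8)
        if close:
            return vocab[close[0]], "fuzzy"
    return None, "none"
```

With a real embedding model only `embed` would change. The comparison step and the fallback stay the same.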
Integrated Multi-Backend and Extreme Efficiency
Why limit yourself to just one technology? LEMoE is agnostic to the underlying engine. It can wake up a local ONNX model, loading it into RAM behind an LRU cache so that commands are served instantly, fire a REST request to Ollama, or send the traffic to external APIs. It is also highly optimized: in stress tests with 15 experts available in the system, the core consumes only 1.5 GB of RAM.
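The LRU idea can be sketched in a few lines. This is an illustrative example, not LEMoE's implementation: `ModelCache`, its `loader` callback, and the `capacity` parameter are hypothetical names, and the loader here is a placeholder for whatever actually loads an ONNX model into memory.

```python
from collections import OrderedDict


class ModelCache:
    """Keeps at most `capacity` models in RAM, evicting the least recently used."""

    def __init__(self, loader, capacity=2):
        self._loader = loader          # hypothetical callable: name -> loaded model
        self._capacity = capacity
        self._models = OrderedDict()   # insertion order doubles as recency order

    def get(self, name):
        if name in self._models:
            # Cache hit: mark as most recently used and return immediately.
            self._models.move_to_end(name)
            return self._models[name]
        # Cache miss: load the model, then evict the oldest entry if over capacity.
        model = self._loader(name)
        self._models[name] = model
        if len(self._models) > self._capacity:
            self._models.popitem(last=False)
        return model

    def loaded(self):
        """Names of the models currently held in memory, oldest first."""
        return list(self._models)
```

Bounding the number of resident models is one way a router with many experts can keep its memory footprint flat: only the experts used recently stay loaded.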
Security by Design (Open Source)
Because LEMoE is open source and auditable, we guarantee total transparency. In exposed environments, LEMoE acts as a firewall that intercepts and blocks attacks such as Path Traversal and SSRF, preventing the injection of arbitrary paths. Additionally, it enforces a maximum payload size and redacts any sensitive information in the logs to support regulatory compliance.
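The checks described above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the directory, the host allowlist, the size limit, the function names, and the redaction pattern are hypothetical, and LEMoE's real rules may differ. It requires Python 3.9 or later for `Path.is_relative_to`.

```python
import re
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical settings, chosen only for this example.
MODELS_DIR = Path("/opt/lemoe/models")
ALLOWED_HOSTS = {"localhost", "127.0.0.1"}
MAX_PAYLOAD_BYTES = 1_000_000

_SECRET = re.compile(r"(?i)\b(api[_-]?key|token|password)(\s*[=:]\s*)(\S+)")


def is_safe_model_path(user_path):
    """Path Traversal guard: the resolved path must stay inside MODELS_DIR."""
    candidate = (MODELS_DIR / user_path).resolve()
    return candidate.is_relative_to(MODELS_DIR.resolve())


def is_allowed_backend(url):
    """SSRF guard: only http(s) URLs pointing at allowlisted hosts pass."""
    parsed = urlparse(url)
    return parsed.scheme in {"http", "https"} and parsed.hostname in ALLOWED_HOSTS


def is_payload_ok(body):
    """Reject request bodies larger than the configured limit."""
    return len(body) <= MAX_PAYLOAD_BYTES


def redact(line):
    """Mask values that look like credentials before a line is logged."""
    return _SECRET.sub(r"\1\2***", line)
```

An allowlist is used for the SSRF check because listing the hosts you trust is easier to audit than trying to enumerate every internal address worth blocking.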
100% Compatible API
You don't have to rewrite your clients. LEMoE exposes a Flask server whose main endpoint mimics the industry standard. Whether you use a frontend, AnythingLLM, or the official Python library itself, you just need to change the connection URL to http://localhost:11435. LEMoE will act as a transparent translator.
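In practice, switching a client over should amount to changing one string. The sketch below only builds the HTTP request and does not send it. The helper `build_request` is hypothetical, and the endpoint path and payload are deliberately left to the caller, because this text does not spell out which path or body format LEMoE expects. Keep whatever your client already sends.

```python
import json
import urllib.request

# The connection URL given above. Only this value changes in the client.
LEMOE_BASE_URL = "http://localhost:11435"


def build_request(base_url, path, payload):
    """Build a JSON POST request; `path` and `payload` are whatever the
    client already uses against its current backend."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. The path and body are identical to what the client used before, and only the host and port differ.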