Run 16+ AI coding agents locally — in parallel, on your hardware.
Run MLX and llama.cpp simultaneously on Apple Silicon, or vLLM on Linux/CUDA — across three coordinator modes (Flat, Pipeline, Router). No cloud, no API keys, no data leaves your machine.
```sh
npm install -g @keepdevops/matrix
```
Matrix Swarm is a local-first orchestration layer for open-weight LLMs. Spin up a fleet of role-specialized agents — Architect, Programmer, Security, DevOps, and more — then broadcast one prompt to all of them, pipe it through a sequence, or let a Router model dispatch each task to the best fit.
Models run entirely on your hardware: MLX + llama.cpp on Apple Silicon, vLLM on Linux/CUDA. Nothing leaves the box. Built for engineers who want Cursor-class productivity without sending their codebase to the cloud.
```sh
npm install -g @keepdevops/matrix
```
Node.js 18+ on macOS (Apple Silicon) or Linux/CUDA.
```sh
matrix init --preset 16gb
```
Generates a swarm config with sensible agent + model defaults for your hardware.
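For orientation, here is a trimmed sketch of the kind of config this produces. The field names and model paths below are illustrative assumptions, not the tool's documented schema; the real shape lives in swarm-config-16gb.json in the repo.

```json
{
  "coordinator": { "mode": "flat" },
  "agents": [
    {
      "name": "architect",
      "role": "Architect",
      "systemPrompt": "You design the system structure before any code is written.",
      "model": { "backend": "mlx", "path": "models/qwen2.5-7b-instruct-mlx" }
    },
    {
      "name": "programmer",
      "role": "Programmer",
      "systemPrompt": "You write and edit code following the Architect's plan.",
      "model": { "backend": "llama.cpp", "path": "models/deepseek-coder-6.7b.Q4_K_M.gguf" }
    }
  ]
}
```

Note that agents can bind different backends (MLX and llama.cpp here), which is what makes mixed-backend runs possible.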
```sh
matrix run "build a REST API"
```
Broadcasts to all agents (Flat); add `--mode pipeline` or `--mode router` to chain or dispatch instead.
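All three modes take the same prompt; only the `--mode` flag changes. A quick sketch (flag placement relative to the prompt is assumed here; check the CLI help for exact syntax):

```sh
matrix run "build a REST API"                    # Flat (default): broadcast to every agent in parallel
matrix run --mode pipeline "build a REST API"    # Pipeline: chain agents in a fixed sequence
matrix run --mode router "build a REST API"      # Router: a dispatcher model picks the best-fit agent
```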
Cursor-class productivity without sending your codebase to a cloud LLM. Your laptop, your models, your repo.
Mix MLX, llama.cpp, and vLLM in a single run. Compare model behavior across agents on identical prompts.
Financial, healthcare, defense — anywhere proprietary code can't leave the box. Air-gapped friendly.
| Feature | Matrix Swarm | Cursor | Aider | Cline |
|---|---|---|---|---|
| Runs fully local | Yes | No | Optional | Optional |
| Multi-agent orchestration | Yes (16+) | No | No | No |
| Mix backends per agent | MLX + llama.cpp + vLLM | No | No | No |
| Coordinator modes | Flat · Pipeline · Router | — | — | — |
| Open source | Yes | No | Yes | Yes |
Head-to-head against the major multi-agent frameworks. Matrix Swarm's lane: local-first coding swarms with mixable backends per agent.
| Aspect | Matrix Swarm | CrewAI | LangGraph | AutoGen (MS) | OpenDevin | MetaGPT |
|---|---|---|---|---|---|---|
| Core focus | Local coding/DevOps swarms | Role-based team workflows | Stateful graph workflows | Conversational multi-agent | Autonomous coding agent | Software company simulation |
| Local-first | Yes (air-gapped) | Optional (Ollama) | Optional | Optional | Strong (sandboxed terminal) | Optional |
| Backend support | MLX + llama.cpp + vLLM, mixable per agent | Any (incl. local) | LangChain ecosystem | Multiple + local | Ollama / local | Any |
| # of agents | 16+ pre-built | User-defined roles | Nodes in graph | Dynamic conversational | Single main + tools | Fixed dev team roles |
| Orchestration | Flat · Pipeline · Router | Sequential / hierarchical | Graph (loops, branches, state) | Message-based chats | Loop with tools | Pipeline (spec → code → test) |
| UI | Real-time React UI + code editing | CLI + basic UI | Visualization tools | AutoGen Studio | VS Code-like interface | CLI |
| Code execution | Real-time extraction & editing | Via tools | Via tools | Via tools | Full terminal sandbox | Generates artifacts |
| Ease of start | npm i -g + matrix run | Python crew kickoff | Graph definition | Conversational setup | Docker + web UI | Python setup |
| Best for | Solo devs, air-gapped teams, fast local coding | Content / research pipelines | Complex logic & production | Research, dynamic chats | Autonomous software engineering | End-to-end code generation |
| Customization | JSON config for agents/models | High (Python) | Very high | High | Moderate | Moderate |
| Hardware optimization | Excellent (Apple Silicon presets) | Good | Neutral | Neutral | Good | Neutral |
Do I need a GPU? No. On Apple Silicon, MLX and llama.cpp run on the GPU via Metal, sharing unified memory. On Linux, vLLM needs an NVIDIA GPU (CUDA 12+). CPU-only llama.cpp also works, just slower.
Which models are included? None. You bring your own GGUF (llama.cpp), MLX, or HuggingFace weights; Matrix Swarm doesn't ship models. Recommended starters: Llama 3, Qwen 2.5, DeepSeek-Coder.
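As one example of fetching a GGUF starter with the Hugging Face CLI (the repo and quant file names below are plausible picks, not Matrix Swarm defaults; check the model page for the exact file listing):

```sh
# Pull a 4-bit Qwen 2.5 Coder GGUF into a local models/ directory
# (file name is assumed; list the repo's files if it differs)
huggingface-cli download Qwen/Qwen2.5-Coder-7B-Instruct-GGUF \
  qwen2.5-coder-7b-instruct-q4_k_m.gguf --local-dir models/
```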
What's the difference between the coordinator modes? Flat broadcasts your prompt to all agents in parallel. Pipeline chains them in a fixed sequence (e.g., Architect → Programmer → Reviewer). Router uses a small dispatcher model to pick the best agent per request.
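In config terms, Pipeline amounts to an ordered list of agent names. A minimal sketch, reusing the assumed schema from the quickstart example above (the documented shape lives in swarm-config-16gb.json):

```json
{
  "coordinator": {
    "mode": "pipeline",
    "pipeline": ["architect", "programmer", "reviewer"]
  }
}
```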
Does any data leave my machine? No. Inference, code extraction, and config all happen locally, and there are no telemetry calls. (You can optionally point an agent at a remote OpenAI-compatible endpoint if you want, but it's off by default.)
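If you do opt in, binding one agent to a remote endpoint might look like the sketch below; `backend`, `endpoint`, and `apiKey` are assumed field names for illustration, not the documented schema:

```json
{
  "name": "reviewer",
  "role": "Reviewer",
  "model": {
    "backend": "openai-compatible",
    "endpoint": "https://llm.internal.example.com/v1",
    "apiKey": "${REVIEWER_API_KEY}"
  }
}
```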
How do I add a custom agent? Drop a JSON entry into your swarm-config.json with a system prompt, model binding, and role. See swarm-config-16gb.json in the repo for examples.
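A minimal sketch of such an entry, reusing the assumed field names from the earlier config sketch (the repo examples are authoritative):

```json
{
  "name": "docs-writer",
  "role": "Documentation",
  "systemPrompt": "You write and maintain developer docs for the changes the other agents produce.",
  "model": { "backend": "llama.cpp", "path": "models/llama-3-8b-instruct.Q4_K_M.gguf" }
}
```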
Do I need Docker? Only for the Linux/CUDA backend. The CLI installs via npm; Docker runs the vLLM model servers (via Docker Model Runner on ports 8080–8083). See docker/Dockerfile.vllm-metal.
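Once the containers are up, you can sanity-check a server directly; vLLM exposes the standard OpenAI-compatible routes, so listing models is enough to confirm it's serving:

```sh
# Hit the first vLLM server (ports 8080–8083 per the Docker setup above)
curl http://localhost:8080/v1/models
```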