Run 16+ AI coding agents locally — in parallel, on your hardware.
Run MLX and llama.cpp simultaneously on Apple Silicon, or vLLM on Linux/CUDA — across three coordinator modes (Flat, Pipeline, Router). No cloud, no API keys, no data leaves your machine.
npm i @keepdevops/matrix
Matrix Swarm is a local-first orchestration layer for open-weight LLMs. Spin up a fleet of role-specialized agents — Architect, Programmer, Security, DevOps, and more — then broadcast one prompt to all of them, pipe it through a sequence, or let a Router model dispatch each task to the best fit.
Models run entirely on your hardware: MLX + llama.cpp on Apple Silicon, vLLM on Linux/CUDA. Nothing leaves the box. Built for engineers who want Cursor-class productivity without sending their codebase to the cloud.
npm i @keepdevops/matrix
Node.js 18+ on macOS (Apple Silicon) or Linux/CUDA.
matrix init --preset 16gb
Generates a swarm config with sensible agent + model defaults for your hardware.
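Roughly what the generated file might contain (the field names and model paths below are illustrative assumptions, not the actual schema; the generated swarm-config.json is the source of truth):

```jsonc
// Illustrative sketch only: field names and model paths are assumptions,
// not the generated schema. Inspect your generated swarm-config.json.
{
  "coordinator": { "mode": "flat" },
  "agents": [
    {
      "role": "Architect",
      "backend": "mlx",
      "model": "mlx-community/Qwen2.5-Coder-7B-Instruct-4bit",
      "system_prompt": "You design the high-level structure before any code is written."
    },
    {
      "role": "Programmer",
      "backend": "llama.cpp",
      "model": "./models/deepseek-coder-6.7b-instruct.Q4_K_M.gguf",
      "system_prompt": "You implement the plan as small, reviewable changes."
    }
  ]
}
```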
matrix run "build a REST API"
Broadcasts to all agents in parallel (Flat) by default; pass --mode pipeline or --mode router to use the other coordinator modes.
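The same prompt under each mode:

```sh
# Flat (default): broadcast the prompt to every agent in parallel
matrix run "build a REST API"

# Pipeline: chain agents in a fixed sequence (e.g. Architect -> Programmer -> Reviewer)
matrix run "build a REST API" --mode pipeline

# Router: a small dispatcher model picks the best-fit agent for the request
matrix run "build a REST API" --mode router
```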
Cursor-class productivity without sending your codebase to a cloud LLM. Your laptop, your models, your repo.
Mix MLX, llama.cpp, and vLLM in a single run. Compare model behavior across agents on identical prompts.
Financial, healthcare, defense — anywhere proprietary code can't leave the box. Air-gapped friendly.
| Feature | Matrix Swarm | Cursor | Aider | Cline |
|---|---|---|---|---|
| Runs fully local | Yes | No | Optional | Optional |
| Multi-agent orchestration | Yes (16+) | No | No | No |
| Mix backends per agent | MLX + llama.cpp + vLLM | No | No | No |
| Coordinator modes | Flat · Pipeline · Router | — | — | — |
| Open source | Yes | No | Yes | Yes |
No. On Apple Silicon, MLX and llama.cpp run on the integrated GPU (Metal) out of unified memory. On Linux, vLLM needs an NVIDIA GPU (CUDA 12+). CPU-only llama.cpp also works, just slower.
You bring your own GGUF (llama.cpp), MLX, or HuggingFace weights. Matrix Swarm doesn't ship models. Recommended starters: Llama 3, Qwen 2.5, DeepSeek-Coder.
Flat broadcasts your prompt to all agents in parallel. Pipeline chains them in a fixed sequence (e.g., Architect → Programmer → Reviewer). Router uses a small dispatcher model to pick the best agent per request.
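A minimal sketch of how a Pipeline order might be expressed in swarm-config.json, assuming a coordinator block with a pipeline key (the key names here are assumptions; check the repo's example configs for the real schema):

```jsonc
// Hypothetical sketch: the "pipeline" key and its shape are assumptions.
{
  "coordinator": {
    "mode": "pipeline",
    "pipeline": ["Architect", "Programmer", "Reviewer"]
  }
}
```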
No. Inference, code extraction, and config all happen locally. There are no telemetry calls. (You can optionally point an agent at a remote OpenAI-compatible endpoint if you want — but it's off by default.)
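If you do opt an agent into a remote endpoint, it would be a per-agent setting in swarm-config.json. A hypothetical sketch, with field names that are assumptions rather than the documented schema:

```jsonc
// Hypothetical sketch: "endpoint" and "api_key_env" are assumed field names,
// shown only to illustrate the opt-in remote option described above.
{
  "role": "Reviewer",
  "backend": "openai-compatible",
  "endpoint": "https://api.example.com/v1",
  "api_key_env": "REVIEWER_API_KEY",
  "model": "your-remote-model"
}
```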
Drop a JSON entry into your swarm-config.json with a system prompt, model binding, and role. See swarm-config-16gb.json in the repo for examples.
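A hedged sketch of what such an entry could look like (field names are assumptions; swarm-config-16gb.json is the authoritative reference):

```jsonc
// Sketch of a custom agent entry: role, model binding, and system prompt.
// Exact field names are assumptions; follow the repo's example config.
{
  "role": "Reviewer",
  "backend": "llama.cpp",
  "model": "./models/qwen2.5-coder-7b-instruct-q4_k_m.gguf",
  "system_prompt": "You review diffs for correctness, security, and style before merge."
}
```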
The CLI installs via npm. Docker is only needed for the Linux/CUDA backend, where it runs the vLLM model servers (via Docker Model Runner on ports 8080–8083). See docker/Dockerfile.vllm-metal.
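For reference, a generic way to serve one vLLM model as an OpenAI-compatible endpoint on port 8080. This is a sketch using the upstream vllm/vllm-openai image, not necessarily this repo's exact packaging or Docker Model Runner invocation:

```sh
# Generic sketch, not this repo's exact setup: expose vLLM's OpenAI-compatible
# server (container port 8000) on host port 8080, matching the 8080-8083 range above.
docker run --gpus all -p 8080:8000 \
  vllm/vllm-openai:latest \
  --model Qwen/Qwen2.5-Coder-7B-Instruct
```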