Run 16+ AI coding agents locally — in parallel, on your hardware.
Run MLX and llama.cpp simultaneously on Apple Silicon, or vLLM on Linux/CUDA — across three coordinator modes (Flat, Pipeline, Router). No cloud, no API keys, no data leaves your machine.
npm i @keepdevops/matrix
Matrix Swarm is a local-first orchestration layer for open-weight LLMs. Spin up a fleet of role-specialized agents — Architect, Programmer, Security, DevOps, and more — then broadcast one prompt to all of them (Flat), pipe it through an agent sequence (Pipeline), or let a Router model dispatch each task to the best-fit agent (Router).
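The three coordinator modes can be sketched in a few lines. This is an illustrative stand-in, not the `@keepdevops/matrix` API: agents here are stubbed as plain functions, and the router's model call is replaced by a keyword check.

```typescript
// Illustrative sketch of the three coordinator modes.
// NOT the @keepdevops/matrix API — agents and routing are stubbed.
type Agent = { role: string; run: (prompt: string) => string };

const agents: Agent[] = [
  { role: "Architect", run: (p) => `[Architect] plan for: ${p}` },
  { role: "Programmer", run: (p) => `[Programmer] code for: ${p}` },
  { role: "Security", run: (p) => `[Security] audit of: ${p}` },
];

// Flat: broadcast one prompt to every agent; collect all answers.
function flat(prompt: string): string[] {
  return agents.map((a) => a.run(prompt));
}

// Pipeline: feed each agent's output into the next agent in sequence.
function pipeline(prompt: string): string {
  return agents.reduce((acc, a) => a.run(acc), prompt);
}

// Router: a dispatcher picks the best-fit agent per task.
// (Here a keyword check stands in for the Router model.)
function router(prompt: string): string {
  const pick = /vuln|exploit|audit/i.test(prompt) ? "Security" : "Programmer";
  return agents.find((a) => a.role === pick)!.run(prompt);
}
```

In the real system each `run` would call a locally served model (MLX, llama.cpp, or vLLM), and Flat mode would issue the calls in parallel rather than via a sequential `map`.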
Models run entirely on your hardware: MLX + llama.cpp on Apple Silicon, vLLM on Linux/CUDA. Nothing leaves the box. Built for engineers who want Cursor-class productivity without sending their codebase to the cloud.