
Matrix Swarm

Run 16+ AI coding agents locally — in parallel, on your hardware.

Run MLX and llama.cpp simultaneously on Apple Silicon, or vLLM on Linux/CUDA — across three coordinator modes (Flat, Pipeline, Router). No cloud, no API keys, no data leaves your machine.

npm @keepdevops/matrix • v1.1
$ npm i @keepdevops/matrix
$ matrix "build a REST API for a todo app"
→ broadcasting to 4 agents…
[Architect] proposing schema + endpoints…
[Programmer] generating Express + SQLite…
[Security] auditing auth & input validation…
[DevOps] writing Dockerfile + CI…
✓ done in 11.4s — 312 LOC across 6 files
Architect Programmer Security DevOps Reviewer Tester Researcher Debugger Refactorer Documenter + 6 more

What it is

Matrix Swarm is a local-first orchestration layer for open-weight LLMs. Spin up a fleet of role-specialized agents — Architect, Programmer, Security, DevOps, and more — then broadcast one prompt to all of them, pipe it through a sequence, or let a Router model dispatch each task to the best fit.

Models run entirely on your hardware: MLX + llama.cpp on Apple Silicon, vLLM on Linux/CUDA. Nothing leaves the box. Built for engineers who want Cursor-class productivity without sending their codebase to the cloud.

Quickstart — under a minute

1. Install

npm i @keepdevops/matrix

Node.js 18+ on macOS (Apple Silicon) or Linux/CUDA.

2. Configure

matrix init --preset 16gb

Generates a swarm config with sensible agent + model defaults for your hardware.

3. Run

matrix run "build a REST API"

Broadcasts the prompt to all agents by default (Flat); use --mode pipeline or --mode router for sequential or routed runs.

How it works

Your Prompt
  → Coordinator (Flat / Pipeline / Router)
  → Agents: Architect · Programmer · Security · DevOps · Reviewer + 11 more
  → Backends: MLX · llama.cpp · vLLM
  → Output: Code · Files · Reviews

Who it's for

Solo developers

Cursor-class productivity without sending your codebase to a cloud LLM. Your laptop, your models, your repo.

ML researchers

Mix MLX, llama.cpp, and vLLM in a single run. Compare model behavior across agents on identical prompts.

Regulated teams

Financial, healthcare, defense — anywhere proprietary code can't leave the box. Air-gapped friendly.

Features & requirements

Key Features

  • 16+ specialized agents (Architect, Programmer, DevOps, Security, etc.)
  • Run MLX + llama.cpp concurrently on Apple Silicon, or vLLM on Linux/CUDA (via Docker Model Runner) — mix backends per agent (see the config sketch after this list)
  • Three coordinator modes: Flat (broadcast), Pipeline (sequential), Router (smart dispatch)
  • Real-time code extraction & editing
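
How a mixed-backend swarm is expressed depends on the schema in swarm-config-16gb.json; the excerpt below is only a sketch of the idea, with illustrative field and model names rather than the documented format:

  {
    "agents": [
      { "name": "Architect",  "backend": "mlx",       "model": "Qwen2.5-14B-Instruct-4bit" },
      { "name": "Programmer", "backend": "llama.cpp", "model": "deepseek-coder-6.7b-instruct.Q4_K_M.gguf" },
      { "name": "DevOps",     "backend": "vllm",      "model": "meta-llama/Llama-3.1-8B-Instruct" }
    ]
  }

Each agent binds to whichever backend suits its model format: MLX weights, a GGUF, or an HF repo served by vLLM.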

Requirements

  • macOS (Apple Silicon) — MLX + llama.cpp
  • Linux (NVIDIA GPU, CUDA 12+) — vLLM via Docker Model Runner
  • Node.js ≥ 18
  • Local GGUF / MLX / HF models
  • 16GB+ RAM (32GB+ recommended)

How it compares

                           Matrix Swarm              Cursor  Aider     Cline
Runs fully local           Yes                       No      Optional  Optional
Multi-agent orchestration  Yes (16+)                 No      No        No
Mix backends per agent     MLX + llama.cpp + vLLM    No      No        No
Coordinator modes          Flat · Pipeline · Router
Open source                Yes                       No      Yes       Yes

FAQ

Do I need a GPU?

No dedicated GPU is required. On Apple Silicon, MLX and llama.cpp run on the integrated GPU (Metal) and unified memory. On Linux, vLLM needs an NVIDIA GPU (CUDA 12+). CPU-only llama.cpp also works — just slower.

Where do models come from?

You bring your own GGUF (llama.cpp), MLX, or HuggingFace weights. Matrix Swarm doesn't ship models. Recommended starters: Llama 3, Qwen 2.5, DeepSeek-Coder.
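
For example, a GGUF build for llama.cpp can be fetched with the Hugging Face CLI; the repo and file names below are illustrative, so substitute whichever quantization fits your RAM:

  $ pip install -U "huggingface_hub[cli]"
  # illustrative model; any local GGUF works
  $ huggingface-cli download Qwen/Qwen2.5-Coder-7B-Instruct-GGUF \
      qwen2.5-coder-7b-instruct-q4_k_m.gguf --local-dir ./models

Then point the agent's model binding in your swarm config at the downloaded file.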

What's the difference between Flat, Pipeline, and Router?

Flat broadcasts your prompt to all agents in parallel. Pipeline chains them in a fixed sequence (e.g., Architect → Programmer → Reviewer). Router uses a small dispatcher model to pick the best agent per request.
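
In practice the mode is chosen per run. The commands below are a sketch: the prompt is made up, --mode pipeline / --mode router is the flag shown in the quickstart, and leaving it off gives the Flat default:

  $ matrix run "add OAuth login to the API"                  # Flat: every agent answers in parallel
  $ matrix run "add OAuth login to the API" --mode pipeline  # fixed sequence, e.g. Architect → Programmer → Reviewer
  $ matrix run "add OAuth login to the API" --mode router    # dispatcher model picks the best-fit agent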

Is anything sent to the cloud?

No. Inference, code extraction, and config all happen locally. There are no telemetry calls. (You can optionally point an agent at a remote OpenAI-compatible endpoint if you want — but it's off by default.)
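
If you do opt into a remote endpoint, the idea is to bind one agent to an OpenAI-compatible base URL instead of a local backend. The snippet below is purely illustrative; the field names are assumptions, not the documented schema:

  {
    "name": "Researcher",
    "backend": "openai-compatible",
    "baseUrl": "https://llm.internal.example.com/v1",
    "apiKeyEnv": "INTERNAL_LLM_API_KEY"
  }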

How do I add a custom agent?

Drop a JSON entry into your swarm-config.json with a system prompt, model binding, and role. See swarm-config-16gb.json in the repo for examples.
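
A rough shape for such an entry, with illustrative key names (the authoritative format is whatever swarm-config-16gb.json uses):

  {
    "name": "APIDesigner",
    "role": "Designs REST and GraphQL contracts before implementation",
    "systemPrompt": "You are an API design specialist. Propose endpoints, schemas, and error contracts; do not write implementation code.",
    "backend": "llama.cpp",
    "model": "models/qwen2.5-coder-7b-instruct-q4_k_m.gguf"
  }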

Is there a Docker install?

The CLI installs via npm. Docker is only needed for the Linux/CUDA backend, where vLLM model servers run via Docker Model Runner on ports 8080–8083. See docker/Dockerfile.vllm-metal.