MLX

MLX is an inference engine for Apple Silicon (M1 and later). It uses Metal GPU acceleration for fast, efficient local inference and requires macOS 14 or later.

⚠️ MLX support in Jan is experimental and will improve over time. Current limitations:

Embeddings are unavailable — use llama.cpp for embedding/RAG workflows.
Reasoning is not yet wired up: a model's reasoning-style output isn't surfaced separately from its answer.
Some newer model architectures fail to load — if a model won't start, try its llama.cpp (GGUF) version instead.

For the broadest model support and feature set, use llama.cpp. MLX is worth trying on Apple Silicon when you want Metal-tuned performance for a supported model.

Requirements

macOS 14 or higher
Apple Silicon (M1, M2, M3, M4)

MLX is macOS-only. On Windows and Linux the MLX provider isn't shown in Settings; use llama.cpp instead.
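The requirements above can be checked programmatically. The following is a minimal Python sketch, not part of Jan; the function names are hypothetical. It assumes a native arm64 Python build, since a Python running under Rosetta reports x86_64 even on Apple Silicon.

```python
import platform


def meets_mlx_requirements(system: str, machine: str, mac_version: str) -> bool:
    """Return True for macOS 14 or later on Apple Silicon (hypothetical helper)."""
    if system != "Darwin" or machine != "arm64":
        return False
    try:
        major = int(mac_version.split(".")[0])
    except ValueError:
        return False  # empty or unparseable version string
    return major >= 14


def this_machine_meets_mlx_requirements() -> bool:
    # platform.mac_ver()[0] is the macOS release string (for example "14.5");
    # it is an empty string on non-macOS systems.
    return meets_mlx_requirements(
        platform.system(), platform.machine(), platform.mac_ver()[0]
    )
```

Calling this_machine_meets_mlx_requirements() returns False on Windows, Linux, Intel Macs, and macOS 13 or earlier.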

Accessing Engine Settings

Find MLX settings at Settings > MLX under Model Providers:

[Screenshot: MLX Settings]

Backend and Version

MLX has its own version (the mlx-swift-lm server build) and backend selector. In most cases the default is correct; change it only if you're testing a specific build or troubleshooting a model that won't load.

Model Management

MLX supports models in MLX-Swift format. Jan ships with MLX-compatible versions of its foundation models.

Download from Hub

Browse and download MLX-compatible models directly from the Hub tab in the left sidebar. Downloaded models appear in the model list on this settings page automatically.

Import Local Files

Click Import to link an MLX model file already on your computer. This is useful for models downloaded via your browser from Hugging Face, or models shared with other apps — Jan links to the file in place without copying it.

Delete a Model

Click the trash icon next to any model to remove it. For a linked (imported) model, only the link is removed; the original file stays intact.

MLX or llama.cpp?

Both run locally on Apple Silicon. Choose based on what you need:

Feature	MLX	llama.cpp
Platform	Apple Silicon only	All platforms
Model format	MLX	GGUF
Embeddings / RAG	Not available	Supported
Reasoning output	Not wired yet	Supported
Model coverage	Good, but some new architectures fail	Broadest
Status	Experimental	Stable

If you're unsure, start with llama.cpp. Try MLX when you want Metal-tuned performance for a model that's available in MLX format.
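The comparison above can be read as a simple decision rule. The sketch below is one possible encoding in Python; suggest_engine and its parameters are hypothetical names for illustration, not a Jan API. It returns "MLX" only when nothing in the table rules it out.

```python
def suggest_engine(
    on_apple_silicon: bool,
    needs_embeddings: bool,
    needs_reasoning_output: bool,
    has_mlx_build: bool,
) -> str:
    """Suggest an engine from the comparison table (illustrative only)."""
    if not on_apple_silicon:
        return "llama.cpp"  # MLX is Apple Silicon only
    if needs_embeddings or needs_reasoning_output:
        return "llama.cpp"  # neither is available on MLX yet
    if not has_mlx_build:
        return "llama.cpp"  # the model has no MLX-format version
    return "MLX"  # worth trying for Metal-tuned performance
```

Because MLX is experimental, a result of "MLX" means it is worth trying, with llama.cpp as the fallback if the model fails to load.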

Troubleshooting

Model won't load: Some newer architectures aren't supported yet — use the model's GGUF version with llama.cpp instead.
Need embeddings or RAG: Not available on MLX; switch to llama.cpp.
MLX provider missing: You're on Windows or Linux — MLX is macOS-only.
Slow or failing on large models: The model may exceed available unified memory; try a smaller or more quantized model. See Troubleshooting.
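To judge whether a model is likely to exceed unified memory, a back-of-the-envelope estimate of weight size is parameters times bits per weight divided by 8. The Python sketch below applies that arithmetic. The function names and the 30% headroom figure are assumptions for illustration, and the estimate ignores the KV cache, activations, and memory used by macOS and other apps.

```python
def estimate_weight_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def likely_fits(
    params_billions: float,
    bits_per_weight: float,
    unified_memory_gib: float,
    headroom_fraction: float = 0.3,  # assumed share kept free; not a Jan setting
) -> bool:
    """Rough check: do the weights fit in memory with some headroom left?"""
    budget_gib = unified_memory_gib * (1 - headroom_fraction)
    return estimate_weight_gib(params_billions, bits_per_weight) <= budget_gib
```

For example, a 7-billion-parameter model at 4 bits per weight needs roughly 3.3 GiB for its weights, while the same model at 16 bits needs about 13 GiB. This is why a more quantized version can load where the full-precision one fails.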

How It Works

Jan integrates MLX via mlx-swift-lm, running a local inference server on top of it. The server is spawned and managed by Jan's MLX plugin, which handles model loading, lifecycle, and communication with the rest of the app.
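As a rough illustration of the spawn-and-manage pattern described above, the Python sketch below starts a child server process, waits until it accepts connections, and shuts it down afterwards. This is not Jan's plugin code, which this page does not show. All names here (ManagedServer, find_free_port, wait_until_listening, stand_in_command) are hypothetical, and Python's built-in http.server stands in for the real mlx-swift-lm server, whose command line is not documented here.

```python
import socket
import subprocess
import sys
import time


def find_free_port() -> int:
    """Ask the OS for an unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def wait_until_listening(port: int, timeout: float = 10.0) -> bool:
    """Poll until something accepts TCP connections on the port."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection(("127.0.0.1", port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.1)
    return False


def stand_in_command(port: int) -> list:
    # Stand-in server for the sketch; the real server binary is an unknown here.
    return [sys.executable, "-m", "http.server", str(port), "--bind", "127.0.0.1"]


class ManagedServer:
    """Spawn a server process on entry and stop it on exit."""

    def __init__(self, command: list, port: int):
        self.command = command
        self.port = port
        self.process = None

    def __enter__(self):
        self.process = subprocess.Popen(
            self.command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if not wait_until_listening(self.port):
            self.stop()
            raise RuntimeError("server did not start listening in time")
        return self

    def __exit__(self, *exc_info):
        self.stop()

    def stop(self):
        # Ask politely first, then force-kill if the process ignores it.
        if self.process is not None and self.process.poll() is None:
            self.process.terminate()
            try:
                self.process.wait(timeout=5)
            except subprocess.TimeoutExpired:
                self.process.kill()
                self.process.wait()
```

A caller would wrap work in "with ManagedServer(stand_in_command(port), port):" so the child process is always cleaned up, even if a request fails partway through.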
