January 31st, 2025
Jan v0.6.6: Enhanced llama.cpp integration and smarter model management

Highlights 🎉
Jan v0.6.6 delivers significant improvements to the llama.cpp backend, introduces Hugging Face as a built-in provider, and brings smarter model management with auto-unload capabilities. This release also includes numerous MCP refinements and platform-specific enhancements.
🚀 Major llama.cpp Backend Overhaul
We've completely revamped the llama.cpp integration with:
- Smart Backend Management: The backend now auto-updates and persists your settings properly
- Device Detection: Jan automatically detects available GPUs and hardware capabilities
- Direct llama.cpp Access: Models now interface directly with llama.cpp (previously hidden behind Cortex)
- Automatic Migration: Your existing models seamlessly move from Cortex to direct llama.cpp management
- Better Error Handling: Clear error messages when models fail to load, with actionable solutions
- Per-Model Overrides: Configure specific settings for individual models (see the sketch after this list)
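Jan's internal settings schema isn't spelled out here, but per-model overrides map onto standard llama.cpp load parameters such as GPU layer count and context length. Here's a minimal sketch of the idea using the llama-cpp-python bindings; the library, model names, and values are illustrative stand-ins, not Jan's actual implementation:

```python
# Sketch of per-model overrides, illustrated with llama-cpp-python.
# Jan embeds llama.cpp directly; this library and the names below
# are assumptions for illustration only.
from llama_cpp import Llama

# Hypothetical overrides: one model gets full GPU offload,
# another trades GPU layers for a larger context window.
OVERRIDES = {
    "qwen2.5-7b-instruct": {"n_gpu_layers": -1, "n_ctx": 4096},
    "llama-3.2-3b-instruct": {"n_gpu_layers": 16, "n_ctx": 8192},
}

def load_model(name: str, path: str) -> Llama:
    # Fall back to conservative CPU-friendly defaults when no override exists.
    settings = OVERRIDES.get(name, {"n_gpu_layers": 0, "n_ctx": 2048})
    return Llama(model_path=path, **settings)
```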
🤗 Hugging Face Cloud Router Integration
Connect to Hugging Face's new cloud inference service:
- Access pre-configured models running on various providers (Fireworks, Together AI, and more)
- Hugging Face handles the routing to the best available provider
- Simplified setup: just add your HF token (see the sketch after this list)
- The provider is marked non-deletable to prevent accidental removal
- Note: Direct model ID search in Hub remains available as before
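Under the hood, the router exposes an OpenAI-compatible API, so the flow is easy to picture with any OpenAI client. The endpoint below matches Hugging Face's published router URL, but treat the model ID as a hypothetical example:

```python
# Sketch: calling the Hugging Face router with an OpenAI-compatible
# client. Only an HF token is needed; HF routes to a serving provider.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",  # HF's router endpoint
    api_key=os.environ["HF_TOKEN"],               # your Hugging Face token
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # hypothetical example model
    messages=[{"role": "user", "content": "Say hello from the router."}],
)
print(response.choices[0].message.content)
```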
🧠 Smarter Model Management
New intelligent features to optimize your system resources:
- Auto-Unload Old Models: Automatically free memory by unloading models that haven't been used recently (see the sketch after this list)
- Persistent Settings: Your model capabilities and settings now persist across app restarts
- Zero GPU Layers Support: Set N-GPU Layers to 0 for CPU-only inference
- Memory Calculation Improvements: More accurate memory usage reporting
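To make the auto-unload behavior concrete, here's a rough sketch of the idea: keep a small pool of loaded models and evict the least recently used one when the pool fills up. The class, names, and one-model limit are illustrative, not Jan's internals:

```python
# Sketch of auto-unload: an LRU pool that evicts the model that has
# gone unused the longest. Purely illustrative, not Jan's code.
from collections import OrderedDict
from typing import Callable

class ModelPool:
    def __init__(self, max_loaded: int = 1):
        self.max_loaded = max_loaded
        self._loaded: OrderedDict[str, object] = OrderedDict()

    def acquire(self, name: str, loader: Callable[[str], object]) -> object:
        if name in self._loaded:
            self._loaded.move_to_end(name)  # mark as most recently used
        else:
            while len(self._loaded) >= self.max_loaded:
                _evicted, model = self._loaded.popitem(last=False)
                del model  # drop the reference so its memory can be freed
            self._loaded[name] = loader(name)
        return self._loaded[name]
```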
🎯 MCP Refinements
Enhanced MCP experience with:
- Tool approval dialog improvements with scrollable parameters
- Better handling of edge cases in experimental features
- Fixed an issue where the tool call button would disappear
- JSON editing tooltips for easier configuration (an example config follows this list)
- Auto-focus on "Always Allow" action for smoother workflows
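For context, MCP servers are configured as JSON, typically following the common `mcpServers` shape shown below; the server name, package, and environment variable here are placeholders, not a specific integration:

```json
{
  "mcpServers": {
    "example-server": {
      "command": "npx",
      "args": ["-y", "@example/mcp-server"],
      "env": { "EXAMPLE_API_KEY": "<your-key>" }
    }
  }
}
```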
📚 New MCP Integration Tutorials
Comprehensive guides for powerful MCP integrations:
- Canva MCP: Create and manage designs through natural language - generate logos, presentations, and marketing materials directly from chat
- Browserbase MCP: Control cloud browsers with AI - automate web tasks, extract data, and monitor sites without complex scripting
- Octagon Deep Research MCP: Access finance-focused research capabilities - analyze markets, investigate companies, and generate investment insights
🖥️ Platform-Specific Improvements
Windows:
- Fixed terminal windows popping up during model loading
- Better process termination handling
- The Visual C++ runtime (VCRuntime) is now bundled with the installer for compatibility
- The NSIS installer now checks whether Jan is running before installing
Linux:
- AppImage now works properly with the latest Tauri version, and its size has dropped from almost 1 GB to under 200 MB
- Better Wayland compatibility
macOS:
- Improved build process and artifact naming
🎨 UI/UX Enhancements
Quality of life improvements throughout:
- Fixed rename thread dialog showing incorrect thread names
- Assistant instructions now have proper defaults
- Download progress indicators remain visible when scrolling
- Better error pages with clearer messaging
- GPU detection now shows accurate backend information
- Improved clickable areas for better usability
🔧 Developer Experience
Behind the scenes improvements:
- New automated QA system using CUA (Computer Use Automation)
- Standardized build process across platforms
- Enhanced error stream handling and parsing
- Better proxy support for the new downloader
- Reasoning format support for advanced models (a parsing sketch follows this list)
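As a rough illustration of reasoning-format handling, the sketch below separates a `<think>...</think>` block (the convention used by DeepSeek-R1-style models) from the final answer; the tag names are an assumption, not Jan's actual stream parser:

```python
# Sketch: split a reasoning block from the final answer.
# The <think> tag convention is one common reasoning format;
# this is illustrative only.
import re

def split_reasoning(text: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

reasoning, answer = split_reasoning("<think>2 + 2 = 4</think>The answer is 4.")
print(answer)  # -> The answer is 4.
```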
🐛 Bug Fixes
Notable fixes include:
- Factory reset no longer fails with access denied errors
- OpenRouter provider stays selected properly
- Model search in Hub shows latest data only
- Temporary download files are cleaned up on cancel
- Legacy threads no longer appear above new threads
- Fixed encoding issues on various platforms
Breaking Changes
- Models previously managed by Cortex now interface directly with llama.cpp (automatic migration included)
- Some sampling parameters have been removed from the llama.cpp extension for consistency
- The Cortex extension is deprecated in favor of direct llama.cpp integration
Coming Next
We're working on expanding MCP capabilities, improving model download speeds, and adding more provider integrations. Stay tuned!
Update Jan or download the latest version.
For the complete list of changes, see the GitHub release notes.