Prerequisites
Before setting up Jan Server, ensure you have the following components installed:
Required Components
Important: Windows and macOS users can only run mock servers for development. Real LLM model inference with vLLM is only supported on Linux systems with NVIDIA GPUs.
- Docker Desktop
  - Windows: download from Docker Desktop for Windows
  - macOS: download from Docker Desktop for Mac
  - Linux: follow the Docker Engine installation guide
- Minikube
  - Windows: `choco install minikube`, or download from minikube releases
  - macOS: `brew install minikube`, or download from minikube releases
  - Linux:

    ```bash
    curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && sudo install minikube-linux-amd64 /usr/local/bin/minikube
    ```
- Helm
  - Windows: `choco install kubernetes-helm`, or download from Helm releases
  - macOS: `brew install helm`, or download from Helm releases
  - Linux:

    ```bash
    curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
    ```
- kubectl
  - Windows: `choco install kubernetes-cli`, or download from kubectl releases
  - macOS: `brew install kubectl`, or download from kubectl releases
  - Linux:

    ```bash
    curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" && sudo install kubectl /usr/local/bin/kubectl
    ```
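Once installed, you can confirm every tool is on your PATH with a quick version check; these are the standard version commands for each CLI:

```bash
docker --version          # Docker Engine / Docker Desktop
minikube version          # Minikube
helm version --short      # Helm 3.x
kubectl version --client  # kubectl client
```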
Optional: NVIDIA GPU Support (for Real LLM Models)
If you plan to run real LLM models (not mock servers) and have an NVIDIA GPU:
- Install NVIDIA Container Toolkit: follow the official NVIDIA Container Toolkit installation guide
- Configure Minikube for GPU support: follow the official minikube GPU tutorial for complete setup instructions
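Before moving on, it is worth confirming that containers can actually see the GPU. A minimal check (the CUDA image tag below is only an example; any recent nvidia/cuda tag works):

```bash
# Confirm the host driver is working
nvidia-smi

# Confirm Docker passes the GPU through to containers
# (example image tag; substitute any recent nvidia/cuda tag)
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```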
Quick Start
Local Development Setup
Option 1: Mock Server Setup (Recommended for Development)
- Start Minikube and configure Docker:

  ```bash
  minikube start
  eval $(minikube docker-env)
  ```

- Build and deploy all services:

  ```bash
  ./scripts/run.sh
  ```

- Access the services:
  - API Gateway: http://localhost:8080
  - Swagger UI: http://localhost:8080/api/swagger/index.html
  - Health Check: http://localhost:8080/healthcheck
  - Version Info: http://localhost:8080/v1/version
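With the services up, a quick smoke test against the endpoints listed above:

```bash
# Gateway liveness
curl http://localhost:8080/healthcheck

# Deployed API version
curl http://localhost:8080/v1/version
```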
Option 2: Real LLM Setup (Requires NVIDIA GPU)
- Start Minikube with GPU support:

  ```bash
  minikube start --gpus all
  eval $(minikube docker-env)
  ```

- Configure GPU memory utilization (if you have limited GPU memory):

  GPU memory utilization is configured in the vLLM Dockerfile; see the vLLM CLI documentation for all available arguments. To modify it, edit the vLLM launch command in apps/jan-inference-model/Dockerfile (for Docker builds) or in the Helm chart values (for Kubernetes deployment). A sketch of this kind of edit follows these steps.

- Build and deploy all services:

  ```bash
  # For GPU setup, modify run.sh to use GPU-enabled minikube:
  # edit scripts/run.sh and change "minikube start" to "minikube start --gpus all"
  ./scripts/run.sh
  ```
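As a sketch of that edit (the exact launch line in apps/jan-inference-model/Dockerfile may differ, and the model ID here is illustrative; --gpu-memory-utilization and --max-model-len are standard vLLM serve flags):

```bash
# Hypothetical vLLM launch line; match it to the actual command in the
# Dockerfile. --gpu-memory-utilization caps vLLM's share of GPU memory
# (default 0.9), and a smaller --max-model-len shrinks the KV cache.
vllm serve janhq/Jan-v1-4B \
    --gpu-memory-utilization 0.8 \
    --max-model-len 4096
```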
Production Deployment
For production deployments, modify the Helm values in charts/umbrella-chart/values.yaml and deploy using:

```bash
helm install jan-server ./charts/umbrella-chart
```
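Instead of editing values.yaml in place, overrides can also be supplied at install time with standard Helm flags; the key below is a placeholder, not necessarily one the chart defines:

```bash
# Use a dedicated values file for the environment
helm install jan-server ./charts/umbrella-chart -f production-values.yaml

# Or override individual values inline (hypothetical key; check
# charts/umbrella-chart/values.yaml for the real structure)
helm upgrade --install jan-server ./charts/umbrella-chart --set replicaCount=2
```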
Manual Installation
Build Docker Images
Build both required Docker images:
```bash
# Build API Gateway
docker build -t jan-api-gateway:latest ./apps/jan-api-gateway

# Build Inference Model
docker build -t jan-inference-model:latest ./apps/jan-inference-model
```
The inference model image downloads the Jan-v1-4B model from Hugging Face during build. This requires an internet connection and a download of several gigabytes.
Deploy with Helm
Install the Helm chart:
```bash
# Update Helm dependencies
helm dependency update ./charts/umbrella-chart

# Install Jan Server
helm install jan-server ./charts/umbrella-chart
```
Port Forwarding
Forward the API gateway port to access from your local machine:
```bash
kubectl port-forward svc/jan-server-jan-api-gateway 8080:8080
```
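The forward runs in the foreground and must stay open while you use the API. Two common variations, both standard kubectl options:

```bash
# Run the forward in the background of the current shell
kubectl port-forward svc/jan-server-jan-api-gateway 8080:8080 &

# Bind a different local port if 8080 is already taken (local:remote)
kubectl port-forward svc/jan-server-jan-api-gateway 9090:8080
```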
Verify Installation
Check that all pods are running:
```bash
kubectl get pods
```
Expected output:
```
NAME                                 READY   STATUS    RESTARTS
jan-server-jan-api-gateway-xxx       1/1     Running   0
jan-server-jan-inference-model-xxx   1/1     Running   0
jan-server-postgresql-0              1/1     Running   0
```
Test the API gateway:
```bash
curl http://localhost:8080/healthcheck
```
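For scripting, curl can report just the status code:

```bash
# Prints the HTTP status code only; expect 200 when the gateway is healthy
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/healthcheck
```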
Uninstalling
To remove Jan Server:
```bash
helm uninstall jan-server
```
To stop minikube:
```bash
minikube stop
```
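`minikube stop` preserves the cluster's state on disk; to remove the cluster entirely, including the images built inside it, delete it:

```bash
# Deletes the minikube cluster and all of its state
minikube delete
```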
Troubleshooting
Common Issues and Solutions
1. LLM Pod Not Starting (Pending Status)
Symptoms: The `jan-server-jan-inference-model` pod stays in `Pending` status.
Diagnosis Steps:
```bash
# Check pod status
kubectl get pods

# Get detailed pod information (replace with your actual pod name)
kubectl describe pod jan-server-jan-inference-model-<POD_ID>
```
Common Error Messages and Solutions:
Error: "Insufficient nvidia.com/gpu"
```
0/1 nodes are available: 1 Insufficient nvidia.com/gpu. no new claims to deallocate, preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
```
Solution for Real LLM Setup:
- Ensure you have an NVIDIA GPU and drivers installed
- Install NVIDIA Container Toolkit (see Prerequisites section)
- Start minikube with GPU support: `minikube start --gpus all`
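To confirm the GPU is actually registered with the Kubernetes node, query the node's allocatable resources (standard kubectl; `minikube` is the default node name for a single-node cluster):

```bash
# Prints the number of allocatable GPUs on the node; empty output means
# the NVIDIA device plugin has not registered any
kubectl get node minikube -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
```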
Error: vLLM Pod Keeps Restarting
```bash
# Check pod logs to see the actual error
kubectl logs jan-server-jan-inference-model-<POD_ID>
```
Common vLLM startup issues:
- CUDA Out of Memory: Modify vLLM arguments in the Dockerfile to reduce memory usage
- Model Loading Errors: Check if model path is correct and accessible
- GPU Not Detected: Ensure NVIDIA Container Toolkit is properly installed
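For the GPU-not-detected case, you can also check whether the GPU is visible from inside the running container (this assumes the image ships nvidia-smi, which CUDA-based vLLM images normally do):

```bash
# Replace <POD_ID> with the real pod suffix; if nvidia-smi fails here,
# the container runtime is not passing the GPU through
kubectl exec jan-server-jan-inference-model-<POD_ID> -- nvidia-smi
```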
2. Helm Issues
Symptoms: Helm commands fail or charts won't install.
Solutions:
```bash
# Update Helm dependencies
helm dependency update ./charts/umbrella-chart

# Check Helm status
helm list

# Uninstall and reinstall
helm uninstall jan-server
helm install jan-server ./charts/umbrella-chart
```
3. Common Development Issues
Pods in `ImagePullBackOff` state
- Ensure Docker images were built in the minikube environment
- Run `eval $(minikube docker-env)` before building images, as shown below
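In practice the fix is to rebuild in the same shell after pointing Docker at minikube, then confirm the images are visible inside the cluster:

```bash
# Point this shell's docker CLI at minikube's Docker daemon
eval $(minikube docker-env)

# Rebuild both images so they exist inside the cluster
docker build -t jan-api-gateway:latest ./apps/jan-api-gateway
docker build -t jan-inference-model:latest ./apps/jan-inference-model

# Both images should now appear in minikube's image list
docker images | grep jan
```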
Port forwarding connection refused
- Verify the service is running: `kubectl get svc`
- Check pod status: `kubectl get pods`
- Review logs: `kubectl logs deployment/jan-server-jan-api-gateway`
Inference model download fails
- Ensure internet connectivity during Docker build
- The Jan-v1-4B model is approximately 2.4GB
Resource Requirements
Minimum System Requirements:
- 8GB RAM
- 20GB free disk space
- 4 CPU cores
Recommended System Requirements:
- 16GB RAM
- 50GB free disk space
- 8 CPU cores
- GPU support (for faster inference)
The inference model requires significant memory. Ensure your minikube cluster has adequate resources allocated.
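Minikube does not necessarily claim these resources by default; they can be allocated explicitly at cluster start with standard minikube flags, sized here to the minimum requirements above:

```bash
# Allocate CPU, memory (MB), and disk to the minikube cluster
minikube start --cpus 4 --memory 8192 --disk-size 20g
```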