System Overview
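Jan Server runs as a set of microservices on Kubernetes, deployed and managed with Helm charts. It has three core components: an API gateway, an inference model server, and a PostgreSQL database. The gateway calls the inference service over HTTP.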
Components
API Gateway (jan-api-gateway)
Technology Stack:
- Language: Go 1.24.6
- Framework: Gin web framework
- ORM: GORM with PostgreSQL driver
- DI: Google Wire for dependency injection
- Documentation: Swagger/OpenAPI auto-generated
Responsibilities:
- HTTP request routing and middleware
- User authentication via JWT and OAuth2
- Database operations and data persistence
- External API integration (Serper, Google OAuth)
- OpenAI-compatible API endpoints
- Request forwarding to inference service
Key Directories:
application/
├── cmd/server/   # Main entry point and DI wiring
├── app/          # Core business logic
├── config/       # Environment variables and settings
└── docs/         # Auto-generated Swagger docs
Inference Model (jan-inference-model)
Technology Stack:
- Base Image: vLLM OpenAI v0.10.0
- Model: Jan-v1-4B (downloaded from Hugging Face)
- Protocol: OpenAI-compatible HTTP API
- Features: Tool calling, reasoning parsing
Configuration:
- Model Path: /models/Jan-v1-4B
- Served Name: jan-v1-4b
- Port: 8101
- Batch Tokens: 1024 max
- Tool Parser: Hermes
- Reasoning Parser: Qwen3
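As a rough illustration of the configuration above, the following Go sketch builds the JSON body a client might send to the model service. The served name `jan-v1-4b` and port 8101 come from this document; the `/v1/chat/completions` path follows the usual OpenAI-compatible convention, and the host name `jan-inference-model` and the type names are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatMessage and chatRequest mirror the common OpenAI chat completions
// request shape. The type names are hypothetical, chosen for this sketch.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model     string        `json:"model"`
	Messages  []chatMessage `json:"messages"`
	MaxTokens int           `json:"max_tokens,omitempty"`
}

// inferenceURL is an assumption: service name and port are taken from this
// document, the path from the OpenAI-compatible API convention.
const inferenceURL = "http://jan-inference-model:8101/v1/chat/completions"

// buildChatRequest returns the JSON body for a single-turn user prompt
// addressed to the served model name "jan-v1-4b".
func buildChatRequest(prompt string) (string, error) {
	req := chatRequest{
		Model:    "jan-v1-4b",
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	}
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := buildChatRequest("Hello")
	if err != nil {
		panic(err)
	}
	fmt.Println("POST", inferenceURL)
	fmt.Println(body)
}
```

The request is only constructed here, not sent, so the sketch runs without a live model server.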
Capabilities:
- Text generation and completion
- Tool calling and function execution
- Multi-turn conversations
- Reasoning and chain-of-thought
Database (PostgreSQL)
Configuration:
- Database: jan
- User: jan-user
- Password: jan-password
- Port: 5432
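The settings above could be turned into a connection string along the following lines. This is a sketch only: it uses the keyword/value DSN format that PostgreSQL drivers for Go commonly accept, the host name `postgresql` is taken from the service name listed later in this document, and `sslmode=disable` is an assumption suited to in-cluster development rather than something the source specifies.

```go
package main

import "fmt"

// dbConfig holds the connection settings. The struct and field names are
// hypothetical, introduced only for this example.
type dbConfig struct {
	Host     string
	Port     int
	User     string
	Password string
	Name     string
}

// dsn renders the settings as a keyword/value connection string.
// sslmode=disable is an assumption for local or in-cluster development.
func (c dbConfig) dsn() string {
	return fmt.Sprintf(
		"host=%s port=%d user=%s password=%s dbname=%s sslmode=disable",
		c.Host, c.Port, c.User, c.Password, c.Name,
	)
}

func main() {
	cfg := dbConfig{
		Host:     "postgresql", // assumed from the Kubernetes service name
		Port:     5432,
		User:     "jan-user",
		Password: "jan-password",
		Name:     "jan",
	}
	fmt.Println(cfg.dsn())
}
```

In a real deployment the password would normally come from a Kubernetes Secret rather than a literal in code.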
Schema:
- User accounts and authentication
- Conversation history
- Project and organization management
- API keys and access control
Data Flow
Request Processing
- Client Request: HTTP request to API gateway on port 8080
- Authentication: JWT token validation or OAuth2 flow
- Request Routing: Gateway routes to appropriate handler
- Database Operations: GORM queries for user data/state
- Inference Call: HTTP request to model service on port 8101
- Response Assembly: Gateway combines results and returns to client
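The request path above can be sketched with the Go standard library. This is not the gateway's actual code (which, per this document, is built on Gin): it is a minimal stand-in in which the names `newGateway` and `roundTrip` are hypothetical, token validation is reduced to checking for a bearer header, and a test server plays the role of the model service.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"strings"
)

// newGateway returns a handler that rejects requests without a bearer token
// and forwards everything else to the inference service at upstream.
// Real JWT validation is stubbed out in this sketch.
func newGateway(upstream string) (http.Handler, error) {
	target, err := url.Parse(upstream)
	if err != nil {
		return nil, err
	}
	proxy := httputil.NewSingleHostReverseProxy(target)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.HasPrefix(r.Header.Get("Authorization"), "Bearer ") {
			http.Error(w, "missing bearer token", http.StatusUnauthorized)
			return
		}
		proxy.ServeHTTP(w, r)
	}), nil
}

// roundTrip starts a fake inference server with a gateway in front of it,
// sends one request through the gateway, and returns the status and body.
func roundTrip(authHeader string) (int, string) {
	model := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok from "+r.URL.Path)
	}))
	defer model.Close()

	gw, err := newGateway(model.URL)
	if err != nil {
		panic(err)
	}
	front := httptest.NewServer(gw)
	defer front.Close()

	req, err := http.NewRequest(http.MethodPost, front.URL+"/v1/chat/completions", nil)
	if err != nil {
		panic(err)
	}
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	return resp.StatusCode, string(body)
}

func main() {
	status, body := roundTrip("Bearer demo-token")
	fmt.Println(status, body)
}
```

Database access and response assembly are left out; the sketch covers only the authenticate-then-forward portion of the flow.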
Authentication Flow
JWT Authentication:
- User provides credentials
- Gateway validates against database
- JWT token issued with HMAC-SHA256 signing
- Subsequent requests include JWT in Authorization header
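To make the HMAC-SHA256 signing step concrete, here is a minimal sketch of issuing and verifying an HS256 token using only the Go standard library. The claim fields, function names, and secret are assumptions for illustration; the gateway may well use a JWT library and a different claim set.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"time"
)

// claims is a hypothetical, minimal claim set: subject and expiry (Unix seconds).
type claims struct {
	Sub string `json:"sub"`
	Exp int64  `json:"exp"`
}

// JWTs use unpadded base64url for all three segments.
var b64 = base64.RawURLEncoding

// sign returns the base64url-encoded HMAC-SHA256 of input under secret.
func sign(input string, secret []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(input))
	return b64.EncodeToString(mac.Sum(nil))
}

// issueToken builds header.payload.signature for the given claims.
func issueToken(c claims, secret []byte) (string, error) {
	payload, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	header := b64.EncodeToString([]byte(`{"alg":"HS256","typ":"JWT"}`))
	signingInput := header + "." + b64.EncodeToString(payload)
	return signingInput + "." + sign(signingInput, secret), nil
}

// verifyToken checks the signature first, then the expiry against nowUnix.
// The header is ignored: this sketch only ever verifies with HS256.
func verifyToken(token string, secret []byte, nowUnix int64) (claims, error) {
	var c claims
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return c, errors.New("malformed token")
	}
	want := sign(parts[0]+"."+parts[1], secret)
	if !hmac.Equal([]byte(want), []byte(parts[2])) {
		return c, errors.New("bad signature")
	}
	payload, err := b64.DecodeString(parts[1])
	if err != nil {
		return c, err
	}
	if err := json.Unmarshal(payload, &c); err != nil {
		return c, err
	}
	if nowUnix >= c.Exp {
		return c, errors.New("token expired")
	}
	return c, nil
}

func main() {
	secret := []byte("demo-secret") // hypothetical; load from a Kubernetes Secret in practice
	now := time.Now().Unix()
	token, err := issueToken(claims{Sub: "user-1", Exp: now + 3600}, secret)
	if err != nil {
		panic(err)
	}
	c, err := verifyToken(token, secret, now)
	fmt.Println(c.Sub, err)
}
```

The signature comparison uses `hmac.Equal`, which runs in constant time and so avoids leaking information through timing differences.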
OAuth2 Flow:
- Client redirected to Google OAuth2
- Authorization code returned to redirect URL
- Gateway exchanges code for access token
- User profile retrieved from Google
- Local JWT token issued
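The first step of the OAuth2 flow, redirecting the client to Google, amounts to building an authorization URL. The sketch below shows one way to do that with the standard library. The endpoint is Google's OAuth 2.0 authorization endpoint; the client ID, redirect path, scope list, and function name are placeholders assumed for the example, not values from Jan Server.

```go
package main

import (
	"fmt"
	"net/url"
)

// googleAuthEndpoint is Google's OAuth 2.0 authorization endpoint.
const googleAuthEndpoint = "https://accounts.google.com/o/oauth2/v2/auth"

// authCodeURL builds the URL the client is redirected to. The state value
// should be random per login attempt and checked on the callback to guard
// against cross-site request forgery.
func authCodeURL(clientID, redirectURI, state string) string {
	q := url.Values{}
	q.Set("client_id", clientID)
	q.Set("redirect_uri", redirectURI)
	q.Set("response_type", "code")         // authorization code flow
	q.Set("scope", "openid email profile") // assumed scopes for a basic profile
	q.Set("state", state)
	return googleAuthEndpoint + "?" + q.Encode()
}

func main() {
	// All three arguments are hypothetical placeholders.
	fmt.Println(authCodeURL("demo-client", "http://localhost:8080/callback", "xyz"))
}
```

After Google redirects back with a code, the gateway exchanges it for an access token server-side, so the client secret never reaches the browser.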
Deployment Architecture
Kubernetes Resources
Deployments:
- jan-api-gateway: Single replica Go application
- jan-inference-model: Single replica vLLM server
- postgresql: StatefulSet with persistent storage
Services:
- jan-api-gateway: ClusterIP exposing port 8080
- jan-inference-model: ClusterIP exposing port 8101
- postgresql: ClusterIP exposing port 5432
Configuration:
- Environment variables via Helm values
- Secrets for sensitive data (JWT keys, OAuth credentials)
- ConfigMaps for application settings
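On the application side, configuration injected this way is typically read from the process environment at startup. The following is a sketch under assumptions: the variable names `GATEWAY_PORT`, `INFERENCE_URL`, and `JWT_SECRET` are hypothetical and not taken from the Jan Server charts, and the lookup function is passed in so the loader can be exercised without touching the real environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// config holds the settings the gateway needs. Field and variable names
// are hypothetical, chosen for this example.
type config struct {
	Port         int
	InferenceURL string
	JWTSecret    string
}

// loadConfig reads settings through lookup (os.LookupEnv in production),
// applying defaults for non-sensitive values and failing fast when the
// secret is missing.
func loadConfig(lookup func(string) (string, bool)) (config, error) {
	get := func(key, fallback string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return fallback
	}

	port, err := strconv.Atoi(get("GATEWAY_PORT", "8080"))
	if err != nil {
		return config{}, fmt.Errorf("GATEWAY_PORT: %w", err)
	}
	secret := get("JWT_SECRET", "")
	if secret == "" {
		return config{}, errors.New("JWT_SECRET must be set")
	}
	return config{
		Port:         port,
		InferenceURL: get("INFERENCE_URL", "http://jan-inference-model:8101"),
		JWTSecret:    secret,
	}, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("listening on :%d, inference at %s\n", cfg.Port, cfg.InferenceURL)
}
```

Defaults are given only for non-sensitive values; the secret has no fallback, so a misconfigured deployment fails at startup instead of running with a known key.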
Helm Chart Structure
charts/
├── umbrella-chart/            # Main deployment chart
│   ├── Chart.yaml
│   ├── values.yaml            # Configuration values
│   └── Chart.lock
└── apps-charts/               # Individual service charts
    ├── jan-api-gateway/
    └── jan-inference-model/
Security Architecture
Authentication Methods
- JWT Tokens: HMAC-SHA256 signed tokens for API access
- OAuth2: Google OAuth2 integration for user login
- API Keys: HMAC-SHA256 signed keys for service access
Network Security
- Internal Communication: Services communicate over Kubernetes cluster network
- External Access: Only API gateway exposed via port forwarding or ingress
- Database Access: PostgreSQL accessible only within cluster
Data Security
- Secrets Management: Kubernetes secrets for sensitive configuration
- Environment Variables: Non-sensitive config via environment variables
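- Database Encryption: Encryption at rest depends on the underlying storage volume, since community PostgreSQL does not ship built-in transparent data encryption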
Production deployments should implement additional security measures including TLS termination, network policies, and secret rotation.
Scalability Considerations
Current Limitations:
- Single replica deployments
- No horizontal pod autoscaling
- Local storage for database
Future Enhancements:
- Multi-replica API gateway with load balancing
- Horizontal pod autoscaling based on CPU/memory
- External database with clustering
- Redis caching layer
- Message queue for async processing
Development Architecture
Code Generation
- Swagger: API documentation generated from Go annotations
- Wire: Dependency injection code generated from providers
- GORM Gen: Database model generation from schema
Build Process
- API Gateway: Multi-stage Docker build with Go compilation
- Inference Model: Base vLLM image with model download
- Helm Charts: Dependency management and templating
- Documentation: Auto-generation during development
Local Development
- Hot Reload: Source code changes reflected without full rebuild
- Database Migrations: Automated schema updates
- API Testing: Swagger UI for interactive testing
- Logging: Structured logging with configurable levels
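Structured logging with a configurable level can be sketched with Go's standard `log/slog` package. Whether the gateway uses `slog` or another logging library is not stated in this document, and the `LOG_LEVEL` variable name and helper names are assumptions made for the example.

```go
package main

import (
	"io"
	"log/slog"
	"os"
	"strings"
)

// parseLevel maps a case-insensitive level name to a slog.Level,
// falling back to Info for empty or unknown values.
func parseLevel(s string) slog.Level {
	switch strings.ToLower(s) {
	case "debug":
		return slog.LevelDebug
	case "warn":
		return slog.LevelWarn
	case "error":
		return slog.LevelError
	default:
		return slog.LevelInfo
	}
}

// newLogger returns a JSON logger that writes to w and drops records
// below the configured level.
func newLogger(w io.Writer, level string) *slog.Logger {
	opts := &slog.HandlerOptions{Level: parseLevel(level)}
	return slog.New(slog.NewJSONHandler(w, opts))
}

func main() {
	// LOG_LEVEL is a hypothetical variable name.
	logger := newLogger(os.Stdout, os.Getenv("LOG_LEVEL"))
	logger.Info("request handled", "path", "/v1/chat/completions", "status", 200)
	logger.Debug("emitted only when LOG_LEVEL=debug")
}
```

Each record is written as a single JSON object per line, which is convenient for log collectors running in the cluster.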