Overview

Jan Server is a self-hosted AI platform that provides OpenAI-compatible APIs, multi-tenant organization management, and AI model inference. It enables organizations to deploy their own private AI infrastructure with full control over data, models, and access.

Jan Server is a Kubernetes-native platform consisting of multiple microservices that work together to provide a complete AI infrastructure solution. It offers:

  • OpenAI-Compatible API: Full compatibility with OpenAI's chat completion API (see the request example after this list)
  • Multi-Tenant Architecture: Organization and project-based access control
  • AI Model Inference: Scalable model serving with health monitoring
  • Database Management: PostgreSQL with read/write replicas
  • Authentication & Authorization: JWT + Google OAuth2 integration
  • API Key Management: Secure API key generation and management
  • Model Context Protocol (MCP): Support for external tools and resources
  • Web Search Integration: Serper API integration for web search capabilities
  • Monitoring & Profiling: Built-in performance monitoring and health checks

System Architecture

[System Architecture Diagram]

Services

Jan API Gateway

The core API service that provides OpenAI-compatible endpoints and manages all client interactions.

Key Features:

  • OpenAI-compatible chat completion API with streaming support (see the streaming sketch after this list)
  • Multi-tenant organization and project management
  • JWT-based authentication with Google OAuth2 integration
  • API key management at organization and project levels
  • Model Context Protocol (MCP) support for external tools
  • Web search integration via Serper API
  • Comprehensive monitoring and profiling capabilities
  • Database transaction management with automatic rollback
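
Streaming responses follow the Server-Sent Events convention used by OpenAI-compatible APIs: the body arrives as `data:`-prefixed JSON chunks terminated by a `data: [DONE]` sentinel. A minimal consumption sketch, with the URL, model name, and key again as placeholders:

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"strings"
)

func main() {
	body := `{"model":"jan-v1-4b","stream":true,` +
		`"messages":[{"role":"user","content":"Hello"}]}`
	req, err := http.NewRequest("POST",
		"https://jan.example.com/v1/chat/completions", strings.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer sk-...your-api-key...")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue // skip blank separator lines between events
		}
		payload := strings.TrimPrefix(line, "data: ")
		if payload == "[DONE]" {
			break // end-of-stream sentinel
		}
		fmt.Println(payload) // each payload is one chat.completion.chunk object
	}
}
```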

Technology Stack:

  • Go 1.24.6 with Gin web framework
  • PostgreSQL with GORM and read/write replicas
  • JWT authentication and Google OAuth2
  • Swagger/OpenAPI documentation
  • Built-in pprof profiling with Grafana Pyroscope integration
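
The pprof endpoints follow Go's standard net/http/pprof pattern. As a hedged sketch (this is the conventional wiring, not necessarily how the gateway registers its handlers):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// Profiles can then be pulled with, for example:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```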

Jan Inference Model

The model serving service that executes AI inference requests.

Key Features:

  • Scalable model serving infrastructure
  • Health monitoring and automatic failover (sketched after this list)
  • Load balancing across multiple model instances
  • Integration with various AI model backends
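
To make the health-check and failover pattern concrete, here is an illustrative Go sketch of a round-robin pool that skips backends failing a periodic probe. The endpoint URLs and the /health path are assumptions, and this is not Jan Server's actual implementation:

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
	"time"
)

// healthyPool round-robins over endpoints that pass a periodic health probe.
type healthyPool struct {
	endpoints []string
	alive     []atomic.Bool
	next      atomic.Uint64
}

// probe marks each endpoint alive or dead based on an HTTP health check.
func (p *healthyPool) probe(interval time.Duration) {
	for {
		for i, ep := range p.endpoints {
			resp, err := http.Get(ep + "/health") // assumed health path
			p.alive[i].Store(err == nil && resp.StatusCode == http.StatusOK)
			if err == nil {
				resp.Body.Close()
			}
		}
		time.Sleep(interval)
	}
}

// pick returns the next healthy endpoint, or false if none are available,
// in which case the caller should retry or fail over.
func (p *healthyPool) pick() (string, bool) {
	for range p.endpoints {
		i := int(p.next.Add(1)) % len(p.endpoints)
		if p.alive[i].Load() {
			return p.endpoints[i], true
		}
	}
	return "", false
}

func main() {
	p := &healthyPool{
		endpoints: []string{"http://model-a:8000", "http://model-b:8000"},
		alive:     make([]atomic.Bool, 2),
	}
	go p.probe(10 * time.Second)
	if ep, ok := p.pick(); ok {
		fmt.Println("route inference request to", ep)
	}
}
```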

Technology Stack:

  • Python-based model serving
  • Docker containerization
  • Kubernetes-native deployment

PostgreSQL Database

The persistent data storage layer with enterprise-grade features.

Key Features:

  • Read/write replica support for high availability
  • Automatic schema migrations with Atlas
  • Connection pooling and optimization
  • Transaction management with rollback support
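
A sketch of how this is commonly configured with GORM's dbresolver plugin, which routes reads to replicas and writes to the primary. The DSNs and pool sizes below are placeholders, not Jan Server's actual configuration:

```go
package main

import (
	"time"

	"gorm.io/driver/postgres"
	"gorm.io/gorm"
	"gorm.io/plugin/dbresolver"
)

func main() {
	// Placeholder DSNs; point these at your primary and read replica.
	writerDSN := "host=pg-primary user=jan dbname=jan sslmode=disable"
	readerDSN := "host=pg-replica user=jan dbname=jan sslmode=disable"

	db, err := gorm.Open(postgres.Open(writerDSN), &gorm.Config{})
	if err != nil {
		panic(err)
	}

	// Route reads to the replica; writes stay on the primary.
	if err := db.Use(dbresolver.Register(dbresolver.Config{
		Replicas: []gorm.Dialector{postgres.Open(readerDSN)},
		Policy:   dbresolver.RandomPolicy{},
	})); err != nil {
		panic(err)
	}

	// Connection pooling on the underlying *sql.DB.
	sqlDB, _ := db.DB()
	sqlDB.SetMaxOpenConns(50)
	sqlDB.SetMaxIdleConns(10)
	sqlDB.SetConnMaxLifetime(time.Hour)

	// GORM transactions roll back automatically if the callback errors.
	_ = db.Transaction(func(tx *gorm.DB) error {
		return tx.Exec("SELECT 1").Error
	})
}
```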

Key Features

Core Features

  • OpenAI-Compatible API: Full compatibility with OpenAI's chat completion API with streaming support and reasoning content handling
  • Multi-Tenant Architecture: Organization and project-based access control with hierarchical permissions and member management
  • Conversation Management: Persistent conversation storage and retrieval with item-level management, including message, function call, and reasoning content types
  • Authentication & Authorization: JWT-based auth with Google OAuth2 integration and role-based access control
  • API Key Management: Secure API key generation and management at organization and project levels with multiple key types (admin, project, organization, service, ephemeral)
  • Model Registry: Dynamic model endpoint management with automatic health checking and service discovery
  • Streaming Support: Real-time streaming responses with Server-Sent Events (SSE) and chunked transfer encoding
  • MCP Integration: Model Context Protocol support for external tools and resources over JSON-RPC 2.0 (see the sketch at the end of this section)
  • Web Search: Serper API integration, exposed through MCP, for web search and webpage fetching
  • Database Management: PostgreSQL with read/write replicas and automatic migrations using Atlas
  • Transaction Management: Automatic database transaction handling with rollback support
  • Health Monitoring: Automated health checks with cron-based model endpoint monitoring
  • Performance Profiling: Built-in pprof endpoints for performance monitoring and Grafana Pyroscope integration
  • Request Logging: Structured request/response logging with unique request IDs
  • CORS Support: Cross-origin resource sharing middleware with configurable allowed hosts
  • Swagger Documentation: Auto-generated API documentation with interactive UI
  • Email Integration: SMTP support for invitation and notification systems
  • Response Management: Comprehensive response tracking with status management and usage statistics
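
To make the MCP integration concrete, the sketch below builds a JSON-RPC 2.0 tools/call request of the kind the Model Context Protocol defines. The /mcp endpoint path and the web_search tool name are illustrative assumptions rather than documented values:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// jsonrpcRequest mirrors the JSON-RPC 2.0 envelope that MCP messages use.
type jsonrpcRequest struct {
	JSONRPC string      `json:"jsonrpc"`
	ID      int         `json:"id"`
	Method  string      `json:"method"`
	Params  interface{} `json:"params"`
}

func main() {
	req := jsonrpcRequest{
		JSONRPC: "2.0",
		ID:      1,
		Method:  "tools/call", // MCP method for invoking a tool
		Params: map[string]interface{}{
			"name":      "web_search", // hypothetical tool name
			"arguments": map[string]string{"query": "Jan Server docs"},
		},
	}

	body, _ := json.Marshal(req)
	resp, err := http.Post("https://jan.example.com/mcp", // assumed endpoint
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```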