Model Parameters

To customize model settings for a conversation or a model:

  • In Threads, click the gear icon next to the selected model
  • Or, in Settings > Model Providers > Llama.cpp, click the gear icon next to a model for advanced settings
  • Click the edit button next to a model to configure capabilities

Inference & Engine Parameters (Gear Icon)

These settings are available in the model settings modal:

  • Context Size: Maximum prompt context length (how much text the model can consider at once).
  • GPU Layers: Number of model layers to offload to the GPU. More layers generally means faster inference but uses more VRAM.
  • Temperature: Controls response randomness. Lower values give more focused output; higher values give more creative output.
  • Top K: Top-K sampling. Limits next-token selection to the K most likely tokens.
  • Top P: Top-P (nucleus) sampling. Limits next-token selection to a cumulative probability mass.
  • Min P: Minimum probability a token must have, relative to the most likely token, to be considered.
  • Repeat Last N: Number of recent tokens considered for the repeat penalty.
  • Repeat Penalty: Penalizes repeated token sequences.
  • Presence Penalty: Penalizes tokens that have already appeared, encouraging the model to introduce new topics.
  • Max Tokens: Maximum length of the model's response.
  • Stop Sequences: Tokens or phrases that end the model's response when generated.
  • Frequency Penalty: Reduces word repetition in proportion to how often a word has already appeared.
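
Because Jan can also expose local models through an OpenAI-compatible API server, the sampling parameters above can be set per request as well. Below is a minimal sketch in Python; the endpoint address, model id, and the llama.cpp-specific field names (top_k, min_p, repeat_penalty) are assumptions to verify against your own setup.

```python
import requests

# Hypothetical endpoint: enable Jan's local API server and check its
# settings panel for the actual host and port.
URL = "http://localhost:1337/v1/chat/completions"

payload = {
    "model": "llama3.2-3b-instruct",  # hypothetical model id
    "messages": [
        {"role": "user", "content": "Summarize what a context window is."}
    ],
    # Standard OpenAI-style sampling fields:
    "temperature": 0.7,        # lower = more focused, higher = more creative
    "top_p": 0.9,              # nucleus sampling cutoff
    "max_tokens": 256,         # cap on response length
    "stop": ["</answer>"],     # hypothetical stop sequence
    "presence_penalty": 0.1,   # nudge toward new topics
    "frequency_penalty": 0.1,  # reduce word repetition
    # llama.cpp-style extensions (field names are an assumption):
    "top_k": 40,
    "min_p": 0.05,
    "repeat_penalty": 1.1,
}

response = requests.post(URL, json=payload, timeout=60)
print(response.json()["choices"][0]["message"]["content"])
```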

Model Capabilities (Edit Button)

These toggles are available when you click the edit button next to a model:

  • Vision: Enable image input/output
  • Tools: Enable advanced tools (web search, file ops, code)
  • Embeddings: Enable embedding generation (see the sketch after this list)
  • Web Search: Allow model to search the web
  • Reasoning: Enable advanced reasoning features
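
As an example of why these toggles matter: with Embeddings enabled on an embedding-capable model, a client can request vectors from Jan's OpenAI-compatible endpoint. This is a sketch under the same assumptions as above (local API server enabled; endpoint address and model id are hypothetical):

```python
import requests

URL = "http://localhost:1337/v1/embeddings"  # hypothetical local endpoint

payload = {
    "model": "nomic-embed-text",  # hypothetical embedding model id
    "input": [
        "Jan runs models locally.",
        "Embeddings map text to vectors.",
    ],
}

response = requests.post(URL, json=payload, timeout=60)
vectors = [item["embedding"] for item in response.json()["data"]]
print(len(vectors), "vectors of dimension", len(vectors[0]))
```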

Model Parameters

This setting shapes the model's behavior.

  • Prompt Template: A structured format that guides how the model should respond. It contains placeholders and instructions that help shape the model's output in a consistent way.
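
For instance, many instruction-tuned models use a ChatML-style template. The sketch below is illustrative, not Jan's exact template; {system_message} and {prompt} are hypothetical placeholder names:

```python
# ChatML-style template; {system_message} and {prompt} are hypothetical
# placeholder names standing in for whatever the app injects.
TEMPLATE = (
    "<|im_start|>system\n{system_message}<|im_end|>\n"
    "<|im_start|>user\n{prompt}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Rendering the template produces the exact text the model sees.
rendered = TEMPLATE.format(
    system_message="You are a helpful assistant.",
    prompt="What is nucleus sampling?",
)
print(rendered)
```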

Engine Parameters

These parameters control how the model runs on your hardware.

  • Number of GPU Layers (ngl): Controls how many layers of the model run on your GPU. More layers on the GPU generally means faster processing but requires more GPU memory.
  • Context Length: Controls how much text the model can consider at once. A longer context lets the model handle more input but uses more memory and runs slower. The maximum context length varies with the model used.

By default, Jan uses the smaller of 8192 and the model's maximum context length; you can adjust this based on your needs.
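
For readers who load GGUF models from code rather than through Jan's UI, the same two knobs exist in most llama.cpp bindings. Below is a minimal sketch using the llama-cpp-python bindings (the model path and maximum context value are hypothetical); n_gpu_layers and n_ctx correspond to ngl and Context Length above:

```python
from llama_cpp import Llama

MODEL_MAX_CTX = 32768  # hypothetical: read from the model's GGUF metadata

# Mirror Jan's default: the smaller of 8192 and the model's maximum context.
n_ctx = min(8192, MODEL_MAX_CTX)

llm = Llama(
    model_path="models/example-8b-instruct.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=35,  # ngl: layers offloaded to the GPU (-1 offloads all)
    n_ctx=n_ctx,      # context length in tokens
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```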