Optimus-DL

alexdremov/optimus-dl

config

`optimus_dl.modules.distributed.config` ¶

`DistributedConfig` `dataclass` ¶

Bases: RegistryConfigStrict

Configuration for distributed training topologies.

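Parameters:

Name	Type	Description	Default
`tp_size`	`int`	Degree of tensor parallelism (number of GPUs each layer is sharded across).	`1`
`sharding_world_size`	`int \| None`	Size of the FSDP sharding groups. If `None`, defaults to the number of GPUs per node (intra-node sharding).	`None`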

Source code in optimus_dl/modules/distributed/config.py

@dataclass
class DistributedConfig(RegistryConfigStrict):
    """Configuration for distributed training topologies.

    Attributes:
        tp_size: Degree of Tensor Parallelism (number of GPUs to shard each layer across).
        sharding_world_size: Size of FSDP sharding groups. If None, defaults to
            the number of GPUs per node (intra-node sharding).
    """

    tp_size: int = 1
    sharding_world_size: int | None = None
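
The documented default for `sharding_world_size` can be illustrated with a small standalone sketch. It does not import `optimus_dl`: `DistributedConfigSketch` is a stand-in that only mirrors the two documented fields, and `resolve_sharding_size` and `check_topology` are hypothetical helpers, not part of the library. The divisibility checks are an assumption about what a sensible topology looks like, not a description of how Optimus-DL validates its configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DistributedConfigSketch:
    """Stand-in mirroring the two documented fields (not the library class)."""

    tp_size: int = 1
    sharding_world_size: Optional[int] = None


def resolve_sharding_size(cfg: DistributedConfigSketch, gpus_per_node: int) -> int:
    """Hypothetical helper: apply the documented default.

    If sharding_world_size is None, fall back to the number of GPUs
    per node (intra-node sharding), as the docstring describes.
    """
    if cfg.sharding_world_size is None:
        return gpus_per_node
    return cfg.sharding_world_size


def check_topology(
    cfg: DistributedConfigSketch, world_size: int, gpus_per_node: int
) -> int:
    """Hypothetical sanity check; the divisibility rules are an assumption.

    Returns the resolved sharding group size when the check passes.
    """
    shard = resolve_sharding_size(cfg, gpus_per_node)
    if cfg.tp_size < 1 or shard < 1:
        raise ValueError("tp_size and sharding_world_size must be positive")
    if world_size % cfg.tp_size != 0:
        raise ValueError("world_size must be divisible by tp_size")
    if world_size % shard != 0:
        raise ValueError("world_size must be divisible by the sharding group size")
    return shard


# Two nodes with 8 GPUs each, default sharding: groups of 8 (one per node).
default_cfg = DistributedConfigSketch()
default_shard = check_topology(default_cfg, world_size=16, gpus_per_node=8)

# Explicit sharding across all 16 GPUs, with 2-way tensor parallelism.
explicit_cfg = DistributedConfigSketch(tp_size=2, sharding_world_size=16)
explicit_shard = check_topology(explicit_cfg, world_size=16, gpus_per_node=8)
```

With the default configuration the sharding group stays inside a node; setting `sharding_world_size` explicitly lets a group span nodes. How `tp_size` and the sharding group size combine into a device mesh is not stated on this page, so the sketch deliberately stops at resolving and checking the two values.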