config
optimus_dl.recipe.train.config
Training recipe configuration.
This module defines the configuration classes for the training recipe, including all hyperparameters, component configurations, and training settings.
TrainConfig
dataclass
Bases: RegistryConfigStrict
Complete training configuration.
This is the root configuration class for training. It contains all component configurations (model, data, optimizer, etc.) and uses the registry system for flexible component selection.
The configuration is hierarchical and supports OmegaConf interpolation for
sharing values across components. The args field serves as a "scratch space"
for high-level variables that can be referenced throughout the config.
Example
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `args` | `dict` | Scratch space for high-level variables that can be referenced throughout the config via interpolation. | `<class 'dict'>` |
| `common` | `TrainRecipeConfig` | Configuration for training recipe common settings. This class contains all the common settings shared across training runs, including experiment metadata, logging frequency, checkpointing, evaluation, and distributed training settings. | `<dynamic>` |
| `model` | `ModelConfig` | | `'???'` |
| `data` | `DataConfig` | | `'???'` |
| `criterion` | `CriterionConfig` | | `'???'` |
| `optimization` | `OptimizationConfig` | | `'???'` |
| `lr_scheduler` | `RegistryConfig \| None` | | `None` |
| `loggers` | `list[MetricsLoggerConfig] \| None` | List of metrics logger configurations. | `None` |
| `metrics` | `dict[str, list[dict]]` | Metric configurations mapped by dataset name (e.g., 'train', 'val_slice_1'). | `<class 'dict'>` |
| `model_transforms` | `list[ModelTransformConfig]` | List of model transforms to apply after model building. | `<dynamic>` |
| `model_builder` | `RegistryConfig` | | `ModelBuilderConfig(_name='base')` |
| `optimizer_builder` | `RegistryConfig` | | `OptimizerBuilderConfig(_name='base')` |
| `criterion_builder` | `RegistryConfig` | | `CriterionBuilderConfig(_name='base')` |
| `data_builder` | `RegistryConfig` | | `DataBuilderConfig(_name='base')` |
| `scheduler_builder` | `RegistryConfig` | | `SchedulerBuilderConfig(_name='base')` |
| `logger_manager` | `RegistryConfig` | | `LoggerManagerConfig(_name='base')` |
| `checkpoint_manager` | `RegistryConfig` | | `CheckpointManagerConfig(_name='base')` |
| `evaluator` | `RegistryConfig` | | `EvaluatorConfig(_name='base')` |
Source code in optimus_dl/recipe/train/config.py
TrainRecipeConfig
dataclass
Configuration for training recipe common settings.
This class contains all the common settings shared across training runs, including experiment metadata, logging frequency, checkpointing, evaluation, and distributed training settings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `exp_name` | `str` | Experiment name. | `'optimus-dl-run-${config_hash:}'` |
| `exp_description` | `str \| None` | Experiment description. | `None` |
| `exp_tags` | `list[str]` | Experiment tags. | `<dynamic>` |
| `log_freq` | `int` | Frequency of training metrics logging. | `16` |
| `seed` | `int` | Random seed applied to everything that can be seeded. | `42` |
| `data_seed` | `int` | Seed for everything data-related. Will be different on each rank. | `42` |
| `deterministic` | `bool` | If True, force deterministic algorithms in PyTorch. | `True` |
| `eval_iterations` | `int \| None` | Maximum number of validation iterations for each subset. | `None` |
| `eval_freq` | `int` | Evaluation frequency. Zero disables evaluation. | `100` |
| `save_freq` | `int` | Checkpoint saving frequency. Defaults to `eval_freq`. | `'${.eval_freq}'` |
| `last_save_freq` | `int \| None` | Frequency of saving the last checkpoint. Defaults to `save_freq`. | `None` |
| `output_path` | `str` | Directory to write checkpoints to. | `"${oc.env:PERSISTENT_PATH,'./outputs'}/${.exp_name}"` |
| `load_checkpoint` | `str \| None` | Path to the checkpoint to load from. What is loaded from it is controlled by `load_checkpoint_strategy`. | `None` |
| `load_checkpoint_strategy` | `LoadStrategy` | Strategy that determines what to load from the checkpoint. | `<dynamic>` |
| `use_gpu` | `bool` | | `True` |
| `distributed` | `DistributedConfig` | Distributed training configuration (GPU, TP, etc.). | `<dynamic>` |
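The frequency settings above imply two rules: a frequency of zero disables the action, and `last_save_freq` falls back to `save_freq` when left as `None`. The sketch below shows one way such rules could be implemented. It is an assumption for illustration, not optimus_dl's trainer code: the helpers `is_due` and `resolve_last_save_freq` are hypothetical, and treating the frequency as a count of training iterations checked with a modulo is also an assumption.

```python
from typing import Optional


def is_due(step: int, freq: int) -> bool:
    """Return True when a periodic action should run at `step`.

    Assumes steps are counted from 1 and that freq <= 0 disables the action.
    """
    return freq > 0 and step % freq == 0


def resolve_last_save_freq(save_freq: int, last_save_freq: Optional[int]) -> int:
    """Fall back to save_freq when last_save_freq is not set."""
    return save_freq if last_save_freq is None else last_save_freq


# With the documented defaults: eval_freq=100, save_freq follows eval_freq.
eval_freq = 100
save_freq = eval_freq
last_save_freq = resolve_last_save_freq(save_freq, None)

eval_steps = [step for step in range(1, 301) if is_due(step, eval_freq)]
disabled_steps = [step for step in range(1, 301) if is_due(step, 0)]
```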