config
optimus_dl.modules.data.config
DataConfig
dataclass
Bases: EvalDataConfig
DataConfig(eval_datasets: dict[str, optimus_dl.modules.data.config.EvalDataPipelineConfig] =
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `train_datasets` | `DataPipelineConfig` | Config for the training batches: dataset and transforms | `'???'` |
Source code in optimus_dl/modules/data/config.py
DataPipelineConfig
dataclass
DataPipelineConfig(source: optimus_dl.core.registry.RegistryConfig = '???', transform: optimus_dl.core.registry.RegistryConfig | None = None, profile: bool = False, report_freq: int = 100)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `source` | `RegistryConfig` | Config for the dataset source | `'???'` |
| `transform` | `RegistryConfig \| None` | Config for the dataset transforms | `None` |
| `profile` | `bool` | Whether to profile this data pipeline | `False` |
| `report_freq` | `int` | Frequency of profiling report in number of iterations | `100` |
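
The `'???'` default on `source` is the same string OmegaConf uses to mark a mandatory value that has not been set yet. Assuming this library follows that convention, the sketch below mirrors the documented fields in a stand-in dataclass and lists which fields still need a value. `PipelineSketch`, `missing_fields`, and the `_name` key are hypothetical names for illustration, not part of `optimus_dl`.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional

# Same string as the documented default for `source`.
# Assumption: it marks a value the user must provide.
MISSING = "???"


@dataclass
class PipelineSketch:
    """Stand-in that mirrors the documented DataPipelineConfig fields."""

    source: Any = MISSING
    transform: Optional[Any] = None
    profile: bool = False
    report_freq: int = 100


def missing_fields(cfg: Any) -> list[str]:
    """Return the names of fields that still hold the MISSING marker."""
    names = []
    for f in fields(cfg):
        value = getattr(cfg, f.name)
        if isinstance(value, str) and value == MISSING:
            names.append(f.name)
    return names


unset = missing_fields(PipelineSketch())
filled = missing_fields(PipelineSketch(source={"_name": "toy_source"}))
```

Here `unset` names only `source`, since every other field has a usable default, and `filled` is empty once a source is supplied.
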
Source code in optimus_dl/modules/data/config.py
EvalDataConfig
dataclass
EvalDataConfig(eval_datasets: dict[str, optimus_dl.modules.data.config.EvalDataPipelineConfig] =
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `eval_datasets` | `dict[str, EvalDataPipelineConfig]` | Config for the evaluation batches: dataset and transforms. Each key is a dataset name, used to identify that dataset in the metrics; each value is the config for that dataset and its transforms. | `<class 'dict'>` |
| `scratch` | `Any` | Arbitrary data that dataset configs can reference through config interpolations such as `${data.scratch.my_config}`, to reduce duplication. | `None` |
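
To illustrate how `scratch` can reduce duplication, here is a toy sketch that resolves `${...}` references in a plain dict. The resolver is a simplified stand-in written for this example, not the interpolation engine the library actually uses, and the dataset names, the `_name` values, and the `seq_len` key are all made up.

```python
import re
from typing import Any

# Hypothetical config: two eval datasets share one value stored under scratch.
config: dict[str, Any] = {
    "data": {
        "scratch": {"seq_len": 1024},
        "eval_datasets": {
            "wiki": {
                "source": {"_name": "toy_wiki", "seq_len": "${data.scratch.seq_len}"}
            },
            "code": {
                "source": {"_name": "toy_code", "seq_len": "${data.scratch.seq_len}"}
            },
        },
    }
}

# Matches a string that consists of exactly one ${dotted.path} reference.
_REFERENCE = re.compile(r"^\$\{([\w.]+)\}$")


def lookup(root: dict[str, Any], dotted: str) -> Any:
    """Walk a dotted path such as 'data.scratch.seq_len' from the root."""
    node: Any = root
    for key in dotted.split("."):
        node = node[key]
    return node


def resolve(node: Any, root: dict[str, Any]) -> Any:
    """Return a copy of node with every whole-string reference replaced."""
    if isinstance(node, dict):
        return {key: resolve(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(value, root) for value in node]
    if isinstance(node, str):
        match = _REFERENCE.match(node)
        if match:
            return resolve(lookup(root, match.group(1)), root)
    return node


resolved = resolve(config, config)
```

After resolution both datasets carry the same `seq_len`, so changing the value under `scratch` updates every dataset that references it.
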
Source code in optimus_dl/modules/data/config.py
EvalDataPipelineConfig
dataclass
Bases: DataPipelineConfig
EvalDataPipelineConfig(source: optimus_dl.core.registry.RegistryConfig = '???', transform: optimus_dl.core.registry.RegistryConfig | None = None, profile: bool = False, report_freq: int = 100, eval_freq: int | None = None, eval_iterations: int | None = None, eval_guaranteed_same_batches: bool | None = None, eval_checkpointing: int | None = None)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `eval_freq` | `int \| None` | Evaluation frequency for this dataset, in training steps. If `None`, use the global `eval_freq`. | `None` |
| `eval_iterations` | `int \| None` | Maximum number of iterations of validation data for this dataset. If `None`, use the global `eval_iterations`. | `None` |
| `eval_guaranteed_same_batches` | `bool \| None` | Whether each data-parallel (DP) rank is guaranteed to see the same number of batches. If `None`, use the global `eval_guaranteed_same_batches`. | `None` |
| `eval_checkpointing` | `int \| None` | Frequency of saving checkpoints during evaluation. If `None` or non-positive (for example, `0`), no checkpoints are saved during evaluation. Useful for long evaluations, which can then resume after an interruption. Saves are fast and light: they contain only the state of the meters and dataloader, not the model or optimizer states. | `None` |
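
The fallback and checkpointing rules in the table can be sketched as two small helpers. Both function names are hypothetical. The modulo-based schedule in `should_save_eval_checkpoint` is an assumption about how a "frequency" might be applied, since the table does not state the exact trigger.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def resolve_override(per_dataset: Optional[T], global_value: T) -> T:
    """Use the per-dataset value unless it is None.

    The check is `is None` rather than truthiness, so an explicit
    False or 0 on the dataset still overrides the global setting.
    """
    return global_value if per_dataset is None else per_dataset


def should_save_eval_checkpoint(
    iteration: int, eval_checkpointing: Optional[int]
) -> bool:
    """Decide whether to save after a given evaluation iteration.

    None or a non-positive frequency disables saving, as documented.
    Assumption: otherwise save every `eval_checkpointing` iterations.
    """
    if eval_checkpointing is None or eval_checkpointing <= 0:
        return False
    return iteration > 0 and iteration % eval_checkpointing == 0
```

For example, `resolve_override(None, 500)` falls back to the global `500`, while `resolve_override(False, True)` keeps the dataset's explicit `False`.
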