config
optimus_dl.modules.data.config
DataConfig
dataclass
DataConfig(train_datasets: optimus_dl.modules.data.config.DataPipelineConfig = '???', eval_datasets: dict[str, optimus_dl.modules.data.config.EvalDataPipelineConfig] =
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `train_datasets` | `DataPipelineConfig` | Config for the training batches: dataset and transforms. | `'???'` |
| `eval_datasets` | `dict[str, EvalDataPipelineConfig]` | Config for the evaluation batches: dataset and transforms. The key is the name of the dataset, which will be used to identify the dataset in the metrics. The value is the config for the dataset and transforms. | `<class 'dict'>` |
| `scratch` | `Any` | Any data whatsoever to be used in dataset configs with config interpolations like `${data.scratch.my_config}` to reduce duplication. | `None` |
Source code in optimus_dl/modules/data/config.py
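
The sketch below illustrates the shape described in the parameter table: one training pipeline, a dictionary of evaluation pipelines keyed by dataset name, and the `'???'` default. It uses stand-in dataclasses (`PipelineSketch`, `DataConfigSketch`) rather than the real `optimus_dl` classes, and it assumes `'???'` marks a value that must be supplied before use. The `_name` key and the source names are invented for illustration, as is the `unset_required` helper.

```python
from dataclasses import dataclass, field
from typing import Any

MISSING = "???"  # same marker as the '???' defaults in the signatures


@dataclass
class PipelineSketch:
    """Stand-in for a pipeline config (not the real optimus_dl class)."""
    source: Any = MISSING
    transform: Any = None


@dataclass
class DataConfigSketch:
    """Stand-in mirroring the three documented DataConfig fields."""
    train_datasets: Any = MISSING
    eval_datasets: dict[str, PipelineSketch] = field(default_factory=dict)
    scratch: Any = None


def unset_required(cfg: DataConfigSketch) -> list[str]:
    """Return dotted paths of fields that still hold the '???' marker."""
    missing = []
    if cfg.train_datasets == MISSING:
        missing.append("train_datasets")
    elif cfg.train_datasets.source == MISSING:
        missing.append("train_datasets.source")
    for name, pipeline in cfg.eval_datasets.items():
        if pipeline.source == MISSING:
            missing.append(f"eval_datasets.{name}.source")
    return missing


# Hypothetical source configs; the "_name" key and its values are made up.
cfg = DataConfigSketch(
    train_datasets=PipelineSketch(source={"_name": "my_train_source"}),
    eval_datasets={
        "validation": PipelineSketch(source={"_name": "my_eval_source"}),
        "held_out": PipelineSketch(),  # source left unset on purpose
    },
)

eval_names = list(cfg.eval_datasets)  # keys double as labels in the metrics
problems = unset_required(cfg)        # paths that still need a value
```

Because the dictionary keys identify each dataset in the metrics, short and stable names such as `validation` are easier to track across runs than names that encode paths or dates.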
DataPipelineConfig
dataclass
DataPipelineConfig(source: optimus_dl.core.registry.RegistryConfig = '???', transform: optimus_dl.core.registry.RegistryConfig | None = None, profile: bool = False, report_freq: int = 100)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `source` | `RegistryConfig` | Config for the dataset source. | `'???'` |
| `transform` | `RegistryConfig \| None` | Config for the dataset transforms. | `None` |
| `profile` | `bool` | Whether to profile this data pipeline. | `False` |
| `report_freq` | `int` | Frequency of profiling report in number of iterations. | `100` |
Source code in optimus_dl/modules/data/config.py
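
To make the interaction between `profile` and `report_freq` concrete, here is a minimal sketch of how a pipeline wrapper might time batch fetches and emit a report every `report_freq` iterations. This is an assumption about the behavior, not the library's implementation: `ProfileSettings` and `profiled` are hypothetical names, and the report format (iteration count plus mean fetch time) is invented.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class ProfileSettings:
    """Stand-in holding only the two profiling fields documented above."""
    profile: bool = False
    report_freq: int = 100


def profiled(batches: Iterable, settings: ProfileSettings, reports: list) -> Iterator:
    """Yield batches unchanged; record the mean fetch time every report_freq iterations."""
    it = iter(batches)
    elapsed, count = 0.0, 0
    while True:
        start = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            return
        elapsed += time.perf_counter() - start
        count += 1
        if settings.profile and count % settings.report_freq == 0:
            # One report per window: (iteration number, mean seconds per fetch).
            reports.append((count, elapsed / settings.report_freq))
            elapsed = 0.0
        yield batch


reports: list = []
settings = ProfileSettings(profile=True, report_freq=4)
out = list(profiled(range(10), settings, reports))
report_steps = [step for step, _ in reports]  # iterations at which a report fired
```

With `profile=False` the wrapper passes batches through and never appends a report, which matches the documented default of profiling being off.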
EvalDataPipelineConfig
dataclass
Bases: DataPipelineConfig
EvalDataPipelineConfig(source: optimus_dl.core.registry.RegistryConfig = '???', transform: optimus_dl.core.registry.RegistryConfig | None = None, profile: bool = False, report_freq: int = 100, eval_freq: int | None = None, eval_iterations: int | None = None)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `eval_freq` | `int \| None` | Frequency of evaluation in number of training steps specifically for this dataset. If None, use the global eval_freq. | `None` |
| `eval_iterations` | `int \| None` | Max number of iterations of validation data for this dataset. If None, use the global eval_iterations. | `None` |
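
The fallback rule in the table (a per-dataset value when set, the global value when `None`) can be sketched as follows. `EvalOverrides` and `effective_settings` are hypothetical names used only for this illustration, and the global values are passed in as plain integers because this page does not show where the global `eval_freq` and `eval_iterations` are defined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalOverrides:
    """Stand-in holding only the two fields EvalDataPipelineConfig adds."""
    eval_freq: Optional[int] = None
    eval_iterations: Optional[int] = None


def effective_settings(
    ds: EvalOverrides, global_eval_freq: int, global_eval_iterations: int
) -> tuple[int, int]:
    """Return (eval_freq, eval_iterations), falling back to the globals on None."""
    # Compare against None explicitly so an explicit 0 is not mistaken for "unset".
    freq = ds.eval_freq if ds.eval_freq is not None else global_eval_freq
    iters = (
        ds.eval_iterations
        if ds.eval_iterations is not None
        else global_eval_iterations
    )
    return freq, iters


# Hypothetical dataset names and values.
datasets = {
    "validation": EvalOverrides(),  # inherits both globals
    "long_context": EvalOverrides(eval_freq=500, eval_iterations=20),
    "quick_check": EvalOverrides(eval_iterations=5),  # overrides one field only
}

resolved = {
    name: effective_settings(ds, global_eval_freq=100, global_eval_iterations=50)
    for name, ds in datasets.items()
}
```

Resolving each field independently lets a dataset override only the setting it cares about, for example a costly dataset that is evaluated at the global frequency but for fewer iterations.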