config

optimus_dl.modules.data.config

DataConfig dataclass

DataConfig(train_datasets: optimus_dl.modules.data.config.DataPipelineConfig = '???', eval_datasets: dict[str, optimus_dl.modules.data.config.EvalDataPipelineConfig] = <factory>, scratch: Any = None)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| train_datasets | DataPipelineConfig | Config for the training batches: dataset and transforms | '???' |
| eval_datasets | dict[str, EvalDataPipelineConfig] | Config for the evaluation batches: dataset and transforms. The key is the name of the dataset, which will be used to identify the dataset in the metrics. The value is the config for the dataset and transforms. | <class 'dict'> |
| scratch | Any | Any data whatsoever to be used in dataset configs with config interpolations like ${data.scratch.my_config} to reduce duplication | None |

Source code in optimus_dl/modules/data/config.py
@dataclass
class DataConfig:
    train_datasets: DataPipelineConfig = field(
        default=MISSING,
        metadata={
            "description": "Config for the training batches: dataset and transforms"
        },
    )
    eval_datasets: dict[str, EvalDataPipelineConfig] = field(
        default_factory=dict,
        metadata={
            "description": (
                "Config for the evaluation batches: dataset and transforms. "
                "The key is the name of the dataset, which will be used to identify the dataset in the metrics. "
                "The value is the config for the dataset and transforms."
            )
        },
    )

    scratch: Any = field(
        default=None,
        metadata={
            "description": "Any data whatsoever to be used in dataset configs with config interpolations like ${data.scratch.my_config} to reduce duplication"
        },
    )
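
Each field above carries its documentation in `metadata["description"]`, and `eval_datasets` is keyed by dataset name. The following is a minimal sketch of how that metadata and the default factory behave, using only the standard library. `DataConfigSketch`, `PipelineStub`, and the local `MISSING = "???"` are hypothetical stand-ins (the real import of `MISSING` is not shown on this page), and the descriptions are shortened for brevity.

```python
from dataclasses import dataclass, field, fields
from typing import Any

# Stand-in for the MISSING sentinel used in the source. The rendered
# signature shows it as '???'; where it is imported from is an assumption
# left out here on purpose.
MISSING = "???"


@dataclass
class PipelineStub:
    """Hypothetical placeholder for DataPipelineConfig / EvalDataPipelineConfig."""

    source: Any = MISSING


@dataclass
class DataConfigSketch:
    """Simplified mirror of DataConfig, for illustration only."""

    train_datasets: Any = field(
        default=MISSING,
        metadata={"description": "Config for the training batches"},
    )
    eval_datasets: dict[str, PipelineStub] = field(
        default_factory=dict,  # every instance gets its own empty dict
        metadata={"description": "Config for the evaluation batches, keyed by dataset name"},
    )
    scratch: Any = field(
        default=None,
        metadata={"description": "Free-form data for config interpolations"},
    )


def describe(cls) -> dict[str, str]:
    """Map each field name to its 'description' metadata entry."""
    return {f.name: f.metadata.get("description", "") for f in fields(cls)}


descriptions = describe(DataConfigSketch)

# The dict keys are the names that would identify each dataset in the metrics.
cfg = DataConfigSketch(
    eval_datasets={"val_a": PipelineStub(), "val_b": PipelineStub()}
)
eval_names = sorted(cfg.eval_datasets)
```

Because `eval_datasets` uses `default_factory=dict`, two configs created without arguments do not share the same dictionary, which is why the rendered default is a factory and not a literal value.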

DataPipelineConfig dataclass

DataPipelineConfig(source: optimus_dl.core.registry.RegistryConfig = '???', transform: optimus_dl.core.registry.RegistryConfig | None = None, profile: bool = False, report_freq: int = 100)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| source | RegistryConfig | Config for the dataset source | '???' |
| transform | RegistryConfig \| None | Config for the dataset transforms | None |
| profile | bool | Whether to profile this data pipeline | False |
| report_freq | int | Frequency of profiling report in number of iterations | 100 |

Source code in optimus_dl/modules/data/config.py
@dataclass
class DataPipelineConfig:
    source: RegistryConfig = field(
        default=MISSING,
        metadata={"description": "Config for the dataset source"},
    )
    transform: RegistryConfig | None = field(
        default=None,
        metadata={"description": "Config for the dataset transforms"},
    )
    profile: bool = field(
        default=False,
        metadata={"description": "Whether to profile this data pipeline"},
    )
    report_freq: int = field(
        default=100,
        metadata={
            "description": "Frequency of profiling report in number of iterations"
        },
    )
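
The page does not show how `profile` and `report_freq` are consumed. As a rough sketch of one plausible reading (an assumption, not the library's implementation), a wrapper could time each batch fetch and emit a report every `report_freq` iterations. `PipelineSettings` and `profiled` are hypothetical names introduced for this example only.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class PipelineSettings:
    """Hypothetical stand-in carrying only the two profiling fields."""

    profile: bool = False
    report_freq: int = 100


def profiled(batches: Iterable, settings: PipelineSettings, reports: list) -> Iterator:
    """Yield batches unchanged.

    When settings.profile is True, append (iteration, mean fetch seconds)
    to `reports` every settings.report_freq iterations.
    """
    if not settings.profile:
        yield from batches
        return

    it = iter(batches)
    elapsed = 0.0
    count = 0
    while True:
        start = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            return
        elapsed += time.perf_counter() - start
        count += 1
        if count % settings.report_freq == 0:
            reports.append((count, elapsed / settings.report_freq))
            elapsed = 0.0  # start a fresh reporting window
        yield batch


reports: list = []
settings = PipelineSettings(profile=True, report_freq=100)
seen = list(profiled(range(250), settings, reports))
report_steps = [step for step, _ in reports]
```

With 250 batches and `report_freq=100`, this sketch reports after iterations 100 and 200; the trailing 50 iterations do not fill a window and produce no report.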

EvalDataPipelineConfig dataclass

Bases: DataPipelineConfig

EvalDataPipelineConfig(source: optimus_dl.core.registry.RegistryConfig = '???', transform: optimus_dl.core.registry.RegistryConfig | None = None, profile: bool = False, report_freq: int = 100, eval_freq: int | None = None, eval_iterations: int | None = None)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| eval_freq | int \| None | Frequency of evaluation in number of training steps specifically for this dataset. If None, use the global eval_freq. | None |
| eval_iterations | int \| None | Max number of iterations of validation data for this dataset. If None, use the global eval_iterations. | None |

Source code in optimus_dl/modules/data/config.py
@dataclass
class EvalDataPipelineConfig(DataPipelineConfig):
    eval_freq: int | None = field(
        default=None,
        metadata={
            "description": "Frequency of evaluation in number of training steps specifically for this dataset. If None, use the global eval_freq."
        },
    )

    eval_iterations: int | None = field(
        default=None,
        metadata={
            "description": "Max number of iterations of validation data for this dataset. If None, use the global eval_iterations."
        },
    )
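
The "If None, use the global value" rule for `eval_freq` and `eval_iterations` can be sketched as below. This is an illustration under assumptions: `EvalSettings`, `resolve`, and `datasets_due` are hypothetical names, and where the global `eval_freq` and `eval_iterations` live is not shown on this page, so they are passed in as plain integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalSettings:
    """Hypothetical stand-in with only the two per-dataset override fields."""

    eval_freq: Optional[int] = None
    eval_iterations: Optional[int] = None


def resolve(per_dataset: Optional[int], global_value: int) -> int:
    """Return the per-dataset value, or the global one when it is None."""
    return global_value if per_dataset is None else per_dataset


def datasets_due(step: int, eval_datasets: dict, global_eval_freq: int) -> list:
    """Names of the datasets whose effective eval_freq divides `step`."""
    return [
        name
        for name, cfg in eval_datasets.items()
        if step % resolve(cfg.eval_freq, global_eval_freq) == 0
    ]


eval_datasets = {
    "val_small": EvalSettings(eval_freq=50, eval_iterations=10),
    "val_full": EvalSettings(),  # inherits both global values
}

due_at_100 = datasets_due(100, eval_datasets, global_eval_freq=200)
due_at_200 = datasets_due(200, eval_datasets, global_eval_freq=200)
full_iterations = resolve(eval_datasets["val_full"].eval_iterations, 500)
```

The check is written as `is None` instead of a truthiness test so that only an unset field falls back to the global value. The sketch assumes frequencies are positive integers.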