config

`optimus_dl.modules.data.config` ¶

`DataConfig` `dataclass` ¶

Bases: EvalDataConfig

DataConfig(eval_datasets: dict[str, optimus_dl.modules.data.config.EvalDataPipelineConfig] = <factory>, scratch: Any = None, train_datasets: optimus_dl.modules.data.config.DataPipelineConfig = '???')

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `train_datasets` | `DataPipelineConfig` | Config for the training batches: dataset and transforms | `'???'` |

Source code in optimus_dl/modules/data/config.py

@dataclass
class DataConfig(EvalDataConfig):
    train_datasets: DataPipelineConfig = field(
        default=MISSING,
        metadata={
            "description": "Config for the training batches: dataset and transforms"
        },
    )
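
The `'???'` default shown above marks a field that must be supplied before the config can be used. As a rough illustration of that idea, the sketch below uses hypothetical stand-in dataclasses (`PipelineStub`, `EvalStub`, `DataStub`) and a hypothetical helper `missing_fields`; none of these names come from `optimus_dl`, and the library's own validation may work differently.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any

# Stand-in for the sentinel that the signatures render as '???' (assumption).
MISSING = "???"


@dataclass
class PipelineStub:
    """Hypothetical stand-in for DataPipelineConfig (only `source` is modelled)."""

    source: Any = MISSING


@dataclass
class EvalStub:
    """Hypothetical stand-in for EvalDataConfig."""

    eval_datasets: dict[str, PipelineStub] = field(default_factory=dict)
    scratch: Any = None


@dataclass
class DataStub(EvalStub):
    """Hypothetical stand-in for DataConfig: inherits the eval fields, adds train_datasets."""

    train_datasets: Any = MISSING


def missing_fields(cfg: Any, prefix: str = "") -> list[str]:
    """Return the dotted paths of every value still equal to the MISSING sentinel."""
    if isinstance(cfg, str):
        return [prefix] if cfg == MISSING else []
    if is_dataclass(cfg) and not isinstance(cfg, type):
        items = [(f.name, getattr(cfg, f.name)) for f in fields(cfg)]
    elif isinstance(cfg, dict):
        items = list(cfg.items())
    else:
        return []
    found: list[str] = []
    for name, value in items:
        path = f"{prefix}.{name}" if prefix else str(name)
        found.extend(missing_fields(value, path))
    return found
```

Calling `missing_fields(DataStub())` reports `train_datasets`, because the inherited `eval_datasets` and `scratch` fields have usable defaults while `train_datasets` does not.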

`DataPipelineConfig` `dataclass` ¶

DataPipelineConfig(source: optimus_dl.core.registry.RegistryConfig = '???', transform: optimus_dl.core.registry.RegistryConfig | None = None, profile: bool = False, report_freq: int = 100)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `source` | `RegistryConfig` | Config for the dataset source | `'???'` |
| `transform` | `RegistryConfig \| None` | Config for the dataset transforms | `None` |
| `profile` | `bool` | Whether to profile this data pipeline | `False` |
| `report_freq` | `int` | Frequency of profiling report in number of iterations | `100` |

Source code in optimus_dl/modules/data/config.py

@dataclass
class DataPipelineConfig:
    source: RegistryConfig = field(
        default=MISSING,
        metadata={"description": "Config for the dataset source"},
    )
    transform: RegistryConfig | None = field(
        default=None,
        metadata={"description": "Config for the dataset transforms"},
    )
    profile: bool = field(
        default=False,
        metadata={"description": "Whether to profile this data pipeline"},
    )
    report_freq: int = field(
        default=100,
        metadata={
            "description": "Frequency of profiling report in number of iterations"
        },
    )
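
The interaction between `profile` and `report_freq` can be pictured with a small wrapper around a batch iterator. This is only a sketch under assumptions: the function `profiled`, its `report` callback, and the choice to measure mean fetch time are all hypothetical, and the page does not say what the library actually measures or how it reports. The sketch also assumes `report_freq` is positive.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def profiled(
    batches: Iterable[T],
    profile: bool = False,
    report_freq: int = 100,
    report: Callable[[str], None] = print,
) -> Iterator[T]:
    """Yield batches unchanged; when `profile` is set, report the mean
    fetch time once every `report_freq` iterations (hypothetical behaviour)."""
    if not profile:
        # Profiling disabled: pass batches through with no overhead.
        yield from batches
        return
    it = iter(batches)
    count = 0
    elapsed = 0.0
    while True:
        start = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            return
        elapsed += time.perf_counter() - start
        count += 1
        if count % report_freq == 0:
            report(f"iter {count}: mean fetch time {elapsed / report_freq:.6f}s")
            elapsed = 0.0  # start a fresh measurement window
        yield batch
```

With `report_freq=4` and ten batches, the wrapper would emit two reports (after iterations 4 and 8) and still yield all ten batches.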

`EvalDataConfig` `dataclass` ¶

EvalDataConfig(eval_datasets: dict[str, optimus_dl.modules.data.config.EvalDataPipelineConfig] = <factory>, scratch: Any = None)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `eval_datasets` | `dict[str, EvalDataPipelineConfig]` | Config for the evaluation batches: dataset and transforms. The key is the name of the dataset, which will be used to identify the dataset in the metrics. The value is the config for the dataset and transforms. | `{}` (`default_factory=dict`) |
| `scratch` | `Any` | Arbitrary data that dataset configs can reference through config interpolations such as `${data.scratch.my_config}`, to reduce duplication | `None` |

Source code in optimus_dl/modules/data/config.py

@dataclass
class EvalDataConfig:
    eval_datasets: dict[str, EvalDataPipelineConfig] = field(
        default_factory=dict,
        metadata={
            "description": (
                "Config for the evaluation batches: dataset and transforms. "
                "The key is the name of the dataset, which will be used to identify the dataset in the metrics. "
                "The value is the config for the dataset and transforms."
            )
        },
    )

    scratch: Any = field(
        default=None,
        metadata={
            "description": "Any data whatsoever to be used in dataset configs with config interpolations like ${data.scratch.my_config} to reduce duplication"
        },
    )
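
To make the purpose of `scratch` concrete, the sketch below shows one shared block being referenced from both a training and an evaluation pipeline. The `${...}` syntax resembles OmegaConf-style interpolation, but this page does not say which mechanism the library uses, so the resolver here is a simplified stdlib stand-in. The helpers `lookup` and `resolve`, and the keys `loader_args`, `seq_len`, and `batch_size`, are hypothetical.

```python
import re
from typing import Any

# Matches a string that consists solely of one reference, e.g. "${data.scratch.my_config}".
_REF = re.compile(r"^\$\{([A-Za-z0-9_.]+)\}$")


def lookup(root: dict, dotted: str) -> Any:
    """Follow a dotted path such as 'data.scratch.my_config' from the root."""
    node: Any = root
    for key in dotted.split("."):
        node = node[key]
    return node


def resolve(node: Any, root: dict) -> Any:
    """Recursively replace whole-string '${a.b.c}' references with the referenced value."""
    if isinstance(node, dict):
        return {k: resolve(v, root) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(v, root) for v in node]
    if isinstance(node, str):
        match = _REF.match(node)
        if match:
            return resolve(lookup(root, match.group(1)), root)
    return node


config = {
    "data": {
        # Defined once under scratch ...
        "scratch": {"my_config": {"seq_len": 1024, "batch_size": 8}},
        # ... and referenced from several pipelines (hypothetical keys).
        "train_datasets": {"source": {"loader_args": "${data.scratch.my_config}"}},
        "eval_datasets": {
            "val": {"source": {"loader_args": "${data.scratch.my_config}"}}
        },
    }
}
resolved = resolve(config, config)
```

After resolution, both `loader_args` entries hold the same dictionary that was written once under `scratch`, which is the duplication the field is meant to avoid.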

`EvalDataPipelineConfig` `dataclass` ¶

Bases: DataPipelineConfig

EvalDataPipelineConfig(source: optimus_dl.core.registry.RegistryConfig = '???', transform: optimus_dl.core.registry.RegistryConfig | None = None, profile: bool = False, report_freq: int = 100, eval_freq: int | None = None, eval_iterations: int | None = None, eval_guaranteed_same_batches: bool | None = None, eval_checkpointing: int | None = None)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `eval_freq` | `int \| None` | Frequency of evaluation in number of training steps specifically for this dataset. If None, use the global eval_freq. | `None` |
| `eval_iterations` | `int \| None` | Max number of iterations of validation data for this dataset. If None, use the global eval_iterations. | `None` |
| `eval_guaranteed_same_batches` | `bool \| None` | Whether it is guaranteed that each DP rank sees the same number of batches. If None, use the global eval_guaranteed_same_batches. | `None` |
| `eval_checkpointing` | `int \| None` | Frequency of saving checkpoints during evaluation. If None or non-positive (for example, 0), do not save checkpoints during evaluation. This is useful for long evaluations to be able to resume evaluation if it gets interrupted. Saves are fast and light, as they only contain the state of the meters and dataloader, not the model or optimizer states. | `None` |

Source code in optimus_dl/modules/data/config.py

@dataclass
class EvalDataPipelineConfig(DataPipelineConfig):
    eval_freq: int | None = field(
        default=None,
        metadata={
            "description": "Frequency of evaluation in number of training steps specifically for this dataset. If None, use the global eval_freq."
        },
    )

    eval_iterations: int | None = field(
        default=None,
        metadata={
            "description": "Max number of iterations of validation data for this dataset. If None, use the global eval_iterations."
        },
    )

    eval_guaranteed_same_batches: bool | None = field(
        default=None,
        metadata={
            "description": "Whether it is guaranteed that each DP rank sees the same number of batches. If None, use the global eval_guaranteed_same_batches."
        },
    )

    eval_checkpointing: int | None = field(
        default=None,
        metadata={
            "description": "Frequency of saving checkpoints during evaluation. If None or non-positive (for example, 0), "
            "do not save checkpoints during evaluation. This is useful for long evaluations to be able to resume "
            "evaluation if it gets interrupted. "
            "Saves are fast and light, as they only contain the state of the meters and dataloader, not the model or optimizer states."
        },
    )
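
The descriptions above state two rules: a per-dataset value of `None` falls back to the global setting, and `eval_checkpointing` disables saving when it is `None` or non-positive. A minimal sketch of both rules follows. The helpers `effective` and `should_checkpoint` are hypothetical names, and the modulo-based schedule is an assumption about how "frequency" might be applied, not the library's documented behaviour.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def effective(per_dataset: Optional[T], global_value: T) -> T:
    """Per-dataset value wins unless it is None, in which case the global one applies.

    The check is `is None`, not truthiness, so an explicit False or 0 is respected.
    """
    return global_value if per_dataset is None else per_dataset


def should_checkpoint(iteration: int, eval_checkpointing: Optional[int]) -> bool:
    """True when an evaluation checkpoint is due.

    None or a non-positive frequency disables saving entirely.
    """
    if eval_checkpointing is None or eval_checkpointing <= 0:
        return False
    # Assumed schedule: save every `eval_checkpointing` evaluation iterations.
    return iteration > 0 and iteration % eval_checkpointing == 0
```

Using an explicit `is None` test matters for `eval_guaranteed_same_batches`: a dataset that sets it to `False` should override a global `True`, which a truthiness check would get wrong.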
