optimus_dl.modules.data.datasets.base
Base dataset class for data sources.
This module defines the BaseDataset class that all data sources must inherit from. It provides integration with torchdata's pipeline system and checkpointing support.
BaseDataset
Bases: torchdata.nodes.BaseNode
Base class for all dataset implementations.
All data sources in Optimus-DL should inherit from this class. It provides:
- Integration with torchdata's pipeline system
- Checkpointing support for resuming data iteration
- Configuration storage
Subclasses should implement:
- The data iteration logic required by torchdata.nodes.BaseNode (its next(), reset(), and get_state() methods)
- Optionally, an override of load_state_dict() for custom checkpointing
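To make the subclassing contract concrete, here is a minimal sketch of a source built on this class. It assumes a torchdata release that ships torchdata.nodes (0.10 or later) and falls back to a small stand-in with the same reset()/next()/get_state() protocol when torchdata is not installed. The BaseDataset below is a simplified approximation of the documented behavior, not the Optimus-DL source: it keeps the config on self.cfg (an assumed attribute name) and restores state by passing it to reset(). ListDataset and ListDatasetConfig are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

try:
    # Recent torchdata releases provide the node API that BaseDataset builds on.
    from torchdata.nodes import BaseNode
except ImportError:
    # Assumption: a minimal stand-in following the same reset/next/get_state
    # protocol, so the sketch still runs without torchdata installed.
    class BaseNode:
        def __init__(self, *args, **kwargs):
            self._initialized = False

        def __iter__(self):
            return self

        def __next__(self):
            # Reset lazily on first use, mirroring torchdata's behavior.
            if not self._initialized:
                self.reset(None)
            return self.next()

        def reset(self, initial_state: Optional[dict] = None):
            self._initialized = True

        def state_dict(self) -> Dict[str, Any]:
            return self.get_state()


class BaseDataset(BaseNode):
    """Simplified approximation of the documented base class."""

    def __init__(self, cfg: Any, **kwargs: Any):
        super().__init__()
        self.cfg = cfg  # assumed attribute name

    def load_state_dict(self, state_dict: dict) -> None:
        # Default restoration (assumed): hand the saved state to reset().
        self.reset(state_dict)


@dataclass
class ListDatasetConfig:  # hypothetical config object
    items: List[Any] = field(default_factory=list)


class ListDataset(BaseDataset):
    """Hypothetical source that yields cfg.items in order."""

    def __init__(self, cfg: ListDatasetConfig, **kwargs: Any):
        super().__init__(cfg, **kwargs)
        self.position = 0

    def reset(self, initial_state: Optional[dict] = None):
        super().reset(initial_state)
        # Resume from the saved position, or start over when there is none.
        self.position = initial_state["position"] if initial_state else 0

    def next(self) -> Any:
        if self.position >= len(self.cfg.items):
            raise StopIteration
        item = self.cfg.items[self.position]
        self.position += 1
        return item

    def get_state(self) -> Dict[str, Any]:
        # Only the read position is needed to resume this source.
        return {"position": self.position}
```

For example, after next(ds) returns "a", ds.state_dict() is {"position": 1}, and a fresh ListDataset restored with load_state_dict({"position": 2}) yields only "c".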
Example
Source code in optimus_dl/modules/data/datasets/base.py
__init__(cfg, **kwargs)
Initialize the base dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | | Configuration object for this dataset. | required |
| **kwargs | | Additional keyword arguments passed from the data builder. | {} |
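The constructor contract can be sketched as follows. The stand-in BaseDataset keeps only what this table documents and omits the torchdata.nodes.BaseNode parent; self.cfg and self.extra_kwargs are assumed attribute names, and TextDataset, TextDatasetConfig, rank, and world_size are hypothetical. The point is the pattern: a subclass pulls out the keyword arguments it needs and forwards cfg plus everything else to super().__init__().

```python
from dataclasses import dataclass
from typing import Any, Dict


class BaseDataset:
    """Stand-in reduced to the documented constructor (assumption).

    The real class also derives from torchdata.nodes.BaseNode; that parent
    is omitted here, and the attribute names below are illustrative.
    """

    def __init__(self, cfg: Any, **kwargs: Any):
        self.cfg = cfg
        # For illustration only: keep whatever the builder passed through.
        self.extra_kwargs: Dict[str, Any] = dict(kwargs)


@dataclass
class TextDatasetConfig:  # hypothetical config object
    path: str
    shuffle: bool = True


class TextDataset(BaseDataset):
    """Hypothetical subclass that consumes one builder kwarg itself."""

    def __init__(self, cfg: TextDatasetConfig, rank: int = 0, **kwargs: Any):
        # Forward cfg and every kwarg this class does not use to the base class.
        super().__init__(cfg, **kwargs)
        self.rank = rank
```

With TextDataset(TextDatasetConfig(path="data/train"), rank=1, world_size=4), the subclass keeps rank and the base class receives world_size. Forwarding the remaining **kwargs means a subclass does not have to list every argument the data builder might pass.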
load_state_dict(state_dict)
Load dataset state from checkpoint.
This method restores the dataset's iteration state, allowing training
to resume from the same position in the dataset. The default implementation
uses torchdata's reset() method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| state_dict | dict | Dictionary containing the dataset's saved state, typically the iteration position, random state, etc. | required |
Note
Subclasses can override this to handle custom state restoration. The state_dict is typically saved by the checkpoint manager.
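Assuming the default load_state_dict() simply forwards the saved dictionary to reset(), a checkpoint round trip and a custom override can be sketched as follows. torchdata.nodes.BaseNode is used when a torchdata release with the nodes API is installed, with a small stand-in otherwise. BaseDataset here is a simplified approximation rather than the Optimus-DL source, and ShuffledListDataset and ShuffledConfig are hypothetical. The override validates the seed before delegating to the default restoration.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

try:
    from torchdata.nodes import BaseNode  # torchdata 0.10 or later
except ImportError:
    # Assumption: minimal stand-in with torchdata's reset/next/get_state protocol.
    class BaseNode:
        def __init__(self, *args, **kwargs):
            self._initialized = False

        def __iter__(self):
            return self

        def __next__(self):
            if not self._initialized:
                self.reset(None)
            return self.next()

        def reset(self, initial_state: Optional[dict] = None):
            self._initialized = True

        def state_dict(self) -> Dict[str, Any]:
            return self.get_state()


class BaseDataset(BaseNode):
    """Simplified approximation: stores cfg and restores state through reset()."""

    def __init__(self, cfg: Any, **kwargs: Any):
        super().__init__()
        self.cfg = cfg  # assumed attribute name

    def load_state_dict(self, state_dict: dict) -> None:
        self.reset(state_dict)


@dataclass
class ShuffledConfig:  # hypothetical config object
    items: List[Any] = field(default_factory=list)
    seed: int = 0


class ShuffledListDataset(BaseDataset):
    """Hypothetical source that yields cfg.items in a seeded random order."""

    def __init__(self, cfg: ShuffledConfig, **kwargs: Any):
        super().__init__(cfg, **kwargs)
        self.order: List[int] = []
        self.position = 0

    def reset(self, initial_state: Optional[dict] = None):
        super().reset(initial_state)
        # Rebuild the permutation from the seed instead of storing it.
        self.order = list(range(len(self.cfg.items)))
        random.Random(self.cfg.seed).shuffle(self.order)
        self.position = initial_state["position"] if initial_state else 0

    def next(self) -> Any:
        if self.position >= len(self.order):
            raise StopIteration
        item = self.cfg.items[self.order[self.position]]
        self.position += 1
        return item

    def get_state(self) -> Dict[str, Any]:
        return {"position": self.position, "seed": self.cfg.seed}

    def load_state_dict(self, state_dict: dict) -> None:
        # Custom restoration: refuse a checkpoint taken with a different seed,
        # since the saved position would then index into another permutation.
        if state_dict.get("seed") != self.cfg.seed:
            raise ValueError("checkpoint seed does not match dataset config")
        super().load_state_dict(state_dict)
```

Saving the seed and position rather than the full permutation keeps the state dictionary small; the override guards against replaying that position into a differently shuffled order.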