loop_dataset
optimus_dl.modules.data.datasets.loop_dataset
LoopDataset
Bases: BaseDataset
Dataset that infinitely loops over an inner dataset.
When the inner dataset is exhausted, it is automatically re-initialized, creating an endless stream of data. This is useful for training loops where the model needs to see the data multiple times.
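The looping behavior described above can be sketched in plain Python. This is an illustration only, not the `optimus_dl` implementation: `LoopingIterator` and its `make_inner` factory are hypothetical names, and re-initialization is modeled by simply calling the factory again.

```python
from typing import Any, Callable, Iterator


class LoopingIterator:
    """Hypothetical stand-in for the looping behavior; not the optimus_dl code."""

    def __init__(self, make_inner: Callable[[], Iterator[Any]]) -> None:
        self._make_inner = make_inner  # factory that re-creates the inner stream
        self._inner = make_inner()
        self.epochs_completed = 0

    def __iter__(self) -> "LoopingIterator":
        return self

    def __next__(self) -> Any:
        try:
            return next(self._inner)
        except StopIteration:
            # Inner stream exhausted: re-initialize it and keep going.
            self.epochs_completed += 1
            self._inner = self._make_inner()
            # If the fresh inner stream is empty, StopIteration propagates
            # instead of looping forever.
            return next(self._inner)


looped = LoopingIterator(lambda: iter([1, 2, 3]))
first_seven = [next(looped) for _ in range(7)]
print(first_seven)  # [1, 2, 3, 1, 2, 3, 1]
```

Because the stream never ends on its own, the consumer (for example, a training loop with a fixed step budget) has to decide when to stop.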
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cfg` | `LoopDatasetConfig` | Loop dataset configuration. | required |
| `rank` | `int` | Distributed rank for sharding. | required |
| `world_size` | `int` | Total number of ranks. | required |
Source code in optimus_dl/modules/data/datasets/loop_dataset.py
get_state()
Return the current state for checkpointing.
next()
Yield the next item from the inner dataset, resetting it if exhausted.
reset(initial_state=None)
Reset or restore the loop dataset state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `initial_state` | `dict \| None` | Optional state dictionary for resuming. | `None` |
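A checkpoint round trip through `get_state()` and `reset(initial_state)` might look like the sketch below. The class `ResumableLoop` and the state keys `"position"` and `"epoch"` are assumptions made for illustration; the actual contents of the state dictionary returned by `LoopDataset` are not documented here.

```python
from __future__ import annotations

from typing import Any


class ResumableLoop:
    """Hypothetical looping source over a list, with checkpointable state."""

    def __init__(self, items: list[Any]) -> None:
        self._items = items
        self.reset()

    def reset(self, initial_state: dict | None = None) -> None:
        # Without a state, start from the beginning; otherwise restore the cursor.
        state = initial_state or {}
        self._position = state.get("position", 0)  # assumed key
        self._epoch = state.get("epoch", 0)        # assumed key

    def get_state(self) -> dict:
        return {"position": self._position, "epoch": self._epoch}

    def next(self) -> Any:
        if self._position >= len(self._items):
            # Exhausted: wrap around and count one more pass over the data.
            self._position = 0
            self._epoch += 1
        item = self._items[self._position]
        self._position += 1
        return item


source = ResumableLoop(["a", "b", "c"])
consumed = [source.next() for _ in range(4)]
checkpoint = source.get_state()

# A fresh instance restored from the checkpoint continues where `source` stopped.
resumed = ResumableLoop(["a", "b", "c"])
resumed.reset(checkpoint)
continued = [resumed.next() for _ in range(3)]
print(consumed, checkpoint, continued)
```

In this sketch, the resumed instance picks up at the item after the last one consumed, rather than restarting the pass.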
LoopDatasetConfig
dataclass
Bases: RegistryConfigStrict
LoopDatasetConfig(_name: str | None = None, inner: Any = '???')
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `inner` | `Any` | | `'???'` |
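The default `'???'` resembles the placeholder that OmegaConf-style configs use for a mandatory value that has not been set. Assuming that is the intent here, the sketch below mirrors the signature with a plain dataclass and rejects a config whose `inner` is still the placeholder. `LoopConfigSketch`, `require_inner`, and the example `inner` value are hypothetical and are not part of `optimus_dl`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

MISSING = "???"  # assumption: marks a required field that was left unset


@dataclass
class LoopConfigSketch:
    """Hypothetical mirror of the documented signature, for illustration only."""

    _name: str | None = None
    inner: Any = MISSING


def require_inner(cfg: LoopConfigSketch) -> Any:
    # Fail early if the inner dataset config was never provided.
    if isinstance(cfg.inner, str) and cfg.inner == MISSING:
        raise ValueError("`inner` must be set to an inner dataset configuration")
    return cfg.inner


unset = LoopConfigSketch()
configured = LoopConfigSketch(inner={"_name": "some_inner_dataset"})  # made-up value

try:
    require_inner(unset)
    unset_rejected = False
except ValueError:
    unset_rejected = True

inner_cfg = require_inner(configured)
```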