shuffle
optimus_dl.modules.data.transforms.shuffle
ShuffleTransform
Bases: BaseTransform
Transform that shuffles data items using an internal buffer.
Items are yielded in randomized order within a sliding window of buffer_size
items. The seed is adjusted automatically per rank so that ranks do not all
produce the same order in distributed training.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | ShuffleTransformConfig | Shuffling configuration. | required |
| rank | int | Distributed rank. | required |
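The per-rank seed adjustment mentioned in the description can be illustrated with a minimal sketch. The exact derivation used by ShuffleTransform is not shown on this page; the helper name make_rank_rng and the simple base_seed + rank offset below are assumptions for illustration only.

```python
import random


def make_rank_rng(base_seed: int, rank: int) -> random.Random:
    """Hypothetical helper: derive a rank-specific RNG.

    Assumption: the seed is offset by the rank. The real transform may
    combine the two values differently.
    """
    return random.Random(base_seed + rank)


items = list(range(8))
orders = []
for rank in range(2):
    rng = make_rank_rng(base_seed=42, rank=rank)
    shuffled = items[:]  # copy so every rank starts from the same data
    rng.shuffle(shuffled)
    orders.append(shuffled)
    print(f"rank {rank}: {shuffled}")
```

Each rank stays reproducible for a fixed base seed, while different ranks draw from different random streams.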
Source code in optimus_dl/modules/data/transforms/shuffle.py
build(source)
Apply the shuffling transformation to a source node.
ShuffleTransformConfig
dataclass
Bases: RegistryConfigStrict
Configuration for data shuffling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| buffer_size | int | Size of the shuffle buffer (the sliding window within which items are randomized). | 1024 |
ShuffleTransformNode
Bases: BaseNode
Internal node for performing buffer-based shuffling.
Fills an internal buffer from the source node and yields items selected randomly from that buffer.
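The buffer-based approach described above can be sketched as a standalone generator. This is not the ShuffleTransformNode implementation: the function name buffered_shuffle, the seed parameter, and the replace-on-yield strategy are assumptions chosen to illustrate the general technique.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(
    source: Iterable[T], buffer_size: int, seed: int = 0
) -> Iterator[T]:
    """Hypothetical sketch of shuffling through a fixed-size buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in source:
        if len(buffer) < buffer_size:
            # Fill phase: nothing is yielded until the buffer is full.
            buffer.append(item)
            continue
        # Steady state: yield a random buffered item and put the
        # incoming item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Source exhausted: drain the remaining items in random order.
    rng.shuffle(buffer)
    yield from buffer


out = list(buffered_shuffle(range(100), buffer_size=16, seed=1))
```

Because an item can only be yielded once it has entered the buffer, the output at position k always comes from the first k + buffer_size source items. A larger buffer_size gives a more thorough shuffle at the cost of memory.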
get_state()
Collect current buffer, terminated flag, and RNG state for checkpointing.
next()
Yield a randomly selected item from the shuffle buffer.
reset(initial_state=None)
Restore the shuffle buffer and RNG state.
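How get_state(), next(), and reset() might fit together for checkpointing can be sketched with a toy class. ToyShuffleNode is hypothetical and does not subclass BaseNode; the state keys, the seed argument, and the position counter (a stand-in for the source node's own state) are assumptions, not the library's actual layout.

```python
import random
from typing import Any, Dict, Optional, Sequence


class ToyShuffleNode:
    """Hypothetical stand-in for a checkpointable shuffle node."""

    def __init__(self, items: Sequence[Any], buffer_size: int, seed: int = 0):
        self._items = list(items)
        self.buffer_size = buffer_size
        self.seed = seed
        self.reset()

    def reset(self, initial_state: Optional[Dict[str, Any]] = None) -> None:
        # Start fresh, then overwrite with the checkpoint if one is given.
        self.rng = random.Random(self.seed)
        self.buffer = []
        self.terminated = False
        self.position = 0  # assumption: stands in for the source's state
        if initial_state is not None:
            self.buffer = list(initial_state["buffer"])
            self.terminated = initial_state["terminated"]
            self.position = initial_state["position"]
            self.rng.setstate(initial_state["rng_state"])

    def get_state(self) -> Dict[str, Any]:
        return {
            "buffer": list(self.buffer),
            "terminated": self.terminated,
            "position": self.position,
            "rng_state": self.rng.getstate(),
        }

    def next(self) -> Any:
        # Refill the buffer until it is full or the source runs out.
        while not self.terminated and len(self.buffer) < self.buffer_size:
            if self.position < len(self._items):
                self.buffer.append(self._items[self.position])
                self.position += 1
            else:
                self.terminated = True
        if not self.buffer:
            raise StopIteration
        # Swap the chosen item to the end so it can be popped in O(1).
        idx = self.rng.randrange(len(self.buffer))
        self.buffer[idx], self.buffer[-1] = self.buffer[-1], self.buffer[idx]
        return self.buffer.pop()


node = ToyShuffleNode(range(20), buffer_size=4, seed=3)
first = [node.next() for _ in range(5)]
state = node.get_state()  # checkpoint mid-stream
rest = [node.next() for _ in range(15)]

resumed = ToyShuffleNode(range(20), buffer_size=4, seed=999)
resumed.reset(state)  # the restored RNG state overrides the new seed
resumed_rest = [resumed.next() for _ in range(15)]
```

Saving the RNG state alongside the buffer is what lets the resumed node continue with exactly the sequence the original would have produced.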