skip

optimus_dl.modules.data.transforms.skip

SkipInterleavedTransform

Bases: BaseTransform

Transform that deterministically skips a fixed number of data items.

This is useful for downsampling a dataset or creating interleaved subsets. It guarantees that the first item is always produced, followed by skipping exactly skip_count items before producing the next one.
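
The keep-one, skip-`skip_count` pattern described above can be sketched with a plain generator. This is a minimal illustration, not the library's `SkipInterleavedTransformNode`: the function name `skip_interleaved` and the rejection of negative values are assumptions made for the example.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def skip_interleaved(items: Iterable[T], skip_count: int = 1) -> Iterator[T]:
    """Yield the first item, then drop `skip_count` items before each next yield.

    Hypothetical stand-in for the behaviour described in the docstring;
    it does not use any optimus_dl classes.
    """
    if skip_count < 0:
        # Assumption: negative counts are treated as a configuration error.
        raise ValueError("skip_count must be non-negative")
    period = skip_count + 1  # one kept item followed by skip_count dropped items
    for index, item in enumerate(items):
        if index % period == 0:
            yield item


kept = list(skip_interleaved(range(10), skip_count=2))
print(kept)  # [0, 3, 6, 9]
```

With `skip_count=2`, every third item survives, starting from the first.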

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| cfg | SkipInterleavedTransformConfig | Configuration containing the skip_count parameter. | required |

Source code in optimus_dl/modules/data/transforms/skip.py
@register_transform("skip_interleaved", SkipInterleavedTransformConfig)
class SkipInterleavedTransform(BaseTransform):
    """Transform that deterministically skips a fixed number of data items.

    This is useful for downsampling a dataset or creating interleaved subsets.
    It guarantees that the first item is always produced, followed by skipping
    exactly `skip_count` items before producing the next one.

    Args:
        cfg: Configuration containing the `skip_count` parameter.
    """

    def __init__(self, cfg: SkipInterleavedTransformConfig, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.cfg = cfg

    def build(self, source: BaseNode) -> BaseNode:
        """Wrap the source node with a SkipInterleavedTransformNode."""
        return SkipInterleavedTransformNode(source, self.cfg)

build(source)

Wrap the source node with a SkipInterleavedTransformNode.

Source code in optimus_dl/modules/data/transforms/skip.py
def build(self, source: BaseNode) -> BaseNode:
    """Wrap the source node with a SkipInterleavedTransformNode."""
    return SkipInterleavedTransformNode(source, self.cfg)

SkipInterleavedTransformConfig dataclass

Bases: RegistryConfigStrict

Configuration for skip_interleaved.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| skip_count | int | Number of items to skip before producing a new one. The first item is always produced. | 1 |

Source code in optimus_dl/modules/data/transforms/skip.py
@dataclass
class SkipInterleavedTransformConfig(RegistryConfigStrict):
    """Configuration for skip_interleaved.

    Attributes:
        skip_count: Number of items to skip before producing a new one. First item is always produced.
    """

    skip_count: int = 1
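
To pick a `skip_count` for a target dataset size, it can help to compute how many items survive. The sketch below assumes the semantics stated in the docstring (the first item is produced, then `skip_count` items are dropped per kept item) and a finite source of known length; `kept_count` is a hypothetical helper, not part of the module.

```python
def kept_count(num_items: int, skip_count: int = 1) -> int:
    """Number of items produced from a source of `num_items` items.

    Assumes one item is kept out of every (skip_count + 1), starting
    with the first, so the result is ceil(num_items / (skip_count + 1)).
    """
    if num_items < 0 or skip_count < 0:
        raise ValueError("num_items and skip_count must be non-negative")
    period = skip_count + 1
    # Integer ceiling division avoids floating-point rounding on large inputs.
    return (num_items + period - 1) // period


print(kept_count(10, skip_count=1))  # 5
print(kept_count(10, skip_count=2))  # 4
```

Under these assumptions, the default `skip_count=1` roughly halves the dataset, and `skip_count=0` leaves it unchanged.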

SkipRandomTransform

Bases: BaseTransform

Transform that randomly skips data items.

This adds stochastic sub-sampling to the data pipeline. Because skipping is probabilistic, the number of items produced varies between runs unless the same random seed is used.
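
The effect of the seed can be illustrated with a standalone generator. How `SkipRandomTransformNode` actually draws its random numbers is not shown on this page, so the sketch assumes one independent draw per item from Python's `random.Random`; the function name `skip_random` is hypothetical.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def skip_random(items: Iterable[T], probability: float = 0.5, seed: int = 0) -> Iterator[T]:
    """Drop each item with the given probability, using a private seeded RNG.

    Hypothetical stand-in for the described behaviour; it does not use
    any optimus_dl classes.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0.0 and 1.0")
    rng = random.Random(seed)  # private generator, so global random state is untouched
    for item in items:
        # rng.random() is in [0.0, 1.0): the item is skipped when the draw
        # falls below `probability`, and kept otherwise.
        if rng.random() >= probability:
            yield item


first = list(skip_random(range(100), probability=0.5, seed=42))
second = list(skip_random(range(100), probability=0.5, seed=42))
print(first == second)  # True: the same seed reproduces the same subset
```

A different seed will generally select a different subset, and possibly a different number of items.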

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| cfg | SkipRandomTransformConfig | Configuration with the skip probability. | required |

Source code in optimus_dl/modules/data/transforms/skip.py
@register_transform("skip_random", SkipRandomTransformConfig)
class SkipRandomTransform(BaseTransform):
    """Transform that randomly skips data items.

    This adds stochastic sub-sampling to the data pipeline. Note that since
    skipping is probabilistic, the final dataset length will vary slightly
    unless exactly replicated with the same random seed.

    Args:
        cfg: Configuration with the skip probability.
    """

    def __init__(self, cfg: SkipRandomTransformConfig, seed: int, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.cfg = cfg
        self.seed = seed

    def build(self, source: BaseNode) -> BaseNode:
        """Wrap the source node with a SkipRandomTransformNode."""
        return SkipRandomTransformNode(source, self.cfg, seed=self.seed)

build(source)

Wrap the source node with a SkipRandomTransformNode.

Source code in optimus_dl/modules/data/transforms/skip.py
def build(self, source: BaseNode) -> BaseNode:
    """Wrap the source node with a SkipRandomTransformNode."""
    return SkipRandomTransformNode(source, self.cfg, seed=self.seed)

SkipRandomTransformConfig dataclass

Bases: RegistryConfigStrict

Configuration for skip_random.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| probability | float | Probability (0.0 to 1.0) that any given item is skipped. | 0.5 |

Source code in optimus_dl/modules/data/transforms/skip.py
@dataclass
class SkipRandomTransformConfig(RegistryConfigStrict):
    """Configuration for skip_random.

    Attributes:
        probability: Probability (0.0 to 1.0) that any given item is skipped.
    """

    probability: float = 0.5
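
To get a feel for how much the output length can vary, the sketch below computes the expected number of kept items and its standard deviation. It assumes each item is skipped independently with the configured probability, which the page does not state explicitly; under that assumption the kept count follows a binomial distribution. The helper `expected_kept` is hypothetical.

```python
import math
from typing import Tuple


def expected_kept(num_items: int, probability: float = 0.5) -> Tuple[float, float]:
    """Mean and standard deviation of the kept-item count.

    Assumes independent per-item skips, so the kept count is
    Binomial(num_items, 1 - probability).
    """
    if num_items < 0 or not 0.0 <= probability <= 1.0:
        raise ValueError("invalid num_items or probability")
    keep = 1.0 - probability
    mean = num_items * keep
    std = math.sqrt(num_items * probability * keep)
    return mean, std


mean, std = expected_kept(10_000, probability=0.5)
print(mean, std)  # 5000.0 50.0
```

Under this assumption, a 10,000-item source with the default probability yields about 5,000 items, give or take roughly 50.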