prefetch

optimus_dl.modules.data.transforms.prefetch

PrefetchTransform

Bases: BaseTransform

Transform that pre-fetches data items in a background thread.

This helps hide data loading and transformation latency by keeping a buffer of items ready for the training loop.
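The idea of background prefetching can be illustrated with a minimal standard-library sketch. This is not the code used by PrefetchTransform, which delegates to torchdata.nodes.Prefetcher; the `prefetch` helper and its tagged-tuple protocol are hypothetical names chosen for illustration. A worker thread fills a bounded queue while the consumer drains it, so up to `prefetch_factor` items are ready before they are requested.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def prefetch(source: Iterable[T], prefetch_factor: int = 8) -> Iterator[T]:
    """Yield items from `source`, reading ahead in a background thread.

    Hypothetical helper for illustration only.
    """
    # Bounded queue: the worker blocks once `prefetch_factor` items are waiting.
    buffer: queue.Queue = queue.Queue(maxsize=prefetch_factor)

    def worker() -> None:
        try:
            for item in source:
                buffer.put(("item", item))
            buffer.put(("done", None))
        except Exception as exc:
            # Forward errors to the consumer instead of letting it wait forever.
            buffer.put(("error", exc))

    # Daemon thread: it will not keep the process alive if the consumer
    # stops iterating early and the worker is blocked on a full queue.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        kind, payload = buffer.get()
        if kind == "done":
            return
        if kind == "error":
            raise payload
        yield payload


# Items arrive in source order; only the timing of the reads changes.
print(list(prefetch(range(5), prefetch_factor=2)))
```

The bounded queue is the important design choice here: it caps memory use and applies back-pressure to the producer when the training loop is the slower side.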

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| cfg | PrefetchTransformConfig | Prefetching configuration. | required |

Source code in optimus_dl/modules/data/transforms/prefetch.py
@register_transform("prefetch", PrefetchTransformConfig)
class PrefetchTransform(BaseTransform):
    """Transform that pre-fetches data items in a background thread.

    This helps hide data loading and transformation latency by keeping a buffer
    of items ready for the training loop.

    Args:
        cfg: Prefetching configuration.
    """

    def __init__(self, cfg: PrefetchTransformConfig, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.cfg = cfg

    def build(self, source: BaseNode) -> BaseNode:
        """Wrap the source node with a Prefetcher."""
        return torchdata.nodes.Prefetcher(
            source,
            prefetch_factor=self.cfg.prefetch_factor,
            snapshot_frequency=self.cfg.snapshot_frequency,
        )

build(source)

Wrap the source node with a Prefetcher.

Source code in optimus_dl/modules/data/transforms/prefetch.py
def build(self, source: BaseNode) -> BaseNode:
    """Wrap the source node with a Prefetcher."""
    return torchdata.nodes.Prefetcher(
        source,
        prefetch_factor=self.cfg.prefetch_factor,
        snapshot_frequency=self.cfg.snapshot_frequency,
    )

PrefetchTransformConfig dataclass

Bases: RegistryConfigStrict

Configuration for prefetching.

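Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| prefetch_factor | int | Number of items to fetch ahead of request. | 8 |
| snapshot_frequency | int | Passed to torchdata.nodes.Prefetcher as its snapshot_frequency argument. | 128 |
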
Source code in optimus_dl/modules/data/transforms/prefetch.py
@dataclass
class PrefetchTransformConfig(RegistryConfigStrict):
    """Configuration for prefetching.

    Attributes:
        prefetch_factor: Number of items to fetch ahead of request.
    """

    prefetch_factor: int = 8
    snapshot_frequency: int = 128
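
The source does not describe what snapshot_frequency controls. In torchdata, it sets how often the Prefetcher snapshots the state of its source node. The sketch below is a conceptual model of that trade-off, not torchdata's implementation: `SnapshottingReader` and all of its fields are hypothetical, and it assumes a resume works by restoring the last snapshot and replaying the items handed out since then. Under that assumption, a larger value means fewer snapshots but more replay on resume.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SnapshottingReader:
    """Hypothetical reader that snapshots its position every N items."""

    data: List[int]
    snapshot_frequency: int = 128
    position: int = 0       # index of the next item to read
    snapshot: int = 0       # source position at the last snapshot
    yielded_since: int = 0  # items handed out since that snapshot

    def next(self) -> int:
        item = self.data[self.position]
        self.position += 1
        self.yielded_since += 1
        if self.yielded_since == self.snapshot_frequency:
            # Take a new snapshot and restart the counter.
            self.snapshot = self.position
            self.yielded_since = 0
        return item

    def state(self) -> Dict[str, int]:
        return {"snapshot": self.snapshot, "yielded_since": self.yielded_since}

    def resume(self, state: Dict[str, int]) -> int:
        """Restore the snapshot, replay skipped items, return the replay count."""
        self.position = state["snapshot"]
        self.snapshot = state["snapshot"]
        self.yielded_since = 0
        for _ in range(state["yielded_since"]):
            self.next()  # replayed items are discarded
        return state["yielded_since"]


reader = SnapshottingReader(list(range(100)), snapshot_frequency=4)
for _ in range(10):
    reader.next()
saved = reader.state()

restored = SnapshottingReader(list(range(100)), snapshot_frequency=4)
replayed = restored.resume(saved)
print(saved, replayed, restored.next())
```

In this model the replay cost on resume is at most snapshot_frequency - 1 items, which is the quantity to weigh against per-item snapshot overhead when choosing a value.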