device

optimus_dl.core.device

Device and distributed setup utilities.

This module provides functions for automatically detecting and setting up the best available compute device (CUDA, MPS, XPU, or CPU) and initializing distributed training collectives.

DeviceSetup

Bases: NamedTuple

Container for device and collective setup results.

Attributes:

    Name        Type          Description
    device      torch.device  The PyTorch device to use for computation.
    collective  Any           The distributed collective object for multi-GPU/multi-node training.

Parameters:

    Name        Type          Description                                                            Default
    device      torch.device  The PyTorch device to use for computation.                             None
    collective  Any           The distributed collective object for multi-GPU/multi-node training.   None
Source code in optimus_dl/core/device.py
class DeviceSetup(NamedTuple):
    """Container for device and collective setup results.

    Attributes:
        device: The PyTorch device to use for computation.
        collective: The distributed collective object for multi-GPU/multi-node training.
    """

    device: torch.device
    collective: Any
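
Because DeviceSetup is a NamedTuple, its fields support both attribute access and tuple unpacking. A minimal sketch (the None collective below is a placeholder for illustration only; real code obtains a collective from setup_device_and_collective):

```python
import torch

from optimus_dl.core.device import DeviceSetup

# Placeholder collective for illustration only; in practice the collective
# comes from setup_device_and_collective().
setup = DeviceSetup(device=torch.device("cpu"), collective=None)

print(setup.device)           # attribute access: cpu
device, collective = setup    # tuple unpacking also works
```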

get_best_device()

Detect and return the best available compute device.

Checks for available devices in order of preference:

1. CUDA (NVIDIA GPUs)
2. MPS (Apple Silicon GPUs)
3. XPU (Intel GPUs)
4. CPU (fallback)

Returns:

    Type          Description
    torch.device  The best available torch.device. Always returns a valid device,
                  defaulting to CPU if no accelerators are available.

Example
device = get_best_device()
print(device)  # cuda, mps, xpu, or cpu
Source code in optimus_dl/core/device.py
def get_best_device() -> torch.device:
    """Detect and return the best available compute device.

    Checks for available devices in order of preference:
    1. CUDA (NVIDIA GPUs)
    2. MPS (Apple Silicon GPUs)
    3. XPU (Intel GPUs)
    4. CPU (fallback)

    Returns:
        The best available torch.device. Always returns a valid device,
        defaulting to CPU if no accelerators are available.

    Example:
        ```python
        device = get_best_device()
        print(device)  # cuda, mps, xpu, or cpu
        ```
    """
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.mps.is_available():
        return torch.device("mps")
    if torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")

setup_device_and_collective(use_gpu, config)

Set up the compute device and distributed training collective.

This function initializes the training environment by:

1. Selecting the appropriate compute device (GPU or CPU)
2. Setting up distributed communication if multiple devices are available
3. Returning both the device and collective for use in training

Parameters:

    Name     Type               Description                                                          Default
    use_gpu  bool               If True, attempts to use GPU if available. If False, uses CPU.       required
    config   DistributedConfig  Distributed configuration specifying how to set up multi-GPU mesh.   required

Returns:

    Type         Description
    DeviceSetup  DeviceSetup namedtuple containing:
                 • device: The PyTorch device to use for computation
                 • collective: Distributed collective object for multi-GPU coordination
Example
from optimus_dl.modules.distributed.config import DistributedConfig
config = DistributedConfig()
setup = setup_device_and_collective(use_gpu=True, config=config)
model = model.to(setup.device)
# Use setup.collective for distributed operations
Source code in optimus_dl/core/device.py
def setup_device_and_collective(
    use_gpu: bool, config: DistributedConfig
) -> DeviceSetup:
    """Setup compute device and distributed training collective.

    This function initializes the training environment by:
    1. Selecting the appropriate compute device (GPU or CPU)
    2. Setting up distributed communication if multiple devices are available
    3. Returning both the device and collective for use in training

    Args:
        use_gpu: If True, attempts to use GPU if available. If False, uses CPU.
        config: Distributed configuration specifying how to set up multi-GPU mesh.

    Returns:
        DeviceSetup namedtuple containing:

        - device: The PyTorch device to use for computation
        - collective: Distributed collective object for multi-GPU coordination

    Example:
        ```python
        from optimus_dl.modules.distributed.config import DistributedConfig
        config = DistributedConfig()
        setup = setup_device_and_collective(use_gpu=True, config=config)
        model = model.to(setup.device)
        # Use setup.collective for distributed operations
        ```
    """
    from optimus_dl.modules.distributed import build_best_collective

    device = torch.device("cpu")
    if use_gpu:
        device = get_best_device()
    collective = build_best_collective(config=config, device=device)
    device = collective.default_device
    return DeviceSetup(device=device, collective=collective)
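
Note that the returned device is taken from collective.default_device rather than from get_best_device() directly, so in a multi-GPU run it may carry a rank-specific index (for example cuda:1) instead of the bare cuda. A minimal single-process sketch, assuming the default DistributedConfig() describes a single-rank setup:

```python
import torch

from optimus_dl.core.device import setup_device_and_collective
from optimus_dl.modules.distributed.config import DistributedConfig

setup = setup_device_and_collective(use_gpu=True, config=DistributedConfig())

# DeviceSetup unpacks like a plain tuple
device, collective = setup
model = torch.nn.Linear(8, 8).to(device)
```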