optimus_dl.modules.metrics.common

Common meter implementations and logging utilities.

This module provides standard meter types (averages, sums, frequencies, etc.) and convenience functions for logging meter values during training. All meters support distributed aggregation and checkpointing.

AverageMeter

Bases: BaseMeter

Meter that computes a weighted average of logged values.

This meter accumulates weighted values and computes the average when compute() is called. Useful for values like loss, accuracy, etc. that should be averaged over batches.

Attributes:

Name Type Description
round

Number of decimal places to round the result to (None = no rounding).

sum

Accumulated sum of (value * weight).

count

Accumulated sum of weights.

Example
meter = AverageMeter(round=4)
meter.log(value=0.5, weight=32)  # Batch size 32
meter.log(value=0.6, weight=32)
meter.compute()  # (0.5*32 + 0.6*32) / (32+32) = 0.55
Source code in optimus_dl/modules/metrics/common.py
class AverageMeter(BaseMeter):
    """Meter that computes a weighted average of logged values.

    This meter accumulates weighted values and computes the average when
    `compute()` is called. Useful for values like loss, accuracy, etc.
    that should be averaged over batches.

    Attributes:
        round: Number of decimal places to round the result to (None = no rounding).
        sum: Accumulated sum of (value * weight).
        count: Accumulated sum of weights.

    Example:
        ```python
        meter = AverageMeter(round=4)
        meter.log(value=0.5, weight=32)  # Batch size 32
        meter.log(value=0.6, weight=32)
        meter.compute()  # (0.5*32 + 0.6*32) / (32+32) = 0.55

        ```"""

    def __init__(self, round: int | None = None):
        """Initialize the average meter.

        Args:
            round: Number of decimal places to round results to. If None,
                results are not rounded.
        """
        self.round = round
        self.sum = 0
        self.count = 0

    def compute(self) -> float | int:
        """Compute the weighted average.

        Returns:
            Weighted average of all logged values, optionally rounded.
            Returns 0 if the accumulated weight is 0 (for example, nothing logged yet).
        """
        if self.count == 0:
            return 0
        return safe_round(self.sum / self.count, self.round)

    def log(self, value: float | int, weight: float | int) -> None:
        """Log a value with an associated weight.

        Args:
            value: The value to add to the average.
            weight: The weight for this value (typically batch size).
        """
        self.sum += value * weight
        self.count += weight

    def merge(self, other_state: dict[str, Any]) -> None:
        """Merge state from another meter instance (for distributed aggregation).

        Args:
            other_state: State dictionary from another AverageMeter instance.
        """
        self.sum += other_state["sum"]
        self.count += other_state["count"]

__init__(round=None)

Initialize the average meter.

Parameters:

Name Type Description Default
round int | None

Number of decimal places to round results to. If None, results are not rounded.

None
Source code in optimus_dl/modules/metrics/common.py
def __init__(self, round: int | None = None):
    """Initialize the average meter.

    Args:
        round: Number of decimal places to round results to. If None,
            results are not rounded.
    """
    self.round = round
    self.sum = 0
    self.count = 0

compute()

Compute the weighted average.

Returns:

Type Description
float | int

Weighted average of all logged values, optionally rounded. Returns 0 if the accumulated weight is 0 (for example, nothing logged yet).

Source code in optimus_dl/modules/metrics/common.py
def compute(self) -> float | int:
    """Compute the weighted average.

    Returns:
        Weighted average of all logged values, optionally rounded.
        Returns 0 if the accumulated weight is 0 (for example, nothing logged yet).
    """
    if self.count == 0:
        return 0
    return safe_round(self.sum / self.count, self.round)

log(value, weight)

Log a value with an associated weight.

Parameters:

Name Type Description Default
value float | int

The value to add to the average.

required
weight float | int

The weight for this value (typically batch size).

required
Source code in optimus_dl/modules/metrics/common.py
def log(self, value: float | int, weight: float | int) -> None:
    """Log a value with an associated weight.

    Args:
        value: The value to add to the average.
        weight: The weight for this value (typically batch size).
    """
    self.sum += value * weight
    self.count += weight

merge(other_state)

Merge state from another meter instance (for distributed aggregation).

Parameters:

Name Type Description Default
other_state dict[str, Any]

State dictionary from another AverageMeter instance.

required
Source code in optimus_dl/modules/metrics/common.py
def merge(self, other_state: dict[str, Any]) -> None:
    """Merge state from another meter instance (for distributed aggregation).

    Args:
        other_state: State dictionary from another AverageMeter instance.
    """
    self.sum += other_state["sum"]
    self.count += other_state["count"]
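
The merge step above can be illustrated with a minimal sketch. `AverageMeterSketch` is a hypothetical stand-in that copies only the arithmetic of the listing (no `BaseMeter`, no rounding), and the state layout passed to `merge()` is assumed from the two keys that `merge()` reads. It shows why merging sums and counts gives a different answer from averaging the per-worker averages when batch sizes differ.

```python
from typing import Any


class AverageMeterSketch:
    """Hypothetical stand-in mirroring the arithmetic of the listing above."""

    def __init__(self) -> None:
        self.sum = 0.0
        self.count = 0.0

    def log(self, value: float, weight: float) -> None:
        self.sum += value * weight
        self.count += weight

    def merge(self, other_state: dict[str, Any]) -> None:
        self.sum += other_state["sum"]
        self.count += other_state["count"]

    def compute(self) -> float:
        return 0 if self.count == 0 else self.sum / self.count


# Two hypothetical workers with different batch sizes.
rank0 = AverageMeterSketch()
rank0.log(value=0.5, weight=32)
rank1 = AverageMeterSketch()
rank1.log(value=0.8, weight=16)

# Assumed state layout: exactly the keys that merge() reads.
rank0.merge({"sum": rank1.sum, "count": rank1.count})

merged = rank0.compute()   # (0.5*32 + 0.8*16) / (32 + 16)
naive = (0.5 + 0.8) / 2    # unweighted mean of the per-worker averages
```

Because the worker with the larger batch carries more weight, `merged` is pulled toward 0.5, while `naive` treats both workers equally.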

AveragedExponentMeter

Bases: BaseMeter

Meter that computes the exponent of a weighted average.

Commonly used for computing perplexity (exp(loss)).

Attributes:

Name Type Description
_internal

An AverageMeter instance used to compute the weighted average.

round

Number of decimal places to round the final result to.

Source code in optimus_dl/modules/metrics/common.py
class AveragedExponentMeter(BaseMeter):
    """Meter that computes the exponent of a weighted average.

    Commonly used for computing perplexity (exp(loss)).

    Attributes:
        _internal: An `AverageMeter` instance used to compute the weighted average.
        round: Number of decimal places to round the final result to.
    """

    def __init__(self, round: int | None = None):
        """Initialize the averaged exponent meter.

        Args:
            round: Number of decimal places to round results to. If None,
                results are not rounded.
        """
        self._internal = AverageMeter()
        self.round = round

    def log(self, value: float | int, weight: float | int):
        """Log a log-scale value with its weight to the internal average meter.

        Args:
            value: The log-scale value to add.
            weight: The weight for this value.
        """
        self._internal.log(value, weight)

    def compute(self) -> float | int:
        """Return the exponent of the average.

        Computes the weighted average of logged values using the internal
        `AverageMeter` and then returns `exp()` of that average.

        Returns:
            The exponent of the weighted average, optionally rounded.
        """
        return safe_round(np.exp(self._internal.compute()), self.round)

    def merge(self, other_state: dict[str, Any]):
        """Merge state from another AveragedExponentMeter.

        Args:
            other_state: State dictionary from another AveragedExponentMeter instance.
        """
        self._internal.merge(other_state["internal"])

    def load_state_dict(self, state_dict: dict[str, Any]):
        """Restore state.

        Args:
            state_dict: State dictionary to restore from.
        """
        self._internal.load_state_dict(state_dict["internal"])
        self.round = state_dict["round"]

    def state_dict(self) -> dict[str, Any]:
        """Collect state for checkpointing.

        Returns:
            A dictionary containing the internal state of the `AverageMeter`
            and the rounding precision.
        """
        return {
            "internal": self._internal.state_dict(),
            "round": self.round,
        }

__init__(round=None)

Initialize the averaged exponent meter.

Parameters:

Name Type Description Default
round int | None

Number of decimal places to round results to. If None, results are not rounded.

None
Source code in optimus_dl/modules/metrics/common.py
def __init__(self, round: int | None = None):
    """Initialize the averaged exponent meter.

    Args:
        round: Number of decimal places to round results to. If None,
            results are not rounded.
    """
    self._internal = AverageMeter()
    self.round = round

compute()

Return the exponent of the average.

Computes the weighted average of logged values using the internal AverageMeter and then returns exp() of that average.

Returns:

Type Description
float | int

The exponent of the weighted average, optionally rounded.

Source code in optimus_dl/modules/metrics/common.py
def compute(self) -> float | int:
    """Return the exponent of the average.

    Computes the weighted average of logged values using the internal
    `AverageMeter` and then returns `exp()` of that average.

    Returns:
        The exponent of the weighted average, optionally rounded.
    """
    return safe_round(np.exp(self._internal.compute()), self.round)
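
As a worked illustration of "exponent of the average", the sketch below computes a perplexity-style value by hand with the standard library. The batch losses and token counts are made-up numbers, and `math.exp` stands in for the `np.exp` call in the listing. It also contrasts the result with the average of exponents, which is a different (and never smaller) quantity.

```python
import math

# Hypothetical (mean loss in nats, token count) pairs for two batches.
batches = [(2.0, 100), (3.0, 300)]

weighted_sum = sum(loss * tokens for loss, tokens in batches)
total_tokens = sum(tokens for _, tokens in batches)
mean_loss = weighted_sum / total_tokens   # (200 + 900) / 400 = 2.75

# What the meter reports: exp() applied once, to the weighted average.
perplexity = math.exp(mean_loss)

# For contrast only: the weighted average of per-batch exp(loss).
mean_of_exps = sum(math.exp(loss) * tokens for loss, tokens in batches) / total_tokens
```

Logging raw losses and exponentiating at the end keeps the internal state mergeable across workers, since sums and weights add while exponentials of averages do not.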

load_state_dict(state_dict)

Restore state.

Parameters:

Name Type Description Default
state_dict dict[str, Any]

State dictionary to restore from.

required
Source code in optimus_dl/modules/metrics/common.py
def load_state_dict(self, state_dict: dict[str, Any]):
    """Restore state.

    Args:
        state_dict: State dictionary to restore from.
    """
    self._internal.load_state_dict(state_dict["internal"])
    self.round = state_dict["round"]

log(value, weight)

Log a log-scale value with its weight to the internal average meter.

Parameters:

Name Type Description Default
value float | int

The log-scale value to add.

required
weight float | int

The weight for this value.

required
Source code in optimus_dl/modules/metrics/common.py
def log(self, value: float | int, weight: float | int):
    """Log a log-scale value with its weight to the internal average meter.

    Args:
        value: The log-scale value to add.
        weight: The weight for this value.
    """
    self._internal.log(value, weight)

merge(other_state)

Merge state from another AveragedExponentMeter.

Parameters:

Name Type Description Default
other_state dict[str, Any]

State dictionary from another AveragedExponentMeter instance.

required
Source code in optimus_dl/modules/metrics/common.py
def merge(self, other_state: dict[str, Any]):
    """Merge state from another AveragedExponentMeter.

    Args:
        other_state: State dictionary from another AveragedExponentMeter instance.
    """
    self._internal.merge(other_state["internal"])

state_dict()

Collect state for checkpointing.

Returns:

Type Description
dict[str, Any]

A dictionary containing the internal state of the AverageMeter and the rounding precision.

Source code in optimus_dl/modules/metrics/common.py
def state_dict(self) -> dict[str, Any]:
    """Collect state for checkpointing.

    Returns:
        A dictionary containing the internal state of the `AverageMeter`
        and the rounding precision.
    """
    return {
        "internal": self._internal.state_dict(),
        "round": self.round,
    }

CachedLambda

Wrapper that caches the result of a callable function.

This is useful for expensive computations that are used multiple times in meter logging. The function is only called once, and subsequent calls return the cached result.

Example
# Expensive computation
def compute_expensive_value():
    return complex_calculation()

cached = CachedLambda(compute_expensive_value)
value1 = cached()  # Computes and caches
value2 = cached()  # Returns cached value
Source code in optimus_dl/modules/metrics/common.py
class CachedLambda:
    """Wrapper that caches the result of a callable function.

    This is useful for expensive computations that are used multiple times
    in meter logging. The function is only called once, and subsequent calls
    return the cached result.

    Example:
        ```python
        # Expensive computation
        def compute_expensive_value():
            return complex_calculation()

        cached = CachedLambda(compute_expensive_value)
        value1 = cached()  # Computes and caches
        value2 = cached()  # Returns cached value

        ```"""

    def __init__(self, func: Callable[[], Any]):
        """Initialize the cached lambda.

        Args:
            func: Callable function that takes no arguments and returns a value.
        """
        self._func = func
        self._cache = None
        self._cached = False

    def __call__(self) -> Any:
        """Call the function, caching the result.

        Returns:
            The result of the function call. On first call, the function is
            executed and the result is cached. On subsequent calls, the cached value
            is returned.
        """
        if not self._cached:
            self._cache = self._func()
            self._cached = True
        return self._cache

__call__()

Call the function, caching the result.

Returns:

Type Description
Any

The result of the function call. On first call, the function is executed and the result is cached. On subsequent calls, the cached value is returned.

Source code in optimus_dl/modules/metrics/common.py
def __call__(self) -> Any:
    """Call the function, caching the result.

    Returns:
        The result of the function call. On first call, the function is
        executed and the result is cached. On subsequent calls, the cached value
        is returned.
    """
    if not self._cached:
        self._cache = self._func()
        self._cached = True
    return self._cache
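
One detail of the listing worth spelling out is the separate `_cached` flag: it lets a result of `None` be cached like any other value. The sketch below uses a hypothetical stand-in class, `CachedLambdaSketch`, with the same logic, and a made-up function that records how often it runs.

```python
from typing import Any, Callable


class CachedLambdaSketch:
    """Hypothetical stand-in with the same caching logic as the listing above."""

    def __init__(self, func: Callable[[], Any]) -> None:
        self._func = func
        self._cache: Any = None
        self._cached = False

    def __call__(self) -> Any:
        if not self._cached:
            self._cache = self._func()
            self._cached = True
        return self._cache


calls: list[int] = []


def expensive_none() -> None:
    """Made-up expensive function whose legitimate result is None."""
    calls.append(1)
    return None


cached = CachedLambdaSketch(expensive_none)
first = cached()    # runs expensive_none()
second = cached()   # served from the cache, even though the value is None
call_count = len(calls)
```

A check such as `if self._cache is None` would re-run the function on every call in this case; the boolean flag avoids that.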

__init__(func)

Initialize the cached lambda.

Parameters:

Name Type Description Default
func Callable[[], Any]

Callable function that takes no arguments and returns a value.

required
Source code in optimus_dl/modules/metrics/common.py
def __init__(self, func: Callable[[], Any]):
    """Initialize the cached lambda.

    Args:
        func: Callable function that takes no arguments and returns a value.
    """
    self._func = func
    self._cache = None
    self._cached = False

FrequencyMeter

Bases: BaseMeter

Meter that computes the average interval between successive occurrences of an event (time per call in milliseconds, not calls per second).

Measures time between successive calls to log().

Attributes:

Name Type Description
round

Rounding precision for the result.

start int | None

Internal timestamp of the last call to log().

elapsed int

Total elapsed time in nanoseconds.

counter int

Number of events recorded.

Source code in optimus_dl/modules/metrics/common.py
class FrequencyMeter(BaseMeter):
    """Meter that computes the average interval between successive occurrences of an event (time per call in milliseconds, not calls per second).

    Measures time between successive calls to `log()`.

    Attributes:
        round: Rounding precision for the result.
        start: Internal timestamp of the last call to `log()`.
        elapsed: Total elapsed time in nanoseconds.
        counter: Number of events recorded.
    """

    def __init__(self, round: int | None = None):
        """Initialize the frequency meter.

        Args:
            round: Number of decimal places to round results to. If None,
                results are not rounded.
        """
        self.round = round
        self.start: int | None = None
        self.elapsed: int = 0
        self.counter: int = 0

    def log(self):
        """Record an occurrence and compute elapsed time since the last call.

        If this is the first call, it initializes the start time.
        Subsequent calls update the elapsed time and increment the counter.
        """
        if self.start is None:
            self.start = time.perf_counter_ns()
            return
        self.counter += 1
        self.elapsed += time.perf_counter_ns() - self.start
        self.start = time.perf_counter_ns()

    def compute(self) -> float | int | dict[str, float | int]:
        """Compute average time per occurrence in milliseconds.

        Returns:
            The average time per occurrence in milliseconds, optionally rounded.
            Returns 0 if no events have been recorded.
        """
        if self.counter == 0:
            return 0
        return safe_round(self.elapsed / self.counter / 1e6, self.round)

    def merge(self, other_state: dict[str, Any]):
        """Merge state from another FrequencyMeter.

        Args:
            other_state: State dictionary from another FrequencyMeter instance.
        """
        self.elapsed += other_state["elapsed"]
        self.counter += other_state["counter"]

    def load_state_dict(self, state_dict: dict[str, Any]):
        """Restore state and reset the start timer.

        Args:
            state_dict: State dictionary to restore from.
        """
        super().load_state_dict(state_dict)
        self.start = None  # Reset start timer on load to avoid inaccurate timing across checkpoints

__init__(round=None)

Initialize the frequency meter.

Parameters:

Name Type Description Default
round int | None

Number of decimal places to round results to. If None, results are not rounded.

None
Source code in optimus_dl/modules/metrics/common.py
def __init__(self, round: int | None = None):
    """Initialize the frequency meter.

    Args:
        round: Number of decimal places to round results to. If None,
            results are not rounded.
    """
    self.round = round
    self.start: int | None = None
    self.elapsed: int = 0
    self.counter: int = 0

compute()

Compute average time per occurrence in milliseconds.

Returns:

Type Description
float | int | dict[str, float | int]

The average time per occurrence in milliseconds, optionally rounded. Returns 0 if no events have been recorded.

Source code in optimus_dl/modules/metrics/common.py
def compute(self) -> float | int | dict[str, float | int]:
    """Compute average time per occurrence in milliseconds.

    Returns:
        The average time per occurrence in milliseconds, optionally rounded.
        Returns 0 if no events have been recorded.
    """
    if self.counter == 0:
        return 0
    return safe_round(self.elapsed / self.counter / 1e6, self.round)

load_state_dict(state_dict)

Restore state and reset the start timer.

Parameters:

Name Type Description Default
state_dict dict[str, Any]

State dictionary to restore from.

required
Source code in optimus_dl/modules/metrics/common.py
def load_state_dict(self, state_dict: dict[str, Any]):
    """Restore state and reset the start timer.

    Args:
        state_dict: State dictionary to restore from.
    """
    super().load_state_dict(state_dict)
    self.start = None  # Reset start timer on load to avoid inaccurate timing across checkpoints

log()

Record an occurrence and compute elapsed time since the last call.

If this is the first call, it initializes the start time. Subsequent calls update the elapsed time and increment the counter.

Source code in optimus_dl/modules/metrics/common.py
def log(self):
    """Record an occurrence and compute elapsed time since the last call.

    If this is the first call, it initializes the start time.
    Subsequent calls update the elapsed time and increment the counter.
    """
    if self.start is None:
        self.start = time.perf_counter_ns()
        return
    self.counter += 1
    self.elapsed += time.perf_counter_ns() - self.start
    self.start = time.perf_counter_ns()
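
To make the counting behaviour concrete: the first `log()` only sets the start time, so N calls record N - 1 intervals. The sketch below is a hypothetical stand-in, `FrequencyMeterSketch`, with two labeled deviations from the listing: the clock is injectable so the example is deterministic, and the clock is read once per call rather than twice.

```python
import time
from typing import Callable, Optional


class FrequencyMeterSketch:
    """Hypothetical stand-in; the injectable clock is an assumption for testing."""

    def __init__(self, clock: Callable[[], int] = time.perf_counter_ns) -> None:
        self._clock = clock
        self.start: Optional[int] = None
        self.elapsed = 0
        self.counter = 0

    def log(self) -> None:
        now = self._clock()  # single clock read per call (simplification)
        if self.start is None:
            self.start = now
            return
        self.counter += 1
        self.elapsed += now - self.start
        self.start = now

    def compute(self) -> float:
        if self.counter == 0:
            return 0
        return self.elapsed / self.counter / 1e6  # nanoseconds -> milliseconds


# Fake clock: events at 0 ms, 10 ms and 30 ms, expressed in nanoseconds.
ticks = iter([0, 10_000_000, 30_000_000])
meter = FrequencyMeterSketch(clock=lambda: next(ticks))
for _ in range(3):
    meter.log()

intervals = meter.counter   # 3 calls -> 2 intervals
avg_ms = meter.compute()    # (10 ms + 20 ms) / 2
```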

merge(other_state)

Merge state from another FrequencyMeter.

Parameters:

Name Type Description Default
other_state dict[str, Any]

State dictionary from another FrequencyMeter instance.

required
Source code in optimus_dl/modules/metrics/common.py
def merge(self, other_state: dict[str, Any]):
    """Merge state from another FrequencyMeter.

    Args:
        other_state: State dictionary from another FrequencyMeter instance.
    """
    self.elapsed += other_state["elapsed"]
    self.counter += other_state["counter"]

GatherMeter

Bases: BaseMeter

Accumulator that gathers all raw values across the entire dataset.

Use this for meters that require full dataset context (e.g., BLEU, ROC-AUC).

Source code in optimus_dl/modules/metrics/common.py
class GatherMeter(BaseMeter):
    """Accumulator that gathers all raw values across the entire dataset.

    Use this for meters that require full dataset context (e.g., BLEU, ROC-AUC).
    """

    def __init__(self):
        """Initializes the GatherMeter with an empty list to store values."""
        self.values: list[Any] = []

    def log(self, value: Any):
        """Logs a single value to be gathered."""
        self.values.append(value)

    def compute(self) -> list[Any]:
        """Returns the list of all gathered values."""
        return list(self.values)

    def merge(self, other_state: dict[str, Any]):
        """Merges the state from another GatherMeter instance."""
        self.values.extend(other_state["values"])

    def state_dict(self) -> dict[str, Any]:
        """Returns the state of the GatherMeter for checkpointing."""
        return {"values": self.values}

    def load_state_dict(self, state_dict: dict[str, Any]):
        """Restores the state of the GatherMeter from a state dictionary."""
        self.values = state_dict["values"]

__init__()

Initializes the GatherMeter with an empty list to store values.

Source code in optimus_dl/modules/metrics/common.py
def __init__(self):
    """Initializes the GatherMeter with an empty list to store values."""
    self.values: list[Any] = []

compute()

Returns the list of all gathered values.

Source code in optimus_dl/modules/metrics/common.py
def compute(self) -> list[Any]:
    """Returns the list of all gathered values."""
    return list(self.values)

load_state_dict(state_dict)

Restores the state of the GatherMeter from a state dictionary.

Source code in optimus_dl/modules/metrics/common.py
def load_state_dict(self, state_dict: dict[str, Any]):
    """Restores the state of the GatherMeter from a state dictionary."""
    self.values = state_dict["values"]

log(value)

Logs a single value to be gathered.

Source code in optimus_dl/modules/metrics/common.py
def log(self, value: Any):
    """Logs a single value to be gathered."""
    self.values.append(value)

merge(other_state)

Merges the state from another GatherMeter instance.

Source code in optimus_dl/modules/metrics/common.py
def merge(self, other_state: dict[str, Any]):
    """Merges the state from another GatherMeter instance."""
    self.values.extend(other_state["values"])

state_dict()

Returns the state of the GatherMeter for checkpointing.

Source code in optimus_dl/modules/metrics/common.py
def state_dict(self) -> dict[str, Any]:
    """Returns the state of the GatherMeter for checkpointing."""
    return {"values": self.values}
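
A short sketch of why some metrics need the raw values: a median over the whole dataset cannot be recovered from per-worker medians. `GatherMeterSketch` is a hypothetical stand-in with the same list handling as the listing, and the median (from the standard library) is used here only as a simple example of a full-dataset statistic.

```python
from statistics import median
from typing import Any


class GatherMeterSketch:
    """Hypothetical stand-in with the same list handling as the listing above."""

    def __init__(self) -> None:
        self.values: list[Any] = []

    def log(self, value: Any) -> None:
        self.values.append(value)

    def compute(self) -> list[Any]:
        return list(self.values)  # a copy, so callers cannot mutate the state

    def merge(self, other_state: dict[str, Any]) -> None:
        self.values.extend(other_state["values"])

    def state_dict(self) -> dict[str, Any]:
        return {"values": self.values}


rank0 = GatherMeterSketch()
rank1 = GatherMeterSketch()
for v in (1.0, 9.0):
    rank0.log(v)
for v in (2.0, 3.0, 4.0):
    rank1.log(v)

rank0.merge(rank1.state_dict())
gathered = rank0.compute()

dataset_median = median(gathered)  # median over all five values
median_of_medians = median([median([1.0, 9.0]), median([2.0, 3.0, 4.0])])
```

The trade-off is memory: the state grows with every logged value, so this kind of meter suits evaluation passes better than long training runs.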

MaxMeter

Bases: BaseMeter

Meter that tracks the maximum logged value.

This meter accumulates values and computes the maximum when compute() is called. Useful for tracking peak memory, max loss, etc.

Attributes:

Name Type Description
round

Number of decimal places to round the result to (None = no rounding).

value

Accumulated maximum value.

Example
meter = MaxMeter()
meter.log(10)
meter.log(20)
meter.log(15)
meter.compute()  # 20
Source code in optimus_dl/modules/metrics/common.py
class MaxMeter(BaseMeter):
    """Meter that tracks the maximum logged value.

    This meter accumulates values and computes the maximum when
    `compute()` is called. Useful for tracking peak memory, max loss, etc.

    Attributes:
        round: Number of decimal places to round the result to (None = no rounding).
        value: Accumulated maximum value.

    Example:
        ```python
        meter = MaxMeter()
        meter.log(10)
        meter.log(20)
        meter.log(15)
        meter.compute()  # 20

        ```"""

    def __init__(self, round: int | None = None):
        """Initialize the max meter.

        Args:
            round: Number of decimal places to round results to. If None,
                results are not rounded.
        """
        self.round = round
        self.value = -float("inf")

    def compute(self) -> float | int:
        """Compute the maximum value.

        Returns:
            Maximum of all logged values, optionally rounded.
        """
        value = get_item(self.value)
        if not np.isfinite(value):
            return value
        return safe_round(value, self.round)

    def log(self, value: float | int) -> None:
        """Log a value to be tracked for maximum.

        Args:
            value: The value to compare against the running maximum.
        """
        self.value = max(self.value, get_item(value))

    def merge(self, other_state: dict[str, Any]) -> None:
        """Merge state from another meter instance (for distributed aggregation).

        Args:
            other_state: State dictionary from another MaxMeter instance.
        """
        self.value = max(self.value, get_item(other_state["value"]))

__init__(round=None)

Initialize the max meter.

Parameters:

Name Type Description Default
round int | None

Number of decimal places to round results to. If None, results are not rounded.

None
Source code in optimus_dl/modules/metrics/common.py
def __init__(self, round: int | None = None):
    """Initialize the max meter.

    Args:
        round: Number of decimal places to round results to. If None,
            results are not rounded.
    """
    self.round = round
    self.value = -float("inf")

compute()

Compute the maximum value.

Returns:

Type Description
float | int

Maximum of all logged values, optionally rounded.

Source code in optimus_dl/modules/metrics/common.py
def compute(self) -> float | int:
    """Compute the maximum value.

    Returns:
        Maximum of all logged values, optionally rounded.
    """
    value = get_item(self.value)
    if not np.isfinite(value):
        return value
    return safe_round(value, self.round)

log(value)

Log a value to be tracked for maximum.

Parameters:

Name Type Description Default
value float | int

The value to compare against the running maximum.

required
Source code in optimus_dl/modules/metrics/common.py
def log(self, value: float | int) -> None:
    """Log a value to be tracked for maximum.

    Args:
        value: The value to compare against the running maximum.
    """
    self.value = max(self.value, get_item(value))

merge(other_state)

Merge state from another meter instance (for distributed aggregation).

Parameters:

Name Type Description Default
other_state dict[str, Any]

State dictionary from another MaxMeter instance.

required
Source code in optimus_dl/modules/metrics/common.py
def merge(self, other_state: dict[str, Any]) -> None:
    """Merge state from another meter instance (for distributed aggregation).

    Args:
        other_state: State dictionary from another MaxMeter instance.
    """
    self.value = max(self.value, get_item(other_state["value"]))
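
The initial value of `-inf` has two consequences that a small sketch can show: a meter that never logged anything reports `-inf`, and merging such a meter leaves the result unchanged. `MaxMeterSketch` is a hypothetical stand-in that omits `get_item` and `safe_round`, which are not shown on this page.

```python
import math
from typing import Any


class MaxMeterSketch:
    """Hypothetical stand-in; omits get_item() and rounding from the listing."""

    def __init__(self) -> None:
        self.value = -float("inf")

    def log(self, value: float) -> None:
        self.value = max(self.value, value)

    def merge(self, other_state: dict[str, Any]) -> None:
        self.value = max(self.value, other_state["value"])

    def compute(self) -> float:
        return self.value


empty_result = MaxMeterSketch().compute()  # nothing logged yet

rank0 = MaxMeterSketch()
rank0.log(10)
rank0.log(20)
rank1 = MaxMeterSketch()
rank1.log(35)
idle = MaxMeterSketch()  # a worker that never logged

# Assumed state layout: the single key that merge() reads.
rank0.merge({"value": rank1.value})
rank0.merge({"value": idle.value})  # -inf is the identity element for max
peak = rank0.compute()
```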

MinMeter

Bases: MaxMeter

Meter that tracks the minimum logged value.

This meter accumulates values and computes the minimum when compute() is called. Useful for tracking minimum learning rate, etc.

Example
meter = MinMeter()
meter.log(10)
meter.log(5)
meter.log(15)
meter.compute()  # 5
Source code in optimus_dl/modules/metrics/common.py
class MinMeter(MaxMeter):
    """Meter that tracks the minimum logged value.

    This meter accumulates values and computes the minimum when
    `compute()` is called. Useful for tracking minimum learning rate, etc.

    Example:
        ```python
        meter = MinMeter()
        meter.log(10)
        meter.log(5)
        meter.log(15)
        meter.compute()  # 5

        ```"""

    def compute(self) -> float | int:
        """Compute the minimum value.

        Returns:
            Minimum of all logged values, optionally rounded.
        """
        return -super().compute()

    def log(self, value: float | int) -> None:
        """Log a value to be tracked for minimum.

        Args:
            value: The value to add to the meter.
        """
        super().log(-value)

compute()

Compute the minimum value.

Returns:

Type Description
float | int

Minimum of all logged values, optionally rounded.

Source code in optimus_dl/modules/metrics/common.py
def compute(self) -> float | int:
    """Compute the minimum value.

    Returns:
        Minimum of all logged values, optionally rounded.
    """
    return -super().compute()

log(value)

Log a value to be tracked for minimum.

Parameters:

Name Type Description Default
value float | int

The value to add to the meter.

required
Source code in optimus_dl/modules/metrics/common.py
def log(self, value: float | int) -> None:
    """Log a value to be tracked for minimum.

    Args:
        value: The value to add to the meter.
    """
    super().log(-value)
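
The listing implements the minimum by negation: values are stored as their negatives, the inherited max logic runs unchanged, and `compute()` flips the sign back. The sketch below uses hypothetical stand-ins (`MaxMeterSketch`, `MinMeterSketch`) without `get_item` or rounding to show that the stored state is negated and that the inherited `merge()` still yields the global minimum.

```python
from typing import Any


class MaxMeterSketch:
    """Hypothetical stand-in for the parent class (no get_item, no rounding)."""

    def __init__(self) -> None:
        self.value = -float("inf")

    def log(self, value: float) -> None:
        self.value = max(self.value, value)

    def merge(self, other_state: dict[str, Any]) -> None:
        self.value = max(self.value, other_state["value"])

    def compute(self) -> float:
        return self.value


class MinMeterSketch(MaxMeterSketch):
    """min(x) == -max(-x): store negated values, negate again on read."""

    def log(self, value: float) -> None:
        super().log(-value)

    def compute(self) -> float:
        return -super().compute()


rank0 = MinMeterSketch()
for v in (10, 5, 15):
    rank0.log(v)
rank1 = MinMeterSketch()
rank1.log(3)

stored = rank0.value                  # internal state holds the negated minimum
rank0.merge({"value": rank1.value})   # inherited max-merge on negated values
lowest = rank0.compute()
```

One practical consequence is that the raw `value` in a saved state is the negated minimum, so it should be read through `compute()` rather than directly.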

StopwatchMeter

Bases: BaseMeter

Meter that acts as a manual stopwatch for measuring event durations.

Expects pairs of log(mode="start") and log(mode="end") calls.

Attributes:

Name Type Description
round

Rounding precision for the result.

_start int | None

Internal timestamp of when the stopwatch was started.

elapsed int

Total elapsed time in nanoseconds across all recorded intervals.

counter int

Number of completed intervals.

Source code in optimus_dl/modules/metrics/common.py
class StopwatchMeter(BaseMeter):
    """Meter that acts as a manual stopwatch for measuring event durations.

    Expects pairs of `log(mode="start")` and `log(mode="end")` calls.

    Attributes:
        round: Rounding precision for the result.
        _start: Internal timestamp of when the stopwatch was started.
        elapsed: Total elapsed time in nanoseconds across all recorded intervals.
        counter: Number of completed intervals.
    """

    def __init__(self, round: int | None = None):
        """Initialize the stopwatch meter.

        Args:
            round: Number of decimal places to round results to. If None,
                results are not rounded.
        """
        self.round = round
        self._start: int | None = None  # time.perf_counter_ns() returns an int
        self.elapsed: int = 0
        self.counter: int = 0

    def log(self, mode: str):
        """Start or stop the timer.

        Args:
            mode: "start" to begin timing, "end" to stop and record duration.

        Raises:
            AssertionError: If an unknown mode is provided.
        """
        if mode == "start":
            self.start()
        elif mode == "end":
            self.end()
        else:
            raise AssertionError("Unknown mode")

    def start(self):
        """Start the timer."""
        self._start = time.perf_counter_ns()

    def end(self):
        """Stop the timer and record the duration.

        Raises:
            AssertionError: If `start()` was not called before `end()`.
        """
        assert self._start is not None, "Stopwatch was never started"
        self.elapsed += time.perf_counter_ns() - self._start
        self.counter += 1
        self._start = None

    def compute(self) -> float | int | dict[str, float | int]:
        """Compute average duration in milliseconds.

        Returns:
            The average duration in milliseconds, optionally rounded.
            Returns 0 if no intervals have been recorded.
        """
        if self.counter == 0:
            return 0
        return safe_round(self.elapsed / self.counter / 1e6, self.round)

    def merge(self, other_state: dict[str, Any]):
        """Merge state from another StopwatchMeter.

        Args:
            other_state: State dictionary from another StopwatchMeter instance.
        """
        self.elapsed += other_state["elapsed"]
        self.counter += other_state["counter"]

    def load_state_dict(self, state_dict: dict[str, Any]):
        """Restore state and reset current timer.

        Args:
            state_dict: State dictionary to restore from.
        """
        super().load_state_dict(state_dict)
        self._start = None  # Reset current timer on load to avoid inaccurate timing across checkpoints

__init__(round=None)

Initialize the stopwatch meter.

Parameters:

Name Type Description Default
round int | None

Number of decimal places to round results to. If None, results are not rounded.

None
Source code in optimus_dl/modules/metrics/common.py
def __init__(self, round: int | None = None):
    """Initialize the stopwatch meter.

    Args:
        round: Number of decimal places to round results to. If None,
            results are not rounded.
    """
    self.round = round
    self._start: int | None = None  # perf_counter_ns() timestamp, in nanoseconds
    self.elapsed: int = 0
    self.counter: int = 0

compute()

Compute average duration in milliseconds.

Returns:

Type Description
float | int | dict[str, float | int]

The average duration in milliseconds, optionally rounded. Returns 0 if no intervals have been recorded.

Source code in optimus_dl/modules/metrics/common.py
def compute(self) -> float | int | dict[str, float | int]:
    """Compute average duration in milliseconds.

    Returns:
        The average duration in milliseconds, optionally rounded.
        Returns 0 if no intervals have been recorded.
    """
    if self.counter == 0:
        return 0
    return safe_round(self.elapsed / self.counter / 1e6, self.round)
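
The unit conversion in `compute()` can be checked by hand: `elapsed` is accumulated in nanoseconds via `time.perf_counter_ns()`, so dividing by the interval count and then by 1e6 gives milliseconds per interval. The sketch below reproduces only that arithmetic. `average_duration_ms` is a hypothetical helper written for illustration, not part of the module, and it skips the rounding step.

```python
def average_duration_ms(elapsed_ns: int, counter: int) -> float:
    """Illustrative copy of the arithmetic in StopwatchMeter.compute()."""
    if counter == 0:
        # No completed intervals: avoid dividing by zero.
        # (The original method returns the integer 0 here.)
        return 0.0
    return elapsed_ns / counter / 1e6  # ns per interval -> ms per interval


# Three intervals totalling 7_500_000 ns (7.5 ms) average 2.5 ms each.
print(average_duration_ms(7_500_000, 3))  # 2.5
print(average_duration_ms(0, 0))          # 0.0
```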

end()

Stop the timer and record the duration.

Raises:

Type Description
AssertionError

If start() was not called before end().

Source code in optimus_dl/modules/metrics/common.py
def end(self):
    """Stop the timer and record the duration.

    Raises:
        AssertionError: If `start()` was not called before `end()`.
    """
    assert self._start is not None, "Stopwatch was never started"
    self.elapsed += time.perf_counter_ns() - self._start
    self.counter += 1
    self._start = None

load_state_dict(state_dict)

Restore state and reset current timer.

Parameters:

Name Type Description Default
state_dict dict[str, Any]

State dictionary to restore from.

required
Source code in optimus_dl/modules/metrics/common.py
def load_state_dict(self, state_dict: dict[str, Any]):
    """Restore state and reset current timer.

    Args:
        state_dict: State dictionary to restore from.
    """
    super().load_state_dict(state_dict)
    self._start = None  # Reset current timer on load to avoid inaccurate timing across checkpoints

log(mode)

Start or stop the timer.

Parameters:

Name Type Description Default
mode str

"start" to begin timing, "end" to stop and record duration.

required

Raises:

Type Description
AssertionError

If an unknown mode is provided.

Source code in optimus_dl/modules/metrics/common.py
def log(self, mode: str):
    """Start or stop the timer.

    Args:
        mode: "start" to begin timing, "end" to stop and record duration.

    Raises:
        AssertionError: If an unknown mode is provided.
    """
    if mode == "start":
        self.start()
    elif mode == "end":
        self.end()
    else:
        raise AssertionError("Unknown mode")

merge(other_state)

Merge state from another StopwatchMeter.

Parameters:

Name Type Description Default
other_state dict[str, Any]

State dictionary from another StopwatchMeter instance.

required
Source code in optimus_dl/modules/metrics/common.py
def merge(self, other_state: dict[str, Any]):
    """Merge state from another StopwatchMeter.

    Args:
        other_state: State dictionary from another StopwatchMeter instance.
    """
    self.elapsed += other_state["elapsed"]
    self.counter += other_state["counter"]
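
Because `merge()` adds both `elapsed` and `counter`, the merged average is pooled over all intervals rather than being a mean of per-rank averages. The following standalone sketch illustrates the difference with plain dictionaries. `merge_states` is a hypothetical helper and the state values are made up for the example.

```python
def merge_states(a: dict, b: dict) -> dict:
    """Add elapsed time and interval counts, as StopwatchMeter.merge() does."""
    return {
        "elapsed": a["elapsed"] + b["elapsed"],
        "counter": a["counter"] + b["counter"],
    }


rank0 = {"elapsed": 6_000_000, "counter": 2}  # 3.0 ms per interval
rank1 = {"elapsed": 4_000_000, "counter": 3}  # about 1.33 ms per interval

merged = merge_states(rank0, rank1)
pooled_ms = merged["elapsed"] / merged["counter"] / 1e6
print(pooled_ms)  # 2.0

# A plain mean of the two per-rank averages would ignore the interval counts.
mean_of_averages = (6_000_000 / 2 / 1e6 + 4_000_000 / 3 / 1e6) / 2
print(mean_of_averages > pooled_ms)  # True
```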

start()

Start the timer.

Source code in optimus_dl/modules/metrics/common.py
def start(self):
    """Start the timer."""
    self._start = time.perf_counter_ns()

SummedMeter

Bases: BaseMeter

Meter that sums logged values.

This meter simply accumulates values without averaging. Useful for values like total tokens processed, total examples seen, etc.

Attributes:

Name Type Description
round

Number of decimal places to round results to (None = no rounding).

sum

Accumulated sum of all logged values.

Example
meter = SummedMeter()
meter.log(100)
meter.log(200)
meter.compute()  # 300
Source code in optimus_dl/modules/metrics/common.py
class SummedMeter(BaseMeter):
    """Meter that sums logged values.

    This meter simply accumulates values without averaging. Useful for
    values like total tokens processed, total examples seen, etc.

    Attributes:
        round: Number of decimal places to round results to (None = no rounding).
        sum: Accumulated sum of all logged values.

    Example:
        ```python
        meter = SummedMeter()
        meter.log(100)
        meter.log(200)
        meter.compute()  # 300

        ```"""

    def __init__(self, round: int | None = None):
        """Initialize the summed meter.

        Args:
            round: Number of decimal places to round results to. If None,
                results are not rounded.
        """
        self.round = round
        self.sum = 0

    def compute(self) -> float | int:
        """Compute the sum.

        Returns:
            Sum of all logged values, optionally rounded.
        """
        return safe_round(self.sum, self.round)

    def log(self, value: float | int) -> None:
        """Log a value to add to the sum.

        Args:
            value: The value to add to the sum.
        """
        self.sum += value

    def merge(self, other_state: dict[str, Any]) -> None:
        """Merge state from another meter instance (for distributed aggregation).

        Args:
            other_state: State dictionary from another SummedMeter instance.
        """
        self.sum += other_state["sum"]

__init__(round=None)

Initialize the summed meter.

Parameters:

Name Type Description Default
round int | None

Number of decimal places to round results to. If None, results are not rounded.

None
Source code in optimus_dl/modules/metrics/common.py
def __init__(self, round: int | None = None):
    """Initialize the summed meter.

    Args:
        round: Number of decimal places to round results to. If None,
            results are not rounded.
    """
    self.round = round
    self.sum = 0

compute()

Compute the sum.

Returns:

Type Description
float | int

Sum of all logged values, optionally rounded.

Source code in optimus_dl/modules/metrics/common.py
def compute(self) -> float | int:
    """Compute the sum.

    Returns:
        Sum of all logged values, optionally rounded.
    """
    return safe_round(self.sum, self.round)

log(value)

Log a value to add to the sum.

Parameters:

Name Type Description Default
value float | int

The value to add to the sum.

required
Source code in optimus_dl/modules/metrics/common.py
def log(self, value: float | int) -> None:
    """Log a value to add to the sum.

    Args:
        value: The value to add to the sum.
    """
    self.sum += value

merge(other_state)

Merge state from another meter instance (for distributed aggregation).

Parameters:

Name Type Description Default
other_state dict[str, Any]

State dictionary from another SummedMeter instance.

required
Source code in optimus_dl/modules/metrics/common.py
def merge(self, other_state: dict[str, Any]) -> None:
    """Merge state from another meter instance (for distributed aggregation).

    Args:
        other_state: State dictionary from another SummedMeter instance.
    """
    self.sum += other_state["sum"]

cached_lambda(x)

Create a cached lambda wrapper.

Convenience function for creating a CachedLambda instance.

Parameters:

Name Type Description Default
x Callable[[], Any]

Callable function to cache.

required

Returns:

Type Description
CachedLambda

CachedLambda instance that caches the function's result.

Example
expensive = cached_lambda(lambda: expensive_computation())
result = expensive()  # Computes once
result = expensive()  # Uses cache
Source code in optimus_dl/modules/metrics/common.py
def cached_lambda(x: Callable[[], Any]) -> CachedLambda:
    """Create a cached lambda wrapper.

    Convenience function for creating a CachedLambda instance.

    Args:
        x: Callable function to cache.

    Returns:
        CachedLambda instance that caches the function's result.

    Example:
        ```python
        expensive = cached_lambda(lambda: expensive_computation())
        result = expensive()  # Computes once
        result = expensive()  # Uses cache

        ```"""
    return CachedLambda(x)
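
The `CachedLambda` class itself is not shown in this part of the page. As a rough illustration of the caching behaviour described above, the sketch below uses a hypothetical stand-in named `CachedThunk`; it is an assumption about the general technique, not the library's implementation.

```python
from typing import Any, Callable

_UNSET = object()  # sentinel so that a cached result of None is still cached


class CachedThunk:
    """Hypothetical stand-in: call a zero-argument function at most once."""

    def __init__(self, fn: Callable[[], Any]):
        self._fn = fn
        self._value: Any = _UNSET

    def __call__(self) -> Any:
        if self._value is _UNSET:
            self._value = self._fn()  # first call: compute and store
        return self._value            # later calls: reuse the stored result


calls = []


def expensive_computation() -> int:
    calls.append("called")
    return 42


thunk = CachedThunk(expensive_computation)
first = thunk()
second = thunk()
print(first, second, len(calls))  # 42 42 1
```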

log_averaged(name, value, weight=1.0, round=None, reset=True, priority=100)

Log a value to an averaged meter.

This is a convenience function for logging values to an AverageMeter. The value and weight can be callables (lambdas) that are only evaluated when the meter is actually logged (lazy evaluation).

Parameters:

Name Type Description Default
name str

Name of the meter (e.g., "train/loss").

required
value DelayedScalar

Value to log. Can be a number or a callable that returns a number.

required
weight DelayedScalar

Weight for this value (typically batch size). Can be a number or a callable. Defaults to 1.0.

1.0
round int | None

Number of decimal places to round the result to.

None
reset bool

If True, the meter is reset after logging (for per-iteration meters). If False, the meter accumulates across iterations.

True
priority int

Priority for meter ordering when logging. Higher priority meters appear first.

100
Example
# Log a simple value
log_averaged("train/loss", 0.5, weight=32)

# Log with lazy evaluation (only computed if meter is logged)
log_averaged("train/loss", lambda: compute_loss(), weight=lambda: batch_size)

# Log with rounding
log_averaged("train/accuracy", 0.95, round=4)
Source code in optimus_dl/modules/metrics/common.py
def log_averaged(
    name: str,
    value: DelayedScalar,
    weight: DelayedScalar = 1.0,
    round: int | None = None,
    reset: bool = True,
    priority: int = 100,
) -> None:
    """Log a value to an averaged meter.

    This is a convenience function for logging values to an `AverageMeter`.
    The value and weight can be callables (lambdas) that are only evaluated
    when the meter is actually logged (lazy evaluation).

    Args:
        name: Name of the meter (e.g., "train/loss").
        value: Value to log. Can be a number or a callable that returns a number.
        weight: Weight for this value (typically batch size). Can be a number
            or a callable. Defaults to 1.0.
        round: Number of decimal places to round the result to.
        reset: If True, the meter is reset after logging (for per-iteration meters).
            If False, the meter accumulates across iterations.
        priority: Priority for meter ordering when logging. Higher priority
            meters appear first.

    Example:
        ```python
        # Log a simple value
        log_averaged("train/loss", 0.5, weight=32)

        # Log with lazy evaluation (only computed if meter is logged)
        log_averaged("train/loss", lambda: compute_loss(), weight=lambda: batch_size)

        # Log with rounding
        log_averaged("train/accuracy", 0.95, round=4)

        ```"""
    log_meter(
        name=name,
        meter_factory=lambda: AverageMeter(round=round),
        reset=reset,
        priority=priority,
        value=value,
        weight=weight,
    )
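
To make the lazy-evaluation idea concrete, here is a minimal sketch of how a value that is either a number or a zero-argument callable could be resolved just before it is accumulated. The `resolve` helper and the accumulation loop are assumptions for illustration; how `log_meter` actually handles a `DelayedScalar` is not shown on this page.

```python
from typing import Callable, Union

Scalar = Union[int, float]
Delayed = Union[Scalar, Callable[[], Scalar]]  # stand-in for DelayedScalar


def resolve(x: Delayed) -> Scalar:
    """Call x if it is a callable, otherwise return it unchanged."""
    return x() if callable(x) else x


total = 0.0
count = 0.0
logged = [
    (0.5, 32),                    # plain numbers
    (lambda: 0.6, lambda: 32),    # evaluated only when resolve() runs
]
for value, weight in logged:
    w = resolve(weight)
    total += resolve(value) * w
    count += w

average = total / count
print(round(average, 4))  # 0.55
```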

log_averaged_exponent(name, value, weight=1.0, round=None, reset=True, priority=100)

Log a value to an averaged exponent meter.

This is a convenience function for logging values to an AveragedExponentMeter. The value and weight can be callables (lambdas) for lazy evaluation.

Parameters:

Name Type Description Default
name str

Name of the meter (e.g., "train/perplexity").

required
value DelayedScalar

Log-scale value to log. Can be a number or a callable.

required
weight DelayedScalar

Weight for this value. Can be a number or a callable. Defaults to 1.0.

1.0
round int | None

Number of decimal places to round the result to.

None
reset bool

If True, the meter is reset after logging.

True
priority int

Priority for meter ordering.

100
Source code in optimus_dl/modules/metrics/common.py
def log_averaged_exponent(
    name: str,
    value: DelayedScalar,
    weight: DelayedScalar = 1.0,
    round: int | None = None,
    reset: bool = True,
    priority: int = 100,
):
    """Log a value to an averaged exponent meter.

    This is a convenience function for logging values to an `AveragedExponentMeter`.
    The value and weight can be callables (lambdas) for lazy evaluation.

    Args:
        name: Name of the meter (e.g., "train/perplexity").
        value: Log-scale value to log. Can be a number or a callable.
        weight: Weight for this value. Can be a number or a callable. Defaults to 1.0.
        round: Number of decimal places to round the result to.
        reset: If True, the meter is reset after logging.
        priority: Priority for meter ordering.
    """
    log_meter(
        name=name,
        meter_factory=lambda: AveragedExponentMeter(round=round),
        reset=reset,
        priority=priority,
        value=value,
        weight=weight,
    )
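
`AveragedExponentMeter` is not shown in this part of the page, so the sketch below rests on an assumption drawn from its name and from the "log-scale value" wording: that the meter takes a weighted average of the logged values and then exponentiates the result. Under that assumption, logging a mean negative log-likelihood weighted by token count yields perplexity. The batch numbers are invented for the example.

```python
import math

# (mean log-loss per token, number of tokens) for two hypothetical batches
batches = [(2.0, 100), (3.0, 300)]

weighted_sum = sum(value * weight for value, weight in batches)
total_weight = sum(weight for _, weight in batches)

# Assumed behaviour: exponentiate the weighted average of the log-scale values.
perplexity = math.exp(weighted_sum / total_weight)

# For contrast: averaging the exponentiated values gives a different, larger number.
average_of_exponents = (
    sum(math.exp(value) * weight for value, weight in batches) / total_weight
)

print(weighted_sum / total_weight)        # 2.75
print(perplexity < average_of_exponents)  # True
```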

log_event_end(name, round=None, reset=True, priority=100)

End timing an event.

This function stops a stopwatch that was started with log_event_start(). The duration between start and end is recorded and averaged across multiple occurrences.

Parameters:

Name Type Description Default
name str

Name of the event (must match the name used in log_event_start()).

required
round int | None

Number of decimal places to round the duration to (in milliseconds).

None
reset bool

If True, the meter is reset after logging.

True
priority int

Priority for meter ordering when logging.

100

Raises:

Type Description
AssertionError

If log_event_start() was not called for this event name.

Example
log_event_start("perf/backward_pass")
# ... do backward pass ...
log_event_end("perf/backward_pass")
Source code in optimus_dl/modules/metrics/common.py
def log_event_end(
    name: str,
    round: int | None = None,
    reset: bool = True,
    priority: int = 100,
) -> None:
    """End timing an event.

    This function stops a stopwatch that was started with `log_event_start()`.
    The duration between start and end is recorded and averaged across
    multiple occurrences.

    Args:
        name: Name of the event (must match the name used in `log_event_start()`).
        round: Number of decimal places to round the duration to (in milliseconds).
        reset: If True, the meter is reset after logging.
        priority: Priority for meter ordering when logging.

    Raises:
        AssertionError: If `log_event_start()` was not called for this event name.

    Example:
        ```python
        log_event_start("perf/backward_pass")
        # ... do backward pass ...
        log_event_end("perf/backward_pass")

        ```"""
    log_meter(
        name=name,
        meter_factory=lambda: StopwatchMeter(round=round),
        reset=reset,
        priority=priority,
        mode="end",
        force_log=True,  # Always log event occurrences
    )

log_event_occurence(name, round=None, reset=True, priority=100)

Log an occurrence of an event and track its frequency.

This function uses a FrequencyMeter to measure the time between successive calls to log_event_occurence() for a given name. It can be used to track the rate at which certain events happen.

Parameters:

Name Type Description Default
name str

Name of the event to track (e.g., "perf/dataloader_ready").

required
round int | None

Number of decimal places to round the duration to (in milliseconds).

None
reset bool

If True, the meter is reset after logging.

True
priority int

Priority for meter ordering when logging.

100
Source code in optimus_dl/modules/metrics/common.py
def log_event_occurence(
    name: str,
    round: int | None = None,
    reset: bool = True,
    priority: int = 100,
):
    """Log an occurrence of an event and track its frequency.

    This function uses a `FrequencyMeter` to measure the time between
    successive calls to `log_event_occurence()` for a given `name`.
    It can be used to track the rate at which certain events happen.

    Args:
        name: Name of the event to track (e.g., "perf/dataloader_ready").
        round: Number of decimal places to round the duration to (in milliseconds).
        reset: If True, the meter is reset after logging.
        priority: Priority for meter ordering when logging.
    """
    log_meter(
        name=name,
        meter_factory=lambda: FrequencyMeter(round=round),
        reset=reset,
        priority=priority,
        force_log=True,  # Always log event occurrences
    )
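
Measuring the time between successive calls amounts to remembering the previous timestamp and accumulating the gaps. The sketch below shows that idea with a hypothetical `IntervalTracker` class; timestamps are passed in explicitly so the result is deterministic. It is an assumption about the general technique and does not reproduce `FrequencyMeter`.

```python
from typing import Optional


class IntervalTracker:
    """Hypothetical sketch: average gap between successive ticks."""

    def __init__(self) -> None:
        self._last_ns: Optional[int] = None
        self.elapsed_ns = 0
        self.counter = 0

    def tick(self, now_ns: int) -> None:
        # The first tick only sets the reference point; it has no previous event.
        if self._last_ns is not None:
            self.elapsed_ns += now_ns - self._last_ns
            self.counter += 1
        self._last_ns = now_ns

    def average_ms(self) -> float:
        if self.counter == 0:
            return 0.0
        return self.elapsed_ns / self.counter / 1e6


tracker = IntervalTracker()
for timestamp_ns in (0, 2_000_000, 6_000_000):  # gaps of 2 ms and 4 ms
    tracker.tick(timestamp_ns)

print(tracker.counter)       # 2
print(tracker.average_ms())  # 3.0
```

In real use the timestamps would come from a monotonic clock such as `time.perf_counter_ns()`.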

log_event_start(name, round=None, reset=True, priority=100)

Start timing an event.

This function starts a stopwatch for measuring the duration of an event. Call log_event_end() with the same name to stop timing and record the duration. The duration is automatically averaged across multiple occurrences.

Parameters:

Name Type Description Default
name str

Name of the event to time (e.g., "perf/forward_pass").

required
round int | None

Number of decimal places to round the duration to (in milliseconds).

None
reset bool

If True, the meter is reset after logging.

True
priority int

Priority for meter ordering when logging.

100
Example
log_event_start("perf/forward_pass")
# ... do work ...
log_event_end("perf/forward_pass")
# Meter will show average duration in milliseconds
Source code in optimus_dl/modules/metrics/common.py
def log_event_start(
    name: str,
    round: int | None = None,
    reset: bool = True,
    priority: int = 100,
) -> None:
    """Start timing an event.

    This function starts a stopwatch for measuring the duration of an event.
    Call `log_event_end()` with the same name to stop timing and record the
    duration. The duration is automatically averaged across multiple occurrences.

    Args:
        name: Name of the event to time (e.g., "perf/forward_pass").
        round: Number of decimal places to round the duration to (in milliseconds).
        reset: If True, the meter is reset after logging.
        priority: Priority for meter ordering when logging.

    Example:
        ```python
        log_event_start("perf/forward_pass")
        # ... do work ...
        log_event_end("perf/forward_pass")
        # Meter will show average duration in milliseconds

        ```"""
    log_meter(
        name=name,
        meter_factory=lambda: StopwatchMeter(round=round),
        reset=reset,
        priority=priority,
        mode="start",
        force_log=True,  # Always log event occurrences
    )
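
Since every start must be paired with an end under the same name, one option is to wrap the pair in a context manager so the end call still runs when the timed code raises. The sketch below is a hypothetical helper, `timed_event`, that is not part of the module. It takes the start and end functions as arguments so it can be run on its own; in practice `log_event_start` and `log_event_end` would be passed in.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def timed_event(
    name: str,
    start: Callable[[str], None],
    end: Callable[[str], None],
) -> Iterator[None]:
    """Hypothetical helper: call start(name), run the body, always call end(name)."""
    start(name)
    try:
        yield
    finally:
        end(name)  # runs even if the body raises


# Stand-in recorders so the sketch is runnable without the library.
calls = []

with timed_event(
    "perf/forward_pass",
    start=lambda n: calls.append(("start", n)),
    end=lambda n: calls.append(("end", n)),
):
    pass  # ... do work ...

print(calls)  # [('start', 'perf/forward_pass'), ('end', 'perf/forward_pass')]
```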

log_max(name, value, round=None, reset=True, priority=100)

Log a value to a max meter.

This is a convenience function for logging values to a MaxMeter. The value can be a callable (lambda) for lazy evaluation.

Parameters:

Name Type Description Default
name str

Name of the meter (e.g., "perf/max_memory").

required
value DelayedScalar

Value to log. Can be a number or a callable.

required
round int | None

Number of decimal places to round the result to.

None
reset bool

If True, the meter is reset after logging.

True
priority int

Priority for meter ordering when logging.

100
Source code in optimus_dl/modules/metrics/common.py
def log_max(
    name: str,
    value: DelayedScalar,
    round: int | None = None,
    reset: bool = True,
    priority: int = 100,
) -> None:
    """Log a value to a max meter.

    This is a convenience function for logging values to a `MaxMeter`.
    The value can be a callable (lambda) for lazy evaluation.

    Args:
        name: Name of the meter (e.g., "perf/max_memory").
        value: Value to log. Can be a number or a callable.
        round: Number of decimal places to round the result to.
        reset: If True, the meter is reset after logging.
        priority: Priority for meter ordering when logging.
    """
    log_meter(
        name=name,
        meter_factory=lambda: MaxMeter(round=round),
        reset=reset,
        priority=priority,
        value=value,
    )

log_min(name, value, round=None, reset=True, priority=100)

Log a value to a min meter.

This is a convenience function for logging values to a MinMeter. The value can be a callable (lambda) for lazy evaluation.

Parameters:

Name Type Description Default
name str

Name of the meter (e.g., "optimization/min_lr").

required
value DelayedScalar

Value to log. Can be a number or a callable.

required
round int | None

Number of decimal places to round the result to.

None
reset bool

If True, the meter is reset after logging.

True
priority int

Priority for meter ordering when logging.

100
Source code in optimus_dl/modules/metrics/common.py
def log_min(
    name: str,
    value: DelayedScalar,
    round: int | None = None,
    reset: bool = True,
    priority: int = 100,
) -> None:
    """Log a value to a min meter.

    This is a convenience function for logging values to a `MinMeter`.
    The value can be a callable (lambda) for lazy evaluation.

    Args:
        name: Name of the meter (e.g., "optimization/min_lr").
        value: Value to log. Can be a number or a callable.
        round: Number of decimal places to round the result to.
        reset: If True, the meter is reset after logging.
        priority: Priority for meter ordering when logging.
    """
    log_meter(
        name=name,
        meter_factory=lambda: MinMeter(round=round),
        reset=reset,
        priority=priority,
        value=value,
    )
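
Neither `log_min` nor `log_max` comes with an example above. The sketch below illustrates the kind of running reduction such meters perform, using a hypothetical `RunningExtremes` class. It is an assumption about the general behaviour and does not reproduce `MinMeter` or `MaxMeter`.

```python
from typing import Optional


class RunningExtremes:
    """Hypothetical sketch: track the smallest and largest value seen so far."""

    def __init__(self) -> None:
        self.minimum: Optional[float] = None
        self.maximum: Optional[float] = None

    def log(self, value: float) -> None:
        # The first value initialises both extremes; later values only tighten them.
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)


tracker = RunningExtremes()
for learning_rate in (3e-4, 1e-4, 2e-4):
    tracker.log(learning_rate)

print(tracker.minimum, tracker.maximum)  # 0.0001 0.0003
```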

log_summed(name, value, round=None, reset=True, priority=100)

Log a value to a summed meter.

This is a convenience function for logging values to a SummedMeter. The value can be a callable (lambda) that is only evaluated when the meter is actually logged (lazy evaluation).

Parameters:

Name Type Description Default
name str

Name of the meter (e.g., "train/tokens_processed").

required
value DelayedScalar

Value to add to the sum. Can be a number or a callable that returns a number.

required
round int | None

Number of decimal places to round the result to.

None
reset bool

If True, the meter is reset after logging. If False, the meter accumulates across iterations.

True
priority int

Priority for meter ordering when logging.

100
Example
# Log total tokens processed
log_summed("train/tokens", batch_size * seq_len)

# Log with lazy evaluation
log_summed("train/tokens", lambda: get_token_count())
Source code in optimus_dl/modules/metrics/common.py
def log_summed(
    name: str,
    value: DelayedScalar,
    round: int | None = None,
    reset: bool = True,
    priority: int = 100,
) -> None:
    """Log a value to a summed meter.

    This is a convenience function for logging values to a `SummedMeter`.
    The value can be a callable (lambda) that is only evaluated when the
    meter is actually logged (lazy evaluation).

    Args:
        name: Name of the meter (e.g., "train/tokens_processed").
        value: Value to add to the sum. Can be a number or a callable that
            returns a number.
        round: Number of decimal places to round the result to.
        reset: If True, the meter is reset after logging. If False, the
            meter accumulates across iterations.
        priority: Priority for meter ordering when logging.

    Example:
        ```python
        # Log total tokens processed
        log_summed("train/tokens", batch_size * seq_len)

        # Log with lazy evaluation
        log_summed("train/tokens", lambda: get_token_count())

        ```"""
    log_meter(
        name=name,
        meter_factory=lambda: SummedMeter(round=round),
        reset=reset,
        priority=priority,
        value=value,
    )
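
The effect of `reset` can be pictured as follows. The sketch assumes that "logging" refers to the point at which the accumulated value is reported, and models that point as the end of a window of values. `reported_sums` is a hypothetical function and the numbers are invented for the example.

```python
from typing import List


def reported_sums(windows: List[List[int]], reset: bool) -> List[int]:
    """Return the sum reported at the end of each window of logged values."""
    total = 0
    reported = []
    for window in windows:
        for value in window:
            total += value      # same accumulation as SummedMeter.log()
        reported.append(total)  # the value that would be reported
        if reset:
            total = 0           # start the next window from zero
    return reported


windows = [[100, 200], [50]]
print(reported_sums(windows, reset=True))   # [300, 50]
print(reported_sums(windows, reset=False))  # [300, 350]
```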