interruption_mixin
optimus_dl.recipe.train.mixins.execution.interruption_mixin
Training interruption mixin for handling errors and keyboard interrupts.
TrainingInterruptionMixin
Mixin for gracefully handling training interruptions.
Provides a mechanism to catch KeyboardInterrupt (Ctrl+C) and trigger a
safe shutdown sequence, which typically involves saving a final checkpoint
to ensure progress is not lost.
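
The catch-and-save pattern described above can be sketched in plain Python. This is a minimal illustration of the general idea, not the code of `TrainingInterruptionMixin`. The names `run_training`, `train_step`, and `save_checkpoint` are hypothetical and were introduced only for this example.

```python
from typing import Callable


def run_training(
    num_iterations: int,
    train_step: Callable[[int], None],
    save_checkpoint: Callable[[int], None],
) -> int:
    """Run a training loop and save one final checkpoint on Ctrl+C.

    Returns the number of iterations that completed.
    """
    completed = 0
    try:
        for iteration in range(num_iterations):
            train_step(iteration)
            completed = iteration + 1
    except KeyboardInterrupt:
        # Safe shutdown: persist progress instead of losing it.
        save_checkpoint(completed)
    return completed
```

Catching `KeyboardInterrupt` at the loop boundary keeps the shutdown logic in one place, so individual training steps do not need their own cleanup code.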
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `save_freq` | `int` | Frequency of regular checkpoints. If 0, saving is disabled. | `0` |
| `output_path` | `str \| None` | Path where checkpoints are saved. | `None` |
| `checkpoint_callback` | `Callable[..., None] \| None` | Callable to execute for saving the checkpoint. | `None` |
Source code in optimus_dl/recipe/train/mixins/execution/interruption_mixin.py
handle_training_interruption(iteration, collective, **kwargs)
Handle interruption by saving a final checkpoint.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `iteration` | `int` | The current training iteration count. | required |
| `collective` | `Collective \| None` | The distributed collective instance. | required |
| `**kwargs` | `Any` | Additional arguments to pass to the checkpoint callback. | `{}` |
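
To illustrate how a handler with this signature might forward its arguments, here is a hedged sketch. `SketchInterruptionMixin` is a hypothetical stand-in, not the optimus_dl class. The skip conditions, the keyword names passed to the callback (`iteration`, `collective`, `output_path`), and the boolean return value are all assumptions made for the example; the source does not document them.

```python
from typing import Any, Callable, Optional


class SketchInterruptionMixin:
    """Hypothetical stand-in; not the optimus_dl implementation."""

    def __init__(
        self,
        save_freq: int = 0,
        output_path: Optional[str] = None,
        checkpoint_callback: Optional[Callable[..., None]] = None,
    ) -> None:
        self.save_freq = save_freq
        self.output_path = output_path
        self.checkpoint_callback = checkpoint_callback

    def handle_training_interruption(
        self, iteration: int, collective: Any, **kwargs: Any
    ) -> bool:
        """Try to save a final checkpoint; return True if the callback ran."""
        # Assumption: skip when saving is disabled or not fully configured.
        if (
            self.save_freq == 0
            or self.output_path is None
            or self.checkpoint_callback is None
        ):
            return False
        # Assumption: the callback accepts these keyword names.
        self.checkpoint_callback(
            iteration=iteration,
            collective=collective,
            output_path=self.output_path,
            **kwargs,
        )
        return True


# Usage: record what the callback receives instead of writing to disk.
calls = []
mixin = SketchInterruptionMixin(
    save_freq=100,
    output_path="checkpoints",
    checkpoint_callback=lambda **kw: calls.append(kw),
)
mixin.handle_training_interruption(iteration=42, collective=None, tag="final")
```

Passing extra state through `**kwargs` lets the caller decide what the checkpoint callback needs without changing the handler's signature.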