optimus_dl.modules.model.blocks

Modules and Sub-packages

  • attention: Mask function for flex_attention supporting causal, sliding window, padding, and flat batching.
  • layer_norms: LayerNorm with optional bias.
  • mlp: SwiGLU MLP variant used in Llama, Qwen, and Mistral.
  • rope: Rotary Positional Embeddings (RoPE) implementation.
  • transformer: Unified Transformer block with RMSNorm, Rotary Attention, and SwiGLU MLP.
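To illustrate the idea behind the `rope` module listed above, here is a minimal pure-Python sketch of Rotary Positional Embeddings. It is not the package's actual implementation (which would operate on batched tensors); the function name `rope_rotate` and its signature are illustrative assumptions. Each consecutive pair of vector components is rotated by a position-dependent angle, so that the attention dot product between a query at position `m` and a key at position `n` depends only on the relative offset `m - n`.

```python
import math

def rope_rotate(vec, pos, base=10000.0):
    """Illustrative RoPE sketch (not the library's API).

    Rotates each pair (vec[2i], vec[2i+1]) by the angle
    pos / base**(2i / d), where d = len(vec).
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos / (base ** (i / d))
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # 2-D rotation of the (x, y) pair by theta
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))
```

Because each pair is rotated (an orthogonal transform), vector norms are preserved, and `dot(rope_rotate(q, m), rope_rotate(k, n))` equals `dot(rope_rotate(q, m - n), k)`, which is exactly the relative-position property that makes RoPE attractive for attention.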