optimus_dl.modules.model.blocks
Modules and Sub-packages
- attention: Mask function for flex_attention supporting causal, sliding window, padding, and flat batching.
- layer_norms: LayerNorm with optional bias.
- mlp: SwiGLU MLP variant used in Llama, Qwen, and Mistral.
- rope: Rotary Positional Embeddings (RoPE) implementation.
- transformer: Unified Transformer block with RMSNorm, Rotary Attention, and SwiGLU MLP.
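To illustrate the mask semantics the attention module describes, here is a minimal pure-Python sketch of a causal sliding-window mask. The function name and the `window` parameter are hypothetical; the signature mirrors the `mask_mod` convention of PyTorch's `torch.nn.attention.flex_attention` (batch index, head index, query index, key/value index), which the actual module presumably follows:

```python
def causal_sliding_window_mask(b, h, q_idx, kv_idx, window=4):
    """Return True if query position q_idx may attend to key position kv_idx.

    Combines two constraints:
      - causal: a token never attends to future positions
      - sliding window: a token only sees the last `window` positions
    The b (batch) and h (head) indices are unused here but kept to match
    flex_attention's mask_mod signature.
    """
    causal = kv_idx <= q_idx            # no attending to future tokens
    in_window = q_idx - kv_idx < window  # restrict to a local window
    return causal and in_window
```

Padding and flat-batching masks mentioned in the summary would add further conditions (e.g. comparing document IDs per position), but follow the same boolean-predicate shape.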