# mlp

`optimus_dl.modules.model.blocks.mlp`
## GELUMLP
Bases: Module
Standard GPT-2 style MLP with GELU activation. It consists of an expansion layer, a GELU activation, and a contraction layer.
Attributes:

| Name | Type | Description |
|---|---|---|
| `c_fc` | | Expansion projection layer. |
| `gelu` | | GELU activation layer. |
| `c_proj` | | Contraction projection layer. |
| `dropout` | | Dropout layer. |
Source code in optimus_dl/modules/model/blocks/mlp.py
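The expansion/activation/contraction composition described above can be sketched in PyTorch as follows. The 4× expansion factor, the constructor signature, and the `dropout` probability argument are conventional assumptions, not taken from this module's source:

```python
import torch
import torch.nn as nn


class GELUMLP(nn.Module):
    """Sketch of a GPT-2 style MLP (assumed 4x hidden expansion)."""

    def __init__(self, d_model: int, dropout: float = 0.0):
        super().__init__()
        self.c_fc = nn.Linear(d_model, 4 * d_model)    # expansion projection
        self.gelu = nn.GELU()                          # GELU activation
        self.c_proj = nn.Linear(4 * d_model, d_model)  # contraction projection
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # expand -> activate -> contract -> (optional) dropout
        return self.dropout(self.c_proj(self.gelu(self.c_fc(x))))


mlp = GELUMLP(d_model=8)
out = mlp(torch.randn(2, 3, 8))  # output keeps the input shape
```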
### forward(x)
Perform the forward pass.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor. | *required* |
Returns:

| Type | Description |
|---|---|
| `Tensor` | Output tensor. |
Source code in optimus_dl/modules/model/blocks/mlp.py
## SwiGLUMLP
Bases: Module
SwiGLU MLP variant, as used in Llama, Qwen, and Mistral. It consists of three linear layers (gate, up, down) with a SiLU (Swish) activation, and supports an optional Liger kernel for performance.
Attributes:

| Name | Type | Description |
|---|---|---|
| `w1` | | Gate projection layer. |
| `w2` | | Up projection layer. |
| `c_proj` | | Down projection layer. |
| `use_liger` | | Whether Liger kernel is enabled. |
Source code in optimus_dl/modules/model/blocks/mlp.py
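The gate/up/down structure can be sketched in PyTorch as below. The `hidden_dim` argument, the bias-free linears, and the omission of the optional Liger fused-kernel path are assumptions for illustration, not details from this module's source:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUMLP(nn.Module):
    """Sketch of a SwiGLU MLP (Liger kernel path omitted; bias-free
    projections assumed, as is common in Llama-style models)."""

    def __init__(self, d_model: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, hidden_dim, bias=False)      # gate projection
        self.w2 = nn.Linear(d_model, hidden_dim, bias=False)      # up projection
        self.c_proj = nn.Linear(hidden_dim, d_model, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: SiLU(gate(x)) elementwise-multiplied with up(x),
        # then projected back down to the model dimension.
        return self.c_proj(F.silu(self.w1(x)) * self.w2(x))


mlp = SwiGLUMLP(d_model=8, hidden_dim=16)
out = mlp(torch.randn(2, 3, 8))  # output keeps the input shape
```

Compared with the GELU variant, the gate path lets the network modulate the up projection multiplicatively, which is the design choice shared by Llama, Qwen, and Mistral.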
### forward(x)
Perform the forward pass.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor. | *required* |
Returns:

| Type | Description |
|---|---|
| `Tensor` | Output tensor. |