# rope

`optimus_dl.modules.model.blocks.rope`
Rotary Positional Embeddings (RoPE) implementation.
This module provides utilities for computing and applying Rotary Positional Embeddings, as used in models like Llama and Qwen.
## `apply_rotary_emb(q, k, freqs_cis, position_ids=None)`
Apply Rotary Positional Embeddings to Query and Key tensors.
Handles both standard Tensors and distributed DTensors.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `q` | `Tensor` | Query tensor of shape `(B, T, nh, hs)`. | *required* |
| `k` | `Tensor` | Key tensor of shape `(B, T, n_kv_h, hs)`. | *required* |
| `freqs_cis` | `Tensor` | Precomputed frequency tensor of shape `(max_T, hs // 2, 2)`. | *required* |
| `position_ids` | `Tensor \| None` | Optional tensor of shape `(B, T)` specifying the positions for each token. | `None` |

Returns:

| Type | Description |
|---|---|
| `tuple[Tensor, Tensor]` | Tuple of `(q, k)` with rotary embeddings applied. |
Source code in optimus_dl/modules/model/blocks/rope.py
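At its core, the rotary embedding pairs up consecutive feature dimensions of each head vector and rotates each pair by a position-dependent angle. The real function operates on torch Tensors (and DTensors) in batch; the following is only a minimal pure-Python sketch of that per-token rotation, where `rotate_pairs` is a hypothetical helper invented here for illustration:

```python
import math

def rotate_pairs(vec, pos, theta=10000.0):
    """Rotate consecutive pairs of a single head vector `vec` (length hs)
    by position-dependent angles, as RoPE does for one token at position `pos`."""
    hs = len(vec)
    out = []
    for i in range(0, hs, 2):
        # Frequency for this pair follows the usual RoPE schedule: theta^(-i/hs).
        freq = theta ** (-i / hs)
        angle = pos * freq
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        # Standard 2-D rotation of the (x, y) pair.
        out.append(x * cos_a - y * sin_a)
        out.append(x * sin_a + y * cos_a)
    return out

# Position 0 leaves the vector unchanged, since every angle is zero.
print(rotate_pairs([1.0, 0.0, 1.0, 0.0], pos=0))
```

Note that the rotation is norm-preserving, which is why applying it to both `q` and `k` makes their dot product depend only on relative position.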
## `precompute_freqs_cis(dim, end, theta=10000.0, scaling_config=None)`
Precompute the frequency tensor for complex exponential (cis) with optional scaling.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dim` | `int` | Dimension of the head. | *required* |
| `end` | `int` | Maximum sequence length. | *required* |
| `theta` | `float` | Base frequency for the positional encoding. | `10000.0` |
| `scaling_config` | `dict \| None` | Optional RoPE scaling configuration (e.g., YaRN, Llama3). | `None` |

Returns:

| Type | Description |
|---|---|
| `Tensor` | Tensor of shape `(end, dim // 2, 2)` representing the real and imaginary parts of the frequencies. |
Source code in optimus_dl/modules/model/blocks/rope.py
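Without a `scaling_config`, the precomputed table reduces to cosine/sine values for each (position, frequency-pair) slot. A pure-Python sketch of this default, unscaled computation (the real function returns a torch Tensor; `precompute_freqs_cis_sketch` is a hypothetical stand-in using nested lists), with the last axis holding the real (cos) and imaginary (sin) parts:

```python
import math

def precompute_freqs_cis_sketch(dim, end, theta=10000.0):
    """Return a nested list of shape (end, dim // 2, 2): for position t and
    frequency index i, the pair [cos(t * freq_i), sin(t * freq_i)]."""
    # Per-pair inverse frequencies: theta^(-2i/dim) for i in 0 .. dim//2 - 1.
    inv_freqs = [theta ** (-(2 * i) / dim) for i in range(dim // 2)]
    return [
        [[math.cos(t * f), math.sin(t * f)] for f in inv_freqs]
        for t in range(end)
    ]

freqs = precompute_freqs_cis_sketch(dim=8, end=4)
# Row 0 is all [1.0, 0.0] pairs, since every angle is zero at position 0.
print(freqs[0])
```

Storing `(cos, sin)` pairs rather than complex numbers keeps the table usable on backends without complex-dtype support, which matches the documented `(end, dim // 2, 2)` return shape.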