Comment by Chakshu Mira (chakshu-mira) on Ophiology (or, how the Mamba architecture works) · 2024-05-02T22:05:26.715Z
    ## Discretize B ##
    # [B,N]   [E->N]   [B,E]
    B = layer.W_B(x[b,l]) # no bias
Shouldn't this be x[:,l] instead of x[b,l]?
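The shape mismatch behind this question can be checked with a minimal NumPy sketch. It assumes `x` has shape `[B, L, E]` and that `W_B` is a bias-free linear map from `E` to `N`. The sizes, the random data, and the plain matrix standing in for `layer.W_B` are hypothetical and are not taken from the post.

```python
import numpy as np

# Hypothetical sizes: batch B, sequence length L, channels E, state size N.
B, L, E, N = 2, 5, 4, 3
rng = np.random.default_rng(0)

x = rng.standard_normal((B, L, E))   # input, shape [B, L, E]
W_B = rng.standard_normal((N, E))    # stand-in for layer.W_B (E -> N, no bias)

b, l = 0, 1                          # one batch index, one sequence position

single = x[b, l] @ W_B.T             # x[b, l] is [E], so the result is [N]
batched = x[:, l] @ W_B.T            # x[:, l] is [B, E], so the result is [B, N]

print(single.shape, batched.shape)
```

Only `x[:, l]` yields the `[B,N]` result that the shape comment in the quoted code promises. `x[b, l]` yields a single `[N]` row, which equals `batched[b]`.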
Comment by Chakshu Mira (chakshu-mira) on Ophiology (or, how the Mamba architecture works) · 2024-04-22T20:46:15.331Z
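$y_t = \underbrace{C}_{[N]} \, \underbrace{h_t}_{[E,N]} + \underbrace{E}_{[E]} \, \underbrace{x_t}_{[E]}$ (<this one>: the $E$ multiplying $x_t$)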
Shouldn't this be 'D'?
Comment by Chakshu Mira (chakshu-mira) on Ophiology (or, how the Mamba architecture works) · 2024-04-18T21:42:54.516Z
E
Did you mean 'D' here? (2nd equation of the structured SSM)
Comment by Chakshu Mira (chakshu-mira) on Ophiology (or, how the Mamba architecture works) · 2024-04-18T01:08:10.937Z
Is this a typo? (Δtvt+1)xt−1