Comments

Comment by Chakshu Mira (chakshu-mira) on Ophiology (or, how the Mamba architecture works) · 2024-05-02T22:05:26.715Z · LW · GW

    ## Discretize B ##
    #   [B,N]      [E->N]   [B,E]
    B = layer.W_B(x[b,l]) # no bias

Shouldn't this be x[:,l] instead of x[b,l]?
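To make the shape question concrete, here is a minimal NumPy sketch. It assumes `x` has shape `[B, L, E]` and treats `layer.W_B` as a plain bias-free matrix of shape `[E, N]`; the sizes and the names `n_batch`, `seq_len` and `W_B` are hypothetical stand-ins, not taken from the post's code.

```python
import numpy as np

# Hypothetical sizes: batch, sequence length, embedding dim, state dim.
n_batch, seq_len, E, N = 2, 3, 4, 5

rng = np.random.default_rng(0)
x = rng.standard_normal((n_batch, seq_len, E))  # assumed layout [B, L, E]
W_B = rng.standard_normal((E, N))               # stand-in for layer.W_B (no bias), E -> N

b, l = 0, 1

single = x[b, l]    # one batch element at position l: shape (E,)
batched = x[:, l]   # every batch element at position l: shape (B, E)

B_single = single @ W_B    # shape (N,)
B_batched = batched @ W_B  # shape (B, N), matching the [B,N] annotation

print(single.shape, batched.shape)
print(B_single.shape, B_batched.shape)
```

Under these assumptions only `x[:,l]` gives the `[B,E]` input and `[B,N]` output that the shape comment advertises; `x[b,l]` gives the same result for a single batch element, so it would only fit inside an explicit loop over `b`.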

Comment by Chakshu Mira (chakshu-mira) on Ophiology (or, how the Mamba architecture works) · 2024-04-22T20:46:15.331Z · LW · GW

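$y_t = \underbrace{C}_{[E,N]}\,\underbrace{h_t}_{[N]} + \underbrace{E}_{[E]}\,\underbrace{x_t}_{[E]}$ (<this one>: the $E$ in front of $x_t$)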

Shouldn't this be 'D'?

Comment by Chakshu Mira (chakshu-mira) on Ophiology (or, how the Mamba architecture works) · 2024-04-18T21:42:54.516Z · LW · GW

E

Did you mean 'D' here? (2nd equation of the structured SSM)

Comment by Chakshu Mira (chakshu-mira) on Ophiology (or, how the Mamba architecture works) · 2024-04-18T01:08:10.937Z · LW · GW

Is this a typo? (Δtvt+1)xt−1