Posts

Mechanistic Interpretability Reading group 2023-09-26T16:26:44.757Z
How to Read Papers Efficiently: Fast-then-Slow Three pass method 2023-02-25T02:56:30.814Z

Comments

Comment by 1stuserhere (firstuser-here) on How to accelerate recovery from sleep debt with biohacking? · 2024-04-18T10:31:27.590Z · LW · GW

This is purely anecdotal - supplementing sleep debt with cardio-intensive exercise works for me. For example, I usually need 7 hrs of sleep. If I sleep for only 5 hrs, I'm likely to feel a drop in mental sharpness around midday the next day. However, if I go for an hour-long run, I skip that drop almost completely and feel just as good as I normally would have after a full night's sleep.

Comment by 1stuserhere (firstuser-here) on A framing for interpretability · 2023-11-15T12:42:59.918Z · LW · GW

It's also worth noting that LLMs are not learning directly from the raw input stream but from a compressed version of that data, i.e. the LLMs are fed tokenized data, and the tokenizer acts as a compressor. This benefits the models by letting them fit more information into a given context window.
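To make the "tokenizer as compressor" idea concrete, here is a toy sketch (not any real tokenizer's implementation) of BPE-style merging: repeatedly fuse the most frequent adjacent symbol pair into a single token, so the sequence the model sees is shorter than the raw character stream while still encoding the same text.

```python
from collections import Counter

def bpe_compress(seq, num_merges):
    """Greedy byte-pair-style merging: repeatedly fuse the most
    frequent adjacent pair of tokens into one new token."""
    seq = list(seq)
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so merging can't compress further
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                merged.append(a + b)  # one token now covers two symbols
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return seq

text = "the cat sat on the mat"
tokens = bpe_compress(text, num_merges=10)
# The token sequence is shorter than the raw character sequence,
# yet joining the tokens reconstructs the original text exactly.
print(len(text), len(tokens))
```

The compression is lossless (concatenating the tokens recovers the input), which is the sense in which a tokenizer packs more of the underlying text into a fixed-length context.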

Comment by 1stuserhere (firstuser-here) on Are Mixture-of-Experts Transformers More Interpretable Than Dense Transformers? · 2023-09-15T11:38:47.091Z · LW · GW

I think that the answer is no.

 

In this “VRAM-constrained regime,” MoE models (trained from scratch) are nowhere near competitive with dense LLMs.

Curious whether your high-level thoughts on these topics still hold or have changed.

Comment by 1stuserhere (firstuser-here) on How to Think About Activation Patching · 2023-06-05T21:30:07.826Z · LW · GW

On a more narrow distribution this head could easily exhibit just one behaviour and eg seem like a monosemantic inductin head

induction* head

Comment by 1stuserhere (firstuser-here) on What 2026 looks like · 2023-04-23T12:59:32.346Z · LW · GW

The 2023 predictions seem to hold up really well so far, especially the SDM-in-interactive-environments one, image synthesis, passing the bar exam, legal NLP systems, the enthusiasm of programmers, and Elon Musk re-entering the space of building AI systems.

Comment by 1stuserhere (firstuser-here) on How to Read Papers Efficiently: Fast-then-Slow Three pass method · 2023-02-27T16:50:08.091Z · LW · GW

Interesting perspective, especially your comments on citations. Agreed that the diagrams/figures/tables are some of the most interesting parts of a paper, but I also try to find the problem that motivated the authors (which, imo, is frequently conveyed better in the introduction than in the abstract).

Comment by 1stuserhere (firstuser-here) on AI alignment researchers don't (seem to) stack · 2023-02-23T13:46:02.255Z · LW · GW

In this analogy, the trouble is that we don't know whether we're building tunnels in parallel (the same direction), in opposite directions, or in a zigzag. The reason is a lack of clarity about which approaches will turn out to be fundamentally important for building a safe AGI. So it seems to me that, for now, exploring different approaches is a good thing, so that the next generation of researchers does less digging and is able to stack more on the existing work.

Comment by 1stuserhere (firstuser-here) on AI alignment researchers don't (seem to) stack · 2023-02-23T13:41:37.080Z · LW · GW

I agree. It comes down to striking a balance between exploration and exploitation. We're barely entering the 2nd generation of alignment researchers. It's important to generate new directions for approaching the problem, especially at this stage, so that we have a better chance of covering more of the space of possible solutions before deciding to go deeper. The barrier to entry also stays slightly lower for new researchers this way. When some research directions "outcompete" others, we'll naturally see more interest in those promising directions, and subsequently more exploitation, and researchers will be stacking.