LessWrong 2.0 Reader
Hindsight is 20/20. I think you're underemphasizing how our current state of affairs is fairly contingent on social factors, like the actions of people concerned about AI safety.
For example, I think this world is actually quite plausible, not incongruent:
A world where AI capabilities progressed far enough to get us to something like ChatGPT, but somehow this didn't cause a stir or wake-up moment for anyone who wasn't already concerned about AI risk.
I can easily imagine a counterfactual world in which:
Nice work. But I have one comment.

You write: "Since the feature activation is just the dot product (plus encoder bias) of the concatenated z vector and the corresponding column of the encoder matrix, we can rewrite this as the sum of n_heads dot products, allowing us to look at the direct contribution from each head."

But the feature activation is the output of ReLU applied to this dot product plus the encoder bias, and ReLU is a non-linear function. So it is not clear that we can find the contribution of each head to the feature activation.
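The distinction both comments are drawing can be made concrete with a minimal numpy sketch. All names and sizes here are hypothetical (the source does not specify them): the pre-activation decomposes linearly into one dot product per head, but the ReLU is applied to the sum, so per-head attribution of the post-ReLU activation is not well defined in general.

```python
import numpy as np

# Hypothetical setup: n_heads attention heads of dimension d_head,
# concatenated into a single z vector; one SAE feature, i.e. one
# column of the encoder matrix plus a scalar encoder bias.
n_heads, d_head = 4, 8
rng = np.random.default_rng(0)

z_per_head = rng.normal(size=(n_heads, d_head))   # per-head outputs
z = z_per_head.reshape(-1)                        # concatenated z vector
w_enc = rng.normal(size=n_heads * d_head)         # one encoder column
b_enc = -0.5                                      # encoder bias

# The pre-activation decomposes exactly into n_heads dot products:
per_head = (z_per_head * w_enc.reshape(n_heads, d_head)).sum(axis=1)
pre_act = per_head.sum() + b_enc
assert np.isclose(pre_act, z @ w_enc + b_enc)

# The feature activation applies ReLU to this sum. ReLU is nonlinear,
# so relu(sum of parts) != sum of relu(parts) in general, and the
# per-head terms are contributions to the pre-activation only.
act = max(pre_act, 0.0)
```

So the per-head decomposition is exact for the pre-activation (the dot product plus bias), which is often what "direct contribution" analyses use; the reply's point is that this does not automatically carry over through the ReLU.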
What do you mean by "cited"? Do you mean "articles references in other articles on LW" or "articles cited in academic journals" or some other definition?
npostavs on AI #64: Feel the Mundane Utility
There was a correction: this should be half a million gallons.
habryka4 on Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
I am quite interested in takes from various people in alignment on this agenda. I've engaged with both Davidad's and Bengio's stuff a bunch in the last few months, and I feel pretty confused (and skeptical) about a bunch of it, and would be interested in reading more of what other people have to say.
linch on Ilya Sutskever and Jan Leike resign from OpenAI
I agree it's not a large commitment in some absolute sense. I think it'd still be instructive to see whether they're able to hit this (not very high) bar.
bhauth on introduction to cancer vaccines
That new Amgen drug targets a human protein that's mostly only used during embryonic development. I think it's expressed by most cancer cells in maybe around 0.2% of cancer cases. In many of those cases, some of the cancer cells will stop producing it.
Most potential targets have worse side effects and/or are less common.
anthonyc on AI #64: Feel the Mundane Utility
The problem is not that the answer of 12 cents is wrong, or even that the answer is orders of magnitude wrong.
Ah yes, strong "Verizon can't do math" vibes here.
emrik-1 on quila's Shortform
Epic Lizka post is epic.
Also, I absolutely love the word "shard", but my brain refuses to use it because then it feels like we won't get credit for discovering these notions by ourselves. And the words "domain", "context", "scope", "niche", "trigger", and "preimage" (w.r.t. a neural function/policy, a "neureme") adequately serve the same purpose and are currently more semantically/semiotically granular in my head.
trigger/preimage ⊆ scope ⊆ domain
"niche" is a category in function space (including domain, operation, and codomain), "domain" is a set.
"scope" is great because of programming connotations and can be used as a verb. "This neural function is scoped to these contexts."
mikhail-samin on Feeling (instrumentally) Rational
(Off the top of my head; maybe I'll change my mind if I think about it more or see a good point.) What can be destroyed by truth, shall be. Emotions and beliefs are entangled. If you don't think about how high p(doom) actually is because, in the back of your mind, you don't want to be sad, you end up working on things that don't reduce p(doom).
As long as you know the truth, emotions matter only insofar as your terminal values say they do. But many feelings are entangled with what we end up believing, through motivated cognition, etc.