LessWrong 2.0 Reader
Hindsight is 20/20. I think you're underemphasizing how our current state of affairs is fairly contingent on social factors, like the actions of people concerned about AI safety.
For example, I think this world is actually quite plausible, not incongruent:
A world where AI capabilities progressed far enough to get us to something like ChatGPT, but somehow this didn't cause a stir or wake-up moment for anyone who wasn't already concerned about AI risk.
I can easily imagine a counterfactual world in which:
Nice work. But I have one comment.

You write: "Since the feature activation is just the dot product (plus encoder bias) of the concatenated z vector and the corresponding column of the encoder matrix, we can rewrite this as the sum of n_heads dot products, allowing us to look at the direct contribution from each head."

But the feature activation is the output of ReLU applied to this dot product plus the encoder bias, and ReLU is a non-linear function. So it is not clear that we can find the contribution of each head to the feature activation.
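The distinction both comments are drawing can be made concrete with a minimal numpy sketch. All names and sizes here are hypothetical (the source does not specify them): the pre-activation decomposes linearly into one dot product per head, but the ReLU is applied to the sum, so per-head attribution of the post-ReLU activation is not well defined in general.

```python
import numpy as np

# Hypothetical setup: n_heads attention heads of dimension d_head,
# concatenated into a single z vector; one SAE feature, i.e. one
# column of the encoder matrix plus a scalar encoder bias.
n_heads, d_head = 4, 8
rng = np.random.default_rng(0)

z_per_head = rng.normal(size=(n_heads, d_head))   # per-head outputs
z = z_per_head.reshape(-1)                        # concatenated z vector
w_enc = rng.normal(size=n_heads * d_head)         # one encoder column
b_enc = -0.5                                      # encoder bias

# The pre-activation decomposes exactly into n_heads dot products:
per_head = (z_per_head * w_enc.reshape(n_heads, d_head)).sum(axis=1)
pre_act = per_head.sum() + b_enc
assert np.isclose(pre_act, z @ w_enc + b_enc)

# The feature activation applies ReLU to this sum. ReLU is nonlinear,
# so relu(sum of parts) != sum of relu(parts) in general, and the
# per-head terms are contributions to the pre-activation only.
act = max(pre_act, 0.0)
```

So the per-head decomposition is exact for the pre-activation (the dot product plus bias), which is often what "direct contribution" analyses use; the reply's point is that this does not automatically carry over through the ReLU.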
What do you mean by "cited"? Do you mean "articles references in other articles on LW" or "articles cited in academic journals" or some other definition?
npostavs on AI #64: Feel the Mundane Utility
There was a correction: this should be half a million gallons.
habryka4 on Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
I am quite interested in takes from various people in alignment on this agenda. I've engaged with both Davidad's and Bengio's stuff a bunch in the last few months, and I feel pretty confused (and skeptical) about a bunch of it, and would be interested in reading more of what other people have to say.
linch on Ilya Sutskever and Jan Leike resign from OpenAI
I agree it's not a large commitment in some absolute sense. I think it'd still be instructive to see whether they're able to hit this (not very high) bar.
bhauth on introduction to cancer vaccines
That new Amgen drug targets a human protein that's mostly only used during embryonic development. I think it's expressed by most cancer cells in maybe around 0.2% of cancer cases. In many of those cases, some of the cancer cells will stop producing it.
Most potential targets have worse side effects and/or are less common.
anthonyc on AI #64: Feel the Mundane Utility
The problem is not that the answer of 12 cents is wrong, or even that the answer is orders of magnitude wrong.
Ah yes, strong "Verizon can't do math" vibes here.
emrik-1 on quila's Shortform
Epic Lizka post is epic.
Also, I absolutely love the word "shard", but my brain refuses to use it because then it feels like we won't get credit for discovering these notions by ourselves. And the words "domain", "context", "scope", "niche", "trigger", and "preimage" (w.r.t. a neural function/policy, a "neureme") adequately serve the same purpose and are currently more semantically/semiotically granular in my head.
trigger/preimage ⊆ scope ⊆ domain
"niche" is a category in function space (including domain, operation, and codomain), "domain" is a set.
"scope" is great because of programming connotations and can be used as a verb. "This neural function is scoped to these contexts."
mikhail-samin on Feeling (instrumentally) Rational
(Off the top of my head; maybe I'll change my mind if I think about it more or see a good point.) What can be destroyed by truth, shall be. Emotions and beliefs are entangled. If you don't think about how high p(doom) actually is because, in the back of your mind, you don't want to be sad, you end up working on things that don't reduce p(doom).
As long as you know the truth, emotions matter only insofar as your terminal values say they do. But many feelings are entangled with what we end up believing, through motivated cognition, etc.