Comment by Keenan Pepper (keenan-pepper) on Interpretability Externalities Case Study - Hungry Hungry Hippos · 2023-09-21T00:42:06.126Z · LW · GW

The Bitter Lesson applies to almost all attempts to build additional structure into neural networks, it turns out.


Out of curiosity, what are the other exceptions to this besides the obvious one of attention?

Comment by Keenan Pepper (keenan-pepper) on Where might I direct promising-to-me researchers to apply for alignment jobs/grants? · 2023-09-21T00:11:00.513Z · LW · GW

Upvoted because this mentions Nonlinear Network.

Comment by Keenan Pepper (keenan-pepper) on Against Almost Every Theory of Impact of Interpretability · 2023-08-18T23:36:47.427Z · LW · GW

Some of your YouTube links are broken because the equals sign got escaped as "%3D". If I were you I'd spend a minute to fix that.

Comment by Keenan Pepper (keenan-pepper) on Adumbrations on AGI from an outsider · 2023-05-30T21:13:15.262Z · LW · GW

Have you read yet?

I had thoughts similar to yours before reading that, but it helped me make a large update in favor of superintelligence being able to perform magical-seeming feats of deduction. If a large number of smart humans working together for a long time can figure something out (without performing experiments or getting frequent updates of relevant sensory information), then a true superintelligence will also be able to.

Comment by Keenan Pepper (keenan-pepper) on Hell is Game Theory Folk Theorems · 2023-05-12T02:15:50.273Z · LW · GW

Hilarious... I fixed my error

Comment by Keenan Pepper (keenan-pepper) on Hell is Game Theory Folk Theorems · 2023-05-01T23:20:51.582Z · LW · GW

Reminds me of this from Scott Alexander's Meditations on Moloch:

Imagine a country with two rules: first, every person must spend eight hours a day giving themselves strong electric shocks. Second, if anyone fails to follow a rule (including this one), or speaks out against it, or fails to enforce it, all citizens must unite to kill that person. Suppose these rules were well-enough established by tradition that everyone expected them to be enforced.

Comment by Keenan Pepper (keenan-pepper) on Proposal: Butt bumps as a default for physical greetings · 2023-04-01T16:39:01.091Z · LW · GW

Keenan Pepper

Comment by Keenan Pepper (keenan-pepper) on Human values & biases are inaccessible to the genome · 2022-07-08T01:05:27.414Z · LW · GW

What I gather from is that it's sort of like what you're saying, but it's much more about predictions than actual experiences. If the Learning Subsystem is imagining a plan predicted to have a high likelihood of smelling sex pheromones, seeing sexy body shapes, experiencing orgasm, etc., then the Steering Subsystem will reward the generation of that plan, basically saying "Yeah, think more thoughts like that!".

The Learning Subsystem has a bunch of abstract concepts and labels for things the Steering Subsystem doesn't care about (and can't even access), but there are certain hardcoded reward channels the Steering Subsystem can understand. The important thing is that the reward signals can be evaluated for imagined worlds as well as the real immediate world.