(Not medical advice)
While most of these points are very good, there's a downside-for-many-people of gastric bypass in particular that should be noted. Not only does it make you feel full sooner (good), but it drastically restricts the amount of food one can ever eat in one sitting/digestive cycle/whatever. There are further details on this on the Mayo Clinic site.
Gastric bypass is still probably worth it for a lot of the people who'd do it. Just noting this.
Can confirm, I'm able to get into rooms where I'm easily the dumbest person present. Luckily I know how to feel less bad, and it's to spend more time/energy learning/creating stuff to "show" the rest of the group. (Now the bottleneck is "merely" my time/energy/sleep/etc., like always!)
...
I... think your comment, combined with all this context, just fixed my life a little bit. Thank you.
This definitely describes my experience, and gave me a bit of help in correcting course, so thank you.
Also, I recall an Aella tweet where she claimed that some mental/emotional problems might be normal reactions to having low status and/or not doing much interesting in life. This was partly because, in her own experience, those problems were mostly(?) alleviated when she started "doing more awesome stuff".
I have ADHD and read a lot (and read more as a kid), so this definitely is interesting to me. Then again, I like compression at the aesthetic level but also find it quite difficult to learn/use. I too have to translate Bayes' Theorem! (I found the version I like best is where the letters are "O", "H", and "E", for observation, hypothesis, and evidence.)
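For concreteness, the standard form I end up translating to and from is just Bayes' Theorem with H for the hypothesis and E for the evidence (my preferred O/H/E lettering is a mnemonic layered on top of this):

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$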
You can succeed where others fail just by being braver even if you're not any smarter.
One thing I've wondered about is, how true is this for someone who's dumber than others?
(Asking for, uh, a friend.)
I think "feedback loops have a cap" is a much easier claim to defend than the implied "AI feedback loops will cap out before they can hurt humanity at an x-risk level". That second one is especially hard to defend if e.g. general-intelligence abilities + computational speed lets the AI develop some other thing (like a really bad plague) that can hurt humanity at an x-risk level. Intelligence, itself, can figure out, harness, and accelerate the other feedback loops.
You might be interested in the "anthropics" of knowing how-close-we-are beforehand.
A few off-the-cuff ideas:
- posts specifically aimed at people who, for one reason or another, are involved in capabilities or thinking of being involved in capabilities, but feel guilty about doing so. (Writing one of these myself, actually.)
- Better at-a-glance notes about the "iffy" parts of orgs. E.g. if there's a big wiki/table/community-list/whatever of different groups, have flags like "They're moving the capabilities frontier forward" or "This group isn't hiring much for actual alignment positions".
Better ideas would be as specific and public-facing as these and, yes, within the same OOM of opinionatedness, I think. Even just "other people are saying X Y Z about this group", or a "community notes" type feature, if a platform doesn't want to "take a stand" on a group that e.g. might be funding them.
I imagine there's a way for LW, and/or widely-seen posts on it, to counteract the talent-funnel effect noted.
Someone would indeed indeed as to.
Grammar error?
For those worried about Obsidian not being FOSS (like I was), keep in mind that you can avoid paying for their Sync feature.
Note-encryption, and even some Internet port-blocking/monitoring/offline-usage, can also probably prevent note exfiltration. (I mean, you still need your own good opsec. But you can do it with Obsidian I think, just as you could do it with a FOSS notetaker that, like Obsidian, stores files in a nice open format like Markdown.)
There's also a FOSS(?) alternative to Obsidian, but it's harder to use, may be a data-harvesting scheme from China, and may no longer be open-source/updated???
As someone who's barely scratched the surface of any of this, I was vaguely under the impression that "big-brain" described most or all of the theoretic/conceptual alignment in this cluster of things, including e.g. both the Löbian Obstacle and infrabayesianism. Once I learn all these more in-depth and think on them, I may find and appreciate subtler-but-still-important gradations of "galaxy-brained-ness" within this idea cluster.
I may be able to try different "incubation periods" for ideas, to see which write-down times work well for me...
I wonder, though. I've found that writing things down makes them easier for me to reference, and thus I can consciously build on existing ideas/thoughts instead of rehearsing the same ones over and over again. E.g. codifying some AI timeline scenarios before doing logic upon them.
(Maybe it is a tradeoff, and people with rumination and/or focus issues, like myself, find one side of the tradeoff generally-better.)
This came at a good time for me, here are thoughts I had:
- Going to try "scheduled time" for this thinking, 1 evening a week. It batches abyss-thinking for all areas of my life. Unsure if this is a good way to do it or not.
- Some of the helping-questions (plus feedback from others) have helped me immediately. For other life-areas, I'm less-certain what the "hard/icky thing I'm avoiding doing" is, since there are multiple plausible options. This might mean I'm getting un-bottlenecked on abyss-thinking and am now bottlenecked on run-of-the-mill option-generating and decision-making.
- A few items are more "should I do what's most congruent with my personality, or force myself to do things that're less-congruent with my personality?". In hindsight it will be obvious, but it's not like this in particular is a solved problem.
Advice welcome.
Have you looked at Orthogonal? They're pretty damn culturally inoculated against doing-capabilities-(even-by-accident), and they're extremely funding constrained.
NOTE: I used "goal", "goals", and "values" interchangeably in some writings such as this, and this was a mistake. A more consistent frame would be "steering vs target-selection" (especially as per the Rocket Alignment analogy).
I, too, am interested in this question.
(One note that may be of use: I think the incentives for "cultivating more/better researchers in a preparadigmatic field" lean towards "don't discourage even less-promising researchers, because they could luck out and suddenly be good/useful to alignment in an unexpected way". Like how investors encourage startup founders because they bet on a flock of them, not necessarily because any particular founder's best bet is to found a startup. This isn't necessarily bad, it just puts the incentives into perspective.)
This is a good core idea for persuasion, macro and micro.
One example: Local churches and community groups are probably run by people who are already into doing community-outreach/coordination things.
Most people (esp. affluent ones) are way too afraid of risking their social position through social disapproval. You can succeed where others fail just by being braver even if you're not any smarter.
Oh hey, I've accidentally tried this just by virtue of my personality!
Results: high-variance ideas are high-variance. YMMV, but so far I haven't had a "hit". (My friend politely calls my ideas "hits-based ideas", which is a great term.)
For those curious about this topic, here are some resources I'd recommend (I am not a professional, this is not professional advice):
- Samo Burja's core concept writings, especially Great Founder Theory
- The Mechanics of Tradecraft sequence.
- Any actual well-sourced/well-researched written history of an actual intelligence agency.
- Zvi's posts on simulacra levels
- The passage here about Julian Assange.
The compressed view-from-10,000-feet TLDR: Organizations are made of parts, those parts are people and things, game theory's gonna game theory, dramatic (and/or dramatic-sounding) stuff can result.
I feel like Ezra. I've also gotten various sources of feedback that make me think I might not be cut out for "top level" alignment research... but I find myself making progress on my beliefs about the problem, just slower than others.
Thoughts? Advice?
(Also, please bring back that printed Highlights From The Sequences; that's my favorite one and I want to gift it to people, but I only got a copy from EAG once. There's an underrated benefit to giving someone a physical book: many people feel more psychologically interested in reading a physical book that takes up space (and is designed really well!), but would ignore the same content sent as a web link / index-of-blogposts.)
I love these book sets and I will keep buying them as they come out. They are also in my list of "books to gift people when the occasions come up".
Also thank you for putting the first set back into print!
Good illustrative post, we all need to be more wary of "bad memes that nevertheless flatter the rationalist/EA/LW ingroup, even if it's in a subtle/culturally-contingent way". For every "invest in crypto early", there's at least one fad diet.
Object-level: This updated me closer to Nate's view, though I think abstractions-but-not-necessarily-natural-ones would still be valuable enough to justify quite a lot of focus on them.
Meta-level: This furthers my hunch that the AI alignment field is an absurdly inadequate market, lacking what-seems-to-me-like basic infrastructure. In other fields with smart people, informal results don't sit around, unexamined and undistilled, for months on end.
I'm not even sure where the bottleneck is on this anymore; do we lack infrastructure due to a lack of funds, or a lack of talent? (My current answer: More high-talent people may be needed to get a field paradigm, more medium-talent people would help the infrastructure, funders would help infrastructure, and funders are waiting on a field paradigm. Gah!)
I anticipate that most academia-based AI safety research will be:
- Safety but not alignment.
- Near-term but not extinction-preventing.
- Flawed in a way that a LessWrong Sequences reader would quickly notice, but random academics (even at top places) might not.
- Accidentally speeding up capabilities.
- Heavy on tractability but light on neglectedness and importance.
- ...buuuuuut 0.05% of it could turn out to be absolutely critical for "hardcore" AI alignment anyway. This is simply due to the sheer size of academia giving it a higher absolute number of [people who will come up with good ideas] than the EA/rationalist/LessWrong-sphere.
Four recommendations I have (in addition to posting on Arxiv!):
- It'd be interesting to have a prediction market / measurement on this sort of thing, somehow.
- If something's already pretty likely to get a grant within academia, EA should probably not fund it, modulo whatever considerations might override that on a case-by-case basis (e.g. promoting an EA grantor via proximity to a promising mainstream project... As you can imagine, I don't think such considerations move the needle in the overwhelming majority of cases.)
- Related to the previous point: Explicitly look to fund the "uncool"-by-mainstream-standards projects. This is partly due to neglectedness, and partly due to "worlds where iterative design fails" AI-risk-process logic.
- The community should investigate and/or set up better infrastructure to filter for the small number of crucial papers noted in my last prediction above.
What % of alignment is crucial to get right?
Most alignment plans involve getting the AI to a point where it cares about human values, then having it use its greater intelligence to solve problems in ways we didn't think of.
Some alignment plans literally involve finding clever ways to get the AI to solve alignment itself in some safe way. [1]
This suggests something interesting: Every alignment plan, explicitly or not, is leaving some amount of "alignment work" for the AI (even if that amount is "none"), and thus leaving the remainder for humans to work out. Generally (but not always!), the idea is that humans must get X% of alignment knowledge right before launching the AI, lest it become misaligned.
I don't see many groups lay out explicit reasons for selecting which "built-in-vs-learned alignment-knowledge-mix" their plan aims for. Of course, most (all?) plans already have this by default, and maybe this whole concept is sorta trivial anyway. But I haven't seen this precise consideration expressed-as-a-ratio anywhere else.
(I got some feedback on this as a post, but they noted that the idea is probably too abstract to be useful for many plans. Sure enough, when I helped with the AI-plans.com critique-a-thon, most "plans" were actually just small things that could "slot into" a larger alignment plan. Only certain kinds of "full stack alignment" plans could be usefully compared with this idea.)
For a general mathematization of something like this, see the "QACI" plan by Tamsin Leake at Orthogonal. ↩︎
Thank you, this makes sense currently!
(Right now I'm on Pearl's Causality)
Posts I may write soonish; lmk which ones sound most promising:
- Alignment in a Detailed Universe: basically compressibility-of-reality and how it should shape alignment plans.
- lit review of specifically the "paradigm-building" angle of conceptual/theoretical alignment.
[below ideas partly because I'm working with Orxl]
- Coming up with really rudimentary desiderata/properties of the A action output of AI_0 in QACI.
- Drawing links between "things mentioned in QACI" and "corresponding things mentioned elsewhere, e.g. in Superintelligence PDS". (E.g. specific problem --> how a thing in QACI solves it or not.)
- Finish the Pearl Causality math basics first, before doing any of the above. Then I can do the above but with causal networks!
Thoughts while reading this, especially as they relate to realityfluid and diminishing-matteringness in the same vein as "weight by simplicity":
- If we start at a set "0" time, and only the future is infinite, then we break some of the mappings discussed in the "Agent [or any other value-location] Neutrality" section. Intuitively, some of the "same distributions" would, if both bounded on the right side, be different again. This does not hold for e.g. w5 and w6 (since there's a 1-to-1 mapping between n and 2n for all n), but it does hold for w3 and w4 (since the first w3 agent to the left of the cutoff-line won't have a corresponding w4 agent). (I sketch a generic toy version of this truncation point right after this list.)
- Diminishing realityfluid sure looks superficially similar to the weight-by-simplicity idea, which is unusual for being one of the least-tempting not-quite-solutions-to-infinities in the entire post. This makes me update (weakly!) away from "diminishing realityfluid".
- What makes Heaven+Speck intuitively better-than Hell+Lollipop? I think the answer might end up as something like "realityfluid", where (normalized or not) there's more of it on the Heaven/Hell than on the Speck/Lollipop, even though both Heaven&Speck are infinite and so are both Hell&Lollipop.
- This whole post and what Tamsin Leake has built off of it, have updated me away from "backchain learning math as you need it". Maybe not all-the-way away, but a good bit away.
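To spell out that first bullet with a generic toy example (my own illustration, not the actual definitions of w3/w4 from the post, so treat it as an analogy): a one-to-one mapping between two infinite collections of agents need not survive once both collections are cut off on one side.

$$A = \{0, 1, 2, \dots\}, \qquad B = \{1, 2, 3, \dots\}, \qquad f(n) = n + 1 \ \text{is a bijection } A \to B,$$
$$\text{but } \lvert A \cap [0, N] \rvert = N + 1 \;\neq\; N = \lvert B \cap [0, N] \rvert.$$

The A-agent right at the cutoff maps to something beyond it, so it has no partner inside the window, which is the same shape as the w3-agent-without-a-w4-counterpart issue.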
This is consistent with what I've heard/read elsewhere, yeah.
Good points, clarified to "any" instead of "all" math.
I think part of the question is also comparative advantage. Which, if someone is seriously considering technical research, is probably a tougher-than-normal question for them in particular. (The field still has a shortage, if not always constraint, of both talent and training-capacity. And many people's timelines are short...)
Another project idea that should at least be written down somewhere Official, even if it's never implemented.
Full agree on all of this.
This contest, in mid-2022, seems like a bit of what you're talking about. I entered it, won some money, made a friend, and went on my merry way. I haven't seen e.g. policymakers (or, uh, even many people on Twitter) use language that reminded me of my more-original winning entries. As I'd expect either way (due to secrecy), I also don't know if any policymakers received and read any new text that won money in the contest.
I wrote a relevant shortform about Elon here. TLDR: I'm not quite convinced Elon Musk has actually read any one of the Sequences.
I'm not quite convinced Elon Musk has actually read any one of the Sequences. I think what happened was "Superintelligence came out in 2014ish --> Musk mentioned it, WaitButWhy wrote about AI risk and also about Musk, LessWrong was the next logical place to go talk about it --> Musk cofounds OpenAI and then leaves --> ... --> Everyone associates Musk with the rationality community, despite a serious lack of evidence beyond 'he's interacted with us at all'." (NOTE: this is JUST about ratcom, NOT EA, which he's interacted with more, e.g. through FLI/MacAskill.)
Like, does he tweet e.g. "The map is not the territory. Very deep, we must come together this" [sic]? Does he mention HPMOR or the Sequences when asked about books he likes on podcasts?
At most, he probably goes to some of the same parties as rationalists, and thus might use a few frames/vocab terms here and there. (E.g. if you found a clip of him mentioning "calibrating" his beliefs, or even "I updated towards...", that wouldn't disprove my larger point, that we don't have enough evidence for him looking at / believing / absorbing / endorsing / being affiliated with the LW-ratcom canon of stuff.)
I'd be more concerned if I was wrong about this, since it'd imply that reading this stuff didn't stop him from [gestures at list of unforced errors by Elon Musk].
Forgot to mention, but HPMOR is a good example of using heroes and adolescent-type story elements to teach people rationality. Perhaps this is related to it being debatably the most-successful Rationalist recruitment device to date?
I think there's probably something to the theory driving this, but 2 problems:
- It seems half-baked, or half-operationalized. Like, "If I get them angry at my comment, then they'll really feel the anger that [person] feels when hearing about IQ!". No, that makes most people ignore you or dig in their heels. If I were using "mirror neurons, empathy, something..." to write a comment, it'd be like a POV story of being told "you're inherently inferior!" for the 100th time today. It'd probably be about as memetically-fit, more helpful, and even more fun to write!
Related story, not as central: I used to, and still sometimes do, have some kind of mental bias of "the angrier someone is while saying something, the more of The Truth it must contain". The object-level problems with that should be pretty obvious, but the meta-level problem is that different angry people still disagree with each other. I think there is a sort of person on LessWrong who might try steelmanning your view. But... you don't give them much to go off of, not even linking to relevant posts against the idea that innate intelligence is real and important.
- LessWrong as-a-whole is a place where, IMHO, we ought to have norms that make it okay to be honest. You shouldn't start a LessWrong comment by putting on your social-engineer hat and saying "Hmmm, what levers should I pull to get the sheep to feel me?". And, as noted in the first point, this precise example probably didn't work, and shouldn't be the kind of thing that works on LessWrong.
[Less central: In general, I think that paying attention to vibes is considerate and good for lots of circumstances, but that truth-seeking requires decoupling, and that LessWrong should at-its-core be about truth-seeking. If I changed my mind on this within about a week, I would probably change the latter belief, but not the former.]
I admire your honesty (plain intention-stating in these contexts is rare!), and hope this feedback helps you and/or others persuade better.
(I also have angrier vibes I could shout at you, but they're pretty predictable given what I'm arguing for, and basically boil down to "
I disagree quite a bit with the pattern of "there's this true thing, but everyone around me is rounding it off to something dumb and bad, so I'm just gonna shout that the original thing is not-true, in hopes people will stop rounding-it-off".
Like, it doesn't even sound like you think the "real and important" part is false? Maybe you'd disagree, which would obviously be the crux there, but if this describes you, keep reading:
I don't think it's remotely intractable to, say, write a LessWrong post that actually convinces lots of the community to actually change their mind/extrapolation/rounding-off of an idea. Yudkowsky did it (as a knowledge popularizer) by decoupling "rationality" from "cold" and "naive". Heck, part of my point was that SSC Scott has written multiple posts doing the exact thing for the "intelligence" topic at hand!
I get that there are people in the community, probably a lot, who are overly worried about their own IQ. So... we should have a norm of "just boringly send people links to posts about [the topic-at-hand] that we think are true"! Similarly, if someone wrote or dug up a good post about [why not to be racist/dickish/TheMotte about innate intelligence], we should link the right people to that, too.
In four words: "Just send people links."
I agree with the meta-point that extreme language is sometimes necessary (the paradigmatic example imho being Chomsky's "justified authority" example of a parent yelling at their kid to get out of the road, where they presumably yell and/or swear while doing so), and good on you for making that decision explicit here.
I... didn't mention Ender's Game or military-setups-for-children. I'm sorry for not making that clearer and will fix it in the main post. Also, I am trying to do something instead of solely complaining (I've written more object-level posts and applied for technical-research grants for alignment).
There's also the other part that, actually, innate intelligence is real and important and should be acknowledged and (when possible) enhanced and extended, but also not used as a cudgel against others. I honestly think that most of the bad examples "in" the rationality community are on (unfortunately-)adjacent communities like TheMotte and sometimes HackerNews, not LessWrong/EA Forum proper.
Thank you, fixed.
I agree that more diverse orgs is good, heck I'm trying to do that on at least 1-2 fronts rn.
I'm not as up-to-date on key AI-researcher figures as I prolly should be, but it's big-if-true that Ilya is really JVN-level, is doing alignment, and works at OpenAI; that's a damn good combo for at least somebody to have.
Much cheaper, though still hokey, ideas that you should have already thought of at some point:
- A "formalization office" that checks and formalizes results by alignment researchers. It should not take months for a John Wentworth result to get formalized by someone else.
- Mathopedia.
- Alignment-specific outreach at campuses/conventions with top cybersecurity people.
To clarify the "Bitter Lesson" example: the non-roundabout "direct" AI strategy is to take the "sweet shortcut" (h/t Gwern) of using existing human expert knowledge and trying to encode it into a computer. The roundabout strategy is to build a massive computing infrastructure first, which scaling requires. Even if no single group actually executed a strategy of "invent better computers and then do ML on them", society as-a-whole did via the compute overhang.
I appreciate the analysis of talent-vs-funding constraints. I think the bar-for-useful-contribution is so high that we loop back around to "we need to spend more money (and effort) on finding (and making) more talent", and the programs to do those may be more funding-constrained than talent-constrained.
Like, the 20th century had some really good mathematicians and physicists, and the US government spared little expense towards getting them what they needed, finding them, and so forth. Top basketball teams will "check up on anyone over 7 feet that’s breathing".
Consider how huge Von Neumann's expense account must've been, between all the consulting and flight tickets and car accidents. Now consider that we don't seem to have Von Neumanns anymore. There are caveats to at least that second point, but the overall problem still hasn't been "fixed".
Things an entity with absurdly-greater funding (e.g. the US Department of Defense) could probably do, with their absurdly-greater funding and probably coordination power:
- Indefinitely-long-timespan basic minimum income for everyone who does, or is seriously trying to do, alignment research.
- Coordinating, possibly by force, every AI alignment researcher and aspiring alignment researcher on Earth to move to one place that doesn't have high rents like the Bay. Possibly up to and including creating that place and making rent free for those who are accepted in.
- Enforce a global large-ML-training shutdown.
- An entire school system (or at least an entire network of universities, with university-level funding) focused on Sequences-style rationality in general and AI alignment in particular.
- Genetic engineering, focused-training-from-a-young-age, or other extreme "talent development" setups.
- All of these at once.
I think the big logistical barrier here is something like "LTFF is not the US government", or more precisely "nothing cool like this can be done 'on-the-margin' or with any less than the full funding". However, I think some of these could be scaled down into mere megaprojects.