Posts

Terry Tao is hosting an "AI to Assist Mathematical Reasoning" workshop 2023-06-03T01:19:08.398Z
How I learned to stop worrying and love skill trees 2023-05-23T04:08:42.022Z
You are probably not a good alignment researcher, and other blatant lies 2023-02-02T13:55:15.186Z
junk heap homotopy's Shortform 2022-10-04T17:34:20.848Z

Comments

Comment by junk heap homotopy (zrkrlc) on Examples of Highly Counterfactual Discoveries? · 2024-04-24T14:53:28.319Z · LW · GW

Set theory is the prototypical example I usually hear about. From Wikipedia:

Mathematical topics typically emerge and evolve through interactions among many researchers. Set theory, however, was founded by a single paper in 1874 by Georg Cantor: "On a Property of the Collection of All Real Algebraic Numbers".

Comment by junk heap homotopy (zrkrlc) on Rationality Research Report: Towards 10x OODA Looping? · 2024-02-26T12:25:27.113Z · LW · GW

It'd be cool if a second group also worked towards "rationality skill assessment."

This was my project at last year's Epistea, but I sort of had to pause it to work full-time on my interp upskilling experiment.

I only got as far as implementing ~85% of an app to facilitate this (as described here), but maybe a quick chat about this would still be valuable?

Comment by junk heap homotopy (zrkrlc) on Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" · 2023-11-22T08:17:48.758Z · LW · GW

Time to update then 🥲

https://x.com/OpenAI/status/1727206187077370115?s=20

Comment by junk heap homotopy (zrkrlc) on The Flow-Through Fallacy · 2023-09-13T15:34:48.492Z · LW · GW

I wouldn’t say that. Signalling, the way you seem to have used it, implies deception on their part, but each of these instances could just be a skill issue on their end, an inability to construct the right causal graph with sufficient resolution.

For what it’s worth, whatever this pattern is pointing at also applies to how badly most of us misjudged the AI box problem, i.e., that some humans by default would just let the damn thing out without needing to be persuaded.

Comment by junk heap homotopy (zrkrlc) on Feedbackloop-first Rationality · 2023-08-12T07:40:46.576Z · LW · GW

There seem to be two major counter-claims to your project:

  • Feedback loops can't work for nebulous domains, so this whole thing is misguided.
  • Transfer learning is impossible and you can't get better at rationality by grinding LeetCode equivalents.

(There's also the third major counter-claim that this can't work for alignment research, but I assume that's actually irrelevant since your main point seems to be about rationality training.)

My take is that these two claims stem from inappropriately applying an outcome-oriented mindset to a process-oriented problem. That is, the model seems to be: "we wanted to learn X and applied Feedback Loops™️ but it didn't work, so there!" instead of "feedback-loopiness seems like an important property of a learning approach we can explicitly optimise for".

In fact, we can probably factor out several senses of 'feedback loops' (henceforth just floops) that seem to be leading a lot of people to talk past each other in this thread:

  • Floops as reality pushing back against movement, e.g. the result of swinging a bat, the change in an animation when you change a slider in an Explorable Explanation
  • Floops where the feedback is quick but nebulous (e.g. persuasion, flirting)
  • Floops where the feedback is clear but slow (e.g. stock market)
  • Floops as reinforcement, i.e. the cycle Goal → Attempt → Result
  • Floops as OODA loops (less legible, more improvisational than previous item)
  • Floops where you don't necessarily choose the Goal (e.g. competitive multiplayer games, dealing with the death of a loved one)
  • Floops which are not actually loops, but a single Goal → Attempt → Result run (e.g., getting into your target uni)
  • Floops which are about getting your environment to support you doing one thing over and over again (e.g. writing habits, deliberate practice)
  • Floops which are cumulative (e.g. math)
  • Floops where it's impossible to get sufficiently fine-grained feedback without the right paradigm (e.g. chemistry before Robert Boyle)
  • Floops where you don't necessarily know the Goal going in (e.g. doing 1-on-1s at EA Global)
  • Floops where failure is ruinous and knocks you out of the game (e.g. high-rise parkour)
  • Anti-floops where the absence of an action is the thing that moves you towards the Goal
  • Floops that are too complex to update on a single result (e.g. planning, designing a system)

When someone says "you can't possibly apply floops to research", I imagine they're coming from a place where they interpret goal-orientedness as an inherent requirement of floopiness. There are many bounded, closed-ended things that one can use floops for that can clearly help the research process: stuff like grinding the prerequisites and becoming fluent with certain techniques (cf. Feynman's toolbox approach to physics), writing papers quickly, developing one's nose (e.g. by trying to forecast the number of citations of a new paper), etc.

This claim is independent of whether or not the person utilising floops is good enough to get better quickly. I think it's not controversial to claim that you can never get a person who has profound mental disabilities and is not a savant at technical subjects to discover a new result in quantum field theory, but this is also irrelevant when talking about people who are baseline capable enough to worry about these things on LessWrong dot com in the first place.

On the other end of the spectrum, the reductio version of being against floops (that everyone was literally born with all the capabilities they would ever need in life and Learning Is Actually a Myth) seems blatantly false too. Optimising for floopiness, to me, is merely trying to find a happy medium between the two.


On an unrelated note, I wrote about how to package and scalably transfer floops a while back: https://www.lesswrong.com/posts/3CsynkTxNEdHDexTT/how-i-learned-to-stop-worrying-and-love-skill-trees

Most modern games have two floops built in: a core game loop that gets completed in under a minute, and a larger game loop that makes you come back for more. Or, in the context of my project, Blackbelt:

[image]

The idea is you can design bespoke tests-of-skill to serve as your core game loop (e.g., a text box with a word counter underneath, the outputs of your Peloton bike, literally just checking a box like with a TODO list) and have the deliberately status-oriented admission to a private channel be the larger, overarching hook. I think this approach generalises well to things that are not just alignment, because floops can be found in both calculating determinants and doing on-the-fly Fermi calculations for setting your base rates, and who wouldn't want to be in the company of people who obsess endlessly about numbers between 0 and 1?
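To make the two-loop idea concrete, here is a minimal, hypothetical sketch in Python. None of the names, fields, or thresholds below come from Blackbelt itself; they only illustrate one way a sub-minute test-of-skill (the core loop) could feed a status-gated larger loop.

```python
from dataclasses import dataclass, field


@dataclass
class SkillTest:
    """A bespoke test-of-skill: the sub-minute core loop."""
    name: str
    target: float  # e.g. 500 (words), 20 (km on the bike), 1 (a checked box)
    attempts: list[float] = field(default_factory=list)

    def record(self, result: float) -> bool:
        """One pass through the core loop: attempt, result, immediate pass/fail."""
        self.attempts.append(result)
        return result >= self.target


@dataclass
class SkillTrack:
    """The larger loop: enough passes unlock the status-gated private channel."""
    test: SkillTest
    passes_needed: int = 30

    def unlocked(self) -> bool:
        return sum(1 for r in self.test.attempts if r >= self.test.target) >= self.passes_needed


# Example: a daily word-count box gating admission to a private writers' channel.
writing = SkillTrack(SkillTest(name="morning pages", target=500))
print(writing.test.record(612))  # core loop: True, this attempt passed
print(writing.unlocked())        # larger loop: False, 29 more passes to go
```

The only point of the sketch is the separation of concerns: the core loop hands out immediate pass/fail feedback, while the larger loop accumulates those passes into something worth coming back for.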

Comment by junk heap homotopy (zrkrlc) on Feedbackloop-first Rationality · 2023-08-12T06:22:49.690Z · LW · GW

There's this tension between what I know from the literature (i.e. transfer learning is basically impossible) and my lived experience: I, and a handful of the people I know in real life whom I have examined in depth, can quickly apply e.g. thermodynamics concepts to designing software systems, and consuming political fiction has increased my capacity to model equilibrium strategies in social situations. Hell, this entire website was built on the back of HPMoR, which is an explicit attempt to teach rationality by reading about it.

The point other people have made about alignment research being highly nebulous is important but irrelevant. You simply cannot advance the frontiers of a field without mastery of some technique or skill (or a combination thereof) that puts you in a spot where you can do things that were impossible before, like how Rosalind Franklin needed some mastery of X-ray crystallography to be able to image DNA.

Research also seems to be another skill that's trainable, or at least has trainable parts. If, for example, the bottleneck is sheer research output, I can imagine that a game where you just output as many shitty papers as possible in a bounded period of time would, ceteris paribus, let people write more papers afterwards. Or even at the level of paragraphs: one could play a game of "Here's 10 random papers outside your field with the titles, authors, and publication year removed. Guess how many citations they got." to develop one's nose for what makes a paper impactful, or "Write the abstract of this paper." to get better at distillation.

Comment by junk heap homotopy (zrkrlc) on Which rationality posts are begging for further practical development? · 2023-07-24T01:12:16.277Z · LW · GW

The parts in That Alien Message and the Beisutsukai shorts that are about independently regenerating known science from scratch. Or zetetic explanations, whichever feels more representative of the idea cluster.

In particular, how does one go about making observations that are useful for building models? How does one select the initial axioms in the first place?

Comment by junk heap homotopy (zrkrlc) on Why I'm Not (Yet) A Full-Time Technical Alignment Researcher · 2023-05-25T04:46:56.825Z · LW · GW

I don't know. It seems to me that we have to make the curves of progress in alignment vs capabilities meet somewhere, and part of that would probably involve really thinking about which parts of which bottlenecks are really blockers vs just epiphenomena that tag along but can be optimised away. For instance, in your statement:

If research would be bad for other people to know about, you should mainly just not do it

Then maybe doing research but not having the wrong people know about it is the right intervention, rather than just straight-up not doing it at all?

Comment by junk heap homotopy (zrkrlc) on How I learned to stop worrying and love skill trees · 2023-05-24T01:24:33.581Z · LW · GW

Yup, that's definitely on the roadmap. Sometimes you need facts to advance a particular skill (e.g., if you're figuring out scaling laws, you have got to know a bunch of important numbers) or even just to use the highly specialised language of your field, and there's no better way to do that than to use SRS.

We're probably going to offer Anki import some time in the future just so we can take advantage of the massive amount of material other people have already made, but I can't promise one-to-one parity since I'd like to aim more towards gwern's programmable flashcards, which directly inspired this whole project in the first place.
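For background on the scheduling side, here is a minimal sketch of the classic SM-2 update rule that Anki's scheduler descends from. The function name and example numbers are mine, and this is a textbook sketch rather than a description of Blackbelt's (or modern Anki's) actual implementation.

```python
def sm2_update(quality: int, reps: int, interval: float, ease: float):
    """One SM-2-style review. quality is 0-5; returns (reps, interval_days, ease).

    This is the textbook rule; real schedulers (Anki included) layer their own
    modifications on top of it.
    """
    if quality < 3:  # lapse: relearn from scratch, keeping the ease factor
        return 0, 1.0, ease
    if reps == 0:
        interval = 1.0
    elif reps == 1:
        interval = 6.0
    else:
        interval = round(interval * ease)
    ease = max(1.3, ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)))
    return reps + 1, interval, ease


# A card recalled perfectly three times in a row: due in 1 day, then 6, then ~16.
state = (0, 0.0, 2.5)
for q in (5, 5, 5):
    state = sm2_update(q, *state)
print(state)  # (3, 16, ~2.8)
```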

Comment by junk heap homotopy (zrkrlc) on LessWrong moderation messaging container · 2023-04-22T07:45:38.178Z · LW · GW

Have you scaled up the moderation accordingly? I have noticed fewer comments that are on the level of what gets posted on r/slatestarcodex these days but I'm not sure if it's just a selection effect.

Comment by junk heap homotopy (zrkrlc) on Covid 2/23/23: Your Best Possible Situation · 2023-03-01T06:20:30.349Z · LW · GW

Congratulations! I don't think I would have navigated the past three years as well as I did without your posts.

Comment by junk heap homotopy (zrkrlc) on Things that can kill you quickly: What everyone should know about first aid · 2022-12-29T03:01:03.495Z · LW · GW

Is this particular to LessWrong, or are there some browser shenanigans going on that make it possible to render this?

Comment by junk heap homotopy (zrkrlc) on junk heap homotopy's Shortform · 2022-10-22T16:52:04.680Z · LW · GW

Person-affecting view

I want to see if I’m cut out to do nontrivial independent alignment research, and I want to get an answer as quickly as possible. The best way to do that is to waste everyone’s time by publicly laying down my maps and hoping that someone out there will feel white-hot intransigent rage at someone being wrong on the internet and correct me.

Why alignment research?

That is, why not work on anything else like biorisk? Or some other generic longtermist cause?

The God of Power Laws has decreed that most human endeavors be Pareto-distributed. In a high-dimensional world, that means most of the important work and most of the disproportionate outcomes will come from carving niches instead of just doggedly pursuing areas other people are already exploiting.

My physics + math + CS background might make me seem like just another median LessWrong nerd, but I have the unique experience of having been particularly bad at them. Even so, I would like to think that I have this tendency to be wrong in interesting ways, mostly because I’m a wordcel who keeps clawing at shape-rotating doors he shouldn’t be clawing at[1][2].

(The real reason is that a physics degree basically gives you all the math you need to start reading ML papers, and then some. Switching to biology might take years, even though I performed really well in my bio electives in uni. Plus, LessWrong has been my constant bedside reading since 2009, so like Ben Pace who knows me by another name.)

I hesitate to write all this because it’s deeply embarrassing. I was a pretty good clicker. I took the lesson of That Alien Message seriously and purposely read only a quarter of the Sequences so I could fill in the rest. When I was 12 I knew there was a nontrivial chance I wouldn’t get to live in a transhumanist spacefaring hypertechnological utopia because of our unchecked hubris. And yet, I spent the last fifteen years of my life optimising for my own happiness, learning random irrelevant things like music and storytelling, founding a VR company and failing hard, gaining people-skills and people-experiences far in excess of what anyone working on alignment would ever need.

And yet, I ended up here, staring at the jaws of creation woefully unarmed.

Looking back, the biggest thing that turned me off the whole idea of becoming an AI safety researcher was, to first order, everyone’s favourite cope, i.e., that I wasn’t smart enough to meaningfully contribute to alignment. In my case, however, that hypothesis remains my leading candidate, and not for a lack of trying (to refute its underlying generator, that no one without IMO-level math skills is allowed to try). It’s just that I really, really find short-timeline arguments convincing, to the point that I have stopped trying to optimise for raising genius children even though I have dreamed of doing so since I was in fourth grade.

Taking John Wentworth’s guide seriously means working with what I have, right now, instead of shoring up defenses around my weak spots. My impression is that there is a dangerous lack of wordcels in alignment research, hence the need for programs like the CAIS Philosophy Fellowship and PIBBSS, and if that’s not the case then most of the marginal impact I would have had working on conceptual alignment directly will basically vanish. Of course, fleshing out exactly why I expect to be able to carve out a niche for myself in such a hotly-contested area should be done, but more on that later.

Why independent?

Mostly because of my visa situation. I have a particularly weak passport, and some bad decisions I made in my younger years have made it difficult to rectify the situation[3]. In particular, the most viable way for me to get into Berkeley is to spend 2-3 years getting a master’s degree and using that to get residency in, say, Canada, where it would be significantly easier for me to make trips southward. I think that’s time I could just spend working on alignment proper[4].

So this is my main physical constraint: what I must do, I can only do within the confines of Southeast Asia and the internet for the foreseeable future.

Q: It doesn’t sound too bad. I mean, most of the stuff you need is on the internet right?

Wrong. Conditional on The PhD Grind being an accurate look at academic research in general, and alignment work converging to similar patterns, anyone who isn’t physically present in the right offices is forever relegated to already-distilled, already-cleaned-up versions of arguments and hypotheses. Most research happens behind the scenes, outside the confines of PDFs and .edu webpages. No lunch breaks with colleagues mean no Richard Hammings to fearlessly question your entire research agenda on a whim. No watercooler conversations mean you lose out on things like MATS.

Which also means you can pretty much avoid information cascades and are thus slightly better positioned to find novel research lines in a pre-paradigmatic field[5]. :P

Okay, I don’t think this is strictly the case. If I am unable to solve this geography problem within the next five years, I think my potential impact will be cut by at least half. No one can singlehandedly breed scenius. I and all the others like me are in a Milanese Leonardo situation, and unfortunate as it is, it’s an inescapable test of agency that we must all pass one way or another. Either that, or figure out a way to escape the hard limit of presence.

Why nontrivial?

If I’m being honest, I would rather not work on this.

I think I speak for a lot of newcomers in this field when I say that, while thinking all day about Big Questions like what really goes on in a mind or how we can guarantee human flourishing sounds like a supremely attractive lifestyle, actually weighing it as an option versus all the other supremely attractive things in life is tough. Most of us can probably go on to do great things in other lines of work, and while funding in this space is still growing steadily, there is a real chance that only a small minority of us will end up making the cut and actually make a living out of this.

A meme that’s currently doing the rounds in my circles is that there are only ~300 people working on AI safety at the moment. Taken at face value, that seems like a horrifyingly low number given the stakes we’re dealing with. But a cursory check on atomicheritage.org tells me there were only 611 scientists involved across the US and German programs during the Manhattan Project. Sure, our researchers aren’t as highly selected as the attendees of the 1927 Solvay Conference, but do we really think that adding more people to the mix is the best way to climb the logistic success curve?

Here’s what I think: if you’re like me, then you’re doing this because the alternative is sickening. Decades of reading about smack-talking space detectives or wizards singlehandedly starting Industrial Revolutions are never gonna let us party while the rest of the world careens off a cliff. How can you? How can anyone? The only institution our civilisation has managed to produce that’s even remotely attempting to solve the problem is trying to justify to itself that we should take our time more than we already have. Who but the innocent can hide thine awful countenance?

If I decide against doing this after all, I would for the rest of my days stay up at night in cold sweat trying to convince myself I hadn’t consigned my loved ones to their early deaths. My only consolation is that, on the off-chance we win, then no one will die anymore and I could grapple with the shame of not having done anything to help until the last photon in the universe gets redshifted to oblivion.

And by the gods, I wish we’d all be around to see that.


  1. Last time I checked my quantitative ability lags behind my verbal ability by 2.5 σ. ↩︎

  2. I also spent the last three years hanging out with postrats on Twitter where terms like ‘wordcel’ and ‘shape rotator’ just float uncontested in the water supply. FYI they mean “high verbal skill” and “high math skill” respectively. Yes, IQ deltas are more important than subfactor differences, don’t @ me. ↩︎

  3. That is, I didn’t prioritise my GPA. I didn’t optimise for international competitions and/or model UN-type events, which would have given me access to a B1/B2 visa. This is not a justification, but the main reason I only half-assedly tried to fix my situation is that I didn’t know I’d be doing this whole thing after all. Yes, I know I have developed an Ugh Field around this whole thing, and part of working on it is publicly acknowledging it like I’m doing here. ↩︎

  4. Okay, there are actually other paths but this one’s the surest. I could go join an SF startup and try my hand at the H1B lottery. I could give up on the US and optimise for London (but they have a similarly labyrinthine immigration process). There are several options, but they all take time and money and energy which I’d have to redirect from my actual work. The next-best choice really would be to ignore short timelines and just salarymaxx until I have enough money to bulldoze over the polite requirements of immigration bureaus. ↩︎

  5. We’re not really pre-paradigmatic so much as in a the-promising-paradigms-do-not-agree-with-each-other state, right? ↩︎

Comment by junk heap homotopy (zrkrlc) on Competent Elites · 2022-10-20T02:53:05.956Z · LW · GW

Steve Jurvetson reacts to this post 14 years later: https://twitter.com/FutureJurvetson/status/1582761692861038593

Comment by junk heap homotopy (zrkrlc) on Why Study Physics? · 2021-11-29T03:05:15.894Z · LW · GW

Have you read What Should A Professional Mathematician Know? The relevant bits are in the last two sections.

Comment by junk heap homotopy (zrkrlc) on Against Dog Ownership · 2020-12-04T10:48:33.767Z · LW · GW

Anecdotally speaking, being forced to take care of a puppy made me significantly more empathic towards animals in general. Whereas I used to subconsciously see pet owners as being 'hijacked' (even taking into account a rather animal-obsessed childhood), it was only after fully bonding with my puppy that I was able to empathise with suffering animals on a gut level (again, I used to be able to, then I was desensitised for years, and today I'm back to full-on animal empathy mode).

All this happened over the course of 4-5 days. It was actually quite scary to see my values change so abruptly, to the extent that visiting r/petloss literally makes me want to vomit due to how heavy it makes my heart.

Comment by junk heap homotopy (zrkrlc) on Cryonics without freezers: resurrection possibilities in a Big World · 2020-12-04T10:40:41.717Z · LW · GW

Now I'm curious. Does studying history make you update in a similar way? I feel that these times are not especially insane compared to the rest of history, though the scale of the problems might be bigger.

Comment by junk heap homotopy (zrkrlc) on Online Meetup: Forecasting Workshop · 2020-05-05T16:19:26.007Z · LW · GW

Hi there. I signed up for the event but didn’t receive any URL. Can someone help me out?

Comment by junk heap homotopy (zrkrlc) on Hanson vs Mowshowitz LiveStream Debate: "Should we expose the youth to coronavirus?" (Mar 29th) · 2020-03-29T20:22:20.440Z · LW · GW

What do you think are the odds of a successful vaccine being developed within 3 months? 6 months? A year? Before we achieve herd immunity?

Comment by junk heap homotopy (zrkrlc) on Hanson vs Mowshowitz LiveStream Debate: "Should we expose the youth to coronavirus?" (Mar 29th) · 2020-03-29T20:15:59.606Z · LW · GW

How long do you think the peak will last? How does your position (and best-policy recommendations) change as this variable changes?

Comment by junk heap homotopy (zrkrlc) on I Want To Live In A Baugruppe · 2017-07-31T16:19:17.138Z · LW · GW

Definitely in 5-10 years. Hopefully with accommodations for foreignfolk as well?

Comment by junk heap homotopy (zrkrlc) on The Proper Use of Doubt · 2016-09-01T20:11:33.613Z · LW · GW

Not sure if this is the proper place to say this but your first link is broken.

http://www.yudkowsky.net/virtues/ -> http://www.yudkowsky.net/rational/virtues/