Highlights and Prizes from the 2021 Review Phase

post by Raemon · 2023-01-23T21:41:21.948Z · LW · GW · 14 comments


  Reviews I liked
    Nostalgebraist on Fun with +12 OOMs of Compute
    AdamShimi, Joe_Collman, Gyrodiot on Fun with 12 OOMs of Compute
    Habryka on the 2021 MIRI Dialogues
    Alex Ray on Coase's "Nature of the Firm" on Polyamory
    gears of ascension and Adam Jermyn on Your Cheerful Price
    Adam Shimi on What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes
    AllAmericanBreakfast DirectedEvolution on Core Pathways of Aging
    1a3orn's Self Review of EfficientZero: How It Works
    Akash on ARC's first technical report: Eliciting Latent Knowledge
      Things about ELK that I benefited from
      Ways I think ELK could be improved
    Short anecdote 
    Prize Philosophy
    Prizes So Far

We've had a ton of reviews for The LessWrong 2021 Review [LW · GW], and I wanted to take a moment to: a) share highlights from reviews that felt particularly valuable to me, b) announce the prizes so far, to give people a better idea of how prizes are awarded.

Tl;dr on prizes: we've awarded $4,425 in prizes so far. I've been aiming to pretty consistently give at least a $50-$100 prize for all reviews that put in some effort, and am happy to pay more for good reviews. The largest prize so far has been $200 but I'd be excited to give $500+ to in-depth reviews of posts that were ranked highly in the preliminary voting.

If you'd consider doing reviews but are uncertain about the payoff, PM me and most likely we can work something out.

Reviews I liked

Nostalgebraist on Fun with +12 OOMs of Compute

After reading Nostalgebraist's review of Fun with +12 OOMs of Compute, I felt like I actually understood the main point for the first time:

This post provides a valuable reframing of a common question in futurology: "here's an effect I'm interested in -- what sorts of things could cause it?"

That style of reasoning ends by postulating causes.  But causes have a life of their own: they don't just cause the one effect you're interested in, through the one causal pathway you were thinking about.  They do all kinds of things.

In the case of AI and compute, it's common to ask

  • Here's a hypothetical AI technology.  How much compute would it require?

But once we have an answer to this question, we can always ask

  • Here's how much compute you have.  What kind of AI could you build with it?

If you've asked the first question, you ought to ask the second one, too.

The first question includes a hidden assumption: that the imagined technology is a reasonable use of the resources it would take to build.  This isn't always true: given those resources, there may be easier ways to accomplish the same thing, or better versions of that thing that are equally feasible.  These facts are much easier to see when you fix a given resource level, and ask yourself what kinds of things you could do with it.

This high-level point seems like an important contribution to the AI forecasting conversation.  The impetus to ask "what does future compute enable?" rather than "how much compute might TAI require?" influenced my own view of Bio Anchors, an influence that's visible in the contrarian summary at the start of this post.

(I think Zach Stein-Perlman's review makes a similar point more succinctly, although I found nostalgebraist's opening section better for driving this point home).

Nostalgebraist notes that the actual details of the post are pretty handwavy:

I find the specific examples much less convincing than the higher-level point.

For the most part, the examples don't demonstrate that you could accomplish any particular outcome by applying more compute.  Instead, they simply restate the idea that more compute is being used.

They describe inputs, not outcomes.  The reader is expected to supply the missing inference: "wow, I guess if we put those big numbers in, we'd probably get magical results out."  But this inference is exactly what the examples ought to be illustrating.  We already know we're putting in +12 OOMs; the question is what we get out, in return.

This is easiest to see with Skunkworks, which amounts to: "using 12 OOMs more compute in engineering simulations, with 6 OOMs allocated to the simulations themselves, and the other 6 to evolutionary search."  Okay -- and then what?  What outcomes does this unlock?

We could replace the entire Skunkworks example with the sentence "+12 OOMs would be useful for engineering simulations, presumably?"  We don't even need to mention that evolutionary search might be involved, since (as the text notes) evolutionary search is one of the tools subsumed under the category "engineering simulations." 

Amp suffers from the same problem.  It includes two sequential phases:

  1. Training a scaled-up, instruction-tuned GPT-3.
  2. Doing an evolutionary search over "prompt programs" for the resulting model.

Each of the two steps takes about 1e34 FLOP, so we don't get the second step "for free" by spending extra compute that went unused in the first.  We're simply training a big model, and then doing a second big project that takes the same amount of compute as training the model.

We could also do the same evolutionary search project in our world, with GPT-3.  Why haven't we?  It would be smaller-scale, of course, just as GPT-3 is smaller scale than "GPT-7" (but GPT-3 was worth doing!).

With GPT-3's budget of 3.14e23 FLOP, we could do a GPT-3 variant of Amp with, for example,

  • 10000 evaluations or "1 subjective day" per run (vs "3 subjective years")
  • population and step count ~1600 (vs ~50000), or two different values for population and step count whose product is 1600^2

100,000,000 evaluations per run (Amp) sure sounds like a lot, but then, so does 10000 (above).  Is 1600 steps "not enough"?  Not enough for what?  (For that matter, is 50000 steps even "enough" for whatever outcome we are interested in?)

The numbers sound intuitively big, but they have no sense of scale, because we don't know how they relate to outcomes.  What do we get in return for doing 50000 steps instead of 1600, or 1e8 function evaluations instead of 1e5?  What capabilities do we expect out of Amp?  How does the compute investment cause those capabilities?

The question "What could you do with +12 OOMs of Compute?" is an important one, and this post deserves credit for raising it. The concrete examples of "fun" are too fun for their own good.  [...]

Answering the question in a less "fun," more outcomes-focused manner sounds like a valuable exercise, and I'd love to read a post like that.
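
The scaled-down comparison in the review can be tallied with a short sketch. This is only rough bookkeeping on the numbers quoted above, not anything from the original post or the review: the `SearchConfig` class and its method names are hypothetical, and the code assumes that the total number of evaluations is population × steps × evaluations per run, and that the whole quoted budget is spent on the search phase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchConfig:
    """One evolutionary-search setup, using the numbers quoted in the review."""
    name: str
    budget_flop: float   # compute assumed to be spent on the search phase
    population: int
    steps: int
    evals_per_run: int

    def total_evaluations(self) -> int:
        # ASSUMPTION: cost scales as population * steps * evaluations per run.
        return self.population * self.steps * self.evals_per_run

    def flop_per_evaluation(self) -> float:
        # Implied compute per evaluation if the whole budget goes to search.
        return self.budget_flop / self.total_evaluations()


# Amp as described in the review: ~1e34 FLOP, ~50000 population and steps,
# 100,000,000 evaluations per run.
AMP = SearchConfig("Amp (GPT-7)", 1e34, 50_000, 50_000, 100_000_000)

# The GPT-3 variant from the review: 3.14e23 FLOP, ~1600 population and steps,
# 10000 evaluations per run.
GPT3_VARIANT = SearchConfig("GPT-3 variant", 3.14e23, 1_600, 1_600, 10_000)

if __name__ == "__main__":
    for cfg in (AMP, GPT3_VARIANT):
        print(
            f"{cfg.name}: {cfg.total_evaluations():.2e} evaluations, "
            f"{cfg.flop_per_evaluation():.2e} FLOP per evaluation implied"
        )
    ratio = AMP.total_evaluations() / GPT3_VARIANT.total_evaluations()
    print(f"Amp runs about {ratio:.2e} times as many evaluations")
```

The sketch only counts inputs. It says nothing about what either search would actually achieve, which is the gap the review points at.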

AdamShimi, Joe_Collman, Gyrodiot on Fun with 12 OOMs of Compute [LW · GW]

Meanwhile, Adam Shimi had previously led a round of in-depth peer review in 2021 [LW · GW], including delving into the same 12 OOMs post. Adam, Joe and Gyrodiot's Review of "Fun with +12 OOMs of Compute" [LW · GW] was also nominated for the LessWrong 2021 Review as a top-level post, but I wanted to draw attention to it here.

Some of their notes on the details:

  • With OmegaStar, one of us thought he remembered that AlphaStar’s reward function was hand shaped, and so humans might prove a bottleneck. A bit more research revealed that AlphaStar used imitation learning to learn a reward function from human games -- an approach that solves at least some of the problems with scaling to “all games in the Steam library”.
    Since the issue of humans as bottlenecks in training is pretty relevant, it would have been helpful to describe this line of thought in the post.
  • With Amp(GPT-7), we wondered why GPT-7 and not GPT-8 or GPT-9. More concretely, why should we expect progress on the tasks that are vital for Daniel’s scenario? We don’t have convincing arguments (as far as we know) for arguing that GPT-N will be good at a task for which GPT-3 showed no big improvement over the state of the art. So the tasks for which we can expect such a jump are the ones GPT-3 (or a previous GPT) made a breakthrough at.

Daniel actually relies on such tasks, as shown in his reference to this extrapolation post [LW · GW] that goes into more detail on this reasoning, and what we can expect from future versions of GPT models. But he fails to make this important matter explicit enough to help us think through the argument and decide whether we’re convinced. Instead the only way to find out is either to know already that line of reasoning, or to think very hard about his post and the references in that specific way.


Another issue with this hypothesis is that it assumes, under the hood, exactly the kind of breakthrough that Daniel is trying so hard to remove from the software side. Our cursory look at Ajeya’s report (focused on the speed-up instead of the cost reduction) showed that almost all the hardware improvement forecasted came from breakthroughs in currently not working (or not scalable) hardware. Even without mentioning the issue that none of these technologies look like they can provide anywhere near the improvement expected, there is still the fact that getting these orders of magnitude of compute requires many hardware breakthroughs, which contradicts Daniel’s stance on not needing new technology or ideas, just scaling.

(Very important note: we haven’t studied Ajeya’s report in full. It is completely possible that our issues are actually addressed somewhere in it, and that the full-fledged argument for why this increase in compute will be possible looks convincing. Also, she herself writes that at least the hardware forecasting part looks under-informed to her. We’re mostly highlighting the same problem as in the previous section -- Daniel not summarizing enough the references that are crucial to his point -- with the difference that this time, when looking quickly at the reference, we failed to find convincing enough arguments).

I recommend reading their post in full [LW · GW]. I'm likely to give this post a retroactive "review prize" that's more in the $400 - $1000 range but I haven't finished thinking about what amount makes sense.

Habryka on the 2021 MIRI Dialogues

I think this post might be the best one of all the MIRI dialogues. I also feel confused about how to relate to the MIRI dialogues overall.

A lot of the MIRI dialogues consist of Eliezer and Nate saying things that seem really important and obvious to me, and a lot of my love for them comes from a feeling of "this actually makes a bunch of the important arguments for why the problem is hard". But the nature of the argument is kind of closed off. 

Like, I agree with these arguments, but like, if you believe these arguments, having traction on AI Alignment becomes much harder, and a lot of things that people currently label "AI Alignment" kind of stops feeling real, and I have this feeling that even though a really quite substantial fraction of the people I talk to about AI Alignment are compelled by Eliezer's argument for difficulty, that there is some kind of structural reason that AI Alignment as a field can't really track these arguments. 

Like, a lot of people's jobs and funding rely on these arguments being false, and also, if these arguments are correct, the space of perspectives on the problem suddenly loses a lot of common ground on how to proceed or what to do, and it isn't really obvious that you even want an "AI Alignment field" or lots of "AI Alignment research organizations" or "AI Alignment student groups". Like, because we don't know how to solve this problem, it really isn't clear what the right type of social organization is, and there aren't obviously great gains from trade, and so from a coalition perspective, you don't get a coalition of people who think these arguments are real. 

I feel deeply confused about this. Over the last two years, I think I wrongly ended up just kind of investing into an ecosystem of people that somewhat structurally can't really handle these arguments, and makes plans that assume that these arguments are false, and in doing so actually mostly makes the world worse, by having a far too optimistic stance on the differential technological progress of solving various ML challenges, and feeling like they can pick up a lot of probability mass of good outcomes by just having better political relationships to capabilities-labs by giving them resources to make AI happen even faster. 

I now regret that a lot, and I think somehow engaging with these dialogues more closely, or having more discussion of them, would have prevented me from making what I currently consider one of the biggest mistakes in my life. Maybe also making them more accessible, or somehow having them be structured in a way that gave me as a reader more permission for actually taking the conclusions of them seriously, by having content that builds on these assumptions and asks the question "what's next" instead of just the question of "why not X?" in dialogue with people who disagree. 

In terms of follow-up work, the dialogues I would most love to see is maybe a conversation between Eliezer and Nate, or between John Wentworth and Eliezer, where they try to hash out their disagreements about what to do next, instead of having the conversation be at the level these dialogues were at. 

Alex Ray on Coase's "Nature of the Firm" on Polyamory

Alex Ray left a few interesting reviews. I found his discussion on 1a3orn's essay, "Coase's "Nature of the Firm" on Polyamory [LW · GW]", pretty valuable from the standpoint of "what makes an analogy work, or not?". 

Weakly positive on this one overall.  I like Coase's theory of the firm, and like making analogies with it to other things.  I don't think this application felt like it quite worked to me, and I'm trying to write up why.

One thing I think feels off is an incomplete understanding of the Coase paper.  What I think the article gets correct: Coase looks at the difference between markets (economists' preferred efficient mechanism) and firms / corporations, and observes that transaction costs (for people these would be contracts, but in general all transaction costs are included) are avoided in firms.  What I think it misses: A primary question explored in the paper is what factors govern the size of firms, and this leads to a mechanistic model that the transaction costs internal to the firm increase with the size of the firm until they reach a limit of the same as transaction costs for the open market (and thus the expected maximum efficient size of a non-monopoly firm).  A second, smaller, missed point I think is that the price mechanism works for transactions outside the firm, but does not for transactions inside the firm.

Given these, I think the metaphor presented here seems incomplete.  It's drawing connections to some of the parts of the paper, but not all of the central parts, and not enough to connect to the central question of size.

I'm confused exactly what parts of the metaphor map to the paper's concept of market and firm.  Is monogamy the market, since it doesn't require high-order coordination?  Is polyamory the market since everyone can be a free-ish actor in an unbundled way?  Is monogamy the firm since it's not using price-like mechanisms to negotiate individual unbundled goods?  Is polyamory the firm since it's subject to the transaction cost scaling limit of size?

I do think that it seems to use the 'transaction costs matter' pretty solidly from the paper, so there is that bit.

I don't really have much I can say about the polyamory bits outside of the economics bits.

gears of ascension and Adam Jermyn on Your Cheerful Price [LW(p) · GW(p)]

Eliezer's Your Cheerful Price [LW · GW] has the highest number of reviews so far – it seemed like a lot of people had the post directly influence their lives. Several reviews mostly said 'yep, this was useful', but I'm including two reviews that went into some detail:

gears of ascension:

Solid post. Comments:

  • long; I find it hard to parse as a result. Formatting could be improved significantly to improve skimmability. tldr helps, but if the rest of the post's words are worth their time to read, they could use better highlighting - probably bold rather than italic.
  • I'm very unclear how this differs from a happy price. The forking of the term seems unnecessary.
  • This concept entered my thinking a long time ago.
  • Use of single-currency trade assumes an efficient market; the law of one price is broken by today's exponentially inefficient markets, and so significant gains can be made by doing multicurrency bartering, ie the thing people who don't bring money into it would usually do for a personal services trade. Eg, my happy price in dollars is typically enormous because I would need to pay for a human to aid me, but if you can spare a few minutes of your time in return then I can be dramatically more productive.
  • If I could, I would make Kronopath's comment [LW(p) · GW(p)] the top comment.

Adam Jermyn:

This post introduces the concept of a "cheerful price" and (through examples and counterexamples) narrows it down to a precise notion that's useful for negotiating payment. Concretely:

  1. Having "cheerful price" in your conceptual toolkit means you know you can look for the number at which you are cheerful (as opposed to "the lowest number I can get by on", "the highest number I think they'll go for", or other common strategies). If you genuinely want to ask for an amount that makes you cheerful and no more, knowing that such a number might exist at all is useful.
  2. Even if you might want to ask for more than your cheerful price, your cheerful price helps bound how low you want the negotiation to go (subject to constraints listed in the post, like "You need to have Slack").
  3. If both parties know what "cheerful price" means it's way easier to have a negotiation that leaves everyone feeling good by explicitly signaling "I will feel less good if made to go below this number, but amounts above this number don't matter so much to me." That's not the way to maximize what you get, but that's often not the goal in a negotiation and there are other considerations (e.g. how people feel about the transaction, willingness to play iterated games, etc.) that a cheerful price does help further.

The other cool thing about this post is how well human considerations are woven in (e.g. inner multiplicity, the need for safety margins, etc.). The cheerful price feels like a surprisingly simple widget given how much it bends around human complexity.

Adam Shimi on What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes [LW · GW]

I consider this post as one of the most important ever written on issues of timelines and AI doom scenarios. Not because it's perfect (some of its assumptions are unconvincing), but because it highlights a key aspect of AI Risk and the alignment problem which is so easy to miss coming from a rationalist mindset: it doesn't require an agent to take over the whole world. It is not about agency.

What RAAPs show instead is that even in a purely structural setting, where agency doesn't matter, these problems still crop up!

This insight was already present in Drexler's work, but however insightful Eric is in person, CAIS is completely unreadable and so no one cared. But this post is well written. Not perfectly once again, but it gives short, somewhat minimal proofs of concept for this structural perspective on alignment. And it also managed to tie alignment with key ideas in sociology, opening ways for interdisciplinarity.

I have made every person I have ever mentored on alignment study this post. And I plan to continue doing so. Despite the fact that I'm unconvinced by most timeline and AI risk scenario posts. That's how good and important it is.

AllAmericanBreakfast DirectedEvolution on Core Pathways of Aging [LW(p) · GW(p)]

AllAmericanBreakfast DirectedEvolution had both a high-level review and some nitty-gritty details examining John Wentworth's piece on the mechanics of aging.

The high level overview:

Both this document and John himself have been useful resources to me as I launch into my own career studying aging in graduate school. One thing I think would have been really helpful here are more thorough citations and sourcing. It's hard to follow John's points ("In sarcopenia, one cross-section of the long muscle cell will fail first - a “ragged red” section - and then failure gradually spreads along the length.") and trace them back to any specific source, and it's also hard to know which of the synthetic insights are original to John and which are insights from the wider literature that John is echoing here.

While eschewing citations makes the post a little easier to scan, and probably made it a lot easier to write, I think that it runs the risk of divorcing the post from the wider literature and making it harder for the reader to relate this blog post to the academic publications it is clearly drawing upon. It would have also been helpful if John had more often referenced specific terms - when he says "Modern DNA sequencing involves breaking the DNA into little pieces, sequencing those, then computationally reconstructing which pieces overlap with each other," it's true, but also, DNA sequencing methods are diverse and continue to evolve on a technological level at a rapid pace. It's hard to know exactly which set of sequencing techniques he had in mind, or how much care he took in making sure that there's no tractable way to go about this.

Overall, I'm just not sure to what extent I ought to let this post inform my understanding of aging, as opposed to inspiring and motivating my research elsewhere. But I still appreciate John for writing it - it has been a great launch point.

And then digs into a claim [LW(p) · GW(p)] that senolytics rapidly wear off:

From the OP:

For a while, people hypothesized that senescent cells accumulate with age without turning over, acting as a root cause. As mentioned earlier, the actual evidence suggests that senescent cells turn over on a timescale of days to weeks, which would mean this theory is wrong - senescent cell accumulation is not a root cause.

However, there is a saving throw: maybe a small subset of senescent cells are longer-lived, and the experiments measuring senescent cell turnover time just weren’t capturing the long-lived subset in particular. Results from senolytics (drugs which kill senescent cells) suggest this is also wrong: the effects of senolytics rapidly wear off once the drug stops being administered, whereas reversing a root cause should set an organism back to a youthful state longer-term.

Edit: I'm not sure that the claim that senolytics rapidly wear off is accurate. [1] [LW(p) · GW(p)]

Senolytics do not have to be continuously present to exert their effect. Brief disruption of pro-survival pathways is adequate to kill senescent cells. Thus, senolytics can be effective when administered intermittently. For example, dasatinib and quercetin have an elimination half-life of a few hours, yet a single short course alleviates effects of leg radiation for at least 7 months.

An alternative possibility is that senolytics kill senescent cells in a tissue-selective manner. We see this here.

Briefly, the team uses a mouse breed in which senescent cells can be killed with a drug called AP. In the skeletal muscle, eye, kidney, lung, heart, and spleen, AP works better in some tissues, worse in others. AP doesn't work at all in the colon or liver.

Decreasing the overall senescent cell burden seems to increase lifespan. But if the colon and liver are reservoirs for senescent cells, allowing the cells to rapidly take over once AP administration stops, that would explain why senescent cells bounce back rapidly. Take Afghanistan as an analogy: America was able to suppress the Taliban for decades, as long as it maintained a constant military presence. But there were regions they couldn't touch, which became safe harbor for the Taliban. As soon as America withdrew its military, the Taliban were able to take over the country immediately.

If senescent cells stimulate their own production and dampen their own removal in a way that's concentration dependent, and if most senolytics are tissue-specific and therefore leave highly concentrated reservoirs of senescent cells behind, this leaves intact the possibility that stem cells are a root cause of aging.

Fortunately, combination senolytics that cover the full range of tissues may be more tractable than chemotherapy for cancer. With cancer, we primarily target cells undergoing mitosis.[2] [LW(p) · GW(p)] That impacts human cells as well, just somewhat less. And cancer's constant growth means that it's got lots of opportunities to evolve evasion to chemotherapies. But with senolytics, we may be able to target biomarkers that don't especially impact healthy cells. And since senescent cells don't proliferate, they don't have the same opportunities to evolve mechanisms to evade senolytics (1).

John Wentworth replies, and I think the back-and-forth was worthwhile.

1a3orn's Self Review of EfficientZero: How It Works

I particularly liked this for grading how his predictions bore out.

I remain pretty happy with most of this, looking back -- I think this remains clear, accessible, and about as truthful as possible without getting too technical.

I do want to grade my conclusions / predictions, though.

(1). I predicted that this work would quickly be exceeded in sample efficiency. This was wrong -- it's been a bit over a year and EfficientZero is still SOTA on Atari. My 3-to-24-month timeframe hasn't run out, but I said that I expected "at least a 25% gain" towards the start of the time, which hasn't happened.

(2). There has been a shift to multitask domains, or to multi-benchmark papers. This wasn't too hard of a prediction, but I think it was correct. (Although of course good evidence for such a shift would require comprehensive lit review.)

To sample two -- DreamerV3 is a very recently released model-based DeepMind algorithm. It does very well at Atari100k -- it gets a better mean score than everything but EfficientZero -- but it also does well at DMLab + 4 other benchmarks + even crafting a Minecraft diamond. The paper emphasizes the robustness of the algorithm, and is right to do so -- once you get human-level sample efficiency on Atari100k, you really want to make sure you aren't just overfitting to that!

And of course the infamous Gato is a multitask agent across a host of different domains, although the ultimate impact of it remains unclear at the moment.

(3). And finally -- well, the last conclusion, that there is still a lot of space for big gains in performance in RL even without field-overturning new insights, is inevitably subjective. But I think the evidence still supports it.

Akash on ARC's first technical report: Eliciting Latent Knowledge [LW(p) · GW(p)]

ELK was one of my first exposures to AI safety. I participated in the ELK contest shortly after moving to Berkeley to learn more about longtermism and AI safety. My review focuses on ELK’s impact on me, as well as my impressions of how ELK affected the Berkeley AIS community.

Things about ELK that I benefited from

Understanding ARC’s research methodology & the builder-breaker format. For me, most of the value of ELK came from seeing ELK’s builder-breaker research methodology in action. Much of the report focuses on presenting training strategies and presenting counterexamples to those strategies. This style of thinking is straightforward and elegant, and I think the examples in the report helped me (and others) understand ARC’s general style of thinking.

Understanding the alignment problem. ELK presents alignment problems in a very “show, don’t tell” fashion. While many of the problems introduced in ELK have been written about elsewhere, ELK forces you to think through the reasons why your training strategy might produce a dishonest agent (the human simulator) as opposed to an honest agent (the direct translator). The interactive format helped me more deeply understand some of the ways in which alignment is difficult. 

Common language & a shared culture. ELK gave people a concrete problem to work on. A whole subculture emerged around ELK, with many junior alignment researchers using it as their first opportunity to test their fit for theoretical alignment research. There were weekend retreats focused on ELK. It was one of the main topics that people were discussing from Jan-Feb 2022. People shared their training strategy ideas over lunch and dinner. It’s difficult to know for sure what kind of effect this had on the community as a whole. But at least for me, my current best-guess is that this shared culture helped me understand alignment, increased the amount of time I spent thinking/talking about alignment, and helped me connect with peers/collaborators who were thinking about alignment. (I’m sympathetic, however, to arguments that ELK may have reduced the amount of independent/uncorrelated thinking around alignment & may have produced several misunderstandings, some of which I’ll point at in the next section). 

Ways I think ELK could be improved

Disclaimer: I think each of these improvements would have been (and still is) time-consuming, and I don’t think it’s crazy for ARC to say “yes, we could do this, but it isn't worth the time-cost.” 

More context. ELK felt like a paper without an introduction or a discussion section. I think it could've benefitted from more context on why it's important, how it relates to previous work, how it fits into a broader alignment proposal, and what kinds of assumptions it makes.

  • Many people were confused about how ELK fits into a broader alignment plan, which assumptions ELK makes, and what would happen if ARC solved ELK. Here are some examples of questions that I heard people asking:
    • Is ELK the whole alignment problem? If we solve ELK, what else do we need to solve?
    • How did we get the predictor in the first place? Does ELK rely on our ability to build a superintelligent oracle that hasn’t already overpowered humanity? 
    • Are we assuming that the reporter doesn’t need to be superintelligent? If it does need to be superintelligent (in order to interpret a superintelligent predictor), does that mean we have to solve a bunch of extra alignment problems in order to make sure the reporter doesn’t overpower humanity? 
    • Does ELK actually tackle the “core parts” of the alignment problem? (This was discussed in this post [LW · GW] (released 7 months after the ELK report), and this post [LW · GW] (released 9 months after ELK) by Nate Soares. I think the discourse would have been faster, of higher-quality, and invited people other than Nate if ARC had made some of its positions clearer in the original report). 
  • One could argue that it’s not ARC’s job to explain any of this. However, my impression is that ELK had a major influence on how a new cohort of researchers oriented toward the alignment problem. This is partially because of the ELK contest, partially because ELK was released around the same time as several community-building efforts had ramped up, and partially because there weren't (and still aren’t) many concrete research problems to work on in alignment research.
  • With this in mind, I think the ELK report could have done a better job communicating the “big-picture” for readers. 

More justification for focusing on worst-case scenarios. The ELK report focuses on solving ELK in the worst case. If we can think of a single counterexample to a proposal, the proposal breaks. This seems strange to me. It feels much more natural to think about ELK proposals probabilistically, ranking proposals based on how likely they are to reduce the chance of misalignment. In other words, I broadly see the aim of alignment researchers as “come up with proposals that reduce the chance of AI x-risk as much as possible” as opposed to “come up with proposals that would definitely work.” 

While there are a few justifications for this in the ELK report, I didn’t find them compelling, and I would’ve appreciated more discussion of what an alternative approach would look like. For example, I would’ve found it valuable for the authors to (a) discuss their justification for focusing on the worst-case in more detail, (b) discuss what it might look like for people to think about ELK in “medium-difficulty scenarios”, (c) understand if ARC thinks about ELK probabilistically (e.g., X solution seems to improve our chance of getting the direct translator by ~2%), and (d) have ARC identify certain factors that might push them away from working on worst-case ELK (e.g., if ARC believed AGI was arriving in 2 years and they still didn’t have a solution to worst-case ELK, what would they do?)

Clearer writing. One of the most common complaints about ELK is that it’s long and dense. This is understandable; ELK conveys a lot of complicated ideas from a pre-paradigmatic field, and in doing so it introduces several novel vocabulary words and frames. Nonetheless, I would feel more excited about a version of ELK that was able to communicate concepts more clearly and succinctly. Some specific ideas include offering more real-world examples to illustrate concepts, defining terms/frames more frequently, including a glossary, and providing more labels/captions for figures. 

Short anecdote 

I’ll wrap up my review with a short anecdote. When I first began working on ELK (in Jan 2022), I reached out to Tamera (a friend from Penn EA) and asked her to come to Berkeley so we could work on ELK together. She came, started engaging with the AIS community, and ended up moving to Berkeley to skill-up in technical AIS. She’s now a research resident at Anthropic who has been working on externalized reasoning oversight [LW · GW]. It’s unclear if or when Tamera would’ve had the opportunity to come to Berkeley, but my best-guess is that this was a major speed-up for Tamera. I’m not sure how many other cases there were of people getting involved or sped-up by ELK. But I think it’s a useful reminder that some of the impact of ELK (whether positive or negative) will be difficult to evaluate, especially given the number of people who engaged with ELK (I’d guess at least 100+, and quite plausibly 500+). 


Prize Philosophy

It's a fair amount of work to review things, and my aim has been to pay people roughly commensurate with the time/value they're putting in. My policy has been to give $50 prizes for reviews which give at least some details on the gears of why a person liked or didn't like a post, $100 for reviews that made a pretty good-faith effort to evaluate a post or provide significant new information, then more (using my judgment) if the review seems to be some combination of higher-value and higher-effort.

So far no one has written any particularly in-depth reviews during the Review Phase. In my ideal world, the top-scoring essays all receive a fairly comprehensive review that engages with both the gritty details of the post and the high-level "how does this fit into the surrounding landscape?". If you're interested in contributing a high-effort review for one of the top-ranked posts so far, you can view the top-ranked posts here [? · GW] and see if any of them call out to you. I'd be happy to pay $300-$1000 for very comprehensive reviews.

In the past 1-2 years there have been some posts that extensively reviewed other posts (e.g. Reviews of “Is power-seeking AI an existential risk?” [LW · GW], MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models" [LW · GW], and Review of "Fun with +12 OOMs of Compute" [LW · GW]). I am leaning towards awarding those posts retroactive prizes but am still thinking it through.

Prizes So Far

Here's the overall prize totals so far. You can see the complete list of review-prizes here [LW · GW], which includes some comments on what I found valuable.

Thanks to everyone who has reviewed so far. Reminder that I'd like the top-scoring posts [? · GW] to get more in-depth reviews.


Comments sorted by top scores.

comment by Raemon · 2023-01-27T22:13:37.412Z · LW(p) · GW(p)

We've just shipped the user @mentions feature, and it seemed good to use it here to make sure prize-winners know to update their payment information. (To receive a prize you'll need a PayPal account, and to enter your email and/or paypal address here [? · GW]) [edit: was initially wrong link, but fixed now]

Folks listed below, I'll be sending prize info shortly.

@A Ray [LW · GW] , @Akash [LW · GW] , @DirectedEvolution [LW · GW] , @1a3orn [LW · GW] @Daniel Kokotajlo [LW · GW] @nostalgebraist [LW · GW] @Alex_Altair [LW · GW] , @Leon Lang [LW · GW] @Rafael Harth [LW · GW] @adamShimi [LW · GW] @Valentine [LW · GW] @Yoav Ravid [LW · GW] @LoganStrohl [LW · GW] @Zack_M_Davis [LW · GW] @DragonGod [LW · GW] @Vanessa Kosoy [LW · GW] @Neel Nanda [LW · GW] @Steven Byrnes [LW · GW] @TurnTrout [LW · GW] @Vaniver [LW · GW] @Coafos [LW · GW] @Alex Flint [LW · GW] @Zach Stein-Perlman [LW · GW] @Adam Jermyn [LW · GW] @the gears to ascenscion [LW · GW] @Darmani [LW · GW] @Eli Tyre [LW · GW] @johnswentworth [LW · GW] @Double [LW · GW] @Srdjan Miletic [LW · GW] @Unnamed [LW · GW] @iceman [LW · GW]

(Note: admins are allowed to send lots of @mentions, but generally our plan is to limit it to 3 for most users in most contexts. Please do not @ people in a spammy fashion. Users can disable @mention notifications if they're annoying. We'll be looking into various options to ensure it doesn't get out of hand or annoying. Let us know if there are any issues.)

Update payment info at:

https://www.lesswrong.com/payments/account [? · GW]

Replies from: DragonGod, DragonGod, BrienneYudkowsky
comment by DragonGod · 2023-01-28T00:29:19.673Z · LW(p) · GW(p)

Just to confirm can we still review posts in hopes of receiving prize money?

There is a post I wanted to review but never got around to and eventually gave up on, but I would be motivated to write the review that I never wrote if we can still win the prize money for reviews.

[I'm a poor student without a job, and $100 for a couple hours work is a pretty big deal for me.]

Replies from: Raemon
comment by Raemon · 2023-01-28T00:31:12.801Z · LW(p) · GW(p)

Yup, still appreciated.

comment by DragonGod · 2023-01-28T00:02:30.803Z · LW(p) · GW(p)

Added my paypal info.

I did not expect to receive a prize lol; I never set out to write a review. 😅

comment by LoganStrohl (BrienneYudkowsky) · 2023-01-27T23:29:32.015Z · LW(p) · GW(p)

Wait what? I didn't write any reviews though?

Replies from: Raemon
comment by Raemon · 2023-01-27T23:31:39.933Z · LW(p) · GW(p)

You did! (it was a self-review, but I still think those are valuable)

https://www.lesswrong.com/posts/9cbEPEuCa9E7uHMXT/catching-the-spark?commentId=7Np68dFByXquQH8Gf [LW(p) · GW(p)] 

Replies from: BrienneYudkowsky
comment by LoganStrohl (BrienneYudkowsky) · 2023-01-28T00:29:29.039Z · LW(p) · GW(p)

lol oooooh ok. i think i stored that in my memory as closer to "a keyboard mash" than "a review", but i see what you mean.

Replies from: Raemon
comment by Raemon · 2023-01-28T00:48:00.770Z · LW(p) · GW(p)

I found it was pretty valuable as keyboard mashes go.

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-01-26T07:45:09.055Z · LW(p) · GW(p)

Warning: Long rant incoming, one you probably won't benefit from reading unless you are Raemon, and in fact I'm a bit embarrassed to have written it:

I admit I feel some dismay at seeing Nostalgebraist's review and especially Shimi/Collman/Gyrodiot's reviews appear on this list. I respect all of these people as thinkers and upvoted their reviews, IIRC, and also I am genuinely honored and flattered that they not only read my post but took the time to review it. I won't object if you pay them money for their reviews; I wish them well. In fact I'll feel guilty if this comment of mine gets in the way of their reward, and I hope that it doesn't.

But I am having to do some serious soul-searching upon receiving the evidence that their reviews have stood the test of time and helped you understand my original post -- because I think they both miss the point of the original post. Now I'm wondering what I did wrong, how I could have been so unclear in the OP, that so many people misunderstood...

Quoting from the original post:

I describe a hypothetical scenario that concretizes the question “what could be built with 2020’s algorithms/ideas/etc. but a trillion times more compute?”  Then I give some answers to that question. Then I ask: How likely is it that some sort of TAI would happen in this scenario? This second question is a useful operationalization of the (IMO) most important, most-commonly-discussed timelines crux [? · GW]:  “Can we get TAI just by throwing more compute at the problem?” I consider this operationalization to be the main contribution of this post; it directly plugs into Ajeya’s timelines model and is quantitatively more cruxy than anything else I know of. The secondary contribution of this post is my set of answers to the first question: They serve as intuition pumps for my answer to the second, which strongly supports my views on timelines.

I literally said right at the front (admittedly behind spoiler screen) what the main and secondary points of the post were. And the subtitle said it too: "Big Timelines Crux Operationalized."

The Shimi/Collman/Gyrodiot review most seriously misunderstands the OP; see this quote from the review:

The relevance of this work appears to rely mostly on the hypothesis that the +12 OOMs of magnitude of compute and all relevant resources could plausibly be obtained in a short time frame. If not, then the arguments made by Daniel wouldn’t have the consequence of making people have shorter timelines.

The main point of the post was to focus the discussion on the big crux, not to argue for short timelines. The secondary point was an intuition pump for short timelines -- but it does NOT depend on it being at all plausible for us to achieve +12 OOMs in the real world anytime soon! I said very clearly that the +12 OOMs thing was a hypothetical, involving magic! I brought this up in the comments; see discussion [LW · GW]. You quote a passage that seems to be making the same mistake:

Another issue with this hypothesis is that it assumes, under the hood, exactly the kind of breakthrough that Daniel is trying so hard to remove from the software side. Our cursory look at Ajeya’s report (focused on the speed-up instead of the cost reduction) showed that almost all the hardware improvement forecasted came from breakthrough into currently not working (or not scalable) hardware. Even without mentioning the issue that none of these technologies look like they can provide anywhere near the improvement expected, there is still the fact that getting these orders of magnitude of compute requires many hardware breakthroughs, which contradicts Daniel’s stance on not needing new technology or ideas, just scaling.

To be fair to the authors, I didn't spell out as much as I could have why it doesn't matter whether we ever achieve +12 OOMs in real life anytime soon. I mean I did spell it out, but I didn't spell it out in as much detail as I could have -- I relied on the readers being somewhat familiar with Ajeya's model I guess. In response to a conversation with Adam Shimi after the review went up, I wrote the "Master Argument" Google doc which you may have seen by now. It explains Ajeya's model and then explains how having 80% by +12 gets you much shorter timelines than just 50%. The key, I guess, is that if you move 30% of your mass from above 12 to below 12, unless you are crazy you will move a bunch of it to the 0-6 OOM range. You won't pile it all up in the 6-12 OOM range. In retrospect I should have said more about that in the OP.
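
A minimal sketch of that reallocation arithmetic, for readers who want to see it with numbers. Everything here is illustrative: the starting probabilities are made up, and spreading the new mass proportionally across the below-12 ranges is just one simple assumption about what "not crazy" might mean, not something taken from Ajeya's model or the Master Argument doc. The function name `shift_mass_below_12` is likewise hypothetical.

```python
# Illustrative only: bucket names follow the ranges mentioned above, but the
# starting probabilities and the proportional-reallocation rule are assumptions.

def shift_mass_below_12(prior, target_below_12):
    """Rescale the below-12 buckets proportionally so they sum to target_below_12."""
    below = {k: v for k, v in prior.items() if k != ">12"}
    scale = target_below_12 / sum(below.values())
    posterior = {k: v * scale for k, v in below.items()}
    posterior[">12"] = 1.0 - target_below_12
    return posterior

# Hypothetical prior: 50% that +12 OOMs is enough, split across two ranges.
prior = {"0-6": 0.15, "6-12": 0.35, ">12": 0.50}

# Move 30 points of probability from above 12 to below 12 (50% -> 80%).
posterior = shift_mass_below_12(prior, target_below_12=0.80)

for bucket in prior:
    print(f"{bucket:>5}: {prior[bucket]:.2f} -> {posterior[bucket]:.2f}")
```

Under these made-up numbers the 0-6 OOM range grows along with the 6-12 range (from 0.15 to roughly 0.24), which is the qualitative point: unless all of the new mass is piled into 6-12, going from 50% to 80% by +12 also raises the probability on the short end.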

Anyhow. On to Nostalgebraist's review:

...to be honest I'm not sure I understand it. The part of it where it's talking about what the main point of Fun With +12 OOMs is... well, maybe it's something interesting that I said, and maybe it's equivalent to the main point under some transformation, but it's certainly not how I think of the main point. I think the main point is "here's this big timelines crux we all should be debating: What is the probability that +12 OOMs would be enough?" and the secondary point is "Here are some intuition pumps that +12 OOMs would be enough." 

Part of Nostalgebraist's review was a critique of my secondary point. That part I agree with; there's a LOT more that needs to be said (and a lot more I could have said, believe me!) about why +12 OOMs is probably enough, than just the 5 intuition pumps I gave. There's a lot more I could do to make those 5 pumps pump harder, too. I hope someone one day finds the time to write all that stuff.

Side note: Zach Stein-Perlman's review of Fun with +12 OOMs is great, I think he understood the original post quite well. The others... again, I appreciate them, they said some interesting things and some useful things, but it annoys me that they don't seem to have understood the main point. And, as I said at the beginning, it makes me a bit defensive and soul-searchy. What did I do wrong? I thought I was being so clear, signposting everything, etc.!?! Yet multiple smart people I respect read it closely enough that they were motivated to review it, and came away with a different impression!

I think Nostalgebraist's review might not deserve this reaction from me, actually. Like I said, maybe what they think the main takeaway is, is also what I thought it was, just described differently. And anyhow it's possible that they understood perfectly what I thought the main takeaway was, and just disagreed with me about it -- maybe they think that the most interesting and novel contribution isn't what I thought it was! Fair enough. I may be making a mistake by dragging them into this. I probably shouldn't be wasting time writing this anyway. But their review of Ajeya's Bio Anchors report also rankled me in the same way, but more so -- I think it misunderstood the whole point of the report, and I feel more confident in this claim than in the claims I made above.

Replies from: Raemon
comment by Raemon · 2023-01-26T18:16:25.524Z · LW(p) · GW(p)

Thanks for sharing. I definitely appreciate it all as user-feedback.

I think I have some high-level thoughts that don't depend much on the details of this particular post and these particular reviews, and then some object-level thoughts.

At a high level:

By default, serious in-depth reviews are a lot of work, and AFAICT fairly unrewarding. A lot of what I was trying to do with this post and prizes is correct an ecosystem incentives-issue where people aren't rewarded for doing a sort of "intellectual grunt work" that's important but underappreciated. (Part of what I appreciated about Shimi-et-al was them initiating a process for peer review in general, not just for this one particular post)

In general my posting a review here means I got something out of it, but not that I endorse everything in it. I'm also doing all this with a bit of limited time and trying to cover a lot of breadth, so I'm not too surprised if there are significant criticisms to be made of some reviews. 

I also think, well, if the system is working, reviewers should sometimes say things the authors don't like, and that's okay. I wouldn't argue the current system is that great (including this post and prizes, and my current approach to aggregating them). But I don't currently think anything necessarily went wrong here.

But, being misunderstood sucks, and I do empathize/sympathize. I've appreciated your work on the review this year and I definitely appreciate +12 OOMs as a post. (I noticed +12 OOMs getting a disproportionate amount of review attention, and in the culture-I-hope-for this feels like a compliment, even if parts of the process are frustrating)

Some object-level thoughts:

I agree that Shimi-et-al's argument about "The relevance of this work appears to rely mostly on the hypothesis that the +12 OOMs of magnitude of compute and all relevant resources could plausibly be obtained in a short time frame" isn't a fair characterization of what you wrote. (In an ideal world I'd have read more of the back-and-forth-between you and Shimi on their review, and incorporated that into my commentary here)

I think I mostly appreciated their review for digging into the details of the examples in the second half.

I had stated that Zach Stein-Perlman's review made a similar point to Nostalgebraist's. Looking back, I'm not sure whether I stand by that. I don't think I'd have derived Nostalgebraist's point of "The impetus to ask "what does future compute enable?" rather than "how much compute might TAI require?" influenced my own view of Bio Anchors" from Zach's if that's all I had to go on.

I said, on reading Nostalgebraist's review, "I feel like I understand the point for the first time." I did notice that he didn't frame it the same way you did, and I'm not sure whether I endorse my phrasing. Maybe Nostalgebraist's interpretation is more of its own thing than a point you made. But I did feel like it added another layer to your post, and somehow made things feel more crisp to me as a useful meta-level-insight than Zach's (or your) summary.

I may have more thoughts, but wanted to post this for now.

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-01-26T19:44:04.926Z · LW(p) · GW(p)

Just chiming in to say huge +1 to the idea of rewarding people for doing reviews, it's an awesome and very pro-social thing to do and I'm honored that so many people chose my post to review. I endorse rewarding Shimi et al, and Nostalgebraist, in particular.

Also: I happen to be having a related conversation [LW(p) · GW(p)] that also gives some context on how I conceived of the OP at least & what I hoped to accomplish with it.

comment by Raemon · 2023-01-27T23:35:04.190Z · LW(p) · GW(p)

I had written this previously in The Review Phase comment threads [LW · GW], but to make it easier for people to see the individual reviews I gave prizes to:

This comment is incomplete, and I will likely edit some of the prizes slightly. But I'd fallen behind on awarding prizes for reviews and I want to highlight: yes, the Lightcone team will give you money for reviewing stuff. So, I wanted to ship this rough version for now to give a sense of what sort of reviews I found valuable.

Thanks to the large number of people who've stepped up to review so far. (I'd still be excited for high-effort reviews that explore the details of some of the higher-ranked posts.)

Next round of prize announcements:

Honorable Mentions:

These reviews didn't go into much detail, but each added at least a couple of new arguments or frames that I found useful, and each gets $50.

Replies from: Raemon
comment by Raemon · 2023-02-07T23:01:49.421Z · LW(p) · GW(p)

One more round of prize updates:

I'm hopefully sending the prizes to our accountants today, and they should be resolved within a week or so.

Replies from: Raemon
comment by Raemon · 2023-02-08T20:10:42.998Z · LW(p) · GW(p)

Oh, also meant to give a $200 prize to @Akash [LW · GW] for his review of PR is Corrosive, Reputation is not [LW · GW] (along with the full post he wrote, inspired by it. I'm thinking of this as roughly $100 for the review itself and another $100 for the post)