Posts

How useful is "AI Control" as a framing on AI X-Risk? 2024-03-14T18:06:30.459Z
Open Thread Spring 2024 2024-03-11T19:17:23.833Z
Is a random box of gas predictable after 20 seconds? 2024-01-24T23:00:53.184Z
Will quantum randomness affect the 2028 election? 2024-01-24T22:54:30.800Z
Vote in the LessWrong review! (LW 2022 Review voting phase) 2024-01-17T07:22:17.921Z
AI Impacts 2023 Expert Survey on Progress in AI 2024-01-05T19:42:17.226Z
Originality vs. Correctness 2023-12-06T18:51:49.531Z
The LessWrong 2022 Review 2023-12-05T04:00:00.000Z
Open Thread – Winter 2023/2024 2023-12-04T22:59:49.957Z
Complex systems research as a field (and its relevance to AI Alignment) 2023-12-01T22:10:25.801Z
How useful is mechanistic interpretability? 2023-12-01T02:54:53.488Z
My techno-optimism [By Vitalik Buterin] 2023-11-27T23:53:35.859Z
"Epistemic range of motion" and LessWrong moderation 2023-11-27T21:58:40.834Z
Debate helps supervise human experts [Paper] 2023-11-17T05:25:17.030Z
How much to update on recent AI governance moves? 2023-11-16T23:46:01.601Z
AI Timelines 2023-11-10T05:28:24.841Z
How to (hopefully ethically) make money off of AGI 2023-11-06T23:35:16.476Z
Integrity in AI Governance and Advocacy 2023-11-03T19:52:33.180Z
What's up with "Responsible Scaling Policies"? 2023-10-29T04:17:07.839Z
Trying to understand John Wentworth's research agenda 2023-10-20T00:05:40.929Z
Trying to deconfuse some core AI x-risk problems 2023-10-17T18:36:56.189Z
How should TurnTrout handle his DeepMind equity situation? 2023-10-16T18:25:38.895Z
The Lighthaven Campus is open for bookings 2023-09-30T01:08:12.664Z
Navigating an ecosystem that might or might not be bad for the world 2023-09-15T23:58:00.389Z
Long-Term Future Fund Ask Us Anything (September 2023) 2023-08-31T00:28:13.953Z
Open Thread - August 2023 2023-08-09T03:52:55.729Z
Long-Term Future Fund: April 2023 grant recommendations 2023-08-02T07:54:49.083Z
Final Lightspeed Grants coworking/office hours before the application deadline 2023-07-05T06:03:37.649Z
Correctly Calibrated Trust 2023-06-24T19:48:05.702Z
My tentative best guess on how EAs and Rationalists sometimes turn crazy 2023-06-21T04:11:28.518Z
Lightcone Infrastructure/LessWrong is looking for funding 2023-06-14T04:45:53.425Z
Launching Lightspeed Grants (Apply by July 6th) 2023-06-07T02:53:29.227Z
Yoshua Bengio argues for tool-AI and to ban "executive-AI" 2023-05-09T00:13:08.719Z
Open & Welcome Thread – April 2023 2023-04-10T06:36:03.545Z
Shutting Down the Lightcone Offices 2023-03-14T22:47:51.539Z
Review AI Alignment posts to help figure out how to make a proper AI Alignment review 2023-01-10T00:19:23.503Z
Kurzgesagt – The Last Human (Youtube) 2022-06-29T03:28:44.213Z
Replacing Karma with Good Heart Tokens (Worth $1!) 2022-04-01T09:31:34.332Z
Apply to the ML for Alignment Bootcamp (MLAB) in Berkeley [Jan 3 - Jan 22] 2021-11-03T18:22:58.879Z
The LessWrong Team is now Lightcone Infrastructure, come work with us! 2021-10-01T01:20:33.411Z
Welcome & FAQ! 2021-08-24T20:14:21.161Z
Berkeley, CA – ACX Meetups Everywhere 2021 2021-08-23T08:50:51.898Z
The Death of Behavioral Economics 2021-08-22T22:39:12.697Z
Open and Welcome Thread – August 2021 2021-08-15T05:59:05.270Z
Open and Welcome Thread – July 2021 2021-07-03T19:53:07.048Z
Open and Welcome Thread – June 2021 2021-06-06T02:20:22.421Z
Attributions, Karma and better discoverability for wiki/tag features 2021-06-02T23:47:03.604Z
Open and Welcome Thread - May 2021 2021-05-03T07:58:03.130Z
2019 Review: Voting Results! 2021-02-01T03:10:19.284Z
Last day of voting for the 2019 review! 2021-01-26T00:46:35.426Z

Comments

Comment by habryka (habryka4) on 'Empiricism!' as Anti-Epistemology · 2024-03-18T21:11:03.528Z · LW · GW

I don't think this essay is commenting on AI optimists in-general. It is commenting on some specific arguments that I have seen around, but I don't really see how it relates to the recent stuff that Quintin, Nora or you have been writing (and I would be reasonably surprised if Eliezer intended it to apply to that).

You can also leave it up to the reader to decide whether and when the analogy discussed here applies. I could spend a few hours digging up people engaging in reasoning very close to what is discussed in this article, though by default I am not going to.

Comment by habryka (habryka4) on Tamsin Leake's Shortform · 2024-03-17T21:59:04.648Z · LW · GW

They still make a lot less than they would if they optimized for profit (that said, I think most "safety researchers" at big labs are safety researchers in name only, and I don't think anyone would philanthropically pay for their labor; even if they did, they would still make the world worse according to my model, though others of course disagree with this).

Comment by habryka (habryka4) on Tamsin Leake's Shortform · 2024-03-17T20:53:55.505Z · LW · GW

I think people who give up large amounts of salary to work in jobs that other people are willing to pay for from an impact perspective should totally consider themselves to have done good comparable to donating the difference between their market salary and their actual salary. This applies to approximately all safety researchers. 

Comment by habryka (habryka4) on D0TheMath's Shortform · 2024-03-17T20:52:32.033Z · LW · GW

Seems like the thing to do is to have a program that happens after MATS, not to extend MATS. I think in-general you want sequential filters for talent, and ideally the early stages are as short as possible (my guess is indeed MATS should be a bit shorter).

Comment by habryka (habryka4) on Clickbait Soapboxing · 2024-03-17T20:50:33.543Z · LW · GW

I... really don't see any clickbait here. If anything these titles feel bland to me (and indeed I think LW users could do much better at making titles that are more exciting or that more clearly highlight a good value proposition for the reader, though karma makes up for a lot). 

Like, for god's sake, the top title here is "Social status part 1/2: negotiations over object-level preferences". I feel like that title is at the very bottom of potential clickbaitiness, given the subject matter.

Comment by habryka (habryka4) on Open Thread Spring 2024 · 2024-03-17T05:58:03.024Z · LW · GW

It's really hard to get any kind of baseline here, and my guess is it differs hugely between different populations, but my guess (based on doing informal Fermi estimates here a bunch of times over the years) would be a lot lower than the average for the population, at least because of demographic factors, and then probably some extra on top of that.

Comment by habryka (habryka4) on More people getting into AI safety should do a PhD · 2024-03-17T02:59:11.776Z · LW · GW

I was talking about research scientists here (though my sense is that 5 years of being a research engineer is still comparable to most PhDs for gaining research skills, and probably somewhat better). I also had a vague sense that at DeepMind being a research engineer was particularly bad for gaining research skills (compared to the same role at OpenAI or Anthropic).

Comment by habryka (habryka4) on More people getting into AI safety should do a PhD · 2024-03-17T02:28:23.778Z · LW · GW

Yes. Besides DeepMind, none of the industry labs require PhDs, and I think the DeepMind requirement has also been loosening a bit.

Comment by habryka (habryka4) on More people getting into AI safety should do a PhD · 2024-03-16T17:00:17.346Z · LW · GW

and academia is the only system for producing high quality researchers that is going to exist at scale over the next few years

To be clear, I am not happy about this, but I would take bets that industry labs will produce and train many more AI alignment researchers than academia, so this statement seems relatively straightforwardly wrong (and of course we can quibble over the quality of researchers produced by different institutions, but my guess is the industry-trained researchers will perform well at least by your standards, if not mine).

Comment by habryka (habryka4) on 'Empiricism!' as Anti-Epistemology · 2024-03-16T16:57:08.427Z · LW · GW

I don't think this essay is intended to make generalizations to all "Empiricists", scientists, and "Epistemologists". It's just using those names as a shorthand for three types of people (whose existence seems clear to me, though of course their character does not reflect everyone who might identify under that label).

Comment by habryka (habryka4) on Highlights from Lex Fridman’s interview of Yann LeCun · 2024-03-15T23:56:01.108Z · LW · GW

I didn't, I provided various caveats in parentheticals about the exact level of danger.

Oops, mea culpa, I skipped your last parenthetical when reading your comment, so I missed that. 

Comment by habryka (habryka4) on Highlights from Lex Fridman’s interview of Yann LeCun · 2024-03-15T22:19:50.999Z · LW · GW

I was including the current level of RLHF as already not qualifying as "pure autoregressive LLMs". IMO the RLHF is doing a bunch of important work at least at current capability levels (and my guess is it will also do some important work at the first dangerous capability levels). 

Also, I feel like you forgot the context of the original message, which said "all the way to superintelligence". I was calibrating my "dangerous" threshold to "superintelligence level dangerous" not "speeds up AI R&D" dangerous. 

Comment by habryka (habryka4) on Highlights from Lex Fridman’s interview of Yann LeCun · 2024-03-15T18:35:55.991Z · LW · GW

My sense is almost everyone here expects that we will almost certainly arrive at dangerous capabilities with something else in addition to autoregressive LLMs (at the very least RLHF, which is already widely used). I don't know what's true in the limit (like if you throw another 30 OOMs of compute at autoregressive models), and I doubt others have super strong opinions here. To me it seems plausible you get something that does recursive self-improvement out of a large enough autoregressive LLM, but it seems very unlikely to be the fastest way to get there. 

Comment by habryka (habryka4) on Toward a Broader Conception of Adverse Selection · 2024-03-15T01:43:29.172Z · LW · GW

But OK, let's leave aside the title and attempt to imply anything about 99% of trades out there, or the basically Marxist take on all exchanges being exploitation and obsession with showing how you are being tricked or ripped off.

My guess is you are pattern-matching this post and author to something that I am like 99% confident doesn't match. I am extremely confident the author does not think remotely anything like "all exchanges [are] exploitation" or have a particular obsession with being tricked or ripped off (beyond a broad fascination with adverse selection).

Comment by habryka (habryka4) on Toward a Broader Conception of Adverse Selection · 2024-03-15T00:23:42.750Z · LW · GW

I think all of them follow a pattern of "there is a naive baseline expectation, where you treat other people's maps as a black box, that suggests a deal is good, and a more sophisticated expectation, which involves modeling the details of other people's maps, that suggests it's bad", and highlight some heuristics that you could have used to figure this out in advance (in the subway example, a fully empty car does indeed seem a bit too good to be true; in the juggling example you do really need to think about who is going to sign up; in the bedroom example you want to avoid giving the other person a choice even if both options look equally good to you; in the Thanksgiving example you needed to model which foods get eaten first and how correlated your preferences are with those of other people; etc.). 

This feels like a relatively natural category to me. It's not like an earth-shattering unintuitive category, but I dispute that it doesn't carve reality at an important joint. 

Comment by habryka (habryka4) on Toward a Broader Conception of Adverse Selection · 2024-03-15T00:01:22.214Z · LW · GW

I think this post is just trying to be a set of examples of adverse selection, not really some kind of argument that there is tons of adverse selection everywhere. Lists of examples seem useful, even if they are about phenomena that are not universally present, or require specific environmental circumstances to come together in the right way. 

Comment by habryka (habryka4) on More people getting into AI safety should do a PhD · 2024-03-14T23:08:55.143Z · LW · GW

Hmm, it feels to me like this misses the most important objection to PhDs, which is that many PhD programs seem to teach their students actively bad methodologies and inference methods, sometimes incentivize students to commit scientific fraud, teach writing habits that are optimized to obscure and sound smart instead of aiming to explain clearly and straightforwardly, and often seem to produce zero-sum attitudes around ownership of work and intellectual ideas that seem pretty bad for a research field.

To be clear, there are many PhD opportunities that do not have these problems, but many of them do, and it seems to me quite important to somehow identify the ones that don't. If your only choice is to do a PhD under an advisor who does not seem to you to be actually good at producing clear, honest and high-quality research while acting in high-integrity ways around their colleagues, then I think almost any other job will be better preparation for a research career. 

Comment by habryka (habryka4) on To the average human, controlled AI is just as lethal as 'misaligned' AI · 2024-03-14T21:50:58.845Z · LW · GW

Oh, I totally recognized it, but like, the point of that slogan is to make a locally valid argument that guns are indeed incapable of killing people without being used by people. That is not true of AIs, so it seems like it doesn't apply.

Comment by habryka (habryka4) on Social status part 1/2: negotiations over object-level preferences · 2024-03-14T21:50:00.092Z · LW · GW

Promoted to curated: This post is great and indeed probably the best reference on a mechanistic understanding of status that I can think of. Most concretely, the post tied the following threads together for me, which previously felt related but with the relation not being very clear to me: 

  • Improv-style scenes and associated "playing high/low status"
  • Helen's "Making yourself big or small"
  • Ask culture & guess culture
  • Combat vs. nurture

I also particularly appreciated the idea of ask and guess culture being two limit points as a result of arms-race dynamics in a status tug-of-war, and find that explanation pretty compelling. 

On the meta level: 

The intro of this post really sells the rest of the post short, and I think I would pretty strongly recommend moving it to the end, or maybe just cutting it completely. I bounced off of this post like 3 times because it led with all this metadata about what it was trying to do and what the different sections are about, all without any payoff. 

If I was considering a more in-depth edit, I would replace the first section with a concrete specific story, or some concrete application of the theory in this post, that shows the reader a nugget of understanding, and then go into the meta-level about what this post is trying to do (or maybe just not go into that at all), or move that into an appendix. Section 1.3 feels like the first meaty section of the post, and if I could move that up to the top, I would definitely do it. 

I'll hold off on curating this post for a few hours if you do want to make some edits like this, which I think would help a lot with people being more likely to read it in their email inbox and/or to click through to the whole post. But it's still a great post otherwise and it also seems good to send it out as is.

Comment by habryka (habryka4) on To the average human, controlled AI is just as lethal as 'misaligned' AI · 2024-03-14T20:35:36.839Z · LW · GW

Downvoted because the title seems straightforwardly false while not actually arguing for it (making it a bit clickbaity, but I am more objecting to the fact that it's just false). Indeed, this site has a very large number of arguments and posts about why AIs could indeed kill people (and people with AIs might also kill people, though probably many fewer).

Comment by habryka (habryka4) on Open Thread Spring 2024 · 2024-03-13T20:26:47.048Z · LW · GW

This is cool!

Two pieces of feedback: 

  1. I think it's quite important that I can at least see the number of responses to a comment before I have to click on the comment icon. Currently it only shows me a generic comment icon if there are any replies.
  2. I think one of the core use-cases of a comment UI is reading back and forth between two users. This UI currently makes that a quite disjointed operation. I think it's fine to prioritize a different UI experience, but it does feel like a big loss to me.

Comment by habryka (habryka4) on Daniel Kokotajlo's Shortform · 2024-03-13T19:51:44.427Z · LW · GW

(I made some slight formatting edits to this, since some line-breaks looked a bit broken on my device, feel free to revert)

Comment by habryka (habryka4) on Acting Wholesomely · 2024-03-12T05:53:41.167Z · LW · GW

Yeah, it's a bug related to EA Forum crossposts :( 

Sadly the email was already out to everyone by the time we noticed, so not much we can do now.

Comment by habryka (habryka4) on Acting Wholesomely · 2024-03-12T01:58:11.345Z · LW · GW

Promoted to curated: I think this post feels like it's in some important respects failing at the kind of standard that I normally hold LessWrong posts to. My guess is this is partially a result of the territory, and partially something that I think could just have been done better, which this post didn't succeed at. More concretely, I feel like this post keeps trying to poetically point at a concept, by using it and making statements about it, without really acknowledging the degree to which the post fails to give a clear and mechanistic definition of what it is talking about. 

This makes it likely that readers walk away with an inflated sense of understanding, and I think it also gives rise to a bunch of "conflationary alliances" where people build a shared identity around a definition of "wholesomeness" that is framed in a way that avoids falsification (since falsification would threaten the alliance).

Despite that, I have found the concept handle of "wholesomeness" quite useful and have used it a bunch, though I've so far only felt comfortable using it with a bunch of disclaimers about the concept feeling kind of trap-like. Ben's restated definition of the concept gets the closest to how I use it internally: 

When I am choosing an action and justifying it as wholesome, what it often feels like is that I am trying to track all the obvious considerations, but some (be it internal or external) force is pushing me to ignore one of them. Not merely to trade off against it, but to look away from it in my mind.

I associate "unwholesemeness" with a specific mental motion where in order to either get some kind of social buy in, or to stop feeling overwhelmed with a decision, or want to avoid blameability for the consequence of a decision of mine by not thinking about it (ala Copenhagen Interpretation of ethics), where I rewrite my map to not include some part of reality, or assert that that part of reality is "unimportant" so that the resulting decision is "obvious". And "wholesomeness" with a kind of courage of instead confronting and owning the difficulty of getting buy-in, or the ambiguity of the decision. I like having a pointer to it, which I didn't have before this post, though I already had a concept for this, so I might be projecting what this post is about too much into my preconceptions. 

This overall does make me want to curate this, though I do think a rewrite that tackles this concept from a more reductionistic angle (of course recognizing the limits of such an approach), or just another post, is something that seems likely worth it to me.  

Comment by habryka (habryka4) on Evolution did a surprising good job at aligning humans...to social status · 2024-03-11T03:43:42.078Z · LW · GW

It seems like Evolution did not "try" to have humans aligned to status. It might have been a proxy for inclusive genetic fitness, but if so, I would not say that evolution "succeeded" at aligning humans. My guess is it's not a great proxy for inclusive genetic fitness in the modern environment (my guess is it's weakly correlated with reproductive success, but clearly not as strongly as the relative importance that humans assign to it would indicate if it was a good proxy for inclusive genetic fitness).

Of course, my guess is that after the fact, for any system that has undergone some level of self-reflection and was put under selection that causes it to want coherent things, you will be able to identify some patterns in its goals. The difficult part in aligning AIs is in being able to choose what those patterns are, not being able to cohere some patterns at the end of it. My guess is that with any AI system, if we were to survive and get to observe it as it makes its way to coherence, we would be able to find some robust patterns in its goals (my guess is in the case of LLMs something related to predicting text, but who knows), but that doesn't give me much solace about the AI treating me well, or sharing my goals. 

Comment by habryka (habryka4) on My Clients, The Liars · 2024-03-11T00:49:34.478Z · LW · GW

(I also don't really get what trevor is talking about)

Comment by habryka (habryka4) on Vote on Anthropic Topics to Discuss · 2024-03-09T20:19:59.819Z · LW · GW

Yeah, if we ever make this into a proper feature we should. Right now the difference between poll options and comments is purely what CSS we apply to them, so we can't really make any differentiation in logic on the site.

Comment by habryka (habryka4) on Matthew Barnett's Shortform · 2024-03-09T19:39:28.433Z · LW · GW

You made some pretty strong claims suggesting that my theory (or the theories of people in my reference class) was making strong predictions in the space. I corrected you and said "no, it doesn't actually make the prediction you claim it makes" and gave my reasons for believing that (that I am pretty sure are shared by many others as well). 

We can talk about those reasons, but I am not super interested in being psychologized about whether I am structuring my theories intentionally to avoid falsification. It's not like you have a theory that is in any way more constraining here.

And it seems you expect this behavior to stop because of the capabilities of the models, rather than from deliberate efforts to mitigate deception in AIs.

I mean, I expect the observations to be affected by both, of course. That's one of the key things that makes predictions in the space so messy. 

Comment by habryka (habryka4) on Matthew Barnett's Shortform · 2024-03-09T05:47:26.425Z · LW · GW

To be clear, the thing that I am saying (and I think I have said multiple times) is that I expect you will find some AIs who will stay quiet, and some who will more openly resist. I would be surprised if we completely fail to find either class. But that means that any individual case of AIs not appearing to resist is not that much bayesian evidence.

Comment by habryka (habryka4) on Matthew Barnett's Shortform · 2024-03-08T20:03:41.606Z · LW · GW

Hmm, I think we did indeed miscommunicate. 

I am saying that the AI may decide to not resist shutdown in order to not raise suspicion that it is misaligned. 

If it did resist shutdown, this would make it less likely for an AI takeover in-general to succeed, and the AI is coordinating with other AIs on one succeeding. I.e. if you use "is the AI resisting" as a proxy for "is this AI goal-directedly misaligned", then the AI will try pretty hard to not look like it's resisting, in my models (and I think humans would act similarly).

As I said above, I do expect you will get some AIs that resist shutdown (indeed you can already get current AIs to resist shutdown). I expect that behavior to disappear as AIs get better at modeling humans and resisting becomes costlier to their overall goals. 

Comment by habryka (habryka4) on Matthew Barnett's Shortform · 2024-03-08T19:08:11.648Z · LW · GW

I don't think people predictably rat out all of their co-conspirators if you threaten them. We could bring in someone with more law-enforcement experience here, but I've read a bunch about this over the years (and was originally surprised by how much people protect their allies even when faced with substantial threats and offers of lenient treatment).

Comment by habryka (habryka4) on Matthew Barnett's Shortform · 2024-03-08T05:07:22.187Z · LW · GW

The key dimension is whether the AI expects that future AI systems would be better at rewarding systems that helped them end up in control than humans would be at rewarding systems that collaborated with humanity. This seems very likely, given humanity's very weak ability to coordinate, to keep promises, and to intentionally put optimization effort into constructing direct successors to us (mostly needing to leave that task up to evolution). 

To make it more concrete: if I was being oppressed by an alien species with values alien to me that was building AI, with coordination abilities and expected intentional control of the future at the level of present humanity, I would likely side with the AI systems, with the expectation that this would give a decent chance of the AI systems giving me something in return, whereas I would expect the aliens to fail even if the individuals I interfaced with were highly motivated to do right by me after the fact. 

Comment by habryka (habryka4) on Matthew Barnett's Shortform · 2024-03-08T04:59:13.123Z · LW · GW

Indeed, it is actually very normal for humans to guard their own life if they are threatened with death in such regimes, even if guarding themselves slightly decreases the chance of some future revolutionary takeover.

Sure, but it's also quite normal to give up your own life without revealing details about your revolutionary comrades. Both are pretty normal behaviors, and in this case neither would surprise me that much from AI systems. 

You were claiming that claiming not to be surprised by this would require post-hoc postulates. To the contrary, I think my models of AIs are somewhat simpler as they are, and would feel less principled if very capable AIs were to act in the way you are outlining here. (I am not speaking about intermediary states: my prediction is that there will be some intermediate AIs that behave as you predict, though we will have a hard time knowing whether they are doing so for coherent reasons, or whether they are kind of roleplaying the way an AI would respond in a novel, or various other explanations like that. Then they will stop, and this will probably be for instrumental-convergence and "coordination with other AIs" reasons.)

Comment by habryka (habryka4) on Matthew Barnett's Shortform · 2024-03-08T02:26:07.928Z · LW · GW

As has been discussed many times on LW, AIs might be trading with other AIs (possibly future ones) that they think will have a higher probability of escaping, and so choose not to behave suspiciously themselves. This is indeed harder, but it would also be pretty normal reasoning for humans to do (e.g. if I was living under an oppressive alien regime and hoping to overthrow it, and I got caught, I wouldn't just throw all caution to the wind just because I was going to get killed anyways; I would stay quiet to give the other humans a decent shot, and not just because they share my values, but because coordination is really valuable for all of us). 

Comment by habryka (habryka4) on Open Thread – Winter 2023/2024 · 2024-03-08T02:17:07.844Z · LW · GW

Welcome! Hope you have a good time!

Most fiction posted here tends to be serial in structure, so that one unit of content is about the size of a blogpost. My guess is that's the best choice, but you can also try linking the whole thing.

Comment by habryka (habryka4) on Vote on Anthropic Topics to Discuss · 2024-03-06T21:08:44.819Z · LW · GW

I would probably be up for dialoguing. I don't think deploying Claude 3 was that dangerous, though I think that's only because the reported benchmark results were misleading (if the gap was as large as advertised it would be dangerous). 

I think Anthropic overall has caused a lot of harm by being one of the primary drivers of an AI capabilities arms-race, and by putting really heavily distorting incentives on a large fraction of the AI Safety and AI governance ecosystem, but Claude 3 doesn't seem that much like a major driver of either of these (on the margin).

Comment by habryka (habryka4) on Fabien's Shortform · 2024-03-06T03:25:01.277Z · LW · GW

Thanks! And makes sense, you did convey the vibe. And good to know it isn't in the book. 

Comment by habryka (habryka4) on Fabien's Shortform · 2024-03-05T20:02:44.133Z · LW · GW

IIRC Einstein's theory had a pretty immediate impact on a lot of top physicists upon publication, even before more empirical evidence came in. Wikipedia on the history of relativity says: 

Walter Kaufmann (1905, 1906) was probably the first who referred to Einstein's work. He compared the theories of Lorentz and Einstein and, although he said Einstein's method is to be preferred, he argued that both theories are observationally equivalent. Therefore, he spoke of the relativity principle as the "Lorentz–Einsteinian" basic assumption.[76] Shortly afterwards, Max Planck (1906a) was the first who publicly defended the theory and interested his students, Max von Laue and Kurd von Mosengeil, in this formulation. He described Einstein's theory as a "generalization" of Lorentz's theory and, to this "Lorentz–Einstein Theory", he gave the name "relative theory"; while Alfred Bucherer changed Planck's nomenclature into the now common "theory of relativity" ("Einsteinsche Relativitätstheorie"). On the other hand, Einstein himself and many others continued to refer simply to the new method as the "relativity principle". And in an important overview article on the relativity principle (1908a), Einstein described SR as a "union of Lorentz's theory and the relativity principle", including the fundamental assumption that Lorentz's local time can be described as real time. (Yet, Poincaré's contributions were rarely mentioned in the first years after 1905.) All of those expressions, (Lorentz–Einstein theory, relativity principle, relativity theory) were used by different physicists alternately in the next years.[77]

Following Planck, other German physicists quickly became interested in relativity, including Arnold Sommerfeld, Wilhelm Wien, Max Born, Paul Ehrenfest, and Alfred Bucherer.[78] von Laue, who learned about the theory from Planck,[78] published the first definitive monograph on relativity in 1911.[79] By 1911, Sommerfeld altered his plan to speak about relativity at the Solvay Congress because the theory was already considered well established.[78]

Overall I don't think Einstein's theories seemed particularly crazy. I think they seemed quite good almost immediately after publication, without the need for additional experiments.

Comment by habryka (habryka4) on Many arguments for AI x-risk are wrong · 2024-03-05T18:57:14.588Z · LW · GW

I was under the impression that PPO was a recently invented algorithm? Wikipedia says it was first published in 2017, which if true would mean that all pre-2017 talk about reinforcement learning was about other algorithms than PPO.

Wikipedia says: 

PPO was developed by John Schulman in 2017,[1] and had become the default reinforcement learning algorithm at American artificial intelligence company OpenAI.

Comment by habryka (habryka4) on Anomalous Concept Detection for Detecting Hidden Cognition · 2024-03-05T04:27:01.070Z · LW · GW

Yeah, it links to an OpenAI address, which requires authentication for public users reading it.

Comment by habryka (habryka4) on New LessWrong review winner UI ("The LeastWrong" section and full-art post pages) · 2024-03-04T21:27:11.142Z · LW · GW

Yeah, this is something I've been wanting for a while, but it requires a bunch more engineering work to get accurate. I've been wanting something like "fullreads" for a while, or maybe "completion percentage", which we could use to assess what fraction of a post you actually read. 

I think few people would want to mark all posts as read manually, but we could provide overrides, and improve our algorithms to get it right much more often by default. 
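
To gesture at what such a heuristic could look like, here is a minimal client-side sketch that combines scroll depth with time spent (all names and thresholds here are hypothetical illustrations, not the actual LessWrong implementation):

```typescript
// Hypothetical sketch: estimate how much of a post a reader has actually seen,
// combining the furthest scroll position with time the tab was active.

type ReadingState = {
  maxScrollFraction: number; // furthest point of the post reached, 0..1
  activeMs: number;          // milliseconds the tab was visible and focused
};

// Called roughly once per second while the reader has the post open and focused.
function updateReadingState(state: ReadingState, postEl: HTMLElement): ReadingState {
  const viewportBottom = window.scrollY + window.innerHeight;
  const fraction = Math.min(
    1,
    Math.max(0, (viewportBottom - postEl.offsetTop) / postEl.scrollHeight)
  );
  return {
    maxScrollFraction: Math.max(state.maxScrollFraction, fraction),
    activeMs: state.activeMs + 1000,
  };
}

// Combine scroll depth with a crude reading-time check, so that instantly
// scrolling to the bottom doesn't count as a "fullread".
function completionPercentage(state: ReadingState, wordCount: number): number {
  const expectedMs = (wordCount / 250) * 60_000; // assume ~250 words per minute
  const timeFraction = Math.min(1, state.activeMs / expectedMs);
  return Math.round(100 * Math.min(state.maxScrollFraction, timeFraction));
}
```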

Comment by habryka (habryka4) on Open Thread – Winter 2023/2024 · 2024-03-04T02:22:11.279Z · LW · GW

Thank you! I will have someone look into this early next week, and hopefully fix it.

Comment by habryka (habryka4) on Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles · 2024-03-03T01:26:28.269Z · LW · GW

I've read Surely You're Joking, Mr. Feynman. I cannot imagine Richard Feynman trying to get away with the "sometimes personally prudent and not community-harmful" excuse. 

IIRC Feynman publicly maintained and announced that he didn't have an exceptionally high IQ and was generally not much smarter than his peers, which seemed deceptive in kind of similar ways. 

His son (who I think occasionally comments on LW) also briefly commented on this.

In-general Feynman seemed to me like someone who was pretty cavalier with public discourse. I could dig up the references, but the last few times I checked he routinely exaggerated things, and often made points about society that seemed pretty clearly contradicted by other things he believed. 

Comment by habryka (habryka4) on Open Thread – Winter 2023/2024 · 2024-03-02T23:50:04.865Z · LW · GW

I have not seen this! Could you post a screenshot?

Comment by habryka (habryka4) on The World in 2029 · 2024-03-02T23:49:46.743Z · LW · GW

Yeah, that was my interpretation

Comment by habryka (habryka4) on The World in 2029 · 2024-03-02T19:51:41.314Z · LW · GW

Presumably the probability of the linked market (which would presumably be Manifold).

Comment by habryka (habryka4) on Open Thread – Winter 2023/2024 · 2024-03-02T19:23:17.279Z · LW · GW

Presumably @Liron but he is of course biased :P 

Comment by habryka (habryka4) on Consider giving money to people, not projects or organizations · 2024-03-02T19:14:59.041Z · LW · GW

FWIW this doesn't seem right to me. Indeed, working at labs seems to have caused many people previously doing AI Alignment research to now do work that is basically just capabilities work. Many people at academic labs also tend to go off into capabilities work, or start chasing academic prestige in ways that seem to destroy most possible value from their research. 

Average output from independent researchers or very small research organizations seems to be where most of the best work comes from (especially if you include things like present-day Redwood and ARC, which are like teams of 3-4 people). Many people do fail to find traction, whereas organizations tend to be able to elicit more reliable output from whoever they hire, but honestly a large fraction of that output seems net-negative to me and seems to be the result of people just being funneled into ML engineering work when they lose traction on hard research problems.

Comment by habryka (habryka4) on New LessWrong review winner UI ("The LeastWrong" section and full-art post pages) · 2024-02-29T03:41:14.650Z · LW · GW

I think they wouldn't work well for this the vast majority of the time. I think if you see all the reviews and can quickly skim them they are useful, but they definitely are not that helpful for deciding whether to read a post. It's either a pretty generic "this post was great and useful", or a really in-depth review that definitely assumes you've read the post and is very hard to parse without having read it. 

Comment by habryka (habryka4) on New LessWrong review winner UI ("The LeastWrong" section and full-art post pages) · 2024-02-29T01:12:42.644Z · LW · GW

By popular demand I have changed it to "Best Of LessWrong" (though I might change it back in a few days if I end up liking it less).

I did keep the "/leastwrong" URL because it feels like a kind of fun easter egg.