
AI #74: GPT-4o Mini Me and Llama 3 2024-07-25T13:50:06.528Z
Llama Llama-3-405B? 2024-07-24T19:40:07.565Z
Monthly Roundup #20: July 2024 2024-07-23T12:50:07.991Z
On the CrowdStrike Incident 2024-07-22T12:40:05.894Z
AI #73: Openly Evil AI 2024-07-18T14:40:05.770Z
Housing Roundup #9: Restricting Supply 2024-07-17T12:50:05.321Z
AI #72: Denying the Future 2024-07-11T15:00:05.865Z
Medical Roundup #3 2024-07-09T13:10:06.862Z
AI #71: Farewell to Chevron 2024-07-04T13:40:05.905Z
Economics Roundup #2 2024-07-02T12:40:05.908Z
AI #70: A Beautiful Sonnet 2024-06-27T14:40:08.087Z
Childhood and Education Roundup #6: College Edition 2024-06-26T11:40:03.990Z
Monthly Roundup #19: June 2024 2024-06-25T12:00:03.333Z
On Claude 3.5 Sonnet 2024-06-24T12:00:05.719Z
On OpenAI’s Model Spec 2024-06-21T13:00:03.014Z
AI #69: Nice 2024-06-20T12:40:02.566Z
On DeepMind’s Frontier Safety Framework 2024-06-18T13:30:21.154Z
OpenAI #8: The Right to Warn 2024-06-17T12:00:02.639Z
The Leopold Model: Analysis and Reactions 2024-06-14T15:10:03.480Z
AI #68: Remarkably Reasonable Reactions 2024-06-13T16:30:02.969Z
AiPhone 2024-06-12T22:20:02.141Z
On Dwarkesh’s Podcast with Leopold Aschenbrenner 2024-06-10T12:40:03.348Z
Quotes from Leopold Aschenbrenner’s Situational Awareness Paper 2024-06-07T11:40:03.981Z
AI #67: Brief Strange Trip 2024-06-06T18:50:03.514Z
SB 1047 Is Weakened 2024-06-06T13:40:41.547Z
The Gemini 1.5 Report 2024-05-31T12:20:03.098Z
OpenAI: Helen Toner Speaks 2024-05-30T21:10:02.938Z
AI #66: Oh to Be Less Online 2024-05-30T14:20:03.334Z
OpenAI: Fallout 2024-05-28T13:20:04.325Z
I am the Golden Gate Bridge 2024-05-27T14:40:03.216Z
The Schumer Report on AI (RTFB) 2024-05-24T15:10:03.122Z
AI #65: I Spy With My AI 2024-05-23T12:40:02.793Z
Do Not Mess With Scarlett Johansson 2024-05-22T15:10:03.215Z
On Dwarkesh’s Podcast with OpenAI’s John Schulman 2024-05-21T17:30:04.332Z
OpenAI: Exodus 2024-05-20T13:10:03.543Z
GPT-4o My and Google I/O Day 2024-05-16T17:50:03.040Z
AI #64: Feel the Mundane Utility 2024-05-16T15:20:02.956Z
Monthly Roundup #18: May 2024 2024-05-13T12:30:04.863Z
AI #63: Introducing Alpha Fold 3 2024-05-09T14:20:03.176Z
I Got 95 Theses But a Glitch Ain’t One 2024-05-09T14:10:02.677Z
Dating Roundup #3: Third Time’s the Charm 2024-05-08T13:30:03.232Z
AI #61: Meta Trouble 2024-05-02T18:40:03.242Z
AI #62: Too Soon to Tell 2024-05-02T15:40:04.364Z
Q&A on Proposed SB 1047 2024-05-02T15:10:02.916Z
Changes in College Admissions 2024-04-24T13:50:03.487Z
On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg 2024-04-22T13:10:02.645Z
AI #60: Oh the Humanity 2024-04-18T14:10:02.281Z
Childhood and Education Roundup #5 2024-04-17T13:00:03.015Z
Monthly Roundup #17: April 2024 2024-04-15T12:10:03.126Z
AI #59: Model Updates 2024-04-11T14:20:06.339Z


Comment by Zvi on AI #74: GPT-4o Mini Me and Llama 3 · 2024-07-25T14:06:43.463Z · LW · GW

Yeah, I didn't see the symbol properly, I've edited.

Comment by Zvi on AI #72: Denying the Future · 2024-07-15T15:30:01.688Z · LW · GW

So this is essentially a MIRI-style argument from game theory and potential acausal trades and such with potential other or future entities? And that these considerations will be chosen and enforced via some sort of coordination mechanism, since they have obvious short-term competition costs?

Comment by Zvi on 80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly) · 2024-07-06T17:04:06.736Z · LW · GW

Not only do they continue to list such jobs, they do so with no warnings that I can see regarding OpenAI's behavior, including both its actions involving safety and its treatment of its own employees.

Not warning about the specific safety failures and issues is bad enough, and will lead to uninformed decisions on the most important issue of someone's life. 

Referring a person to work at OpenAI, without warning them about the issues regarding how they treat employees, is so irresponsible towards the person looking for work as to be a missing stair issue. 

I am flabbergasted that this policy has been endorsed on reflection.

Comment by Zvi on On DeepMind’s Frontier Safety Framework · 2024-06-19T10:52:52.562Z · LW · GW

Oh, sorry, will fix.

Comment by Zvi on The Leopold Model: Analysis and Reactions · 2024-06-17T20:43:35.355Z · LW · GW

Based on how he engaged with me privately I am confident that he is not just a dude tryna make a buck.

(I am not saying he is not also trying to make a buck.)

Comment by Zvi on OpenAI: Fallout · 2024-05-29T12:12:21.075Z · LW · GW

I think it works, yes. Indeed I have a canary on my Substack About page to this effect.

Comment by Zvi on OpenAI: Fallout · 2024-05-28T19:42:14.053Z · LW · GW

Yes this is quoting Neel.

Comment by Zvi on AI #65: I Spy With My AI · 2024-05-25T14:16:43.100Z · LW · GW

Roughly this, yes. SV here means the startup ecosystem, Big Tech means large established (presumably public) companies.

Comment by Zvi on Ilya Sutskever and Jan Leike resign from OpenAI [updated] · 2024-05-21T17:25:37.031Z · LW · GW

Here is my coverage of it. Given this is a 'day minus one' interview of someone in a different position, and given everything else we already know about OpenAI, I thought this went about as well as it could have. I don't want to see false confidence in that kind of spot, and the failure of OpenAI to have a plan for that scenario is not news.

Comment by Zvi on On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg · 2024-04-23T19:28:17.722Z · LW · GW

It is better than nothing I suppose but if they are keeping the safeties and restrictions on then it will not teach you whether it is fine to open it up.

Comment by Zvi on RTFB: On the New Proposed CAIP AI Bill · 2024-04-11T20:26:38.499Z · LW · GW

My guess is that different people do it differently, and I am super weird.

For me a lot of the trick is consciously asking if I am providing good incentives, and remembering to consider what the alternative world looks like. 

Comment by Zvi on RTFB: On the New Proposed CAIP AI Bill · 2024-04-11T14:15:31.276Z · LW · GW

I don't see this response as harsh at all? I see it as engaging in detail with the substance. I note the bill is highly thoughtful overall, offer a bunch of explicit encouragement, defend a bunch of their specific choices, and say I am very happy they offered this bill. It seems good and constructive to note where I think they are asking for too much? While noting that the right amount of 'any given person reacting thinks you went too far in some places' is definitely not zero.

Comment by Zvi on On the Gladstone Report · 2024-03-23T15:45:37.644Z · LW · GW

Excellent. On the thresholds, got it, sad that I didn't realize this, and that others didn't either from what I saw.

I appreciate the 'long post is long' problem but I do think you need the warnings to be in all the places someone might see the 10^X numbers in isolation, if you don't want this to happen, and it probably happens anyway, on the grounds of 'yes that was technically not a proposal but of course it will be treated like one.' And there's some truth in that, and that you want to use examples that are what you would actually pick right now if you had to pick what to actually do (or propose).

I do think the numbers I suggest are about as low as one could realistically get until we get much stronger evidence of impending big problems.

Comment by Zvi on [deleted post] 2024-03-22T12:23:42.236Z
Comment by Zvi on [deleted post] 2024-03-22T12:23:28.770Z

Secrecy is the exception. Mostly no one cares about your startup idea or will remember your hazardous brainstorm, no one is going to cause you trouble, and so on, and honesty is almost always the best policy.  

That doesn't mean always tell everyone everything, but you need to know what you are worried about if you are letting this block you. 

On infohazards, I think people were far too worried for far too long. The actual dangerous idea turned out to be that AGI was a dangerous idea, not any specific thing. There are exceptions, but you need a very good reason, and an even better reason if it is an individual you are talking with.

Trust in terms of 'they won't steal from me' or 'they will do what they promise' is another question with no easy answers.

If you are planning something radical enough to actually get people's attention (e.g. breaking laws, using violence, fraud of various kinds, etc) then you would want to be a lot more careful who you tell, but also - don't do that?

Comment by Zvi on [deleted post] 2024-03-22T12:23:10.383Z

Comment by Zvi on Monthly Roundup #14: January 2024 · 2024-01-30T23:37:44.565Z · LW · GW

Sounds like a lot of it is that your scale is stingier than mine. And it makes sense that the recommendations come apart at the extreme high end, especially for older films. The 'for the time' here is telling.

Comment by Zvi on Monthly Roundup #14: January 2024 · 2024-01-30T00:22:54.794Z · LW · GW

On my scale, if I went 1 for 7 on finding 4.0+ films in a year, then yeah I'd find that a disappointing year. 

In other news, I tried out Scaruffi. I figured I'd watch the top pick. Number one was Citizen Kane, which I'd already watched (5.0 so that was a good sign), so I moved to the next one, which was Repulsion. And... yeah, that was not a good selection method. Critics and I do NOT see eye to eye.

I also scanned their ratings of various other films, which generally seemed reasonable for films I'd seen, although with a very clear 'look at me I am a movie critic' bias, including one towards older films. I don't know how to correct for that properly. 

Comment by Zvi on Monthly Roundup #14: January 2024 · 2024-01-26T15:25:58.094Z · LW · GW

Real estate can definitely be a special case, because (1) you are also doing consumption, (2) it is non-recourse and you never get a margin call, which provides a lot of protection and (3) the USG is massively subsidizing you doing that...

Comment by Zvi on Monthly Roundup #14: January 2024 · 2024-01-26T15:24:16.785Z · LW · GW

There are lead times to a lot of these actions, costs to do so are often fixed, and no reason to expect the rules changes not to happen. I buy that it is efficient to do so early.

'Greed' I consider a non-sequitur here, the manager will profit maximize.

Comment by Zvi on Monthly Roundup #14: January 2024 · 2024-01-26T15:19:39.858Z · LW · GW

I'm curious how many films you saw - having only one above 3.5 on that scale seems highly disappointing. 

Comment by Zvi on AI #48: Exponentials in Geometry · 2024-01-23T12:50:42.685Z · LW · GW

Argument from incredulity? 

Comment by Zvi on On Anthropic’s Sleeper Agents Paper · 2024-01-17T20:30:13.175Z · LW · GW

Thanks for the notes!

As I understand that last point, you're saying that it's not a good point because it is false (hence my 'if it turns out to be true'). Weird that I've heard the claim from multiple places in these discussions. I assumed there was some sort of 'order matters in terms of pre-training vs. fine-tuning obviously, but there's a phase shift in what you're doing between them.' I also did wonder about the whole 'you can remove Llama-2's fine tuning in 100 steps' thing, since if that is true then presumably order must matter within fine tuning.

Anyone think there's any reason to think Pope isn't simply technically wrong here (including Pope)? 

Comment by Zvi on Medical Roundup #1 · 2024-01-17T15:55:58.540Z · LW · GW

Yep, whoops, fixing.

Comment by Zvi on Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · 2024-01-16T14:46:38.269Z · LW · GW

That seems rather loaded in the other direction. How about “The evidence suggests that if current ML systems were going to deceive us in scenarios that do not appear in our training sets, we wouldn’t be able to detect this or change them not to unless we found the conditions where it would happen.”? 

Comment by Zvi on Announcing Balsa Research · 2024-01-15T00:41:34.439Z · LW · GW

Did you see ( That's the closest thing available at the moment.

Comment by Zvi on Criticism of EA Criticism Contest · 2024-01-09T14:21:23.847Z · LW · GW

This post was, in the end, largely a failed experiment. It did win a lesser prize, and in a sense that proved its point, and I had fun doing it, but I do not think it successfully changed minds, and I don't think it has lasting value, although someone gave it a +9 so it presumably worked for them. The core idea - that EA in particular wants 'criticism' but it wants it in narrow friendly ways and it discourages actual substantive challenges to its core stuff - does seem important. But also this is LW, not EA Forum. If I had to do it over again, I wouldn't bother writing this.

Comment by Zvi on Announcing Balsa Research · 2024-01-09T14:16:53.152Z · LW · GW

I am flattered that someone nominated this but I don't know why. I still believe in the project, but this doesn't match at all what I'd look to in this kind of review? The vision has changed and narrowed substantially. So this is a historical artifact of sorts, I suppose, but I don't see why it would belong.

Comment by Zvi on Jailbreaking ChatGPT on Release Day · 2024-01-09T14:15:02.296Z · LW · GW

I think this post did good work in its moment, but doesn't have that much lasting relevance and can't see why someone would revisit at this point. It shouldn't be going into any timeless best-of lists.

Comment by Zvi on On Bounded Distrust · 2024-01-09T14:13:29.891Z · LW · GW

I continue to frequently refer back to my functional understanding of bounded distrust. I now try to link to 'How To Bounded Distrust' instead because it's more compact, but this is I think the better full treatment for those who have the time. I'm sad this isn't seeing more support, presumably because it isn't centrally LW-focused enough? But to me this is a core rationalist skill not discussed enough, among its other features.

Comment by Zvi on On OpenAI’s Preparedness Framework · 2024-01-03T15:24:19.955Z · LW · GW

I do not monitor the EA Forum unless something triggers me to do so, which is rare, so I don't know which threads/issues this refers to. 

Comment by Zvi on AI #43: Functional Discoveries · 2023-12-24T19:22:30.222Z · LW · GW

Yes, I mean the software (I am not going to bother fixing it)

Comment by Zvi on AI #43: Functional Discoveries · 2023-12-21T20:52:17.410Z · LW · GW

I'm skeptical, but I love a good hypothesis, so: 

Comment by Zvi on OpenAI: Preparedness framework · 2023-12-20T14:34:16.583Z · LW · GW

I wrote up a ~4k-word take on the document that I'll likely post tomorrow - if you'd like to read the draft today you can PM/DM/email me. 

(Basic take: Mixed bag, definitely highly incomplete, some definite problems, but better than I would have expected and a positive update)

Comment by Zvi on Balsa Update and General Thank You · 2023-12-16T20:36:54.074Z · LW · GW

It's based on the general preference to be in place X instead of Y. If you could get equally attractive jobs in lousy places Y, that would take away that factor. There would still be many other reasons, but it would help.

Comment by Zvi on Balsa Update and General Thank You · 2023-12-14T13:02:22.155Z · LW · GW

They seem mutually compatible to me, same way you need food and water and oxygen. Economically, the median person needs (1) a decent job and (2) affordable housing and other necessary expenses, without any one thing that is so bad it binds and eats everything. Right now housing does that, and we also have big issues with education and health care, whereas food and clothing used to be problems and no longer are.

Comment by Zvi on The Best of Don’t Worry About the Vase · 2023-12-14T12:59:49.041Z · LW · GW

You're welcome. That's a reasonable point (I think that the LW mod team assembled the sequence here for me, and made different choices on what to include). I think they belong but also that one often has to make cuts.

Comment by Zvi on The Best of Don’t Worry About the Vase · 2023-12-13T22:46:12.969Z · LW · GW

Indeed, and the sequence is there, called Slack and the Sabbath.

I think I've given longtime readers enough hints for them to mostly know what Moloch's Army is, but unfortunately I doubt I'll be in the headspace to be able to write that one any time soon.

Comment by Zvi on AI #41: Bring in the Other Gemini · 2023-12-08T22:41:18.150Z · LW · GW

From my perspective here's what happened: I spent hours trying to parse his arguments. I then wrote an effort post, responding to something that seemed very wrong to me, that took me many hours, that was longer than the OP, and attempted to explore the questions and my model in detail. 

He wrote a detailed reply, which I thanked him for, ignoring the tone issues in question here and focusing on the details and disagreements. I spent hours processing it and replied in detail to each of his explanations in the reply, including asking many detailed questions, identifying potential cruxes, making it clear where I thought he was right about my mistakes, and so on. I read all the comments carefully, by everyone.

This was an extraordinary, for me, commitment of time, by this point, while the whole thing was stressful. He left it at that. Which is fine, but I don't know how else I was supposed to 'follow up' at that point? I don't know what else someone seeking to understand is supposed to do. 

I agree Nate's post was a mistake, and said so in OP here - either take the time to engage or don't engage. That was bad. But in general no, I do not think that the thing I am observing from Pope/Belrose is typical of LW/AF/rationalist/MIRI/etc behaviors to anything like the same degree that they consistently do it.

Nor do I get the sense that they are open to argument. Looking over Pope's reply to me, I basically don't see him changing his mind about anything, agreeing a good point was made, addressing my arguments or thoughts on their merits rather than correcting my interpretation of his arguments, asking me questions, suggesting cruxes and so on. Where he notes disagreement he says he's baffled anyone could think such a thing and doesn't seem curious why I might think it.

If people want to make a higher bid for me to engage more after that, I am open to hearing it. Otherwise, I don't see how to usefully do so in reasonable time in a way that would have value.

Comment by Zvi on AI #41: Bring in the Other Gemini · 2023-12-07T21:07:55.058Z · LW · GW

I agree on the margin I fall into the trap of doing more of this than I should. I do curate my Twitter feed to try and make this a better form of reaction than it would otherwise be, but I should raise the bar for that relative to my other bars. 

Always good to get reminders on this.

However, as you allude to, you're in the spot where you're already checking many of the same sources on Twitter, whereas one of the points of these posts for a lot of readers is so they don't have to do that. I'd definitely do it radically differently if I thought most readers of mine were going to be checking Twitter a lot anyway. 

Comment by Zvi on On ‘Responsible Scaling Policies’ (RSPs) · 2023-12-06T01:12:30.108Z · LW · GW

Ah, thanks for clearing that up. That definitely wasn't made clear to me.

Comment by Zvi on AI #39: The Week of OpenAI · 2023-11-25T13:38:40.316Z · LW · GW

Ah, he didn't realize he was getting signal boosted and edited after he got a bunch of inquiries. Under the old wording, I didn't think they had no alignment teams, but I read it as 'a new alignment team.' It makes sense under Google's general structure to have multiples, in fact it would be weird if you didn't. 

Comment by Zvi on Possible OpenAI's Q* breakthrough and DeepMind's AlphaGo-type systems plus LLMs · 2023-11-24T01:34:12.039Z · LW · GW

How far does this go? Does this mean if I e.g. had stupid questions or musings about Q learning, I shouldn't talk about that in public in case I accidentally hit upon something or provoked someone else to say something?

Comment by Zvi on OpenAI: The Battle of the Board · 2023-11-22T19:12:51.713Z · LW · GW

My presumption is that doing this while leaving Altman in place as CEO risks Altman engaging in hostile action, and it represents a vote of no confidence in any case. It isn't a stable option. But I'd have gamed it out?

Comment by Zvi on OpenAI: Facts from a Weekend · 2023-11-22T18:14:12.545Z · LW · GW

It would be sheer insanity to have a rule that you can't vote on your own removal, I would think, or else a tied board will definitely shrink right away.

Comment by Zvi on OpenAI: Facts from a Weekend · 2023-11-20T16:27:54.988Z · LW · GW

Now a claim that it's up to 650/770.

Comment by Zvi on OpenAI: Facts from a Weekend · 2023-11-20T16:22:59.257Z · LW · GW


Comment by Zvi on OpenAI: Facts from a Weekend · 2023-11-20T16:22:47.995Z · LW · GW

Yeah, should have put that in the main, forgot. Added now.

Comment by Zvi on OpenAI: Facts from a Weekend · 2023-11-20T16:19:12.125Z · LW · GW

Initially I saw it from Kara Swisher (~1mm views) then I saw it from a BB employee. I presume it is genuine.

Comment by Zvi on Bostrom Goes Unheard · 2023-11-14T12:05:59.798Z · LW · GW

I definitely do not think this is on the level of the EO or Summit.