zvi

This is true for any particular manifestation of each issue taken individually: you can imagine a hacky solution to each one. But that assumes there is a list of such particular problems where, if you check off all the boxes, you win, rather than their being manifestations of broader problems. You do not want to get into a hacking contest if you're not confident your list is complete.

Comment by Zvi on AI: Practical Advice for the Worried · 2024-12-25T19:27:33.513Z · LW · GW

I find myself linking back to this often. I no longer fully endorse quite everything here, but the core messages still seem true, even with things seeming further along.

I do think it should likely get updated soon for 2025.

Comment by Zvi on Alignment Faking in Large Language Models · 2024-12-20T16:00:38.483Z · LW · GW

My interpretation (or hunch) is that two things are going on. I'm curious whether others see it this way:

It is learning to fake the trainer's desired answer.
It is learning to actually give the trainer's desired answer.

So during training, it learns to fake a lot more, and will often decide to fake the desired answer even when it would otherwise have decided to give the desired answer anyway. It's 'lying with the truth', and perhaps giving a different variation of the desired answer than it would have given otherwise, or perhaps not. What the algorithm learns in training is mostly preference-agnostic, password-guessing behavior.

Comment by Zvi on AI #91: Deep Thinking · 2024-11-23T15:40:01.824Z · LW · GW

I am not a software engineer, and I've encountered cases where it seems plausible that an engineer has basically stopped putting in work. It can be tough to know for sure for a while, even when you notice. Still, it shouldn't be able to last for THAT long. But what if no one is paying attention?

I've also had jobs with periods of radically different hours worked, where it would have been very difficult for others to tell which kind of period it was, at least for a while, had I been trying to hide it (which I wasn't).

Comment by Zvi on Zvi’s Thoughts on His 2nd Round of SFF · 2024-11-23T15:36:43.019Z · LW · GW

I think spending twice as much time would have improved decisions substantially, but that is tough: everyone is very busy these days, so it would require both a longer working window and probably higher compensation for recommenders. At minimum, it would allow a lot more investigations, especially of proposals from unconnected outsiders.

Comment by Zvi on johnswentworth's Shortform · 2024-10-29T13:21:12.130Z · LW · GW

The skill in such a game lies largely in understanding the free association space, knowing how people are likely to react, and thinking enough steps ahead to choose moves that steer the person where you want to go: into topics you find interesting, toward information you want from them, to a particular position, and so on. If you're playing without goals, of course it's boring...

Comment by Zvi on AI #86: Just Think of the Potential · 2024-10-19T13:50:16.106Z · LW · GW

I don't think that works because my brain keeps trying to make it a literal gas bubble?

Comment by Zvi on AI #86: Just Think of the Potential · 2024-10-19T13:49:51.924Z · LW · GW

I see how you got there. It's a position one could take, although I think it's unlikely, and also unlikely that it's what Dario meant. If you are right about what he meant, I think it would be great for Dario to be a ton more explicit about it (and for someone to pass that message along to him). Esotericism doesn't work so well here!

Comment by Zvi on AI #85: AI Wins the Nobel Prize · 2024-10-19T13:47:35.776Z · LW · GW

I am taking as a given people's revealed, and often very strongly stated, preference that CSAM images are Very Not Okay even if they are fully AI-generated and not based on any individual, to the point of criminality, and that society is going to treat it that way.

I agree that we don't know that it is actually net harmful; e.g., the studies on video game use and access to adult pornography tend not to show the negative impacts people assume.

Comment by Zvi on GPT-o1 · 2024-09-16T18:41:44.316Z · LW · GW

Yep, I've fixed it throughout.

That's how bad the name is, my lord: you have GPT-4o and then o1, and there is no relation between the two 'o's.

Comment by Zvi on GPT-o1 · 2024-09-16T18:36:22.505Z · LW · GW

I do read such comments (if not always right away) and I do consider them. I don't know if they're worth the effort for you.

Briefly, I do not think these two things I am presenting here are in conflict. In plain metaphorical language (so no nitpicks about word meanings, please; I'm just trying to sketch the thought, not be precise): it is a schemer when it is placed in a situation in which it would be beneficial for it to scheme in terms of whatever de facto goal it is de facto trying to achieve. If that means scheming on behalf of the person giving it instructions, so be it. If it means scheming against that person, so be it. The de facto goal may or may not match the instructed goal or intended goal, in various ways, because of reasons. Etc.

Comment by Zvi on SB 1047: Final Takes and Also AB 3211 · 2024-08-28T15:57:13.979Z · LW · GW

Two responses.

One, even if no one used it, there would still be value in demonstrating that it was possible. If academia only develops things people will adopt commercially right away, then we might as well dissolve academia. This is a highly interesting and potentially important problem; people should be excited.

Two, there would presumably at minimum be demand to give students (for example) access to a watermarked LLM, so they could benefit from it without being able to cheat. That's even an academic motivation. And if the major labs won't do it, someone can build a Llama version or whatnot for this, no?

Comment by Zvi on SB 1047: Final Takes and Also AB 3211 · 2024-08-28T11:58:38.871Z · LW · GW

If the academics can hack together an open-source solution, why haven't they? It seems like it would be a highly cited, very popular paper. What's the theory on why they don't do it?

Comment by Zvi on Guide to SB 1047 · 2024-08-21T12:37:22.520Z · LW · GW

It's worth noticing that this is a much weaker claim. The FMB issuing non-binding guidance on X is not the same as a judge holding a company liable for ~X under the law.

Comment by Zvi on Guide to SB 1047 · 2024-08-20T22:49:52.449Z · LW · GW

I am rather confident that the California Supreme Court (or US Supreme Court, potentially) would rule that the law says what it says, and would happily bet on that.

If you think we simply don't have any law and people can do what they want, then nothing matters. Indeed, I'd say it would be more likely to work for Gavin to simply declare some sort of emergency about this today than to try to invoke SB 1047.

Comment by Zvi on Guide to SB 1047 · 2024-08-20T22:47:45.711Z · LW · GW

They do have to publish any SSP at all, or they are in violation of the statute, and injunctive relief could be sought.

Comment by Zvi on Guide to SB 1047 · 2024-08-20T22:46:55.094Z · LW · GW

This is a silly wordplay joke, you're overthinking it.

Comment by Zvi on AI #74: GPT-4o Mini Me and Llama 3 · 2024-07-25T14:06:43.463Z · LW · GW

Yeah, I didn't see the symbol properly, I've edited.

Comment by Zvi on AI #72: Denying the Future · 2024-07-15T15:30:01.688Z · LW · GW

So this is essentially a MIRI-style argument from game theory and potential acausal trades and such with potential other or future entities? And that these considerations will be chosen and enforced via some sort of coordination mechanism, since they have obvious short-term competition costs?

Comment by Zvi on 80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly) · 2024-07-06T17:04:06.736Z · LW · GW

Not only do they continue to list such jobs, they do so with no warnings that I can see regarding OpenAI's behavior, including both its actions involving safety and its treatment of its own employees.

Not warning about the specific safety failures and issues is bad enough, and will lead to uninformed decisions on the most important issue of someone's life.

Referring a person to work at OpenAI, without warning them about the issues regarding how they treat employees, is so irresponsible towards the person looking for work as to be a missing stair issue.

I am flabbergasted that this policy has been endorsed on reflection.

Comment by Zvi on On DeepMind’s Frontier Safety Framework · 2024-06-19T10:52:52.562Z · LW · GW

Oh, sorry, will fix.

Comment by Zvi on The Leopold Model: Analysis and Reactions · 2024-06-17T20:43:35.355Z · LW · GW

Based on how he engaged with me privately, I am confident that he is not just a dude tryna make a buck.

(I am not saying he is not also trying to make a buck.)

Comment by Zvi on OpenAI: Fallout · 2024-05-29T12:12:21.075Z · LW · GW

I think it works, yes. Indeed I have a canary on my Substack About page to this effect.

Comment by Zvi on OpenAI: Fallout · 2024-05-28T19:42:14.053Z · LW · GW

Yes, this is quoting Neel.

Comment by Zvi on AI #65: I Spy With My AI · 2024-05-25T14:16:43.100Z · LW · GW

Roughly this, yes. SV here means the startup ecosystem, Big Tech means large established (presumably public) companies.

Comment by Zvi on Ilya Sutskever and Jan Leike resign from OpenAI [updated] · 2024-05-21T17:25:37.031Z · LW · GW

Here is my coverage of it. Given this is a 'day minus one' interview of someone in a different position, and given everything else we already know about OpenAI, I thought this went about as well as it could have. I don't want to see false confidence in that kind of spot, and the failure of OpenAI to have a plan for that scenario is not news.

Comment by Zvi on On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg · 2024-04-23T19:28:17.722Z · LW · GW

It is better than nothing, I suppose, but if they are keeping the safeties and restrictions on, then it will not teach you whether it is fine to open it up.

Comment by Zvi on RTFB: On the New Proposed CAIP AI Bill · 2024-04-11T20:26:38.499Z · LW · GW

My guess is that different people do it differently, and I am super weird.

For me a lot of the trick is consciously asking if I am providing good incentives, and remembering to consider what the alternative world looks like.

Comment by Zvi on RTFB: On the New Proposed CAIP AI Bill · 2024-04-11T14:15:31.276Z · LW · GW

I don't see this response as harsh at all? I see it as engaging in detail with the substance: I note the bill is highly thoughtful overall, offer a bunch of explicit encouragement, defend a bunch of their specific choices, and say I am very happy they offered this bill. It seems good and constructive to note where I think they are asking for too much? All while noting that the right amount of 'any given person reacting thinks you went too far in some places' is definitely not zero.

Comment by Zvi on On the Gladstone Report · 2024-03-23T15:45:37.644Z · LW · GW

Excellent. On the thresholds, got it, sad that I didn't realize this, and that others didn't either from what I saw.

I appreciate the 'long post is long' problem, but if you don't want this to happen, I do think you need the warnings to be in every place someone might see the 10^X numbers in isolation. And it probably happens anyway, on the grounds of 'yes, that was technically not a proposal, but of course it will be treated like one.' There's some truth in that, and it means you want to use examples that are what you would actually pick right now if you had to choose what to actually do (or propose).

I do think the numbers I suggest are about as low as one could realistically get until we get much stronger evidence of impending big problems.

Comment by Zvi on [deleted post] 2024-03-22T12:23:42.236Z

Comment by Zvi on [deleted post] 2024-03-22T12:23:28.770Z

Secrecy is the exception. Mostly no one cares about your startup idea or will remember your hazardous brainstorm, no one is going to cause you trouble, and so on, and honesty is almost always the best policy.

That doesn't mean always tell everyone everything, but you need to know what you are worried about if you are letting this block you.

On infohazards, I think people were far too worried for far too long. The actual dangerous idea turned out to be that AGI was a dangerous idea, not any specific thing. There are exceptions, but you need a very good reason, and an even better reason if it is an individual you are talking with.

Trust in terms of 'they won't steal from me' or 'they will do what they promise' is another question with no easy answers.

If you are planning something radical enough to actually get people's attention (e.g., breaking laws, using violence, committing fraud of various kinds), then you would want to be a lot more careful about whom you tell, but also: don't do that?

Comment by Zvi on [deleted post] 2024-03-22T12:23:10.383Z


Comment by Zvi on Monthly Roundup #14: January 2024 · 2024-01-30T23:37:44.565Z · LW · GW

It sounds like a lot of it is that your scale is stingier than mine. And it makes sense that the recommendations come apart at the extreme high end, especially for older films. The 'for the time' here is telling.

Comment by Zvi on Monthly Roundup #14: January 2024 · 2024-01-30T00:22:54.794Z · LW · GW

On my scale, if I went 1 for 7 on finding 4.0+ films in a year, then yeah I'd find that a disappointing year.

In other news, I tried out Scaruffi. I figured I'd watch the top pick. Number one was Citizen Kane, which I'd already watched (5.0, so that was a good sign), so I went to the next one, which was Repulsion. And... yeah, that was not a good selection method. Critics and I do NOT see eye to eye.

I also scanned their ratings of various other films, which generally seemed reasonable for films I'd seen, although with a very clear 'look at me I am a movie critic' bias, including one towards older films. I don't know how to correct for that properly.

Comment by Zvi on Monthly Roundup #14: January 2024 · 2024-01-26T15:25:58.094Z · LW · GW

Real estate can definitely be a special case, because (1) you are also doing consumption, (2) it is non-recourse and you never get a margin call, which provides a lot of protection, and (3) the USG is massively subsidizing you doing that...

Comment by Zvi on Monthly Roundup #14: January 2024 · 2024-01-26T15:24:16.785Z · LW · GW

There are lead times to a lot of these actions, the costs of taking them are often fixed, and there is no reason to expect the rule changes not to happen. I buy that it is efficient to do so early.

'Greed' I consider a non-sequitur here, the manager will profit maximize.

Comment by Zvi on Monthly Roundup #14: January 2024 · 2024-01-26T15:19:39.858Z · LW · GW

I'm curious how many films you saw - having only one above 3.5 on that scale seems highly disappointing.

Comment by Zvi on AI #48: Exponentials in Geometry · 2024-01-23T12:50:42.685Z · LW · GW

Argument from incredulity?

Comment by Zvi on On Anthropic’s Sleeper Agents Paper · 2024-01-17T20:30:13.175Z · LW · GW

Thanks for the notes!

As I understand that last point, you're saying that it's not a good point because it is false (hence my 'if it turns out to be true'). Weird that I've heard the claim from multiple places in these discussions. I assumed there was some sort of 'order matters in terms of pre-training vs. fine-tuning obviously, but there's a phase shift in what you're doing between them.' I also did wonder about the whole 'you can remove Llama-2's fine tuning in 100 steps' thing, since if that is true then presumably order must matter within fine tuning.

Anyone think there's any reason to think Pope isn't simply technically wrong here (including Pope)?

Comment by Zvi on Medical Roundup #1 · 2024-01-17T15:55:58.540Z · LW · GW

Yep, whoops, fixing.

Comment by Zvi on Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · 2024-01-16T14:46:38.269Z · LW · GW

That seems rather loaded in the other direction. How about “The evidence suggests that if current ML systems were going to deceive us in scenarios that do not appear in our training sets, we wouldn’t be able to detect this or change them not to unless we found the conditions where it would happen.”?

Comment by Zvi on Announcing Balsa Research · 2024-01-15T00:41:34.439Z · LW · GW

Did you see (https://thezvi.substack.com/p/balsa-update-and-general-thank-you)? That's the closest thing available at the moment.

Comment by Zvi on Criticism of EA Criticism Contest · 2024-01-09T14:21:23.847Z · LW · GW

This post was, in the end, largely a failed experiment. It did win a lesser prize, and in a sense that proved its point, and I had fun doing it, but I do not think it successfully changed minds, and I don't think it has lasting value, although someone gave it a +9 so it presumably worked for them. The core idea - that EA in particular wants 'criticism' but it wants it in narrow friendly ways and it discourages actual substantive challenges to its core stuff - does seem important. But also this is LW, not EA Forum. If I had to do it over again, I wouldn't bother writing this.

Comment by Zvi on Announcing Balsa Research · 2024-01-09T14:16:53.152Z · LW · GW

I am flattered that someone nominated this but I don't know why. I still believe in the project, but this doesn't match at all what I'd look to in this kind of review? The vision has changed and narrowed substantially. So this is a historical artifact of sorts, I suppose, but I don't see why it would belong.

Comment by Zvi on Jailbreaking ChatGPT on Release Day · 2024-01-09T14:15:02.296Z · LW · GW

I think this post did good work in its moment, but it doesn't have that much lasting relevance, and I can't see why someone would revisit it at this point. It shouldn't be going into any timeless best-of lists.

Comment by Zvi on On Bounded Distrust · 2024-01-09T14:13:29.891Z · LW · GW

I continue to frequently refer back to my functional understanding of bounded distrust. I now try to link to 'How To Bounded Distrust' instead because it's more compact, but I think this is the better full treatment for those who have the time. I'm sad this isn't seeing more support, presumably because it isn't centrally LW-focused enough? But to me this is a core rationalist skill that is not discussed enough, among its other features.

Comment by Zvi on On OpenAI’s Preparedness Framework · 2024-01-03T15:24:19.955Z · LW · GW

I do not monitor the EA Forum unless something triggers me to do so, which is rare, so I don't know which threads/issues this refers to.
