Posts

Holly Elmore and Rob Miles dialogue on AI Safety Advocacy 2023-10-20T21:04:32.645Z
TOMORROW: the largest AI Safety protest ever! 2023-10-20T18:15:18.276Z
Global Pause AI Protest 10/21 2023-10-14T03:20:27.937Z
Protest against Meta's irreversible proliferation (Sept 29, San Francisco) 2023-09-19T23:40:30.202Z
Holly_Elmore's Shortform 2023-06-18T11:54:18.790Z
Seeking beta readers who are ignorant of biology but knowledgeable about AI safety 2022-07-27T23:02:57.192Z
Virtue signaling is sometimes the best or the only metric we have 2022-04-28T04:52:53.884Z
Instead of "I'm anxious," try "I feel threatened" 2019-06-28T05:24:52.593Z

Comments

Comment by Holly_Elmore on Sam Altman fired from OpenAI · 2023-11-18T03:42:49.769Z · LW · GW

What kind of securities fraud could he have committed? 

Comment by Holly_Elmore on Holly Elmore and Rob Miles dialogue on AI Safety Advocacy · 2023-11-08T06:47:39.871Z · LW · GW

> No, sacrificing truth is fundamentally an act of self-deception. It is making yourself a man who believes a falsehood, or has a disregard for the truth. It is Gandhi taking the murder-pill. That is what I consider irreversible.

This is what I was talking about, or the general thing I had in mind, and I think it is reversible. Not a good idea, but I think people who have ever self-deceived or wanted to believe something convenient have come back around to wanting to know the truth. I also think people can be truthseeking in some domains while self-deceiving in others. Perhaps if this weren’t the case, it would be easier to draw lines for acceptable behavior, but I think that unfortunately it isn’t.

This is rather beside my original point about being willing to speak more plainly, but I think you get that.

Comment by Holly_Elmore on If a little is good, is more better? · 2023-11-04T07:22:52.719Z · LW · GW

I get the sense that "but Google and textbooks exist" is more of a deontological argument: if the information is public at all, "the cat's out of the bag," and it's unfair to penalize LLMs because they didn't cross any new lines, just increased accessibility.

Comment by Holly_Elmore on Holly Elmore and Rob Miles dialogue on AI Safety Advocacy · 2023-10-26T23:12:59.854Z · LW · GW

Does that really seem true to you? Do you have no memories of sacrificing truth for something else you wanted when you were a child, say? I'm not saying it's just fine to sacrifice truth, but it seems false to me to say that people never return to seeking the truth after deceiving themselves, much less after trying on different communication styles or norms. If that were true, I feel like no one could ever be rational at all.

Comment by Holly_Elmore on TOMORROW: the largest AI Safety protest ever! · 2023-10-23T17:30:01.222Z · LW · GW

That’s why I said “financially cheap”. They are expensive for the organizer in terms of convincing people to volunteer, and for all attendees in terms of their time and talents, and getting people to put in sweat equity is what makes a demonstration effective. But per dollar invested they are very effective.

I would venture that the only person who was seriously prevented from doing something else by being involved in this protest was me. Of course there is some time and labor cost for everyone involved. I hope it was complementary to whatever else they do, and, as Ben said, perhaps even allowing them to flex different muscles in an enriching way.

Comment by Holly_Elmore on Holly Elmore and Rob Miles dialogue on AI Safety Advocacy · 2023-10-23T01:43:59.529Z · LW · GW

I’m down for a followup!

Comment by Holly_Elmore on TOMORROW: the largest AI Safety protest ever! · 2023-10-23T01:39:27.418Z · LW · GW

It’s hard to say what the true impact of the events will be at this time, but they went well! I’m going to write a post-mortem covering the short-term outcomes of the SF PauseAI protest yesterday and the Meta protest in September and post it on EAF/LW.

Considering they are financially cheap to do (each around $2,000 if you don’t count my salary), I’d call them pretty successful already. The Meta protest got good media coverage, and it remains to be seen how this one will be covered, since most of the coverage last time came in the two weeks following the event.

Comment by Holly_Elmore on TOMORROW: the largest AI Safety protest ever! · 2023-10-23T01:38:36.504Z · LW · GW

Comment by Holly_Elmore on TOMORROW: the largest AI Safety protest ever! · 2023-10-20T22:16:49.320Z · LW · GW

You could share the events with your friends and family who may be near, and signal boost media coverage of the events after! If you want to donate to keep me organizing events, I have a GoFundMe (and if anyone wants to give a larger amount, I'm happy to talk about how to do that :D). If you want to organize future events yourself, please DM me. Even putting the pause emoji ⏸️ in your twitter name helps :)

Here are the participating cities and links:
October 21st (Saturday), in multiple countries

Comment by Holly_Elmore on TOMORROW: the largest AI Safety protest ever! · 2023-10-20T21:34:22.439Z · LW · GW

Personally, I'm interested in targeting hardware development, and that will be among my future advocacy directions. I think it'll be a great issue for corporate campaigns pushing voluntary agreements while simultaneously pushing for external regulations. This protest is aimed more at governments (attending the UK Summit) and their overall plans for regulating AI, so we're pushing compute governance as a way to most immediately address the creation of frontier models. Imo, hardware tracking at the very least is going to have to be part of enforcing such a limit if it is adopted, and slowing the development of more powerful hardware will be important to keeping an acceptable compute threshold high enough that we're not constantly on the verge of someone illegally getting together enough chips to make something dangerous.

Comment by Holly_Elmore on Holly Elmore and Rob Miles dialogue on AI Safety Advocacy · 2023-10-20T21:26:41.679Z · LW · GW

If you found yourself interested in advocacy, the largest AI Safety protest ever is happening Saturday, October 21st! 

https://www.lesswrong.com/posts/abBtKF857Ejsgg9ab/tomorrow-the-largest-ai-safety-protest-ever 

Comment by Holly_Elmore on The International PauseAI Protest: Activism under uncertainty · 2023-10-14T03:25:16.925Z · LW · GW

Check out the LessWrong event here: https://www.lesswrong.com/events/ZoTkRYdqGuDCnojMW/global-pause-ai-protest-10-21

Comment by Holly_Elmore on Evaluating the historical value misspecification argument · 2023-10-05T20:30:09.576Z · LW · GW

I think you’re correct that the paradigm has changed, Matthew, and that the problems that stood out to MIRI before as possibilities no longer quite fit the situation.

I still think the broader concern MIRI exhibited is correct: namely, that an AI could appear to be aligned but not actually be aligned, and that this may not come to light until it is behaving outside the context of training or the context in which the command was written. Because of the greater capabilities of an AI, the problem may have to do with differences between superficially similar goals that wouldn’t matter at the human capabilities level.

I’m not sure if the fact that LLMs solve the cauldron-filling problem means that we should consider the whole broader class of problems easier to solve than we thought. Maybe it does. But given the massive stakes of the issue, I think we ought to treat not knowing whether LLMs will always behave as intended out of distribution as a live problem.

Comment by Holly_Elmore on Protest against Meta's irreversible proliferation (Sept 29, San Francisco) · 2023-09-27T16:59:10.198Z · LW · GW

Change log: I removed the point about Meta inaccurately calling itself "open source" because it was confusing. 

Comment by Holly_Elmore on Protest against Meta's irreversible proliferation (Sept 29, San Francisco) · 2023-09-22T18:51:38.931Z · LW · GW

Particularly in the rationalist community it seems like protesting is seen as a very outgroup thing to do. But why should that be? Good on you for expanding your comfort zone-- hope to see you there :)

Comment by Holly_Elmore on Protest against Meta's irreversible proliferation (Sept 29, San Francisco) · 2023-09-22T18:48:36.242Z · LW · GW

^ all good points, but I think the biggest thing here is the policy of sharing weights continuing into the future with more powerful models. 

Comment by Holly_Elmore on Protest against Meta's irreversible proliferation (Sept 29, San Francisco) · 2023-09-21T23:50:11.696Z · LW · GW

Yeah, I’ve been weighing a lot whether big tent approaches are something I can pull off at this stage or whether I should stick to “Pause AI”. The Meta protest is kind of an experiment in that regard and it has already been harder than I expected to get the message about irreversible proliferation across well. Pause is sort of automatically a big tent because it would address all AI harms. People can be very aligned on Pause as a policy without having the same motivations. Not releasing model weights is more of a one-off issue and requires a lot of inferential distance crossing even with knowledgeable people. So I’ll probably keep the next several events focused on Pause, a message much better suited to advocacy.

Comment by Holly_Elmore on Protest against Meta's irreversible proliferation (Sept 29, San Francisco) · 2023-09-21T23:45:19.731Z · LW · GW

Yeah, I’m afraid of this happening with AI even as the danger becomes clearer. It’s one reason we’re in a really important window for setting policy.

Comment by Holly_Elmore on Protest against Meta's irreversible proliferation (Sept 29, San Francisco) · 2023-09-21T06:49:42.879Z · LW · GW

Reducing the harm of irreversible proliferation potentially addresses almost all AI harms, but my motivating concern is x-risk.

Comment by Holly_Elmore on Protest against Meta's irreversible proliferation (Sept 29, San Francisco) · 2023-09-21T06:48:12.806Z · LW · GW

This strikes me as the kind of political thinking I think you’re trying to avoid. Contempt is not good for thought. Advocacy is not the only way to be tempted to lower your epistemic standards. I think you’re doing it right now when you other me or this type of intervention.

Comment by Holly_Elmore on Protest against Meta's irreversible proliferation (Sept 29, San Francisco) · 2023-09-20T06:34:06.381Z · LW · GW

I commend your introspection on this.

Comment by Holly_Elmore on Protest against Meta's irreversible proliferation (Sept 29, San Francisco) · 2023-09-20T06:30:06.532Z · LW · GW

I agree with your assessment of the situation a lot, but I disagree that there is all that much controversy about this issue in the broader public. There is a lot of controversy on LessWrong, and in tech, but the public as a whole is in favor of slowing down and regulating AI developments. (Although other AI companies think sharing weights is really irresponsible, and there are anti-competitive issues with llama 2’s ToS, which is why it isn’t actually open source.) https://theaipi.org/poll-shows-overwhelming-concern-about-risks-from-ai-as-new-institute-launches-to-understand-public-opinion-and-advocate-for-responsible-ai-policies/

The public doesn’t understand the risks of sharing model weights so getting media attention to this issue will be helpful.

Comment by Holly_Elmore on Protest against Meta's irreversible proliferation (Sept 29, San Francisco) · 2023-09-20T00:29:28.100Z · LW · GW

I actually did not realize they released the base model. There's research showing how easy it is to remove the safety fine-tuning, which is where I got the framing (and probably Zvi did too), but perhaps that was more of a proof of concept than the main concern in this case.

The concept of being able to remove fine-tuning is pretty important for safety, but I will change my wording where possible to also mention that it is bad to release the base model without any safety fine-tuning. I just asked to download llama 2, so I'll see what options they give.

Comment by Holly_Elmore on Contra Yudkowsky on Epistemic Conduct for Author Criticism · 2023-09-13T19:42:54.611Z · LW · GW

Yeah, it felt like Eliezer was rounding off all of the bad faith in the post to this one stylistic/etiquette breach, but he didn't properly formulate the one rule that was supposedly violated. 

Comment by Holly_Elmore on Introducing the Center for AI Policy (& we're hiring!) · 2023-08-30T05:37:00.116Z · LW · GW

Sorry, what harmful thing would this proposal do? Require people to have licenses to fine-tune llama 2? Why is that so crazy?

Comment by Holly_Elmore on Introducing the Center for AI Policy (& we're hiring!) · 2023-08-28T21:18:57.071Z · LW · GW

I endorse!

Comment by Holly_Elmore on Holly_Elmore's Shortform · 2023-08-19T19:28:23.131Z · LW · GW

A weakness I often observe in my numerous rationalist friends is "rationalizing and making excuses to feel like doing the intellectually cool thing is the useful or moral thing". Fwiw. If you want to do the cool thing, own it, own the consequences, and own the way that changes how you can honestly see yourself.

Comment by Holly_Elmore on Consciousness as a conflationary alliance term for intrinsically valued internal experiences · 2023-07-12T21:19:46.004Z · LW · GW

Say more?

Unless you’re endorsing illusionism or something, I don’t understand how people disagreeing about the nature of consciousness means the Hard Problem is actually a values issue. There’s still the issue of qualia, or why it is “like” anything to have experiences when all the same actions could be accomplished without that. I don’t see how people having different ideas of what consciousness refers to, or of what is morally valuable about it, makes the Hard Problem any less hard.

Comment by Holly_Elmore on Consciousness as a conflationary alliance term for intrinsically valued internal experiences · 2023-07-12T21:14:47.510Z · LW · GW

I liked the post and the general point, but I think the different consciousness concepts are more unified than you’re giving them credit for. Few of them could apply to rocks or automata. They all involve some idea of awareness or perception. And some of them seemed to be trying to describe the experience of being conscious (vestibular sense, sense behind the eyes) rather than consciousness itself.

I know there are very real differences in some people’s ideas of consciousness and especially what they value about it, but I suspect a lot of the differences you found are more the result of struggling to operationalize what’s important about consciousness than that kind of deep disagreement.

Comment by Holly_Elmore on Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)? · 2023-07-05T20:32:46.028Z · LW · GW

> From how the quote looks, I think his gripe is with the possibility of in-context learning, where human-like learning happens without anything about how the network works (neither its weights nor previous token states) being ostensibly updated.

I don't understand this. Something is being updated when humans or LLMs learn, no?

Comment by Holly_Elmore on Holly_Elmore's Shortform · 2023-07-03T19:15:15.888Z · LW · GW

Also the short term harms of AI aren't lies?

Comment by Holly_Elmore on Holly_Elmore's Shortform · 2023-07-03T19:14:54.370Z · LW · GW

> For many people the answer to that hypothetical is yes.

For a handful of people, a large chunk of them on this website, the answer is yes. Most people don't think life extension is possible for them, and it isn't their first concern about AGI. I would bet the majority of people would not want to gamble with the possibility of everyone dying from an AGI because it might, under a subset of scenarios, extend their lives.

Comment by Holly_Elmore on Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)? · 2023-07-03T19:09:25.697Z · LW · GW

> But one thing that has completely surprised me is that these LLMs and other systems like them are all feed-forward. It's like the firing of the neurons is going only in one direction. And I would never have thought that deep thinking could come out of a network that only goes in one direction, out of firing neurons in only one direction. And that doesn't make sense to me, but that just shows that I'm naive.

What was the argument that being feed-forward limited the potential for deep thought in principle? It makes sense that multi-directional nets could do more with fewer neurons, but Hofstadter seemed to think there were things a feed-forward system fundamentally couldn't do.
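(Purely as an illustration of the distinction being discussed, not of anything Hofstadter specifically had in mind: a toy sketch, with made-up layer sizes, contrasting a single feed-forward pass with a recurrent loop that feeds its own output back in.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary toy weights; the sizes are made up for illustration.
W1 = rng.normal(size=(4, 8))   # input (8) -> hidden (4)
W2 = rng.normal(size=(8, 4))   # hidden (4) -> output (8)

def feed_forward(x):
    # "Firing in only one direction": each layer is applied exactly once,
    # input -> hidden -> output, with no feedback.
    h = np.tanh(W1 @ x)
    return np.tanh(W2 @ h)

def recurrent(x, steps=5):
    # A recurrent net feeds its output back in, so the same weights are
    # applied over and over, allowing iterative refinement of a state.
    state = x
    for _ in range(steps):
        h = np.tanh(W1 @ state)
        state = np.tanh(W2 @ h)
    return state

x = rng.normal(size=8)
print(feed_forward(x))  # one pass
print(recurrent(x))     # repeated passes over the same weights
```

(An autoregressive LLM does get a kind of outer loop by feeding its own generated tokens back in, even though each individual forward pass is strictly one-directional.)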

Comment by Holly_Elmore on Holly_Elmore's Shortform · 2023-06-27T04:11:50.549Z · LW · GW

I see the cost of simply delaying life extension technology as totally acceptable to avoid catastrophe. It’s more difficult to contemplate losing it entirely, but I don’t think a pause would cause that. It might put it out of reach for people alive today, but what are you going to do about that? Gamble the whole future of humanity?

Comment by Holly_Elmore on Holly_Elmore's Shortform · 2023-06-18T20:02:28.936Z · LW · GW

How does a pause let us ignore bad directions?

Comment by Holly_Elmore on Holly_Elmore's Shortform · 2023-06-18T19:59:39.145Z · LW · GW

I don’t recall much conversation about regulation after Dying with Dignity. At the time, I was uncritically accepting the claim that, since this issue was outside of the Overton window, regulation just wasn’t an option. I do remember a lot of baleful talk about how we’re going to die.

I just don’t understand how anyone who believed in Dying with Dignity would consider regulation too imperfect a solution. Why would you not try? What are you trying to preserve if you think we're on track to die with no solution to alignment in sight? Even if you don’t think regulation will accomplish the goal of saving civilization, isn’t shooting your shot anyway what “dying with dignity” means?

Comment by Holly_Elmore on Holly_Elmore's Shortform · 2023-06-18T11:54:19.022Z · LW · GW

For years, I’ve been worried that we were doomed to die by misaligned AGI because alignment wasn’t happening fast enough or maybe because it wouldn’t work at all. Since I didn’t have the chops to do technical alignment and I didn’t think there was another option, I was living my life for the worlds where disaster didn’t happen (or hadn’t happened yet) and trying to make them better places. The advent of AI Pause as an option— something the public and government might actually hear and act on— has been extremely hopeful for me. I’ve quit my job in animal welfare to devote myself to it.

So I’m confused by the reticence I’m seeing toward Pause from people who, this time last year, were reconciling themselves to “dying with dignity”. Some people think the Pause would somehow accelerate capabilities or make gains on capabilities, which at least makes sense to me as a reason not to support it. But I’ve gotten responses that make no sense to me, like “every day we wait to make AGI, more galaxies are out of our reach forever”. More than one person has said to me that they are worried that “AGI will never get built” if a pause is successful. (For the record, I think it is very unlikely that humanity will not eventually make AGI at this point, unless another catastrophe sets back civilization.) This is sometimes coming from the same people who were mourning our species’ extinction as just a matter of time before the Pause option arose. I keep hearing comparisons to nuclear power and ridicule of people who were overcautious about new technology.

What gives? If you’re not excited about Pause, can you tell me why?

Comment by Holly_Elmore on Aligned Behavior is not Evidence of Alignment Past a Certain Level of Intelligence · 2022-12-05T16:33:08.335Z · LW · GW

This makes a good point!

The only thing I think constrains the ability to deceive in a simulation, which I don't see mentioned here, is energy/physical constraints. It's my assumption (which could be wrong given very high intelligence, numbers, and energy) that it's harder, even if only by a tiny, tiny bit, to answer the simulation trials deceptively than it is to answer honestly. So I think that if the simulation is able to ask enough questions/perform enough trials, it will eventually see time differences in the responses of different programs, with unaligned programs on average taking longer to get the correct answers. So I don't think it's fundamentally useless to test program behavior in simulations to assess a utility function, if there is some kind of constraint involved, like the time it takes to execute the steps of each algorithm.
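(A toy sketch of the averaging argument, with all numbers invented: a "deceptive" responder that pays a tiny per-query cost is indistinguishable on any single trial, but the gap in mean latency becomes visible once the number of trials is large enough.)

```python
import random
import statistics

# All numbers here are made up purely to illustrate the statistics.
def honest_latency():
    return random.gauss(1.00, 0.05)   # baseline time to answer a trial

def deceptive_latency():
    return random.gauss(1.01, 0.05)   # tiny extra cost to answer deceptively

def mean_latency(respond, trials):
    return statistics.mean(respond() for _ in range(trials))

for trials in (10, 1_000, 100_000):
    gap = mean_latency(deceptive_latency, trials) - mean_latency(honest_latency, trials)
    print(f"{trials:>7} trials: observed latency gap = {gap:+.4f}")
```

With 10 trials the 0.01 gap is buried in noise; with 100,000 it shows up reliably.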

Comment by Holly_Elmore on Seeking beta readers who are ignorant of biology but knowledgeable about AI safety · 2022-08-08T15:59:34.485Z · LW · GW

Awesome, thank you! Want to PM me your email?

Comment by Holly_Elmore on Seeking beta readers who are ignorant of biology but knowledgeable about AI safety · 2022-07-30T15:15:57.344Z · LW · GW

Awesome! Would you mind sending me the email address where you'd like to get the google doc invite? I should be sending it out sometime next week.

Comment by Holly_Elmore on Seeking beta readers who are ignorant of biology but knowledgeable about AI safety · 2022-07-30T15:14:54.000Z · LW · GW

Sweet! Would you mind PMing me the email address you'd like the google doc sent to? You should be getting it in around a week.

Comment by Holly_Elmore on Virtue signaling is sometimes the best or the only metric we have · 2022-05-01T02:29:55.301Z · LW · GW

I think you should consider the legibility of the signals you send, but that should flow from a desire to monitor yourself so you can improve and be consistent with your higher goals. I feel like you’re assuming virtue signal means manipulative signal, and I suppose that’s my fault for taking a word whose meaning seems to have been too tainted and not being explicit about trying to reclaim it more straightforwardly as “emissions of a state of real virtue”.

Maybe in your framework it would be more accurate to say to LWers: “Don’t fall into the bad virtue signal of not doing anything legibly virtuous or with the intent of being virtuous. Doing so can make it easy to deceive yourself and unnecessarily hard to cooperate with others.”

It seems like the unacknowledged virtue signals among rationalists are 1) painful honesty, including erring on the side of the personally painful course of action when it’s not clear which is most honest and dogpiling on anyone who seems to use PR, and 2) unhesitant updating (goodharting “shut up and multiply”) that doesn’t indulge qualms of the intuition. If they could just stop doing these, then I think they might be more inclined to use the legible virtue signals I’m advocating as a tool, or at the very least they would focus on developing other aspects of character.

I also think if thinking about signaling is too much of a mindfuck (and it has obviously been a serious mindfuck for the community) that not thinking about it and focusing on being good, as you’re suggesting, can be a great solution.

Comment by Holly_Elmore on Virtue signaling is sometimes the best or the only metric we have · 2022-04-30T16:56:50.111Z · LW · GW

Suggestions for new terms and strategies for preventing them being co-opted too?

Comment by Holly_Elmore on Virtue signaling is sometimes the best or the only metric we have · 2022-04-30T16:55:50.180Z · LW · GW

I think it's too early to say the true meaning of virtue signal is now tribal signal. I wish to reclaim the word before that happens. At the very least I want people to trip on the phrase a little when they reach for it lazily, because the idea of signaling genuine virtue is not so absurd that it could only be meant ironically. 

Comment by Holly_Elmore on Virtue signaling is sometimes the best or the only metric we have · 2022-04-30T16:53:34.779Z · LW · GW

> If people optimize to gain status by donating and being vegan, you can't trust people who donate and are vegan to do moves that cost them status but that would result in other positive ends.

How are people supposed to know their moves are socially positive? 

Also I'm not saying to make those things the only markers of status. You seem to want to optimize for costly signals of "honesty", which I worry is being goodharted in this conversation.

Comment by Holly_Elmore on Virtue signaling is sometimes the best or the only metric we have · 2022-04-30T16:50:05.432Z · LW · GW

> Editing pictures that you publish on your own website to remove uncomfortable information, is worse than just not speaking about certain information. It would be possible to simply not publish the photo. Deciding to edit it to remove information is a conscious choice that's a signal.

I don't know this full situation or what I would conclude about it but I don't think your interpretation is QED on its face. Like I said, I feel like it is potentially more dishonest or misleading to seem to endorse Leverage. Idk why they didn't just not post the pictures at all, which seems the least potentially confusing or deceptive, but the fact that they didn't doesn't lead me to conclude dishonesty without knowing more.

I actually think LWers tend toward the bad kind of virtue signaling with honesty, and they tend to define honesty as not doing themselves any favors with communication. (Makes sense considering Hanson's foundational influence.)

Comment by Holly_Elmore on Virtue signaling is sometimes the best or the only metric we have · 2022-04-30T16:35:00.959Z · LW · GW

Yes

Comment by Holly_Elmore on Virtue signaling is sometimes the best or the only metric we have · 2022-04-29T13:43:18.714Z · LW · GW

In general, I don’t fully agree with rationalist culture about what is demanded by honesty. Like that Leverage example doesn’t sound obviously bad to me— maybe they just don’t want to promote Leverage or confuse anyone about their position on Leverage instead of creating a historical record, as you seem to take to be the only legitimate goal? (Unless you mean the most recent EA Global in which case that would seem more like a cover-up.)

The advantage of pre-commitment virtue signals is that you don’t have to interpret them through the lens of your values to know whether the person fulfilled them or not. Most virtue signals depend on whether you agree the thing is a virtue, though, and when you have a very specific flavor of a virtue, like honesty, that becomes ingroup-versus-neargroup defining.

Comment by Holly_Elmore on Virtue signaling is sometimes the best or the only metric we have · 2022-04-29T13:31:05.880Z · LW · GW

> Generally, signals for non-maziness often involve the willingness to create social tension with other people who are in the ingroup. That's qualitatively different than requiring people to engage in costly signals like veganism or taking the giving pledge as EAs.

I disagree— I would call social tension a cost. Willingness to risk social tension is not as legible of a signal, though, because it’s harder to track that someone is living up to a pre-commitment.

Comment by Holly_Elmore on Virtue signaling is sometimes the best or the only metric we have · 2022-04-29T13:27:57.541Z · LW · GW

I mentioned goodharting, of which Moloch-worshipping is a more specific case. I don’t share the skepticism in these comments about whether good virtue signaling is possible and whether people can keep the spirit of the law in their hearts. I also reject the implicit solution of just not legibly measuring our characters at all. I think that is interpreted as a signal of virtue among LWers, and it shouldn’t be.