Posts

Attention on AI X-Risk Likely Hasn't Distracted from Current Harms from AI 2023-12-21T17:24:16.713Z
Linkpost: Are Emergent Abilities in Large Language Models just In-Context Learning? 2023-10-08T12:14:46.825Z
Alpha 2023-07-01T16:05:55.940Z
The Prospect of an AI Winter 2023-03-27T20:55:35.619Z
Against LLM Reductionism 2023-03-08T15:52:38.741Z
Notes on Meta's Diplomacy-Playing AI 2022-12-22T11:34:27.384Z
Erich_Grunewald's Shortform 2022-09-16T10:42:21.833Z
Supposing Europe is headed for a serious energy crisis this winter, what can/should one do as an individual to prepare? 2022-08-31T19:28:31.070Z
Quick Summaries of Two Papers on Kant and Game Theory 2022-06-25T10:25:08.398Z
Why Rome? 2022-03-12T11:29:19.957Z
How Bad Is QWERTY, Really? A Review of the Literature, such as It Is 2022-01-15T12:07:55.390Z
How the Equivalent Bet Test Actually Works 2021-12-18T11:17:12.633Z
Scout Mindset and Latter-Day Saint Apostasy 2021-10-23T14:18:27.260Z
Prefer the British Style of Quotation Mark Punctuation over the American 2021-09-11T13:37:22.704Z
Moral Golems Assume Uncertainty 2021-08-28T14:05:38.909Z
Interview with Olle Häggström: Reason, COVID-19 and Academic Freedom in Sweden 2021-08-21T15:08:07.216Z
Utilitarianism Expressed in Julia 2021-08-15T11:31:51.109Z
Scoring My AstraZeneca Predictions 2021-08-06T11:52:43.427Z
How Can One Tell What Is Beautiful? 2021-05-15T12:55:25.109Z
Interview with Christine M. Korsgaard: Animal Ethics, Kantianism, Utilitarianism 2021-05-08T11:44:39.521Z
Networks of Meaning 2021-04-17T07:30:16.012Z
Rediscovery, the Mind's Curare 2021-04-10T07:42:11.348Z
Moral Golems 2021-04-03T10:12:59.152Z
Can a Vegan Diet Be Healthy? A Literature Review 2021-03-12T12:47:50.590Z

Comments

Comment by Erich_Grunewald on What I Would Do If I Were Working On AI Governance · 2023-12-08T16:00:18.796Z · LW · GW

Fwiw, there is also AI governance work that is neither policy nor lab governance, in particular trying to answer broader strategic questions that are relevant to governance, e.g., timelines, whether a pause is desirable, which intermediate goals are valuable to aim for, and how much computing power Chinese actors will have access to. I guess this is sometimes called "AI strategy", but often the people/orgs working on AI governance also work on AI strategy, and vice versa, and they kind of bleed into each other.

How do you feel about that sort of work relative to the policy work you highlight above?

Comment by Erich_Grunewald on “Why can’t you just turn it off?” · 2023-11-22T15:44:08.303Z · LW · GW

Open Philanthropy did donate $30M to OpenAI in 2017, and got in return the board seat that Helen Toner occupied until very recently. However, that was when OpenAI was a non-profit, and was done in order to gain some amount of oversight and control over OpenAI. I very much doubt any EA has donated to OpenAI unconditionally, or at all since then.

Comment by Erich_Grunewald on Integrity in AI Governance and Advocacy · 2023-11-07T10:58:28.701Z · LW · GW

They often do things of the form "leaving out info, knowing this has misleading effects"

On that, here are a few examples of Conjecture leaving out info in what I think is a misleading way.

(Context: Control AI is an advocacy group, launched and run by Conjecture folks, that is opposing RSPs. I do not want to discuss the substance of Control AI’s arguments -- nor whether RSPs are in fact good or bad, on which question I don’t have a settled view -- but rather what I see as somewhat deceptive rhetoric.)

One, Control AI’s X account features a banner image with a picture of Dario Amodei (“CEO of Anthropic, $2.8 billion raised”) saying, “There’s a one in four chance AI causes human extinction.” That is misleading. What Dario Amodei has said is, “My chance that something goes really quite catastrophically wrong on the scale of human civilisation might be somewhere between 10-25%.” I understand that it is hard to communicate uncertainty in advocacy, but I think it would at least have been more virtuous to use the middle of that range (“one in six chance”), and to refer to “global catastrophe” or something rather than “human extinction”.

Two, Control AI writes that RSPs like Anthropic’s “contain wording allowing companies to opt-out of any safety agreements if they deem that another AI company may beat them in their race to create godlike AI”. I think that, too, is misleading. The closest thing Anthropic’s RSP says is:

However, in a situation of extreme emergency, such as when a clearly bad actor (such as a rogue state) is scaling in so reckless a manner that it is likely to lead to imminent global catastrophe if not stopped (and where AI itself is helpful in such defense), we could envisage a substantial loosening of these restrictions as an emergency response. Such action would only be taken in consultation with governmental authorities, and the compelling case for it would be presented publicly to the extent possible.

Anthropic’s RSP is clearly only meant to permit labs to opt out when any other outcome very likely leads to doom, and for this to be coordinated with the government, with at least some degree of transparency. The scenario is not “DeepMind is beating us to AGI, so we can unilaterally set aside our RSP”, but more like “North Korea is beating us to AGI, so we must cooperatively set aside our RSP”.

Relatedly, Control AI writes that, with RSPs, companies “can decide freely at what point they might be falling behind – and then they alone can choose to ignore the already weak” RSPs. But part of the idea with RSPs is that they are a stepping stone to national or international policy enforced by governments. For example, prior to the Control AI campaign, ARC and Anthropic both explicitly said that they hope RSPs will be turned into standards/regulation. (That seems quite plausible to me as a theory of change.) Also, Anthropic commits to only updating its RSP in consultation with its Long-Term Benefit Trust (consisting of five people without any financial interest in Anthropic) -- which may or may not work well, but seems sufficiently different from Anthropic being able to “decide freely” when to ignore its RSP that I think Control AI’s characterisation is misleading. Again, I don't want to discuss the merits of RSPs; I just think Control AI is misrepresenting Anthropic's and others' positions.

Three, Control AI seems to say that Anthropic’s advocacy for RSPs is an instance of safetywashing and regulatory capture. (Connor Leahy: “The primary aim of responsible scaling is to provide a framework which looks like something was done so that politicians can go home and say: ‘We have done something.’ But the actual policy is nothing.” And also: “The AI companies in particular and other organisations around them are trying to capture the summit, lock in a status quo of an unregulated race to disaster.”) I don’t know exactly what Anthropic’s goals are -- I would guess that its leadership is driven by a complex mixture of motivations -- but I doubt it is so clear-cut as Leahy makes it out to be.

To be clear, I think Conjecture has good intentions, and wants the whole AI thing to go well. I am rooting for its safety work and looking forward to seeing updates on CoEm. And again, I personally do not have a settled view on whether RSPs like Anthropic’s are in fact good or bad, or on whether it is good or bad to advocate for them – it could well be that RSPs turn out to be toothless, and would displace better policy – I only take issue with the rhetoric.

(Disclosure: Open Philanthropy funds the organisation I work for, though the above represents only my views, not my employer’s.)

Comment by Erich_Grunewald on Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk · 2023-11-03T10:30:47.439Z · LW · GW

I think it is reasonable to treat this as a proxy for the state of the evidence, because lots of AI policy people specifically praised it as a good and thoughtful paper on policy.

All four of those AI policy people are coauthors on the paper -- that does not seem like good evidence that the paper is widely considered good and thoughtful, and therefore a good proxy (though I think it probably is an ok proxy).

Comment by Erich_Grunewald on Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk · 2023-11-03T10:20:13.994Z · LW · GW

When Jeff Kaufman shared one of the papers discussed here on the EA Forum, there was a highly upvoted comment critical of the paper (more upvoted than the post itself). That would suggest to me that this post would be fairly well received on the EA Forum, though its tone is definitely more strident than that comment, so maybe not.

Comment by Erich_Grunewald on Lying is Cowardice, not Strategy · 2023-10-24T13:59:16.647Z · LW · GW

ARC & Open Philanthropy state in a press release “In a sane world, all AGI progress should stop. If we don’t, there’s more than a 10% chance we will all die.”

Could you spell out what you mean by "in a sane world"? I suspect a bunch of people you disagree with do not favor a pause due to various empirical facts about the world (e.g., there being competitors like Meta).

Comment by Erich_Grunewald on EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem · 2023-10-21T15:31:49.259Z · LW · GW

Well, it's not like vegans/vegetarians are some tiny minority in EA. Pulling together some data from the 2022 ACX survey, people who identify as EA are about 40% vegan/vegetarian, and about 70% veg-leaning (i.e., vegan, vegetarian, or trying to eat less meat and/or offsetting meat-eating for moral reasons). (That's conditioning on identifying as an LW rationalist, since anecdotally I think being vegan/vegetarian is somewhat less common among Bay Area EAs, and the ACX sample is likely to skew pretty heavily rationalist, but the results are not that different if you don't condition.)

ETA: From the 2019 EA survey, 46% of EAs are vegan/vegetarian and 77% veg-leaning.
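In case it's useful, the conditioning I did on the ACX data amounts to something like the sketch below. The column names and diet labels are hypothetical placeholders, not the survey's actual variable names, so treat it as illustrating the logic rather than as something runnable against the real export.

```python
import pandas as pd

# Hypothetical column names -- the real ACX export uses different labels.
df = pd.read_csv("acx_2022_survey.csv")

# Condition on identifying as EA (and, optionally, as an LW rationalist).
ea = df[(df["identifies_as_ea"] == "Yes") & (df["identifies_as_rationalist"] == "Yes")]

veg = ea["diet"].isin(["Vegan", "Vegetarian"])
veg_leaning = veg | ea["diet"].isin(["Reducing meat / offsetting for moral reasons"])

print(veg.mean(), veg_leaning.mean())  # came out to roughly 0.4 and 0.7 in my tally
```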

Comment by Erich_Grunewald on I'm a Former Israeli Officer. AMA · 2023-10-10T11:15:57.406Z · LW · GW

Israel's strategy since the Hamas took the strip over in 2007 has been to try and contain it, and keeping it weak by periodic, limited confrontations (the so called Mowing the Lawn doctorine), and trying to economically develop the strip in order to give Hamas incentives to avoid confrontation. While Hamas grew stronger, the general feeling was that the strategy works and the last 15 years were not that bad.

I am surprised to read the bolded part! What actions have the Israeli government taken to develop Gaza, and did Gaza actually develop economically in that time? (That is not a rhetorical question -- I know next to nothing about this.)

Looking quickly at some stats, real GDP per capita seems to have gone up a bit since 2007, but has declined since 2016, and its current figure ($5.6K in 2021) is lower than that of, e.g., Angola, Bangladesh, and Venezuela.

Qualitatively, the blockade seems to have been net negative for Gaza's economic development. NYT writes:

The Palestinian territory of Gaza has been under a suffocating Israeli blockade, backed by Egypt, since Hamas seized control of the coastal strip in 2007. The blockade restricts the import of goods, including electronic and computer equipment, that could be used to make weapons and prevents most people from leaving the territory.

More than two million Palestinians live in Gaza. The tiny, crowded coastal enclave has a nearly 50 percent unemployment rate, and Gaza’s living conditions, health system and infrastructure have all deteriorated under the blockade.

But that is a news report, so we should take it with a grain of salt.

Comment by Erich_Grunewald on EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem · 2023-10-01T18:08:51.930Z · LW · GW

Assuming you have the singular "you" in mind, no, I do not think I am running a motte and bailey. I said above that if you accept the assumptions, I think using the ranges as (provisional, highly uncertain) moral weights is pretty reasonable, but I also think it's reasonable to reject the assumptions. I do think it is true that some people have (mis)interpreted the report and made stronger claims than is warranted, but the report is also full of caveats and (I think) states its assumptions and results clearly.

The report:

Instead, we’re usually comparing either improving animal welfare (welfare reforms) or preventing animals from coming into existence (diet change → reduction in production levels) with improving human welfare or saving human lives.

Yes, the report is intended to guide decision-making in this way. It is not intended to provide a be-all-end-all estimate. The results still need to be interpreted in the context of the assumptions (which are clearly stated up front). I would take it as one input when making decisions, not the only input.

The post's response to the heading "So you’re saying that one person = ~three chickens?" is, no, that's just the year to year of life comparison, chickens have shorter lives than humans so the life-to-life comparison is more like 1/16. Absolutely insane.

No, that is not the post's response to that heading. It also says: "No. We’re estimating the relative peak intensities of different animals’ valenced states at a given time. So, if a given animal has a welfare range of 0.5 (and we assume that welfare ranges are symmetrical around the neutral point), that means something like, 'The best and worst experiences that this animal can have are half as intense as the best and worst experiences that a human can have' [...]" Comparing the most positively/negatively valenced states animals can achieve is different from comparing their moral worth.

The report says that somehow, people should still mostly accept Rethinking Priotities' conclusions even if they disagree with the assumptions:

“I don't share this project’s assumptions. Can't I just ignore the results?" We don’t think so. First, if unitarianism is false, then it would be reasonable to discount our estimates by some factor or other. However, the alternative—hierarchicalism, according to which some kinds of welfare matter more than others or some individuals’ welfare matters more than others’ welfare—is very hard to defend.

I think I disagree with your characterization, but it depends a bit on what you mean by "mostly". The report makes a weaker claim, that if you don't accept the premises, you shouldn't totally ignore the conclusions (as opposed to "mostly accepting" the conclusions). The idea is that even if you don't accept hedonism, it would be weird if capacity for positively/negatively valenced experiences didn't matter at all when determining moral weights. That seems reasonable to me and I don't really see the issue?

So if you factor in life span (taking 2 months for a drone) and do the ⅔ reduction for not accepting hedonism, you get a median of 1 human life = ~20K bee lives, given the report's other assumptions. That's 3 OOMs more than what Richard Kennaway wrote above.
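For transparency, here's the back-of-the-envelope arithmetic behind that ~20K figure, as a minimal sketch. The ~0.07 median welfare range for bees is the report's number; the human lifespan is a round figure I picked, so the output is only approximate.

```python
# Rough check of "1 human life ≈ 20K bee lives" under the report's assumptions.
bee_welfare_range = 0.07      # report's median welfare range for bees, relative to humans
hedonism_discount = 1 / 3     # the 2/3 reduction for not accepting hedonism
bee_lifespan_years = 2 / 12   # ~2 months for a drone
human_lifespan_years = 73     # round number, my assumption

bee_year_weight = bee_welfare_range * hedonism_discount      # ~0.023 human-years per bee-year
bee_life_weight = bee_year_weight * bee_lifespan_years       # ~0.004 human-years per bee life
print(human_lifespan_years / bee_life_weight)                # ≈ 19,000 bee lives per human life
```

(Using an 80-year lifespan instead gives a bit over 20,000, so the order of magnitude doesn't hinge on that choice.)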

In response to someone commenting in part:

saving human lives is net positive

The post author's reply is:

This is a very interesting result; thanks for sharing it. I've heard of others reaching the same conclusion, though I haven't seen their models. If you're willing, I'd love to see the calculations. But no pressure at all.

I am not sure what you are trying to say here, could you clarify?

Comment by Erich_Grunewald on EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem · 2023-10-01T10:30:46.186Z · LW · GW

e.g. 12 (ETA: 14) bees are worth 1 human

This is a misrepresentation of what the report says. The report says that, conditional on hedonism, valence symmetry, the animals being sentient, and other assumptions, the intensity of positive/negative valence that a bee can experience is 7% that of the positive/negative intensity that a human can experience. How to value creatures based on the intensities of positively/negatively valenced states they are capable of is a separate question, even if you fully accept the assumptions. (ETA: If you assume utilitarianism and hedonism etc., I think it is pretty reasonable to anchor moral weight (of a year of life) in range of intensity of positive/negative valence, while of course keeping the substantial uncertainties around all this in mind.)

On bees in particular, the authors write:

We also find it implausible that bees have larger welfare ranges than salmon. But (a) we’re also worried about pro-vertebrate bias; (b) bees are really impressive; (c) there's a great deal of overlap in the plausible welfare ranges for these two types of animals, so we aren't claiming that their welfare ranges are significantly different; and (d) we don’t know how to adjust the scores in a non-arbitrary way. So, we’ve let the result stand.

I think when engaging in name-calling ("batshit crazy animal rights folks") it is especially important to get things right.

(COI: The referenced report was produced by my employer, though a different department.)

Comment by Erich_Grunewald on How did you make your way back from meta? · 2023-09-08T14:20:11.921Z · LW · GW

I think this is a productivity/habit question disguised as something else. You know you want to do thing X, but instead procrastinate by doing thing Y. Here are some concrete suggestions for getting out of this trap:

  • Try Focusmate. Sign up and schedule a session. The goal of your first session will be to come up with a concrete project/exercise to do, if you have not already done so. The goal of your second session will be to make some progress on that project/exercise (e.g., write 1 page).
    • You can also use the same accountability technique with a friend, but Focusmate is probably easier since you can always schedule a session whenever you want, and you will feel more obliged to focus in the presence of a stranger.
    • I often start my day by scheduling Focusmate sessions. It is easier to schedule a session for future you to be productive during, and then stick to that commitment, than to start being productive right away.
  • Try Beeminder. Sign up and set a goal to write object-level things for at least N minutes each day. If you fail to do so, Beeminder will charge you money. (I think N can be small -- the difficult thing is to get started on the right task.)
  • Try other accountability devices. For example, tell a friend or partner that you commit to doing N minutes of object-level writing each week, and that you will report your progress to them weekly. If you did not do what you committed to, brainstorm ways to make it more likely that you do so next week.

Comment by Erich_Grunewald on A quick update from Nonlinear · 2023-09-07T22:54:07.184Z · LW · GW

Kelsey Piper wrote this comment on the EA Forum:

It could be that I am misreading or misunderstanding these screenshots, but having read through them a couple of times trying to parse what happened, here's what I came away with:

On December 15, Alice states that she'd had very little to eat all day, that she'd repeatedly tried and failed to find a way to order takeout to their location, and tries to ask that people go to Burger King and get her an Impossible Burger which in the linked screenshots they decline to do because they don't want to get fast food. She asks again about Burger King and is told it's inconvenient to get there.  Instead, they go to a different restaurant and offer to get her something from the restaurant they went to. Alice looks at the menu online and sees that there are no vegan options. Drew confirms that 'they have some salads' but nothing else for her. She assures him that it's fine to not get her anything.


It seems completely reasonable that Alice remembers this as 'she was barely eating, and no one in the house was willing to go out and get her nonvegan foods' - after all, the end result of all of those message exchanges was no food being obtained for Alice and her requests for Burger King being repeatedly deflected with 'we are down to get anything that isn't fast food' and 'we are down to go anywhere within a 12 min drive' and 'our only criteria is decent vibe + not fast food', after which she fails to find a restaurant meeting those (I note, kind of restrictive if not in a highly dense area) criteria and they go somewhere without vegan options and don't get her anything to eat. 

It also seems totally reasonable that no one at Nonlinear understood there was a problem. Alice's language throughout emphasizes how she'll be fine, it's no big deal, she's so grateful that they tried (even though they failed and she didn't get any food out of the 12/15 trip, if I understand correctly). I do not think that these exchanges depict the people at Nonlinear as being cruel, insane, or unusual as people. But it doesn't seem to me that Alice is lying to have experienced this as 'she had covid, was barely eating, told people she was barely eating, and they declined to pick up Burger King for her because they didn't want to go to a fast food restaurant, and instead gave her very limiting criteria and went somewhere that didn't have any options she could eat'.

On December 16th it does look like they successfully purchased food for her. 

My big takeaway from these exchanges is not that the Nonlinear team are heartless or insane people, but that this degree of professional and personal entanglement and dependence, in a foreign country, with a young person, is simply a recipe for disaster. Alice's needs in the 12/15 chat logs are acutely not being met. She's hungry, she's sick, she conveys that she has barely eaten, she evidently really wants someone to go to BK and get an impossible burger for her, but (speculatively) because of this professional/personal entanglement, she lobbies for this only by asking a few times why they ruled out Burger King, and ultimately doesn't protest when they instead go somewhere without food she can eat, assuring them it's completely fine. This is also how I relate to my coworkers, tbh - but luckily, I don't live with them and exclusively socialize with them and depend on them completely when sick!!

Given my experience with talking with people about strongly emotional events, I am inclined towards the interpretation where Alice remembers the 15th with acute distress and remembers it as 'not getting her needs met despite trying quite hard to do so', and the Nonlinear team remembers that they went out of their way that week to get Alice food - which is based on the logs from the 16th clearly true! But I don't think I'd call Alice a liar based on reading this, because she did express that she'd barely eaten and request apologetically for them to go somewhere she could get vegan food (with BK the only option she'd been able to find) only for them to refuse BK because of the vibes/inconvenience.

To which Kat Woods replied:

We definitely did not fail to get her food, so I think there has been a misunderstanding - it says in the texts below that Alice told Drew not to worry about getting food because I went and got her mashed potatoes. Ben mentioned the mashed potatoes in the main post, but we forgot to mention it again in our comment - which has been updated

The texts involved on 12/15/21:

I also offered to cook the vegan food we had in the house for her.

I think that there's a big difference between telling everyone "I didn't get the food I wanted, but they did get/offer to cook me vegan food, and I told them it was ok!" and "they refused to get me vegan food and I barely ate for 2 days".

Also, re: "because of this professional/personal entanglement" - at this point, Alice was just a friend traveling with us. There were no professional entanglements.

Comment by Erich_Grunewald on The U.S. is becoming less stable · 2023-08-19T14:15:59.529Z · LW · GW

Some possibly relevant data:

  • As of 2020, anti-government protests in North America rose steadily from 2009 to 2017, when they peaked (at ~7x the 2009 number), and then started to decline (to ~4x the 2009 number in 2019).
  • Americans' trust in the US government is very low (only ~20% say they trust the USG to do what's right most of the time) and has been for over a decade. It seems to have locally peaked at ~50% after 9/11, and then declined to ~15% in 2010, after the financial crisis.
  • Congressional turnover rates have risen somewhat since the 90s, and are now at about the same level as in the 1970s.
  • Congress seems to have passed fewer bills each year since at least the mid-1970s (though the count apparently bottomed out in 2011, following the 2010 red wave midterms).
  • The volume of executive orders seems fairly stable or even declining since WWII.
  • DSA membership is down to 85K in 2023 from a peak of 95K in 2021. I can't think of an analogous right-wing group that publishes membership numbers.

Comment by Erich_Grunewald on A brief history of computers · 2023-07-24T10:35:44.884Z · LW · GW

Actually Charles Babbage was not trying to disrupt the industry of printed logarithmic tables, he was trying to print accurate tables.

Hmm, Babbage wanted to remove errors from tables by doing the calculations by steam. He was also concerned with how tedious and time-consuming those calculations were, though, and I guess the two went hand in hand. ("The intolerable labour and fatiguing monotony of a continued repetition of similar arithmetical calculation, first excited the desire and afterwards suggested the idea, of a machine, which, by the aid of gravity or any other moving power, should become a substitute for one of the lower operations of human intellect. [...] I think I am justified in presuming that if engines were made purposely for this object, and were afterwards useless, the tables could be produced at a much cheaper rate; and of their superior accuracy there could be no doubt.") I think that fits "disrupt" if defined something like "causing radical change in (an industry or market) by means of innovation".

Comment by Erich_Grunewald on A brief history of computers · 2023-07-19T08:56:51.822Z · LW · GW

Great post!

But let's back up and get some context first. The year was 1812, and mathematical tables were a thing.

What are mathematical tables, you ask? Imagine that you need to do some trigonometry. What's sin(79)?

Well, today you'd just look it up online. 15 years ago you'd probably grab your TI-84 calculator. But in the year 1812, you'd have to consult a mathematical table. Something like this:

They'd use computers to compute all the values and write them down in books. Just not the type of computers you're probably thinking of. No, they'd use human computers.

Interestingly, humans having to do a lot of calculation manually was also how John Napier discovered the logarithm in the 17th century. The logarithm reduces the task of multiplication to the much faster and less error-prone task of addition. Of course that meant you also needed to get the logarithms of numbers, so it in turn spawned an industry of printed logarithmic tables (which Charles Babbage later tried to disrupt with his Difference Engine).
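To make the "multiplication becomes addition" point concrete, here is a small sketch of how a log-table multiplication worked (the numbers are arbitrary, and math.log10 stands in for the table look-ups):

```python
import math

# Multiplying 4,837 x 692 with a table of common logarithms:
#   1. look up log10 of each factor,
#   2. add the two logarithms,
#   3. look up the antilog (10^x) of the sum.
a, b = 4837, 692
log_sum = math.log10(a) + math.log10(b)  # two look-ups plus one addition
print(10 ** log_sum, a * b)              # ~3,347,204 both ways
```

In practice the printed tables only carried a handful of digits, so the result was approximate -- and an error anywhere in the table propagated into every calculation that used it, which is exactly why Babbage cared so much about accuracy.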

Comment by Erich_Grunewald on Elon Musk announces xAI · 2023-07-14T21:23:21.762Z · LW · GW

I think your analysis makes sense if using a "center" name really should require you to have some amount of eminence or credibility first. I've updated a little bit in that direction now, but I still mostly think it's just synonymous with "institute", and on that view I don't care if someone takes a "center" name (any more than if someone takes an "institute" name). It's just, you know, one of the five or so nouns non-profits and think tanks use in their names ("center", "institute", "foundation", "organization", "council", blah).

Or actually, maybe it's more like I'm less convinced that there's a common pool of social/political capital that CAIS is now spending from. I think the signed statement has resulted in other AI gov actors now having higher chances of getting things done. I think if the statement had been not very successful, it wouldn't have harmed those actors' ability to get things done. (Maybe if it was really botched it would've, but then my issue would've been with CAIS's botching the statement, not with their name.)

I guess I also don't really buy that using "center" spends from this pool (to the extent that there is a pool). What's the scarce resource it's using? Policy-makers' time/attention? Regular people's time/attention? Or do people only have a fixed amount of respect or credibility to accord various AI safety orgs? I doubt, for example, that other orgs lost out on opportunities to influence people, or inform policy-makers, due to CAIS's actions. I guess what I'm trying to say is I'm a bit confused about your model!

Btw, in case it matters, the other examples I had in mind were Center for Security and Emerging Technology (CSET) and Centre for the Governance of AI (GovAI).

Comment by Erich_Grunewald on Elon Musk announces xAI · 2023-07-14T19:59:53.302Z · LW · GW

The only criticism of you and your team in the OP is that you named your team the "Center" for AI Safety, as though you had much history leading safety efforts or had a ton of buy-in from the rest of the field.

Fwiw, I disagree that "center" carries these connotations. To me it's more like "place where some activity of a certain kind is carried out", or even just a synonym of "institute". (I feel the same about the other 5-10 EA-ish "centers/centres" focused on AI x-risk-reduction.) I guess I view these things more as "a center of X" than "the center of X". Maybe I'm in the minority on this but I'd be kind of surprised if that were the case.

Comment by Erich_Grunewald on The Seeker’s Game – Vignettes from the Bay · 2023-07-10T14:58:23.606Z · LW · GW

It's valuable to flag the causal process generating an idea, but it's also valuable to provide legible argumentation, because most people can't describe the factors which led them to their beliefs in sufficient detail to actually be compelling.

To add to that, trying to provide legible argumentation can also be good because it can convince you that your idea actually doesn't make sense, or doesn't make sense as stated, if that is indeed the case.

Comment by Erich_Grunewald on The Seeker’s Game – Vignettes from the Bay · 2023-07-10T14:55:06.326Z · LW · GW

Have you considered writing (more) shortforms instead? If not, this comment is a modest nudge for you to consider doing so.

Comment by Erich_Grunewald on Ways I Expect AI Regulation To Increase Extinction Risk · 2023-07-04T20:08:40.871Z · LW · GW

  1. Seems to me like (a), (b) and maybe (d) are true for the airplane manufacturing industry, to some degree.
  2. But I'd still guess that flying is safer with substantial regulation than it would be in a counterfactual world without substantial regulation.

That would seem to invalidate your claim that regulation would make AI x-risk worse. Do you disagree with (1), and/or with (2), and/or see some important dissimilarities between AI and flight that make a difference here?

Comment by Erich_Grunewald on Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)? · 2023-07-03T13:08:08.398Z · LW · GW

It's not clear whether that will mean the end of humanity in the sense of the systems we've created destroying us. It's not clear if that's the case, but it's certainly conceivable. If not, it also just renders humanity a very small phenomenon compared to something else that is far more intelligent and will become incomprehensible to us, as incomprehensible to us as we are to cockroaches.

It's interesting that he seems so in despair over this now. To the extent that he's worried about existential/catastrophic risks, I wonder if he is unaware of efforts to mitigate those, or if he is aware but thinks they are hopeless (or at least not guaranteed to succeed, which -- fair enough). To the extent that he's more broadly worried about human obsolescence (or anyway something more metaphysical), well, there are people trying to slow/stop AI, and others trying to enhance human capabilities -- maybe he's pessimistic about those efforts, too.

Comment by Erich_Grunewald on InternLM - China's Best (Unverified) · 2023-06-09T17:00:25.586Z · LW · GW

I’m confused about the parallelization part and what it implies. It says the model was trained on 2K GPUs, but GPT-4 was probably trained on ~1 OOM more than that, right?

Comment by Erich_Grunewald on Transformative AGI by 2043 is <1% likely · 2023-06-07T19:16:30.346Z · LW · GW

They state that their estimated probability for each event is conditional on all previous events happening.

Comment by Erich_Grunewald on Transformative AGI by 2043 is <1% likely · 2023-06-06T23:39:47.860Z · LW · GW

I think this is an excellent, well-researched contribution and am confused about why it's not being upvoted more (on LW that is; it seems to be doing much better on EAF, interestingly).

Comment by Erich_Grunewald on GPT as an “Intelligence Forklift.” · 2023-05-24T16:18:13.116Z · LW · GW

I see, that makes sense. I agree that holding all else constant more neurons implies higher intelligence.

Comment by Erich_Grunewald on GPT as an “Intelligence Forklift.” · 2023-05-22T17:41:36.288Z · LW · GW

Within a particular genus or architecture, more neurons would be higher intelligence.

I'm not sure that's necessarily true? Though there's probably a correlation. See e.g. this post:

[T]he raw number of neurons an organism possesses does not tell the full story about information processing capacity. That’s because the number of computations that can be performed over a given amount of time in a brain also depends upon many other factors, such as (1) the number of connections between neurons, (2) the distance between neurons (with shorter distances allowing faster communication), (3) the conduction velocity of neurons, and (4) the refractory period which indicates how much time must elapse before a given neuron can fire again. In some ways, these additional factors can actually favor smaller brains (Chitka 2009).

Comment by Erich_Grunewald on What failure looks like · 2023-04-28T14:26:08.817Z · LW · GW

Once we start searching over policies that understand the world well enough, we run into a problem: any influence-seeking policies we stumble across would also score well according to our training objective, because performing well on the training objective is a good strategy for obtaining influence.

I'm slightly confused by this. It sounds like "(1) ML systems will do X because X will be rewarded according to the objective, and (2) X will be rewarded according to the objective because being rewarded will accomplish X". But (2) sounds circular -- I see that performing well on the training objective gives influence, but I would've thought only effects (direct and indirect) on the objective are relevant in determining which behaviors ML systems pick up, not effects on obtaining influence.

Maybe that's the intended meaning -- I'm just misreading this passage, but also maybe I'm missing some deeper point here?

Terrific post, by the way, still now four years later.

Comment by Erich_Grunewald on My Assessment of the Chinese AI Safety Community · 2023-04-25T18:10:42.087Z · LW · GW

I vaguely remember OpenAI citing US law as a reason they don't allow Chinese users access, maybe legislation passed as part of the chip ban?

Nah, the export controls don't cover this sort of thing. They just cover chips, devices that contain chips (i.e. GPUs and AI ASICs), and equipment/materials/software/information used to make those. (I don't know the actual reason for OpenAI's not allowing Chinese customers, though.)

Comment by Erich_Grunewald on My Assessment of the Chinese AI Safety Community · 2023-04-25T17:44:34.708Z · LW · GW

If only we could spread the meme of irresponsible Western powers charging head-first into building AGI without thinking through the consequences and how wise the Chinese regulation is in contrast.

That sort of strategy seems like it could easily backfire, where people only pick up the first part of that statement ("irresponsible Western powers charging head-first into building AGI") and think "oh, that means we need to speed up". Or maybe that's what you mean by "if only" -- that it's hard to spread even weakly nuanced messages?

Comment by Erich_Grunewald on Complex Systems are Hard to Control · 2023-04-04T18:12:57.651Z · LW · GW

Thanks, this analysis makes a lot of sense to me. Some random thoughts:

  • The lack of modularity in ML models definitely makes it harder to make them safe. I wonder if mechanistic interpretability research could find some ways around this. For example, if you could identify modules within transformers (kinda like circuits) along with their purpose(s), you could maybe test each module as an isolated component.
  • More generally, if we get to a point where we can (even approximately?) "modularize" an ML model, and edit these modules, we could maybe use something like Nancy Leveson's framework to make it safe (i.e. treat safety as a control problem, where constraints are imposed on a system's lower-level components in order to control its emergent properties, in particular safety).
  • Of course these things (1) depend on mechanistic interpretability research being far ahead of where it is now, and (2) wouldn't alone be enough to make safe AI, but would maybe help quite a bit.

Comment by Erich_Grunewald on Erich_Grunewald's Shortform · 2023-03-29T20:36:50.807Z · LW · GW

I'm really confused by this passage from The Six Mistakes Executives Make in Risk Management (Taleb, Goldstein, Spitznagel):

We asked participants in an experiment: “You are on vacation in a foreign country and are considering flying a local airline to see a special island. Safety statistics show that, on average, there has been one crash every 1,000 years on this airline. It is unlikely you’ll visit this part of the world again. Would you take the flight?” All the respondents said they would.

We then changed the second sentence so it read: “Safety statistics show that, on average, one in 1,000 flights on this airline has crashed.” Only 70% of the sample said they would take the flight. In both cases, the chance of a crash is 1 in 1,000; the latter formulation simply sounds more risky.

One crash every 1,000 years is only the same as one crash in 1,000 flights if there's exactly one flight per year on average. I guess they must have stipulated that in the experiment (for which there's no citation), because otherwise it's perfectly rational to suppose the first option is safer (since generally an airline serves >1 flight per year)?
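To make that concrete, here is the arithmetic under an illustrative assumption about flight volume (the 10-flights-per-day figure is mine, not from the paper):

```python
flights_per_year = 10 * 365  # illustrative: a small airline flying 10 flights/day

# Framing 1: "one crash every 1,000 years" on that schedule.
per_flight_risk_1 = 1 / (1000 * flights_per_year)  # 1 in 3,650,000 flights

# Framing 2: "one in 1,000 flights has crashed."
per_flight_risk_2 = 1 / 1000

print(per_flight_risk_2 / per_flight_risk_1)  # 3650: framing 2 is ~3,650x riskier per flight
```

The two framings only coincide if the airline averages exactly one flight per year, which is why preferring the first one isn't obviously irrational.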

Comment by Erich_Grunewald on The Prospect of an AI Winter · 2023-03-28T21:41:35.881Z · LW · GW

If there's a death penalty at play, I'd say yeah (though ofc traditionally "safety-critical" is used to refer to engineering systems only). But if it's a traffic ticket at play, I'd say no.

I'm going by something like the Wikipedia definition, where a safety-critical system is "a system whose failure or malfunction may result in one (or more) of the following outcomes: (a) death or serious injury to people, (b) loss or severe damage to equipment/property, and/or (c) [severe] environmental harm".

Comment by Erich_Grunewald on The Prospect of an AI Winter · 2023-03-28T18:09:01.045Z · LW · GW

Agree, I think the safety-critical vs not-safety-critical distinction is better for sorting out what semi-reliable AI systems will/won't be useful for.

Comment by Erich_Grunewald on The Prospect of an AI Winter · 2023-03-28T18:06:52.926Z · LW · GW

Footnote 4 seems relevant here!

It's interesting to note that the term AI winter was inspired by the notion of a nuclear winter. AI researchers in the 1980s used it to describe a calamity that would befall themselves, namely a lack of funding, and, true, both concepts involve stagnation and decline. But a nuclear winter happens after nuclear weapons are used.

Comment by Erich_Grunewald on The Prospect of an AI Winter · 2023-03-28T18:02:53.223Z · LW · GW

Thanks! Those papers are new to me; I'll have a look.

I'm willing to make bets against AI Winter before TAI, if anyone has a specific bet to propose...

I just want to call attention to the fact that my operationalisation ("a drawdown in annual global AI investment of ≥50%") is pretty inclusive (maybe too much so). I can imagine some scenarios where this happens and then we get TAI within 5 years after that anyway, or where this happens but it doesn't really look like a winter.

(Partly I did this to be more "charitable" to Eden -- to say, "AI winter seems pretty unlikely even on these pretty conservative assumptions", but I should probably have flagged the fact that "≥50% drawdown" is more inclusive than "winter" more clearly.)
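For what it's worth, the "drawdown" in that operationalisation can be read as a peak-to-trough decline in the annual investment series; a minimal sketch with placeholder numbers (not real investment data):

```python
def max_drawdown(annual_investment):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak, worst = annual_investment[0], 0.0
    for x in annual_investment[1:]:
        peak = max(peak, x)
        worst = max(worst, (peak - x) / peak)
    return worst

# Placeholder annual figures ($bn) -- not actual AI investment data.
series = [50, 80, 120, 90, 55]
print(max_drawdown(series) >= 0.5)  # True: 120 -> 55 is a ~54% drawdown
```

On this reading, the criterion triggers whenever some later year's investment falls to half or less of an earlier peak, which is part of why I call it inclusive.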

Comment by Erich_Grunewald on The Prospect of an AI Winter · 2023-03-27T21:25:50.963Z · LW · GW

I take it you're referring to

Investors mostly aren't betting on TAI -- as I understand it, they generally want a return on their investment in <10 years, and had they expected AGI in the next 10-20 years they would have been pouring far more than some measly hundreds of millions into AI companies today.

I was not referencing total investment, but the size of a typical investment like the one mentioned further down ("For example, Character.AI recently raised >$200M at a $1B valuation for a service that doesn't really seem to add much value on top of the standard ChatGPT API, especially now that OpenAI has added the system prompt feature.") ... But I realize that this was really unclear and will edit to clarify.

Also I think the gpt-4 capabilities are a black swan and were not priced in.

Interesting, personally I would've guessed ChatGPT had a larger effect than GPT-4 here, and that that's largely down to it showing the viability of LLMs as products, which seems to point in favor of "investors mostly don't bet on TAI". But I could be wrong here, of course.

Comment by Erich_Grunewald on Open & Welcome Thread — March 2023 · 2023-03-26T12:27:41.005Z · LW · GW

I understand that modern LLMs are generally trained only for a single epoch, or at most a few.

  1. Is this true?
  2. Why is this? Is it due to the cost of compute? Or is there just so much data available that you can always just expand the data set rather than using the same observations twice? Or for some other reason?

Comment by Erich_Grunewald on Abstracts should be either Actually Short™, or broken into paragraphs · 2023-03-24T17:47:02.041Z · LW · GW

See also Using the “executive summary” style: writing that respects your reader’s time and Reasoning transparency.

Comment by Erich_Grunewald on More information about the dangerous capability evaluations we did with GPT-4 and Claude. · 2023-03-19T14:28:02.243Z · LW · GW

You may be referring to the BIG-bench canary string?

Comment by Erich_Grunewald on Against LLM Reductionism · 2023-03-15T10:14:37.385Z · LW · GW

Thanks.

It's really weird to me, but it seems ChatGPT with GPT-4 fails the original test while Bing/Sydney passes it. Is the RLHF messing something up here?

ChatGPT with GPT-4:

ME: What is the 'it' in each of these two sentences? 1. The cat fed the kitten because it was hungry. 2. The cat snarled at the kitten because it was angry.

CHATGPT: In each of these sentences, 'it' refers to a different subject:

  1. In "The cat fed the kitten because it was hungry," the pronoun 'it' refers to "the cat." This is because the cat is the one performing the action (feeding the kitten) as a result of being hungry.
  2. In "The cat snarled at the kitten because it was angry," the pronoun 'it' refers to "the cat" again. In this case, the cat is the one displaying the emotion (anger) that led to the action (snarling at the kitten).

Bing/Sydney with GPT-4:

ME: Don't make any searches. What is the 'it' in each of these two sentences? 1. The cat fed the kitten because it was hungry. 2. The cat snarled at the kitten because it was angry.

BING: This is a tricky question because the pronoun ‘it’ can refer to either the cat or the kitten depending on the context and the intended meaning. However, based on my internal knowledge of grammar and logic, I would say that:

The cat fed the kitten because it was hungry. In this sentence, ‘it’ most likely refers to the kitten, since it makes sense that a mother cat would feed her offspring if they are hungry. Alternatively, ‘it’ could also refer to the cat, if the cat was hungry and decided to share its food with the kitten. However, this interpretation is less plausible since cats are usually territorial and selfish about their food.

The cat snarled at the kitten because it was angry. In this sentence, ‘it’ most likely refers to the cat, since it makes sense that a cat would snarl at another animal if it was angry or annoyed by them. Alternatively, ‘it’ could also refer to the kitten, if the kitten was angry and provoked the cat by biting or scratching it. However, this interpretation is less plausible since kittens are usually playful and submissive to adult cats.

So in summary, I would say that ‘it’ refers to the kitten in sentence 1 and the cat in sentence 2.

Comment by Erich_Grunewald on GPT-4 · 2023-03-14T23:52:03.543Z · LW · GW

That makes a lot of sense, but it doesn't explain why calibration post-RLHF is much better for the 10-40% buckets than for the 60-90% buckets.

Comment by Erich_Grunewald on GPT-4 · 2023-03-14T18:54:16.798Z · LW · GW

Ah yeah, that does seem needlessly ambiguous.

Comment by Erich_Grunewald on GPT-4 · 2023-03-14T18:38:10.620Z · LW · GW

Yeah, but it's not clear to me that they needed 8 months of safety research. If they released it after 12 months, they could've still written that they'd been "evaluating, adversarially testing, and iteratively improving" it for 12 months. So it's still not clear to me how much they delayed because they had to, versus how much (if at all) they delayed due to the forecasters and/or acceleration considerations.

But this itself is surprising: GPT-4 was "finished training" in August 2022, before ChatGPT was even released! I am unsure of what "finished training" means here - is the released model weight-for-weight identical to the 2022 version? Did they do RLHF since then?

I think "finished training" is the next-token prediction pre-training, and what they did since August is the fine-tuning and the RLHF + other stuff.

Comment by Erich_Grunewald on GPT-4 · 2023-03-14T18:15:20.753Z · LW · GW

"Delaying deployment of GPT-4 by a further six months" which they didn't pursue because ????

IMO it's not clear from the text whether or how long they delayed the release on account of the forecasters' recommendations.

Comment by Erich_Grunewald on Against LLM Reductionism · 2023-03-09T21:19:35.153Z · LW · GW

Yes good call -- I read that at the time and found it excellent. In fact, now that I think about it, it may have been where I got the term "reductionism". Sorry for not citing you!

Comment by Erich_Grunewald on Against LLM Reductionism · 2023-03-09T21:18:31.631Z · LW · GW

Thanks, fixed!

Comment by Erich_Grunewald on Against LLM Reductionism · 2023-03-09T08:47:55.467Z · LW · GW

Oops, right you are -- fixed.

Comment by Erich_Grunewald on The Waluigi Effect (mega-post) · 2023-03-06T19:52:33.284Z · LW · GW

Eliciting Latent Knowledge

Comment by Erich_Grunewald on A Mechanistic Interpretability Analysis of Grokking · 2023-02-26T14:59:12.583Z · LW · GW

Thanks for doing this research, it's super interesting. I have a couple of questions about how this relates (or doesn't relate) to LLMs:

  1. You mention that LLMs probably don't do anything like this since, empirically, repeated data harms test performance, so it is avoided. I'm wondering if we're looking at the same thing, though. If you train an LLM with a subset of data repeated N times for many epochs, wouldn't you expect it to eventually, thanks to regularization, learn some general algorithm governing the repetition; i.e. it'd grok, but it would grok the repetition, not whatever the text in the data set is about?
  2. Similarly, what would happen if you trained a modular addition model similar to the one you study (but with a far larger modulus) on lots of never-duplicated data? My impression is that it wouldn't exactly grok, since memorization probably isn't very useful, but that it would learn general patterns that help it make better predictions, and eventually the general, independently working algorithm. Does that seem right? If so, do you think it's likely that something like this goes on in LLMs?
  3. I'm not sure at which level you normally de-duplicate training data for LLMs, but I assume it's at a fairly high level (e.g. paragraph or document). If so, the training data will presumably still contain lots of recurring lower-level phrases. Perhaps some of these, for some specific task (addition, say), provide examples that are repeated many times in the dataset but don't cover all possible inputs (similar to your experiment training for many epochs on 30% of possible inputs, only here it could still be a single epoch). If so, couldn't the same sort of grokking happen in an LLM? E.g. first the LLM learns to memorize answers to addition of single- and double-digit numbers, but eventually it may, due to regularization, grok addition. (A rough sketch of the kind of setup I have in mind is below.)
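Concretely, something like this, with made-up sizes (the modulus and split fraction are placeholders, not numbers from the post):

```python
import random

# Modular addition data: inputs (a, b), label (a + b) % p, with only a
# fraction of all pairs seen in training (cf. training on 30% of inputs).
p = 113               # modulus; "far larger" in (2) would mean much bigger
train_fraction = 0.3

pairs = [(a, b) for a in range(p) for b in range(p)]
random.seed(0)
random.shuffle(pairs)

split = int(train_fraction * len(pairs))
train = [((a, b), (a + b) % p) for a, b in pairs[:split]]
test = [((a, b), (a + b) % p) for a, b in pairs[split:]]

# A model that merely memorizes `train` does no better than chance on `test`;
# grokking is when, after many more epochs, it generalizes to `test` anyway.
print(len(train), len(test))  # 3830 8939
```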

Comment by Erich_Grunewald on Self-Reference Breaks the Orthogonality Thesis · 2023-02-20T09:30:40.921Z · LW · GW

See for example Bostrom's original paper (pdf):

The Orthogonality Thesis
Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.

It makes no claim about how likely intelligence and final goals are to diverge; it only claims that it's in principle possible to combine any intelligence with any set of goals. Later on in the paper he discusses ways of actually predicting the behavior of a superintelligence, but that's beyond the scope of the Thesis.