Posts

Research Jan/Feb 2024 2024-01-01T06:02:47.785Z
To what extent is the UK Government's recent AI Safety push entirely due to Rishi Sunak? 2023-10-27T03:29:28.465Z
What are the best published papers from outside the alignment community that are relevant to Agent Foundations? 2023-08-05T03:02:33.003Z
Ateliers: But what is an Atelier? 2023-07-01T05:57:19.510Z
Ateliers: Motivation 2023-06-27T13:07:06.129Z
Scaffolded LLMs: Less Obvious Concerns 2023-06-16T10:39:58.835Z
What do beneficial TDT trades for humanity concretely look like? 2023-06-10T06:50:21.817Z
Requisite Variety 2023-04-21T08:07:28.751Z
Ng and LeCun on the 6-Month Pause (Transcript) 2023-04-09T06:14:18.757Z
No Summer Harvest: Why AI Development Won't Pause 2023-04-06T03:53:34.469Z
100 Dinners And A Workshop: Information Preservation And Goals 2023-03-28T03:13:06.362Z
Alignment Targets and The Natural Abstraction Hypothesis 2023-03-08T11:45:28.579Z
Stephen Fowler's Shortform 2023-01-27T07:13:01.418Z
Swap and Scale 2022-09-09T22:41:49.682Z
Searching for Modularity in Large Language Models 2022-09-08T02:25:31.711Z
What Makes an Idea Understandable? On Architecturally and Culturally Natural Ideas. 2022-08-16T02:09:39.635Z
How Do We Align an AGI Without Getting Socially Engineered? (Hint: Box It) 2022-08-10T18:14:08.786Z
Broad Basins and Data Compression 2022-08-08T20:33:16.846Z
Race Along Rashomon Ridge 2022-07-07T03:20:59.701Z
Identification of Natural Modularity 2022-06-25T15:05:17.793Z

Comments

Comment by Stephen Fowler (LosPolloFowler) on Ruby's Quick Takes · 2024-08-31T04:26:44.498Z · LW · GW

I'd like access to it. 

Comment by Stephen Fowler (LosPolloFowler) on In defense of technological unemployment as the main AI concern · 2024-08-28T03:37:19.526Z · LW · GW

I agree that the negative outcomes from technological unemployment do not get enough attention, but my model of how the world will implement Transformative AI is quite different to yours.

Our current society doesn't say "humans should thrive", it says "professional humans should thrive"

Let us define workers to be the set of humans whose primary source of wealth comes from selling their labour. This is a very broad group that includes people colloquially called working class (manual labourers, baristas, office workers, teachers, etc.) but also includes many people who are well remunerated for their time, such as surgeons, senior developers or lawyers.

Ceteris paribus, there is a trend that those who can perform more valuable, difficult and specialised work can sell their labour at a higher value. Among workers, those who earn more are usually "professionals". I believe this is essentially the same point you were making.

However, this is not a complete description of who society allows to "thrive". It neglects a small group of people with very high wealth. This is the group of people who have moved beyond needing to sell their labour and instead are rewarded for owning capital. It is this group who society says should thrive and one of the strongest predictors of whether you will be a member is the amount of wealth your parents give you.

The issue is that this small group owns a disproportionately large share of frontier AI companies.

Assuming we develop techniques to reliably align AGIs to arbitrary goals, there is little reason to expect private entities to intentionally give up power (doing so would be acting contrary to the interests of their shareholders).

Workers unable to compete with artificial agents will find themselves relying on the charity and goodwill of a small group of elites. (And of course, as technology progresses, this group will eventually include all workers.)

Those lucky enough to own substantial equity in AI companies will thrive as the majority of wealth generated by AI workers flows to them.

In itself, this scenario isn't an existential threat. But I suspect many humans would consider their descendants being trapped in serfdom to be a very bad outcome.

I worry a focus on preventing the complete extinction of the human race means that we are moving towards AI Safety solutions which lead to rather bleak futures in the majority of timelines.[1]
 

  1. ^

    My personal utility function considers permanent techno-feudalism, forever removing the agency of the majority of humans, to be only slightly better than everyone dying.

    I suspect that some fraction of humans currently alive also consider a permanent loss of freedom to be only marginally better (or even worse) than death.

Comment by Stephen Fowler (LosPolloFowler) on Adverse Selection by Life-Saving Charities · 2024-08-18T13:56:20.330Z · LW · GW

Assuming I blend in and speak the local language, within an order of magnitude of 5 million (edit: USD)

I don't feel your response meaningfully engaged with either of my objections.

Comment by Stephen Fowler (LosPolloFowler) on Adverse Selection by Life-Saving Charities · 2024-08-18T09:48:25.801Z · LW · GW

I strongly downvoted this post.

1. The optics of actually implementing this idea would be awful. It would permanently damage EA's public image and be raised as a cudgel in every single exposé written about the movement. To the average person, concluding that years in the life of the poorest are worth less than those of someone in a rich, first-world country is an abhorrent statement, regardless of how well crafted your argument is.

2.1 It would also be extremely difficult for rich foreigners to objectively assess the value of QALYs in the most globally impoverished nations, regardless of good intentions and attempts to overcome biases.

2.2 There is a fair amount of arbitrariness to the metrics chosen to value someone's life. You've mentioned women's rights, but we could alternatively look at the suicide rate as a lower bound on the number of women in a society who believe more years of their life have negative value. By choosing this reasonable-sounding metric, we can conclude that a year of a woman's life in South Korea is much worse than a year of a woman's life in Afghanistan. How confident are you that you'll be able to find metrics which accurately reflect the value of a year of someone's life?

The error in reasoning comes from making a utilitarian calculation without giving enough weight to the potential for flaws within the reasoning machine itself. 

Comment by Stephen Fowler (LosPolloFowler) on Ten arguments that AI is an existential risk · 2024-08-17T07:14:27.670Z · LW · GW

what does it mean to keep a corporation "in check"
I'm referring to effective corporate governance. Monitoring, anticipating and influencing decisions made by the corporation via a system of incentives and penalties, with the goal of ensuring actions taken by the corporation are not harmful to broader society.

do you think those mechanisms will not be available for AIs
Hopefully, but there are reasons to think that the governance of a corporation controlled (partially or wholly) by AGIs or controlling one or more AGIs directly may be very difficult. I will now suggest one reason this is the case, but it isn't the only one.

Recently we've seen that national governments struggle with effectively taxing multinational corporations. This is partially because the amount of money at stake is so great that multinational corporations are incentivized to invest large amounts of money into hiring teams of accountants to reduce their tax burden, or to pay money directly to politicians in the form of donations to manipulate the legal environment. It becomes harder to govern an entity as that entity invests more resources into finding flaws in your governance strategy.

Once you have the capability to harness general intelligence, you can invest a vast amount of intellectual "resources" into finding loopholes in governance strategies. So while many of the same mechanisms will be available for AIs, there's reason to think they might not be as effective.

Comment by Stephen Fowler (LosPolloFowler) on Ten arguments that AI is an existential risk · 2024-08-16T07:23:04.125Z · LW · GW

I'm not confident that I could give a meaningful number. I lack expertise in corporate governance, bio-safety and climate forecasting. Additionally, for the condition that corporations are left "unchecked" to be satisfied, there would need to be a dramatic Western political shift, which makes speculating extremely difficult.

I will outline my intuition for why (very large, global) human corporations could pose an existential risk (conditional on the existential risk from AI being negligible and global governance being effectively absent).


1.1 In the last hundred years, we've seen that (some) large corporations are willing to cause harm on a massive scale if it is profitable to do so, either intentionally or through neglect. Note that these decisions are mostly "rational" if your only concern is money.

Copying some of the examples I gave in No Summer Harvest:

1.2 Some corporations have also demonstrated they're willing to cut corners and take risks at the expense of human lives.

2. Without corporate governance, immoral decision making and risk-taking behaviour could be expected to increase. If the net benefit of taking an action improves because there are fewer repercussions when things go wrong, such actions should reasonably be expected to increase in frequency.

3. In recent decades there has been a trend (at least in the US) towards greater stock market concentration. For large corporations to pose an existential risk, this trend would need to continue until individual decisions made by a small group of corporations can affect the entire world.

I am not able to describe the exact mechanism by which unchecked corporations would pose an existential risk, similar to how the exact mechanism for an AI takeover is still speculation.

You would have a small group of organisations responsible for deciding the production activities of large swathes of the globe. Possible mechanisms include:

  • Irreparable environmental damage.
  • A widespread public health crisis due to non-obvious negative externalities of production.
  • Premature widespread deployment of biotechnology with unintended harms.

I think if you're already sold on the idea that "corporations are risking global extinction through the development of AI" it isn't a giant leap to recognise that corporations could potentially threaten the world via other mechanisms. 

Comment by Stephen Fowler (LosPolloFowler) on Ten arguments that AI is an existential risk · 2024-08-15T14:34:13.810Z · LW · GW

"This argument also appears to apply to human groups such as corporations, so we need an explanation of why those are not an existential risk"

I don't think this is necessary. It seems pretty obvious that (some) corporations could pose an existential risk if left unchecked.

Edit: And depending on your political leanings and concern over the climate, you might agree that they already are posing an existential risk.

Comment by Stephen Fowler (LosPolloFowler) on TurnTrout's shortform feed · 2024-08-15T13:51:17.677Z · LW · GW

I might be misunderstanding something crucial or am not expressing myself clearly.

I understand TurnTrout's original post to be an argument for a set of conditions which, if satisfied, prove the AI is (probably) safe. There are no restrictions on the capabilities of the system given in the argument.

You do constructively show "that it's possible to make an AI which very probably does not cause x-risk" using a system that cannot do anything coherent when deployed.

But TurnTrout's post is not merely arguing that it is "possible" to build a safe AI.

Your conclusion is trivially true and there are simpler examples of "safe" systems if you don't require them to do anything useful or coherent. For example, a fried, unpowered GPU is guaranteed to be "safe" but that isn't telling me anything useful.

Comment by Stephen Fowler (LosPolloFowler) on TurnTrout's shortform feed · 2024-08-15T11:29:07.782Z · LW · GW

I can see that the condition you've given, that a "curriculum be sampled uniformly at random" with no mutual information with the real world, is sufficient for a curriculum to satisfy Premise 1 of TurnTrout's argument.

But it isn't immediately obvious to me that it is a sufficient and necessary condition (and therefore equivalent to Premise 1). 
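
For concreteness, here is the standard definition the "no mutual information with the real world" condition invokes (with C the curriculum and W the state of the real world as placeholder variables; this is just the textbook identity, not a claim about what Premise 1 requires):

```latex
I(C; W) \;=\; \sum_{c,\,w} p(c, w)\,\log\frac{p(c, w)}{p(c)\,p(w)} \;=\; 0
\quad\Longleftrightarrow\quad
C \text{ and } W \text{ are independent.}
```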

Comment by Stephen Fowler (LosPolloFowler) on TurnTrout's shortform feed · 2024-08-15T10:09:18.081Z · LW · GW

Right, but that isn't a good safety case because such an AI hasn't learnt about the world and isn't capable of doing anything useful. I don't see why anyone would dedicate resources to training such a machine.

I didn't understand TurnTrout's original argument to be limited to only "trivially safe" (i.e. non-functional) AI systems.

Comment by Stephen Fowler (LosPolloFowler) on TurnTrout's shortform feed · 2024-08-15T03:20:51.364Z · LW · GW

Does this not mean the AI has also learnt no methods that provide any economic benefit either?

Comment by Stephen Fowler (LosPolloFowler) on TurnTrout's shortform feed · 2024-08-15T03:19:44.404Z · LW · GW

Is there a difficulty in moving from statements about the variance in logits to statements about x-risk?

One is a statement about the output of a computation after a single timestep, the other is a statement about the cumulative impact of the policy over multiple time-steps in a dynamic environment that reacts in a complex way to the actions taken.

My intuition is that for any bound on the variance in the logits, you could always construct a suitably pathological environment that amplifies these cumulative deviations into a catastrophe.
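
To make that intuition slightly more concrete (an illustrative sketch of my own, not something from the original argument): if each step contributes a deviation of at most $\varepsilon$ but the environment amplifies existing trajectory differences by a factor $\lambda > 1$ per step, then the recursion $\Delta_{t+1} \le \lambda \Delta_t + \varepsilon$ gives

```latex
\Delta_T \;\le\; \varepsilon \sum_{t=0}^{T-1} \lambda^{t} \;=\; \varepsilon\,\frac{\lambda^{T} - 1}{\lambda - 1},
```

which is unbounded in $T$ for any fixed $\varepsilon$, so a per-step bound alone doesn't bound the long-horizon outcome.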

(There is at least a 30% chance I haven't grasped your idea correctly)

Comment by Stephen Fowler (LosPolloFowler) on Self-Other Overlap: A Neglected Approach to AI Alignment · 2024-08-02T11:11:44.809Z · LW · GW

My understanding is that model organisms can demonstrate the existence of an alignment failure mode. But that's very different from an experiment on small systems informing you about effective mitigation strategies of that failure mode in larger systems. 

Comment by Stephen Fowler (LosPolloFowler) on Self-Other Overlap: A Neglected Approach to AI Alignment · 2024-08-01T08:14:15.651Z · LW · GW

This seems useful and I'm glad people are studying it.

I'd be very interested in experiments that demonstrate that this technique can mitigate deception in more complex experimental environments (cicero?) without otherwise degrading performance. 

I have a very nitpicky criticism, but I think there might be a bit of a map/territory confusion emerging here. The introduction claims "non-deceptive agents consistently have higher mean self-other overlap than the deceptive agents". The actual experiment is about a policy which exhibits seemingly deceptive behaviour but the causal mechanism behind this deception is not necessarily anything like the causal mechanism behind deception in self-aware general intelligences.

Comment by Stephen Fowler (LosPolloFowler) on Alexander Gietelink Oldenziel's Shortform · 2024-07-06T04:37:12.298Z · LW · GW

I have only skimmed the paper.

Is my intuition correct that in the MB formalism, past events that are causally linked to the current state are not themselves included in the Markov Blanket, but the node corresponding to the memory state is included in the MB?

That is, the influence of the past event is mediated by a node corresponding to having memory of that past event?

Comment by Stephen Fowler (LosPolloFowler) on Elizabeth's Shortform · 2024-07-06T04:16:37.730Z · LW · GW

I agree with your overall point re: 80k hours, but I think my model of how this works differs somewhat from yours. 

"But you can't leverage that into getting the machine to do something different- that would immediately zero out your status/cooperation score."

The machines are groups of humans, so the degree to which you can change the overall behaviour depends on a few things. 

1) The type of status (which as you hint, is not always fungible).
If you're widely considered to be someone who is great at predicting future trends and risks, other humans in the organisation will be more willing to follow when you suggest a new course of action. If you've acquired status by being very good at one particular niche task, people won't necessarily value your bold suggestion for changing the organisation's direction.

2) Strategic congruence. 
Some companies in history have successfully pivoted their business model (the example that comes to mind is Nokia). This transition is possible because while the machine is operating in a new way, the end goal of the machine remains the same (make money). If your suggested course of action conflicts with the overall goals of the machine, you will have more trouble changing the machine. 

3) Structure of the machine. 
Some decision making structures give specific individuals a high degree of autonomy over the direction of the machine. In those instances, having a lot of status among a small group may be enough for you to exercise a high degree of control (or get yourself placed in a decision making role).


Of course, each of these variables all interact with each other in complex ways. 

Sam Altman's high personal status as an excellent leader and decision maker, combined with his strategic alignment to making lots of money, meant that he was able to out-manoeuvre a more safety focused board when he came into apparent conflict with the machine. 
 

Comment by Stephen Fowler (LosPolloFowler) on 80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly) · 2024-07-06T03:14:17.211Z · LW · GW

I dislike when conversations that are really about one topic get muddied by discussion of an analogy. For the sake of clarity, I'll italicise statements about the AI safety jobs at capabilities companies.

Interesting perspective. At least one other person also had a problem with that statement, so it is probably worth me expanding. 

Assume, for the sake of the argument, that the Environmental Manager's job is to assist with clean-ups after disasters, monitoring for excessive emissions and preventing environmental damage. In a vacuum these are all wonderful, somewhat-EA aligned tasks. 
Similarly the safety focused role, in a vacuum, is mitigating concrete harms from prosaic systems and, in the future, may be directly mitigating existential risk. 

However, when we zoom out and look at these jobs in the context of the larger organisation's goals, things are less clear. The good you do helps fuel a machine whose overall goals are harmful.

The good that you do is profitable for the company that hires you. This isn't always a bad thing, but by allowing BP to operate in a more environmentally friendly manner you improve BP's public relations and help to soften or reduce regulation BP faces. 
Making contemporary AI systems safer, reducing harm in the short term, potentially reduces the regulatory hurdles that these companies face. It is harder to push restrictive legislation governing the operation of AI capabilities companies if they have good PR.

More explicitly, the short-term environmental management that you do may hide more long-term, disastrous damage. Programs to protect workers and locals from toxic chemical exposure around an exploration site help keep the overall business viable. While the techniques you develop shield the local environment from direct harm, you are not shielding the globe from the harmful impact of pollution.
Alignment and safety research at capabilities companies focuses on today's models, which are not generally intelligent. You are forced to assume that the techniques you develop will extend to systems that are generally intelligent, deployed in the real world and capable of being an existential threat. 
Meanwhile the techniques used to align contemporary systems absolutely improve their economic viability and indirectly mean more money is funnelled towards AGI research. 
 

Comment by Stephen Fowler (LosPolloFowler) on 3C's: A Recipe For Mathing Concepts · 2024-07-05T04:49:15.874Z · LW · GW

"everyday common terms such as tool, function/purpose, agent, perception"

I suspect getting the "true name" of these terms would get you a third of the way to resolving AI safety.

Comment by Stephen Fowler (LosPolloFowler) on 80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly) · 2024-07-04T01:20:59.692Z · LW · GW

Firstly, some form of visible disclaimer may be appropriate if you want to continue listing these jobs. 

While the jobs board may not be "conceptualized" as endorsing organisations, I think some users will see jobs from OpenAI listed on the job board as at least a partial, implicit endorsement of OpenAI's mission.

Secondly, I don't think roles being directly related to safety or security should be a sufficient condition to list roles from an organisation, even if the roles are opportunities to do good work. 

I think this is easier to see if we move away from the AI Safety space. Would it be appropriate for the 80,000 Hours job board to advertise an Environmental Manager job from British Petroleum?

Comment by Stephen Fowler (LosPolloFowler) on Leon Lang's Shortform · 2024-07-02T06:59:50.892Z · LW · GW

Just started using this, great recommendation. I like the night mode feature which changes the color of the pdf itself.

Comment by Stephen Fowler (LosPolloFowler) on Getting 50% (SoTA) on ARC-AGI with GPT-4o · 2024-07-02T03:26:08.637Z · LW · GW

I don't think this experiment updates me substantially towards thinking we are closer to AGI, because it does not show GPT-4o coming up with a strategy to solve the task and then executing it. Rather, a human (a general intelligence) has looked at the benchmark and then devised an algorithm that lets GPT-4o perform well on the task.

Further, the method does not seem flexible enough to work on a diverse range of tasks and certainly not without human involvement in adapting it.

In other words, the result is less that GPT-4o is able to achieve 50% on ARC-AGI. It is that a human familiar with the style of question used in ARC-AGI can devise a method for getting 50% on ARC-AGI that offloads some of the workload to GPT-4o.
 

Comment by Stephen Fowler (LosPolloFowler) on Sci-Fi books micro-reviews · 2024-06-25T12:16:51.309Z · LW · GW

The sequels obviously include a lot of stuff relating to aliens, but a big focus is on how human groups react to the various dangerous scenarios they now face. Much of the books is concerned with how human culture evolves given the circumstances, with numerous multi-generational time-skips.

Comment by Stephen Fowler (LosPolloFowler) on Sci-Fi books micro-reviews · 2024-06-25T00:56:52.484Z · LW · GW

Updating to say that I just finished the short story "Exhalation" by Ted Chiang and it was absolutely exceptional! 

I was immediately compelled to share it with some friends who are also into sci-fi.

Comment by Stephen Fowler (LosPolloFowler) on Sci-Fi books micro-reviews · 2024-06-24T12:21:15.497Z · LW · GW

Cool list, I'm going to start reading Ted Chiang.

Some thoughts

Permutation City

"To be blunt, Egan is not a great author, and this book is mostly his excuse to elucidate some ideas in philosophy."

You are being, if anything, too nice to Greg Egan's writing here. I think 4/10 is extremely charitable. 

But if you enjoyed the hard sci-fi elements you'll probably also enjoy "Diaspora". Even the errata for this book make for a fun read and show you the level of care Egan puts into trying to make the science realistic. 

The Three Body Problem

The two other books in the series (particularly The Dark Forest) are very interesting and have a much wider scope which gives Liu a lot of space for world-building. There's also a fair bit of commentary on societal cultural evolution which you might enjoy if you enjoyed the non-western perspective of the first book.

A fair warning about the readability of The Dark Forest. Liu's editor somehow let him keep in some crushingly boring material.

Death's End is extremely wide in scope and faster paced. But I think you might hate the more fantastical sci-fi elements. 

Comment by Stephen Fowler (LosPolloFowler) on Nathan Young's Shortform · 2024-06-23T00:10:58.187Z · LW · GW

Obvious and "shallow" suggestion. Whoever goes on needs to be "classically charismatic" to appeal to a mainstream audience. 

Potentially this means someone from policy rather than technical research. 

Comment by Stephen Fowler (LosPolloFowler) on My AI Model Delta Compared To Yudkowsky · 2024-06-22T22:08:01.436Z · LW · GW

"I assumed the idea here was that AGI has a different mind architecture and thus also has different internal concepts for reflection."

It is not just the internal architecture. An AGI will have a completely different set of actuators and sensors compared to humans.

Comment by Stephen Fowler (LosPolloFowler) on Hot take: The AI safety movement is way too sectarian and this is greatly increasing p(doom) · 2024-06-22T21:48:53.957Z · LW · GW

A suggestion to blacklist anyone who decided to give $30 million (a paltry sum of money for a startup) to OpenAI. 

 

I agree with many of the points you have made in this post, but I strongly disagree with the characterisation of $30 million as a "paltry sum".

1. My limited research indicates that $30 million was likely a significant amount of money for OpenAI at the time

I haven't been able to find internal financial reports from OpenAI in 2017, but the following quote from Wikipedia describes OpenAI's operating expenses in that year.

"In 2017, OpenAI spent $7.9 million, or a quarter of its functional expenses, on cloud computing alone"

So, while OpenAI is currently worth tens of billions, $30 million appears to have been a significant sum for them in 2017.

Again, I haven't been able to find internal financial reports (not claiming they aren't available). 

My understanding is Open Phil would have access to reports which would show that $30 million was or wasn't a significant amount of money at the time, although they're probably bound by confidentiality agreements which would forbid them from sharing. 

2. $30 million was (and still is) a substantial amount of money for AI Safety Research.

This can be seen by simply looking at the financial reports of various safety orgs. In my original shortform post I believe I compared that amount to a few years of MIRI's operating expenses. 

But you can take your pick of safety orgs and you'll see that $30 million buys you a lot. AI Safety researchers are (relatively) cheap. 

Comment by Stephen Fowler (LosPolloFowler) on Stephen Fowler's Shortform · 2024-06-22T21:36:18.869Z · LW · GW

This was a great reply. In responding to it my confidence in my arguments declined substantially. 

I'm going to make what I think is a very cruxy high level clarification and then address individual points.

High Level Clarification

My original post has clearly done a poor job at explaining why I think the mismatch between the optimisation definition given in RFLO and evolution matters. I think clarifying my position will address the bulk of your concerns. 

"I don’t see why this matters for anything, right? Did Eliezer or Nate or whoever make some point about evolution which is undermined by the observation that evolution is not a separate system? 
[...]
Yet again, I don’t see why this matters for anything."

I believe you have interpreted the high level motivation behind my post to be something along the lines of "evolution doesn't fit this definition of optimisation, and therefore this should be a reason to doubt the conclusions of Nate, Eliezer or anyone else invoking evolution."

This is a completely fair reading of my original post, but it wasn't my intended message.

I'm concerned that AI Safety research lacks a sufficiently robust framework for reasoning about the development, deployment and behaviour of AGIs. I am very interested in the broad category of "deconfusion". It is under that lens that I comment on evolution not fitting the definition in RFLO. It indicates that the optimiser framework in RFLO may not be cutting reality at the joints, and that a more careful treatment is needed.

I'm going to immediately edit my original post to make this more clear thanks to your feedback!

Detailed Responses to Individual Points

"And also, sometimes a difference between X and Y is totally irrelevant to the point that someone was making with their analogy"

I agree. I mentioned Nate's evolution analogy because I think it wasn't needed to make the point and led to confusion. I don't think the properties of evolution I've mentioned can be used to argue against the Sharp Left Turn.

"If you just want to list any property that Evolution does not share with ML optimisation, here are a bunch: (1) The first instantiation of Evolution on Earth was billions of years earlier than the first instantiation of ML optimisation. (2) Evolution was centrally involved in the fact that I have toenails, whereas ML optimisation was not. (3) ML optimisation can be GPU-accelerated using PyTorch, whereas Evolution cannot … I could go on all day! Of course, none of these differences are relevant to anything. That’s my point. I don’t think the three differences you list are relevant to anything either."

Keeping in mind the "deconfusion" lens that motivated my original post, I don't think these distinctions point to any flaws in the definition of optimisation given in RFLO, in the same way that evolution failing to satisfy the criteria of having an internally represented objective does.

"I don’t even know why the RFLO paper put that criterion in. Like, let’s compare (1) an algorithm training an LLM with the “explicit” objective function of minimizing perplexity, written in Python, (2) some super-accelerated clever hardware implementation that manages to perform the exact same weight updates but in a way that doesn’t involve the objective function ever being “explicitly represented” or calculated."

I don't have any great insight here, but that's very interesting to think about. I would guess that a "clever hardware implementation that performs the exact same weight updates" without an explicitly represented objective function ends up being wildly inefficient. This seems broadly similar to the relationship between a search algorithm and an implementation of the same algorithm that is simply a gigantic pre-computed lookup table.
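
A toy contrast of those two implementations (a hypothetical sketch in Python, not anything from the RFLO paper; the domain, query set and objective are made up purely for illustration):

```python
# Toy contrast: an explicit search against an objective vs. a giant precomputed table.
# Both return the best element of {0, ..., 255} for a query q, where "best" means
# closest to q; only the first contains an explicit representation of that objective.

DOMAIN = range(256)


def objective(x: int, q: int) -> float:
    """Explicitly represented objective: score of candidate x for query q."""
    return -((x - q) ** 2)


def optimise_by_search(q: int) -> int:
    """Optimiser in the RFLO sense: searches the domain, scoring candidates
    against the explicitly represented objective."""
    return max(DOMAIN, key=lambda x: objective(x, q))


# "Same outputs, no explicit objective": precompute the answer to every possible
# query once, then throw the objective away. Behaviourally identical on this query
# set, but nothing inside the table represents the objective, and the table's size
# scales with the number of possible queries instead of staying constant.
LOOKUP_TABLE = {q: optimise_by_search(q) for q in range(1024)}


def optimise_by_table(q: int) -> int:
    return LOOKUP_TABLE[q]


assert all(optimise_by_search(q) == optimise_by_table(q) for q in range(1024))
```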

"Separately, I wonder whether you would say that AlphaZero self-play training is an “optimizer” or not. Some of your points seem to apply to it—in particular, since it’s self-play, the opponent is different each step, and thus so is the “optimal” behavior. You could say “the objective is always checkmate”, but the behavior leading to that objective may keep changing; by the same token you could say “the objective is always inclusive genetic fitness”, but the behavior leading to that objective may keep changing. (Granted, AlphaZero self-play training in fact converges to very impressive play rather than looping around in circles, but I think that’s an interesting observation rather than an a priori theoretical requirement of the setup—given that the model obviously doesn’t have the information capacity to learn actual perfect play.)"
 

Honestly, I think this example has caused me to lose substantial confidence in my original argument. 

Clearly, the AlphaZero training process should fit under any reasonable definition of optimisation, and as you point out there is no fundamental reason a similar training process on a variant game couldn't get stuck in a loop.

The only distinction I can think of is that the definition of "checkmate" is essentially a function of board state and that function is internally represented in the system as a set of conditions. This means you can point to an internal representation and alter it by explicitly changing certain bits. 

In contrast, evolution is stuck optimising for genes which are good at (directly or indirectly) getting passed on. 

I guess the equivalent of changing the checkmate rules would be changing the environment to tweak which organisms tend to evolve. But the environment doesn't provide an explicit representation.

To conclude

I'm fairly confident the "explicit internal representation" part of the optimisation definition in RFLO needs tweaking.

I had previously been tossing around the idea that evolution was sort of its own thing that was meaningfully distinct from other things called optimisers, but the AlphaZero example has scuttled that idea.

Comment by Stephen Fowler (LosPolloFowler) on Stephen Fowler's Shortform · 2024-06-22T19:41:07.326Z · LW · GW

This is somewhat along the lines of the point I was trying to make with the Lazy River analogy.

I think the crux is that I'm arguing that because the "target" that evolution appears to be evolving towards is dependent on the state and differs as the state changes, it doesn't seem right to refer to it as "internally represented".

Comment by Stephen Fowler (LosPolloFowler) on Stephen Fowler's Shortform · 2024-06-22T05:08:04.225Z · LW · GW

There are meaningful distinctions between evolution and other processes referred to as "optimisers"

 

People should be substantially more careful about invoking evolution as an analogy for the development of AGI, as tempting as this comparison is to make.

"Risks From Learned Optimisation"  is one of the most influential AI Safety papers ever written, so I'm going to use it's framework for defining optimisation. 

"We will say that a system is an optimiser if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system" ~Hubinger et al (2019) 

It's worth noting that the authors of this paper do consider evolution to be an example of optimisation (something stated explicitly in the paper). Despite this, I'm going to argue the definition shouldn't apply to evolution.
 

2 strong (and 1 weak) Arguments That Evolution Doesn't Fit This Definition:

Weak Argument 0:
Evolution itself isn't a separate system that is optimising for something. (Micro)evolution is the change in allele frequency over generations. There is no separate entity you can point to and call "evolution".

Consider how different this is from a human engaged in optimisation to design a bottle cap. We have the system that optimises, and the system that is optimised.

It is tempting to say "the system optimises itself", but then you still have to define the system you would say is engaged in optimisation. That system isn't "evolution" but is instead something like "the environment", "all carbon-based structures on Earth" or "all matter on the surface of the Earth", etc.

Strong Argument 1:
Evolution does not have an explicitly represented objective function.

This is a major issue. When I'm training a model against a loss function, I can explicitly represent that loss function: it is possible to point to a physical implementation of it within the system doing the optimising.

There is no single explicit representation of what "fitness" is within our environment. 

Strong Argument 2:
Evolution isn't a "conservative" process. The thing that it is optimising "toward" is dependent on the current state of the environment, and changes over time. It is possible for evolution to get caught in "loops" or "cycles". 

- A refresher on conservative fields. 

In physics, a conservative vector field is a vector field that can be understood as the gradient of some other (scalar) function. By associating any point in the vector field with the corresponding value of that scalar function, you can meaningfully order every point in your field.

To be less abstract, imagine your field is "slope", which describes the gradient of a mountain range. You can meaningfully order the points in the slope field by the height of the point they correspond to on the mountain range.

In a conservative vector field, the curl is zero everywhere. Let a ball roll down the mountain range (with a very high amount of friction) and it will find its way to a local minimum and stop.

In a non-conservative vector field it is possible to create paths that loop forever. 
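
To pin down the distinction I'm relying on, here is the standard vector-calculus statement (symbols chosen purely for illustration):

```latex
% Conservative case: F is the gradient of a potential, so it has zero curl and
% the line integral around any closed loop vanishes.
\mathbf{F} = \nabla \phi
  \;\Longrightarrow\;
  \nabla \times \mathbf{F} = \mathbf{0}
  \quad\text{and}\quad
  \oint_C \mathbf{F} \cdot \mathrm{d}\mathbf{r} = 0 .

% Non-conservative example (a "Lazy River"-style circulation around the origin):
\mathbf{F}(x, y) = (-y,\ x)
  \;\Longrightarrow\;
  \oint_{x^2 + y^2 = 1} \mathbf{F} \cdot \mathrm{d}\mathbf{r} = 2\pi \neq 0 ,
```

so no potential $\phi$ exists for the second field, and there is no global "height" ordering of its points.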

My local theme-park has a ride called the "Lazy River" which is an artificial river which has been formed into a loop. There is no change in elevation, and the water is kept flowing clockwise by a series of underwater fans which continuously put energy into the system. Families hire floating platforms and can drift endlessly in a circle until their children get bored.

If you throw a ball into the Lazy River it will circle endlessly. If we write down a vector field that describes the force on the ball at any point in the river, it isn't possible to describe this field as the gradient of a potential function. There is no absolute ordering of points in this field.

- Evolution isn't conservative

For the ball rolling over the hills, we might be able to say that as time evolves it seems to be getting "lower". By appealing to the function the field is the gradient of, we can meaningfully say whether two points are higher, lower or the same height.

In the lazy river, this is no longer possible. Locally, you could describe the motion of the ball as rolling down a hill, but continuing this process around the entire loop tells you that you are describing an impossible MC Escher Waterfall.

If evolution is not conservative (and hence has no underlying goal it is optimising toward) then it would be possible to observe creatures evolving in circles, stuck in "loops". Evolving, losing then re-evolving the same structures.

This is not only possible, but it has been observed. Side-blotched lizards appear to shift throat colours in a cyclic, repeating pattern. For more details, see this talk by John Baez.

To summarise, the "direction" or "thing evolution is optimising toward" cannot be some internally represented thing, because the thing it optimises toward is a function of not just the environment but also of the things evolving in that environment. 

Who cares?

Using evolution as an example of "optimisation" is incredibly common among AI safety researchers, and can be found in Yudkowsky's writing on Evolution in The Sequences.

I think the notion of evolution as an optimiser can do more harm than good.

As a concrete example, Nate's "Sharp Left Turn" post was weakened substantially by invoking an evolution analogy, which spawned a lengthy debate (see Pope, 2023 and the response from Zvi). This issue could have been skipped entirely simply by arguing in favour of the Sharp Left Turn without any reference to evolution (see my upcoming post on this topic).

Clarification Edit:
Further, I'm concerned that AI Safety research lacks a sufficiently robust framework for reasoning about the development, deployment and behaviour of AGIs. I am very interested in the broad category of "deconfusion". It is under that lens that I comment on evolution not fitting the definition in RFLO. It indicates that the optimiser framework in RFLO may not be cutting reality at the joints, and a more careful treatment is needed.

To conclude

An intentionally provocative and attention-grabbing summary of this post might be "evolution is not an optimiser", but that is essentially just a semantic argument and isn't quite what I'm trying to say.

A better summary is "in the category of things that are referred to as optimisation, evolution has numerous properties that it does not share with ML optimisation, so be careful about invoking it as an analogy".

On similarities between RLHF-like ML and evolution:
You might notice that any form of ML that relies on human feedback also fails to have an "internal representation" of what it's optimising toward, instead getting feedback from humans assessing its performance.

Like evolution, it is also possible to set up this optimisation process so that it is also not "conservative". 

A contrived example of this (simulated in the sketch below):
Consider training a language model to complete text where the humans giving feedback exhibit a preference for text that is a function of what they'd just read. If the model outputs dense, scientific jargon, the humans prefer lighter prose. If the model outputs light prose, the humans prefer more formal writing, etc.
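
Here is that sketch (purely illustrative; the styles and the preference map are made up, and the final formal-writing-to-jargon entry is an added assumption to close the loop):

```python
# Toy model of the contrived RLHF-like setup above: the raters' preference is a
# function of the text they just read, so there is no fixed objective and a greedy
# "follow the current feedback" process cycles instead of converging.

# Hypothetical preference map: after reading a given style, raters prefer the next one.
PREFERRED_AFTER = {
    "dense_jargon": "light_prose",
    "light_prose": "formal_writing",
    "formal_writing": "dense_jargon",  # assumed, to close the loop
}


def update(current_style: str) -> str:
    """One 'training step': shift the model's output toward whatever the raters
    prefer, given the style they have just read."""
    return PREFERRED_AFTER[current_style]


style = "dense_jargon"
trajectory = [style]
for _ in range(8):
    style = update(style)
    trajectory.append(style)

# The process never settles: dense_jargon -> light_prose -> formal_writing -> dense_jargon -> ...
print(trajectory)
```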

 

(This is a draft of a post, very keen for feedback and disagreement) 

Comment by Stephen Fowler (LosPolloFowler) on Richard Ngo's Shortform · 2024-06-22T03:14:09.889Z · LW · GW

That makes sense.

I guess the followup question is "how were Anthropic able to cultivate the impression that they were safety focused if they had only made an extremely loose offhand commitment?"

Certainly the impression I had from how integrated they are in the EA community was that they had made a more serious commitment.

Comment by Stephen Fowler (LosPolloFowler) on Richard Ngo's Shortform · 2024-06-21T11:51:47.626Z · LW · GW

This post confuses me.

Am I correct that the implied implication here is that assurances from a non-rationalist are essentially worthless? 

I think it is also wrong to imply that Anthropic have violated their commitment simply because they didn't rationally think through the implications of their commitment when they made it. 

I think you can understand Anthropic's actions as purely rational, just not very ethical.

They made an unenforceable commitment to not push capabilities when it directly benefited them. Now that it is more beneficial to drop the facade, they are doing so.

I think "don't trust assurances from non-rationalists" is not a good takeaway. Rather it should be "don't trust unenforceable assurances from people who will stand to greatly benefit from violating your trust at a later date".

Comment by Stephen Fowler (LosPolloFowler) on I would have shit in that alley, too · 2024-06-19T13:28:29.851Z · LW · GW

I agree that it is certainly morally wrong to post this if that is the person's real full name.

It is less bad, but still dubious, to post someone's traumatic life story on the internet even under a pseudonym.

Comment by Stephen Fowler (LosPolloFowler) on Disproving and partially fixing a fully homomorphic encryption scheme with perfect secrecy · 2024-05-27T10:05:40.510Z · LW · GW

At the risk of missing something obvious: in any distributed quantum circuit without a measurement step, it is not possible for Kevin and Charlie to learn anything about the plaintext, per the no-cloning theorem.

Eavesdropping in the middle of the circuit should lead to measurable statistical anomalies due to projecting the state onto the measurement basis.

(I'll add a caveat that I am talking about theoretical quantum circuits and ignoring any nuances that emerge from their physical implementations.)

Edit:

On posting, I think I realize my error. 
We need Kevin and Charlie to not have knowledge of the specific gates that they are implementing as well.

 

Comment by Stephen Fowler (LosPolloFowler) on simeon_c's Shortform · 2024-05-24T00:47:39.110Z · LW · GW

Do you know if there have been any concrete implications (i.e. someone giving Daniel a substantial amount of money) from the discussion?

Comment by Stephen Fowler (LosPolloFowler) on The case for stopping AI safety research · 2024-05-24T00:42:29.950Z · LW · GW

I think this is an important discussion to have but I suspect this post might not convince people who don't already share similar beliefs.

1. I think the title is going to throw people off. 

I think what you're actually saying is "stop the current strain of research focused on improving and understanding contemporary systems, which has become synonymous with the term AI safety", but many readers might interpret this as if you're saying "stop research that is aimed at reducing existential risks from AI". It might be best to reword it as "stopping prosaic AI safety research".

In fairness, the first, narrower definition of AI Safety certainly describes a majority of work under the banner of AI Safety. It certainly seems to be where most of the funding is going and describes the work done at industrial labs. It is certainly what educational resources (like the AI Safety Fundamentals course) focus on. 
 

2. I've had a limited number of experiences informally having discussions with researchers on similar ideas (not necessarily arguing for stopping AI safety research entirely though). My experience is that people either agree immediately or do not really appreciate the significance of concerns about AI safety research largely being on the wrong track. Convincing people in the second category seems to be rather difficult.

To summarize what I'm trying to convey:
I think this is a crucial discussion to have and it would be beneficial to the community to write this up into a longer post if you have the time. 
 

Comment by Stephen Fowler (LosPolloFowler) on Stephen Fowler's Shortform · 2024-05-21T18:15:51.281Z · LW · GW

Thank you, this explains my error. I've retracted that part of my response.

Comment by Stephen Fowler (LosPolloFowler) on Stephen Fowler's Shortform · 2024-05-20T18:22:06.679Z · LW · GW

(I'm the OP)

I'm not trying to say "it's bad to give large sums of money to any group because humans have a tendency to to seek power." 

I'm saying "you should be exceptionally cautious about giving large sums of money to a group of humans with the stated goal of constructing an AGI."

You need to weight any reassurances they give you against two observations:

  1. The commonly observed pattern of individual humans or organisations seeking power (and/or wealth) at the expense of the wider community. 
  2. The strong likelihood that there will be an opportunity for organisations pushing ahead with AI research to obtain incredible wealth or power.

So, it isn't "humans seek power therefore giving any group of humans money is bad". It's "humans seek power" and, in the specific case of AI companies, there may be incredibly strong rewards for groups that behave in a self-interested way.

The general idea I'm working off is that you need to be skeptical of seemingly altruistic statements and commitments made by humans when there are exceptionally lucrative incentives to break these commitments at a later point in time (and limited ways to enforce the original commitment).

Comment by Stephen Fowler (LosPolloFowler) on Stephen Fowler's Shortform · 2024-05-20T18:20:19.844Z · LW · GW

"In particular, it emphasized the importance of distributing AI broadly;1 our current view is that this may turn out to be a promising strategy for reducing potential risks"

Yes, I'm interpreting the phrase "may turn out" to be treating the idea with more seriousness than it deserves. 

Rereading the paragraph, it seems reasonable to interpret it as politely downplaying it, in which case my statement about Open Phil taking the idea seriously is incorrect.

Comment by Stephen Fowler (LosPolloFowler) on Stephen Fowler's Shortform · 2024-05-20T06:05:47.297Z · LW · GW

This does not feel super cruxy, as the power incentive still remains.

Comment by Stephen Fowler (LosPolloFowler) on Stephen Fowler's Shortform · 2024-05-19T19:16:36.026Z · LW · GW

"This grant was obviously ex ante bad. In fact, it's so obvious that it was ex ante bad that we should strongly update against everyone involved in making it."

This is an accurate summary. 

"arguing about the impact of grants requires much more thoroughness than you're using here"

We might not agree on the level of effort required for a quick take. I do not currently have the time available to expand this into a full write up on the EA forum but am still interested in discussing this with the community. 

"you're making a provocative claim but not really spelling out why you believe the premises."

I think this is a fair criticism and something I hope I can improve on.

I feel frustrated that your initial comment (which is now the top reply) implies I either hadn't read the 1700 word grant justification that is at the core of my argument, or was intentionally misrepresenting it to make my point. This seems to be an extremely uncharitable interpretation of my initial post. (Edit: I am retracting this statement and now understand Buck's comment was meaningful context. Apologies to Buck and see commentary by Ryan Greenblat below)

Your reply has been quite meta, which makes it difficult to convince you on specific points.

Your argument on betting markets has updated me slightly towards your position, but I am not particularly convinced. My understanding is that Open Phil and OpenAI had a close relationship, and hence Open Phil had substantially more information to work with than the average manifold punter. 
 

Comment by Stephen Fowler (LosPolloFowler) on Stephen Fowler's Shortform · 2024-05-19T12:12:21.248Z · LW · GW

So the case for the grant wasn't "we think it's good to make OAI go faster/better".

I agree. My intended meaning is not that the grant is bad because its purpose was to accelerate capabilities. I apologize that the original post was ambiguous.

Rather, the grant was bad for numerous reasons, including but not limited to:

  • It appears to have had an underwhelming governance impact (as demonstrated by the board being unable to remove Sam). 
  • It enabled OpenAI to "safety-wash" their product (although how important this has been is unclear to me.)
  • From what I've seen at conferences and job boards, it seems reasonable to assert that the relationship between Open Phil and OpenAI has led people to work at OpenAI.
  • Less important, but the grant justification appears to take seriously the idea that making AGI open source is compatible with safety. I might be missing some key insight, but it seems trivially obvious why this is a terrible idea even if you're only concerned with human misuse and not misalignment.
  • Finally, it's giving money directly to an organisation with the stated goal of producing an AGI. There is substantial negative EV if the grant sped up timelines.

This last claim seems very important. I have not been able to find data that would let me confidently estimate OpenAI's value at the time the grant was given. However, Wikipedia mentions that "In 2017 OpenAI spent $7.9 million, or a quarter of its functional expenses, on cloud computing alone." This certainly makes it seem that the grant provided OpenAI with a significant amount of capital, enough to have increased its research output.

Keep in mind, the grant needs to have generated $30 million in EV just to break even. I'm now going to suggest some other uses for the money, but these are just rough estimates and I haven't adjusted for inflation. I'm not claiming these are the best uses of $30 million.

The money could have funded an organisation the size of MIRI for roughly a decade (basing my estimate on MIRI's 2017 fundraiser, using 2020 numbers gives an estimate of ~4 years). 

Imagine the shift in public awareness if there had been an AI safety Superbowl ad for 3-5 years. 

Or it could have saved the lives of ~1300 children

This analysis is obviously much worse if in fact the grant was negative EV.

Comment by Stephen Fowler (LosPolloFowler) on Stephen Fowler's Shortform · 2024-05-19T07:28:11.869Z · LW · GW

That's a good point. You have pushed me towards thinking that this is an unreasonable statement and "predicted this problem at the time" is better.

Comment by Stephen Fowler (LosPolloFowler) on D0TheMath's Shortform · 2024-05-18T09:49:22.478Z · LW · GW
Comment by Stephen Fowler (LosPolloFowler) on Stephen Fowler's Shortform · 2024-05-18T09:47:24.188Z · LW · GW

On the OpenPhil / OpenAI Partnership

Epistemic Note: 
The implications of this argument being true are quite substantial, and I do not have any knowledge of the internal workings of Open Phil. 

(Both title and this note have been edited, cheers to Ben Pace for very constructive feedback.)

Premise 1: 
It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.

Premise 2:
This was the default outcome. 

Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule. 

Edit: To clarify, you need to be skeptical of seemingly altruistic statements and commitments made by humans when there are exceptionally lucrative incentives to break these commitments at a later point in time (and limited ways to enforce the original commitment).

Premise 3:
Without repercussions for terrible decisions, decision makers have no skin in the game

Conclusion:
Anyone and everyone involved with Open Phil recommending a grant of $30 million be given to OpenAI in 2017 shouldn't be allowed anywhere near AI Safety decision making in the future.

To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties. 

This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved. 

To quote OpenPhil:
"OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela."

 

Comment by Stephen Fowler (LosPolloFowler) on Increasing IQ by 10 Points is Possible · 2024-03-20T12:14:53.243Z · LW · GW

This is your second post and you're still being vague about the method. I'm updating strongly towards this being a hoax and I'm surprised people are taking you seriously.

Edit: I'll offer you a 50 USD even money bet that your method won't replicate when tested by a 3rd party with more subjects and a proper control group.

Comment by Stephen Fowler (LosPolloFowler) on Stephen Fowler's Shortform · 2024-02-19T06:22:23.249Z · LW · GW

You are given a string s corresponding to the Instructions for the construction of an AGI which has been correctly aligned with the goal of converting as much of the universe into diamonds as possible. 

What is the conditional Kolmogorov complexity of the string s' which produces an AGI aligned with "human values" or any other suitable alignment target?

To convert an abstract string to a physical object, the "Instructions" are read by a Finite State Automaton, with the state of the FSA at each step dictating the behavior of a robotic arm (with appropriate mobility and precision) with access to a large collection of physical materials.
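
For reference, the standard definition I have in mind, relative to a fixed universal machine $U$ (the choice of $U$ only shifts the value by an additive constant):

```latex
K(s' \mid s) \;=\; \min \{\, |p| \;:\; U(p, s) = s' \,\},
```

i.e. the length of the shortest program that outputs the aligned-target instructions $s'$ when given the diamond-maximiser instructions $s$ as auxiliary input.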

Comment by Stephen Fowler (LosPolloFowler) on Is a random box of gas predictable after 20 seconds? · 2024-02-10T05:43:59.496Z · LW · GW

Tangential. 

Is part of the motivation behind this question to think about the level of control that a super-intelligence could have over a complex system if it was only able to influence a small part of that system?

Comment by Stephen Fowler (LosPolloFowler) on Stephen Fowler's Shortform · 2024-01-11T10:37:02.499Z · LW · GW

I was not precise enough in my language and agree with you highlighting that what "alignment" means for LLMs is a bit vague. While people felt Sydney Bing was cool, if it had not been possible to rein it in, it would have been very difficult for Microsoft to gain any market share. An LLM that doesn't do what it's asked or regularly expresses toxic opinions is ultimately bad for business.

In the above paragraph, understand "aligned" in the concrete sense of "behaves in a way that is aligned with its parent company's profit motive", rather than "acting in line with humanity's CEV". To rephrase the point I was making above, I feel much (a majority even) of today's alignment research is focused on the first definition of alignment, whilst neglecting the second.

Comment by Stephen Fowler (LosPolloFowler) on Stephen Fowler's Shortform · 2024-01-08T07:10:14.666Z · LW · GW

A concerning amount of alignment research is focused on fixing misalignment in contemporary models, with limited justification for why we should expect these techniques to extend to more powerful future systems.

By improving the performance of today's models, this research makes investing in AI capabilities more attractive, increasing existential risk.

Imagine an alternative history in which GPT-3 had been wildly unaligned. It would not have posed an existential risk to humanity but it would have made putting money into AI companies substantially less attractive to investors.