Why GPT wants to mesa-optimize & how we might change this 2020-09-19T13:48:30.348Z
John_Maxwell's Shortform 2020-09-11T20:55:20.409Z
Are HEPA filters likely to pull COVID-19 out of the air? 2020-03-25T01:07:18.833Z
Comprehensive COVID-19 Disinfection Protocol for Packages and Envelopes 2020-03-15T10:00:33.170Z
Why don't singularitarians bet on the creation of AGI by buying stocks? 2020-03-11T16:27:20.600Z
When are immunostimulants/immunosuppressants likely to be helpful for COVID-19? 2020-03-05T21:44:08.288Z
The Goodhart Game 2019-11-18T23:22:13.091Z
Self-Fulfilling Prophecies Aren't Always About Self-Awareness 2019-11-18T23:11:09.410Z
What AI safety problems need solving for safe AI research assistants? 2019-11-05T02:09:17.686Z
The problem/solution matrix: Calculating the probability of AI safety "on the back of an envelope" 2019-10-20T08:03:23.934Z
The Dualist Predict-O-Matic ($100 prize) 2019-10-17T06:45:46.085Z
Replace judges with Keynesian beauty contests? 2019-10-07T04:00:37.906Z
Three Stories for How AGI Comes Before FAI 2019-09-17T23:26:44.150Z
How to Make Billions of Dollars Reducing Loneliness 2019-08-30T17:30:50.006Z
Response to Glen Weyl on Technocracy and the Rationalist Community 2019-08-22T23:14:58.690Z
Proposed algorithm to fight anchoring bias 2019-08-03T04:07:41.484Z
Raleigh SSC/LW/EA Meetup - Meet MealSquares People 2019-05-08T00:01:36.639Z
The Case for a Bigger Audience 2019-02-09T07:22:07.357Z
Why don't people use formal methods? 2019-01-22T09:39:46.721Z
General and Surprising 2017-09-15T06:33:19.797Z
Heuristics for textbook selection 2017-09-06T04:17:01.783Z
Revitalizing Less Wrong seems like a lost purpose, but here are some other ideas 2016-06-12T07:38:58.557Z
Zooming your mind in and out 2015-07-06T12:30:58.509Z
Purchasing research effectively open thread 2015-01-21T12:24:22.951Z
Productivity thoughts from Matt Fallshaw 2014-08-21T05:05:11.156Z
Managing one's memory effectively 2014-06-06T17:39:10.077Z
OpenWorm and differential technological development 2014-05-19T04:47:00.042Z
System Administrator Appreciation Day - Thanks Trike! 2013-07-26T17:57:52.410Z
Existential risks open thread 2013-03-31T00:52:46.589Z
Why AI may not foom 2013-03-24T08:11:55.006Z
[Links] Brain mapping/emulation news 2013-02-21T08:17:27.931Z
Akrasia survey data analysis 2012-12-08T03:53:35.658Z
Akrasia hack survey 2012-11-30T01:09:46.757Z
Thoughts on designing policies for oneself 2012-11-28T01:27:36.337Z
Room for more funding at the Future of Humanity Institute 2012-11-16T20:45:18.580Z
Empirical claims, preference claims, and attitude claims 2012-11-15T19:41:02.955Z
Economy gossip open thread 2012-10-28T04:10:03.596Z
Passive income for dummies 2012-10-27T07:25:33.383Z
Morale management for entrepreneurs 2012-09-30T05:35:05.221Z
Could evolution have selected for moral realism? 2012-09-27T04:25:52.580Z
Personal information management 2012-09-11T11:40:53.747Z
Proposed rewrites of LW home page, about page, and FAQ 2012-08-17T22:41:57.843Z
[Link] Holistic learning ebook 2012-08-03T00:29:54.003Z
Brainstorming additional AI risk reduction ideas 2012-06-14T07:55:41.377Z
Marketplace Transactions Open Thread 2012-06-02T04:31:32.387Z
Expertise and advice 2012-05-27T01:49:25.444Z
PSA: Learn to code 2012-05-25T18:50:01.407Z
Knowledge value = knowledge quality × domain importance 2012-04-16T08:40:57.158Z
Rationality anecdotes for the homepage? 2012-04-04T06:33:32.097Z
Simple but important ideas 2012-03-21T06:59:22.043Z


Comment by john_maxwell on Thoughts on Iason Gabriel’s Artificial Intelligence, Values, and Alignment · 2021-01-15T16:04:38.067Z · LW · GW

Humans aren't fit to run the world, and there's no reason to think humans can ever be fit to run the world.

I see this argument pop up every so often. I don't find it persuasive because, in my view, it presents a false choice.

Our choice is not between having humans run the world and having a benevolent god run the world. Our choice is between having humans run the world, and having humans delegate the running of the world to something else (which is kind of just an indirect way of running the world).

If you think the alignment problem is hard, you probably believe that humans can't be trusted to delegate to an AI, which means we are left with either having humans run the world (something humans can't be trusted to do) or having humans build an AI to run the world (also something humans can't be trusted to do).

The best path, in my view, is to pick and choose in order to make the overall task as easy as possible. If we're having a hard time thinking of how to align an AI for a particular situation, add more human control. If we think humans are incompetent or untrustworthy in some particular circumstance, delegate to the AI in that circumstance.

It's not obvious to me that becoming wiser is difficult -- your comment is light on supporting evidence, violence seems less frequent nowadays, and it seems possible to me that becoming wiser is merely unincentivized, not difficult. (BTW, this is related to the question of how effective rationality training is.)

However, again, I see a false choice. We don't have flawless computerized wisdom at the touch of a button. The alignment problem remains unsolved. What we do have are various exotic proposals for computerized wisdom (coherent extrapolated volition, indirect normativity) which are very difficult to test. Again, insofar as you believe the problem of aligning AIs with human values is hard, you should be pessimistic about these proposals working, and (relatively) eager to shift responsibility to systems we are more familiar with (biological humans).

Let's take coherent extrapolated volition. We could try & specify some kind of exotic virtual environment where the AI can simulate idealized humans and observe their values... or we could become idealized humans. Given the knowledge of how to create a superintelligent AI, the second approach seems more robust to me. Both approaches require us to nail down what we mean by an "idealized human", but the second approach does not include the added complication+difficulty of specifying a virtual environment, and has a flesh and blood "human in the loop" observing the process at every step, able to course correct if things seem to be going wrong.

The best overall approach might be a committee of ordinary humans, morally enhanced humans, and morally enhanced ems of some sort, where the AI only acts when all three parties agree on something (perhaps also preventing the parties from manipulating each other somehow). But anyway...

You talk about the influence of better material conditions and institutions. Fine, have the AI improve our material conditions and design better institutions. Again I see a false choice between outcomes achieved by institutions and outcomes achieved by a hypothetical aligned AI which doesn't exist. Insofar as you think alignment is hard, you should be eager to make an AI less load-bearing and institutions more load-bearing.

Maybe we can have an "institutional singularity" where we have our AI generate a bunch of proposals for institutions, then we have our most trusted institution choose from amongst those proposals, we build the institution as proposed, then have that institution choose from amongst a new batch of institution proposals until we reach a fixed point. A little exotic, but I think I've got one foot on terra firma.

Comment by john_maxwell on The Great Karma Reckoning · 2021-01-15T14:59:35.116Z · LW · GW

We removed the historical 10x multiplier for posts that were promoted to main on LW 1.0

Are comments currently accumulating karma in the same way that toplevel posts do?

Comment by john_maxwell on Approval Extraction Advertised as Production · 2021-01-13T11:35:40.659Z · LW · GW

When I read this essay in 2019, I remember getting the impression that approval-extracting vs production-oriented was supposed to be about the behavior of the founders, not the industry the company competes in.

Comment by john_maxwell on Why GPT wants to mesa-optimize & how we might change this · 2021-01-12T00:03:39.014Z · LW · GW

I was using it to refer to "any inner optimizer". I think that's the standard usage but I'm not completely sure.

Comment by john_maxwell on Why GPT wants to mesa-optimize & how we might change this · 2021-01-09T02:30:54.451Z · LW · GW

With regard to the text-editing discussion, I was thinking of a really simple approach where we resample words in the text at random. Perhaps that wouldn't work great, but I do think editing has potential because it allows for more sophisticated thinking.

Let's say we want our language model to design us an aircraft. Perhaps it starts by describing the engine, and then it describes the wings. Standard autoregressive text generation (assuming no lookahead) will allow the engine design to influence the wing design (assuming the engine design is inside the context window when it's writing about the wings), but it won't allow the wing design to influence the engine design. However, if the model is allowed to edit its text, it can rethink the engine in light of the wings and rethink the wings in light of the engine until it's designed a really good aircraft.
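A toy sketch of the difference, assuming a made-up score table for (engine, wings) pairs and a made-up standalone engine preference (nothing here comes from a real language model). `one_pass` mimics left-to-right generation, where the engine is fixed before the wings exist; `refine` mimics editing by revisiting each part given all the others until nothing changes. Note that it uses best-response updates rather than the purely random resampling described above, so on less friendly score tables it can still stall in a local optimum.

```python
import random

# Hypothetical toy scores for the whole aircraft, one per (engine, wings) pair.
SCORES = {
    ("light", "short"): 3, ("light", "long"): 1,
    ("heavy", "short"): 4, ("heavy", "long"): 6,
}
OPTIONS = {"engine": ["light", "heavy"], "wings": ["short", "long"]}
# Hypothetical standalone preference used when the engine is written before the wings.
ENGINE_PRIOR = {"light": 1, "heavy": 0}


def score(design):
    return SCORES[(design["engine"], design["wings"])]


def one_pass(options):
    """Left-to-right: pick the engine blind, then pick the wings given the engine."""
    design = {"engine": max(options["engine"], key=ENGINE_PRIOR.get)}
    design["wings"] = max(options["wings"], key=lambda w: score({**design, "wings": w}))
    return design


def refine(design, options, seed=0, max_sweeps=100):
    """Editing: repeatedly re-choose each part given the others until a fixed point."""
    rng = random.Random(seed)
    design = dict(design)
    for _ in range(max_sweeps):
        changed = False
        parts = list(options)
        rng.shuffle(parts)  # visit parts in random order, loosely echoing random resampling
        for part in parts:
            best = max(options[part], key=lambda o: score({**design, part: o}))
            if best != design[part]:
                design[part], changed = best, True
        if not changed:
            break
    return design
```

With these numbers, the one-pass draft is a light engine with short wings (score 3), and refining it reaches a heavy engine with long wings (score 6), because the wings get to "talk back" to the engine choice.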

In particular, it would be good to figure out some way of contriving a mesa-optimization setup, such that we could measure if these fixes would prevent it or not.

Agreed. Perhaps we could generate lots of travelling salesman problem instances where the greedy approach doesn't produce anything resembling the optimal route, and then try to train a GPT architecture to predict the cities in the optimal route, in order?
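One way the instance generation might look, as a sketch under several assumptions: cities are random points in the unit square, nearest-neighbor is the "greedy approach", the optimal tour comes from brute force (so only small instances are feasible), and a relative tour-length gap stands in for "doesn't look like the optimal route". The function names and the 5% threshold are illustrative choices, not part of any existing setup.

```python
import itertools
import math
import random


def tour_length(points, order):
    """Length of the closed tour visiting points in the given order."""
    n = len(order)
    return sum(math.dist(points[order[i]], points[order[(i + 1) % n]]) for i in range(n))


def greedy_tour(points):
    """Nearest-neighbor heuristic starting from city 0."""
    unvisited = set(range(1, len(points)))
    order = [0]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def optimal_tour(points):
    """Brute force over all tours that start at city 0 (fine for roughly 9 cities or fewer)."""
    best = min(itertools.permutations(range(1, len(points))),
               key=lambda perm: tour_length(points, (0,) + perm))
    return [0] + list(best)


def hard_instances(n_cities, count, min_gap=0.05, seed=0, max_tries=10_000):
    """Keep random instances where greedy is at least `min_gap` (relative) worse than optimal."""
    rng = random.Random(seed)
    found = []
    for _ in range(max_tries):
        if len(found) == count:
            break
        points = [(rng.random(), rng.random()) for _ in range(n_cities)]
        opt = optimal_tour(points)
        gap = tour_length(points, greedy_tour(points)) / tour_length(points, opt) - 1
        if gap >= min_gap:
            found.append((points, opt))  # training target: the optimal visiting order
    return found
```

Fixing city 0 as the start removes rotational ambiguity from the target sequence, but every tour can still be read in two directions, so a real training setup would probably want to pick a canonical direction as well.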

This is an interesting quote: "In our experience we find that lean stochastic local search techniques such as simulated annealing are often the most competitive for hard problems with little structure to exploit."
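For reference, a minimal simulated annealing sketch for the travelling salesman problem. The move (reversing a random segment of the tour, i.e. a 2-opt move) and the geometric cooling schedule are common choices assumed here; the quoted source doesn't specify either.

```python
import math
import random


def tour_length(points, order):
    n = len(order)
    return sum(math.dist(points[order[i]], points[order[(i + 1) % n]]) for i in range(n))


def simulated_annealing(points, steps=20_000, t_start=1.0, t_end=1e-3, seed=0):
    """Return (best tour found, its length)."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    cost = tour_length(points, order)
    best, best_cost = order[:], cost
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        i, j = sorted(rng.sample(range(len(order)), 2))
        # 2-opt move: reverse the segment order[i..j].
        candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        new_cost = tour_length(points, candidate)
        # Always accept improvements; accept worse tours with Boltzmann probability.
        if new_cost < cost or rng.random() < math.exp((cost - new_cost) / t):
            order, cost = candidate, new_cost
            if cost < best_cost:
                best, best_cost = order[:], cost
    return best, best_cost
```

Nothing in this loop exploits problem structure beyond "nearby tours have similar lengths", which is roughly why methods like this do well when there is little structure for a heuristic to latch onto.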


I suspect GPT will be biased towards avoiding mesa-optimization and making use of heuristics, so the best contrived mesa-optimization setup may be an optimization problem with little structure where heuristics aren't very helpful. Maybe we could focus on problems where non-heuristic methods such as branch and bound / backtracking are considered state of the art, and train the architecture to mesa-optimize by starting with easy instances and gradually moving to harder and harder ones.
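A sketch of the branch-and-bound side, assuming a deliberately simple lower bound (the cheapest outgoing edge for every city that still needs one), plus a hypothetical `curriculum` generator that yields solved instances from small to large, mirroring the easy-to-hard schedule above. Real solvers use much tighter bounds; this only shows the search-and-prune structure that a mesa-optimizer would have to imitate.

```python
import math
import random


def branch_and_bound_tsp(points):
    """Exact TSP by depth-first branch and bound; returns (tour, length). Needs >= 2 cities."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Cheapest edge leaving each city, used for an admissible lower bound.
    min_out = [min(dist[i][j] for j in range(n) if j != i) for i in range(n)]
    best = {"cost": math.inf, "tour": None}

    def search(path, visited, cost):
        if len(path) == n:
            total = cost + dist[path[-1]][path[0]]  # close the cycle
            if total < best["cost"]:
                best["cost"], best["tour"] = total, path[:]
            return
        # The current endpoint and every unvisited city each still need one outgoing edge.
        bound = cost + min_out[path[-1]] + sum(min_out[c] for c in range(n) if c not in visited)
        if bound >= best["cost"]:
            return  # prune: this branch cannot beat the best tour found so far
        for nxt in range(n):
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                search(path, visited, cost + dist[path[-2]][nxt])
                path.pop()
                visited.remove(nxt)

    search([0], {0}, 0.0)
    return best["tour"], best["cost"]


def curriculum(sizes, per_size, seed=0):
    """Hypothetical easy-to-hard schedule: yield (points, optimal tour) for growing sizes."""
    rng = random.Random(seed)
    for n in sizes:
        for _ in range(per_size):
            points = [(rng.random(), rng.random()) for _ in range(n)]
            tour, _ = branch_and_bound_tsp(points)
            yield points, tour
```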

Comment by john_maxwell on Why GPT wants to mesa-optimize & how we might change this · 2020-11-28T06:11:04.868Z · LW · GW

Thanks for sharing!

Comment by john_maxwell on The (Unofficial) Less Wrong Comment Challenge · 2020-11-14T09:06:29.689Z · LW · GW

I also felt frustrated by the lack of feedback my posts got; my response was to write this: Maybe submitting LW posts to targeted subreddits could be high impact?

LessWrong used to have a lot of comments back in the day. I wonder if part of the issue is simply that the number of posts went up, which means a bigger surface for readers to be spread across. Why did the writer/reader ratio go up? Perhaps because writing posts falls into the "endorsed" category, whereas reading/writing comments feels like "time-wasting". And as CFAR et al. helped rationalists be more productive, they let activities labeled as "time-wasting" fall by the wayside. (Note that there's something rather incoherent about this: If the subject matter of the post was important enough to be worth a post, surely it is also worth reading and commenting on?)

Anyway, here are the reasons why commenting falls into the "endorsed" column for me:

  • It seems neglected. See above argument.
  • I suspect people actually read comments a fair amount. I know I do. Sometimes I will skip to the comments before reading the post itself.
  • Writing a comment doesn't trigger the same "officialness" anxiety that writing a post does. I don't feel obligated to do background research, think about how my ideas should be structured, or try to anticipate potential lines of counterargument.
  • Taking this further, commenting doesn't feel like work. So it takes fewer spoons. I'm writing this comment during a pre-designated goof off period, in fact. The ideal activity is one which is high-impact yet feels like play. Commenting and brainstorming are two of the few things that fall in that category for me.

I know there was an effort to move the community from Facebook to LW recently. Maybe if we pitched LW as "just as fun as Facebook, but discussing more valuable things and adding to a searchable/taggable knowledge archive" that could lure people over? IMO the concept of "work that feels like play" is underrated in the rationalist and EA communities.

Unfortunately, even though I find it fun to write comments, I tend to get demoralized a while later when my comments don't get comment replies themselves :P So that ends up being an "endorsed" reason to avoid commenting.

Comment by john_maxwell on Where do (did?) stable, cooperative institutions come from? · 2020-11-04T08:02:10.368Z · LW · GW

Well, death spirals can happen, but turnaround / reform can also happen. It usually needs good leadership though.

Sure, they have competitors, but what are they competing on? In terms of what's going on in the US right now, one story is that newspapers used to be nice and profitable, which created room for journalists to pursue high-minded ideals related to objectivity, fairness, investigative reporting, etc. But since Google/Craigslist took most of their ad revenue, they've had to shrink a bunch, and the new business environment leaves less room for journalists to pursue those high-minded ideals. Instead they're forced to write clickbait and/or pander to a particular ideological group to get subscriptions. Less sophisticated reporting/analysis means less sophisticated voting means less sophisticated politicians who aren't as capable of reforming whatever government department is currently most in need of reform (or, less sophisticated accountability means they do a worse job).

Comment by john_maxwell on Where do (did?) stable, cooperative institutions come from? · 2020-11-04T04:59:43.709Z · LW · GW

Another hypothesis: Great people aren't just motivated by money. They're also motivated by things like great coworkers, interesting work, and prestige.

In the private sector, you see companies like Yahoo go into death spirals: Once good people start to leave, the quality of the coworkers goes down, the prestige of being a Yahoo employee goes down, and you have to deal with more BS instead of bold, interesting initiatives... which means fewer great people join and more leave (partly, also, because mediocre people can't identify, or don't want to hire, great people).

This death spiral is OK in the private sector because people can just switch their search engine from Yahoo to Google if the results become bad. But there's no analogous competitive process for provisioning public sector stuff.

Good Marines get out because of bad leadership, which means bad Marines stay in and eventually get promoted to leadership positions and the cycle repeats itself.


Comment by john_maxwell on John_Maxwell's Shortform · 2020-11-03T04:18:24.638Z · LW · GW

That's possible, but I'm guessing that it's not hard for a superintelligent AI to suddenly swallow an entire system using something like gray goo.

Comment by john_maxwell on John_Maxwell's Shortform · 2020-11-03T03:58:52.531Z · LW · GW

In this reaction to Critch's podcast, I wrote about some reasons to think that a singleton would be preferable to a multipolar scenario. Here's another rather exotic argument.

[The dark forest theory] is explained very well near the end of the science fiction novel, The Dark Forest by Liu Cixin.


When two [interstellar] civilizations meet, they will want to know if the other is going to be friendly or hostile. One side might act friendly, but the other side won't know if they are just faking it to put them at ease while armies are built in secret. This is called chains of suspicion. You don't know for sure what the other side's intentions are. On Earth this is resolved through communication and diplomacy. But for civilizations in different solar systems, that's not possible due to the vast distances and time between message sent and received. Bottom line is, every civilization could be a threat and it's impossible to know for sure, therefore they must be destroyed to ensure your survival.

Source. (Emphasis mine.)

Secure second strike is the ability to retaliate with your own nuclear strike if someone hits you with nukes. Secure second strike underpins mutually assured destruction. If nuclear war had a "first mover advantage", where whoever launches nukes first wins because the country that is hit with nukes is unable to retaliate, that would be much worse from a game theory perspective, because there's an incentive to be the first mover and launch a nuclear war (especially if you think your opponent might do the same).
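A small best-response check makes the contrast concrete. The payoffs are hypothetical numbers for one country, chosen only so that retaliation makes striking first pointless in one table and profitable in the other; they are not estimates of anything real.

```python
ACTIONS = ("wait", "strike")

# Hypothetical payoffs for "us": (our action, their action) -> our payoff.
SECOND_STRIKE = {  # whoever is hit can still retaliate
    ("wait", "wait"): 0, ("strike", "wait"): -100,
    ("wait", "strike"): -100, ("strike", "strike"): -100,
}
FIRST_MOVER = {  # whoever is hit cannot retaliate
    ("wait", "wait"): 0, ("strike", "wait"): 10,
    ("wait", "strike"): -100, ("strike", "strike"): -50,
}


def best_responses(payoff):
    """For each opponent action, the set of our actions that maximize our payoff."""
    result = {}
    for theirs in ACTIONS:
        top = max(payoff[(mine, theirs)] for mine in ACTIONS)
        result[theirs] = {mine for mine in ACTIONS if payoff[(mine, theirs)] == top}
    return result
```

With secure second strike, waiting is the best response to an opponent who waits. With a first mover advantage, striking is the best response no matter what the opponent does, which is the dangerous incentive described above.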

My understanding is that the invention of nuclear submarines was helpful for secure second strike. There is so much ocean for them to hide in that it's difficult to track and eliminate all of your opponent's nuclear submarines and ensure they won't be able to hit you back.

However, in Allan Dafoe's article AI Governance: Opportunity and Theory of Impact, he mentions that AI processing of undersea sensors could increase the risk of nuclear war (presumably because it makes it harder for nuclear submarines to hide).

Point being, we don't know what the game theory of a post-AGI world looks like. And we really don't know what interstellar game theory between different AGIs looks like. ("A colonized solar system is plausibly a place where predators can see most any civilized activities of any substantial magnitude, and get to them easily if not quickly."--source.) It might be that the best strategy is for multipolar AIs to unify into a singleton anyway.

Comment by john_maxwell on John_Maxwell's Shortform · 2020-10-31T03:16:17.952Z · LW · GW

A friend and I went on a long drive recently and listened to this podcast with Andrew Critch on ARCHES. On the way back from our drive we spent some time brainstorming solutions to the problems he outlines. Here are some notes on the podcast + some notes on our brainstorming.

In a possibly inaccurate nutshell, Critch argues that what we think of as the "alignment problem" is most likely going to get solved because there are strong economic incentives to solve it. However, Critch is skeptical of forming a singleton--he says people tend to resist that kind of concentration of power, and it will be hard for an AI team that has this as their plan to recruit team members. Critch says there is really a taxonomy of alignment problems:

  • single-single, where we have a single operator aligning a single AI with their preferences
  • single-multi, where we have a single operator aligning multiple AIs with their preferences
  • multi-single, where we have multiple operators aligning a single AI with their preferences
  • multi-multi, where we have multiple operators aligning multiple AIs with their preferences

Critch says that although there are commercial incentives to solve the single-single alignment problem, there aren't commercial incentives to solve all of the others. He thinks the real alignment failures might look like the sort of diffusion of responsibility you see when navigating bureaucracy.

I'm a bit skeptical of this perspective. For one thing, I'm not convinced commercial incentives for single-single alignment will extrapolate well to exotic scenarios such as the "malign universal prior" problem--and if hard takeoff happens then these exotic scenarios might come quickly. For another thing, although I can see why advocating a singleton would be a turnoff to the AI researchers that Critch is pitching, I feel like the question of whether to create a singleton deserves more than the <60 seconds of thought that an AI researcher having a casual conversation with Critch likely puts into their first impression. If there are commercial incentives to solve single-single alignment but not other kinds, shouldn't we prefer that single-single is the only kind which ends up being load-bearing? Why can't we form an aligned singleton and then tell it to design a mechanism by which everyone can share their preferences and control what the singleton does (democracy but with better reviews)?

I guess a big issue is the plausibility of hard takeoff, because if hard takeoff is implausible, that makes it less likely that a singleton will form under any circumstances, and it also means that exotic safety problems aren't likely to crop up as quickly. If this is Critch's worldview then I could see why he is prioritizing the problems he is prioritizing.

Anyway my friend and I spent some time brainstorming about how to solve versions of the alignment problem besides single-single. Since we haven't actually read ARCHES or much relevant literature, it's likely that much of what comes below is clueless, but it might also have new insights due to being unconstrained by existing paradigms :P

One scenario which is kind of in between multi-single and multi-multi alignment is a scenario where everyone has an AI agent which negotiates with some kind of central server on their behalf. We could turn multi-single into this scenario by telling the single AI to run internal simulations of everyone's individual AI agent, or we could turn multi-multi into this scenario if we have enough cooperation/enforcement for different people to abide by the agreements that their AI agents make with one another on their behalf.

Most of the game theory we're familiar with deals with a fairly small space of agreements it is possible to make, but it occurred to us that in an ideal world, these super smart AIs would be doing a lot of creative thinking, trying to figure out a clever way for everyone's preferences to be satisfied simultaneously. Let's assume each robot agent has a perfect model of its operator's preferences (or can acquire a perfect model as needed by querying the operator). The central server queries the agents about how much utility their operator assigns to various scenarios, or whether they prefer Scenario A to Scenario B, or something like that. And the agents can respond either truthfully or deceptively ("data poisoning"), trying to navigate towards a final agreement which is as favorable as possible for their operator. Then the central server searches the space of possible agreements in a superintelligent way and tries to find an agreement that everyone likes. (You can also imagine a distributed version of this where there is no central server and individual robot agents try to come up with a proposal that everyone likes.)
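A toy version of the central server's search, assuming the agents report utilities truthfully (the "data poisoning" problem is exactly what this ignores) and that the server uses the Nash bargaining solution: maximize the product of each agent's gain over its walk-away utility. The comment doesn't commit to any particular selection rule; the Nash product is just one standard choice, and the agreement labels and numbers are made up.

```python
import math


def find_agreement(candidates, agents, walk_away):
    """Return the candidate maximizing the Nash product of gains, or None if nobody can agree.

    agents: name -> callable giving that agent's reported utility for an agreement.
    walk_away: name -> utility the agent gets if no agreement is reached.
    """
    best, best_value = None, -math.inf
    for agreement in candidates:
        gains = [agents[name](agreement) - walk_away[name] for name in agents]
        if min(gains) < 0:
            continue  # someone would rather walk away than accept this
        value = math.prod(gains)
        if value > best_value:
            best, best_value = agreement, value
    return best


# Hypothetical reported utilities for three candidate agreements.
alice = {"A": 10, "B": 6, "C": 2}
bob = {"A": 1, "B": 6, "C": 9}
AGENTS = {"alice": alice.get, "bob": bob.get}
```

With equal walk-away utilities of 0, the compromise "B" wins (product 36 versus 10 and 18). Raising Alice's walk-away utility to 7 rules out "B" and "C" and pushes the outcome to "A", which previews the power-weighting issue discussed further down.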

How does this compare to the scenario I mentioned above, where an aligned AI designs a mechanism and collects preferences from humans directly without any robot agent as an intermediary? The advantage of robot agents is that if everyone gets a superintelligent agent, then it is harder for individuals to gain advantage through the use of secret robot agents, so the overall result ends up being more fair. However, it arguably makes the mechanism design problem harder: If it is humans who are answering preference queries rather than superintelligent robot agents, since humans have finite intelligence, it will be harder for them to predict the strategic results of responding in various ways to preference queries, so maybe they're better off just stating their true preferences to minimize downside risk. Additionally, an FAI is probably better at mechanism design than humans. But then again, if the mechanism design for discovering fair agreements between superintelligent robot agents fails, and a single agent manages to negotiate really well on behalf of its owner's preferences, then arguably you are back in the singleton scenario. So maybe the robot agents scenario has the singleton scenario as its worst case.

I said earlier that it will be harder for humans to predict the strategic results of responding in various ways to preference queries. But we might be able to get a similar result for supersmart AI agents by making use of secret random numbers during the negotiation process to create enough uncertainty that revealing true preferences becomes the optimal strategy. (For example, you could imagine two mechanisms, one of which incentivizes strategic deception in one direction while the other incentivizes strategic deception in the other direction; if we collect preferences and then flip a coin to decide which mechanism to use, the best strategy might be to do no deception at all.)
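A stylized version of the coin-flip idea, assuming each mechanism pays the agent a quadratic loss centered on a biased report: mechanism A rewards overstating by `d`, mechanism B rewards understating by `d`. These payoff functions are invented for illustration; real mechanisms would be far messier.

```python
def payoff_a(report, true_value, d):
    """Hypothetical mechanism A: the agent does best by overstating by d."""
    return -(report - (true_value + d)) ** 2


def payoff_b(report, true_value, d):
    """Hypothetical mechanism B: the agent does best by understating by d."""
    return -(report - (true_value - d)) ** 2


def expected_payoff(report, true_value, d, p_a=0.5):
    """Payoff when a secret coin picks mechanism A with probability p_a."""
    return p_a * payoff_a(report, true_value, d) + (1 - p_a) * payoff_b(report, true_value, d)


def best_report(true_value, d, p_a=0.5):
    # Search reports on a 0.01 grid within +/- 3 of the true value.
    grid = [true_value + k / 100 for k in range(-300, 301)]
    return max(grid, key=lambda r: expected_payoff(r, true_value, d, p_a))
```

Algebraically, with a fair coin the expected payoff is -((r - v)^2 + d^2) for report r and true value v, so the truthful report r = v is optimal. With a biased coin the optimum shifts to v + d(2p_a - 1), so the randomization has to actually balance the two incentives.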

Another situation to consider is one where we don't have as much cooperation/enforcement and individual operators are empowered to refuse to abide by any agreement--let's call this "declaring war". In this world, we might prefer to overweight the preferences of more powerful players, because if everyone is weighted equally regardless of power, then the powerful players might have an incentive to declare war and get more than their share. However it's unclear how to do power estimation in an impartial way. Also, such a setup incentivizes accumulation of power.

One idea which seems like it might be helpful at first blush would be to try to invent some way of verifiably implementing particular utility functions, so competing teams could know that a particular AI will take their utility function into account. However, this could be abused as follows: In the same way the game of chicken incentivizes tearing out your steering wheel so the opponent has no choice but to swerve, Team Evil could verifiably implement a particular utility function in their AI such that their AI will declare war unless competing teams verifiably implement a utility function Team Evil specifies.
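The steering-wheel logic can be sketched with a hypothetical chicken payoff table: if one side verifiably commits to an action and the other side best-responds, committing to "straight" is what the committer prefers. The numbers are arbitrary; only their ordering matters.

```python
# Hypothetical chicken payoffs: (row action, column action) -> (row payoff, column payoff).
CHICKEN = {
    ("swerve", "swerve"): (0, 0),
    ("swerve", "straight"): (-1, 1),
    ("straight", "swerve"): (1, -1),
    ("straight", "straight"): (-10, -10),
}


def column_best_response(row_action, payoffs):
    """The column player's payoff-maximizing reply to a known row action."""
    actions = sorted({col for (_, col) in payoffs})
    return max(actions, key=lambda col: payoffs[(row_action, col)][1])


def committed_outcome(row_commitment, payoffs):
    """Row verifiably commits to one action (tears out the wheel); column best-responds."""
    col = column_best_response(row_commitment, payoffs)
    return row_commitment, col, payoffs[(row_commitment, col)]
```

Committing to "straight" yields ("straight", "swerve", (1, -1)), the best outcome available to the committer, which is why verifiable commitment helps a player like Team Evil as much as it helps cooperative teams.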

Anyway, looking back, it doesn't seem like what I've written actually does much for the "bureaucratic diffusion of responsibility" scenario. I'd be interested to know concretely how this might occur. Maybe what we need is a mechanism for incentivizing red teaming, finding things that no one is responsible for, and acquiring responsibility for them?

Comment by john_maxwell on Babble challenge: 50 consequences of intelligent ant colonies · 2020-10-30T08:15:12.000Z · LW · GW

Last week we tried a more direct babble, on solving a problem in our lives. When I did it, I felt a bit like the tennis player trying to swing their racket the same way as when they were doing a bicep curl. I felt like I went too directly at the problem, while misunderstanding the mechanism.

Maybe a babble for "50 babble prompts that are both useful and not too direct"? :P

Seems to me that you want to gradually transition towards being able to babble about topics you don't feel very babbly about. It's the most important, most ugh-ish areas of our lives where we typically need fresh thinking the most, IMO.

Perhaps "50 ways to make it easier to babble about things that don't feel babbly"? ;)

Comment by john_maxwell on The Darwin Game · 2020-10-15T22:52:18.689Z · LW · GW

It's a good point, but in the original Darwin Game story, the opening sequence 2, 0, 2 was key to the plot.

Comment by john_maxwell on Everything I Know About Elite America I Learned From ‘Fresh Prince’ and ‘West Wing’ · 2020-10-12T15:14:33.561Z · LW · GW

For some reason I was reminded of this post, which could be seen as being about class structure within the Effective Altruist movement.

Comment by john_maxwell on The Darwin Game · 2020-10-10T15:39:59.027Z · LW · GW


Comment by john_maxwell on The Darwin Game · 2020-10-10T12:38:39.021Z · LW · GW

Why does get_opponent_source take self as an argument?

Comment by john_maxwell on Upside decay - why some people never get lucky · 2020-10-10T09:08:47.199Z · LW · GW

Yeah I think it's an empirical question what fraction of upside is explained by weak ties.

Paul Graham wrote this essay which identifies weak ties as one of the 2 main factors behind the success of startup hubs. He also says that "one of the most distinctive things about startup hubs is the degree to which people help one another out, with no expectation of getting anything in return".

Comment by john_maxwell on Open & Welcome Thread – October 2020 · 2020-10-07T07:59:01.358Z · LW · GW

There hasn't been an LW survey since 2017. That's the longest we've ever gone without a survey since the first survey. Are people missing the surveys? What is the right interval to do them on, if any?

Comment by john_maxwell on Open & Welcome Thread – October 2020 · 2020-10-07T07:55:37.975Z · LW · GW

Why not just have a comment which is a list of bullet points and keep editing it?

Comment by john_maxwell on MikkW's Shortform · 2020-10-05T12:51:29.235Z · LW · GW

For what it's worth, I get frustrated by people not responding to my posts/comments on LW all the time. This post was my attempt at a constructive response to that frustration. I think if LW was a bit livelier I might replace all my social media use with it. I tried to do my part to make it lively by reading and leaving comments a lot for a while, but eventually gave up.

Comment by john_maxwell on Davis_Kingsley's Shortform · 2020-10-05T12:42:41.630Z · LW · GW

In a world of distraction, focusing on something is a revolutionary act.

Comment by john_maxwell on Postmortem to Petrov Day, 2020 · 2020-10-05T02:24:42.818Z · LW · GW

You mentioned petrov_day_admin_account, but I got a message from a user called petrovday:

Hello John_Maxwell,

You are part of a smaller group of 30 users who has been selected for the second part of this experiment. In order for the website not to go down, at least 5 of these selected users must enter their codes within 30 minutes of receiving this message, and at least 20 of these users must enter their codes within 6 hours of receiving the message. To keep the site up, please enter your codes as soon as possible. You will be asked to complete a short survey afterwards.

I saw the message more than 6 hours after it was sent and didn't read it very carefully. The possibility of phishing didn't occur to me, and I assumed that this new smaller group thing would involve entering a different code into a different page. Anyway, it was a useful lesson in being more aware of phishing attacks.

Comment by john_maxwell on John_Maxwell's Shortform · 2020-09-30T03:27:32.093Z · LW · GW

Someone wanted to know about the outcome of my hair loss research so I thought I would quickly write up what I'm planning to try for the next year or so. No word on how well it works yet.

Most of the ideas are from this review:

I think this should be safer/less sketchy than the big 3 and fairly low cost, but plausibly less effective in expectation; let me know if you disagree.

Comment by john_maxwell on Some Simple Observations Five Years After Starting Mindfulness Meditation · 2020-09-28T00:10:16.220Z · LW · GW

Those fatigue papers you recommended were a serious game-changer for me

Any chance you could link to whatever you're referring to? :)

Comment by john_maxwell on Why GPT wants to mesa-optimize & how we might change this · 2020-09-27T05:45:54.540Z · LW · GW

Your philosophical point is interesting; I have a post in the queue about that. However I don't think it really proves what you want it to.

Having John_Maxwell in the byline makes it far more likely that I'm the author of the post.

If humans can make useful judgements re: whether this is something I wrote, vs something nostalgebraist wrote to make a point about bylines, I don't see why a language model can't do the same, in principle.

GPT is trying to be optimal at next-step prediction, and an optimal next-step predictor should not get improved by lookahead, it should already have those facts priced in to its next-step prediction.

A perfectly optimal next-step predictor would not be improved by lookahead or anything else, it's perfectly optimal. I'm talking about computational structures which might be incentivized during training when the predictor is suboptimal. (It's still going to be suboptimal after training with current technology, of course.)

In orthonormal's post they wrote:

...GPT-3's ability to write fiction is impressive: unlike GPT-2, it doesn't lose track of the plot, it has sensible things happen, it just can't plan its way to a satisfying resolution.

I'd be somewhat surprised if GPT-4 shared that last problem.

I suspect that either GPT-4 will still be unable to plan its way to a satisfying resolution, or GPT-4 will develop some kind of internal lookahead (probably not beam search, but beam search could be a useful model for understanding it) which is sufficiently general to be re-used across many different writing tasks. (Generality takes fewer parameters.) I don't know what the relative likelihoods of those possibilities are. But the whole idea of AI safety is to ask what happens if we succeed.
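To make the beam-search analogy concrete, here is a minimal sketch contrasting greedy next-word decoding with beam search over a tiny, hypothetical bigram table. The words and probabilities are invented for illustration and say nothing about GPT's actual internals. It shows how lookahead can favor a first word that greedy decoding rejects, because the continuation it enables is more probable overall.

```python
import math

# Hypothetical toy bigram "language model": P(next word | current word).
# The numbers are invented purely to illustrate greedy vs. lookahead decoding.
BIGRAMS = {
    "<s>": {"the": 0.5, "a": 0.4, "one": 0.1},
    "the": {"dog": 0.4, "cat": 0.3, "end": 0.3},
    "a": {"plan": 0.9, "dog": 0.1},
    "one": {"dog": 1.0},
}


def greedy_decode(steps):
    """Repeatedly pick the single most probable next word (no lookahead)."""
    seq, logp = ["<s>"], 0.0
    for _ in range(steps):
        options = BIGRAMS.get(seq[-1])
        if not options:  # no known continuation: stop early
            break
        word, p = max(options.items(), key=lambda kv: kv[1])
        seq.append(word)
        logp += math.log(p)
    return seq[1:], math.exp(logp)


def beam_decode(steps, width):
    """Keep the `width` most probable partial sequences at every step."""
    beams = [(["<s>"], 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, logp in beams:
            options = BIGRAMS.get(seq[-1])
            if not options:  # finished hypothesis: carry it forward unchanged
                candidates.append((seq, logp))
                continue
            for word, p in options.items():
                candidates.append((seq + [word], logp + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:width]
    best_seq, best_logp = beams[0]
    return best_seq[1:], math.exp(best_logp)


print(greedy_decode(2))         # greedy commits to "the" because 0.5 > 0.4
print(beam_decode(2, width=2))  # lookahead finds "a plan", which has higher joint probability
```

With `width=1`, beam search reduces to greedy decoding, so widening the beam is one simple, explicit form of lookahead. Any internal lookahead a transformer learned would presumably be implemented very differently; the sketch only illustrates why lookahead can beat the pre-lookahead next-word distribution.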

Comment by john_maxwell on Why GPT wants to mesa-optimize & how we might change this · 2020-09-26T04:03:22.727Z · LW · GW

So a predictor which seems (and is) frighteningly powerful at some short range L will do little better than random guessing if you chain its predictions up to some small multiple of L.

A system which develops small-L lookahead (for L > 1) may find large-L lookahead to be nearby in programspace. If so, incentivizing the development of small-L lookahead makes it more likely that the system will try large-L lookahead and find it to be useful as well (in predicting chess moves for instance).

My intuition is that small-L lookahead could be close to large-L lookahead in programspace for something like an RNN, but not for GPT-3's transformer architecture.

Anyway, the question here isn't whether lookahead will be perfectly accurate, but whether the post-lookahead distribution of next words will allow for improvement over the pre-lookahead distribution. Lookahead is almost certainly going to do better than random guessing; even topic models can do that.

By construction, language modeling gives you nothing to work with except the text itself, so you don't know who produced it or for whom.

Are you saying that GPT-3's training corpus was preprocessed to remove information about the author, title, and publication venue? Or are you only talking about what happens when this info is outside the context window?

Comment by john_maxwell on Draft report on AI timelines · 2020-09-24T05:27:28.684Z · LW · GW

Worth noting that the "evidence from the nascent AI industry" link has bits of evidence pointing in both directions. For example:

Training a single AI model can cost hundreds of thousands of dollars (or more) in compute resources. While it’s tempting to treat this as a one-time cost, retraining is increasingly recognized as an ongoing cost, since the data that feeds AI models tends to change over time (a phenomenon known as “data drift”).

Doesn't this kind of cost make AI services harder to commoditize? And also:

We’ve seen a massive difference in COGS between startups that train a unique model per customer versus those that are able to share a single model (or set of models) among all customers....

That sounds rather monopoly-ish, doesn't it? Although the blogger's takeaway is

Machine learning startups generally have no moat or meaningful special sauce

I'll be somewhat surprised if language modeling gets commoditized down to zero profits, even if Google and Facebook release competing models. I'd expect it to look more like the cloud infrastructure industry, "designed to extract maximum blood" as the author of your blog post puts it. See e.g.

Comment by john_maxwell on Why GPT wants to mesa-optimize & how we might change this · 2020-09-23T08:05:00.555Z · LW · GW
  1. Stopping mesa-optimizing completely seems mad hard.

As I mentioned in the post, I don't think this is a binary, and stopping mesa-optimization "incompletely" seems pretty useful. I also have a lot of ideas about how to stop it, so it doesn't seem mad hard to me.

  2. Managing "incentives" is the best way to deal with this stuff, and will probably scale to something like 1,000,000x human intelligence.

I'm less optimistic about this approach.

  1. There is a stochastic aspect to training ML models, so it's not enough to say "the incentives favor Mesa-Optimizing for X over Mesa-Optimizing for Y". If Mesa-Optimizing for Y is nearby in model-space, we're liable to stumble across it.

  2. Even if your mesa-optimizer is aligned, if it doesn't have a way to stop mesa-optimization, there's the possibility that your mesa-optimizer would develop another mesa-optimizer inside itself which isn't necessarily aligned.

  3. I'm picturing value learning via (un)supervised learning, and I don't see an easy way to control the incentives of any mesa-optimizer that develops in the context of (un)supervised learning. (Curious to hear about your ideas though.)

My intuition is that the distance between Mesa-Optimizing for X and Mesa-Optimizing for Y is likely to be smaller than the distance between an Incompetent Mesa-Optimizer and a Competent Mesa-Optimizer. If you're shooting for a Competent Human Values Mesa-Optimizer, it would be easy to stumble across a Competent Not Quite Human Values Mesa-Optimizer along the way. All it would take would be having the "Competent" part in place before the "Human Values" part. And running a Competent Not Quite Human Values Mesa-Optimizer during training is likely to be dangerous.

On the other hand, if we have methods for detecting mesa-optimization or starving it of compute that work reasonably well, we're liable to stumble across an Incompetent Mesa-Optimizer and run it a few times, but it's less likely that we'll hit the smaller target of a Competent Mesa-Optimizer.

Comment by john_maxwell on Why GPT wants to mesa-optimize & how we might change this · 2020-09-22T11:38:43.856Z · LW · GW

My thought was that if lookahead improves performance during some period of the training, the model is liable to develop mesa-optimization during that period, and then find it useful for other things later on.

Comment by john_maxwell on Why GPT wants to mesa-optimize & how we might change this · 2020-09-22T11:35:09.051Z · LW · GW

Now it's true that efficiently estimating that conditional using a single forward pass of a transformer might involve approximations to beam search sometimes.

Yeah, that's the possibility the post explores.

At a high level, I don't think we really need to be concerned with this form of "internal lookahead" unless/until it starts to incorporate mechanisms outside of the intended software environment (e.g. the hardware, humans, the external (non-virtual) world).

Is there an easy way to detect if it's started doing that / tell it to restrict its lookahead to particular domains? If not, it may be easier to just prevent it from mesa-optimizing in the first place. (The post has arguments for why that's (a) possible and (b) wouldn't necessarily involve a big performance penalty.)

Comment by john_maxwell on Developmental Stages of GPTs · 2020-09-20T02:01:48.805Z · LW · GW

BTW with regard to "studying mesa-optimization in the context of such systems", I just published this post: Why GPT wants to mesa-optimize & how we might change this.

I'm still thinking about the point you made in the other subthread about MAML. It seems very plausible to me that GPT is doing MAML-type stuff; what I haven't worked out yet is if/how that could result in dangerous mesa-optimization.

Comment by john_maxwell on Why GPT wants to mesa-optimize & how we might change this · 2020-09-20T00:40:46.911Z · LW · GW

Well, I suppose mesa-optimization isn't really a binary, is it? Like, maybe there's a trivial sense in which self-attention "mesa-optimizes" over its input when figuring out what to pay attention to.

But ultimately, what matters isn't the definition of the term "mesa-optimization"; it's the risk of spontaneous internal planning/optimization that generalizes in unexpected ways or operates in unexpected domains, at least in my mind. So the question is whether this ability to consider multiple possibilities about text could also improve its ability to consider multiple possibilities in other domains. That depends on whether the implementation of "considering multiple possibilities" looks more like beam search or like very domain-adapted heuristics.

Comment by john_maxwell on Why GPT wants to mesa-optimize & how we might change this · 2020-09-19T22:09:59.409Z · LW · GW

This post distinguishes between mesa-optimization and learned heuristics. What you're describing sounds like learned heuristics. ("Learning which words are easy to rhyme" was an example I gave in the post.) Learned heuristics aren't nearly as worrisome as mesa-optimization because they're harder to modify and misuse to do planning in unexpected domains. When I say "lookahead" in the post I'm pretty much always referring to the mesa-optimization sort.

Comment by john_maxwell on Developmental Stages of GPTs · 2020-09-19T01:18:52.334Z · LW · GW


Comment by john_maxwell on Developmental Stages of GPTs · 2020-09-17T23:40:42.360Z · LW · GW

The outer optimizer is the more obvious thing: it's straightforward to say there's a big difference in dealing with a superhuman Oracle AI with only the goal of answering each question accurately, versus one whose goals are only slightly different from that in some way.

GPT generates text by repeatedly picking whatever word seems highest probability given all the words that came before. So if its notion of "highest probability" is almost, but not quite, answering every question accurately, I would expect a system which usually answers questions accurately but sometimes answers them inaccurately. That doesn't sound very scary?

Comment by john_maxwell on Developmental Stages of GPTs · 2020-09-17T22:54:07.344Z · LW · GW

esp. since GPT-3's 0-shot learning looks like mesa-optimization

Could you provide more details on this?

Sometimes people will give GPT-3 a prompt with some examples of inputs along with the sorts of responses they'd like to see from GPT-3 in response to those inputs ("few-shot learning", right? I don't know what 0-shot learning you're referring to.) Is your claim that GPT-3 succeeds at this sort of task by doing something akin to training a model internally?

If that's what you're saying... that seems unlikely to me. GPT-3 is essentially a stack of 96 transformer blocks, right? So if it were doing something like gradient descent internally, how many consecutive iterations would it be capable of doing? It seems more likely to me that GPT-3 simply learns internal representations rich enough that, when the input/output examples are within its context window, it picks up their input/output structure and forms a conception of that structure sophisticated enough that the word scoring highest under next-word prediction is one that comports with the structure.

96 transformer blocks would appear to offer a very limited budget for any kind of serial computation, but there's a lot of parallel computation going on there, and there are non-gradient-descent optimization algorithms (genetic algorithms, say) that can be parallelized. I guess the query matrix could be used to implement some kind of fitness function? It would be interesting to try some kind of layer-wise pretraining on transformer blocks, training them to compute steps in a parallelizable optimization algorithm (you'd probably want to pick a deterministic parallelizable algorithm instead of a stochastic one like genetic algorithms). Then you could look at the resulting network and, based on it, try to figure out what the telltale signs of a mesa-optimizer are (since this network is almost certainly implementing a mesa-optimizer).

Still, my impression is you need 1000+ generations to get interesting results with genetic algorithms, which seems like a lot of serial computation relative to GPT-3's budget...
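Here is a rough sketch of the kind of deterministic, parallelizable optimizer discussed above, with the serial budget made explicit: each of `depth` steps performs one round of refinement, while the `width` candidate evaluations inside a step are independent of one another and could run in parallel. The objective, `width`, and the interval-halving rule are hypothetical choices for illustration; using `depth=96` only mirrors GPT-3's layer count and says nothing about what GPT-3 actually computes.

```python
def objective(x):
    # Hypothetical 1-D objective with its minimum at x = 3.14159.
    return (x - 3.14159) ** 2


def parallel_refine(depth, width=5, lo=-100.0, hi=100.0):
    """Deterministic, parallelizable search with a fixed serial budget.

    Each of the `depth` serial steps scores `width` evenly spaced candidates
    (these evaluations are independent, so they could run in parallel),
    recenters on the best one, and halves the search radius. With width >= 3,
    the minimum stays inside the shrinking interval.
    """
    if width < 3:
        raise ValueError("width must be at least 3")
    center, radius = (lo + hi) / 2, (hi - lo) / 2
    for _ in range(depth):
        step = 2 * radius / (width - 1)
        candidates = [center - radius + i * step for i in range(width)]
        center = min(candidates, key=objective)  # parallel scoring, one reduction
        radius /= 2
    return center


for depth in (3, 10, 96):  # 96 mirrors GPT-3's layer count
    print(depth, abs(parallel_refine(depth) - 3.14159))
```

On this easy 1-D problem a 96-step serial budget is plenty, while 3 steps leave the answer far off; whether anything comparable works for the high-dimensional problems relevant to mesa-optimization is not something this toy can settle.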

Comment by john_maxwell on John_Maxwell's Shortform · 2020-09-17T22:27:23.167Z · LW · GW

/r/tressless is about 6 times as big, FYI.

The way I'm currently thinking about it is that reddit was originally designed as a social news website, and you have to tack on a bunch of extras if you want your subreddit to do knowledge accumulation, but phpBB gets you that with much less effort. (Could be as simple as having a culture of "There's already a thread for that here, you should add your post to it.")

Comment by john_maxwell on John_Maxwell's Shortform · 2020-09-16T05:52:27.210Z · LW · GW

Another point is that if LW and a hypothetical phpBB forum have different "cognitive styles", it could be valuable to keep both around for the sake of cognitive diversity.

Comment by john_maxwell on John_Maxwell's Shortform · 2020-09-11T20:55:24.404Z · LW · GW

Progress Studies: Hair Loss Forums

I still have about 95% of my hair. But I figure it's best to be proactive. So over the past few days I've been reading a lot about how to prevent hair loss.

My goal here is to get a broad overview (i.e. I don't want to put in the time necessary to understand what a 5-alpha-reductase inhibitor actually is, beyond just "an antiandrogenic drug that helps with hair loss"). I want to identify safe, inexpensive treatments that have both research and anecdotal support.

In the hair loss world, the "Big 3" refers to 3 well-known treatments for hair loss: finasteride, minoxidil, and ketoconazole. These treatments all have problems. Some finasteride users report permanent loss of sexual function. If you go off minoxidil, you lose all the hair you gained, and some say it wrinkles their skin. Ketoconazole doesn't work very well.

To research treatments beyond the Big 3, I've been using various tools, including both Google Scholar and a "custom search engine" I created for digging up anecdotes from forums. Basically, I take whatever query I'm interested in ("pumpkin seed oil" for instance), append a list of forum sites joined with OR, and then search on Google.
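A minimal sketch of building that kind of forum-restricted query, assuming it works by OR-ing together Google `site:` filters. The domains below are hypothetical placeholders, not the forums actually used, and exactly how Google parses such a query is up to Google.

```python
from urllib.parse import quote_plus

# Hypothetical placeholder domains; substitute the forums you actually want to search.
FORUMS = ["forum-one.example", "forum-two.example", "forum-three.example"]


def forum_query(topic, forums=FORUMS):
    """Append OR-ed site: filters so results come only from the listed forums."""
    sites = " OR ".join(f"site:{domain}" for domain in forums)
    return f"{topic} {sites}"


def google_search_url(query):
    """URL-encode the query into a Google search URL."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = forum_query("pumpkin seed oil")
print(query)
print(google_search_url(query))
```

Keeping the forum list in one place means each new treatment to research is just a new `topic` string.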

Doing this repeatedly has left me feeling like a geologist who's excavated a narrow stratigraphic column of Internet history.

And my big takeaway is how much dumber people got collectively between the "old school phpBB forum" layer and the "subreddit" layer.

This is a caricature, but I don't think it would be totally ridiculous to summarize discussion on /r/tressless as:

  1. Complaining about Big 3 side effects
  2. Complaining that the state of the art in hair loss hasn't advanced in the past 10 years
  3. Putdowns for anyone who tries anything which isn't the Big 3

If I were conspiracy-minded, I would wonder if Big 3 manufacturers had paid shills who trolled online forums making fun of anyone who tries anything which isn't their product. It's just the opposite of the behavior you'd expect based on game theory: Someone who tries something new individually runs the risk of new side effects, or wasting their time and money, with some small chance of making a big discovery which benefits the collective. So a rational forum user's response to someone trying something new should be: "By all means, please be the guinea pig". And yet that seems uncommon.

Compared with reddit, discussion of nonstandard treatments on old school forums goes into greater depth--I stumbled across a thread on an obscure treatment which was over 1000 pages long. And the old school forums have a higher capacity for innovation... here is a website that an old school forum user made for a DIY formula he invented, "Zix", which a lot of forum users had success with. (The site has a page explaining why we should expect the existence of effective hair loss treatments that the FDA will never approve.) He also links to a forum friend who started building and selling custom laser helmets for hair regrowth. (That's another weird thing about online hair loss forums: there's little discussion of laser hair regrowth, even though it's FDA approved, intuitively safe, and this review found it works better than finasteride or minoxidil.)

So what happened with the transition to reddit? Some hypotheses:

  • Generalized eternal September
  • Internet users have a shorter attention span nowadays
  • Upvoting/downvoting facilitates groupthink
  • reddit's "hot" algorithm discourages the production of deep content; the "bump"-driven discussion structure of old school forums allows for threads which are over 1000 pages long
  • Weaker community feel due to intermixing with the entire reddit userbase
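To make the "hot" versus "bump" contrast concrete, here is a sketch. The hot formula follows the version in reddit's formerly open-source codebase (the live site may have changed it since), and the two threads are hypothetical. Under hot ranking, a submission's position depends only on its net votes and original posting time, so new replies can't lift an old thread; under phpBB-style bump ordering, any reply brings it back to the top.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone

REDDIT_EPOCH = 1134028003  # offset used in reddit's old open-source "hot" code


def reddit_hot(ups, downs, posted):
    """Hot score as in reddit's formerly open-source code; today's ranking may differ."""
    s = ups - downs
    order = math.log10(max(abs(s), 1))
    sign = 1 if s > 0 else -1 if s < 0 else 0
    seconds = posted.timestamp() - REDDIT_EPOCH
    # Every 45000 s (12.5 h) of recency is worth a 10x difference in net votes.
    # Comments play no role in this score.
    return round(sign * order + seconds / 45000, 7)


@dataclass
class Thread:  # hypothetical example threads, not real data
    title: str
    ups: int
    downs: int
    posted: datetime
    last_reply: datetime


def hot_order(threads):
    return [t.title for t in sorted(threads, key=lambda t: reddit_hot(t.ups, t.downs, t.posted), reverse=True)]


def bump_order(threads):
    """phpBB-style: any new reply moves a thread back to the top."""
    return [t.title for t in sorted(threads, key=lambda t: t.last_reply, reverse=True)]


UTC = timezone.utc
threads = [
    Thread("Long-running treatment thread", 500, 20,
           datetime(2018, 1, 1, tzinfo=UTC), datetime(2020, 9, 11, 12, tzinfo=UTC)),
    Thread("New question", 5, 1,
           datetime(2020, 9, 11, 9, tzinfo=UTC), datetime(2020, 9, 11, 10, tzinfo=UTC)),
]
print(hot_order(threads))   # the old thread sinks despite its fresh reply
print(bump_order(threads))  # the fresh reply bumps the old thread to the top
```

This doesn't show that hot ranking causes shallower discussion; it only shows the mechanical difference the hypothesis rests on.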

I'm starting to wonder if we should set up a phpBB-style AI safety discussion forum. I have hundreds of thousands of words of AI content in my personal notebook, only a small fraction of which I've published. Posting to LW seems to be a big psychological speed bump for me. And I'm told that discussion on the Alignment Forum represents a fairly narrow range of perspectives within the broader AI safety community, perhaps because of the "upvoting/downvoting facilitates groupthink" thing.

The advantage of upvoting/downvoting seems to be a sort of minimal quality control--there is less vulnerability to individual fools as described in this post. But I'm starting to wonder if some of the highs got eliminated along with the lows.

Anyway, please send me a message if an AI safety forum sounds interesting to you.

Comment by john_maxwell on The Box Spread Trick: Get rich slightly faster · 2020-09-03T05:14:45.349Z · LW · GW

Does anyone have thoughts on whether buying Treasury Inflation-Protected Securities (probably in the form of an ETF) on margin would be a good way to hedge against currency devaluation?

Comment by john_maxwell on ricraz's Shortform · 2020-08-27T04:36:06.301Z · LW · GW

There's been a fair amount of discussion of that sort of thing here: There are also groups outside LW thinking about social technology such as RadicalxChange.

Imagine you took 5 separate LWers and asked them to create a unified consensus response to a given article. My guess is that they’d learn more through that collective effort, and produce a more useful response, than if they spent the same amount of time individually evaluating the article and posting their separate replies.

I'm not sure. If you put those 5 LWers together, I think there's a good chance that the highest status person speaks first and then the others anchor on what they say and then it effectively ends up being like a group project for school with the highest status person in charge. Some related links.

Comment by john_maxwell on ricraz's Shortform · 2020-08-26T11:52:22.090Z · LW · GW
  1. All else equal, the harder something is, the less we should do it.

  2. My quick take is that writing lit reviews/textbooks is a comparative disadvantage of LW relative to the mainstream academic establishment.

In terms of producing reliable knowledge... if people actually care about whether something is true, they can always offer a cash prize for the best counterargument (which could of course constitute citation of academic research). The fact that people aren't doing this suggests to me that for most claims on LW, there isn't any (reasonably rich) person who cares deeply re: whether the claim is true. I'm a little wary of putting a lot of effort into supply if there is an absence of demand.

(I guess the counterargument is that accurate knowledge is a public good, so an individual's willingness to pay doesn't give you a complete picture of the value accurate knowledge brings. Maybe what we need is a way to crowdfund bounties for the best argument related to something.)

(I agree that LW authors would ideally engage more with each other and academic literature on the margin.)

Comment by john_maxwell on Learning human preferences: black-box, white-box, and structured white-box access · 2020-08-26T11:02:04.690Z · LW · GW

Let's say I'm trying to describe a hockey game. Modularizing the preferences from other aspects of the team algorithm makes it much easier to describe what happens at the start of the second period, when the two teams switch sides.

The fact that humans find an abstraction useful is evidence that an AI will as well. The notion that agents have preferences helps us predict how people will change their plans for achieving their goals when they receive new information. Same for an AI.

Comment by john_maxwell on ricraz's Shortform · 2020-08-26T08:10:34.050Z · LW · GW

Fair enough. I'm reminded of a time someone summarized one of my posts as being a definitive argument against some idea X and me thinking to myself "even I don't think my post definitively settles this issue" haha.

Comment by john_maxwell on ricraz's Shortform · 2020-08-26T05:41:09.199Z · LW · GW

LW doesn't have enough depth because people don't care enough about depth - they're willing to accept ideas even before they've been explored in depth. If this explanation is correct, then it seems accurate to call it a problem with our epistemic standards - specifically, the standard of requiring (and rewarding) deep investigation and scholarship.

Your solution to the "willingness to accept ideas even before they've been explored in depth" problem is to explore ideas in more depth. But another solution is to accept fewer ideas, or hold them much more provisionally.

I'm a proponent of the second approach because:

  • I suspect even academia doesn't hold ideas as provisionally as it should. See Hamming on expertise:

  • I suspect trying to browbeat people to explore ideas in more depth works against the grain of an online forum as an institution. Browbeating works in academia because your career is at stake, but in an online forum, it just hurts intrinsic motivation and cuts down on forum use (the forum runs on what Clay Shirky called "cognitive surplus", essentially a term for people's spare time and motivation). I'd say one big problem with LW 1.0 that LW 2.0 had to solve before flourishing was that people felt too browbeaten to post much of anything.

If we accept fewer ideas / hold them much more provisionally, but provide a clear path to having an idea be widely held as true, that creates an incentive for people to try & jump through hoops--and this incentive is a positive one, not a punishment-driven browbeating incentive.

Maybe part of the issue is that on LW, peer review generally happens in the comments after you publish, not before. So there's no publication carrot to offer in exchange for overcoming the objections of peer reviewers.

Comment by john_maxwell on ricraz's Shortform · 2020-08-26T05:27:15.969Z · LW · GW

One proven claim is worth a dozen compelling hypotheses, but LW to a first approximation only produces the latter.

Depends on the claim, right?

If the cost of evaluating a hypothesis is high, and hypotheses are cheap to generate, I would like to generate a great deal before selecting one to evaluate.

Comment by john_maxwell on "Good judgement" and its components · 2020-08-26T04:49:47.818Z · LW · GW

So much AI safety literature is based around reinforcement learning, but it seems like an impoverished model for describing how humans plan. I have a feeling RL will ultimately be left behind in the same way e.g. SVMs have been left behind.

Comment by john_maxwell on How much can surgical masks help with wildfire smoke? · 2020-08-26T04:40:08.942Z · LW · GW

Sorry to hear about your asthma.

The Powecom KN95s sold here did very well in unofficial government KN95 tests and breathe easier than my P100 respirator:

Adding this improves the seal:

Comment by john_maxwell on On Suddenly Not Being Able to Work · 2020-08-26T04:23:38.012Z · LW · GW

Some ideas:

  • Take some kind of antianxiety thing like ashwagandha or theanine (see
  • Have a designated place in your house for sitting at your computer, where work is the only allowed activity. You can take a break, but it has to be away from your computer (e.g. walk around the neighborhood). Or you can goof off on your computer, but you have to move it out of the work place first. Then make rules for moving out of the work place (e.g. a required 10-minute wait).
  • When you relax in the evening, select breaks that entrain longer relaxation (e.g. a full TV show instead of short youtube videos). Don't allow yourself to worry about work after 10 pm say--the idea is to retrain your concentration ability through an activity that rewards concentration.