Posts

What AI safety problems need solving for safe AI research assistants? 2019-11-05T02:09:17.686Z · score: 15 (4 votes)
The problem/solution matrix: Calculating the probability of AI safety "on the back of an envelope" 2019-10-20T08:03:23.934Z · score: 24 (8 votes)
The Dualist Predict-O-Matic ($100 prize) 2019-10-17T06:45:46.085Z · score: 17 (6 votes)
Replace judges with Keynesian beauty contests? 2019-10-07T04:00:37.906Z · score: 28 (9 votes)
Three Stories for How AGI Comes Before FAI 2019-09-17T23:26:44.150Z · score: 28 (9 votes)
How to Make Billions of Dollars Reducing Loneliness 2019-08-30T17:30:50.006Z · score: 60 (27 votes)
Response to Glen Weyl on Technocracy and the Rationalist Community 2019-08-22T23:14:58.690Z · score: 52 (26 votes)
Proposed algorithm to fight anchoring bias 2019-08-03T04:07:41.484Z · score: 10 (2 votes)
Raleigh SSC/LW/EA Meetup - Meet MealSquares People 2019-05-08T00:01:36.639Z · score: 12 (3 votes)
The Case for a Bigger Audience 2019-02-09T07:22:07.357Z · score: 69 (27 votes)
Why don't people use formal methods? 2019-01-22T09:39:46.721Z · score: 21 (8 votes)
General and Surprising 2017-09-15T06:33:19.797Z · score: 3 (3 votes)
Heuristics for textbook selection 2017-09-06T04:17:01.783Z · score: 8 (8 votes)
Revitalizing Less Wrong seems like a lost purpose, but here are some other ideas 2016-06-12T07:38:58.557Z · score: 24 (29 votes)
Zooming your mind in and out 2015-07-06T12:30:58.509Z · score: 8 (9 votes)
Purchasing research effectively open thread 2015-01-21T12:24:22.951Z · score: 12 (13 votes)
Productivity thoughts from Matt Fallshaw 2014-08-21T05:05:11.156Z · score: 13 (14 votes)
Managing one's memory effectively 2014-06-06T17:39:10.077Z · score: 14 (15 votes)
OpenWorm and differential technological development 2014-05-19T04:47:00.042Z · score: 6 (7 votes)
System Administrator Appreciation Day - Thanks Trike! 2013-07-26T17:57:52.410Z · score: 70 (71 votes)
Existential risks open thread 2013-03-31T00:52:46.589Z · score: 10 (11 votes)
Why AI may not foom 2013-03-24T08:11:55.006Z · score: 23 (35 votes)
[Links] Brain mapping/emulation news 2013-02-21T08:17:27.931Z · score: 2 (7 votes)
Akrasia survey data analysis 2012-12-08T03:53:35.658Z · score: 13 (14 votes)
Akrasia hack survey 2012-11-30T01:09:46.757Z · score: 11 (14 votes)
Thoughts on designing policies for oneself 2012-11-28T01:27:36.337Z · score: 80 (80 votes)
Room for more funding at the Future of Humanity Institute 2012-11-16T20:45:18.580Z · score: 18 (21 votes)
Empirical claims, preference claims, and attitude claims 2012-11-15T19:41:02.955Z · score: 5 (28 votes)
Economy gossip open thread 2012-10-28T04:10:03.596Z · score: 23 (30 votes)
Passive income for dummies 2012-10-27T07:25:33.383Z · score: 17 (22 votes)
Morale management for entrepreneurs 2012-09-30T05:35:05.221Z · score: 9 (14 votes)
Could evolution have selected for moral realism? 2012-09-27T04:25:52.580Z · score: 4 (14 votes)
Personal information management 2012-09-11T11:40:53.747Z · score: 18 (19 votes)
Proposed rewrites of LW home page, about page, and FAQ 2012-08-17T22:41:57.843Z · score: 18 (19 votes)
[Link] Holistic learning ebook 2012-08-03T00:29:54.003Z · score: 10 (17 votes)
Brainstorming additional AI risk reduction ideas 2012-06-14T07:55:41.377Z · score: 12 (15 votes)
Marketplace Transactions Open Thread 2012-06-02T04:31:32.387Z · score: 29 (30 votes)
Expertise and advice 2012-05-27T01:49:25.444Z · score: 17 (22 votes)
PSA: Learn to code 2012-05-25T18:50:01.407Z · score: 34 (39 votes)
Knowledge value = knowledge quality × domain importance 2012-04-16T08:40:57.158Z · score: 8 (13 votes)
Rationality anecdotes for the homepage? 2012-04-04T06:33:32.097Z · score: 3 (8 votes)
Simple but important ideas 2012-03-21T06:59:22.043Z · score: 20 (25 votes)
6 Tips for Productive Arguments 2012-03-18T21:02:32.326Z · score: 30 (45 votes)
Cult impressions of Less Wrong/Singularity Institute 2012-03-15T00:41:34.811Z · score: 34 (59 votes)
[Link, 2011] Team may be chosen to receive $1.4 billion to simulate human brain 2012-03-09T21:13:42.482Z · score: 8 (15 votes)
Productivity tips for those low on motivation 2012-03-06T02:41:20.861Z · score: 7 (12 votes)
The Singularity Institute has started publishing monthly progress reports 2012-03-05T08:19:31.160Z · score: 21 (24 votes)
Less Wrong mentoring thread 2011-12-29T00:10:58.774Z · score: 31 (34 votes)
Heuristics for Deciding What to Work On 2011-06-01T07:31:17.482Z · score: 20 (23 votes)
Upcoming meet-ups: Auckland, Bangalore, Houston, Toronto, Minneapolis, Ottawa, DC, North Carolina, BC... 2011-05-21T05:06:08.824Z · score: 5 (8 votes)

Comments

Comment by john_maxwell on Robin Hanson on the futurist focus on AI · 2019-11-16T02:23:31.743Z · score: 2 (1 votes) · LW · GW

Recent paper that might be relevant:

https://arxiv.org/abs/1911.01547

Comment by john_maxwell on What AI safety problems need solving for safe AI research assistants? · 2019-11-14T22:15:54.222Z · score: 2 (1 votes) · LW · GW

Those specific failure modes seem to me like potential convergent instrumental goals of arbitrarily capable systems that "want to affect the world" and are in an air-gapped computer.

My understanding is that convergent instrumental goals are goals which are useful to agents with a broad variety of utility functions over different states of matter. I'm not sure how the concept applies in other cases. Like, if we aren't using RL, and there is no unintended optimization, why specifically would there be pressure to achieve convergent instrumental goals? (I'm not trying to be rhetorical or antagonistic--I really want to hear if you can think of something.)

I'm interested in #1. It seems like the most promising route is to prevent unintended optimization from arising in the first place, instead of trying to outwit a system that's potentially smarter than we are.

Comment by john_maxwell on Open & Welcome Thread - November 2019 · 2019-11-13T02:27:22.288Z · score: 2 (1 votes) · LW · GW

I'm not sure I'm familiar with the word "mixture" in the way you're using it.

Comment by john_maxwell on What AI safety problems need solving for safe AI research assistants? · 2019-11-13T02:23:52.670Z · score: 2 (1 votes) · LW · GW

Do you have any thoughts on how specifically those failure modes might come about?

Comment by john_maxwell on What AI safety problems need solving for safe AI research assistants? · 2019-11-12T09:51:43.111Z · score: 2 (1 votes) · LW · GW

I agree well-calibrated uncertainties are quite valuable, but I'm not convinced they are essential for this sort of application. For example, suppose my assistant tells me a story about how my proposed FAI could fail. If my assistant is overconfident in its pessimism, then the worst case is that I spend a lot of time thinking about the failure mode without seeing how it could happen (not that bad). If my assistant is underconfident, and tells me a failure mode is 5% likely when it's really 95% likely, it still feels like my assistant is being overall helpful if the failure case is one I wasn't previously aware of. To put it another way, if my assistant isn't calibrated, it feels like I should just be able to ignore its probability estimates and still get good use out of it.

but eventually we want to switch over to a more scalable approach that will use few of the same tools.

I actually think the advisor approach might be scalable, if advisor_1 has been hand-verified, and advisor_1 verifies advisor_2, who verifies advisor_3, etc.

Comment by john_maxwell on What AI safety problems need solving for safe AI research assistants? · 2019-11-12T09:47:12.646Z · score: 2 (1 votes) · LW · GW

Are you referring to the possibility of unintended optimization, or is there something more?

Comment by john_maxwell on Notes on Running Objective · 2019-11-12T09:14:22.681Z · score: 2 (1 votes) · LW · GW

Hey Jeff, for whatever it's worth, I had some really bad RSI a few years ago and discovering painscience.com has allowed me to almost completely cure it. Happy to chat more about this if you'd like, feel free to reach out!

Comment by john_maxwell on Open & Welcome Thread - November 2019 · 2019-11-12T04:25:26.829Z · score: 2 (1 votes) · LW · GW

See that unbiased "prior-free" estimates must be mixtures of the (unbiased) estimates

I don't follow.

assembly of (higher-variance) estimates

What's an "assembly of estimates"?

treating the two as independent pieces of evidence

But they're distributions, not observations.

Comment by john_maxwell on Open & Welcome Thread - November 2019 · 2019-11-10T17:33:24.428Z · score: 2 (1 votes) · LW · GW

and that their choice of weights minimizes the error

The author has selected a weighted average such that if we treat that weighted average as a random variable, its standard deviation is minimized. But if we just want a random variable whose standard deviation is minimized, we could have a distribution which assigns 100% credence to the number 0 and be done with it. In other words, my question is whether the procedure in this post can be put on a firmer philosophical foundation. Or whether there is some alternate derivation/problem formulation (e.g. a mixture model) that gets us the same formula.

Another way of getting at the same idea: There are potentially other procedures one could use to create a "blended estimate", for example, you could find the point such that the product of the likelihoods of the two distributions is maximized, or take a weighted average of the two estimates using e.g. (1/sigma) as the weight of each estimate. Is there a justification for using this particular loss function, of finding a random variable constructed via weighted average whose variance is minimized? It seems to me that this procedure is a little weird because it's the random variable that corresponds to the person's age that we really care about. We should be looking "upstream" of the estimates, but instead we're going "downstream" (where up/down stream roughly correspond to the direction of arrows in a Bayesian network).
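
To make the comparison concrete, here's a quick numerical check (my own toy sketch; the numbers, and the assumption that both estimates are independent Gaussians with known standard deviations, are made up). At least in that Gaussian case, the minimum-variance weighted average and the point maximizing the product of the two likelihoods turn out to be the same thing:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Two independent, unbiased estimates of the same quantity (e.g. a person's age),
# modeled as Gaussians with known standard deviations. Numbers are made up.
x1, sigma1 = 30.0, 4.0
x2, sigma2 = 38.0, 2.0

# Procedure 1: the weighted average whose variance is minimized.
# For independent estimates this gives inverse-variance weights.
w1 = (1 / sigma1**2) / (1 / sigma1**2 + 1 / sigma2**2)
blended = w1 * x1 + (1 - w1) * x2

# Procedure 2: the point maximizing the product of the two Gaussian likelihoods
# (equivalently, up to constants, minimizing the summed negative log-likelihoods).
neg_log_lik = lambda mu: ((mu - x1) / sigma1) ** 2 + ((mu - x2) / sigma2) ** 2
mle = minimize_scalar(neg_log_lik).x

print(blended, mle)  # both ~36.4; the two procedures coincide for Gaussians
```

So at least those two procedures agree in the Gaussian case, though that still doesn't settle which problem formulation we should be starting from.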

Comment by john_maxwell on [Team Update] Why we spent Q3 optimizing for karma · 2019-11-09T06:11:41.003Z · score: 4 (0 votes) · LW · GW

Cool project!

I suggest you make Q3 "growth quarter" every year, and always aim to achieve 1.5x the amount of growth you were able to achieve during last year's "growth quarter".

You could have an open thread soliciting growth ideas from the community right before each "growth quarter".

Comment by john_maxwell on Open & Welcome Thread - November 2019 · 2019-11-08T00:50:03.009Z · score: 9 (5 votes) · LW · GW

Woah, this seems like a big jump to a form of technocracy / paternalism that I would think would typically require more justification than spending a short amount of time brainstorming in a comment thread why the thing millions of people use daily is actually bad.

Under what circumstances do you feel introducing new policy ideas with the preface "maybe this could be a good idea" is acceptable?

I don't expect anyone important to be reading this thread, certainly not important policymakers. Even if they were, I think it was pretty clear I was spitballing.

Like, banning sites from offering free services if a character limit is involved because high status members of communities you like enjoy such sites

If society's elites are incentivized to use a platform which systematically causes misunderstandings and strife for no good reason, that seems bad.

Now, one counterargument would be that "coordination problems" mean those writers would prefer to write somewhere else. But presumably if anyone's aware of "inadequate equilibria" like this and able to avoid it, it would be Eliezer.

Let's not fall prey to the halo effect. Eliezer also wrote a long post about the necessity of back-and-forth debate, and he's using a platform which is uniquely bad at this. At some point, one starts to wonder whether Eliezer is a mortal human being who suffers from akrasia and biases just like the rest of us.

I agree it's not the best possible version of a platform by any means, but to say it's obviously net negative seems like a stretch without further evidence.

I didn't make much of an effort to assemble arguments that Twitter is bad. But I think there are good arguments out there. How do you feel about the nuclear diplomacy that's happened on Twitter?

Comment by john_maxwell on Open & Welcome Thread - November 2019 · 2019-11-07T05:22:25.913Z · score: 3 (2 votes) · LW · GW

Twitter's usefulness mostly comes from the celebrities being there. The initial reason the celebrities were attracted probably had to do with the char limit, its pretext, that they are not expected to read too much and that they are not expected to write too much.

Interesting. So if Twitter's celebrity appeal is about brevity, and brevity is a big part of why the platform is so toxic, maybe the best outcome is for it to just get regulated out of existence? Like, we could pass a law that prohibits character limits on social media sites or something.

Comment by john_maxwell on Open & Welcome Thread - November 2019 · 2019-11-06T15:52:42.870Z · score: 2 (1 votes) · LW · GW

Have any of these people said why they have made that choice?

Not that I've seen, but I have seen Paul Graham tweet complaints about Twitter, and I think I've seen Eliezer complain about the behavior of users on the platform.

Twitter makes tweet replies less prominent than Facebook does. When I'm scrolling down Eliezer's Facebook feed, I see comment replies to his posts. When I scroll down his Twitter feed, I have to click a tweet to see replies. So that could be playing a role.

It doesn't seem socially expected to reply to replies to your tweets.

Comment by john_maxwell on Open & Welcome Thread - November 2019 · 2019-11-06T05:47:24.011Z · score: 3 (2 votes) · LW · GW

Can anyone think of a theoretical justification (Bayesian, Frequentist, whatever) for the procedure described in this blog post? I think this guy invented it for himself -- searching on Google for "blended estimate" just sends me to his post.

Comment by john_maxwell on Open & Welcome Thread - November 2019 · 2019-11-06T05:37:36.533Z · score: 34 (13 votes) · LW · GW

Eliezer Yudkowsky and Paul Graham have a lot in common. They're both well-known bloggers who write about rationality and are influential in Silicon Valley. They're both known for Bayesian stuff (Graham was a pioneer of Bayesian spam filtering). They both played a role in creating discussion sites which are, in my opinion, among the best on the internet (Less Wrong for Eliezer, Hacker News for Paul Graham). And they've both stopped posting to the sites they created, but they both still post to... Twitter, which is, in my opinion, one of the worst discussion sites on the internet. (Here is one of many illustrations.)

It seems like having so many celebrities, scientists, and politicians is a major asset for Twitter. What is it about Twitter which makes big names want to post there? How could a rival site attract big names without also importing Twitter's pathologies?

Comment by john_maxwell on The Dualist Predict-O-Matic ($100 prize) · 2019-10-27T22:41:04.973Z · score: 3 (2 votes) · LW · GW

I wouldn't argue that self-aware systems are automatically dangerous, but rather that self-unaware systems are automatically safe (or at least comparatively pretty safe).

Fair enough.

Most people in AI safety, most of the time, are talking about self-aware (in my minimal sense of taking purposeful actions etc.) agent-like systems. I don't think such systems are automatically dangerous, but they do necessitate solving the alignment problem, and since we haven't solved the alignment problem yet, I think it's worth spending time exploring alternative approaches.

I suspect the important part is the agent-like part.

I'm not sure it makes sense to think of "the alignment problem" as a single entity. I'd rather taboo "the alignment problem" and just ask what could go wrong with a self-aware system that's not agent-like.

A self-unaware system will not do that because it is not aware that it can do things to affect the universe.

Hot take: it might be useful to think of "self-awareness" and "awareness that it can do things to affect the universe" separately. Not sure they are one and the same.

Comment by john_maxwell on Prediction markets for internet points? · 2019-10-27T21:07:42.055Z · score: 3 (2 votes) · LW · GW

This reminds me of the ranking system Metaculus has. Maybe they could qualify for the bounty by adding a few features.

Comment by john_maxwell on The Dualist Predict-O-Matic ($100 prize) · 2019-10-26T15:57:02.605Z · score: 2 (1 votes) · LW · GW

I suspect we're using SGD in different ways, because everything we've talked about seems like it could be implemented with SGD. Do you agree that letting the Predict-O-Matic predict the future and rewarding it for being right, RL-style, would lead to it finding fixed points? Because you can definitely use SGD to do RL (first google result).

Fair enough, I was thinking about supervised learning.

Comment by john_maxwell on The Dualist Predict-O-Matic ($100 prize) · 2019-10-23T23:05:20.209Z · score: 2 (1 votes) · LW · GW

I think it depends on internal details of the Predict-O-Matic's prediction process. If it's still using SGD, SGD is not going to play the future forward to see the new feedback mechanism you've described and incorporate it into the loss function which is being minimized. However, it's conceivable that given a dataset about its own past predictions and how they turned out, the Predict-O-Matic might learn to make its predictions "more self-fulfilling" in order to minimize loss on that dataset?

Comment by john_maxwell on The Dualist Predict-O-Matic ($100 prize) · 2019-10-23T23:00:13.853Z · score: 2 (1 votes) · LW · GW

Studying the possibility of self-aware systems seems like a good idea, but I have a feeling most ways to achieve this will be brittle. My objective with this post was to get crisp stories for why self-aware predictive systems should be considered dangerous.

My reason is that such AIs will have a general capability to find underlying patterns, and thus will discover an analogy between its own thoughts and actions and those of others.

Let's taboo introspection for a minute. Suppose the AI does discover some underlying patterns and analogizes the piece of matter in which it is encased with the thoughts and actions of its human operator. Not only that, it finds analogies between other computers and its human operator, between its human operator and other computers, etc. Why precisely is this a problem?

Comment by john_maxwell on Deliberation as a method to find the "actual preferences" of humans · 2019-10-23T06:30:52.721Z · score: 3 (2 votes) · LW · GW

Speed. In AI takeoff scenarios where a bunch of different AIs are competing with each other, the deliberation process must produce some answer quickly or produce successive answers as time goes on (in order to figure out which resources are worth acquiring). On the other hand, in takeoff scenarios where the first successful project achieves a decisive strategic advantage, the deliberation can take its time.

I suspect a better way to think about this is the quality of the deliberation process as a function of the time available for deliberation; the time available for deliberation might itself vary over time (pre- vs. post-acquisition of a decisive strategic advantage).

Comment by john_maxwell on The Dualist Predict-O-Matic ($100 prize) · 2019-10-22T23:33:08.867Z · score: 2 (1 votes) · LW · GW

it seems plausible that someone training the Predict-O-Matic like that would think they're doing supervised learning, while they're actually closer to RL.

How's that?

Comment by john_maxwell on Where to absolutely start? · 2019-10-22T04:57:39.152Z · score: 4 (3 votes) · LW · GW

Did you see the list of guides to the LW archives in the FAQ here? Maybe one of them is what you're looking for:

https://wiki.lesswrong.com/wiki/FAQ#How_can_I_go_about_reading_the_Less_Wrong_archives.3F

Comment by john_maxwell on The problem/solution matrix: Calculating the probability of AI safety "on the back of an envelope" · 2019-10-22T04:30:17.573Z · score: 3 (2 votes) · LW · GW

Anyone have a sense of what distinct set of AI safety problems we're faced with?

See section 4, "Problems with AGI", in this review for a list of lists.

However, I suspect the thing would work best in conjunction with a particular proposed FAI design, where each column corresponds to a potential safety problem people are worried about for that design.

Comment by john_maxwell on We tend to forget complicated things · 2019-10-21T00:28:40.289Z · score: 4 (3 votes) · LW · GW

I agree in general, but I think you can also retain a general sense of what a particular topic is about, so if you come across a situation that calls for that topic, you'll know to read up on it and remind yourself.

Comment by john_maxwell on Mediums Overpower Messages · 2019-10-21T00:26:54.103Z · score: 13 (9 votes) · LW · GW

I find myself agreeing if you replace "makes me dumber/smarter" with "shortens/lengthens my attention span".

Comment by john_maxwell on The Dualist Predict-O-Matic ($100 prize) · 2019-10-20T07:17:44.585Z · score: 2 (1 votes) · LW · GW

No, but it does imply that it has the information about its own prediction process encoded in its weights such that there's no reason it would have to encode that information twice by also re-encoding it as part of its knowledge of the world as well.

OK, it sounds like we agree then? Like, the Predict-O-Matic might have an unusually easy time modeling itself in certain ways, but other than that, it doesn't get special treatment because it has no special awareness of itself as an entity?

Edit: Trying to provide an intuition pump for what I mean here--in order to avoid duplicating information, I might assume that something which looks like a stapler behaves the same way as other things I've seen which look like staplers--but that doesn't mean I think all staplers are the same object. It might in some cases be sensible to notice that I keep seeing a stapler lying around and hypothesize that there's just one stapler which keeps getting moved around the office. But that requires that I perceive the stapler as an entity every time I see it, so entities which were previously separate in my head can be merged. Whereas arguendo, my prediction machinery isn't necessarily an entity that I recognize; it's more like the water I'm swimming in, in some sense.

Comment by john_maxwell on The Dualist Predict-O-Matic ($100 prize) · 2019-10-19T21:26:36.934Z · score: 1 (2 votes) · LW · GW

I think maybe what you're getting at is that if we try to get a machine learning model to predict its own predictions (i.e. we give it a bunch of data which consists of labels that it made itself), it will do this very easily. Agreed. But that doesn't imply it's aware of "itself" as an entity. And in some cases the relevant aspect of its internals might not be available as a conceptual building block. For example, a model trained using stochastic gradient descent is not necessarily better at understanding or predicting a process which is very similar to stochastic gradient descent.

Furthermore, suppose that we take the weights for a particular model, mask some of those weights out, use them as the labels y, and try to predict them using the other weights in that layer as features x. The model will perform terribly on this because it's not the task that it was trained for. It doesn't magically have the "self-awareness" necessary to see what's going on.

In order to be crisp about what could happen, your explanation also has to account for what clearly won't happen.

BTW this thread also seems relevant: https://www.lesswrong.com/posts/RmPKdMqSr2xRwrqyE/the-dualist-predict-o-matic-usd100-prize#AvbnFiKpJxDqM8GYh

Comment by john_maxwell on The Dualist Predict-O-Matic ($100 prize) · 2019-10-19T05:07:17.257Z · score: 2 (1 votes) · LW · GW

But if we put a Predict-O-Matic in the real world, let it generate predictions, and then define the loss according to what happens afterwards, a non-dualistic Predict-O-Matic will be selected for over dualistic variants.

Yes, that sounds more like reinforcement learning. It is not the design I'm trying to point at in this post.

If you still disagree with that, what do you think would happen (in the limit of infinite training time) with an algorithm that just made a random change proportional to how wrong it was, at every training step?

That description sounds a lot like SGD. I think you'll need to be crisper for me to see what you're getting at.

Comment by john_maxwell on The Dualist Predict-O-Matic ($100 prize) · 2019-10-19T04:46:00.838Z · score: 1 (2 votes) · LW · GW

When you wrote

having an "ego" which identifies itself with its model of itself significantly reduces description length by not having to duplicate a bunch of information about its own decision-making process.

that suggested to me that there were 2 instances of this info about Predict-O-Matic's decision-making process in the dataset whose description length we're trying to minimize. "De-duplication" only makes sense if there's more than one. Why is there more than one?

We might not have good models of brains, but we do have very good models of ourselves, which is the actual analogy here. You don't have to have a good model of your brain to have a good model of yourself, and to identify that model of yourself with your own actions (i.e. the thing you called an "ego").

Sometimes people take psychedelic drugs/meditate and report an out-of-body experience, oneness with the universe, ego dissolution, etc. This suggests to me that ego is an evolved adaptation rather than a necessity for cognition. A clue is the fact that our ego extends to all parts of our body, even those which aren't necessary for computation (but are necessary for survival & reproduction).

there is a massive duplication of information between the part of the model that encodes its prediction machinery and the part that encodes its model of itself.

The prediction machinery is in code, but this code isn't part of the info whose description length we're attempting to minimize, unless we take special action to include it in that info. That's the point I was trying to make previously.

Compression has important similarities to prediction. In compression terms, your argument is essentially that if we use zip to compress its own source code, it will be able to do so using a very small number of bytes, because it "already knows about itself".

Comment by john_maxwell on The Dualist Predict-O-Matic ($100 prize) · 2019-10-18T05:42:46.665Z · score: 2 (1 votes) · LW · GW

So most animals don't seem very introspective. Machine learning algorithms haven't shown spontaneous capacity for introspection (so far, that I know of). But humans can introspect. Maybe a crux here is something along the lines of: humans have the capacity for introspection. They're also smarter than animals. Maybe once our ML algorithms get good enough, the capacity for introspection will spontaneously arise.

People should be thinking about this possibility. But we also have ML algorithms which are in some ways superhuman, and like I said, I know of no instances of spontaneous emergence of introspection. It seems like a reasonably likely possibility to me that "intelligence" of the sort needed for cross-domain superhuman prediction ability and spontaneous emergence of introspection are, in fact, orthogonal axes.

In terms of non-spontaneous emergence of introspection, that's basically meta-learning. I agree meta-learning is probably super important for the future of AI. In fact, come to think of it, I wonder if the reason why humans are both smart and introspective is because our brains evolved some additional meta-learning capabilities! And I agree that your idea of having some kind of firewall between introspection and object-level reality models could help prevent problems. I've spent a little while thinking concretely about how this could work.

(Hoping to run a different competition related to these issues at some time in the future. Was thinking it would be bigger and longer--please PM me if you want to help contribute to the prize pool.)

Comment by john_maxwell on The Dualist Predict-O-Matic ($100 prize) · 2019-10-18T05:26:48.936Z · score: 2 (1 votes) · LW · GW

it just tries to find a model that generates a lot of reward

SGD searches for a set of parameters which minimize a loss function. Selection, not control.

If the Predict-O-Matic has a model that makes bad prediction (i.e. looks bad), that model will be selected against.

Only if that info is included in the dataset that SGD is trying to minimize a loss function with respect to.

And if it accidentally stumbled upon a model that could correctly think about it's own behaviour in a non-dualist fashion, and find fixed points, that model would be selected for (since its predictions come true).

Suppose we're running SGD trying to find a model which minimizes the loss over a set of (situation, outcome) pairs. Suppose some of the situations are situations in which the Predict-O-Matic made a prediction, and that prediction turned out to be false. It's conceivable that SGD could learn that the Predict-O-Matic predicting something makes it less likely to happen and use that as a feature. However, this wouldn't be helpful because the Predict-O-Matic doesn't know what prediction it will make at test time. At best it could infer that some of its older predictions will probably end up being false and use that fact to inform the thing it's currently trying to predict.

If we only train it on data where it can't affect the data that it's evaluated against, and then freeze the model, I agree that it probably won't exhibit this kind of behaviour; is that the scenario that you're thinking about?

Not necessarily. The scenario I have in mind is the standard ML scenario, where SGD is just trying to find some parameters that minimize a loss function which is supposed to approximate the predictive accuracy of those parameters. Then we use those parameters to make predictions. SGD isn't concerned with future hypothetical rounds of SGD on future hypothetical datasets. In some sense, it's not even concerned with predictive accuracy except insofar as the training data happens to generalize to new data.
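
To be concrete about what I mean by "the standard ML scenario", here's a minimal sketch (a toy example of my own, not anything from the post): SGD over a fixed dataset of (situation, outcome) pairs, with the parameters frozen afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed, historical dataset of (situation, outcome) pairs -- toy numbers.
X = rng.normal(size=(200, 3))                                          # situations
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)   # outcomes

# Plain SGD on squared error over this fixed dataset.
w = np.zeros(3)
lr = 0.01
for epoch in range(50):
    for i in rng.permutation(len(X)):
        grad = 2 * (X[i] @ w - y[i]) * X[i]   # gradient of (prediction - outcome)^2
        w -= lr * grad

# The loss only ever references the recorded outcomes above. Nothing in this loop
# models how future outcomes might react to the predictions themselves; at test
# time we just freeze w and read off predictions.
new_situation = rng.normal(size=3)
print(new_situation @ w)
```

Nothing in the loss refers to how future outcomes might respond to the model's own predictions; that feedback only enters the picture if we deliberately put such data into the training set.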

If you think including historical observations of a Predict-O-Matic (which happens to be 'oneself') making bad (or good) predictions in the Predict-O-Matic's training dataset will cause a catastrophe, that's within the range of scenarios I care about, so please do explain!

By the way, if anyone wants to understand the standard ML scenario more deeply, I recommend this class.

Comment by john_maxwell on The Dualist Predict-O-Matic ($100 prize) · 2019-10-18T05:02:52.990Z · score: 3 (3 votes) · LW · GW

In particular, I think we should expect ML to be biased towards simple functions such that if there's a simple and obvious compression, then you should expect ML to take it.

Yes, for the most part.

In particular, having an "ego" which identifies itself with its model of itself significantly reduces description length by not having to duplicate a bunch of information about its own decision-making process.

I think maybe you're pre-supposing what you're trying to show. Most of the time, when I train a machine learning model on some data, that data isn't data about the ML training algorithm or model itself. This info is usually not part of the dataset whose description length the system is attempting to minimize.

A machine learning model doesn't get understanding of or data about its code "for free", in the same way we don't get knowledge of how brains work "for free" despite the fact that we are brains. Humans get self-knowledge in basically the same way we get any other kind of knowledge--by making observations. We aren't expert neuroscientists from birth. Part of what I'm trying to indicate with the "dualist" term is that this Predict-O-Matic is the same way, i.e. its position with respect to itself is similar to the position of an aspiring neuroscientist with respect to their own brain.

Comment by john_maxwell on The Dualist Predict-O-Matic ($100 prize) · 2019-10-18T04:50:22.967Z · score: 4 (2 votes) · LW · GW

Intuitively, things go wrong if you get unexpected, unwanted, potentially catastrophic behavior. Basically, if it's something we'd want to fix before using this thing in production. I think most of your bullet points qualify, but if you give an example which falls under one of those bullet points, yet doesn't seem like it'd be much of a concern in practice (very little catastrophic potential), that might not get a prize.

In particular, inner misalignment seems like something you aren't including in your "going wrong"? (Since it seems like an easy answer to your challenge.)

Thanks for bringing that up. Yes, I am looking specifically for defeaters aimed in the general direction of the points I made in this post. Bringing up generic widely known safety concerns that many designs are potentially susceptible to does not qualify.

I note that the recursive-decomposition type system you describe is very different from most modern ML, and different from the "basically gradient descent" sort of thing I was imagining in the story. (We might naturally suppose that Predict-O-Matic has some "secret sauce" though.)

I think there's potentially an analogy with attention in the context of deep learning, but it's pretty loose.

It seems a bit like you might be equating the second option with "does not produce self-fulfilling prophecies", which I think would be a mistake.

Do you mean to say that a prophecy might happen to be self-fulfilling even if it wasn't optimized for being so? Or are you trying to distinguish between "explicit" and "implicit" searches for fixed points? Or are you trying to distinguish between fixed points and self-fulfilling prophecies somehow? (I thought they were basically the same thing.)

Comment by john_maxwell on Misconceptions about continuous takeoff · 2019-10-16T23:20:17.657Z · score: 2 (1 votes) · LW · GW

In a scenario where multiple AIs compete for power the AIs who makes fast decisions without checking back with humans have an advantage in the power competition and are going to get more power over time.

Agreed this is a risk, but I wouldn't call this an alignment roadblock.

Comment by john_maxwell on Misconceptions about continuous takeoff · 2019-10-14T04:13:16.190Z · score: 2 (1 votes) · LW · GW

Has the "alignment roadblock" scenario been argued for anywhere?

Like Lanrian, I think it sounds implausible. My intuition is that understanding human values is a hard problem, but taking over the world is a harder problem. For example, the AI which can talk its way out of a box probably has a very deep understanding of humans--a deeper understanding than most humans have of humans! In order to have such a deep understanding, it must have lower-level building blocks for making sense of the world which work extremely well, and could be used for a value learning system.

BTW, coincidentally, I quoted this same passage in a post I wrote recently which discussed this scenario (among others). Is there a particular subscenario of this I outlined which seems especially plausible to you?

Comment by john_maxwell on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-06T16:06:47.884Z · score: 10 (3 votes) · LW · GW

I think part of what may be going on here is that the approach to AI that Yann advocates happens to be one that is unusually amenable to alignment. Some discussion here:

https://www.lesswrong.com/posts/EMZeJ7vpfeF4GrWwm/self-supervised-learning-and-agi-safety

Comment by john_maxwell on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-06T05:34:32.380Z · score: 3 (2 votes) · LW · GW

Nor has anyone come up with a way to make AGI. Perhaps Yann's assumption is that how to do what he specifies will become more obvious as more is known about the nature of AGI. Maybe from Yann's perspective, trying to create safe AGI without knowing how AGI will work is like trying to design a nuclear reactor without knowing how nuclear physics works.

(Not saying I agree with this.)

Comment by john_maxwell on AI Alignment Open Thread August 2019 · 2019-10-04T01:20:03.150Z · score: 4 (2 votes) · LW · GW

Cool!

I guess another way of thinking about this is not a safety emphasis so much as a forecasting emphasis. Reminds me of our previous discussion here. If someone could invent new scientific institutions which reward accurate forecasts about scientific progress, that could be really helpful for knowing how AI will progress and building consensus regarding which approaches are safe/unsafe.

Comment by john_maxwell on Is Specificity a Mental Model? · 2019-09-29T22:52:19.347Z · score: 2 (1 votes) · LW · GW

Here's another mental models thing by some 80K people:

https://conceptually.org

Comment by john_maxwell on Kohli episode discussion in 80K's Christiano interview · 2019-09-29T22:39:56.630Z · score: 6 (3 votes) · LW · GW

I think it's fairly likely that the "devil is in the details" for AI safety. For example, some AI safety people like to talk about computer security, but I can't think of any grand theories which have been useful there. Even for cryptography, we don't have a proof that factoring large integers can't be done efficiently, but this problem's difficulty "in practice" still forms the basis of widely used cryptosystems. And the discipline devoted to studying how statistics can go wrong in practice seems to be called "applied statistics", not "theoretical statistics". So I think it's plausibly valuable to have people concerned with safety at the cutting edge of AI development, thinking in a proactive way about the details of how the algorithms work and how they might go wrong.

Comment by john_maxwell on One Way to Think About ML Transparency · 2019-09-29T15:00:37.182Z · score: 4 (2 votes) · LW · GW

This definition is not ideal, however. It misses a core element of what alignment researchers consider important in understanding machine learning models. In particular, in order for a model to be simulatable, it must also be at a human-level or lower. Otherwise, a human would not be able to go step by step through the decision procedure.

I don't think that's quite true. It's possible in theory that your training procedure discovers a radically simple way to model the data that the human wasn't aware of, and the human can easily step through this radically simple model once it's discovered. In which case the model could be characterized as both super-human and simulatable. Of course, deep learning doesn't work this way.

Comment by john_maxwell on Follow-Up to Petrov Day, 2019 · 2019-09-28T13:47:29.289Z · score: 14 (7 votes) · LW · GW

Please don't let a downvote or two discourage you. I appreciate your participation here, including these comments :)

Comment by john_maxwell on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-27T03:04:58.195Z · score: 25 (6 votes) · LW · GW

Pro: It reinforces the norm of actually considering consequences, and not holding any value too sacred.

Not an expert here, but my impression was that it can sometimes be useful to have "sacred values" in certain decision-theoretic contexts (like "I will one-box in Newcomb's Problem even if consequentialist reasoning says otherwise"?). If I had to choose a sacred value to adopt, cooperating in epistemic prisoners' dilemmas actually seems like a relatively good choice?

Comment by john_maxwell on The Zettelkasten Method · 2019-09-22T16:47:49.780Z · score: 9 (5 votes) · LW · GW

One way to think about a notebook vs Anki/Mnemosyne is that Anki/Mnemosyne offers faster reads at the expense of slower writes.

if, over your lifetime, you will spend more than 5 minutes looking something up or will lose more than 5 minutes as a result of not knowing something, then it’s worthwhile to memorize it with spaced repetition. 5 minutes is the line that divides trivia from useful data.

Source.

In other words, with Anki/Mnemosyne, you have to spend ~5 minutes of additional effort writing the info to your brain (relative to just writing it in a notebook). But once it's written to your brain, presumably it will be a bit faster to recall than if it's in your notebook.

I'm a bit of an Anki/Mnemosyne skeptic for the following reasons:

  • I think it's pretty rare that you will actually spend more than 5 minutes looking something up. Looking up e.g. a formula on Google is going to take on the order of 10 seconds. How many formulas are you going to realistically look up more than 30 times over the course of your life?

    • Remember, if you find yourself using a formula that often, you'll plausibly find yourself accidentally memorizing it anyway! To help with this, you could always challenge yourself to recall things from memory before Googling for them. Sort of a "just in time"/opportunistic approach to building useful long-term memories.
  • I'm not totally convinced that it actually offers a substantial speedup for reads. Suppose I've "memorized" some formula using Anki. If I haven't actually seen the card recently, it could easily take several seconds, perhaps even 10+ seconds, for me to recall it from memory.

  • Even if you think you recall a formula, if it's for an actually important application, you'll likely want to look it up anyway to be sure.

  • Anki/Mnemosyne seem bad for knowledge which changes, such as research ideas.

If Anki/Mnemosyne have value, I think it is probably in constructing better mental chunks. It's not about the cost of looking something up, it's about the context switch when you're trying to use that idea as a subcomponent of a larger mental operation.

You could also argue that the value in Anki/Mnemosyne comes from knowing that there is something there to look up, as opposed to not having to look it up. However, a good notebook structure can mitigate that problem (whenever you learn some interesting info, add it to the pages associated with whichever future situations it could be useful in, so you won't have to remember to look it up when you're in that situation). Additionally, I think Anki/Mnemosyne may be overkill for just knowing that something exists. (Though deeper understanding could be good for noticing deeper conceptual isomorphisms.)

Personally, I prefer to refine my mental chunks by doing problems, or by just directly using them for the purpose I acquired them for (just-in-time learning), rather than by reciting info. I think this builds a deeper level of understanding and helps you see how concepts are related to each other in a way that's harder to achieve with Anki. I'm a big believer in structuring one's learning and thinking process to incidentally incorporate an element of spaced repetition, the way Dan Sheffler describes.

Comment by john_maxwell on [Site Feature] Link Previews · 2019-09-18T19:53:40.563Z · score: 2 (1 votes) · LW · GW

Nice! I know people have complained about jargon use on LW in the past. Have you thought about an option new users could activate which autolinks jargon to the corresponding wiki entry/archive post?

Comment by john_maxwell on Meetups as Institutions for Intellectual Progress · 2019-09-17T23:55:00.358Z · score: 4 (3 votes) · LW · GW

Exciting stuff!

On the other hand, many of the times that I've previously tried to publicly take action in this space, I got shot down pretty harshly. So I'd love to hear why you think my proposals are naïve / misguided / going to pollute the commons / going to crash and burn before I launch anything this time. Concrete critiques and proposals of alternatives would be greatly appreciated.

I think it's unfortunate that you were shot down this way. I think caution of this sort would be well-justified if we were in the business of operating a nuclear reactor or something like that. But as things are, I expect that even if one of your meetup experiments failed, it would give us useful data.

Maybe it's not that people are against trying new things; it's just that those who disagree are more likely to comment than those who agree.

One activity which I think could be fun, useful, and a good fit for the meetup format is brainstorming. You could have one or several brainstorming prompts every month (example prompt: "How can Moloch be defeated?") and ask meetups to brainstorm based on those prompts and send you their ideas, and then you could assemble those into a global masterlist which credits the person who originated each idea (a bit like the Junto would gather the best ideas from each subgroup, I think). You could go around to various EA organizations and ask them for prompt ideas, for topics that EA organization wants more ideas on. For example, maybe Will MacAskill would request ideas for what Cause X might be. Maybe Habryka would ask for feature ideas for LW. You could offer brainstorming services publicly--maybe Mark Zuckerberg would ask for ideas on how to improve Facebook (secretly, through Julia Galef). You could have a brainstorming session for brainstorming prompts. You could suggest brainstorming protocols or give people a video to play or have a brainstorming session for brainstorming protocols (recursive self-improvement FTW).

Comment by john_maxwell on Distance Functions are Hard · 2019-09-15T17:59:21.824Z · score: 2 (1 votes) · LW · GW

^ I don't see how?

No human labor: Just compute the function. Fast experiment loop: Computers are faster than humans. Reproducible: Share the code for your function with others.

I'm talking about interactive training

I think for a sufficiently advanced AI system, assuming it's well put together, active learning can beat this sort of interactive training--the AI will be better at the task of identifying & fixing potential weaknesses in its models than humans.

Adversarial examples suggest we should be worried that apparently similar concepts will actually be wildly different in non-obvious ways.

I think the problem with adversarial examples is that deep neural nets don't have the right inductive biases. I expect meta-learning approaches which identify & acquire new inductive biases (in order to determine "how to think" about a particular domain) will solve this problem and will also be necessary for AGI anyway.

BTW, different human brains appear to learn different representations (previous discussion), and yet we are capable of delegating tasks to each other.

I'm cautiously optimistic, since this could make things a lot easier.

Huh?

My problem with that argument is that it seems like we will have so many chances to fuck up that we would need 1) AI systems to be extremely reliable, or 2) for catastrophic mistakes to be rare, and minor mistakes to be transient or detectable. (2) seems plausible to me in many applications, but probably not all of the applications where people will want to use SOTA AI.

Maybe. But my intuition is that if you can create a superintelligent system, you can make one which is "superhumanly reliable" even in domains which are novel to it. I think the core problems for reliable AI are very similar to the core problems for AI in general. An example is the fact that solving adversarial examples and improving classification accuracy seem intimately related.

I think the significant features of RL here are: 1) having the goal of understanding the world and how to influence it, and 2) doing (possibly implicit) planning.

In what sense does RL try to understand the world? It seems very much not focused on that. You essentially have to hand it a reasonably accurate simulation of the world (i.e. a world that is already fully understood, in the sense that we have a great model for it) for it to do anything interesting.

If the planning is only "implicit", RL sounds like overkill and probably not a great fit. RL seems relatively good at long sequences of actions for a stateful system we have a great model of. If most of the value can be obtained by planning 1 step in advance, RL seems like a solution to a problem you don't have. It is likely to make your system less safe, since planning many steps in advance could let it plot some kind of treacherous turn. But I also don't think you will gain much through using it. So luckily, I don't think there is a big capabilities vs safety tradeoff here.

I think having general knowledge will be very valuable, and hard to replicate with a network of narrow systems.

Agreed. But general knowledge is also not RL, and is handled much more naturally in other frameworks such as transfer learning, IMO.

So basically I think daemons/inner optimizers/whatever you want to call them are going to be the main safety problem.

Comment by john_maxwell on Distance Functions are Hard · 2019-09-14T02:26:57.918Z · score: 2 (1 votes) · LW · GW

They're a pain because they involve a lot of human labor, slow down the experiment loop, make reproducing results harder, etc.

I see. How about doing active learning of computable functions? That solves all 3 problems.

Instead of standard benchmarks, you could offer an API which provides an oracle for some secret functions to be learned. You could run a competition every X months and give each competition entrant a budget of Y API calls over the course of the competition.
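
Here's a minimal sketch of what I have in mind (all names and numbers are hypothetical illustrations, not an existing service):

```python
# A toy sketch of what such a benchmark API might look like.
class OracleBenchmark:
    def __init__(self, secret_fn, budget):
        self._secret_fn = secret_fn   # the hidden computable function to be learned
        self._budget = budget         # each entrant gets a fixed number of queries

    def query(self, x):
        """Return the secret function's output for x, spending one query."""
        if self._budget <= 0:
            raise RuntimeError("Query budget exhausted")
        self._budget -= 1
        return self._secret_fn(x)

# Example: an entrant's active learner decides which inputs are worth labeling,
# rather than being handed a fixed labeled dataset.
oracle = OracleBenchmark(secret_fn=lambda x: x ** 3 - 2 * x, budget=1000)
labeled = [(x, oracle.query(x)) for x in range(-5, 6)]
```

The point is that entrants have to decide which queries are worth spending their budget on (the active learning part), no humans are needed to generate labels, and anyone can reproduce a result by re-running against the same oracle.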

RE self-supervised learning: I don't see why we needed the rebranding (of unsupervised learning).

Well I don't see why neural networks needed to be rebranded as "deep learning" either :-)

When I talk about "self-supervised learning", I refer to chopping up your training set into automatically created supervised learning problems (predictive processing), which feels different from clustering/dimensionality reduction. It seems like a promising approach regardless of what you call it.
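
For example, here's roughly what that chopping-up looks like in the simplest next-word-prediction case (a toy sketch of my own):

```python
# Take raw, unlabeled text and automatically generate (context, next word)
# supervised examples from it. Toy example; a real system would use far more data.
corpus = "the cat sat on the mat and the cat slept".split()

context_size = 3
examples = [
    (corpus[i : i + context_size], corpus[i + context_size])
    for i in range(len(corpus) - context_size)
]
# e.g. (['the', 'cat', 'sat'], 'on') -- the "labels" come from the data itself,
# with no human annotation, which is what distinguishes this from ordinary
# supervised learning (and from clustering/dimensionality reduction).
print(examples[0])
```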

I don't see why it would make alignment straightforward (ETA: except to the extent that you aren't necessarily, deliberately building something agenty).

In order to make accurate predictions about reality, you need to understand humans, because humans exist in reality. So at the very least, a superintelligent self-supervised learning system trained on loads of human data would have a lot of conceptual building blocks (developed in order to make predictions about its training data) which could be tweaked and combined to make predictions about human values (analogous to fine-tuning in the context of transfer learning). But I suspect fine-tuning might not even be necessary. Just ask it what Gandhi would do or something like that.

Re: gwern's article, RL does not seem to me like a good fit for most of the problems he describes. I agree active learning/interactive training protocols are powerful, but that's not the same as RL.

Autonomy is also nice (and also not the same as RL). I think the solution for autonomy is to (1) solve calibration/distributional shift, so the system knows when it's safe to act autonomously, and (2) have the system adjust its own level of autonomy/need for clarification dynamically depending on the apparent urgency of its circumstances. I have notes for a post about (2), let me know if you think I should prioritize writing it.

Comment by john_maxwell on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-09-13T01:21:16.186Z · score: 2 (1 votes) · LW · GW

I know the contest is over, but this idea for a low-bandwidth oracle might be useful anyhow: Given a purported FAI design, what is the most serious flaw? Then have it highlight lines from the FAI design description, plus, given a huge corpus of computer science papers, LW/AF posts, etc., highlight relevant paragraphs from those as well (perhaps using some kind of constraint like "3 or fewer paragraphs highlighted in their entirety") that, taken together, come closest to pinpointing the issue. We could even give it a categorization scheme for safety problems we came up with, and it could tell us which category this particular problem comes closest to falling under. Or offer it categories a particular hint could fall under to choose from, such as "this is just an analogy", "keep thinking along these lines", etc. Then do the same and ask it to highlight text which leads to a promising solution. The rationale being that unforeseen difficulties are the hardest part of alignment, but if there's a flaw, it will probably be somehow analogous to a problem we've seen in the past, or will be addressable using methods which have worked in the past, or something. But it's hard to fit "everything we've seen in the past" into one human head.