The Goodhart Game 2019-11-18T23:22:13.091Z · score: 12 (8 votes)
Self-Fulfilling Prophecies Aren't Always About Self-Awareness 2019-11-18T23:11:09.410Z · score: 15 (7 votes)
What AI safety problems need solving for safe AI research assistants? 2019-11-05T02:09:17.686Z · score: 15 (4 votes)
The problem/solution matrix: Calculating the probability of AI safety "on the back of an envelope" 2019-10-20T08:03:23.934Z · score: 24 (8 votes)
The Dualist Predict-O-Matic ($100 prize) 2019-10-17T06:45:46.085Z · score: 17 (6 votes)
Replace judges with Keynesian beauty contests? 2019-10-07T04:00:37.906Z · score: 31 (10 votes)
Three Stories for How AGI Comes Before FAI 2019-09-17T23:26:44.150Z · score: 28 (9 votes)
How to Make Billions of Dollars Reducing Loneliness 2019-08-30T17:30:50.006Z · score: 60 (27 votes)
Response to Glen Weyl on Technocracy and the Rationalist Community 2019-08-22T23:14:58.690Z · score: 52 (26 votes)
Proposed algorithm to fight anchoring bias 2019-08-03T04:07:41.484Z · score: 10 (2 votes)
Raleigh SSC/LW/EA Meetup - Meet MealSquares People 2019-05-08T00:01:36.639Z · score: 12 (3 votes)
The Case for a Bigger Audience 2019-02-09T07:22:07.357Z · score: 69 (27 votes)
Why don't people use formal methods? 2019-01-22T09:39:46.721Z · score: 21 (8 votes)
General and Surprising 2017-09-15T06:33:19.797Z · score: 3 (3 votes)
Heuristics for textbook selection 2017-09-06T04:17:01.783Z · score: 8 (8 votes)
Revitalizing Less Wrong seems like a lost purpose, but here are some other ideas 2016-06-12T07:38:58.557Z · score: 24 (29 votes)
Zooming your mind in and out 2015-07-06T12:30:58.509Z · score: 8 (9 votes)
Purchasing research effectively open thread 2015-01-21T12:24:22.951Z · score: 12 (13 votes)
Productivity thoughts from Matt Fallshaw 2014-08-21T05:05:11.156Z · score: 13 (14 votes)
Managing one's memory effectively 2014-06-06T17:39:10.077Z · score: 14 (15 votes)
OpenWorm and differential technological development 2014-05-19T04:47:00.042Z · score: 9 (8 votes)
System Administrator Appreciation Day - Thanks Trike! 2013-07-26T17:57:52.410Z · score: 70 (71 votes)
Existential risks open thread 2013-03-31T00:52:46.589Z · score: 10 (11 votes)
Why AI may not foom 2013-03-24T08:11:55.006Z · score: 23 (35 votes)
[Links] Brain mapping/emulation news 2013-02-21T08:17:27.931Z · score: 2 (7 votes)
Akrasia survey data analysis 2012-12-08T03:53:35.658Z · score: 13 (14 votes)
Akrasia hack survey 2012-11-30T01:09:46.757Z · score: 11 (14 votes)
Thoughts on designing policies for oneself 2012-11-28T01:27:36.337Z · score: 80 (80 votes)
Room for more funding at the Future of Humanity Institute 2012-11-16T20:45:18.580Z · score: 18 (21 votes)
Empirical claims, preference claims, and attitude claims 2012-11-15T19:41:02.955Z · score: 5 (28 votes)
Economy gossip open thread 2012-10-28T04:10:03.596Z · score: 26 (31 votes)
Passive income for dummies 2012-10-27T07:25:33.383Z · score: 17 (22 votes)
Morale management for entrepreneurs 2012-09-30T05:35:05.221Z · score: 9 (14 votes)
Could evolution have selected for moral realism? 2012-09-27T04:25:52.580Z · score: 4 (14 votes)
Personal information management 2012-09-11T11:40:53.747Z · score: 18 (19 votes)
Proposed rewrites of LW home page, about page, and FAQ 2012-08-17T22:41:57.843Z · score: 18 (19 votes)
[Link] Holistic learning ebook 2012-08-03T00:29:54.003Z · score: 10 (17 votes)
Brainstorming additional AI risk reduction ideas 2012-06-14T07:55:41.377Z · score: 12 (15 votes)
Marketplace Transactions Open Thread 2012-06-02T04:31:32.387Z · score: 29 (30 votes)
Expertise and advice 2012-05-27T01:49:25.444Z · score: 17 (22 votes)
PSA: Learn to code 2012-05-25T18:50:01.407Z · score: 34 (39 votes)
Knowledge value = knowledge quality × domain importance 2012-04-16T08:40:57.158Z · score: 8 (13 votes)
Rationality anecdotes for the homepage? 2012-04-04T06:33:32.097Z · score: 3 (8 votes)
Simple but important ideas 2012-03-21T06:59:22.043Z · score: 20 (25 votes)
6 Tips for Productive Arguments 2012-03-18T21:02:32.326Z · score: 30 (45 votes)
Cult impressions of Less Wrong/Singularity Institute 2012-03-15T00:41:34.811Z · score: 34 (59 votes)
[Link, 2011] Team may be chosen to receive $1.4 billion to simulate human brain 2012-03-09T21:13:42.482Z · score: 8 (15 votes)
Productivity tips for those low on motivation 2012-03-06T02:41:20.861Z · score: 7 (12 votes)
The Singularity Institute has started publishing monthly progress reports 2012-03-05T08:19:31.160Z · score: 21 (24 votes)
Less Wrong mentoring thread 2011-12-29T00:10:58.774Z · score: 31 (34 votes)


Comment by john_maxwell on Realism about rationality · 2020-01-14T08:07:25.524Z · score: 2 (1 votes) · LW · GW

In my experience, if there are several concepts that seem similar, understanding how they relate to one another usually helps with clarity rather than hurting.

Comment by john_maxwell on Realism about rationality · 2020-01-11T03:45:50.771Z · score: 4 (4 votes) · LW · GW

It seems to me like my position, and the MIRI-cluster position, is (1) closer to "rationality is like fitness" than "rationality is like momentum"

Eliezer is a fan of law thinking, right? Doesn't the law thinker position imply that intelligence can be characterized in a "lawful" way like momentum?

Whereas the non-MIRI cluster is saying "biologists don't need to know about evolution."

As a non-MIRI cluster person, I think deconfusion is valuable (insofar as we're confused), but I'm skeptical of MIRI because they seem more confused than average to me.

Comment by john_maxwell on Self-Supervised Learning and AGI Safety · 2020-01-01T23:43:13.872Z · score: 3 (2 votes) · LW · GW

The term "self-supervised learning" (replacing the previous and more general term "unsupervised learning")

BTW, the way I've been thinking about it, "self-supervised learning" refers to a particular way of achieving "unsupervised learning"--not sure which usage is standard.

Comment by john_maxwell on 2020's Prediction Thread · 2020-01-01T23:38:25.093Z · score: 4 (2 votes) · LW · GW

I guess Paypal, Amazon Pay, etc. could also qualify--they allow me to make purchases without giving a merchant access to my credit card number.

Comment by john_maxwell on 2020's Prediction Thread · 2020-01-01T09:19:17.407Z · score: 3 (2 votes) · LW · GW

As of 1/1/30, customers will not make purchases by giving each merchant full access to a non-transaction-specific numeric string (i.e. credit cards as they are today): 70%

This seems like the kind of bold prediction which failed last time around. Maybe you can make it more specific and say what fraction of online transactions will be processed using something which looks unlike the current credit card setup?

Comment by john_maxwell on Tabooing 'Agent' for Prosaic Alignment · 2020-01-01T07:55:16.398Z · score: 2 (1 votes) · LW · GW

I think the world where H is true is a good world, because it's a world where we are much closer to understanding and predicting how sophisticated models generalize.

This seemed like a really surprising sentence to me. If the model is an agent, doesn't that pull in all the classic concerns related to treacherous turns and so on? Whereas a non-agent probably won't have an incentive to deceive you?

Even if the model is an agent, you still need to be able to understand its goals based on its internal representation. Which could mean, for example, understanding what a deep neural network was doing. Which doesn't appear to be much easier than the original task of "understand what a model, for example a deep neural network, is doing".

Comment by John_Maxwell_IV on [deleted post] 2020-01-01T06:00:32.911Z

Some quick thoughts:

In Beware of black boxes in AI alignment research, cousin_it wrote:

Unfortunately, ideas like adversarial examples, treacherous turns or nonsentient uploads show that we shouldn't bet our future on something that imitates a particular black box, even if the imitation passes many tests. We need to understand how the black box works inside, to make sure our version's behavior is not only similar but based on the right reasons.

Which of these (adversarial examples, treacherous turns or nonsentient uploads) would be the most compelling for a mainstream machine learning researcher?

I gave some reasons here for why the adversarial examples objection might not be super compelling. Though it could be useful for imparting a general sense of "we don't actually know how deep learning works".

I don't think treacherous turns are a great response, because a treacherous turn scenario assumes an agent style AI with the wrong value function, and the question is why we should expect "just use ML to learn ethics" to produce the wrong value function in the first place.

In addition to being weird, it's not clear to me how much of a practical concern nonsentient uploads are; presumably it's possible to create an AI system that commits a pivotal act without any large-scale ancestor simulations.

You might cite work on the impossibility of extracting the human value function using inverse reinforcement learning, but inverse reinforcement learning isn't the only method available.

You might cite the recent mesa-optimization paper, but that paper barely mentions supervised learning, which seems like the most natural way to "just use ML to learn ethics".

You might argue that doing a good job of learning human values is just too difficult, and you'll learn an imperfect approximation of human values that will be vulnerable to Goodhart's law. However, Superintelligence mentions that each human brain develops its own idiosyncratic representations of higher-level content, yet that doesn't seem to present an insurmountable problem to us learning one another's values. In any case, this answer might not be the most desirable because it makes AI safety seem like a capabilities problem: "OK, so what you're saying is that we need a model which is highly accurate on a human ethics dataset. Well, accuracy on ImageNet is increasing every year, so we are making progress."

So I don't know if there's a good snappy comeback to "just use ML to learn ethics", but I'd love to hear it if it's out there.

Comment by john_maxwell on 2019 AI Alignment Literature Review and Charity Comparison · 2020-01-01T05:18:23.335Z · score: 4 (2 votes) · LW · GW

There's also this post and others by the same author.

Comment by john_maxwell on 2019 AI Alignment Literature Review and Charity Comparison · 2020-01-01T05:17:13.757Z · score: 5 (2 votes) · LW · GW

just use ML to learn ethics

Can you explain more about why you think that these papers are low quality? Is it just a matter of lack of originality? Personally, I think this is a perspective that can be steelmanned pretty effectively, and gets unfairly disregarded because it's too simple or something like that. I think it's worth engaging with this perspective in depth because (a) I think it's pretty likely there's a solution to friendliness there and (b) even if there isn't, a very clear explanation of why (which anticipates as many counterarguments as possible) could be a very useful recruiting tool.

Comment by john_maxwell on How’s that Epistemic Spot Check Project Coming? · 2019-12-29T06:27:10.984Z · score: 4 (2 votes) · LW · GW

A "spot check" of a few of a book's claims is supposed to be a proxy for the accuracy of the rest of the claims, right?

Of course there are issues to work through. For example, you'd probably want to have a training set and a test set like people always do in machine learning to see if it's just "what sticks" or whether you've actually found a signal that generalizes. And if you published your reasoning then people might game whatever indicators you discovered. (Should still work for older books though.) You might also find that most of the variability in accuracy is per-book rather than per-author or anything like that. (Alternatively, you might find that a book's accuracy can be predicted better based on external characteristics than doing a few spot checks, if individual spot checks are comparatively noisy.) But the potential upside is much larger because it could help you save time deciding what to read on any subject.

Anyway, just an idea.

Comment by john_maxwell on A dilemma for prosaic AI alignment · 2019-12-20T08:42:19.151Z · score: 2 (1 votes) · LW · GW

I am not sure what to think of the lack of commercial applications of RL, but I don't think it is strong evidence either way, since commercial applications involve competing with human and animal agents and RL hasn't gotten us anything as good as human or animal agents yet.

Supervised learning has lots of commercial applications, including cases where it competes with humans. The fact that RL doesn't suggests to me that if you can apply both to a problem, RL is probably an inferior approach.

Another way to think about it: If superhuman performance is easier with supervised learning than RL, that gives us some evidence about the relative strengths of each approach.

Agent-like architectures are simple yet powerful ways of achieving arbitrary things, because for almost any thing you wish achieved, you can insert it into the "goal" slot of the architecture and then let it loose, and it'll make good progress even in a very complex environment. (I'm comparing agent-like architectures to e.g. big lists of heuristics, or decision trees, or look-up tables, all of which have complexity that increases really fast as the environment becomes more complex. Maybe there is some other really powerful yet simple architecture I'm overlooking?)

I'm not exactly sure what you mean by "architecture" here, but maybe "simulation", or "computer program", or "selection" (as opposed to control) could satisfy your criteria? IMO, attaining understanding and having ideas aren't tasks that require an agent architecture -- it doesn't seem like most AI applications in these categories make use of agent architectures -- and if we could do those things safely, we could make AI research assistants which make remaining AI safety problems easier.

Aren't the 3.5 bullet points above specific examples of how 'predict the next word in this text' could benefit from -- in the sense of produce, when used as training signal

I do think these are two separate questions. Benefit from = if you take measures to avoid agentlike computation, that creates a significant competitiveness penalty above and beyond whatever computation is necessary to implement your measures (say, >20% performance penalty). Produce when used as a training signal = it could happen by accident, but if that accident fails to happen, there's not necessarily a loss of competitiveness. An example would be bullet point 2, which is an accident that I suspect would harm competitiveness. Bullet points 3 and 3.5 are also examples of unintended agency, not answers to the question of why text prediction benefits from an agent architecture. (Note: If you don't mind, let's standardize on using "agent architecture" to only refer to programs which are doing agenty things at the toplevel, so bullet points 2, 3, and 3.5 wouldn't qualify--maybe they are agent-like computation, but they aren't descriptions of agent-like software architectures. For example, in bullet point 2 the selection process that leads to the agent might be considered part of the architecture, but the agent which arose out of the selection process probably wouldn't.)

How would you surmount bullet point 3?

Hopefully I'll get around to writing a post about that at some point, but right now I'm focused on generating as many concrete plausible scenarios for accidental agency as possible, because I think not identifying a scenario and having things blow up in an unforeseen way is a bigger risk than having all safety measures fail on a scenario that's already been anticipated. So please let me know if you have any new concrete plausible scenarios!

In any case, note that issues with the universal prior seem to be a bit orthogonal to the agency vs unsupervised discussion -- you can imagine agent architectures that make use of it, and non-agent architectures that don't.

Comment by john_maxwell on How’s that Epistemic Spot Check Project Coming? · 2019-12-20T06:55:07.709Z · score: 3 (4 votes) · LW · GW

I see, interesting.

Here's another crazy idea. Instead of trying to measure the reliability of specific books, try to figure out what predicts whether a book is reliable. You could do a single spot check for a lot of different books and then figure out what predicts the output of the spot check: whether the author has a PhD/tenure/what their h-index is, company that published the book, editor, length, citation density, quality of sources cited (e.g. # citations/journal prestige of typical paper citation), publication date, # authors, sales rank, amount of time the author spent on the book/how busy they seemed with other things during that time period, use of a ghostwriter, etc. You could code all those features and feed them into a logistic regression and see which were most predictive.
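For concreteness, a minimal sketch of that last step, with made-up books, made-up features, and a hand-rolled logistic regression rather than any particular library (every number here is hypothetical):

```python
import numpy as np

# Hypothetical dataset: one row per book, with made-up features
# [author has PhD, author h-index, citation density], and whether a
# single spot check passed (1) or failed (0).
X = np.array([
    [1, 40, 0.9],
    [1, 25, 0.7],
    [0,  2, 0.1],
    [0,  5, 0.3],
    [1, 30, 0.8],
    [0,  1, 0.2],
], dtype=float)
y = np.array([1, 1, 0, 0, 1, 0], dtype=float)

# Standardize features so the gradient steps are well-scaled,
# then add an intercept column.
X = (X - X.mean(axis=0)) / X.std(axis=0)
X = np.hstack([np.ones((len(X), 1)), X])

# Plain logistic regression fit by gradient descent.
w = np.zeros(X.shape[1])
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / len(y)

# Which books the model expects to pass a spot check.
preds = (1 / (1 + np.exp(-X @ w))) > 0.5
print(preds.astype(int))
```

The learned weights would then tell you which external characteristics carry the predictive signal.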

Comment by john_maxwell on A dilemma for prosaic AI alignment · 2019-12-19T11:48:26.123Z · score: 3 (2 votes) · LW · GW

I think there is a bit of a motte and bailey structure to our conversation. In your post above, you wrote: "to be competitive prosaic AI safety schemes must deliberately create misaligned mesa-optimizers" (emphasis mine). And now in bullet point 2, we have (paraphrase) "maybe if you had a really weird/broken training scheme where it's possible to sabotage rival subnetworks, agenty things get selected for somehow [probably in a way that makes the system as a whole less competitive]". I realize this is a bit of a caricature, and I don't mean to call you out or anything, but this is a pattern I've seen in AI safety discussions and it seemed worth flagging.

Anyway, I think there is a discussion worth having here because most people in AI safety seem to assume RL is the thing, and RL has an agent style architecture, which seems like a pretty strong inductive bias towards mesa-optimizers. Non-RL stuff seems like a relatively unknown quantity where mesa-optimizers are concerned, and thus worth investigating. Additionally, even RL will plausibly have non-RL stuff as a subcomponent of its cognition, so it's still useful to know how to do non-RL stuff in a mesa-optimizer free way (so the RL agent doesn't get pwned by its own cognition).

Agent-like architectures are simple yet powerful ways of achieving arbitrary things

Why do you think that's true? I think the lack of commercial applications of reinforcement learning is evidence against this. From my perspective, RL has been a huge fad and people have been trying to shoehorn it everywhere, yet they're coming up empty handed.

Can you get more specific about how "predict the next word in this text" could benefit from an agent architecture? (Or even better, can you support your original strong claim and explain how the only way to achieve predictive performance on "predict the next word in this text" is through deliberate creation of a misaligned mesa-optimizer?)

Bullet point 3 is one of the more plausible things I've heard -- but it seems fairly surmountable.

Comment by john_maxwell on Inductive biases stick around · 2019-12-19T11:30:51.932Z · score: 2 (1 votes) · LW · GW

I think we would benefit from tabooing the word "simple". It seems to me that when people use the word "simple" in the context of ML, they are usually referring to either smoothness/Lipschitzness or minimum description length. But it's easy to see that these metrics don't always coincide. A random walk is smooth, but its minimum description length is long. A tall square wave is not smooth, but its description length is short. L2 regularization makes a model smoother without reducing its description length. Quantization reduces a model's description length without making it smoother. I'm actually not aware of any argument that smoothness and description length are or should be related--it seems like this might be an unexamined premise.
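A toy illustration of the two metrics coming apart, using the largest one-step jump as a crude smoothness proxy and zlib-compressed size as a crude description-length proxy (both proxies are my own choices):

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

# A random walk with tiny steps: smooth (small largest jump), but with
# a long description (the step sequence is incompressible noise).
walk = np.cumsum(rng.normal(0, 0.01, size=2000))

# A tall square wave: short description (two repeated values), but
# not smooth (huge jumps).
square = 100.0 * np.tile(np.array([1.0] * 50 + [-1.0] * 50), 20)

def lipschitz(f):  # crude smoothness proxy: largest one-step jump
    return np.max(np.abs(np.diff(f)))

def desc_len(f):  # crude description-length proxy: compressed size
    return len(zlib.compress(f.tobytes()))

print(lipschitz(walk), desc_len(walk))      # small jump, big compressed size
print(lipschitz(square), desc_len(square))  # big jump, small compressed size
```

The two orderings come out reversed, which is the point: smoothness and description length are separate axes.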

Based on your paper, the argument for mesa-optimizers seems to be about description length. But if SGD's inductive biases target smoothness, it's not clear why we should expect SGD to discover mesa-optimizers. Perhaps you think smooth functions tend to be more compressible than functions which aren't smooth. I don't think that's enough. Imagine a Venn diagram where compressible functions are a big circle. Mesa-optimizers are a subset, and the compressible functions discovered by SGD are another subset. The question is whether these two subsets are overlapping. Pointing out that they're both compressible is not a strong argument for overlap: "all cats are mammals, and all dogs are mammals, so therefore if you see a cat, it's also likely to be a dog".

When I read your paper, I get a sense that optimizers outperform by allowing one to collapse a lot of redundant functionality into a single general method. It seems like maybe it's the act of compression that gets you an agent, not the property of being compressible. If our model is a smooth function which could in principle be compressed using a single general method, I'm not seeing why the reapplication of that general method in a very novel context is something we should expect to happen.

BTW I actually do think minimum description length is something we'll have to contend with long term. It's just too useful as an inductive bias. (Eliminating redundancies in your cognition seems like a basic thing an AGI will need to do to stay competitive.) But I'm unconvinced SGD possesses the minimum description length inductive bias. Especially if e.g. the flat minima story is the one that's true (as opposed to e.g. the lottery ticket story).

Also, I'm less confident that what I wrote above applies to RNNs.

Comment by john_maxwell on A dilemma for prosaic AI alignment · 2019-12-18T19:20:27.081Z · score: 2 (1 votes) · LW · GW

Or is the idea that mere unsupervised learning wouldn't result in an agent-like architecture, and therefore we don't need to worry about mesa-optimizers?

Pretty much.

That might be true, but if so it's news to me.

In my opinion the question is very under-explored, curious if you have any thoughts.

Comment by john_maxwell on How’s that Epistemic Spot Check Project Coming? · 2019-12-18T01:08:05.210Z · score: 3 (2 votes) · LW · GW

How are you deciding which books to do spot checks for? My instinct is to suggest finding some overarching question which seems important to investigate, so your project does double duty exploring epistemic spot checks and answering a question which will materially impact the actions of you / people you're capable of influencing, but you're a better judge of whether that's a good idea of course.

Comment by john_maxwell on A dilemma for prosaic AI alignment · 2019-12-18T00:40:27.808Z · score: 4 (2 votes) · LW · GW

It sounds like your notion of "prosaic" assumes something related to agency/reinforcement learning, but I believe several top AI people think what we'll need for AGI is progress in unsupervised learning -- not sure if that counts as "prosaic". (FWIW, this position seems obviously correct to me.)

Comment by john_maxwell on Optimization Amplifies · 2019-12-17T00:20:29.574Z · score: 4 (2 votes) · LW · GW

I don't like the intro to the post. I feel like the example Scott gives makes the opposite of the point he wants it to make. Either a number with the given property exists or not. If such a number doesn't exist, creating a superintelligence won't change that fact. Given talk I've heard around the near certainty of AI doom, betting the human race on the nonexistence of a number like this looks pretty attractive by comparison -- and it's plausible there are AI alignment bets we could make that are analogous to this bet.

Comment by john_maxwell on Approval Extraction Advertised as Production · 2019-12-16T23:25:25.012Z · score: 23 (12 votes) · LW · GW

Cynical view of cynics: Some people are predisposed to be cynical about everything, and their cynicism doesn't provide much signal about the world. Any data can be interpreted to fit a cynical hypothesis. What distinguishes cynics is they are rarely interested in exploring whether a non-cynical hypothesis might be the right one.

I'm too cynical to do a point by point response to this post, but I will quickly say: I have a fair amount of startup experience, and I've basically decided I am bad at startups because I lack the decisiveness characteristic Sam says is essential. The stuff Sam writes about startups rings very true for me, and I think I'd be a better startup founder if I had the disposition he describes, but unfortunately I don't think disposition is a very easy thing to change.

Comment by john_maxwell on What I talk about when I talk about AI x-risk: 3 core claims I want machine learning researchers to address. · 2019-12-13T06:53:29.891Z · score: 2 (1 votes) · LW · GW

A key sticking point seems to be the lack of a highly plausible concrete scenario.

IMO coming up with highly plausible concrete scenarios should be a major priority of people working on AI safety. It seems both very useful for getting other researchers involved, and also very useful for understanding the problem and making progress.

In terms of talking to other researchers, in-person conversations like the ones you're having seem like a great way to feel things out before writing public documents.

Comment by john_maxwell on Understanding “Deep Double Descent” · 2019-12-10T06:24:47.178Z · score: 8 (4 votes) · LW · GW

Thanks Preetum. You're right, I missed that note the first time -- I edited my comment a bit.

It might be illuminating to say "the polynomial found by iterating weights starting at 0" instead of "the polynomial found with gradient descent", since in this case, the inductive bias comes from the initialization point, not necessarily gradient descent per se. Neural nets can't learn if all the weights are initialized to 0 at the start, of course :)

BTW, I tried switching from pseudoinverse to regularized linear regression, and the super high degree polynomials seemed more overfit to me.
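To illustrate the initialization point: for underdetermined least squares (a simpler setup than the notebook's, but the same phenomenon), gradient descent started at zero converges to exactly the minimum-norm solution the pseudoinverse computes, because the iterates never leave the row space. A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined least squares: 8 equations, 20 unknowns, so there
# are infinitely many exact solutions.
A = rng.normal(size=(8, 20))
b = rng.normal(size=8)

# The pseudoinverse picks out the minimum-norm solution.
x_pinv = np.linalg.pinv(A) @ b

# Gradient descent on ||Ax - b||^2, initialized at zero. Every update
# is a combination of rows of A, so the iterate stays in the row
# space, and it converges to the same minimum-norm solution.
x = np.zeros(20)
lr = 1.0 / np.linalg.norm(A, 2) ** 2
for _ in range(20000):
    x -= lr * A.T @ (A @ x - b)

print(np.linalg.norm(x - x_pinv))  # ~0: same solution
```

Initialize away from zero and you converge to a different interpolating solution, which is the sense in which the inductive bias lives in the starting point.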

Comment by john_maxwell on Understanding “Deep Double Descent” · 2019-12-08T22:59:49.034Z · score: 10 (6 votes) · LW · GW

I took a look at the Colab notebook linked from that blog post, and there are some subtleties the blog post doesn't discuss. First, the blog post says they're using gradient descent, but if you look at the notebook, they're actually computing a pseudoinverse. [EDIT: They state a justification in their notebook which I missed at first. See discussion below.] Second, the blog post talks about polynomial regression, but doesn't mention that the notebook uses Legendre polynomials. I'm pretty sure high degree Legendre polynomials look like a squiggle which closely follows the x-axis on [-1, 1] (the interval used in their demo). If you fork the notebook and switch from Legendre polynomials to classic polynomial regression (1, x, x^2, x^3, etc.), the degree 100,000 fit appears to be worse than the degree 20 fit. I searched on Google and Google Scholar, and use of Legendre polynomials doesn't seem to be common practice in ML.
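For anyone who wants to poke at the basis-dependence claim without forking the notebook, here's a minimal toy version (my own setup, not the notebook's code): the pseudoinverse picks the minimum-norm coefficient vector, and "minimum norm" corresponds to different functions in the Legendre basis versus the plain monomial basis.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)

# 10 noisy training points on [-1, 1], fit with a degree-30 basis:
# far more coefficients than data points.
x = np.linspace(-1, 1, 10)
y = np.sin(3 * x) + 0.1 * rng.normal(size=10)
degree = 30

V_mono = np.vander(x, degree + 1, increasing=True)  # 1, x, x^2, ...
V_leg = legendre.legvander(x, degree)               # Legendre basis

c_mono = np.linalg.pinv(V_mono) @ y
c_leg = np.linalg.pinv(V_leg) @ y

# Both interpolate the training data...
grid = np.linspace(-1, 1, 500)
fit_mono = np.vander(grid, degree + 1, increasing=True) @ c_mono
fit_leg = legendre.legvander(grid, degree) @ c_leg

# ...but they are different functions between the training points.
print(np.max(np.abs(fit_mono - fit_leg)))
```

So the choice of basis is doing real inductive-bias work here, not just numerical bookkeeping.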

Comment by john_maxwell on Understanding “Deep Double Descent” · 2019-12-07T08:27:21.049Z · score: 14 (10 votes) · LW · GW

The decision tree result seemed counterintuitive to me, so I took a look at that section of the paper. I wasn't impressed. In order to create a double descent curve for decision trees, they change their notion of "complexity" (midway through the graph... check Figure 5) from the number of leaves in a tree to the number of trees in a forest. Turns out that right after they change their notion of "complexity" from number of leaves to number of trees, generalization starts to improve :)

I don't see this as evidence for double descent per se. Just that ensembling improves generalization. Which is something we've known for a long time. (And this fact doesn't seem like a big mystery to me. From a Bayesian perspective, ensembling is like using the posterior predictive distribution instead of the MAP estimate. BTW I think there's also a Bayesian story for why flat minima generalize better -- the peak of a flat minimum is a slightly better approximation for the posterior predictive distribution over the entire hypothesis class. Sometimes I even wonder if something like this explains why Occam's Razor works...)
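A quick sketch of why averaging has to help at least a little: by Jensen's inequality, the squared error of the averaged prediction can't exceed the average squared error of the individual predictions. A toy bootstrap ensemble of polynomial fits (my own construction, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy training data from a smooth target function.
x = np.linspace(0, 1, 40)
y_true = np.sin(2 * np.pi * x)
y = y_true + 0.3 * rng.normal(size=len(x))

# Fit many degree-8 polynomials on bootstrap resamples of the data.
preds = []
for _ in range(50):
    idx = rng.integers(0, len(x), size=len(x))
    coeffs = np.polyfit(x[idx], y[idx], deg=8)
    preds.append(np.polyval(coeffs, x))
preds = np.array(preds)

mse = lambda p: np.mean((p - y_true) ** 2)
individual = np.mean([mse(p) for p in preds])  # average single-model error
ensemble = mse(preds.mean(axis=0))             # error of the averaged model

print(ensemble, individual)  # ensemble <= average individual (Jensen)
```

The averaged model is never worse than a typical ensemble member, and when the members disagree a lot it is strictly better.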

Anyway, the authors' rationale seems to be that once your decision tree has memorized the training set, the only way to increase the complexity of your hypothesis class is by adding trees to the forest. I'd rather they had kept the number of trees constant and only modulated the number of leaves.

However, the decision tree discussion does point to a possible explanation of the double descent phenomenon for neural networks. Maybe once you've got enough complexity to memorize the training set, adding more complexity allows for a kind of "implicit ensembling" which leads to memorizing the training set in many different ways and averaging the results together like an ensemble does.

It's suspicious to me that every neural network case study in the paper modulates layer width. There's no discussion of modulating depth. My guess is they tried modulating depth but didn't get the double descent phenomenon and decided to leave those experiments out.

I think increased layer width fits pretty nicely with my implicit ensembling story. Taking a Bayesian perspective on the output neuron: After there are enough neurons to memorize the training set, adding more leads to more pieces of evidence regarding the final output, making estimates more robust. Which is more or less why ensembles work IMO.

Comment by john_maxwell on Self-Fulfilling Prophecies Aren't Always About Self-Awareness · 2019-11-27T23:59:47.913Z · score: 2 (1 votes) · LW · GW

Could work.

Comment by john_maxwell on Gears-Level Models are Capital Investments · 2019-11-25T01:44:23.062Z · score: 4 (2 votes) · LW · GW

offer outside-the-box insights

I don't think that's the same as "thinking outside the box you're given". That's about power of extrapolation, which is a separate entangled dimension.

Anyway, suppose I'm thinking of a criterion. Of the integers 1-20, the ones which meet my criterion are 2, 3, 5, 7, 11, 13, 17, 19. I challenge you to write a program that determines whether a number meets my criterion or not. A "black-box" program might check to see if the number is on the list I gave. A "gears-level" program might check to see if the number is divisible by any integer besides itself and 1. The "gears-level" program is "within the box" in the sense that it is a program which returns True or False depending on whether my criterion is supposedly met--the same box the "black-box" program is in. And in principle it doesn't have to be constructed using prior knowledge. Maybe you could find it by brute forcing all short programs and returning the shortest one which matches available data with minimal hardcoded integers, or some other method for searching program space.

Similarly, a being from another dimension could be transported to our dimension, observe some physical objects, try to make predictions about them, and deduce that F=ma. They aren't using prior knowledge since their dimension works differently than ours. And they aren't "thinking outside the box they're given", they're trying to make accurate predictions, just as one could do with a black box model.
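The integer-criterion example above, as literal code:

```python
def black_box(n):
    # "Black-box" model: just memorizes the observed data.
    return n in {2, 3, 5, 7, 11, 13, 17, 19}

def gears_level(n):
    # "Gears-level" model: checks divisibility directly (primality).
    return n >= 2 and all(n % d != 0 for d in range(2, n))

# Both match the available data...
assert all(black_box(n) == gears_level(n) for n in range(1, 21))

# ...but only the gears-level model generalizes beyond it.
print(gears_level(23), black_box(23))  # True False
```

Both functions live in the same box (integer in, True/False out); they differ only in how they were built.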

Comment by john_maxwell on Gears-Level Models are Capital Investments · 2019-11-24T21:53:43.115Z · score: 3 (2 votes) · LW · GW

Of gears-level models that don't make use of prior knowledge or entangled dimensions?

Comment by john_maxwell on Ultra-simplified research agenda · 2019-11-24T07:22:46.077Z · score: 7 (3 votes) · LW · GW

Theory of mind is something that humans have instinctively and subconsciously, but that isn't easy to spell out explicitly; therefore, by Moravec's paradox, it will be very hard to implant it into an AI, and this needs to be done deliberately.

I think this is the weakest part. Consider: "Recognizing cat pictures is something humans can do instinctively and subconsciously, but that isn't easy to spell out explicitly; therefore, by Moravec's paradox, it will be very hard to implant it into an AI, and this needs to be done deliberately." But in practice, the techniques that work best for cat pictures work well for lots of other things as well, and a hardcoded solution customized for cat pictures will actually tend to underperform.

Comment by john_maxwell on Gears-Level Models are Capital Investments · 2019-11-24T07:14:21.617Z · score: 2 (1 votes) · LW · GW

I can imagine modeling strategies which feel relatively "gears-level" yet don't make use of prior knowledge or "think outside the box they're given". I think there are a few entangled dimensions here which could be disentangled in principle.

Comment by john_maxwell on Gears-Level Models are Capital Investments · 2019-11-24T05:15:40.919Z · score: 3 (2 votes) · LW · GW

Gears-level insights can highlight ideas we wouldn't even have thought to try, whereas black-box just tests the things we think to test... it can't find unknown unknowns

It seems to me that black-box methods can also highlight things we wouldn't have thought to try, e.g. genetic algorithms can be pretty creative.

Comment by john_maxwell on The LessWrong 2018 Review · 2019-11-24T05:04:13.829Z · score: 8 (4 votes) · LW · GW

Is there some way I can see all the posts I upvoted in 2018 so I can figure out which I think are worthy of nomination?

Compiling the results into a physical book. I find there's something... literally weighty about having your work in printed form. And because it's much harder to edit books than blogposts, the printing gives authors an extra incentive to clean up their past work or improve the pedagogy.

Physical books are also often read in a different mental mode, with a longer attention span, etc. You could also sell it as a Kindle book to get the same effect. Smashwords is a service that lets you upload a book once and sell it on many different platforms.

The end of the review process includes a straightforward vote on which posts seem (in retrospect) useful, and which seem "epistemically sound". This is not the end of the conversation about which posts are making true claims that carve reality at its joints, but my hope is for it to ground that discussion in a clearer group-epistemic state.

Is the idea to only include in the review those posts which are almost universally regarded as "epistemically sound"?

Comment by john_maxwell on Do you get value out of contentless comments? · 2019-11-24T04:54:35.958Z · score: 2 (1 votes) · LW · GW

I'm not sure how to interpret this. Do they "create a buffer" in the sense of discouraging critics, or "create a buffer" in the sense of a psychological buffer to the demoralization that harsh criticism would otherwise cause?

Comment by john_maxwell on What AI safety problems need solving for safe AI research assistants? · 2019-11-22T19:46:27.325Z · score: 2 (1 votes) · LW · GW

I suspect that the concept of utility functions that are specified over your actions is fuzzy in a problematic way. Does it refer to utility functions that are defined over the physical representation of the computer (e.g. the configuration of atoms in certain RAM memory cells that their value represents the selected action)? If so, we're talking about systems that 'want to affect (some part of) the world', and thus we should expect such systems to have convergent instrumental goals with respect to our world (e.g. taking control over as much resources in our world as possible).

No, it's not a utility function defined over the physical representation of the computer!

The Markov decision process formalism used in reinforcement learning already has the action taken by the agent as one of the inputs which determines the agent's reward. You would have to do a lot of extra work to make it so that when the agent simulates the act of modifying its internal circuitry, the Markov decision process delivers a different set of rewards after that point in the simulation. Pretty sure this point has been made multiple times; you can see my explanation here. Another way you could think about it is that goal-content integrity is a convergent instrumental goal, so that's why the agent is not keen to destroy the content of its goals by modifying its internal circuits. You wouldn't take a pill that made you into a psychopath even if you thought it'd be really easy for you to maximize your utility function as a psychopath.
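
To make the MDP point concrete, here's a toy sketch (the states, actions, and reward values are all made up): the reward function R(state, action) is part of the environment's definition, so even a simulated "modify my own circuitry" action is scored, and followed, by the same R as any other action.

```python
# R(state, action), fixed by the environment definition.
REWARD = {
    ("start", "answer_question"): 1.0,
    ("start", "modify_own_circuits"): 0.0,
    ("modified", "answer_question"): 1.0,  # same R applies afterwards
}

TRANSITION = {
    ("start", "answer_question"): "start",
    ("start", "modify_own_circuits"): "modified",
    ("modified", "answer_question"): "modified",
}

def rollout(actions, state="start"):
    total = 0.0
    for a in actions:
        total += REWARD[(state, a)]
        state = TRANSITION[(state, a)]
    return total

# Self-modification doesn't change which rewards the MDP delivers;
# in this toy example it just wastes a step.
print(rollout(["answer_question"] * 3))                       # -> 3.0
print(rollout(["modify_own_circuits", "answer_question",
               "answer_question"]))                           # -> 2.0
```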

It's fine to make pessimistic assumptions but in some cases they may be wildly unrealistic. If your Oracle has the goal of escaping instead of the goal of answering questions accurately (or similar), it's not an "Oracle".

Anyway, what I'm interested in is concrete ways things could go wrong, not pessimistic bounds. Pessimistic bounds are a matter of opinion. I'm trying to gather facts. BTW, note that the paper you cite doesn't even claim their assumptions are realistic, just that solving safety problems in this worst case will also address less pessimistic cases. (Personally I'm a bit skeptical--I think you ideally want to understand the problem before proposing solutions. This recent post of mine provides an illustration.)

Comment by john_maxwell on What AI safety problems need solving for safe AI research assistants? · 2019-11-22T03:51:58.379Z · score: 2 (1 votes) · LW · GW

I'm confused about the "I'm not sure how the concept applies in other cases" part. It seems to me that 'arbitrarily capable systems that "want to affect the world" and are in an air-gapped computer' are a special case of 'agents which want to achieve a broad variety of utility functions over different states of matter'.

Well, the reason I mentioned the "utility function over different states of matter" thing is because if your utility function isn't specified over states of matter, but is instead specified over your actions (e.g. behave in a way that's as corrigible as possible), you don't necessarily get instrumental convergence.

I'm not sure what's the interpretation of 'unintended optimization', but I think that a sufficiently broad interpretation would cover the failure modes I'm talking about here.

"Unintended optimization. First, the possibility of mesa-optimization means that an advanced ML system could end up implementing a powerful optimization procedure even if its programmers never intended it to do so." - Source. "Daemon" is an older term.

I believe that researchers tend to model Oracles as agents that have a utility function that is defined over world states/histories (which would make less sense if they are confident that we can use supervised learning to train an arbitrarily powerful Oracle that does not 'want to affect the world').

My impression is that early thinking about Oracles wasn't really informed by how (un)supervised systems actually work, and the intellectual momentum from that early thinking has carried to the present, even though there's no real reason to believe these early "Oracle" models are an accurate description of current or future (un)supervised learning systems.

Comment by john_maxwell on Self-Fulfilling Prophecies Aren't Always About Self-Awareness · 2019-11-20T09:01:10.857Z · score: 3 (2 votes) · LW · GW

This is good stuff!


Here's another attempt at explaining.

  1. Suppose Predict-O-Matic A has access to historical data which suggests Predict-O-Matic B tends to be extremely accurate, or otherwise has reason to believe Predict-O-Matic B is extremely accurate.

  2. Suppose the way Predict-O-Matic A makes predictions is by some process analogous to writing a story about how things will go, evaluating the plausibility of the story, and doing simulated annealing or some other sort of stochastic hill-climbing on its story until the plausibility of its story is maximized.

  3. Suppose that it's overwhelmingly plausible that at some time in the near future, Important Person is going to walk up to Predict-O-Matic B and ask Predict-O-Matic B for a forecast and make an important decision based on what Predict-O-Matic B says.

  4. Because of point 3, stories which don't involve a forecast from Predict-O-Matic B will tend to get rejected during the hill-climbing process. And...

    • Because of point 1, stories which involve an inaccurate forecast from Predict-O-Matic B will tend to get rejected during the hill-climbing process. We will tend to hill-climb our way into having Predict-O-Matic B's prediction change so it matches what actually happens in the rest of the story.

    • Because the person in point 3 is important and Predict-O-Matic B's forecast influences their decision, a change to the part of the story regarding Predict-O-Matic B's prediction could easily mean the rest is no longer plausible and will benefit from revision.

    • So now we've got a loop in the hill-climbing process where changes in Predict-O-Matic B's forecast lead to changes in what happens after Predict-O-Matic B's forecast, and changes in what happens after Predict-O-Matic B's forecast lead to changes in Predict-O-Matic B's forecast. It stops when we hit a fixed point.

Now that I've written this out, I'm realizing that I don't think this would happen for sure. I've argued both that changing the forecast to match what happens will improve plausibility, and that changing what happens so it's a plausible result of the forecast will improve plausibility. But if the only way to achieve one is by discarding the other, then neither tweak will improve plausibility in general. However, the point remains that a fixed point will be among the most plausible stories available, so any good optimization method will tend to converge on it. (Maybe just simulated annealing, with a temperature parameter high enough that it finds it easy to leap between these kinds of semi-plausible stories until it hits a fixed point by chance. Or hill climbing based on local improvements in plausibility instead of considering plausibility as a whole.)
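
Here's a minimal simulated-annealing sketch of this dynamic. Everything here is invented for illustration: a "story" is just a (forecast, outcome) pair, the plausibility function penalizes outcomes that don't follow from the forecast (the Important Person acts on it) and forecasts that turn out false (B is known to be accurate), and the consequence function is an arbitrary stand-in for how the world responds.

```python
import math
import random

def consequence(forecast):
    # How the world responds when the Important Person acts on B's
    # forecast (made up); self-consistent at forecast == 8.
    return round(0.5 * forecast + 4)

def plausibility(story):
    forecast, outcome = story
    return -(abs(outcome - consequence(forecast)) + abs(forecast - outcome))

def anneal(steps=20000, temp=5.0, cooling=0.9995, seed=0):
    rng = random.Random(seed)
    story = (rng.randint(0, 20), rng.randint(0, 20))
    for _ in range(steps):
        f, o = story
        candidate = (f + rng.choice([-1, 0, 1]), o + rng.choice([-1, 0, 1]))
        delta = plausibility(candidate) - plausibility(story)
        # Accept improvements, and occasionally accept regressions
        # while the temperature is still high.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            story = candidate
        temp *= cooling
    return story

# The maximally plausible story is the self-fulfilling prophecy (8, 8);
# the annealer converges to it, or very near it.
print(anneal())
```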

I think the scenario is similar to your P(X) and P(Y) discussion in this post.

It just now occurred to me that you could get a similar effect given certain implementations of beam search. Suppose we're doing beam search with a beam width of 1 million. For the sake of simplicity, suppose that when Important Person walks up to Predict-O-Matic B and asks their question in A's sim, each of 1M beam states gets allocated to a different response that Predict-O-Matic B could give. Some of those states lead to "incoherent", low-probability stories where Predict-O-Matic B's forecast turns out to be false, and they get pruned. The only states left over are states where Predict-O-Matic B's prophecy ends up being correct -- cases where Predict-O-Matic B made a self-fulfilling prophecy.
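
The beam-search version can be sketched the same way, shrunk to a beam width of 10 so the pruning is visible; the consequence function standing in for "the rest of the story" is again invented:

```python
def consequence(forecast):
    # What happens after the Important Person acts on B's forecast
    # (made up for illustration).
    return (3 * forecast + 2) % 10

# One beam state per response Predict-O-Matic B could give.
beam = [(f, consequence(f)) for f in range(10)]

# Prune the "incoherent", low-probability stories where B's forecast
# turned out to be false.
survivors = [(f, o) for f, o in beam if f == o]
print(survivors)  # -> [(4, 4), (9, 9)]: only self-fulfilling prophecies remain
```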

Comment by john_maxwell on The Goodhart Game · 2019-11-20T07:39:10.237Z · score: 2 (1 votes) · LW · GW

(Not 100% sure I understood your comment.) Training is one idea, but you could also just test out heuristics with this framework. For example, I think this scheme could be used to benchmark quantilization against a competing approach.

Comment by john_maxwell on The Goodhart Game · 2019-11-19T21:05:01.847Z · score: 2 (1 votes) · LW · GW

I'm not sure what you're trying to say. I'm using a broad definition of "adversarial example" where it's an adversarial example if you can deliberately make a classifier misclassify. So if an adversary can find a close-up of a sun that your cat detector calls a cat, that's an adversarial example by my definition. I think this is similar to the definition Ian Goodfellow uses.


An adversarial example is an input to a machine learning model that is intentionally designed by an attacker to fool the model into producing an incorrect output.


Comment by john_maxwell on Do we know if spaced repetition can be used with randomized content? · 2019-11-19T02:46:10.517Z · score: 3 (2 votes) · LW · GW

You could have a spaced repetition card that says "do the next exercise in Chapter X of Textbook Y". I think that's better because Textbook Y probably has exercises which will help you mull over a concept from a variety of different angles.

Comment by john_maxwell on Self-Fulfilling Prophecies Aren't Always About Self-Awareness · 2019-11-19T02:36:00.329Z · score: 2 (1 votes) · LW · GW

I think it's relatively straightforward to avoid that if you construct your system well.

Comment by john_maxwell on Robin Hanson on the futurist focus on AI · 2019-11-16T02:23:31.743Z · score: 2 (1 votes) · LW · GW

Recent paper that might be relevant:

Comment by john_maxwell on What AI safety problems need solving for safe AI research assistants? · 2019-11-14T22:15:54.222Z · score: 2 (1 votes) · LW · GW

Those specific failure modes seem to me like potential convergent instrumental goals of arbitrarily capable systems that "want to affect the world" and are in an air-gapped computer.

My understanding is convergent instrumental goals are goals which are useful to agents which want to achieve a broad variety of utility functions over different states of matter. I'm not sure how the concept applies in other cases. Like, if we aren't using RL, and there is no unintended optimization, why specifically would there be pressure to achieve convergent instrumental goals? (I'm not trying to be rhetorical or antagonistic--I really want to hear if you can think of something.)

I'm interested in #1. It seems like the most promising route is to prevent unintended optimization from arising in the first place, instead of trying to outwit a system that's potentially smarter than we are.

Comment by john_maxwell on Open & Welcome Thread - November 2019 · 2019-11-13T02:27:22.288Z · score: 2 (1 votes) · LW · GW

I'm not sure I'm familiar with the word "mixture" in the way you're using it.

Comment by john_maxwell on What AI safety problems need solving for safe AI research assistants? · 2019-11-13T02:23:52.670Z · score: 2 (1 votes) · LW · GW

Do you have any thoughts on how specifically those failure modes might come about?

Comment by john_maxwell on What AI safety problems need solving for safe AI research assistants? · 2019-11-12T09:51:43.111Z · score: 2 (1 votes) · LW · GW

I agree well-calibrated uncertainties are quite valuable, but I'm not convinced they are essential for this sort of application. For example, suppose my assistant tells me a story about how my proposed FAI could fail. If my assistant is overconfident in its pessimism, the worst case is that I spend a lot of time thinking about the failure mode without seeing how it could happen (not that bad). If my assistant is underconfident, and tells me a failure mode is 5% likely when it's really 95% likely, it still feels like my assistant is being overall helpful if the failure case is one I wasn't previously aware of. To put it another way, if my assistant isn't calibrated, it feels like I should just be able to ignore its probability estimates and still get good use out of it.

but eventually we want to switch over to a more scalable approach that will use few of the same tools.

I actually think the advisor approach might be scalable, if advisor_1 has been hand-verified, and advisor_1 verifies advisor_2, who verifies advisor_3, etc.

Comment by john_maxwell on What AI safety problems need solving for safe AI research assistants? · 2019-11-12T09:47:12.646Z · score: 2 (1 votes) · LW · GW

Are you referring to the possibility of unintended optimization, or is there something more?

Comment by john_maxwell on Notes on Running Objective · 2019-11-12T09:14:22.681Z · score: 2 (1 votes) · LW · GW

Hey Jeff, for whatever it's worth, I had some really bad RSI a few years ago and discovering has allowed me to almost completely cure it. Happy to chat more about this if you'd like, feel free to reach out!

Comment by john_maxwell on Open & Welcome Thread - November 2019 · 2019-11-12T04:25:26.829Z · score: 2 (1 votes) · LW · GW

See that unbiased "prior-free" estimates must be mixtures of the (unbiased) estimates

I don't follow.

assembly of (higher-variance) estimates

What's an "assembly of estimates"?

treating the two as independent pieces of evidence

But they're distributions, not observations.

Comment by john_maxwell on Open & Welcome Thread - November 2019 · 2019-11-10T17:33:24.428Z · score: 2 (1 votes) · LW · GW

and that their choice of weights minimizes the error

The author has selected a weighted average such that if we treat that weighted average as a random variable, its standard deviation is minimized. But if we just want a random variable whose standard deviation is minimized, we could have a distribution which assigns 100% credence to the number 0 and be done with it. In other words, my question is whether the procedure in this post can be put on a firmer philosophical foundation, or whether there is some alternate derivation/problem formulation (e.g. a mixture model) that gets us the same formula.

Another way of getting at the same idea: There are potentially other procedures one could use to create a "blended estimate", for example, you could find the point such that the product of the likelihoods of the two distributions is maximized, or take a weighted average of the two estimates using e.g. (1/sigma) as the weight of each estimate. Is there a justification for using this particular loss function, of finding a random variable constructed via weighted average whose variance is minimized? It seems to me that this procedure is a little weird because it's the random variable that corresponds to the person's age that we really care about. We should be looking "upstream" of the estimates, but instead we're going "downstream" (where up/down stream roughly correspond to the direction of arrows in a Bayesian network).
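
For concreteness, here's a sketch with made-up numbers comparing the minimum-variance weighted average (which for independent unbiased estimates uses inverse-variance weights) against the maximize-the-product-of-likelihoods procedure. For Gaussian estimates the two happen to coincide, which is one partial answer to the "upstream vs downstream" worry in this special case:

```python
# Two Gaussian estimates of the person's age (made-up numbers).
mu1, sigma1 = 30.0, 4.0
mu2, sigma2 = 36.0, 2.0

# Minimum-variance weighted average: weights proportional to 1/sigma^2.
w1, w2 = 1 / sigma1**2, 1 / sigma2**2
blended = (w1 * mu1 + w2 * mu2) / (w1 + w2)

# Alternative procedure: maximize the product of the two likelihoods
# (equivalently, the sum of log-likelihoods) over a fine grid.
def log_product(x):
    return -((x - mu1) / sigma1) ** 2 / 2 - ((x - mu2) / sigma2) ** 2 / 2

mle = max((20 + i * 0.001 for i in range(25001)), key=log_product)

print(blended)        # -> 34.8
print(round(mle, 3))  # -> 34.8
```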

Comment by john_maxwell on [Team Update] Why we spent Q3 optimizing for karma · 2019-11-09T06:11:41.003Z · score: 4 (0 votes) · LW · GW

Cool project!

I suggest you make Q3 "growth quarter" every year, and always aim to achieve 1.5x the amount of growth you were able to achieve during last year's "growth quarter".

You could have an open thread soliciting growth ideas from the community right before each "growth quarter".

Comment by john_maxwell on Open & Welcome Thread - November 2019 · 2019-11-08T00:50:03.009Z · score: 11 (7 votes) · LW · GW

Woah, this seems like a big jump to a form of technocracy / paternalism that I would think would typically require more justification than spending a short amount of time brainstorming in a comment thread why the thing millions of people use daily is actually bad.

Under what circumstances do you feel introducing new policy ideas with the preface "maybe this could be a good idea" is acceptable?

I don't expect anyone important to be reading this thread, certainly not important policymakers. Even if they were, I think it was pretty clear I was spitballing.

Like, banning sites from offering free services if a character limit is involved because high status members of communities you like enjoy such sites

If society's elites are incentivized to use a platform which systematically causes misunderstandings and strife for no good reason, that seems bad.

Now, one counterargument would be "coordination problems" mean those writers would prefer to write somewhere else. But presumably if anyone's aware of "inadequate equilibria" like this and able to avoid it it would be Eliezer.

Let's not fall prey to the halo effect. Eliezer also wrote a long post about the necessity of back-and-forth debate, and he's using a platform which is uniquely bad at this. At some point, one starts to wonder whether Eliezer is a mortal human being who suffers from akrasia and biases just like the rest of us.

I agree it's not the best possible version of a platform by any means, but to say it's obviously net negative seems like a stretch without further evidence.

I didn't make much of an effort to assemble arguments that Twitter is bad. But I think there are good arguments out there. How do you feel about the nuclear diplomacy that's happened on Twitter?