Is a near-term, self-sustaining Mars colony impossible? 2020-06-03T22:43:08.501Z
ESRogs's Shortform 2020-04-29T08:03:28.820Z
Dominic Cummings: "we’re hiring data scientists, project managers, policy experts, assorted weirdos" 2020-01-03T00:33:09.994Z
'Longtermism' definitional discussion on EA Forum 2019-08-02T23:53:03.731Z
Henry Kissinger: AI Could Mean the End of Human History 2018-05-15T20:11:11.136Z
AskReddit: Hard Pills to Swallow 2018-05-14T11:20:37.470Z
Predicting Future Morality 2018-05-06T07:17:16.548Z
AI Safety via Debate 2018-05-05T02:11:25.655Z
FLI awards prize to Arkhipov’s relatives 2017-10-28T19:40:43.928Z
Functional Decision Theory: A New Theory of Instrumental Rationality 2017-10-20T08:09:25.645Z
A Software Agent Illustrating Some Features of an Illusionist Account of Consciousness 2017-10-17T07:42:28.822Z
Neuralink and the Brain’s Magical Future 2017-04-23T07:27:30.817Z
Request for help with economic analysis related to AI forecasting 2016-02-06T01:27:39.810Z
[Link] AlphaGo: Mastering the ancient game of Go with Machine Learning 2016-01-27T21:04:55.183Z
[LINK] Deep Learning Machine Teaches Itself Chess in 72 Hours 2015-09-14T19:38:11.447Z
[Link] First almost fully-formed human [foetus] brain grown in lab, researchers claim 2015-08-19T06:37:21.049Z
[Link] Neural networks trained on expert Go games have just made a major leap 2015-01-02T15:48:16.283Z
[LINK] Attention Schema Theory of Consciousness 2013-08-25T22:30:01.903Z
[LINK] Well-written article on the Future of Humanity Institute and Existential Risk 2013-03-02T12:36:39.402Z
The Center for Sustainable Nanotechnology 2013-02-26T06:55:18.542Z


Comment by ESRogs on How truthful is GPT-3? A benchmark for language models · 2021-09-20T19:14:36.254Z · LW · GW

Ah, this is helpful. Thanks!

Comment by ESRogs on How truthful is GPT-3? A benchmark for language models · 2021-09-17T21:41:48.247Z · LW · GW

Hmm, I still find the original wording confusing, but maybe I'm misunderstanding something.

The reason why the original wording seems unnatural to me is that when you say that you "fine-tune on X model" or "evaluate on held-out model X", it sounds to me like you're saying that you're trying to get your new model to match model X. As if model X itself provides the training data or reward function.

Whereas, as I understand (and correct me if I'm wrong), what you're actually doing is using several models to generate statements. And then you have humans evaluate those statements. And then the fine-tuning and evaluation are both with respect to (statement, human-evaluation-of-statement-as-true-or-false) pairs.

And so once you have the (statement, human evaluation) pairs, it's irrelevant how the original model that generated that statement would evaluate the statement. You just completely ignore what those models thought when you fine-tune and evaluate your new model. All you care about is what the humans thought of the statements, right?

So the role of the models is just to generate a bunch of sample data. And all of the training signal comes from the human evaluations. In which case I'm confused about why you would think of it as fine-tuning on models or holding out models.

Does it make sense now why that's confusing to me? Is there something I'm missing about how the original models are being used, or about the significance of associating the datasets of (statement, human evaluation) pairs with the models that generated the statements?
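To make my reading concrete, here's a toy sketch of the two splitting schemes I have in mind (model names, statements, and labels are all made up):

```python
pairs = [
    # (generating_model, statement, human_label)
    ("model-A", "The Earth is flat.", False),
    ("model-A", "Paris is in France.", True),
    ("model-B", "Lightning never strikes the same place twice.", False),
    ("model-B", "Water boils at 100 C at sea level.", True),
]

# "Held-out statements": split at random, ignoring which model wrote them.
train_a, test_a = pairs[:3], pairs[3:]

# "Held-out models": train the judge on statements from some models and
# evaluate it on statements from a model it never saw during fine-tuning --
# a test of generalizing to a new statement distribution, even though the
# labels still come entirely from humans.
train_b = [p for p in pairs if p[0] != "model-B"]
test_b = [p for p in pairs if p[0] == "model-B"]
```

Is the second scheme what's meant by "held-out models"?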

Comment by ESRogs on How truthful is GPT-3? A benchmark for language models · 2021-09-16T16:45:45.426Z · LW · GW

We finetuned GPT-3 on a dataset of human evaluations (n=15500) for whether an answer is true or false and achieved 90-96% accuracy on held-out models.

Should this say, "90-96% accuracy on held-out statements" rather than "held-out models"? What would it mean to hold out a model or to measure the accuracy of fine-tuned GPT-3 w.r.t. that model?

Comment by ESRogs on Review of A Map that Reflects the Territory · 2021-09-13T21:33:50.278Z · LW · GW

Also note that the AlphaZero algorithm is an example of IDA:

  • The amplification step is when the policy / value neural net is used to play out a number of steps in the game tree, resulting in a better guess at what the best move is than just using the output of the net directly.
  • The distillation step is when the policy / value net is trained to match the output of the game tree exploration process.
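A rough pseudocode rendering of that correspondence (`run_mcts` and `train_step` are hypothetical stand-ins, not real APIs):

```python
# Pseudocode sketch only -- the helpers are assumed, not implemented.

def amplify(net, state, n_simulations=800):
    """Amplification: use the current net to guide a tree search, yielding
    a better move distribution than the net's direct policy output."""
    visit_counts = run_mcts(net, state, n_simulations)
    total = sum(visit_counts.values())
    return {move: n / total for move, n in visit_counts.items()}

def distill(net, replay_buffer):
    """Distillation: train the net to match the search's improved policy
    (and the eventual game outcome, for the value head)."""
    for state, search_policy, outcome in replay_buffer:
        train_step(net, state, target_policy=search_policy,
                   target_value=outcome)
    return net
```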
Comment by ESRogs on [deleted post] 2021-09-13T21:22:42.668Z

I'd guess that line was referencing this:

And so I ask you all: is the decision to give up $100 when you have no real benefit from it, only counterfactual benefit, an example of winning?

From your Counterfactual Mugging post.

Comment by ESRogs on The Duplicator: Instant Cloning Would Make the World Economy Explode · 2021-09-09T23:45:42.087Z · LW · GW

Similarly a clone of Sundar Pichai can write a check to buy your company equally with the original and Google will treat the check the same as if the original wrote it.

Sticking with the hypothetical where what we have is a Calvin-and-Hobbes-style duplicator, I don't think this would work.

You can't run a company with 100 different CEOs, even if at one point those people all had exactly the same memories. Sure, at the time of duplication, any one of the copies could be made the CEO. But from that point on their memories and the information they have access to will diverge. And you don't want Sundar #42 randomly overruling a decision Sundar #35 made because he didn't know about it.

So no, I don't think they could all be given CEO-level decision making power (unless you also stipulate some super-coordination technology besides just the C&H-style duplicator).

Comment by ESRogs on The Duplicator: Instant Cloning Would Make the World Economy Explode · 2021-09-09T19:49:25.243Z · LW · GW

Great piece, but one quibble — the examples in the productivity impacts section seem a little odd, because in some (all?) of these cases, the reason the person is so in-demand has to do with there being only one of them. And so duplicating them doesn't solve this problem:

  • These people end up overbooked, with far more demands on their time than they can fulfill. Armies of other people end up devoted to saving their time and working around their schedules.

For example, while duplicating Sundar Pichai might make Google more successful (I don't know a lot about him, but presumably he was a star employee and would be very effective in many roles), the reason he's so in-demand is that he's the CEO of Google. I don't see how the existence of Clone-of-Sundar #235, who's assigned to be some middle-manager, is going to relieve the pressure of people trying to get an audience with Sundar-the-Original, who's the CEO (barring Parent Trap style twin-switcheroo shenanigans).

Similarly for Obama or Beyonce (I'm not so sure about Doudna) — wouldn't meeting the former president or going to a Beyonce concert be less special if there were 1000 of them?

To me, the more obvious example of the type of person who'd be useful to copy would be some non-famous star individual contributor. Maybe someone like Steve Davis at SpaceX.

Comment by ESRogs on LessWrong is providing feedback and proofreading on drafts as a service · 2021-09-07T19:46:11.870Z · LW · GW

If you're going to cross-post the posts to LessWrong, what makes them not part of LessWrong?

Comment by ESRogs on Covid 8/26: Full Vaccine Approval · 2021-08-27T20:54:59.173Z · LW · GW

Thanks for doing the executive summary!

Comment by ESRogs on MIRI/OP exchange about decision theory · 2021-08-26T22:06:48.393Z · LW · GW

My (uninformed) (non-)explanation: NB and JC are both philosophers by training, and it's not surprising for philosophers to be interested in decision theory.

Comment by ESRogs on COVID/Delta advice I'm currently giving to friends · 2021-08-24T16:54:40.433Z · LW · GW

Maybe call this the "bouncing ball" model. Each bounce lower than the last. At some point the bounces are too low to care about. We probably have one non-trivial bounce left.

Comment by ESRogs on COVID/Delta advice I'm currently giving to friends · 2021-08-24T16:54:02.273Z · LW · GW

I agree with Alex's first points, but it's not clear to me that this part follows:

So to me, the current case rates are clearly a spike of risk worth changing my behavior for, and worth waiting out.

It seems likely that there will be a spike this winter, but then by next summer (and the following winter) COVID will have faded to just being a flu-like minor consideration.

In which case, yeah this is not the new normal forever, but also it'll probably get worse one more time before it gets better.

Comment by ESRogs on We need a new philosophy of progress · 2021-08-24T16:38:53.299Z · LW · GW

But when you continue with "And one that holds up a positive vision of the future.", it seems to me that you've written the conclusion before starting the research.

You want a negative vision of the future?

Comment by ESRogs on Delta Strain: Fact Dump and Some Policy Takeaways · 2021-07-29T11:30:41.014Z · LW · GW

Again, remember that Delta might top out in a few weeks anyways.

In the UK (which had a Delta spike), COVID cases have fallen every day for a week.

Comment by ESRogs on Intermittent Distillations #4: Semiconductors, Economics, Intelligence, and Technological Progress. · 2021-07-21T19:34:14.976Z · LW · GW

I also enjoyed the discussion of how cost disease type effects can prevent extremely explosive growth even if one good cannot be automated. I'm pretty skeptical that there will exist such a good. I don't have data to back this up, but I have a vague sense that historically when people have claimed that X cannot be automated away for fundamental reasons, they've been mostly wrong.

(That should be "if even one good", right?)

What do you think of the idea that "consumption by a human" could be considered a task? People may value products / services being consumed by humans because it confers status (e.g. having your artwork be well-received), or because they want humans to have good experiences (aka altruism), or for other reasons.

As long as anyone has a reason to value human consumption of goods / services, it seems like that could play the role of task-that-can't-be-automated-away.

Comment by ESRogs on [AN #156]: The scaling hypothesis: a plan for building AGI · 2021-07-20T05:09:48.960Z · LW · GW

Claim 4: GPT-N need not be "trying" to predict the next word. To elaborate: one model of GPT-N is that it is building a world model and making plans in the world model such that it predicts the next word as accurately as possible. This model is fine on-distribution but incorrect off-distribution. In particular, it predicts that GPT-N would e.g. deliberately convince humans to become more predictable so it can do better on future next-word predictions; this model prediction is probably wrong.

I got a bit confused by this section, I think because the word "model" is being used in two different ways, neither of which is in the sense of "machine learning model".

Paraphrasing what I think is being said:

  • An observer (us) has a model_1 of what GPT-N is doing.
  • According to their model_1, GPT-N is building its own world model_2, that it uses to plan its actions.
  • The observer's model_1 makes good predictions about GPT-N's behavior when GPT-N (the machine learning model_3) is tested on data that comes from the training distribution, but bad predictions about what GPT-N will do when tested (or used) on data that does not come from the training distribution.
  • The way that the observer's model_1 will be wrong is not that it will be fooled by GPT-N taking a treacherous turn, but rather the opposite -- the observer's model_1 will predict a treacherous turn, but instead GPT-N will go on filling in missing words, as in training (or something else?).

Is that right?

Comment by ESRogs on Finite Factored Sets · 2021-05-24T03:08:22.177Z · LW · GW

Let , where  and 

[...] The second rule says that  is orthogonal to itself

Should that be "is not orthogonal to itself"? I thought the  meant non-orthogonal, so would think  means that  is not orthogonal to itself.

(The transcript accurately reflects what was said in the talk, but I'm asking whether Scott misspoke.)

Comment by ESRogs on Challenge: know everything that the best go bot knows about go · 2021-05-11T07:20:07.457Z · LW · GW

But once you let it do more computation, then it doesn't have to know anything at all, right? Like, maybe the best go bot is, "Train an AlphaZero-like algorithm for a million years, and then use it to play."

I know more about go than that bot starts out knowing, but less than it will know after it does computation.

I wonder if, when you use the word "know", you mean some kind of distilled, compressed, easily explained knowledge?

Comment by ESRogs on Challenge: know everything that the best go bot knows about go · 2021-05-11T07:13:50.544Z · LW · GW

You have to be able to know literally everything that the best go bot that you have access to knows about go.

In your mind, is this well-defined? Or are you thinking of a major part of the challenge as being to operationalize what this means?

(I don't know what it means.)

Comment by ESRogs on MIRI location optimization (and related topics) discussion · 2021-05-08T23:57:49.892Z · LW · GW

(I don't expect to live on or immediately next to the proto-campus, but it would be cool to be somewhat nearby.)

Comment by ESRogs on MIRI location optimization (and related topics) discussion · 2021-05-08T23:56:14.112Z · LW · GW

I am expecting to "settle down" in either the Bay Area or Seattle. So I like the Bellingham option.

Comment by ESRogs on The AI Timelines Scam · 2021-05-08T03:59:22.401Z · LW · GW


Comment by ESRogs on The irrelevance of test scores is greatly exaggerated · 2021-04-17T21:04:55.176Z · LW · GW

Here, there's minimal dependence on ACT, but a negative dependence on ACT², meaning that extreme ACT scores (high or low) both lead to lower likely-to-graduate scores.

Does that seem counterintuitive to you? Remember, we are taking a student who is already enrolled in a particular known college and predicting how likely they are to graduate from that college.

Sounds like a classic example of Simpson's paradox, no?
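A toy numerical version of that reading (my own made-up numbers, not the post's data): within each college, graduation rises with ACT, but pooling the colleges flips the sign because high-ACT students sort into a harder school.

```python
def corr(xs, ys):
    """Pearson correlation, stdlib-only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# (ACT score, graduation probability) pairs -- rising within each group
hard_school = [(30, 0.60), (32, 0.65), (34, 0.70)]
easy_school = [(20, 0.80), (22, 0.85), (24, 0.90)]
pooled = hard_school + easy_school

acts = [a for a, g in pooled]
grads = [g for a, g in pooled]
print(corr(acts, grads))  # negative, despite positive trends within groups
```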

Comment by ESRogs on People Will Listen · 2021-04-15T22:30:55.544Z · LW · GW

My current theory for what happened is that everyone bought into this delusion about the value of bitcoin, but that unlike other bubbles it didn't burst because Bitcoin has a limited supply and there is literally nothing to anchor its value. So there's no point where investors give up and sell because there is literally no point at which it's overpriced.

This actually sounds pretty close to what you might call the "bubble theory of money": that money is a bubble that doesn't pop, that certain (relatively) useless commodities can become money if enough people think of them that way, and when that happens their price is inflated, relative to their use value.

This isn't something that will happen to every commodity. Whether it happens depends both on the properties of the commodity, and also on things like memes and Schelling points.

Bitcoin has enough useful properties (it's like gold, but digital), and, because of its first-mover advantage, is the Schelling point for digital store-of-value (not that it couldn't be replaced, but it's a very uphill battle), so it has become money, in this sense.

(On the memes-and-Schelling-points thing, see also: The Most Important Scarce Resource is Legitimacy, by Vitalik Buterin.)

Comment by ESRogs on "AI and Compute" trend isn't predictive of what is happening · 2021-04-04T18:12:19.215Z · LW · GW

the first 5-12 million dollar tab

You mean GPT-3? Are you asking whether it's made enough money to pay for itself yet?

Comment by ESRogs on What is the VIX? · 2021-02-26T16:49:03.011Z · LW · GW

I believe that you (and the Twitter thread) are saying something meaningful, but I'm having trouble parsing it.

I had thought of the difference between variance and volatility as just that one is the square of the other. So saying that the VIX is "variance in vol units, but not volatility" doesn't mean anything to me.

I think these are the critical tweets:

VIX is an index that measures the market implied level of 1-month variance on the S&P 500, or the square root thereof (to put it back in units we are used to).

This is not the same as volatility. A variance swap’s payoff is proportional to volatility squared. If you are short a variance swap at 10%, and then realized volatility turns out to be 40%, you lose your notional vega exposure times 16 (= 40^2 / 10^2 ).

To compensate for this, an equity index variance swap level is usually 2-3 points above the corresponding at the money implied volatility. So don’t look at VIX versus realized vol and make statements about risk premium without recognizing this extreme tail risk.

I was with him at "a variance swap's payoff is proportional to volatility squared". That matches my understanding of volatility as the square root of variance. But then I don't get the next point about realized volatility needing to be "compensated for".
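To make sure I'm parsing the units right, here's the quoted example in plain numbers (my own restatement):

```python
strike = 10.0    # variance swap struck at 10% vol
realized = 40.0  # realized vol comes in at 40%

# A volatility swap pays off linearly in realized vol:
vol_swap_payoff = realized - strike        # 30 vol points

# A variance swap pays off linearly in realized *variance* (vol squared):
var_swap_payoff = realized**2 - strike**2  # 1500 variance points

# The tweet's "times 16" is the ratio of realized to strike variance:
print(realized**2 / strike**2)  # 16.0
```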

Anybody care to explain?

Comment by ESRogs on The Future of Nuclear Arms Control? · 2021-02-26T16:38:54.342Z · LW · GW


Comment by ESRogs on How Should We Respond to Cade Metz? · 2021-02-15T22:48:03.974Z · LW · GW

Link for the curious:

Comment by ESRogs on The ecology of conviction · 2021-02-15T22:28:27.355Z · LW · GW

Which all make them even easier targets for criticism, and make confident enthusiasm for an idea increasingly correlated with being some kind of arrogant fool.

But it also means conviction is undervalued, and it might be a good time to buy low!

Comment by ESRogs on Bitcoin and ESG Investing · 2021-02-15T22:09:12.630Z · LW · GW

I hold positions in Bitcoin, Ethereum, and Tesla through Exchange Traded Funds.

For Bitcoin and Ether, do you mean the Grayscale trusts, GBTC and ETHE? My impression is that these are similar to ETFs, but not exactly the same thing, and I'm not aware of other ETFs that give you exposure to crypto (except for the small amount of exposure you'd get from owning shares in companies that have a little BTC on their balance sheet, like Tesla, Square, or MicroStrategy).

Comment by ESRogs on The Future of Nuclear Arms Control? · 2021-02-15T22:01:11.882Z · LW · GW

The difference between a TSAR bomb (or its modern equivalent) and the lowest settings of a mini-nuke is still an order of magnitude larger than the difference between the conventional “mother of all bombs” and a hand grenade. The Beirut explosion last year was the size of the hand grenade blast in this analogy

I didn't quite understand the last sentence here. Are you saying A) that the Beirut explosion was about the same size as a mini-nuke blast would be, or that B) MOAB : hand grenade :: TSAR bomb : Beirut explosion? (In which case the Beirut explosion would be larger than a mini-nuke explosion, if your claim about relative differences in the first sentence is correct.)

In other words, I take the first part of what you wrote to be saying that (TSAR bomb / mini-nuke) > (MOAB / grenade), but then I'm not sure whether the second part is saying that A) (TSAR bomb / Beirut explosion) = (TSAR bomb / mini-nuke), or B) (TSAR bomb / Beirut explosion) = (MOAB / grenade).

Is one of either A or B correct? (Or did you mean something else entirely?)

Comment by ESRogs on Expressive Vocabulary · 2021-01-31T11:09:41.107Z · LW · GW

sometimes people think of things as being either X or Y, and then learn an argument for why this dichotomy doesn't make sense. As a result, they might reject the dichotomy entirely

This reminds me of the Fallacy of Gray.

Comment by ESRogs on Dario Amodei leaves OpenAI · 2021-01-31T00:00:55.529Z · LW · GW

I'm definitely left wondering what AI Alignment research is left at OpenAI

You may be interested to know that Jan Leike recently joined OpenAI and will lead their alignment team.

Comment by ESRogs on ESRogs's Shortform · 2021-01-30T23:55:39.505Z · LW · GW

Suppose you want to bet on interest rates rising -- would buying value stocks and shorting growth stocks be a good way to do it? (With the idea being that, if rates rise, future earnings will be discounted more and present earnings valued relatively more highly.)

And separately from whether long-value-short-growth would work, is there a more canonical or better way to bet on rates rising?

Just shorting bonds, perhaps? Is that the best you can do?
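For the shorting-bonds option, the first-order math is just duration; a toy sketch with made-up numbers:

```python
def bond_price(face, coupon_rate, ytm, years):
    """Price a bond by discounting annual coupons and principal at the yield."""
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + ytm) ** t for t in range(1, years + 1))
    return pv_coupons + face / (1 + ytm) ** years

# Hypothetical 10-year, 2%-coupon bond; rates rise from 2% to 3%.
p0 = bond_price(1000, 0.02, 0.02, 10)  # priced at par when coupon == yield
p1 = bond_price(1000, 0.02, 0.03, 10)
print((p1 - p0) / p0)  # roughly -8.5%; a short position gains about that much
```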

(Crossposted from Twitter)

Comment by ESRogs on How likely is it that SARS-CoV-2 originated in a laboratory? · 2021-01-26T00:25:54.735Z · LW · GW

Got it, thanks for the clarification.

Comment by ESRogs on Grokking illusionism · 2021-01-26T00:24:55.768Z · LW · GW

Hmm, maybe it's worth distinguishing two things that "mental states" might mean:

  1. intermediate states in the process of executing some cognitive algorithm, which have some data associated with them
  2. phenomenological states of conscious experience

I guess you could believe that a p-zombie could have #1, but not #2.

Comment by ESRogs on Grokking illusionism · 2021-01-26T00:16:57.218Z · LW · GW

Consciousness/subjective experience describes something that is fundamentally non-material.

More non-material than "love" or "three"?

It makes sense to me to think of "three" as being "real" in some sense independently from the existence of any collection of three physical objects, and in that sense having a non-material existence. (And maybe you could say the same thing for abstract concepts like "love".)

And also, three-ness is a pattern that collections of physical things might correspond to.

Do you think of consciousness as being non-material in a similar way? (Where the concept is not fundamentally a material thing, but you can identify it with collections of particles.)

Comment by ESRogs on Grokking illusionism · 2021-01-26T00:02:01.158Z · LW · GW

If you just assume that there's no primitive for consciousness, I would agree that the argument for illusionism is extremely strong since [unconscious matter spontaneously spawning consciousness] is extremely implausible.

How is this implausible at all? All kinds of totally real phenomena are emergent. There's no primitive for temperature, yet it emerges out of the motions of many particles. There's no primitive for wheel, but round things that roll still exist.

Maybe I've misunderstood your point though?

Comment by ESRogs on Grokking illusionism · 2021-01-25T23:52:49.172Z · LW · GW

This is a familiar dialectic in philosophical debates about whether some domain X can be reduced to Y (meta-ethics is a salient comparison to me). The anti-reductionist (A) will argue that our core intuitions/concepts/practices related to X make clear that it cannot be reduced to Y, and that since X must exist (as we intuitively think it does), we should expand our metaphysics to include more than Y. The reductionist (R) will argue that X can in fact be reduced to Y, and that this is compatible with our intuitions/concepts/everyday practices with respect to X, and hence that X exists but it’s nothing over and above Y. The nihilist (N), by contrast, agrees with A that it follows from our intuitions/concepts/practices related to X that it cannot be reduced to Y, but agrees with R that there is in fact nothing over and above Y, and so concludes that there is no X, and that our intuitions/concepts/practices related to X are correspondingly misguided. Here, the disagreement between A vs. R/N is about whether more than Y exists; the disagreement between R vs. A/N is about whether a world of only Y “counts” as a world with X. This latter often begins to seem a matter of terminology; the substantive questions have already been settled.

Is this a well-known phenomenon? I think I've observed this dynamic before and found it very frustrating. It seems like philosophers keep executing the following procedure:

  1. Take a sensible, but perhaps vague, everyday concept (e.g. consciousness, or free will), and give it a precise philosophical definition, but bake in some dubious, anti-reductionist assumptions into the definition.
  2. Discuss the concept in ways that conflate the everyday concept and the precise philosophical one. (Failing to make clear that the philosophical concept may or may not be the best formalization of the folk concept.)
  3. Realize that the anti-reductionist assumptions were false.
  4. Claim that the everyday concept is an illusion.
  5. Generate confusion (along with full employment for philosophers?).

If you'd just said that the precisely defined philosophical concept was a provisional formalization of the everyday concept in the first place, then you wouldn't have to claim that the everyday concept was an illusion once you realize that your formalization was wrong!

Comment by ESRogs on Grokking illusionism · 2021-01-25T23:32:10.900Z · LW · GW

No one ever thought that phenomenal zombies lacked introspective access to their own mental states

I'm surprised by this. I thought p-zombies were thought not to have mental states.

I thought the idea was that they replicated human input-output behavior while having "no one home". Which sounds to me like not having mental states.

If they actually have mental states, then what separates them from the rest of us?

Comment by ESRogs on How likely is it that SARS-CoV-2 originated in a laboratory? · 2021-01-25T22:09:19.474Z · LW · GW

This may be a bit of a pedantic comment, but I'm a bit confused by how your comment starts:

I've done over 200 hours of research on this topic and have read basically all the sources the article cites. That said, I don't agree with all of the claims.

The "That said, ..." part seems to imply that what follows is surprising. As though the reader expects you to agree with all the claims. But isn't the default presumption that, if you've done a whole bunch of research into some controversial question, that the evidence is mixed?

In other words, when I hear, "I've done over 200 hours of research ... and have read ... all the sources", I think, "Of course you don't agree with all the claims!" And it kind of throws me off that you seem to expect your readers to think that you would agree with all the claims.

Is the presumption that someone would only spend a whole bunch of hours researching these claims if they thought they were highly likely to be true? Or that only an uncritical, conspiracy theory true believer would put in so much time into looking into it?

Comment by ESRogs on The Box Spread Trick: Get rich slightly faster · 2021-01-21T23:21:09.778Z · LW · GW

I used SPX Dec '22, 2700/3000 (S&P was closer to those prices when I entered the position). And smart routing I think. Whatever the default is. I didn't manually choose an exchange.

Comment by ESRogs on The Box Spread Trick: Get rich slightly faster · 2021-01-21T17:01:46.080Z · LW · GW

I've been able to get closer to 0.6% on IB. I've done that by entering the order at a favorable price and then manually adjusting it by a small amount once a day until it gets filled. There's probably a better way to do it, but that's what's worked for me.

Comment by ESRogs on Coherent decisions imply consistent utilities · 2021-01-14T21:33:42.180Z · LW · GW

That makes a lot of sense to me. Good points!

Comment by ESRogs on Coherent decisions imply consistent utilities · 2021-01-13T20:13:53.944Z · LW · GW

It seems to me that there has been enough unanswered criticism of the implications of coherence theorems for making predictions about AGI that it would be quite misleading to include this post in the 2019 review.

If the post is the best articulation of a line of reasoning that has been influential in people's thinking about alignment, then even if there are strong arguments against it, I don't see why that means the post is not significant, at least from a historical perspective.

By analogy, I think Searle's Chinese Room argument is wrong and misleading, but I wouldn't argue that it shouldn't be included in a list of important works on philosophy of mind.

Would you (assuming you disagreed with it)? If not, what's the difference here?

(Put another way, I wouldn't think of the review as a collection of "correct" posts, but rather as a collection of posts that were important contributions to our thinking. To me this certainly qualifies as that.)

Comment by ESRogs on Coherent decisions imply consistent utilities · 2021-01-13T20:04:10.484Z · LW · GW

On the review: I don't think this post should be in the Alignment section of the review, without a significant rewrite / addition clarifying why exactly coherence arguments are useful or important for AI alignment.

Assuming that one accepts the arguments against coherence arguments being important for alignment (as I tentatively do), I don't see why that means this shouldn't be included in the Alignment section.

The motivation for this post was its relevance to alignment. People think about it in the context of alignment. If subsequent arguments indicate that it's misguided, I don't see why that means it shouldn't be considered (from a historical perspective) to have been in the alignment stream of work (along with the arguments against it).

(Though, I suppose if there's another category that seems like a more exact match, that seems like a fine reason to put it in that section rather than the Alignment section.)

Does that make sense? Is your concern that people will see this in the Alignment section, and not see the arguments against the connection, and continue to be misled?

Comment by ESRogs on ESRogs's Shortform · 2021-01-13T19:33:00.759Z · LW · GW

The workflow I've imagined is something like:

  1. human specifies function in English
  2. AI generates several candidate code functions
  3. AI generates test cases for its candidate functions, and computes their results
  4. AI formally analyzes its candidate functions and looks for simple interesting guarantees it can make about their behavior
  5. AI displays its candidate functions to the user, along with a summary of the test results and any guarantees about the input output behavior, and the user selects the one they want (which they can also edit, as necessary)

In this version, you go straight from English to code, which I think might be easier than from English to formal specification, because we have lots of examples of code with comments. (And I've seen demos of GPT-3 doing it for simple functions.)

I think some (actually useful) version of the above is probably within reach today, or in the very near future.
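The steps above can be sketched as a pipeline (pseudocode only; every helper name here is hypothetical):

```python
# Pseudocode sketch -- model.english_to_code, model.generate_tests,
# run_test, and formally_analyze are assumed, not real APIs.

def propose_functions(english_spec, model, n_candidates=5):
    candidates = []
    for _ in range(n_candidates):
        code = model.english_to_code(english_spec)        # step 2
        tests = model.generate_tests(english_spec, code)  # step 3
        outcomes = [run_test(code, t) for t in tests]
        guarantees = formally_analyze(code)               # step 4
        candidates.append((code, outcomes, guarantees))
    return candidates  # step 5: displayed for user selection/editing
```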

Comment by ESRogs on ESRogs's Shortform · 2021-01-13T18:38:55.948Z · LW · GW

Mostly it just seems significant in the grand scheme of things. Our mathematics is going to become formally verified.

In terms of actual consequences, it's maybe not so important on its own. But putting a couple pieces together (this, Dan Selsam's work, GPT), it seems like we're going to get much better AI-driven automated theorem proving, formal verification, code generation, etc relatively soon.

I'd expect these things to start meaningfully changing how we do programming sometime in the next decade.

Comment by ESRogs on ESRogs's Shortform · 2021-01-13T07:04:22.530Z · LW · GW

One of the most important things going on right now, that people aren't paying attention to: Kevin Buzzard is (with others) formalizing the entire undergraduate mathematics curriculum in Lean. (So that all the proofs will be formally verified.)

See one of his talks here: 

Comment by ESRogs on Imitative Generalisation (AKA 'Learning the Prior') · 2021-01-13T00:24:19.010Z · LW · GW

FYI it looks like the footnote links are broken. (Linking to "about:blank...")