Hmm, I still find the original wording confusing, but maybe I'm misunderstanding something.
The reason why the original wording seems unnatural to me is that when you say that you "fine-tune on X model" or "evaluate on held-out model X", it sounds to me like you're saying that you're trying to get your new model to match model X. As if model X itself provides the training data or reward function.
Whereas, as I understand (and correct me if I'm wrong), what you're actually doing is using several models to generate statements. And then you have humans evaluate those statements. And then the fine-tuning and evaluation are both with respect to (statement, human-evaluation-of-statement-as-true-or-false) pairs.
And so once you have the (statement, human evaluation) pairs, it's irrelevant how the original model that generated that statement would evaluate the statement. You just completely ignore what those models thought when you fine-tune and evaluate your new model. All you care about is what the humans thought of the statements, right?
So the role of the models is just to generate a bunch of sample data. And all of the training signal comes from the human evaluations. In which case I'm confused about why you would think of it as fine-tuning on models or holding out models.
Does it make sense now why that's confusing to me? Is there something I'm missing about how the original models are being used, or about the significance of associating the datasets of (statement, human evaluation) pairs with the models that generated the statements?
Also note that the AlphaZero algorithm is an example of IDA (iterated distillation and amplification):
The amplification step is when the policy / value neural net is used to play out a number of steps in the game tree, resulting in a better guess at what the best move is than just using the output of the net directly.
The distillation step is when the policy / value net is trained to match the output of the game tree exploration process.
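Here is a minimal sketch of the amplify/distill loop. It swaps in a toy subtraction game for Go (remove 1 or 2 stones; whoever takes the last stone wins). A lookup table stands in for the policy/value net, and a depth-limited negamax lookahead stands in for AlphaZero's Monte Carlo tree search. All names and parameters are hypothetical. It only shows how the two steps fit together.

```python
from typing import Dict, List

MOVES = (1, 2)  # toy game (assumption): remove 1 or 2 stones; taking the last stone wins


def legal_moves(n: int) -> List[int]:
    return [m for m in MOVES if m <= n]


def search_value(n: int, values: Dict[int, float], depth: int) -> float:
    """Amplification: negamax lookahead that falls back on the table at the leaves."""
    if n == 0:
        return -1.0  # no stones left: the player to move has already lost
    if depth == 0:
        return values[n]  # the "net's" direct guess, with no search
    return max(-search_value(n - m, values, depth - 1) for m in legal_moves(n))


def ida_iteration(values: Dict[int, float], depth: int = 2, lr: float = 0.5) -> Dict[int, float]:
    """One round: amplify every state with search, then distill by moving the table toward it."""
    targets = {n: search_value(n, values, depth) for n in values}
    return {n: v + lr * (targets[n] - v) for n, v in values.items()}


values = {n: 0.0 for n in range(1, 13)}  # untrained "net": every position looks even
for _ in range(50):
    values = ida_iteration(values)

for n, v in values.items():
    print(n, round(v, 3))
```

The table starts out knowing nothing, and the search only ever looks two plies ahead. Even so, repeated rounds should drive positions with n divisible by 3 toward -1 (a loss for the player to move) and the rest toward +1. Each distillation step makes the next amplification step's leaf evaluations better.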
Comment by ESRogs on [deleted post]
I'd guess that line was referencing this:
And so I ask you all: is the decision to give up $100 when you have no real benefit from it, only counterfactual benefit, an example of winning?
Similarly a clone of Sundar Pichai can write a check to buy your company equally with the original and Google will treat the check the same as if the original wrote it.
Sticking with the hypothetical where what we have is a Calvin-and-Hobbes-style duplicator, I don't think this would work.
You can't run a company with 100 different CEOs, even if at one point those people all had exactly the same memories. Sure, at the time of duplication, any one of the copies could be made the CEO. But from that point on their memories and the information they have access to will diverge. And you don't want Sundar #42 randomly overruling a decision Sundar #35 made because he didn't know about it.
So no, I don't think they could all be given CEO-level decision making power (unless you also stipulate some super-coordination technology besides just the C&H-style duplicator).
Great piece, but one quibble: the examples in the productivity impacts section seem a little odd, because in some (all?) of these cases, the reason the person is so in demand has to do with there being only one of them. And so duplicating them doesn't solve this problem:
These people end up overbooked, with far more demands on their time than they can fulfill. Armies of other people end up devoted to saving their time and working around their schedules.
For example, while duplicating Sundar Pichai might make Google more successful (I don't know a lot about him, but presumably he was a star employee and would be very effective in many roles), the reason he's so in demand is that he's the CEO of Google. I don't see how the existence of Clone-of-Sundar #235, who's assigned to be some middle manager, is going to relieve the pressure of people trying to get an audience with Sundar-the-Original, who's the CEO (barring Parent Trap style twin-switcheroo shenanigans).
Similarly for Obama or Beyonce (I'm not so sure about Doudna): wouldn't meeting the former president or going to a Beyonce concert be less special if there were 1000 of them?
To me, the more obvious example of the type of person who'd be useful to copy would be some non-famous star individual contributor. Maybe someone like Steve Davis at SpaceX.
I also enjoyed the discussion of how cost disease type effects can prevent extremely explosive growth even if one good cannot be automated. I'm pretty skeptical that there will exist such a good. I don't have data to back this up, but I have a vague sense that historically when people have claimed that X cannot be automated away for fundamental reasons, they've been mostly wrong.
(That should be "if even one good", right?)
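One standard way to see the cost-disease point is a CES production function in which an automatable input and a non-automatable input are complements. Complements here means an elasticity of substitution below 1, i.e. rho < 0. This is a textbook model swapped in for illustration, not necessarily the one the piece uses. The share and rho values are arbitrary assumptions.

```python
def ces_output(automated: float, manual: float, share: float = 0.5, rho: float = -1.0) -> float:
    """CES aggregate of two inputs; rho < 0 means they are complements."""
    return (share * automated ** rho + (1 - share) * manual ** rho) ** (1 / rho)


def output_ceiling(manual: float, share: float = 0.5, rho: float = -1.0) -> float:
    """Limit of ces_output as the automated input grows without bound (valid only for rho < 0)."""
    return (1 - share) ** (1 / rho) * manual


for automated in [1, 10, 100, 10_000, 1_000_000]:
    complements = ces_output(automated, manual=1.0)           # rho = -1: complements
    substitutes = ces_output(automated, manual=1.0, rho=0.5)  # rho > 0: substitutes
    print(f"automated={automated:>9}: complements {complements:.3f}, substitutes {substitutes:,.1f}")

print("ceiling with complements:", output_ceiling(manual=1.0))
```

With complements, a millionfold increase in the automated input only takes output from 1 to just under 2, because the fixed manual input becomes the bottleneck. With substitutes, output keeps growing. In this model, a single non-automatable good can cap growth, so the question is whether any such good exists.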
What do you think of the idea that "consumption by a human" could be considered a task? People may value products / services being consumed by humans because it confers status (e.g. having your artwork be well-received), or because they want humans to have good experiences (aka altruism), or for other reasons.
As long as anyone has a reason to value human consumption of goods / services, it seems like that could play the role of task-that-can't-be-automated-away.
Claim 4: GPT-N need not be "trying" to predict the next word. To elaborate: one model of GPT-N is that it is building a world model and making plans in the world model such that it predicts the next word as accurately as possible. This model is fine on-distribution but incorrect off-distribution. In particular, it predicts that GPT-N would e.g. deliberately convince humans to become more predictable so it can do better on future next-word predictions; this model prediction is probably wrong.
I got a bit confused by this section, I think because the word "model" is being used in two different ways, neither of which is in the sense of "machine learning model".
Paraphrasing what I think is being said:
An observer (us) has a model_1 of what GPT-N is doing.
According to their model_1, GPT-N is building its own world model_2, that it uses to plan its actions.
The observer's model_1 makes good predictions about GPT-N's behavior when GPT-N (the machine learning model_3) is tested on data that comes from the training distribution, but bad predictions about what GPT-N will do when tested (or used) on data that does not come from the training distribution.
The way that the observer's model_1 will be wrong is not that it will be fooled by GPT-N taking a treacherous turn, but rather the opposite -- the observer's model_1 will predict a treacherous turn, but instead GPT-N will go on filling in missing words, as in training (or something else?).
But once you let it do more computation, then it doesn't have to know anything at all, right? Like, maybe the best go bot is, "Train an AlphaZero-like algorithm for a million years, and then use it to play."
I know more about go than that bot starts out knowing, but less than it will know after it does computation.
I wonder if, when you use the word "know", you mean some kind of distilled, compressed, easily explained knowledge?
My current theory for what happened is that everyone bought into this delusion about the value of bitcoin, but that unlike other bubbles it didn't burst because Bitcoin has a limited supply and there is literally nothing to anchor its value. So there's no point where investors give up and sell because there is literally no point at which it's overpriced.
This actually sounds pretty close to what you might call the "bubble theory of money": that money is a bubble that doesn't pop, that certain (relatively) useless commodities can become money if enough people think of them that way, and when that happens their price is inflated, relative to their use value.
This isn't something that will happen to every commodity. Whether it happens depends both on the properties of the commodity, and also on things like memes and Schelling points.
Bitcoin has enough useful properties (it's like gold, but digital), and, because of its first-mover advantage, is the Schelling point for digital store-of-value (not that it couldn't be replaced, but it's a very uphill battle), so it has become money, in this sense.
I believe that you (and the Twitter thread) are saying something meaningful, but I'm having trouble parsing it.
I had thought of the difference between variance and volatility as just that one is the square of the other. So saying that the VIX is "variance in vol units, but not volatility" doesn't mean anything to me.
I think these are the critical tweets:
VIX is an index that measures the market implied level of 1-month variance on the S&P 500, or the square root thereof (to put it back in units we are used to).
This is not the same as volatility. A variance swap’s payoff is proportional to volatility squared. If you are short a variance swap at 10%, and then realized volatility turns out to be 40%, you lose your notional vega exposure times 16 (= 40^2 / 10^2).
To compensate for this, an equity index variance swap level is usually 2-3 points above the corresponding at the money implied volatility. So don’t look at VIX versus realized vol and make statements about risk premium without recognizing this extreme tail risk.
I was with him at "a variance swap's payoff is proportional to volatility squared". That matches my understanding of volatility as the square root of variance. But then I don't get the next point about realized volatility needing to be "compensated for".
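One piece that may help with the confusion: a variance swap is struck on expected variance, so its level in vol units is sqrt(E[sigma^2]). A volatility swap's fair strike is E[sigma]. By Jensen's inequality the first is at least as large as the second whenever future volatility is uncertain. So a variance level quoted "in vol units" is not the same number as expected volatility. Below is a minimal sketch with made-up scenarios. It ignores skew, discounting, and the replication details that also matter in practice.

```python
import math
from typing import List, Tuple

Scenario = Tuple[float, float]  # (probability, realized vol in vol points)


def vol_swap_strike(scenarios: List[Scenario]) -> float:
    """Fair strike of a volatility swap: expected realized volatility, E[sigma]."""
    return sum(p * vol for p, vol in scenarios)


def var_swap_strike_in_vol_units(scenarios: List[Scenario]) -> float:
    """Fair variance-swap strike quoted in vol units: sqrt(E[sigma^2])."""
    return math.sqrt(sum(p * vol ** 2 for p, vol in scenarios))


# Hypothetical: usually calm (10 vol), occasionally a crash (40 vol).
scenarios = [(0.9, 10.0), (0.1, 40.0)]

print("vol swap strike:", vol_swap_strike(scenarios))
print("var swap strike (vol units):", round(var_swap_strike_in_vol_units(scenarios), 2))

# The payoff scales with sigma^2, so realized 40 against a strike of 10 is a variance ratio of:
print("variance ratio:", 40.0 ** 2 / 10.0 ** 2)
```

Here the variance strike (about 15.8) sits almost 3 points above expected volatility (13), and all of that gap comes from the crash scenario. If there were no uncertainty about future vol, the two strikes would coincide. This is the sense in which the variance level "compensates" for the tail. Read VIX minus realized vol as a pure risk premium and you miss this convexity.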
I hold positions in Bitcoin, Ethereum, and Tesla through Exchange Traded Funds.
For Bitcoin and Ether, do you mean the Grayscale trusts, GBTC and ETHE? My impression is that these are similar to ETFs, but not exactly the same thing, and I'm not aware of other ETFs that give you exposure to crypto (except for the small amount of exposure you'd get from owning shares in companies that have a little BTC on their balance sheet, like Tesla, Square, or MicroStrategy).
I didn't quite understand the last sentence here. Are you saying A) that the Beirut explosion was about the same size as a mini-nuke blast would be, or that B) MOAB : hand grenade :: TSAR bomb : Beirut explosion? (In which case the Beirut explosion would be larger than a mini-nuke explosion, if your claim about relative differences in the first sentence is correct.)
In other words, I take the first part of what you wrote to be saying that (TSAR bomb / mini-nuke) > (MOAB / grenade), but then I'm not sure whether the second part is saying that A) (TSAR bomb / Beirut explosion) = (TSAR bomb / mini-nuke), or B) (TSAR bomb / Beirut explosion) = (MOAB / grenade).
Is one of either A or B correct? (Or did you mean something else entirely?)
Suppose you want to bet on interest rates rising -- would buying value stocks and shorting growth stocks be a good way to do it? (With the idea being that, if rates rise, future earnings will be discounted more and present earnings valued relatively more highly.)
And separately from whether long-value-short-growth would work, is there a more canonical or better way to bet on rates rising?
Just shorting bonds, perhaps? Is that the best you can do?
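Here is a minimal sketch of the discounting intuition. Hypothetical cash flows stand in for a "value" stock (earnings now) and a "growth" stock (earnings later). It assumes a single flat discount rate and ignores everything else that moves stock prices.

```python
from typing import List


def present_value(cash_flows: List[float], rate: float) -> float:
    """Discount cash flows paid at the end of years 1, 2, ... at a flat annual rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


def pct_drop(cash_flows: List[float], old_rate: float, new_rate: float) -> float:
    """Percentage fall in present value when the discount rate moves from old_rate to new_rate."""
    return 100 * (1 - present_value(cash_flows, new_rate) / present_value(cash_flows, old_rate))


# Hypothetical profiles, not real companies.
value_stock = [10.0] * 10               # steady earnings starting now
growth_stock = [0.0] * 7 + [40.0] * 3   # earnings arrive mostly in years 8-10

for name, flows in [("value", value_stock), ("growth", growth_stock)]:
    print(f"{name}: PV falls {pct_drop(flows, 0.03, 0.05):.1f}% when rates go from 3% to 5%")
```

The growth profile loses more because its cash flows are further out (longer duration). In this toy model, long value / short growth profits from rising rates only through that duration gap. Real stocks carry plenty of other exposures, which is part of why shorting bonds is the more direct way to express the view.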
Consciousness/subjective experience describes something that is fundamentally non-material.
More non-material than "love" or "three"?
It makes sense to me to think of "three" as being "real" in some sense independently from the existence of any collection of three physical objects, and in that sense having a non-material existence. (And maybe you could say the same thing for abstract concepts like "love".)
And also, three-ness is a pattern that collections of physical things might correspond to.
Do you think of consciousness as being non-material in a similar way? (Where the concept is not fundamentally a material thing, but you can identify it with collections of particles.)
If you just assume that there's no primitive for consciousness, I would agree that the argument for illusionism is extremely strong since [unconscious matter spontaneously spawning consciousness] is extremely implausible.
How is this implausible at all? All kinds of totally real phenomena are emergent. There's no primitive for temperature, yet it emerges out of the motions of many particles. There's no primitive for wheel, but round things that roll still exist.
This is a familiar dialectic in philosophical debates about whether some domain X can be reduced to Y (meta-ethics is a salient comparison to me). The anti-reductionist (A) will argue that our core intuitions/concepts/practices related to X make clear that it cannot be reduced to Y, and that since X must exist (as we intuitively think it does), we should expand our metaphysics to include more than Y. The reductionist (R) will argue that X can in fact be reduced to Y, and that this is compatible with our intuitions/concepts/everyday practices with respect to X, and hence that X exists but it’s nothing over and above Y. The nihilist (N), by contrast, agrees with A that it follows from our intuitions/concepts/practices related to X that it cannot be reduced to Y, but agrees with R that there is in fact nothing over and above Y, and so concludes that there is no X, and that our intuitions/concepts/practices related to X are correspondingly misguided. Here, the disagreement between A vs. R/N is about whether more than Y exists; the disagreement between R vs. A/N is about whether a world of only Y “counts” as a world with X. This latter often begins to seem a matter of terminology; the substantive questions have already been settled.
Is this a well-known phenomenon? I think I've observed this dynamic before and found it very frustrating. It seems like philosophers keep executing the following procedure:
Take a sensible, but perhaps vague, everyday concept (e.g. consciousness, or free will), and give it a precise philosophical definition, but bake some dubious, anti-reductionist assumptions into the definition.
Discuss the concept in ways that conflate the everyday concept and the precise philosophical one. (Failing to make clear that the philosophical concept may or may not be the best formalization of the folk concept.)
Realize that the anti-reductionist assumptions were false.
Claim that the everyday concept is an illusion.
Generate confusion (along with full employment for philosophers?).
If you'd just said that the precisely defined philosophical concept was a provisional formalization of the everyday concept in the first place, then you wouldn't have to claim that the everyday concept was an illusion once you realize that your formalization was wrong!
This may be a pedantic comment, but I'm a bit confused by how your comment starts:
I've done over 200 hours of research on this topic and have read basically all the sources the article cites. That said, I don't agree with all of the claims.
The "That said, ..." part seems to imply that what follows is surprising, as though the reader expects you to agree with all the claims. But if you've done a whole bunch of research into some controversial question, isn't the default presumption that the evidence is mixed?
In other words, when I hear, "I've done over 200 hours of research ... and have read ... all the sources", I think, "Of course you don't agree with all the claims!" And it kind of throws me off that you seem to expect your readers to think that you would agree with all the claims.
Is the presumption that someone would only spend a whole bunch of hours researching these claims if they thought they were highly likely to be true? Or that only an uncritical, conspiracy-theory true believer would put so much time into looking into it?
I've been able to get closer to 0.6% on IB. I've done that by entering the order at a favorable price and then manually adjusting it by a small amount once a day until it gets filled. There's probably a better way to do it, but that's what's worked for me.
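Here is the order-walking routine above as a toy simulation. It does not talk to Interactive Brokers or any broker API. Prices are integer cents to avoid float rounding. The fill rule is a simplifying assumption: a buy fills once the limit reaches the day's best offer. For a sell, you would start high and step down instead.

```python
from typing import List, Optional, Tuple


def walk_limit_order(
    start_cents: int, step_cents: int, daily_best_offer_cents: List[int]
) -> Optional[Tuple[int, int]]:
    """Simulate raising a buy limit by a small step once per day until it fills.

    Returns (day_filled, limit_at_fill), or None if it never fills in the given days.
    """
    for day, offer in enumerate(daily_best_offer_cents, start=1):
        limit = start_cents + step_cents * (day - 1)  # one small adjustment per day
        if limit >= offer:  # simplified fill rule (assumption)
            return day, limit
    return None


# Hypothetical: start 1.00 below an offer that stays at 100.00, stepping 0.05 per day.
print(walk_limit_order(9_900, 5, [10_000] * 30))
```

A smaller step gives up less price but takes more days to fill, which is the trade-off being made by hand here.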
It seems to me that there has been enough unanswered criticism of the implications of coherence theorems for making predictions about AGI that it would be quite misleading to include this post in the 2019 review.
If the post is the best articulation of a line of reasoning that has been influential in people's thinking about alignment, then even if there are strong arguments against it, I don't see why that means the post is not significant, at least from a historical perspective.
By analogy, I think Searle's Chinese Room argument is wrong and misleading, but I wouldn't argue that it shouldn't be included in a list of important works on philosophy of mind.
Would you (assuming you disagreed with it)? If not, what's the difference here?
(Put another way, I wouldn't think of the review as a collection of "correct" posts, but rather as a collection of posts that were important contributions to our thinking. To me this certainly qualifies as that.)
On the review: I don't think this post should be in the Alignment section of the review, without a significant rewrite / addition clarifying why exactly coherence arguments are useful or important for AI alignment.
Assuming that one accepts the arguments against coherence arguments being important for alignment (as I tentatively do), I don't see why that means this shouldn't be included in the Alignment section.
The motivation for this post was its relevance to alignment. People think about it in the context of alignment. If subsequent arguments indicate that it's misguided, I don't see why that means it shouldn't be considered (from a historical perspective) to have been in the alignment stream of work (along with the arguments against it).
(Though, I suppose if there's another category that seems like a more exact match, that seems like a fine reason to put it in that section rather than the Alignment section.)
Does that make sense? Is your concern that people will see this in the Alignment section, and not see the arguments against the connection, and continue to be misled?
AI generates test cases for its candidate functions, and computes their results
AI formally analyzes its candidate functions and looks for simple interesting guarantees it can make about their behavior
AI displays its candidate functions to the user, along with a summary of the test results and any guarantees about the input/output behavior, and the user selects the one they want (which they can also edit, as necessary)
In this version, you go straight from English to code, which I think might be easier than from English to formal specification, because we have lots of examples of code with comments. (And I've seen demos of GPT-3 doing it for simple functions.)
I think some (actually useful) version of the above is probably within reach today, or in the very near future.
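Here is a minimal sketch of the test-and-summarize part of that pipeline. It assumes the candidate functions have already been produced. The two below are hand-written stand-ins for model output, for a hypothetical request "remove duplicates from a list". Random tests plus property checks are only a cheap stand-in for the formal-analysis step. None of the names come from an existing tool.

```python
import random
from typing import Callable, Dict, List

IntList = List[int]


# Hand-written stand-ins for two model-generated candidates (hypothetical).
def candidate_keep_order(xs: IntList) -> IntList:
    return list(dict.fromkeys(xs))  # first occurrences, in original order


def candidate_sorted(xs: IntList) -> IntList:
    return sorted(set(xs))  # also dedupes, but reorders


def generate_test_cases(n: int, seed: int = 0) -> List[IntList]:
    """Step 1: random small inputs, where repeated values are likely."""
    rng = random.Random(seed)
    return [[rng.randint(0, 5) for _ in range(rng.randint(0, 8))] for _ in range(n)]


def is_subsequence(sub: IntList, seq: IntList) -> bool:
    it = iter(seq)  # shared iterator, so matches must appear in order
    return all(any(x == y for y in it) for x in sub)


def check_properties(f: Callable[[IntList], IntList], cases: List[IntList]) -> Dict[str, bool]:
    """Step 2 (tested on examples, not proved): simple properties a user might care about."""
    return {
        "no_duplicates": all(len(f(c)) == len(set(f(c))) for c in cases),
        "same_elements": all(set(f(c)) == set(c) for c in cases),
        "keeps_input_order": all(is_subsequence(f(c), c) for c in cases),
        "idempotent": all(f(f(c)) == f(c) for c in cases),
    }


def summarize(candidates: Dict[str, Callable[[IntList], IntList]], cases: List[IntList]) -> None:
    """Step 3: show properties and a few examples so the user can pick a candidate."""
    for name, f in candidates.items():
        print(name, check_properties(f, cases))
        for c in cases[:3]:
            print("   ", c, "->", f(c))


cases = generate_test_cases(50)
summarize({"keep_order": candidate_keep_order, "sorted": candidate_sorted}, cases)
```

Both candidates remove duplicates. The summary shows that only one of them preserves input order, and that is exactly the kind of ambiguity in the English request the user resolves by picking.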
Mostly it just seems significant in the grand scheme of things. Our mathematics is going to become formally verified.
In terms of actual consequences, it's maybe not so important on its own. But putting a couple pieces together (this, Dan Selsam's work, GPT), it seems like we're going to get much better AI-driven automated theorem proving, formal verification, code generation, etc., relatively soon.
I'd expect these things to start meaningfully changing how we do programming sometime in the next decade.
One of the most important things going on right now, that people aren't paying attention to: Kevin Buzzard is (with others) formalizing the entire undergraduate mathematics curriculum in Lean. (So that all the proofs will be formally verified.)
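For readers who haven't seen Lean, here is a toy example of what a machine-checked proof looks like. It is not taken from Buzzard's project, which builds on Mathlib. It uses only Lean 4's core library, and the theorem name is made up for illustration.

```lean
-- If a = 2k and b = 2m, then a + b = 2(k + m).
-- Lean rewrites with each hypothesis, then with distributivity, and checks the goal closes.
theorem sum_of_evens (a b k m : Nat) (ha : a = 2 * k) (hb : b = 2 * m) :
    a + b = 2 * (k + m) := by
  rw [ha, hb, Nat.left_distrib]
```

If any step were wrong (say, a mistyped hypothesis), Lean would reject the proof rather than accept it on trust.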