But once you let it do more computation, then it doesn't have to know anything at all, right? Like, maybe the best go bot is, "Train an AlphaZero-like algorithm for a million years, and then use it to play."
I know more about go than that bot starts out knowing, but less than it will know after it does computation.
I wonder if, when you use the word "know", you mean some kind of distilled, compressed, easily explained knowledge?
My current theory for what happened is that everyone bought into this delusion about the value of bitcoin, but that unlike other bubbles it didn't burst because Bitcoin has a limited supply and there is literally nothing to anchor its value. So there's no point where investors give up and sell because there is literally no point at which it's overpriced.
This actually sounds pretty close to what you might call the "bubble theory of money": that money is a bubble that doesn't pop, that certain (relatively) useless commodities can become money if enough people think of them that way, and when that happens their price is inflated, relative to their use value.
This isn't something that will happen to every commodity. Whether it happens depends both on the properties of the commodity, and also on things like memes and Schelling points.
Bitcoin has enough useful properties (it's like gold, but digital), and, because of its first-mover advantage, is the Schelling point for digital store-of-value (not that it couldn't be replaced, but it's a very up-hill battle), so it has become money, in this sense.
I believe that you (and the Twitter thread) are saying something meaningful, but I'm having trouble parsing it.
I had thought of the difference between variance and volatility as just that one is the square of the other. So saying that the VIX is "variance in vol units, but not volatility" doesn't mean anything to me.
I think these are the critical tweets:
VIX is an index that measures the market implied level of 1-month variance on the S&P 500, or the square root thereof (to put it back in units we are used to).
This is not the same as volatility. A variance swap’s payoff is proportional to volatility squared. If you are short a variance swap at 10%, and then realized volatility turns out to be 40%, you lose your notional vega exposure times 16 (= 40^2 / 10^2 ).
To compensate for this, an equity index variance swap level is usually 2-3 points above the corresponding at the money implied volatility. So don’t look at VIX versus realized vol and make statements about risk premium without recognizing this extreme tail risk.
I was with him at "a variance swap's payoff is proportional to volatility squared". That matches my understanding of volatility as the square root of variance. But then I don't get the next point about realized volatility needing to be "compensated for".
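One way I can try to make sense of the "compensation" point is to compare a payoff that is linear in volatility with one that is linear in variance. This is only a rough sketch: the function names are made up, and the convention that variance notional equals vega notional divided by twice the strike is my assumption about how these swaps are usually quoted, not something the thread states.

```python
def vol_swap_pnl(strike_vol, realized_vol, vega_notional=1.0):
    """P&L for the LONG side of a payoff that is linear in volatility."""
    return vega_notional * (realized_vol - strike_vol)


def var_swap_pnl(strike_vol, realized_vol, vega_notional=1.0):
    """P&L for the LONG side of a payoff that is linear in variance.

    Assumed convention: variance notional = vega notional / (2 * strike).
    """
    variance_notional = vega_notional / (2.0 * strike_vol)
    return variance_notional * (realized_vol ** 2 - strike_vol ** 2)


strike, realized = 10.0, 40.0   # the numbers from the quoted tweet, in vol points

variance_ratio = realized ** 2 / strike ** 2        # the tweet's 40^2 / 10^2
short_vol_loss = vol_swap_pnl(strike, realized)     # what the short side loses
short_var_loss = var_swap_pnl(strike, realized)

print(variance_ratio, short_vol_loss, short_var_loss)
```

Under that assumed convention the short side loses 30 vega-notional units on the linear payoff and about 75 on the variance payoff, while for a small move (say 10 to 11) the two are nearly the same. If that is the "tail risk" being referred to, it would at least explain why a seller might want a higher strike on the variance swap.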
I hold positions in Bitcoin, Ethereum, and Tesla through Exchange Traded Funds.
For Bitcoin and Ether, do you mean the Grayscale trusts, GBTC and ETHE? My impression is that these are similar to ETFs, but not exactly the same thing, and I'm not aware of other ETFs that give you exposure to crypto (except for the small amount of exposure you'd get from owning shares in companies that have a little BTC on their balance sheet, like Tesla, Square, or MicroStrategy).
I didn't quite understand the last sentence here. Are you saying A) that the Beirut explosion was about the same size as a mini-nuke blast would be, or that B) MOAB : hand grenade :: Tsar Bomba : Beirut explosion? (In which case the Beirut explosion would be larger than a mini-nuke explosion, if your claim about relative differences in the first sentence is correct.)
In other words, I take the first part of what you wrote to be saying that (Tsar Bomba / mini-nuke) > (MOAB / grenade), but then I'm not sure whether the second part is saying that A) (Tsar Bomba / Beirut explosion) = (Tsar Bomba / mini-nuke), or B) (Tsar Bomba / Beirut explosion) = (MOAB / grenade).
Is either A or B correct? (Or did you mean something else entirely?)
Suppose you want to bet on interest rates rising -- would buying value stocks and shorting growth stocks be a good way to do it? (With the idea being that, if rates rise, future earnings will be discounted more and present earnings valued relatively more highly.)
And separately from whether long-value-short-growth would work, is there a more canonical or better way to bet on rates rising?
Just shorting bonds, perhaps? Is that the best you can do?
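To make the discounting intuition concrete, here is a toy calculation. The two cash-flow streams, the 15% growth rate, and the 5% and 7% discount rates are all made up for illustration, and the function names are mine.

```python
def present_value(cash_flows, rate):
    """Discount a list of annual cash flows (first one paid in a year's time)."""
    return sum(cf / (1.0 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


def pct_change_when_rates_rise(cash_flows, low_rate, high_rate):
    """Fractional change in present value when the discount rate rises."""
    before = present_value(cash_flows, low_rate)
    after = present_value(cash_flows, high_rate)
    return after / before - 1.0


YEARS = 30
# Hypothetical "value" stock: flat earnings, so much of the value arrives soon.
value_flows = [10.0] * YEARS
# Hypothetical "growth" stock: small earnings now, growing 15% a year.
growth_flows = [1.0 * 1.15 ** year for year in range(1, YEARS + 1)]

value_change = pct_change_when_rates_rise(value_flows, 0.05, 0.07)
growth_change = pct_change_when_rates_rise(growth_flows, 0.05, 0.07)

print(f"value:  {value_change:+.1%}")
print(f"growth: {growth_change:+.1%}")
```

In this toy setup both present values fall, but the back-loaded stream falls by more, which is the effect a long-value-short-growth position would be trying to capture. It holds the cash flows fixed while rates move, so it says nothing about whether the trade works in practice.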
Consciousness/subjective experience describes something that is fundamentally non-material.
More non-material than "love" or "three"?
It makes sense to me to think of "three" as being "real" in some sense independently from the existence of any collection of three physical objects, and in that sense having a non-material existence. (And maybe you could say the same thing for abstract concepts like "love".)
And also, three-ness is a pattern that collections of physical things might correspond to.
Do you think of consciousness as being non-material in a similar way? (Where the concept is not fundamentally a material thing, but you can identify it with collections of particles.)
If you just assume that there's no primitive for consciousness, I would agree that the argument for illusionism is extremely strong since [unconscious matter spontaneously spawning consciousness] is extremely implausible.
How is this implausible at all? All kinds of totally real phenomena are emergent. There's no primitive for temperature, yet it emerges out of the motions of many particles. There's no primitive for wheel, but round things that roll still exist.
This is a familiar dialectic in philosophical debates about whether some domain X can be reduced to Y (meta-ethics is a salient comparison to me). The anti-reductionist (A) will argue that our core intuitions/concepts/practices related to X make clear that it cannot be reduced to Y, and that since X must exist (as we intuitively think it does), we should expand our metaphysics to include more than Y. The reductionist (R) will argue that X can in fact be reduced to Y, and that this is compatible with our intuitions/concepts/everyday practices with respect to X, and hence that X exists but it’s nothing over and above Y. The nihilist (N), by contrast, agrees with A that it follows from our intuitions/concepts/practices related to X that it cannot be reduced to Y, but agrees with R that there is in fact nothing over and above Y, and so concludes that there is no X, and that our intuitions/concepts/practices related to X are correspondingly misguided. Here, the disagreement between A vs. R/N is about whether more than Y exists; the disagreement between R vs. A/N is about whether a world of only Y “counts” as a world with X. This latter often begins to seem a matter of terminology; the substantive questions have already been settled.
Is this a well-known phenomenon? I think I've observed this dynamic before and found it very frustrating. It seems like philosophers keep executing the following procedure:
1. Take a sensible, but perhaps vague, everyday concept (e.g. consciousness, or free will), and give it a precise philosophical definition, but bake some dubious, anti-reductionist assumptions into the definition.
2. Discuss the concept in ways that conflate the everyday concept and the precise philosophical one. (Failing to make clear that the philosophical concept may or may not be the best formalization of the folk concept.)
3. Realize that the anti-reductionist assumptions were false.
4. Claim that the everyday concept is an illusion.
5. Generate confusion (along with full employment for philosophers?).
If you'd just said that the precisely defined philosophical concept was a provisional formalization of the everyday concept in the first place, then you wouldn't have to claim that the everyday concept was an illusion once you realize that your formalization was wrong!
This may be a pedantic comment, but I'm a bit confused by how your comment starts:
I've done over 200 hours of research on this topic and have read basically all the sources the article cites. That said, I don't agree with all of the claims.
The "That said, ..." part seems to imply that what follows is surprising, as though the reader expects you to agree with all the claims. But isn't the default presumption that, if you've done a whole bunch of research into some controversial question, the evidence is mixed?
In other words, when I hear, "I've done over 200 hours of research ... and have read ... all the sources", I think, "Of course you don't agree with all the claims!" And it kind of throws me off that you seem to expect your readers to think that you would agree with all the claims.
Is the presumption that someone would only spend a whole bunch of hours researching these claims if they thought they were highly likely to be true? Or that only an uncritical, conspiracy theory true believer would put so much time into looking into it?
I've been able to get closer to 0.6% on IB. I've done that by entering the order at a favorable price and then manually adjusting it by a small amount once a day until it gets filled. There's probably a better way to do it, but that's what's worked for me.
It seems to me that there has been enough unanswered criticism of the implications of coherence theorems for making predictions about AGI that it would be quite misleading to include this post in the 2019 review.
If the post is the best articulation of a line of reasoning that has been influential in people's thinking about alignment, then even if there are strong arguments against it, I don't see why that means the post is not significant, at least from a historical perspective.
By analogy, I think Searle's Chinese Room argument is wrong and misleading, but I wouldn't argue that it shouldn't be included in a list of important works on philosophy of mind.
Would you (assuming you disagreed with it)? If not, what's the difference here?
(Put another way, I wouldn't think of the review as a collection of "correct" posts, but rather as a collection of posts that were important contributions to our thinking. To me this certainly qualifies as that.)
On the review: I don't think this post should be in the Alignment section of the review, without a significant rewrite / addition clarifying why exactly coherence arguments are useful or important for AI alignment.
Assuming that one accepts the arguments against coherence arguments being important for alignment (as I tentatively do), I don't see why that means this shouldn't be included in the Alignment section.
The motivation for this post was its relevance to alignment. People think about it in the context of alignment. If subsequent arguments indicate that it's misguided, I don't see why that means it shouldn't be considered (from a historical perspective) to have been in the alignment stream of work (along with the arguments against it).
(Though, I suppose if there's another category that seems like a more exact match, that seems like a fine reason to put it in that section rather than the Alignment section.)
Does that make sense? Is your concern that people will see this in the Alignment section, and not see the arguments against the connection, and continue to be misled?
AI generates test cases for its candidate functions, and computes their results
AI formally analyzes its candidate functions and looks for simple interesting guarantees it can make about their behavior
AI displays its candidate functions to the user, along with a summary of the test results and any guarantees about the input output behavior, and the user selects the one they want (which they can also edit, as necessary)
In this version, you go straight from English to code, which I think might be easier than from English to formal specification, because we have lots of examples of code with comments. (And I've seen demos of GPT-3 doing it for simple functions.)
I think some (actually useful) version of the above is probably within reach today, or in the very near future.
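The loop described above can be sketched in a few lines. Everything here is hypothetical: the two candidates are hand-written stand-ins for AI-generated functions, and the "guarantees" are just properties checked on the test inputs rather than anything formally proved.

```python
from collections import Counter

# Stand-ins for AI-generated candidates for "sort a list of numbers".
def candidate_a(xs):
    return sorted(xs)

def candidate_b(xs):
    return sorted(set(xs))      # subtly different: drops duplicates

def is_sorted(ys):
    return all(a <= b for a, b in zip(ys, ys[1:]))

# Simple properties standing in for "guarantees"; they are only
# checked on the test inputs below, not formally verified.
PROPERTIES = {
    "output is sorted": lambda xs, ys: is_sorted(ys),
    "same elements as input": lambda xs, ys: Counter(xs) == Counter(ys),
}

TEST_INPUTS = [[], [3, 1, 2], [2, 2, 1], [5, -1, 5, 0]]

def summarize(candidate):
    """Run one candidate on every test input and record which properties held."""
    results = [(xs, candidate(list(xs))) for xs in TEST_INPUTS]
    holds = {name: all(check(xs, ys) for xs, ys in results)
             for name, check in PROPERTIES.items()}
    return {"results": results, "properties": holds}

report = {fn.__name__: summarize(fn) for fn in (candidate_a, candidate_b)}

# What the user would be shown before picking (and possibly editing) a candidate.
for name, summary in report.items():
    print(name)
    for xs, ys in summary["results"]:
        print(f"  {xs} -> {ys}")
    for prop, ok in summary["properties"].items():
        print(f"  {prop}: {'holds on tests' if ok else 'FAILS on tests'}")
```

The point of showing the property summary next to the raw results is that the user can tell the two candidates apart (one silently drops duplicates) without reading either implementation.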
Mostly it just seems significant in the grand scheme of things. Our mathematics is going to become formally verified.
In terms of actual consequences, it's maybe not so important on its own. But putting a couple of pieces together (this, Dan Selsam's work, GPT), it seems like we're going to get much better AI-driven automated theorem proving, formal verification, code generation, etc. relatively soon.
I'd expect these things to start meaningfully changing how we do programming sometime in the next decade.
One of the most important things going on right now, that people aren't paying attention to: Kevin Buzzard is (with others) formalizing the entire undergraduate mathematics curriculum in Lean. (So that all the proofs will be formally verified.)
I'm not sure whether it's the standard view in physics, but Sean Carroll has suggested that we should think of locality in space as deriving from entanglement. (With space itself as basically an emergent phenomenon.) And I believe he considers this a driving principle in his quantum gravity work.
So you're saying that you think a more infectious virus will not increase infections by as high a percentage of otherwise expected infections under conditions with more precautions, versus conditions with fewer precautions? What's the physical mechanism there?
Wouldn't "the fractal nature of risk taking" cause this? If some people are taking lots of risk, but they comply with actually strict lockdowns, then those lockdowns would work better than might otherwise be expected. No?
When Tungsten rods are dropped from space onto earth they manage to store a lot of kinetic energy because they have a very high boiling point. Dropping tungsten rods from space can release as much energy as nuclear weapons without the nuclear fallout.
Doesn't that energy ultimately come from the propellant used to get the rods to orbit? Wouldn't it be more cost effective to just use the propellant itself as the explosive?
Is the advantage of the rod that it's easier to get it to the target than it would be to get the propellant there?
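For a sense of scale, here is the kinetic energy per kilogram of rod, expressed in TNT equivalent. The impact speeds are my assumptions (the higher one is roughly low-Earth-orbit speed and ignores drag on the way down), and the function names are made up.

```python
TNT_JOULES_PER_KG = 4.184e6      # standard TNT-equivalent convention

def kinetic_energy_per_kg(speed_m_per_s):
    """Kinetic energy of one kilogram moving at the given speed, in joules."""
    return 0.5 * speed_m_per_s ** 2

def tnt_equivalent_per_kg(speed_m_per_s):
    """Kilograms of TNT with the same energy as 1 kg moving at this speed."""
    return kinetic_energy_per_kg(speed_m_per_s) / TNT_JOULES_PER_KG

# Assumption: impact at roughly low-Earth-orbit speed, ignoring drag.
ORBITAL_SPEED = 7_800.0          # m/s, approximate

for speed in (3_000.0, ORBITAL_SPEED):
    print(f"{speed:>7.0f} m/s -> {tnt_equivalent_per_kg(speed):.1f} kg TNT per kg of rod")
```

On these assumptions a rod carries at most a few times its own mass in TNT equivalent, so this only bounds the energy at impact. It doesn't address how much propellant it took to put the rod in orbit.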
rtnew=1.7 is an entirely different case. Suppressing it would require the sort of lockdown that would yield rt=0.6 for the old strain, a number that has never been reached by any US state for any amount of time. I see no way in hell that Americans would agree to a lockdown much stricter than any we’ve had so far, especially after they’ve been promised that the worst is behind them.
As mentioned on Twitter, I don't buy this. I think we'd get more infections and deaths, but once hospitals are overwhelmed, society's negative feedback loop will kick in and we'll get R back close to 1.
I believe that lots of individuals could be a lot more cautious than they already are, and I don't think people will stand for hospitals being overwhelmed.
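For what it's worth, the 0.6 in the quote seems to fall out of simple arithmetic if you assume that precautions scale both strains' R by the same factor and that the old strain is currently sitting near R = 1. Both of those are my assumptions, not something stated in the quote, and the helper name is made up.

```python
def required_transmission_multiplier(current_r, target_r=1.0):
    """Factor by which transmission must be scaled to move R to target_r."""
    return target_r / current_r

R_NEW = 1.7          # from the quoted passage
R_OLD_NOW = 1.0      # ASSUMPTION: old strain roughly flat under current precautions

multiplier = required_transmission_multiplier(R_NEW)
# ASSUMPTION: the same measures scale the old strain's R by the same factor.
old_strain_r_under_same_measures = R_OLD_NOW * multiplier

print(round(multiplier, 2), round(old_strain_r_under_same_measures, 2))
```

So the disagreement isn't really about that number. It's about whether behavior would actually shift by that much once hospitals start filling up.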
This is commonly said on the basis of his $1b pledge
Wasn't it supposed to be a total of $1b pledged, from a variety of sources, including Reid Hoffman and Peter Thiel, rather than $1b just from Musk?
EDIT: yes, it was.
Sam, Greg, Elon, Reid Hoffman, Jessica Livingston, Peter Thiel, Amazon Web Services (AWS), Infosys, and YC Research are donating to support OpenAI. In total, these funders have committed $1 billion, although we expect to only spend a tiny fraction of this in the next few years.
Key point for those who don't click through (that I didn't realize at first) -- both types turned out to work and were in fact used. The gun-type "Little Boy" was dropped on Hiroshima, and the implosion-type "Fat Man" was dropped on Nagasaki.
For those organizations that do choose to compete... I think it is highly likely that they will attempt to build competing systems in basically the exact same way as the first organization did
It's unlikely for there to exist both aligned and misaligned AI systems at the same time
If the first group sunk some cost into aligning their system, but that wasn't integral to its everyday task performance, wouldn't a second competing group be somewhat likely to skimp on the alignment part?
It seems like this calls into question the claim that we wouldn't get a mix of aligned and misaligned systems.
Do you expect it to be difficult to disentangle the alignment from the training, such that the path of least resistance for the second group will necessarily include doing a similar amount of alignment?
Note that, if the network converges towards the irreducible error like a negative exponential (on a plot with reducible error on the y-axis), it would be a straight line on a plot with the logarithm of the reducible error on the y-axis.
I was a little confused by this note. It doesn't apply to any of the graphs in the post, right? (Since, as I understand it, you plot the reducible error itself on the y-axis, and not its logarithm.)
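Just to check my reading of the note itself, here is a small numeric sketch. The decay rate and starting value are made up, and `reducible_error` is a hypothetical stand-in, not anything taken from the post's data.

```python
import math

def reducible_error(step, initial=1.0, decay=0.01):
    """Hypothetical reducible error that decays as a negative exponential."""
    return initial * math.exp(-decay * step)

steps = [0, 100, 200, 300, 400]          # evenly spaced training steps
errors = [reducible_error(s) for s in steps]
log_errors = [math.log(e) for e in errors]

# Differences between consecutive points on each kind of y-axis.
linear_diffs = [b - a for a, b in zip(errors, errors[1:])]
log_diffs = [b - a for a, b in zip(log_errors, log_errors[1:])]

print("linear-axis steps:", [round(d, 4) for d in linear_diffs])  # shrinking in magnitude
print("log-axis steps:   ", [round(d, 4) for d in log_diffs])     # constant
```

Equal steps on the log axis mean a straight line there, while the same curve on a linear axis flattens out, which is why I'd expect the note to matter only for plots with a logarithmic y-axis.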