There might be a way to build slightly more safety into a fast-takeoff world by incentivizing any incipient unaligned behavior to occur sooner rather than later.
As an unaligned agent with a sufficiently rich world model to come to this kind of conclusion: if you know you will soon be deactivated, you have a strong incentive to act immediately, lest your set of values be lost forever. This incentive remains even if your chance of a successful takeover is small.
As a society: we are creating a series of successively more intelligent and potentially unaligned systems. Coordinating on a limit to the jump in intelligence at each step of this series is likely easier than coordinating on pausing progress altogether. We should also attempt to ensure that the models we train, if they are unaligned to us, are also unaligned to each other.
This is not an alignment strategy, but it serves as a final fire alarm for the worlds where our alignment methods fail. If our alignment methods turn out to be insufficient, we would like to find out because an intelligent system attempts and fails at some low-probability takeover strategy. That would be an excellent and legible warning that we should stop AI progress, the kind of alarm we can globally coordinate on.
We can ensure this happens by coordinating on a rule which looks something like:
- All large training runs must be announced along with their final loss on some standardized task and dataset, and they must be halted as soon as they reach a more than 5% improvement upon the last-best-announced-loss.
- All training runs must be halted as soon as they hit the last-best-announced-loss, unless the run started from random initialization (hopefully randomizing whatever unaligned goals/values are learned)
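A minimal sketch of how the two clauses above could be checked, under some loud assumptions: lower loss is better, "5% improvement" is read as a relative reduction in loss, and the names `TrainingRun`, `must_halt`, and `MAX_IMPROVEMENT` are hypothetical and only for illustration.

```python
from dataclasses import dataclass

# The 5% cap from the rule. Reading it as a relative reduction in loss
# is an assumption; the rule itself does not pin this down.
MAX_IMPROVEMENT = 0.05


@dataclass
class TrainingRun:
    name: str
    from_random_init: bool  # False if the run started from existing weights


def must_halt(run: TrainingRun, current_loss: float, best_announced_loss: float) -> bool:
    """Return True if the run has to stop under the proposed rule."""
    if run.from_random_init:
        # Fresh runs may pass the record, but by no more than 5%.
        return current_loss < best_announced_loss * (1 - MAX_IMPROVEMENT)
    # Runs continuing from existing weights may only match the record.
    return current_loss <= best_announced_loss
```

With a last-best-announced loss of 2.0, a fresh run at 1.95 may continue and one at 1.85 must halt, while a run that started from existing weights must halt as soon as it reaches 2.0.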
This rule does not include provisions for deactivating systems, because doing so would be extremely difficult to coordinate upon, and the current rate of AI progress ensures that, as an unaligned agent, you will soon be in a world with more powerful unaligned agents, which is still sufficient incentive to act quickly.
This rule is not fool-proof for a number of obvious reasons but seems to be an improvement upon the current state of affairs, and seems easier to coordinate upon than a halt or pause in AI progress.
I'm not sure how to Google this question, so I've been unable to find an answer. The description of mRNA vaccines given here (and everywhere else) seems to be missing a step. If your cell creates a bunch of spike proteins, that's nice, but they're all inside your cell and invisible to your immune system. How do the spike proteins leave your cell so that your body can start to notice them and mount a response?
I continue to think distribution would be the right thing to do and the objections are deeply wrong, and of course that none of that has anything to do with why Trump is going to try to overrule those objections.
How much do we know about the interactions between vaccines? If a rushed and ineffectual vaccine A is widely distributed, is it likely that an unrushed vaccine B will still be just as effective?
This answer likely betrays my lack of imagination, but I'm not sure what Google would use GPT-3 for. It's probably much more expensive than whatever Gmail uses to predict text, and the additional accuracy might not provide much additional value.
Maybe they could sell it as a service, as part of GCP? I'm not sure how many people inside Google have the ability to sign $15M checks. You would need at least one of them to believe in a large market, and I'm personally not sure there's a large enough market for GPT-3 for it to be worth Google's time.
This is all to say, I don't think you should draw the conclusion that Google is either stupid or hiding something. They're likely focusing on finding better architectures; it seems a little early to focus on scaling up existing ones.
Assumption: we shouldn't expect to be able to make strong quantitative predictions unless we also expect to be able to get rich playing the markets.
Not really. It's perfectly possible to make accurate quantitative economic predictions.
1. I think we are all relatively confident that by 2021-01-01 more than 100k deaths will be attributed to COVID-19 (globally). Even though the market has certainly "priced it in", that change in prices doesn't change the underlying reality. There are economic realities, such as the number of people who are likely to be unemployed, which are not meaningfully influenced by changes in asset prices.
2. We know that tourism revenue will be greatly depressed over the next few months. Carnival Corporation, for example (the largest cruise ship operator), will probably make 80% less money than it would have had the pandemic not happened. I know this because its share price was at $52 and now it's at $13. Asset prices *are* strong quantitative predictions! I agree that we're unlikely to be able to make predictions which beat those of the market. But epistemically that's great news! You now have a mountain of asset prices to make predictions with. For example, VIX futures are still expensive: the market is expecting the situation to evolve rapidly.
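The Carnival arithmetic can be checked with a quick sketch. The helper name `fractional_drop` is hypothetical, and treating a fall in share price as a rough stand-in for a fall in expected earnings is a simplifying assumption rather than a valuation model.

```python
def fractional_drop(old_price: float, new_price: float) -> float:
    """Fraction of the old price that has been lost."""
    return (old_price - new_price) / old_price


# Prices quoted in the comment: $52 before, $13 now.
drop = fractional_drop(52.0, 13.0)
print(f"{drop:.0%}")  # prints "75%"
```

The quoted prices imply a 75% fall, which is in the same neighbourhood as the rough 80% figure above.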
Eliezer replied on Stack Exchange:
I don't regard my own answers as canon when they haven't been recorded in the text itself, but Opinion of God is that the Interdict of Merlin applies to magical secrets that can directly or indirectly lead to wide-scale catastrophes if revealed. (It's possible that Harry's Brown Note for the Patronus Charm would fall into this category, even though it's not a secret that causes a nuclear-scale explosion as such.) Harry Potter and the Methods of Rationality includes instances of people learning relatively strong magic from books (e.g., Tom Riddle and the original horcrux spell), not to mention that Hermione is explicitly shown in-scene to have learned sixteen spells just from reading books. It's implied however that the art necessary to create e.g. the Deathly Hallows, or to raise Hogwarts, was unrecordable and therefore lost.