Comments
(Why not simply define the integral of f as the Lebesgue measure of {(x,r)|x in Omega, r <= f(x)}?)
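Spelled out (a sketch, assuming f is nonnegative and measurable, with mu the measure on Omega and lambda the Lebesgue measure on R), the proposed definition would read:

```latex
% Proposed definition: the integral as the (product) measure of the region under the graph.
\int_\Omega f \, d\mu
  \;=\; (\mu \otimes \lambda)\,\bigl(\{(x,r) \in \Omega \times [0,\infty) : r \le f(x)\}\bigr)
% For nonnegative measurable f this agrees with the standard Lebesgue integral.
```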
Write clearly. What's the point of the post? How do the parts of the post argue for its point?
Ah, which Round 1 submission was mine? I think I wrote it down somewhere but I don't know where... I suppose technically I could search my hard drive for each of the strings.
If I want to sell my great-grandmother on cryonics, "freezing your brain so in centuries it can be transplanted into a young body" sounds like an easier sell than "freezing your brain so in centuries it can be turned into a robot". Freezing her whole body sounds like an instant, understandable no.
To avert Idiocracy? Just clone Einstein.
I see this not as a question to ask now, but later, on many levels of detail, when the omnipotent singleton is deciding what to do with the world. Of course we will have to figure out the correct way to pose such questions before deployment, but this can be deferred until we can generate research.
I also have this and have had it for a long time, starting with Google DeepDream. (Or perhaps that animation where you stare ahead while on the edges of your field of view a series of faces is shown, which then start to subjectively look nightmarish/like caricatures.) It lessens with exposure, and returns, somewhat weaker, with each new type of generated image. It feels like neurons burning out from overactivation, as though I was Dracula being shown a cross.
- The universe is finite, and has to be distributed in some manner.
- Some people prefer interactions with the people alive today to ones with heavenly replicas of them. You might claim that there is no difference, but I say that in the end it's all atoms, all the meaning is made up anyway, and we know exactly why those people would not approve if we described virtual heavens to them, so we shouldn't just build them anyway.
- Some people care about what other people do in their virtual heavens. You could deontologically tell them to fuck off, but I'd expect the model of dictator lottery + acausal trade to arrive at another solution.
- A simple way of rating the scenarios above is to describe them as you have and ask humans what they think.
- In a way... but I expect that what we actually need to solve is just how to make a narrow AI faithfully generate AI papers and AI safety papers that humans would have come up with given time.
- The CEV paper has gone into this, but indeed human utility functions will have to be aggregated in some manner, and the manner in which to do this and allocate resources can't be derived from first principles. Fortunately human utility functions are logarithmic enough and enough people care about enough other people that the basin of acceptable solutions is quite large, especially if we get the possible future AIs to acausally trade with each other.
We're about to find out what a smart psychopath with Slytherin's lore and no episodic memory or hands does in this situation. Try to wandlessly apparate away and hit the wards, presumably. As far as he sees, Lockhart obviously just Obliviated him with that outstretched wand.
"Huh," Moody said, leaning back in his chair. "Minerva and I will be putting some alarms and enchantments on that ring of yours, son, if you don't mind. Just in case you forget to sustain that Transfiguration one day."
Reinforcements are on the way. I expect that one of the devices in Minerva's office ticks regularly while the ring exists. Time turners are a thing, but Minerva may have received a "NO." because she would have been seen on the Map.
"Why would I do that?" and "You think like a muggle" sound like she thinks Harry is making an epistemological error.
If she were as tuned in as you say, she should see that Harry asks because he doesn't see how tuned in Luna is.
"You're not going to ask me how I know these things?" said Harry.
"Why would I do that?" said Luna.
But, chapter 1:
"How do you know?" Luna asked.
Is it only some knowledge that cannot be gained from nothing? Such as that a thing does not exist?
The spoiler doesn't logically follow from what comes before, right? She merely saw the possibility.
And if she's right, that's why it didn't attack her.
Couldn't you simply ask the usual questions, but with each mention of themselves replaced by "a hypothetical God whose behavior is either to always tell the truth, to always lie, or to always give a random answer, and whose behavior is not identical to either of the other Gods"?
If you simply disagree with all that is said against you, then you cannot lose a debate - and losing is the archetypal way to learn from one. Therefore, your arguments should be made of parts that a commenter can attack.
Why wouldn't vaccinating half of a married couple be better than half of vaccinating the whole? They can vaccinate whoever interacts with other people more, and then have that person deliberately interact with other people instead of the other - do the groceries, say.
When does she take out Wanda between the two feeding sessions? In the library, because we don't see it? Perhaps Wanda is visible in the library, because Wanda's magic removes all observations that would have made it far enough, which is preempted by the library's magic. Does the second feeding start in the library or right thereafter? Probably after, since we read about it.
Does this mean that humans can only keep a few things in mind in order to make us hide complexity? Under that view the stereotypical forgetful professor isn't brilliant because he has a lot of memory free to think with at any time, but because he has had a lot of practice doing the most with a small memory. These seem experimentally distinguishable.
I conjecture that describing the function of a neural network is the archetypal application of Factored Cognition, because we can cheat by training the neural network to have lots of information bottlenecks along which to decompose the task.
If 30% of people can block the election, someone's going to have to command the troops. The least perverse option seems to be the last president. Trump could probably have gotten 30% to block it to stay in that chair. A minority blocking the election seems meant to simulate (i.e., give a better alternative to) civil war, which is uncommon because it is costly. So perhaps blocking should be made costly to the populace. Say, tax everyone heavily for each blocked election and donate the money to foreign charities. This also incentivizes foreign trolls to cause blocked elections, which seems fair enough - if the enemy controls your election, it should crash, not put a puppet in office.
STAR is useless if people can assign real-valued scores. That makes me think that if it works, it's for reasons of discrete mathematics, so we should analyze the system from the perspective of discrete mathematics before trusting it.
Instead of multiplying values >= 1 and "ignoring" smaller values, you should make explicit that you feed the voter scores through a function (in this case \x -> max(0, log(x))) before adding them up. \x -> max(0, log(x)) does not seem like the optimal function for any seemly purpose.
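A small Python sketch of that equivalence, with made-up (strictly positive) scores: multiplying the values >= 1 and skipping the rest orders candidates exactly like summing max(0, log(score)) over all voters, since the former is just the exponential of the latter.

```python
import math

def aggregate_by_product(scores):
    """Multiply the scores that are >= 1; 'ignore' (skip) the smaller ones."""
    product = 1.0
    for s in scores:
        if s >= 1:
            product *= s
    return product

def aggregate_by_clipped_log(scores):
    """Feed each score through x -> max(0, log(x)), then add them up."""
    return sum(max(0.0, math.log(s)) for s in scores)

# Hypothetical voter scores for one candidate, for illustration only.
scores = [0.5, 1.0, 2.0, 3.0, 0.1]

# The two aggregations rank candidates identically:
# log(aggregate_by_product) equals aggregate_by_clipped_log.
assert math.isclose(math.log(aggregate_by_product(scores)),
                    aggregate_by_clipped_log(scores))
```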
The Fourier transform, as a map between function spaces, is continuous, one-to-one and maps Gaussians to Gaussians, so we can translate "convolving nice distribution sequences tends towards Gaussians" into "multiplying nice function sequences tends towards Gaussians". The pointwise logarithm, as a map between function spaces, is continuous, one-to-one and maps Gaussians to parabolas, so we can translate further into "summing nice function sequences tends towards parabolas", which sounds closer to almost always false than to usually true.
In the second space, functions are continuous and vanish at infinity.
This doesn't work if the Fourier transform of some function is somewhere negative, but then the product of the function sequence has zeroes.
In the third space, functions are continuous and diverge to minus infinity at infinity.
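A sketch of the translation chain above, glossing over the rescaling the CLT needs (here \hat f denotes the Fourier transform / characteristic function):

```latex
% Convolution theorem: convolving densities multiplies their Fourier transforms.
\widehat{f_1 * \cdots * f_n} \;=\; \hat f_1 \cdots \hat f_n
% CLT (after suitable rescaling): the left-hand side tends to a Gaussian, and the
% Fourier transform maps Gaussians to Gaussians, so the product on the right
% tends to a Gaussian too.
% Pointwise logarithm: the product becomes a sum, and the Gaussian e^{-t^2/2}
% becomes the downward parabola -t^2/2.
\log\bigl(\hat f_1 \cdots \hat f_n\bigr) \;=\; \sum_{i=1}^n \log \hat f_i \;\longrightarrow\; -\tfrac{t^2}{2}
```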
and it doesn't much matter if you change the kernel each time
That's counterintuitive. Surely for every there's an that'll get you anywhere? If , .
Indeed players might follow a different strategy than they declare. A player can only verify another player's precommitment after pressing the button (or through old-fashioned espionage of their button setup). But I find it reasonable to expect that a player, seeing the shape of the AI race and what is needed to prevent mutual destruction, would actually design their AGI to use a decision theory that would follow through on the precommitment. Humans may not be intuitively compelled by weird decision theories, but they can expect someone to write an AGI that uses them. Although even a human may find giving other players what they deserve more important than letting the world as we know it continue for another decade.
Compare to Dr. Strangelove's doomsday machine. We expect that a human in the loop would not follow through, but we can't expect that no human would build such a machine.
The crazy distortions are the damage. People fear low-income people stopping their work because they fear that goods produced by low-income workers will become more expensive.
Your argument proves too much - in medieval times, if more than 20% of people stopped working in agriculture to buy food with their UBI, food prices would go up until they resumed, as an indicator of the damage done to society by people stopping their work.
You can cause more than a dollar of damage to society for every dollar you spend, say by hiring people to drive around throwing eggs at people's houses. Though I guess in total society is still better off by a hundred dollars compared to if you had received them via UBI.
Perhaps the police officer simply thinks that your average person will easily do dangerous things like shaking someone off their car without thinking much of it, but will not take a knife to another's guts. Therefore, the car incident would not mark the man as unusually dangerous.
There is no mathematically canonical way to distinguish between trade and blackmail, between act and omission, between different ways of assigning blame. The world where nobody jumps on cars is as safe as the one where nobody throws people off cars. We decide between them by human intuition, which differs by memetic background.
Yeah, I basically hope that enough people care about enough other people that some of the wealth ends up trickling down to everyone. Win probability is basically interchangeable with other people caring about you and your resources across the multiverse. Good thing the cosmos is so large.
I don't think making acausal trade work is that hard. All that is required is:
- That the winner cares about the counterfactual versions of himself that didn't win, or equivalently, is unsure whether he's being simulated by another winner. (huh, one could actually impact this through memetic work today, though messing with people's preferences like that doesn't sound friendly)
- That they think to simulate alternate winners before they expand too far to be simulated.
I'm not convinced that we can do nothing if the human wants ghosts to be happy. The AI would simply have to do what would make ghosts happy if they were real. In the worst case, the human's (coherent extrapolated) beliefs are your only source of information on how ghosts work. Any proper general solution to the pointers problem will surely handle this case. Apparently, each state of the agent corresponds to some probability distribution over worlds.
with the exception of people who decided to gamble on being part of the elite in outcome B
Game-theoretically, there's a better way. Assume that after winning the AI race, it is easy to figure out everyone else's win probability, utility function and what they would do if they won. Human utility functions have diminishing returns, so there's opportunity for acausal trade. Human ancestry gives a common notion of fairness, so the bargaining problem is easier than with aliens.
Most of us care some even about those who would take all for themselves, so instead of giving them the choice between none and a lot, we can give them the choice between some and a lot - the smaller their win prob, the smaller the gap can be while still incentivizing cooperation.
Therefore, the AI race game is not all or nothing. The more win probability lands on parties that can bargain properly, the less multiversal utility is burned.
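A toy worked example of where the trade surplus comes from (made-up numbers; u is a party's utility in its share of the universe, assumed concave because of the diminishing returns):

```latex
% Parties A and B with win probabilities p and 1-p. Without any deal:
E[U_A] = p \, u(1), \qquad E[U_B] = (1-p)\, u(1).
% If whoever wins commits to hand the other side a share s of the universe:
E[U_A] = p\, u(1-s) + (1-p)\, u(s), \qquad E[U_B] = (1-p)\, u(1-s) + p\, u(s).
% Example: u(x) = \sqrt{x},\; p = 1/2,\; s = 1/2. Without the deal each party
% expects 1/2; with it, each expects 1/\sqrt{2} \approx 0.71.
```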
(2) is essentially aiming to take over the world in the name of making it safer, which is not generally considered the kind of thing we should be encouraging lots of people to do.
Wait, you want to do it the hard way? Not only win the AI race with enough head start for safety, but stop right at the finish line and have everyone else stop at the finish line? However would you prevent everyone everywhere from going over? If you manage to find a way, that sounds like taking over the world with extra steps.
All the knowledge a language model character demonstrates is contained in the model. I expect that there is also some manner of intelligence, perhaps pattern-matching ability, such that the model cannot write a smarter character than itself. The better we engineer our prompts, the smaller the model overhang. The larger the overhang, the more opportunity for inner alignment failure.
This all sounds reasonable. I just saw that you were arguing for more being learned at runtime (as some sort of Steven Reknip), and I thought that surely not all the salt machinery can be learnt, and I wanted to see which of those expectations would win.
Do you posit that it learns over the course of its life that salt taste cures salt deficiency, or do you allow this information to be encoded in the genome?
Compare to https://www.lesswrong.com/posts/zXfqftW8y69YzoXLj/using-gpt-n-to-solve-interpretability-of-neural-networks-a, which is about defining how well a neural net can be described in terms of gears.
I suggest that you allow submission of posts written before this announcement. This incentivizes behavior that people expect might later be subject to prizes.
For instance, atheists often assume that God must be highly complex (which is essentially the assumption that God must be natural)
What do you mean by natural? In order for God to be simple, his emotions must be denied or explained. For example, there could be some physics exploit by which an ancient human could become omnipotent. Another simple specification of God could be through an equation describing some goal-directed agent. Emotions might fall out of agency via game theory, except that game theory doesn't really apply when you're omnipotent. And what would be the goal? Our world doesn't look like a simple goal is being maximized by a God.
I reply with the same point about orthogonality: Why should (2,1) split into one branch of (2,0) and one branch of (0,1), not into one branch of (1,0) and one branch of (1,1)? Only the former leads to probability equaling squared amplitude magnitude.
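Spelled out, only the orthogonal split's squared magnitudes add back up to that of the original amplitude:

```latex
% Orthogonal split of (2,1):
|(2,0)|^2 + |(0,1)|^2 = 4 + 1 = 5 = |(2,1)|^2
% Non-orthogonal split of (2,1):
|(1,0)|^2 + |(1,1)|^2 = 1 + 2 = 3 \neq 5 = |(2,1)|^2
```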
(I'm guessing that classical statistical mechanics is invariant under how we choose such branches?)
If a1 is 2 and phi1 has eigenvalue 3, and a2 is 4 and phi2 has eigenvalue 5, then 2*phi1+4*phi2 is mapped to 6*phi1+20*phi2 and is therefore not an eigenfunction.
Can subsums and subtensors be defined as diagrams? That tensors need not be subtensors sounds to me like subsums/subtensors should be more first-class citizens than sums/tensors.
It'd be fine if it were linear in general, but it's not for combinations that aren't orthogonal. Suppose a is drawn from R^2. P(sqrt(2)) = P(|(1,1)|) = P(1,1) = P(1,0) + P(0,1) = 2*P(|(1,0)|) = 2*P(1), which agrees with your analysis, but P(sqrt(5)) = P(|(2,1)|) = P(2,1) ≠ P(1,0) + P(1,1) = 3*P(1), which doesn't add up.
Accepting that probability is some function of the magnitude of the amplitude, why should it be linear exactly under orthogonal combinations?
As far as I understand, whether minimal circuits are daemon-free is precisely the question whether direct descriptions of the input distribution are simpler than hypotheses of form "Return whatever maximizes property _ of the multiverse".
The hypotheses after the modification are supposed to have knowledge that they're in training, for example because they have enough compute to find themselves in the multiverse. Among hypotheses with equal behavior in training, we select the simpler one. We want this to be the one that disregards that knowledge. If the hypothesis has the form "Return whatever maximizes property _ of the multiverse", the simpler one uses that knowledge. It is this form of hypothesis which I suggest removing by inspection.
Take an outer-aligned system, then add a 0 to each training input and a 1 to each deployment input. Wouldn't this add only malicious hypotheses that can be removed by inspection without any adverse selection effects?
More bluntly: It's an outer alignment failure even to add a 0 to each training input and a 1 to each deployment input, because that equates to replacing "Be aligned." with "Be aligned during training."
I don't trust humanity to make it through the invention of nuclear weapons again, so let's not go back too far. Within the last few decades, you could try a reroll on the alignment problem. Collect a selection of safety papers and try to excise hints at such facts as "throwing enough money at simple known architectures produces AGI". Wait to jump back until waiting longer carries a bigger risk of surprise UFAI than it's worth, or until the local intel agency knocks on your door for your time machine. You could build a reverse box - a Faraday bunker that sends you back if it's breached, leaving only a communication channel for new papers, X-risk alerts and UFAI hackers - some UFAIs may not care enough whether I make it out of their timeline. Balance acquiring researcher's recognition codes against the threat of other people taking the possibility of time travel seriously.
I recall Eliezer asking on Facebook for a good word for the field of AI safety research before it was called alignment.
then it would be

I approve of the haikuesque format.
Do you agree that the "bijection" Intelligence -> Prediction preserves more structure than Prediction -> Compression?