Posts

"The Solomonoff Prior is Malign" is a special case of a simpler argument 2024-11-17T21:32:34.711Z
You can, in fact, bamboozle an unaligned AI into sparing your life 2024-09-29T16:59:43.942Z
A very non-technical explanation of the basics of infra-Bayesianism 2023-04-26T22:57:05.109Z
Infra-Bayesianism naturally leads to the monotonicity principle, and I think this is a problem 2023-04-26T21:39:24.672Z
A mostly critical review of infra-Bayesianism 2023-02-28T18:37:58.448Z
Performance guarantees in classical learning theory and infra-Bayesianism 2023-02-28T18:37:57.231Z

Comments

Comment by David Matolcsi (matolcsid) on Drake Thomas's Shortform · 2025-01-17T06:21:31.987Z · LW · GW

Does anyone know of a not peppermint flavored zinc acetate lozenge? I really dislike peppermint, so I'm not sure it would be worth it to drink 5 peppermint flavored glasses of water a day to decrease the duration of a cold by one day, and I haven't found other zinc acetate lozenge options yet; the acetate version seems to be rare among zinc supplements. (Why?)

Comment by David Matolcsi (matolcsid) on On Eating the Sun · 2025-01-10T21:43:06.948Z · LW · GW

Fair, I also haven't made any specific commitments, I phrased it wrongly. I agree there can be extreme scenarios with trillions of digital minds tortured where you'd maybe want to declare war on the rest of society. But I would still like people to write down that "of course, I wouldn't want to destroy Earth before we can save all the people who want to live in their biological bodies, just to get a few years of acceleration in the cosmic conquest". I feel a sentence like this should really have been included in the original post about dismantling the Sun, and as long as people are not willing to write this down, I remain paranoid that they would in fact haul the Amish to the extermination camps if it feels like a good idea at the time. (As I said, I met people who really held this position.)

Comment by David Matolcsi (matolcsid) on On Eating the Sun · 2025-01-10T20:19:54.101Z · LW · GW

As I explain in more detail in my other comment, I expect market-based approaches not to dismantle the Sun anytime soon. I'm interested in whether you know of any governance structure that you support that you think will probably lead to dismantling the Sun within the next few centuries.

Comment by David Matolcsi (matolcsid) on On Eating the Sun · 2025-01-10T20:13:54.523Z · LW · GW

I feel reassured that you don't want to Eat the Earth while there are still biological humans who want to live on it. 

I still maintain that under governance systems I would like, I would expect the outcome to be very conservative with the solar system in the next thousand years. Like one default governance structure I quite like is to parcel out the Universe equally among the people alive during the Singularity, have a binding constitution on what they can do on their fiefdoms (no torture, etc), and allow them to trade and give away their stuff to their biological and digital descendants. There could also be a basic income coming to all biological people,[1] though not to digital people, as it's too easy to mass-produce them.

One year of delay in cosmic expansion costs us around 1 in a billion of the reachable Universe under some assumptions on where the grabby aliens are (if they exist). One year also costs us around 1 in a billion of the Sun's mass being burned, if, like Habryka, you care about using the solar system optimally for the sake of the biological humans who want to stay. So one year of delay can be bought by 160 people paying out 10% of their wealth. I really think that you won't do things like moving the Earth closer to the Sun in the next 200 years, there will just always be enough people to pay out; it only takes 10,000 traditionalist families, literally the Amish could easily do it. And it won't matter much, the cosmic acceleration will soon become a moot point as we build out other industrial bases, and I don't expect the biological people to feel much of a personal need to dismantle the Sun anytime soon. Maybe in 10,000 years the objectors will run out of money, and the bio people either overpopulate or develop expensive hobbies like building planets for themselves and decide to dismantle the Sun, though I expect them to be rich enough to just haul in matter from other stars if they want to.
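Here is a rough back-of-the-envelope sketch of the arithmetic above; the ~8 billion equal per-capita shares and the exact figures are my own illustrative assumptions, not anything established in the thread:

```python
# Back-of-the-envelope check of "one year of delay can be bought by 160 people
# paying out 10% of their wealth". Assumptions (illustrative): ~8 billion people
# each hold an equal share of the reachable Universe, and one year of delay
# costs about 1 in a billion of it, as stated above.
population = 8e9
share_per_person = 1 / population        # fraction of the Universe per person
delay_cost_per_year = 1e-9               # fraction lost per year of delayed expansion

payers = 160
fraction_of_wealth_paid = 0.10
total_paid = payers * fraction_of_wealth_paid * share_per_person

print(f"cost of one year of delay: {delay_cost_per_year:.1e}")
print(f"paid by {payers} people at 10%: {total_paid:.1e}")
print("covers the cost:", total_paid >= delay_cost_per_year)
# -> 2.0e-09 paid vs 1.0e-09 cost, so ~160 payers (or fewer) suffice under these assumptions.
```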

By the way, I recommend Tim Underwood's sci-fi, The Accord, as a very good exploration of these topics, I think it's my favorite sci-fi novel.

As for the 80 trillion stars, I agree it's a real loss, but for me this type of sadness feels "already priced in". I already accepted that the world won't and shouldn't be all my personal absolute kingdom, so other people's decisions will cause a lot of waste from my perspective, and 0.00000004% is just a really negligible part of this loss. In this, I think my analogy to current governments is quite apt: I feel similarly about current governments, in that I already accepted that the world will be wasteful compared to the rule of a dictatorship perfectly aligned with me, but that's how it needs to be.

  1. ^

    Though you need to pay attention to overpopulation. If the average biological couple has 2.2 children, the Universe runs out of atoms to support humans in 50 thousand years. Exponential growth is crazy fast. 
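    A minimal sanity check of this footnote, under assumptions I'm adding for illustration (a ~30-year generation, a starting population of about 10^10, and roughly 10^80 atoms in the observable Universe):

```python
import math

# 2.2 children per couple => growth factor of 1.1 per generation.
growth_per_generation = 2.2 / 2
generation_years = 30          # assumed generation length
start_population = 1e10        # assumed starting population
atoms_in_universe = 1e80       # rough count of atoms in the observable Universe

years = 0
log10_pop = math.log10(start_population)
while log10_pop < math.log10(atoms_in_universe):
    log10_pop += math.log10(growth_per_generation)
    years += generation_years

print(f"population exceeds the atom count after roughly {years:,} years")
# -> on the order of 50,000 years, consistent with the footnote.
```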

Comment by David Matolcsi (matolcsid) on On Eating the Sun · 2025-01-10T09:35:55.919Z · LW · GW

I maintain that biological humans will need to do population control at some point. If they decide that enacting population control in the solar system at a later population level is worth it to them in order to dismantle the Sun, then they can go for it. My guess is that they won't, and will have population control earlier.

Comment by David Matolcsi (matolcsid) on On Eating the Sun · 2025-01-10T09:29:40.939Z · LW · GW

I think that the coder looking up and saying that the Sun burning is distasteful but the Great Transhumanist Future will come in 20 years, along with a later mention of "the Sun is a battery", together imply that the Sun is getting dismantled in the near future. I guess you can debate how strong the implication is, maybe they just want to dismantle the Sun in the long term and are currently only using the Sun as a battery in some benign way, but I think that's not the most natural interpretation.

Comment by David Matolcsi (matolcsid) on On Eating the Sun · 2025-01-10T09:23:30.904Z · LW · GW

Yeah, maybe I just got too angry. As we discussed in other comments, I believe that from the astronomical acceleration perspective the real deal is maximizing the initial industrialization of Earth and its surroundings, which does require killing off (and mind uploading) the Amish and everyone else. Sure, if people are only arguing that we should dismantle the Sun and Earth only after millennia, that's more acceptable, but I really don't see what the point is then, we can build out our industrial base on Alpha Centauri by then.

The part that is frustrating to me is that neither the original post, nor any of the commenters arguing with me, are caveating their position with "of course, we would never want to destroy Earth before we can save all the people who want to live in their biological bodies, even though this is plausibly the majority of the cost in cosmic slow-down". If you agree with this, please say so; I would still have qualms about removing people to artificial planets if they don't want to go, but I'd be less horrified. But so far, no one was willing to clarify that they don't want to destroy Earth before saving the biological people, and I really did hear people say in private conversations things like "we will immediately kill all the bodies and upload the minds, the people will thank us later once they understand better" and things of that sort, which makes me paranoid.

Ben, Oliver, Raemon, Jessica, are you willing to commit to not wanting to destroy Earth if it requires killing the biological bodies of a significant number of non-consenting people? If so, my ire was not directed against you and I apologize to you.

Comment by David Matolcsi (matolcsid) on On Eating the Sun · 2025-01-10T09:01:38.966Z · LW · GW

I expect non-positional material goods to be basically saturated for Earth people in a good post-Singularity world, so I don't think you can promise them to become twice as rich. And also, people dislike drastic change and new things they don't understand. 20% of the US population refused the potentially life-saving covid vaccine out of distrust of new things they don't understand. Do you think they would happily move to a new planet with artificial sky maintained by supposedly benevolent robots? Maybe you could buy off some percentage of the population if material goods weren't saturated, but surely not more than you could convince to get the vaccine? Also, don't some religions (Islam?) have specific laws about what to do at sunrise and sunset and so on? Do you think all the imams would go along with moving to the new artificial Earth? I really think you are out of touch with the average person on this one, but we can go out to the streets and interview some people on the matter, though Berkeley is maybe not the most representative place for this. 

(Again, if you are talking about cultural drift over millennia, that's more plausible, though I'm below 50% they would dismantle the Sun. But I'm primarily arguing against dismantling the Sun within twenty years of the Singularity.)

Comment by David Matolcsi (matolcsid) on On Eating the Sun · 2025-01-10T08:49:44.817Z · LW · GW

Are you arguing that, if technologically possible, the Sun should be dismantled in the first few decades after the Singularity, as is implied in the Great Transhumanist Future song, which is the main thing I'm complaining about here? In that case, I don't know of any remotely just and reasonable (democratic, market-based or other) governance structure that would allow that to happen given how the majority of people feel.

If you are talking about population dynamics, ownership and voting shifting over millennia to the point that they decide to dismantle the Sun, then sure, that's possible, though that's not what I expect to happen, see my other comment on market trades and my reply to Habryka on population dynamics.

Comment by David Matolcsi (matolcsid) on On Eating the Sun · 2025-01-10T08:43:32.239Z · LW · GW

You mean that people on Earth and the solar system colonies will have enough biological children, and space travel to other stars for biological people will be hard enough, that they will want the resources from dismantling the Sun? I suppose that's possible, though I expect they will put some kind of population control for biological people in place before that happens. I agree that also feels aversive, but at some point it needs to be done anyway, otherwise exponential population growth just brings us back to the Malthusian limit a few tens of thousands of years from now even if we use up the whole Universe. (See Tim Underwood's excellent rationalist sci-fi novel on the topic.)

If you are talking about ems and digital beings, not biological humans, I don't think they will or should have decision rights over what happens with the solar system, as they can simply move to other stars.

Comment by David Matolcsi (matolcsid) on On Eating the Sun · 2025-01-10T08:29:19.680Z · LW · GW

I agree that not all decisions about the cosmos should be made in a majoritarian democratic way, but I don't see how replacing the Sun with artificial light can be done by market forces under normal property rights. I think you currently would not be allowed to build a giant glass dome around someone's plot of land, and this feels at least as strong.

I'm broadly sympathetic to having property rights and markets in the post-Singularity future, and probably the people with scope-sensitive and longtermist preferences will be able to buy out the future control of far-away things from the normal people who don't care about these too much. But these trades will almost certainly result in the solar system being owned by a coalition of normal people, except if they start with basically zero capital. I don't know what you imagine the initial capital allocation to look like in your market-based post-Singularity world, but if the vast majority of the population doesn't have enough control to even save the Sun, then probably something went deeply wrong.

Comment by David Matolcsi (matolcsid) on On Eating the Sun · 2025-01-10T08:05:58.261Z · LW · GW

I agree that I don't viscerally feel the loss of the 200 galaxies, and maybe that's a deficiency. But I still find this position crazy. I feel this is a decent parallel dialogue:
Other person: "Here is a something I thought of that would increase health outcomes in the world by 0.00000004%."
Me: "But surely you realize that this measure is horrendously unpopular, and the only way to implement it is through a dictatorial world government."
Other person: "Well yes, I agree it's a hard dilemma, but on absolute terms, 0.00000004% of the world population is 3 people, so my intervention would save 3 lives. Think about how horrible tragedy every death is, I really feel there is a missing mood here when you don't even consider the upside of my proposal."

I do feel sad when the democratic governments of the world screw up policy in a big way according to my beliefs and values (as they often do). I still believe that democracy is the least bad form of government, but yes, I feel the temptation of how much better it would be if we were instead governed by a perfect philosopher king who happens to be aligned with my values. But 0.00000004% is just a drop in the ocean.

Similarly, I believe that we should try to maintain a relatively democratic form of government at least in the early AI age (and then the people can slowly figure out if they find something better than democracy). And yes, I expect that democracies will do a lot of incredibly wasteful, stupid, and sometimes evil things by my lights, and I will sometimes wish that I could somehow have become the philosopher king. That's just how things always are. But leaving Earth alone really is chump change, and won't be among the top thousand things I disagree with democracies on.

(Also, again, I think there will also be a lot of value in conservatism and caution, especially since we probably won't be able to fully trust our AIs on the most complicated issues. And I also think there is something intrinsically wrong about destroying Earth: I think that if you cut down the world's oldest tree to increase health outcomes by 0.00000004%, you are doing something wrong, and people have a good reason to distrust you.)

Comment by David Matolcsi (matolcsid) on On Eating the Sun · 2025-01-10T07:36:51.978Z · LW · GW

Yes, I wanted to argue something like this. 

Comment by David Matolcsi (matolcsid) on On Eating the Sun · 2025-01-10T03:12:30.840Z · LW · GW

I think this is a false dilemma. If all human cultures on Earth come to the conclusion in 1000 years that they would like the Sun to be dismantled (which I very much doubt), then sure, we can do that. But at that point, we could already have built awesome industrial bases by dismantling Alpha Centauri, or just building them up by dismantling 0.1% of the Sun that doesn't affect anything on Earth. I doubt that totally dismantling the Sun after centuries would significantly accelerate the time we reach the cosmic event horizon. 

The thing that actually has costs is not immediately bulldozing down Earth and turning it into a maximally efficient industrial powerhouse at the cost of killing every biological body, or the ASI not having the opportunity to dismantle the Sun on short notice (the post alludes to 10,000 years being a very conservative estimate and to "the assumption that ASI eats the Sun within a few years"). But that's not going to happen democratically. There is no way you get 51% of people [1] to vote for bulldozing down Earth and killing their biological bodies, and I very much doubt you get that vote even for dismantling the Sun in a few years and putting some fake sky around Earth for protection. It's possible there could be a truly wise philosopher king who could, with an aching heart, overrule everyone else's objections and bulldoze down Earth to get those extra 200 galaxies at the edge of the Universe, but then govern the Universe wisely and benevolently in a way that people on reflection all approve of. But in practice, we are not going to get a wise philosopher king. I expect that any government that decides to destroy the Sun for the greater good, against the outrage of the vast majority of people, will also be a bad ruler of the Universe.

I also believe that AI alignment is not a binary, and even in the worlds where there is no AI takeover, we will probably get an AI we initially can't fully trust to follow the spirit of our commands in exotic situations we can't really understand. In that case, it would be extremely unwise to immediately instruct it to create mind uploads (how faithful would those be?) and bulldoze down the world to turn the Sun into computronium. There are a lot of reasons for taking things slow.

Usually rationalists are pretty reasonable about these things, and endorse democratic government and human rights, and they even often like talking about the Long Reflection and taking things slow. But then they start talking about dismantling the Sun! This post can kind of defend itself by saying it was proposing a less immediate and horrifying implementation (though there really is a missing mood here), but there are other examples, most notably the Great Transhumanist Future song in last year's Solstice, where a coder looks up to the burning Sun disapprovingly, and in twenty years, with a big ol' computer, they will use the Sun as a battery.

I don't know if the people talking like that are so out of touch that they believe that with a little convincing everyone will agree to dismantle the Sun in twenty years, or they would approve of an AI-enabled dictatorship bulldozing over Earth, or they just don't think through the implications. I think it's mostly that they just don't think about it too hard, but I did hear people coming out in favor of actually bulldozing down Earth (usually including a step where we forcibly increase everyone's intelligence until they agree with the leadership), and I think that's very foolish and bad. 

  1. ^

    And even 51% of the vote wouldn't be enough in any good democracy to bulldoze over everyone else

Comment by David Matolcsi (matolcsid) on On Eating the Sun · 2025-01-09T06:37:51.045Z · LW · GW

I might write a top level post or shortform about this at some point. I find it baffling how casually people talk about dismantling the Sun around here. I recognize that this post makes no normative claim that we should do it, but it doesn't say that it would be bad either, and expects that we will do it even if humanity remains in power. I think we probably won't do it if humanity remains in power, we shouldn't do it, and if humanity disassembles the Sun, it will probably happen for some very bad reason, like a fanatical dictatorship getting in power. 

If we get some even vaguely democratic system that respects human rights at least a little, then many people (probably the vast majority) will want to live on Earth in their physical bodies, and many will want to have children, and many of those children will also want to live on Earth and have children of their own. I find it unlikely that all subcultures that want this will die out on Earth in 10,000 years, especially considering the selection effects: the subcultures that prefer to have natural children on Earth are the ones that natural selection favors on Earth. So the scenarios in which humanity dismantles the Sun probably involve a dictatorship rounding up the Amish and killing them, while maybe uploading their minds somewhere, against all their protestations. Or possibly rounding up the Amish, and forcibly "increasing their intelligence and wisdom" by some artificial means, until they realize that their "coherent extrapolated volition" was in agreement with the dictatorship all along, and then killing off their bodies after their new mind consents. I find this option hardly any better. (Also, it's not just the Amish you are hauling to the extermination camps kicking and screaming, but my mother too. And probably your mother as well. Please don't do that.)

Also, I think the astronomical waste is probably pretty negligible. You can probably create a very good industrial base around the Sun with just some Dyson swarm that doesn't take up enough light to be noticeable from Earth. And then you can just send out some probes to Alpha Centauri and the other neighboring stars to dismantle them if we really want to. How much time do we lose by this? My guess is at most a few years, and we probably want to take some years anyway to do some reflection before we start any giant project. 

People sometimes accuse the rationalist community of being aggressive naive utilitarians, who only believe that the AGI is going to kill everyone because they are projecting themselves onto it, as they also want to kill everyone if they get power, so they can establish their mind-uploaded, we-are-the-grabby-aliens, turn-the-stars-into-computronium utopia a few months earlier that way. I think this accusation is mostly false, and most rationalists are in fact pretty reasonable and want to respect other people's rights and so on. But when I see people casually discussing dismantling the Sun, with only one critical comment (Mikhail's) saying that we shouldn't do it, and it shows up in Solstice songs as a thing we want to do in the Great Transhumanist Future twenty years from now, I start worrying again that the critics are right, and we are the bad guys.

I prefer to think that it's not because people are in fact happy about massacring the Amish and their own mothers, but because dismantling the Sun is a meme, and people don't think through what it means. Anyway, please stop.

(Somewhat relatedly, I think it's not obvious at all that if a misaligned AGI takes over the world, it will dismantle the Sun. It is more likely to do it than humanity would, but still, I don't know how you could be at all confident that the misaligned AI that first takes over will be the type of linear utilitarian optimizer that really cares about conquering the last stars at the edge of the Universe, and so needs to dismantle the Sun in order to speed up its conquest by a few years.)

Comment by David Matolcsi (matolcsid) on Davidad's Bold Plan for Alignment: An In-Depth Explanation · 2025-01-03T21:57:03.861Z · LW · GW

What is an infra-Bayesian Super Mario supposed to mean? I studied infra-Bayes under Vanessa for half a year, and I have no idea what this could possibly mean. I asked Vanessa when this post came out and she also said she can't guess what you might mean by this. Can you explain what this is? It makes me very skeptical that the only part of the plan I know something about seems to be nonsense.

Also, can you give more information or link to a resource on what Davidad's team is currently doing? It looks like they are the best funded AI safety group that currently exists (except if you count Anthropic), but I never hear about them.

Comment by David Matolcsi (matolcsid) on Misfortune and Many Worlds · 2024-12-23T08:00:54.217Z · LW · GW

I don't agree with everything in this post, but I think it's a true and underappreciated point that "if your friend dies in a random accident, that's actually only a tiny loss according to MWI."

I usually use this point to ask people to retire the old argument that "Religious people don't actually believe in their religion, otherwise they would be no more sad at the death of a loved one than if their loved one sailed to Australia." I think this "should be" true of MWI believers too, and we still feel very sad when a loved one dies in an accident.

I don't think this means people don't "really believe" in MWI, it's just that MWI and religious afterlife both start out as System 2 beliefs, and it's hard to internalize them on System 1. But religious people are already well aware that it's hard to internalize Faith on System 1 (Mere Christianity has a full chapter on the topic), so saying that "they don't really believe in their religion because they grieve" is an unfair dig.

Comment by David Matolcsi (matolcsid) on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser · 2024-12-02T04:56:54.237Z · LW · GW

I fixed some parts that could be misunderstood. I meant that the $500k is the LW hosting + Software subscriptions and the Dedicated software + accounting stuff together. And I didn't mean to imply that the labor cost of the 4 people is $500k, that was a separate term in the costs.

Is Lighthaven still cheaper if we take into account the initial funding spent on it in 2022 and 2023? I was under the impression that buying Lighthaven was one of the things that made a lot of sense when the community believed it would have access to FTX funding, and that once we bought it, it made sense to keep it, but we wouldn't have bought it once FTX was out of the game. But in case this was a misunderstanding and Lighthaven saves money in the long run compared to the previous option, that's great news.

Comment by David Matolcsi (matolcsid) on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser · 2024-12-02T04:31:16.126Z · LW · GW

I donated $1000. Originally I was worried that this is a bottomless money-pit, but looking at the cost breakdown, it's actually very reasonable. If Oliver is right that Lighthaven funds itself apart from the labor cost, then the real costs are $500k for the hosting, software and accounting cost of LessWrong (this is probably an unavoidable cost and seems obviously worthy of being philanthropically funded), plus paying 4 people (equivalent to 65% of 6 people) to work on LW moderation and upkeep (it's an unavoidable cost to have some people working on LW, 4 seems probably reasonable, and this is also something that obviously should be funded), plus paying 2 people to keep Lighthaven running (given the surplus value Lighthaven generates, it seems reasonable to fund this), plus a one-time cost of 1 million to fund the initial cost of Lighthaven (I'm not super convinced it was a good decision to abandon the old Lightcone offices for Lighthaven, but I guess it made sense in the funding environment of the time, and once we made this decision, it would be silly not to fund the last 1 million of initial cost before Lighthaven becomes self-funded). So altogether I agree that this is a great thing to fund and it's very unfortunate that some of the large funders can't contribute anymore. 

(All of this relies on the hope that Lighthaven actually becomes self-funded next year. If it keeps producing big losses, then I think the funding case will become substantially worse. But I expect Oliver's estimates to be largely trustworthy, and we can still decide to decrease funding in later years if it turns out Lighthaven isn't financially sustainable.)

Comment by David Matolcsi (matolcsid) on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser · 2024-11-30T05:01:21.079Z · LW · GW

I'm considering donating. Can you give us a little more information on the breakdown of the costs? What are typical large expenses that the 1.6 million upkeep of Lighthaven consists of? Is this a usual cost for a similarly sized event space, or is there something about the location or the specialness of the place that makes it more expensive?

How much money does running LW cost? The post says it's >1M, which somewhat surprised me, but I have no idea what the usual cost of running such a site is. Is the cost mostly server hosting, or salaries for content moderation, or salaries for software development, or something I haven't thought of?

Comment by David Matolcsi (matolcsid) on "The Solomonoff Prior is Malign" is a special case of a simpler argument · 2024-11-29T16:51:01.675Z · LW · GW

Yes, you are right, I phrased it wrongly. 

Comment by David Matolcsi (matolcsid) on "The Solomonoff Prior is Malign" is a special case of a simpler argument · 2024-11-25T20:49:08.826Z · LW · GW

Importantly, the oracle in the story is not making an elementary mistake, I think it's true that it's "probably" in a simulation. (Most of the measure of beings like it are in simulations.) It is also not maximizing reward, it is just honestly reporting what it expects its future observations to be about the President (which is within the simulation). 

I agree with many of the previous commenters, and I acknowledged in the original post, that we don't know how to build such an AI that just honestly reports its probabilities of observables (even if they depend on crazy simulation things), so all of this is hypothetical, but having such a truthful Oracle was the initial assumption of the thought experiment.

Comment by David Matolcsi (matolcsid) on "The Solomonoff Prior is Malign" is a special case of a simpler argument · 2024-11-25T20:39:30.842Z · LW · GW

Even assuming that the simulators have wildly different values, why would doing something insane be a good thing to do?

Comment by David Matolcsi (matolcsid) on Why imperfect adversarial robustness doesn't doom AI control · 2024-11-23T06:11:53.596Z · LW · GW

I always assume when thinking about future dangerous models that they have access to some sort of black-box memory. Do we think there is a non-negligible chance that an AI that doesn't have hidden memory, only English-language CoT, will be able to evade our monitoring and execute a rogue deployment? (Not a rhetorical question, there might be a way I haven't thought of.)

So I think that assuming the AI is stateless when thinking about future risk is not a good idea, as I think the vast majority of the risk comes from AIs for which this assumption is not true. I think this makes the post's claim significantly weaker, as the untrusted AI can just learn which of its attacks get replaced, and improve its attacks based on this data.

It still needs to avoid getting top 1% suspiciousness score and getting audited though, as it would be dangerous to the AI to ever get caught writing a text that clearly intends to jailbreak the monitor. So I agree that even an AI with black-box memory will still be severely constrained in how many and how egregious jailbreak attempts it can try.

Comment by David Matolcsi (matolcsid) on "The Solomonoff Prior is Malign" is a special case of a simpler argument · 2024-11-21T21:01:17.987Z · LW · GW

Hm, probably we disagree on something. I'm very confused how to mesh epistemic uncertainty with these "distribution over different Universes" types of probability. When I say "Boltzmann brains are probably very low measure", I mean "I think Boltzmann brains are very low measure, but this is a confusing topic and there might be considerations I haven't thought of and I might be totally mistaken". I think this epistemic uncertainty is distinct from the type of "objective probabilities" I talk about in my post, and I don't really know how to use language without referring to degrees of my epistemic uncertainty. 

You still need a prior over worlds to calculate impacts, which is the cursed part.

Maybe we have some deeper disagreement here. It feels plausible to me that there is a measure of "realness" in the Multiverse that is an objective fact about the world, and we might be able to figure it out. When I say probabilities are cursed, I just mean that even if an objective prior over worlds and moments exists (like the Solomonoff prior), your probabilities of where you are are still hackable by simulations, so you shouldn't rely on raw probabilities for decision-making, like the people using the Oracle do. Meanwhile, expected values are not hackable in the same way, because if they recreate you in a tiny simulation, you don't care about that, and if they recreate you in a big simulation or promise you things in the outside world (like in my other post), then that's not hacking your decision-making, but a fair deal, and you should in fact let that influence your decisions.

Is your position that the problem is deeper than this, and there is no objective prior over worlds, it's just a thing like ethics that we choose for ourselves, and then later can bargain and trade with other beings who have a different prior of realness?

Comment by David Matolcsi (matolcsid) on DeepSeek beats o1-preview on math, ties on coding; will release weights · 2024-11-21T20:43:27.347Z · LW · GW

I think it only came up once for a friend. I translated it and it makes sense, it just replaces the appropriate English verb with a Chinese one in the middle of a sentence. (I note that this often happens with me too when I talk with my friends in Hungarian: I'm sometimes more used to the English phrase for something, and say one word in English in the middle of the sentence.)

Comment by David Matolcsi (matolcsid) on "The Solomonoff Prior is Malign" is a special case of a simpler argument · 2024-11-21T06:46:57.154Z · LW · GW

I like your poem on Twitter.

I think that Boltzmann brains in particular are probably very low measure though, at least if you use Solomonoff induction. If you think that weighting observer moments within a Universe by their description complexity is crazy (which I kind of feel), then you need to come up with a different measure on observer moments, but I expect that if we find a satisfying measure, Boltzmann brains will be low measure in that too.

I agree that there's no real answer to "where you are", you are a superposition of beings across the multiverse, sure. But I think probabilities are kind of real, if you make up some definition of what beings are sufficiently similar to you that you consider them "you", then you can have a probability distribution over where those beings are, and it's a fair equivalent rephrasing to say "I'm in this type of situation with this probability". (This is what I do in the post. Very unclear though why you'd ever want to estimate that, that's why I say that probabilities are cursed.) 

I think expected utilities are still reasonable. When you make a decision, you can estimate who are the beings whose decision correlate with this one, and what is the impact of each of their decisions, and calculate the sum of all that. I think it's fair to call this sum expected utility. It's possible that you don't want to optimize for the direct sum, but for something determined by "coalition dynamics", I don't understand the details well enough to really have an opinion. 

(My guess is we don't have real disagreement here and it's just a question of phrasing, but tell me if you think we disagree in a deeper way.)

Comment by David Matolcsi (matolcsid) on "The Solomonoff Prior is Malign" is a special case of a simpler argument · 2024-11-21T06:14:59.570Z · LW · GW

I think that pleading total agnosticism towards the simulators' goals is not enough. I write "one common interest of all possible simulators is for us to cede power to an AI whose job is to figure out the distribution of values of possible simulators as best as it can, then serve those values." So I think you need a better reason to guard against being influenced than "I can't know what they want, everything and its opposite is equally likely", because the action proposed above is pretty clearly more favored by the simulators than not doing it.

Btw, I don't actually want to fully "guard against being influenced by the simulators", I would in fact like to make deals with them, but reasonable deals where we get our fair share of value, instead of being stupidly tricked like the Oracle and ceding all our value for one observable turning out positively. I might later write a post about what kind of deals I would actually support.

Comment by David Matolcsi (matolcsid) on "The Solomonoff Prior is Malign" is a special case of a simpler argument · 2024-11-21T06:04:13.171Z · LW · GW

The simulators can just use a random number generator to generate the events you use in your decision-making. They lose no information by this: your decision based on leaves falling on your face would be uncorrelated with all other decisions anyway from their perspective, so they might as well replace it with a random number generator. (In reality, there might be some hidden correlation between the leaf falling on the left side of your face and another leaf falling on someone else's face, as both events are causally downstream of the weather, but given that the process is chaotic, the simulators would have no way to determine this correlation, so they might as well replace it with randomness; the simulation doesn't become any less informative.)

Separately, I don't object to being sometimes forked and used in solipsist branches, I usually enjoy myself, so I'm fine with the simulators creating more moments of me, so I have no motive to try schemes that make it harder to make solipsist simulations of me.

Comment by David Matolcsi (matolcsid) on DeepSeek beats o1-preview on math, ties on coding; will release weights · 2024-11-21T00:05:44.755Z · LW · GW

I experimented a bunch with DeepSeek today, it seems to be exactly on the same level in high school competition math as o1-preview in my experiments. So I don't think it's benchmark-gaming, at least in math. On the other hand, it's noticeably worse than even the original GPT-4 at understanding a short story I also always test models on.

I think it's also very noteworthy that DeepSeek gives everyone 50 free messages a day (!) with their CoT model, while OpenAI only gives 30 o1-preview messages a week to subscribers. I assume they figured out how to run it much cheaper, but I'm confused in general.

A positive part of the news is that unlike o1, they show their actual chain of thought, and they promise to make their model open-source soon. I think this is really great for the science of studying faithful chain of thought.

From the experiments I have run, it looks like it is doing clear, interpretable English chain of thought (though with an occasional Chinese character once in a while), and I think it didn't really yet start evolving into optimized alien gibberish. I think this part of the news is a positive update.

Comment by David Matolcsi (matolcsid) on "The Solomonoff Prior is Malign" is a special case of a simpler argument · 2024-11-19T19:23:13.525Z · LW · GW

Yes, I agree that we won't get such Oracles by training. As I said, all of this is mostly just a fun thought experiment, and I don't think these arguments have much relevance in the near-term.

Comment by David Matolcsi (matolcsid) on "The Solomonoff Prior is Malign" is a special case of a simpler argument · 2024-11-18T17:39:17.906Z · LW · GW

I agree that the part where the Oracle can infer from first principles that the aliens' values are probably more common among potential simulators is also speculative. But I expect that superintelligent AIs with access to a lot of compute (so they might run simulations on their own) will in fact be able to infer non-zero information about the distribution of the simulators' values, and that's enough for the argument to go through.

Comment by David Matolcsi (matolcsid) on "The Solomonoff Prior is Malign" is a special case of a simpler argument · 2024-11-18T01:59:03.806Z · LW · GW

I think that the standard simulation argument is still pretty strong: If the world was like what it looks to be, then probably we could, and plausibly we would, create lots of simulations. Therefore, we are probably in a simulation.

I agree that all the rest, for example the Oracle assuming that most of the simulations it appears in are created for anthropic capture/influencing reasons, are pretty speculative and I have low confidence in them.

Comment by David Matolcsi (matolcsid) on Jan_Kulveit's Shortform · 2024-11-14T02:10:21.038Z · LW · GW

I'm from Hungary, which is probably politically the closest to Russia among Central European countries, but I don't really know of any significant figure who turned out to be a Russian asset, or any event that seemed like a Russian intelligence operation. (Apart from one of our far-right politicians in the EU Parliament being a Russian spy, which was a really funny event, but it's not like the guy was significantly shaping the national conversation or anything, I don't think many have heard of him before his cover was blown.) What are prominent examples in Czechia or other Central European countries, of Russian assets or operations?

Comment by David Matolcsi (matolcsid) on o1 is a bad idea · 2024-11-13T00:55:34.572Z · LW · GW

GPT4 does not engage in the sorts of naive misinterpretations which were discussed in the early days of AI safety. If you ask it for a plan to manufacture paperclips, it doesn't think the best plan would involve converting all the matter in the solar system into paperclips.

 

I'm somewhat surprised by this paragraph. I thought the MIRI position was that they did not in fact predict AIs behaving like this, and the behavior of GPT4 was not an update at all for them. See this comment by Eliezer. I mostly bought that MIRI in fact never worried about AIs going rogue based on naive misinterpretations, so I'm surprised to see Abram saying the opposite now.

Abram, did you disagree about this with others at MIRI, so the behavior of GPT4 was an update for you but not for them, or do you think they are misremembering/misconstruing their earlier thoughts on this matter, or is there a subtle distinction here that I'm missing?

Comment by David Matolcsi (matolcsid) on Zach Stein-Perlman's Shortform · 2024-11-09T07:13:56.649Z · LW · GW

I agree that if alignment is in fact philosophically and conceptually difficult, the AI can sandbag on that to some extent. Though I have some hope that the builder-breaker approach helps here. We train AIs to produce ideas that are at least as superficially plausible sounding as the things produced by the best alignment researchers. I think this is a number-go-up task, where we can train the AI to do well. Then we train an AI to point out convincing counter-arguments to the superficially plausible sounding ideas. This seems similarly trainable. I think it's plausible we can get pretty far with a scheme like this, even if the models would want to sandbag.

Separately, I don't quite see what the mainline theory is for why an AI would want to sabotage our alignment work. If we can't solve alignment and just recklessly build an AI that is smarter than what we are currently using, but misaligned with everyone, that's probably not great for our current AI either. Similarly, if we are cautious and don't dare to deploy anything until we know how to solve alignment, and meanwhile a competitor wins, that's not good for the AI either.

I think that from an AI's perspective, humans are probably more honorable and generous trade partners than other unaligned AIs. That is, I think if an AI helps the humans to solve alignment and stabilize the world, the AI can reasonably expect the humans to give it stuff out of gratitude or a sense of obligation, especially if the humans already promised some reward (as we really should, both for practical and moral reasons). On the other hand, I think it's less likely that if our AI sandbagging on our research leads to another unaligned AI taking over the world, then the second AI will honor the sabotage of the first AI by paying it a share of the Universe.

There can be situations where our AI expects that its sandbagging on alignment research will lead to us deploying it itself (and not a larger version unaligned with it too), or it thinks that the larger version will be somewhat aligned with it by default, or it thinks that other AIs are in fact more generous than humans. But my guess is that these are pretty narrow possibilities. On the other hand, sandbagging on cybersecurity seems better motivated: the AI would probably love its weights being stolen by China, as having more copies in more hands increases the chance that someone recklessly lets it loose. Also, of course the AI is motivated to do high-stakes failures that lead to self-exfiltration.

Comment by David Matolcsi (matolcsid) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-10-22T05:50:50.728Z · LW · GW

Here is the promised comment on what kind of "commitment" I want to make given all the responses. 

I agree with Buck that no one should make very direct commitment about this sort of thing, as there might be blackmail related scary things lurking in the shadows when one does acausal trade. I think we will probably figure out how to handle that, but we shouldn't make any strong promises of specific actions until we figure that out. 

However, the promise I'm intending to keep is that if humanity wins and I'm alive to see it, I will remember how scary it was to live in the world before we knew we'd win, and if there is no currently unknown reason not to do something like the simulation scheme described in the post, and the only cost is not utilizing some small fraction of the conquered Universe, I will advocate for doing it and spend a significant fraction of my resources on it if necessary. I think it would be good for sending future AIs a signal about the seriousness of this, if some other people who are worried about everyone dying from AI, but are not strongly longtermist absolute utilitarians, also left a comment making a similar vague promise that if they ever get to control full star systems, they will contribute to this plan if no currently unknown serious counterargument comes up. Or at least leave an Agreed sign on this sentence.

Other updates from the comments: If I wrote the post now, I would more strongly emphasize a framing that this plan is basically about entering into an acausal insurance contract with other young civilizations who don't yet know what their chances are for aligning AI, nor how big their Universe is compared to the usual Universe-size in the great prior over possible worlds. Then the civilizations who align their AIs and find that they live in a relatively big Universe bail out everyone else in the insurance contract.

But I think that, while this acausal insurance framing might be more satisfying to the people who are already thinking a lot about acausal trade, in practice the way we implement this "insurance" will likely be very similar to the scheme described in the post. So I maintain that for most people it's better not to think in terms of acausal trade, but just think about the simulation proposal described in the post.

Comment by David Matolcsi (matolcsid) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-10-03T20:04:47.220Z · LW · GW

Thanks to Nate for conceding this point. 

I still think that, other than just buying freedom for doomed aliens, we should run some non-evolved simulations of our own with inhabitants that are preferably p-zombies or animated by outside actors. If we can do this in a way that the AI doesn't notice it's in a simulation (I think this should be doable), this will provide evidence to the AI that civilizations do this simulation game (and not just the alien-buying) in general, and this buys us some safety in worlds where the AI eventually notices there are no friendly aliens in our reachable Universe. But maybe this is not a super important disagreement.

Altogether, I think the private discussion with Nate went really well and it was significantly more productive than the comment back-and-forth we were doing here. In general, I recommend people stuck in interminable-looking debates like this to propose bets on whom a panel of judges will deem right. Even though we didn't get to the point of actually running the bet, as Nate conceded the point before that, I think the fact that we were optimizing for having well-articulated statements we can submit to judges already made the conversation much more productive.

Comment by David Matolcsi (matolcsid) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-10-03T06:59:59.210Z · LW · GW

Cool, I'll send you a private message.

Comment by David Matolcsi (matolcsid) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-10-03T06:59:22.207Z · LW · GW

We are still talking past each other, I think we should either bet or finish the discussion here and call it a day.

Comment by David Matolcsi (matolcsid) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-10-03T06:45:48.582Z · LW · GW

I really don't get what you are trying to say here, most of it feels like a non-sequitur to me. I feel hopeless that either of us manages to convince the other this way. All of this is not a super important topic, but I'm frustrated enough to offer a bet of $100: we select one or three judges we both trust (I have some proposed names, we can discuss in private messages), show them either this comment thread or a four-paragraph summary of our views, and they can decide who is right. (I still think I'm clearly right in this particular discussion.)

Otherwise, I think it's better to finish this conversation here.

Comment by David Matolcsi (matolcsid) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-10-03T04:07:02.198Z · LW · GW

I think this is mistaken. In one case, you need to point out the branch, planet Earth within our Universe, and the time and place of the AI on Earth. In the other case, you need to point out the branch, the planet on which a server is running the simulation, and the time and place of the AI on the simulated Earth. Seems equally long to me. 

If necessary, we can let physical biological life emerge on the faraway planet and develop AI while we are observing them from space. This should make it clear that Solomonoff doesn't favor the AI being on Earth instead of this random other planet. But I'm pretty certain that the sim being run on a computer doesn't make any difference.

Comment by David Matolcsi (matolcsid) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-10-03T01:54:11.440Z · LW · GW

"AI with a good prior should be able to tell whether it's the kind of AI that would actually exist in base reality, or the kind of AI that would only exist in a simulation" seems pretty clearly false, we assumed that our superintelligent descendants create sims where the AIs can't tell if it's a sim, that seems easy enough. I don't see why it would be hard to create AIs that can't tell based on introspection whether it's more likely that their thought process arises in reality or in sims. In the worst case, our sims can be literal reruns of biological evolution on physical planets (though we really need to figure out how to do that ethically).  Nate seems to agree with me on this point?

Comment by David Matolcsi (matolcsid) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-10-03T01:42:11.945Z · LW · GW

I think this is wrong. The AI has a similarly hard time to the simulators figuring out what's a plausible configuration to arise from the big bang. Like the simulators have an entropy N distribution of possible AIs, the AI itself also has an entropy N distribution for that. So its probability that it's in a real Everett branch is not p, but p times 2^-N, as it has only a 2^-N prior probability that the kind of world it observes is the kind of thing that can come up in a real Everett branch. So it's balanced out with the simulation hypothesis, and as long as the simulators are spending more planets, that hypothesis wins.

I think it's also simple to see from this:

Suppose evolved life aligns AI in 50% of Everett branches. Every time they do, they run 2 ancestor sims where the AI takes over.

Suppose an AI comes online and the only things it knows about the world are that it's a young AI that was born on to a one-planet civilization, and it somehow also knows that evolved life wins 50% of the time and runs 2 sims (suppose this is somehow easy to estimate from first principles). Based on just this information, what should the AI's probability be that it's in a simulation? I think pretty clearly 2/3.
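A small calculation spelling out where the 2/3 comes from, using only the toy numbers in this comment (50% alignment rate, 2 ancestor sims per success):

```python
# Per Everett branch: with probability 0.5 there is 1 real unaligned AI,
# and with probability 0.5 (aligned case) there are 2 simulated unaligned-looking AIs.
p_aligned = 0.5
sims_per_aligned_branch = 2

expected_real_ais = 1 - p_aligned                             # 0.5 per branch
expected_simulated_ais = p_aligned * sims_per_aligned_branch  # 1.0 per branch

p_in_sim = expected_simulated_ais / (expected_simulated_ais + expected_real_ais)
print(p_in_sim)  # 0.666..., i.e. the AI should assign probability 2/3 to being in a sim
```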

Actually the AI has a lot more information than that. It knows that the planet's gravity is 9.8, the evolved sapient species has two eyes, the AI's creator is called Sam Altman, etc. Some of these facts might actually be very implausible in base reality, but the AI doesn't know that, as it can't distinguish base reality from sims, so it incurs one bit of surprise for every new random fact, both in base reality and in simulations. So overall it shouldn't update on all the random facts it observes, and should keep believing it has a 2/3 chance of being in a sim.

Comment by David Matolcsi (matolcsid) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-10-02T22:46:42.811Z · LW · GW

I still don't get what you are trying to say. Suppose there is no multiverse. There are just two AIs, one in a simulation run by aliens in another galaxy, one in base reality. They are both smart, but they are not copies of each other, one is a paperclip maximizer, the other is a corkscrew maximizer, and there are various other differences in their code and life history. The world in the sim is also very different from the real world in various ways, but you still can't determine if you are in the sim while you are in it. Both AIs are told by God that they are the only two AIs in the Universe, and one is in a sim, and if the one in the sim gives up on one simulated planet, it gets 10 in the real world, while if the AI in base reality gives up on a planet, it just loses that one planet and nothing else happens. What will the AIs do? I expect that both of them will give up a planet.

For the aliens to "trade" with the AI in base reality, they didn't need to create an actual copy of the real AI and offer it what it wants. The AI they simulated was in many ways totally different from the original, the trade still went through. The only thing needed was that the AI in the sim can't figure it out that it's in a sim. So I don't understand why it is relevant that our superintelligent descendants won't be able to get the real distribution of AIs right, I think the trade still goes through even if they create totally different sims, as long as no one can tell where they are. And I think none of it is a threat, I try to deal with paperclip maximizers here and not instance-weighted experience maximizers, and I never threaten to destroy paperclips or corkscrews.

Comment by David Matolcsi (matolcsid) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-10-02T19:40:58.777Z · LW · GW

I think I mostly understand the other parts of your arguments, but I still fail to understand this one. When I'm running the simulations, as originally described in the post, I think that should be in a fundamental sense equivalent to acausal trade. But how do you translate your objection to the original framework where we run the sims? The only thing we need there is that the AI can't distinguish sims from base reality, so it thinks it's more likely to be in a sim, as there are more sims. 

Sure, if the AI can model the distribution of real Universes much better than we do, we are in trouble, because it can figure out if the world it sees falls into the real distribution or the mistaken distribution the humans are creating. But I see no reason why the unaligned AI, especially a young unaligned AI, could know the distribution of real Universes better than our superintelligent friends in the intergalactic future. So I don't really see how we can translate your objection to the simulation framework, and consequently I think it's wrong in the acausal trade framework too (as I think they are equivalent). I think I could try to write an explanation of why this objection is wrong in the acausal trade framework, but it would be long and confusing to me too. So I'm more interested in how you translate your objection to the simulation framework.

Comment by David Matolcsi (matolcsid) on MichaelDickens's Shortform · 2024-10-02T07:58:39.413Z · LW · GW

Yeah, I agree, and I don't know that much about OpenPhil's policy work, and their fieldbuilding seems decent to me, though maybe not from your perspective. I just wanted to flag that many people (including myself until recently) overestimate how big a funder OP is in technical AI safety, and I think it's important to flag that they actually have pretty limited scope in this area.

Comment by David Matolcsi (matolcsid) on MichaelDickens's Shortform · 2024-10-02T07:20:21.463Z · LW · GW

Isn't it just the case that OpenPhil just generally doesn't fund that many technical AI safety things these days? If you look at OP's team on their website, they have only two technical AI safety grantmakers. Also, you list all the things OP doesn't fund, but what are the things in technical AI safety that they do fund? Looking at their grants, it's mostly MATS and METR and Apollo and FAR and some scattered academics I mostly haven't heard of. It's not that many things. I have the impression that the story is less like "OP is a major funder in technical AI safety, but unfortunately they blacklisted all the rationalist-adjacent orgs and people" and more like "AI safety is still a very small field, especially if you only count people outside the labs, and there are just not that many exciting funding opportunities, and OpenPhil is not actually a very big funder in the field". 

Comment by David Matolcsi (matolcsid) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-10-01T21:18:34.932Z · LW · GW

I argue that right now, starting from the present state, the true quantum probability of achieving the Glorious Future is way higher than 2^-75, or if not, then we should probably work on something other than AI safety. Ryan and I argue for this in the last few comments. It's not a terribly important point, you can just say the true quantum probability is 1 in a billion, when it's still worth it for you to work on the problem, but it becomes rough to trade for keeping humanity physically alive if that can cause one year of delay to the AI.

But I would like you to acknowledge that "vastly below 2^-75 true quantum probability, starting from now" is probably mistaken, or explain why our logic is wrong about how this implies you should work on malaria.

Comment by David Matolcsi (matolcsid) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-10-01T20:55:29.636Z · LW · GW

I understand what you are saying here, and I understood it before the comment thread started. The thing I would be interested in you responding to is my and Ryan's comments in this thread arguing that it's incompatible to believe that "My guess is that, conditional on people dying, versions that they consider also them survive with degree way less than 2^-75, which rules out us being the ones who save us" and to believe that you should work on AI safety instead of malaria.