Posts

Averages and sample sizes 2023-11-02T09:52:43.576Z
The purpose of the (Mosaic) law 2023-09-04T23:38:41.606Z
A Friendly Face (Another Failure Story) 2023-06-20T10:31:24.655Z
Agentic Mess (A Failure Story) 2023-06-06T13:09:19.125Z
Miracles and why not to believe them 2022-11-16T12:07:40.781Z

Comments

Comment by mruwnik on LLMs for Alignment Research: a safety priority? · 2024-04-13T18:49:08.285Z · LW · GW

It's not just from https://aisafety.info/. It also uses Arbital, any posts from the alignment forum, LW, and the EA forum that seem relevant and have a minimum karma, a bunch of arXiv papers, and a couple of other sources. This is a relatively up-to-date list of the sources used (it also contains the actual data).

Comment by mruwnik on Apologizing is a Core Rationalist Skill · 2024-01-03T11:49:42.413Z · LW · GW

Another, related Machiavellian tactic, when starting a relationship that you suspect will be highly valuable to you, is to have an argument with them as soon as possible, and then to patch things up with a (sincere!) apology. I'm not suggesting you go out of your way to start a quarrel; more that it's both a valuable data point as to how they handle problems (as most relationships will have patchy moments) and a good signal to them that you value them highly enough to go through a proper apology.

Comment by mruwnik on AI Safety Chatbot · 2023-12-22T13:42:51.762Z · LW · GW
  1. gpt-3.5-turbo for now
  2. that's also being tried
Comment by mruwnik on The Hidden Perils of Hydrogen · 2023-10-17T12:42:58.220Z · LW · GW

They are perils of assuming that hydrogen is the future, or of basing your energy needs on it - i.e. the peril is not in the hydrogen itself, it's in making plans that involve it.

Comment by mruwnik on EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem · 2023-09-29T14:17:35.348Z · LW · GW

That's actually what got me to stop eating (or at least buying) meat

Comment by mruwnik on The Talk: a brief explanation of sexual dimorphism · 2023-09-28T12:17:27.371Z · LW · GW

Somatic cells are generally evolutionary dead ends. Your toe cells aren't going to do much reproducing. Also, mitochondrial (or in general organellar) DNA is split between the organelles themselves and the nucleus of the cell containing them. Biology is fun!

The argument for mitochondria is that they make the cell's internal environment more toxic (what with being the cell's powerhouse and producing reactive by-products). This in turn provides a lot of selection pressure, in the same way that e.g. global warming is causing a lot of selection pressure.

Runaway sexual selection has limits. That is also sort of the point. If you can carry around massive breasts, tails, noses or whatever and still be very prosperous, that means you're good. Where "prosper" can mean running away from lions if you're an antelope, or being at the top of the village pecking order if you're a human. Like a short pro basketball player: if they're short, but still playing at a pro level, that's someone you want on your team. This is known as the handicap principle, and can be explained via signaling mechanisms.

Comment by mruwnik on The Talk: a brief explanation of sexual dimorphism · 2023-09-19T04:50:50.557Z · LW · GW

The number of generations controls how long your experiment lasts. The longer it runs (i.e. the more generations), the more drift you get, so the more likely it is for a given gene (or in this case, a given number of mating types) to take over. This effect is weaker in larger populations, but unless you have an infinite population, given enough time (or generations) you'll end up with the 2 sexes (except for fungi, of course, as always). Eukaryotes first appeared around 2.2 billion years ago. For comparison, the Cambrian explosion, with the first complex life, was only ~500 million years ago. That's a lot of time (or generations) for things to stabilize.
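
For what it's worth, here's a minimal neutral-drift toy (my own sketch, not anything from the post - it tracks two interchangeable variants rather than modelling mating types directly) showing that one variant always ends up taking over, just more slowly in bigger populations:

```python
import numpy as np

rng = np.random.default_rng(0)

def generations_until_fixation(pop_size, start_freq=0.5):
    """Generations until one of two neutral types takes over the whole population."""
    count = int(pop_size * start_freq)   # individuals carrying type A
    gens = 0
    while 0 < count < pop_size:
        # each individual in the next generation picks a random parent,
        # so the new count of type A is binomially distributed
        count = rng.binomial(pop_size, count / pop_size)
        gens += 1
    return gens

for n in (100, 1_000, 10_000):
    runs = [generations_until_fixation(n) for _ in range(20)]
    print(f"N={n}: ~{np.mean(runs):.0f} generations until only one type is left")
```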

There are multiple mating types around. Mammals have the XY/XX chromosome system. Birds have a different chromosome set (denoted ZW/ZZ). Some reptile families use egg temperature to determine sex. Some fish have a single male, and if it disappears, the next-ranking individual becomes the male. Insects have yet other mechanisms. But there are usually only the two sexes (apart from fungi), probably for the efficiency reasons outlined in the OP.

Comment by mruwnik on Where might I direct promising-to-me researchers to apply for alignment jobs/grants? · 2023-09-19T00:49:22.786Z · LW · GW

There is a Stampy answer to that which should stay up to date here.

Comment by mruwnik on The Talk: a brief explanation of sexual dimorphism · 2023-09-19T00:44:27.451Z · LW · GW

My understanding is pretty much what you said - when the going is good, go asexual (e.g. strawberry runners, grasses or Asian knotweed), but also try for seeds. There are a couple of species of plants that have lost the ability for sexual reproduction, but I can't recall them right now. That being said, various plants used by humans are reproduced pretty much exclusively asexually and so have lost the ability for sexual reproduction, specifically because they have very stable environments. The obvious examples are seedless fruits (bananas, grapes), but ginger and garlic are interesting plants that have been propagated from cuttings or bulbs for thousands of years and so have lost the ability to produce seeds (with the normal caveats).

Aphids are also an interesting example, where the previous year's eggs hatch in the spring as females, which then clone themselves as fast as possible - when there are too many of them they will create clones with wings, and when autumn comes around, they will create male clones to then go through the normal sexual reproductive route. Which is also an example of the stable/unstable environment issues you mentioned.

Comment by mruwnik on The Talk: a brief explanation of sexual dimorphism · 2023-09-18T19:03:57.795Z · LW · GW

This depends on the size and distances involved, but it's a good intuition. You need a mechanism to generate the pressure differentials, which can be an issue in very small organisms.

Small and sedentary organisms tend to use chemical gradients (i.e. smell), but anything bigger than a mouse (and quite a few smaller things) usually has some kind of sound signals, which are really good for quick notifications in a radius around you, regardless of the light level (so you can pretty much always use it). Also, depending on the medium, sound can travel really far - like whales which communicate with each other over thousands of miles, or elephants stomping to communicate with other elephants 20 miles away.

Comment by mruwnik on The Talk: a brief explanation of sexual dimorphism · 2023-09-18T18:46:20.669Z · LW · GW

organisms with mitochondria always use sexual reproduction

Or at least their ancestors did. You mention Bdelloidea in a comment, which are one of the inevitable exceptions (as you mention in the introduction, which I very much appreciate, as "everything in biology has exceptions" is something I often find myself saying), but they are descended from eukaryotes which did have mitochondria.

The opposite seems true, though - true sexual reproduction seems to be exclusive to eukaryotes. So you could also say that sex makes mitochondria necessary. There seem to be a couple of good jokes in there...

One other pedantic note to add to this generally excellent article is that non-eukaryotic organisms also have methods to mix their genes, what with bacterial conjugation or viral recombination, without the dimorphism. 

Comment by mruwnik on Is AI Safety dropping the ball on privacy? · 2023-09-18T17:58:38.124Z · LW · GW

It requires you to actively manage long-lived sessions which would otherwise be handled by the site you're using. You can often get back to where you were by just logging in again, but there are many places (especially travel or official sites) where that pretty much resets the whole flow.

There are also a lot more popups, captchas and other hoops to jump through when you don't have a cookies trail.

The average user is lazy and doesn't think about these things, so the web as a whole is moving in the direction of making things easier (but not simpler). This is usually viewed as a good thing by those who then only need to click a single button. Though it's at the cost of those who want to have more control. 

It might not be inconvenient to you, especially as it's your basic flow. It's inconvenient for me, but worth the cost; for most of the people I know it would be basically unusable (compared to the default flow).

Comment by mruwnik on The purpose of the (Mosaic) law · 2023-09-05T16:25:49.964Z · LW · GW

I thought all of these were obvious and well known. But yes, those are the things I was pointing at.

Comment by mruwnik on Assume Bad Faith · 2023-09-03T16:40:29.032Z · LW · GW

there is "something else" going on besides both parties just wanting to get the right answer

 

There are also different priors. While in general you might very well be right (or at least this post makes a lot of sense to me), I often have conversations where I'm pretty sure both my interlocutor and I are discussing things in good faith, but where we still can't agree on pretty basic things (usually about religion).

Comment by mruwnik on AGI Safety FAQ / all-dumb-questions-allowed thread · 2023-09-02T10:53:56.340Z · LW · GW

I'm assuming you're not asking about the mechanism (i.e. natural selection + mutations)? A trite answer would be something like "the same way it created wings, mating dances, exploding beetles, and parasites requiring multiple hosts".

Thinking about the meaning of life might be a spandrel, but a quick consideration of it comes up with various evo-psych style reasons why it's actually very useful, e.g. it can propel people to greatness, which can massively increase their genetic fitness. Fitness is an interesting thing, in that it can be very non-obvious. Everything is a trade-off, where the only goal is for your genes to propagate. So if thinking about the meaning of life will get your genes spread more (e.g. because you decide that your children have inherent meaning, because you become a high status philosopher and your sister can marry well, because it's a social sign that you have enough resources to waste them on fruitless pondering) then it's worth having around.

Comment by mruwnik on AGI Safety FAQ / all-dumb-questions-allowed thread · 2023-09-02T10:36:07.388Z · LW · GW

Frankenstein is a tale about misalignment. Asimov wrote a whole book about it. Vernor Vinge also writes about it. People have been trying to get their children to behave in certain ways forever. But before LW the alignment problem was just the domain of SF.

20 years ago the alignment problem wasn't a thing, so much so that MIRI started out as an org to create a Friendly AI.

Comment by mruwnik on AGI Safety FAQ / all-dumb-questions-allowed thread · 2023-09-02T10:29:38.082Z · LW · GW

The first issue that comes to mind is having an incentive that would achieve that. The one you suggest doesn't incentivize truth - it incentivizes collaboration in order to guess the password, which would be fine in training, but then you're going into deceptive alignment land: Ajeya Cotra has a good story illustrating that.

Comment by mruwnik on AGI Safety FAQ / all-dumb-questions-allowed thread · 2023-09-02T10:16:16.726Z · LW · GW

You could, but should you? English in particular seems a bad choice. The problem with natural languages is their ambiguity. When you're providing a utility function, you want it to be as precise and robust as possible. This is actually an interesting case where folklore/mythology has known about these issues for millennia. There are all kinds of stories about genies, demons, monkey's paws etc. where wishes were badly phrased or twisted. This is a story explanation of the issue.

Comment by mruwnik on AGI Safety FAQ / all-dumb-questions-allowed thread · 2023-09-02T10:06:28.121Z · LW · GW

You're adding a lot of extra assumptions here, a couple being:

  • there is a problem with having arbitrary goals
  • it has a pleasure-pain axis
  • it notices it has a pleasure-pain axis
  • it cares about its pleasure-pain axis
  • its pleasure-pain axis is independent of its understanding of the state of the environment

The main problem of inner alignment is making an agent want to do what you want it to do (as opposed to even understanding what you want it to do). Which is an unsolved problem. 

Although I'm criticizing your specific criticism, my main issue with it is that it's a very specific failure mode, which is unlikely to appear, because it requires a lot of other things which are also unlikely. That being said, you've provided a good example of WHY inner alignment is a big problem, i.e. it's very hard to keep something following the goals you set for it, especially when it can think for itself and change its mind.

Comment by mruwnik on AGI Safety FAQ / all-dumb-questions-allowed thread · 2023-09-02T09:56:29.466Z · LW · GW

Drug addicts tend to be frowned upon not because they have a bad life, or even for evo-psych reasons, but because their lifestyle is bad for the rest of society, in that they tend to have various unfortunate externalities.

Comment by mruwnik on Monthly Roundup #9: August 2023 · 2023-08-08T21:19:02.553Z · LW · GW

It can also be retaliation, which sort of makes sense - there's a reason tit-for-tat is so successful. That being said, it's generally very unfortunate that they're being introduced, on all sides. I can sort of understand why countries would want to limit people from poor countries (which is not the same as agreeing with the reasoning). Enforcing visas for short term, touristy style visits doesn't seem like a good idea however I look at it. As Zvi notes, it's about the friction.

ESTA is very much a visa (I filled it out yesterday), but under a different name and purely electronic. 

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-08-04T13:12:48.025Z · LW · GW

Not being able to directly communicate with the others would be an issue in the beginning, but I'm guessing you would be able to use the setup to work out what the others think. 

A bigger issue is that this would probably result in a very homogeneous group of minds. They're optimizing not for correct answers, but for consensus answers. It's the equivalent of studying for the exams. A fun example is the Polish equivalent of the SAT exams (this probably generalizes, but I don't know about other countries). I know quite a few people who went to study biology, and then decided to retake the biology exam (as one can do). Most retakers had worse results the second time round, because they had more up-to-date knowledge - the exam is at least 10 years behind the current state of knowledge, so they give answers that are correct as of today, but have them marked as incorrect. I'd expect the group of AIs to eventually converge on a set of acceptable beliefs, rather than correct ones.

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-08-04T12:52:27.632Z · LW · GW

That very much depends on how you understand "safe". Which is a large part of the differences between ethical AI people (safe means that it doesn't offend anyone, leak private information, give biased answers etc.) and the notkilleveryoneism people (safe means that it doesn't decide to remove humanity). These aren't mutually incompatible, but they require focusing on different things.

There is also safe in the PR sense, which means that no output will cause the LLM producer/supplier/whoever to get sued or in any other kind of trouble.

"Safe" is one of those funny words which everyone understands differently, but also assume that everyone else understands the same way.

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-08-04T12:44:22.501Z · LW · GW

A couple come to mind:

The problem with them being that it takes a bit of explaining to even understand the issue.

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-08-04T12:40:54.969Z · LW · GW

Think of reward not as "here's an ice cream for being a good boy" but more as "you passed my test; I will now do neurosurgery on you to make you more likely to behave the same way in the future". The result of applying the "reward" in both cases is that you're more likely to act as desired next time. In humans it's because you expect to get something nice out of being good; in computers it's because they've been modified to do so. It's hard to directly change how humans think and behave, so you have to do it via ice cream and beatings, while with computers you can just modify their memory.
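
To make the "neurosurgery" framing concrete, here's a toy sketch (my own illustration, nothing from the thread): the "reward" is just a number the training loop uses to decide how hard to nudge the agent's parameter toward repeating whatever it just did.

```python
import math
import random

def prob_press(theta):
    # sigmoid: the agent's probability of pressing the button
    return 1 / (1 + math.exp(-theta))

theta = 0.0          # the agent's single "brain" parameter
learning_rate = 0.5

for step in range(1000):
    p = prob_press(theta)
    pressed = random.random() < p
    reward = 1.0 if pressed else 0.0   # the environment rewards pressing
    # REINFORCE-style update: the reward isn't something the agent "enjoys";
    # it just scales how much we shift theta toward repeating the action taken.
    grad_log_prob = (1 - p) if pressed else -p
    theta += learning_rate * reward * grad_log_prob

print(f"P(press) after training: {prob_press(theta):.2f}")   # close to 1.0
```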

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-08-04T12:28:10.830Z · LW · GW

It depends a lot on how much it values self-preservation in comparison to solving the tests (putting aside the matter of minimal computation). Self-preservation is an instrumental goal, in that you can't bring the coffee if you're dead. So it seems likely that any intelligent enough AI will value self-preservation, if only in order to make sure it can achieve its goals.

That being said, having an AI that is willing to do its task and then shut itself down (or to shut down when triggered) is an incredibly valuable thing to have - it's already finished, but you could have a go at the shutdown problem.

A more general issue is that this will handle a lot of cases, but not all of them - an AI that does lie (for whatever reason) will not be shut down. It still sounds like something worth having, in a Swiss cheese (defense in depth) sort of way.

(The whole point of these posts is to assume everyone is asking sincerely, so no worries.)

Comment by mruwnik on Cryonics and Regret · 2023-07-28T16:31:08.787Z · LW · GW

Depends where - which is the whole issue. For the US average wage, yes. For non-US people, no. I agree that it's a matter of priorities. But it's also a matter of earnings minus costs, both of which depend a lot on where you live.

A lot of people certainly could save a lot more. But usually at the cost of quality of life. You could say that they should work a job that pays more, or live somewhere where there is a lower cost of living, but both of those can be hard.

I'm not saying you're wrong that it's doable. The problem is that the feasibility is highly dependent on your circumstances (same as e.g. having an electric car or whatever), which can make it very hard for people who aren't in affluent places.

Comment by mruwnik on Cryonics and Regret · 2023-07-25T12:05:53.980Z · LW · GW

Which is a bit over 3 years of saving up every penny of the average wage where I live. If you subtract the average rent and starvation rations from that income, you're up to 5.5 years. The first info I could find on Google (from 2018) claims the average person here saves around $100 monthly, which gives you over 40 years of saving. This is only for one person. If you have multiple children, an SO, etc., that starts ballooning quickly. This is in a country which, while not yet classified as developed, is almost there (Poland).

50k is a lot for pretty much most of the world. It's the cost of a not very nice flat (i.e. middling location, or bad condition) here.
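
For concreteness, the arithmetic behind those durations (the ~$50k price is the one discussed in the thread; the monthly amounts are rough assumptions back-solved from the years I quoted):

```python
cost = 50_000                  # USD, rough cryonics price discussed in the thread

avg_net_wage = 1_400           # USD/month, assumed average net wage here
after_rent_and_food = 760      # USD/month left after rent and basic food (assumption)
typical_savings = 100          # USD/month, the 2018 figure I found

print(cost / avg_net_wage / 12)         # ~3 years saving every penny
print(cost / after_rent_and_food / 12)  # ~5.5 years after rent and food
print(cost / typical_savings / 12)      # ~42 years at typical savings rates
```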

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-07-25T11:44:03.752Z · LW · GW

It's not that it can't come up with ways to not stamp on us. But why should it? Yes, it might only be a tiny, tiny inconvenience to leave us alone. But why even bother doing that much? It's very possible that we would be of total insignificance to an AI. Just like the ants that get destroyed at a construction site - no one even notices them. It still doesn't turn out too well for them.

Though that's when there are massive differences of scale. When the differences are smaller, you get into inter-species competition dynamics. Which also is what the OP was pointing at, if I understand correctly.

A superintelligence might just ignore us. It could also e.g. strip mine the whole earth for resources, coz why not? "The AI does not hate you, nor does it love you, but you are made of atoms which it can use for something else".

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-07-25T11:29:25.837Z · LW · GW

In your example, can it just lie? You'd have to make sure it either doesn't know the consequences of your interlocks, or doesn't care about them (this is the problem of corrigibility).

If the tests are obvious tests, your AI will probably notice that and react accordingly - if it has enough intelligence it can notice that they're hard and are probably going to be used to gauge its level, which then feeds into the whole thing about biding your time and not showing your cards until you can take over.

If they're not obvious, then you're in a security type situation, where you hope your defenses are good enough. Which should be fine on weak systems, but they're not the problem. The whole point of this is to have systems that are much more intelligent than humans, so you'd have to be sure they don't notice your traps. It's like a 5-year-old setting up booby traps for you - how confident are you that the 5-year-old will trap you?

This is a story of how that looks at the limit. A similar issue is boxing. In both cases you're assuming that you can contain something that is a lot smarter than you. It's possible in theory (I'm guessing?), but how sure are you that you can outsmart it in the long run?

Comment by mruwnik on A Friendly Face (Another Failure Story) · 2023-06-22T10:33:41.631Z · LW · GW

How do the hard limits of intelligence help? My current understanding is that the hard limits are likely to be something like Jupiter brains, rather than mentats. If each step is only slightly better, won't that result in a massive number of tiny steps (even taking into account the nonlinearity of it)?

Small value drifts are a large problem, if compounded. That's sort of the premise of a whole load of fiction, where characters change their value systems after sequences of small updates. And that's just in humans - adding in alien (as in different) minds could complicate this further (or not - that's the thing about alien minds).

Comment by mruwnik on A Friendly Face (Another Failure Story) · 2023-06-22T09:13:06.843Z · LW · GW

The foom problem is worse because of how hard it is to trust the recursion. Foomability is weakly correlated with whether the foomed entity is aligned. At least from our perspective. That's why there's the whole emphasis on getting it right on the first try.

How can you estimate how many iterations of RSA will happen?

How does interpretability align an AI? It can let you know when things are wrong, but that doesn't mean it's aligned.

QACI can potentially solve outer alignment by giving you a rigorous and well specified mathematical target to aim for. That still leaves the other issues (though they are being worked on). 

Comment by mruwnik on A Friendly Face (Another Failure Story) · 2023-06-21T21:46:06.072Z · LW · GW

To a certain extent it doesn't matter. Or rather it's a question of expected utility. If 10% of outcomes are amazing, but 60% horrible, that sort of suggests you might want to avoid that route.

Comment by mruwnik on A Friendly Face (Another Failure Story) · 2023-06-21T21:42:51.816Z · LW · GW

Assuming it can scale with capabilities, that doesn't help you if alignment is scaling at y=2x and capabilities at y=123x (for totally random numbers, but you get the point). A quick google search found an article from 2017 claiming that there are some 300k AI researchers worldwide. I see claims around here that there are like 300 alignment researchers. Those numbers can be taken with a large grain of salt, but even so, that's 1000:1. 

As to recursive improvement, nope - check out the tiling problem. Also, "only" is doing a massive amount of work in "only need to align", seeing as no-one (as far as I can tell) has a good idea how to do that (though there are interesting approaches to some subproblems)

Comment by mruwnik on Book Review: How Minds Change · 2023-06-02T09:50:47.268Z · LW · GW

More that you get as many people as possible to read the sequences, which will change their thinking so they make fewer mistakes, which in turn will make more people aware both of the real risks underlying superintelligence and of the plausibility and utility of AI. I wasn't around then, so this is just my interpretation of what I read after the fact, but I get the impression that people were a lot less doomish then. There was a hope that alignment was totally solvable.

The focus didn't seem to be on getting people into alignment so much as on it generally being better for people to think better. AI isn't pushed as something everyone should work on - rather it's presented as the thing EY happens to know about, and as something worth investigating. There are various places where it's said that everyone could use more rationality, that it's an instrumental goal like earning more money. There's an idea of creating Rationality Dojos, as places to learn rationality the way people learn martial arts. I believe that's the source of CFAR.

It's not that the one and only goal of the rationalist community was to stop an unfriendly AGI. It's just that that's the obvious result of it. It's a matter of taking the idea seriously, then shutting up and multiplying - assuming that AI risk is a real issue, it's pretty obvious that it's the most pressing problem facing humanity, which means that if you can actually help, you should step up.

Business/economic/social incentives can work, no doubt about that. The issue is that they only work as long as they're applied. Actually caring about an issue (as in really caring, like oppressed-Christian level, not performative cultural-Christian level) is a lot more lasting, in that if the incentives disappear, they'll keep on doing what you want. Convincing is a lot harder, though, which I'm guessing is your point? I agree that convincing is less effective numerically speaking, but it seems a lot more good (in the moral sense), which also seems important. Though this is admittedly a lot more of an aesthetics thing...

I most certainly recommend reading the sequences, but by no means meant to imply that you must. Just that stopping an unfriendly AGI (or rather the desirability of creating a friendly AI) permeates the sequences. I don't recall if it's stated explicitly, but it's obvious that they're pushing you in that direction. I believe Scott Alexander described the sequences as being totally mind-blowing the first time he read them, but totally obvious on rereading them - I don't know which would be your reaction. You can try the highlights rather than the whole thing, which should be a lot quicker.

Comment by mruwnik on Reacts now enabled on 100% of posts, though still just experimenting · 2023-06-01T17:35:01.482Z · LW · GW

Right - now I see it. I was testing it on the reactions of @Sune's comment, so it was hidden far away to the right.

All in all, nice feature though.

Comment by mruwnik on Reacts now enabled on 100% of posts, though still just experimenting · 2023-06-01T14:55:51.437Z · LW · GW

But there is no way to downvote a reaction? E.g. if you add the paperclip reaction, then all I can do is bump it by one and/or later remove my own reaction, but there is no way to influence yours? So reactions are strictly additive?

Comment by mruwnik on Book Review: How Minds Change · 2023-06-01T14:34:46.789Z · LW · GW

The answer is to read the sequences (I'm not being facetious). They were written with the explicit goal of producing people with EY's rationality skills in order for them to go into producing Friendly AI (as it was called then). They provide a basis for people to realize why most approaches will by default lead to doom.

At the same time, it seems like a generally good thing for people to be as rational as possible, in order to avoid the myriad cognitive biases and problems that plague humanity's thinking, and therefore its actions. My impression is that the hope was to make the world more similar to Dath Ilan.

Comment by mruwnik on Book Review: How Minds Change · 2023-06-01T14:22:06.507Z · LW · GW

It depends what you mean by political. If you mean something like "people should act on their convictions" then sure. But you don't have to actually go in to politics to do that, the assumption being that if everyone is sane, they will implement sane policies (with the obvious caveats of Moloch, Goodhart etc.).

If you mean something like "we should get together and actively work on methods to force (or at least strongly encourage) people to be better", then very much no. Or rather it gets complicated fast. 

Comment by mruwnik on Book Review: How Minds Change · 2023-06-01T14:15:26.753Z · LW · GW

Jehovah's Witnesses are what first came to mind when reading the OP. They're sort of synonymous with going door to door in order to have conversations with people, often saying that they're willing for their minds to be changed through respectful discussions. They are also one of the few Christian-adjacent sects (for lack of a more precise description) to actually show large growth (at least in the West).

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [May 2023] · 2023-05-09T09:13:17.663Z · LW · GW

No. 

Atheism is totally irrelevant. A deist would come to exactly the same conclusions. A Christian might not be convinced of it, but mainly because of eschatological reasons. Unless you go the route of saying that AGI is the antichrist or something, which would be fun. Or that God(s) will intervene if things get too bad?

Reductive materialism is also irrelevant. It might sort of play into the question of whether an AGI is conscious, but that whole topic is a red herring - you don't need a conscious system for it to kill everyone.

This feeds into the computational theory of mind - it makes it a lot easier to posit the possibility of a conscious AGI if you don't require a soul for it, but again - consciousness isn't really needed for an unsafe AI.

I have fundamentalist Christian friends who are ardent believers, but who also recognize the issues behind AGI safety. They might not think it that much of a problem (pretty much everything pales in comparison to eternal heaven and hell), but they can understand and appreciate the issues.

Comment by mruwnik on a narrative explanation of the QACI alignment plan · 2023-04-14T19:55:14.067Z · LW · GW

1GB of text is a lot. Naively, that's a billion letters, much more if you use compression. Or you could maybe just do some kind of magic with the question containing a link to a wiki on the (simulated) internet?

If you have infinite time, you can go the monkeys on typewriters route - one of them will come up with something decent, unless an egregore gets them, or something. Though that's very unlikely to be needed - assuming that alignment is solvable by a human level intelligence (this is doing a lot of work), then it should eventually be solved.

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [April 2023] · 2023-04-14T10:18:30.655Z · LW · GW

This seems to be mixing 2 topics. Existing programs are more or less a set of steps to execute. A glorified recipe. The set of steps can be very complicated, and have conditionals etc., but you can sort of view them that way. Like a car rolling down a hill, it follows specific rules. An AI is (would be?) fundamentally different in that it's working out what steps to follow in order to achieve its goal, rather than working towards its goal by following prepared steps. So continuing the car analogy, it's like a car driving uphill, where it's working to forge a path against gravity.

An AI doesn't have to be a utility maximiser. If it has a single coherent utility function (pretty much a goal), then it will probably be a utility maximiser. But that's by no means the only way of making them. LLMs don't seem to be utility maximisers

Comment by mruwnik on 10 reasons why lists of 10 reasons might be a winning strategy · 2023-04-07T14:00:15.671Z · LW · GW

worker bees are infertile

Only for social bees, like honey bees or bumblebees - over 90% of bee species are solitary, and most certainly fertile (if they are to have any chance of being evolutionarily successful). Which I suppose only serves to support your point even more...

Comment by mruwnik on Misgeneralization as a misnomer · 2023-04-07T13:53:52.131Z · LW · GW

It seems a bit more subtle than that. These are both cases of outer misalignment, or rather goal misspecification. The second case is not so much that it ends up with an incorrect goal (which happens in both cases), but that you have multiple smaller goals that initially resulted in the correct behavior; when the conditions change (training -> deployment) the delicate balance breaks down and a different equilibrium is reached, which from the outside looks like a different goal.

It might be useful to think of it in terms of alliances, e.g. during WW2 the Allies' goal was to defeat the Nazis, but once that was achieved, they ended up in a different equilibrium.

Comment by mruwnik on Beren's "Deconfusing Direct vs Amortised Optimisation" · 2023-04-07T13:32:25.902Z · LW · GW

Right. That's on me for skimming the commentary section...

Comment by mruwnik on Beren's "Deconfusing Direct vs Amortised Optimisation" · 2023-04-07T13:18:34.204Z · LW · GW

Is this sort of the difference between System 1 and 2 thinking?

Comment by mruwnik on Eliezer Yudkowsky’s Letter in Time Magazine · 2023-04-06T13:12:47.773Z · LW · GW

You can add questions to Stampy - if you click "I'm asking something else" it'll show you 5 unanswered questions that sound similar, whose priority you can then bump. If none of them match, click on "None of these: Request an answer to my exact question above" for it to be added to the queue.

Comment by mruwnik on [deleted post] 2023-04-04T13:04:25.453Z

You have to be able to stop human coordination

This could actually be quite quick - if you have the throughput to find or generate kompromat on everyone that could resist you, as well as a mechanism for negotiating with massive numbers of people at once, you could conceivably cow or talk most people into submission, with a couple of highly efficient information attacks (or even physical "accidents") targeting those that turn out to have a backbone.

Comment by mruwnik on [deleted post] 2023-04-04T13:02:54.043Z

be worth distinguishing

An additional question is whether this could also explain the issues people have with takeoff speeds? I wonder how often someone says “I think it will take an AI a few days from waking up to putting unstoppable plans to take over the world into action” while the listeners hear “I think we’ll all fall over dead a few days after the AI wakes up”?