Posts

Averages and sample sizes 2023-11-02T09:52:43.576Z
The purpose of the (Mosaic) law 2023-09-04T23:38:41.606Z
A Friendly Face (Another Failure Story) 2023-06-20T10:31:24.655Z
Agentic Mess (A Failure Story) 2023-06-06T13:09:19.125Z
Miracles and why not to believe them 2022-11-16T12:07:40.781Z

Comments

Comment by mruwnik on LLMs for Alignment Research: a safety priority? · 2024-04-13T18:49:08.285Z · LW · GW

It's not just from https://aisafety.info/. It also uses Arbital, any posts from the alignment forum, LW, and the EA forum that seem relevant and have a minimum karma, a bunch of arXiv papers, and a couple of other sources. This is a relatively up-to-date list of the sources used (it also contains the actual data).

Comment by mruwnik on Apologizing is a Core Rationalist Skill · 2024-01-03T11:49:42.413Z · LW · GW

Another, related Machiavellian tactic, when starting a relationship that you suspect will be highly valuable to you, is to have an argument with them as soon as possible, and then to patch things up with a (sincere!) apology. I'm not suggesting you go out of your way to start a quarrel; more that it's both a valuable data point as to how they handle problems (as most relationships will have patchy moments) and a good signal to them that you value them highly enough to go through a proper apology.

Comment by mruwnik on AI Safety Chatbot · 2023-12-22T13:42:51.762Z · LW · GW
  1. gpt-3.5-turbo for now
  2. that's also being tried
Comment by mruwnik on The Hidden Perils of Hydrogen · 2023-10-17T12:42:58.220Z · LW · GW

They are perils of assuming that hydrogen is the future, or of basing your energy needs on it - i.e. the peril is not in the hydrogen itself, it's in making plans that involve it.

Comment by mruwnik on EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem · 2023-09-29T14:17:35.348Z · LW · GW

That's actually what got me to stop eating (or at least buying) meat

Comment by mruwnik on The Talk: a brief explanation of sexual dimorphism · 2023-09-28T12:17:27.371Z · LW · GW

Somatic cells are generally evolutionary dead ends. Your toe cells aren't going to do much reproducing. Also, mitochondrial (or in general organellar) DNA is split between the organelles themselves and the nucleus of the cell containing them. Biology is fun!

The argument for mitochondria is that they make the cell's internal environment more toxic (what with being the cell's powerhouse and producing reactive by-products). This in turn provides a lot of selection pressure, in the same way that e.g. global warming is causing a lot of selection pressure.

Runaway sexual selection has limits. That is also sort of the point. If you can carry around massive breasts, tails, noses or whatever and still be very prosperous, that means you're good. Where "prosper" can mean running away from lions if you're an antelope, or being at the top of the village pecking order if you're a human. Like a short pro basketball player: if they're short, but still playing at a pro level, that's someone you want on your team. This is known as the handicap principle, and can be explained via signaling mechanisms.

Comment by mruwnik on The Talk: a brief explanation of sexual dimorphism · 2023-09-19T04:50:50.557Z · LW · GW

The number of generations controls how long your experiment lasts. The longer it runs (i.e. the more generations), the more drift you get, so the more likely it is for a given gene (or in this case, a given number of mating types) to take over. This effect is weaker in larger populations, but unless you have an infinite population, given enough time (or generations) you'll end up with the 2 sexes (except for fungi, of course, as always). Eukaryotes first appeared around 2.2 billion years ago. For comparison, the Cambrian explosion, with the first complex life, was only ~500 million years ago. That's a lot of time (or generations) for things to stabilize.
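
For what it's worth, here's a minimal neutral-drift toy (my own sketch, not anything from the post - it tracks two interchangeable variants rather than modelling mating types directly) showing that one variant always ends up taking over, just more slowly in bigger populations:

```python
import numpy as np

rng = np.random.default_rng(0)

def generations_until_fixation(pop_size, start_freq=0.5):
    """Generations until one of two neutral types takes over the whole population."""
    count = int(pop_size * start_freq)   # individuals carrying type A
    gens = 0
    while 0 < count < pop_size:
        # each individual in the next generation picks a random parent,
        # so the new count of type A is binomially distributed
        count = rng.binomial(pop_size, count / pop_size)
        gens += 1
    return gens

for n in (100, 1_000, 10_000):
    runs = [generations_until_fixation(n) for _ in range(20)]
    print(f"N={n}: ~{np.mean(runs):.0f} generations until only one type is left")
```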

There are multiple mating types around. Mammals have the XY/XX chromosome system. Birds have a different chromosome set (denoted ZW/ZZ). Some reptile families use egg temperature to determine sex. Some fish have a single male, and if it disappears, the next-ranking individual becomes the male. Insects have yet other mechanisms. But there are usually only the two sexes (apart from fungi), probably for the efficiency reasons outlined in the OP.

Comment by mruwnik on Where might I direct promising-to-me researchers to apply for alignment jobs/grants? · 2023-09-19T00:49:22.786Z · LW · GW

There is a Stampy answer to that which should stay up to date here.

Comment by mruwnik on The Talk: a brief explanation of sexual dimorphism · 2023-09-19T00:44:27.451Z · LW · GW

My understanding is pretty much what you said - when the going is good, go asexual (e.g. strawberry runners, grasses or Asian knotweed), but also try for seeds. There are a couple of species of plants that have lost the ability for sexual reproduction, but I can't recall them right now. That being said, various plants used by humans are reproduced pretty much exclusively asexually and so have lost the ability for sexual reproduction, specifically because they have very stable environments. The obvious examples are seedless fruits (bananas, grapes), but ginger and garlic are interesting plants that have been propagated from cuttings or bulbs for thousands of years and so have lost the ability to produce seeds (with the normal caveats).

Aphids are also an interesting example, where the previous year's eggs hatch in the spring as females, which then clone themselves as fast as possible - when there are too many of them they will create clones with wings, and when autumn comes around, they will create male clones to then go through the normal sexual reproductive route. Which is also an example of the stable/unstable environment issues you mentioned.

Comment by mruwnik on The Talk: a brief explanation of sexual dimorphism · 2023-09-18T19:03:57.795Z · LW · GW

This depends on the size and distances involved, but it's a good intuition. You need a mechanism to generate the pressure differentials, which can be an issue in very small organisms.

Small and sedentary organisms tend to use chemical gradients (i.e. smell), but anything bigger than a mouse (and quite a few smaller things) usually has some kind of sound signals, which are really good for quick notifications in a radius around you, regardless of the light level (so you can pretty much always use it). Also, depending on the medium, sound can travel really far - like whales which communicate with each other over thousands of miles, or elephants stomping to communicate with other elephants 20 miles away.

Comment by mruwnik on The Talk: a brief explanation of sexual dimorphism · 2023-09-18T18:46:20.669Z · LW · GW

organisms with mitochondria always use sexual reproduction

Or at least their ancestors did. You mention Bdelloidea in a comment, which are one of the inevitable exceptions (as you mention in the introduction, which I very much appreciate, as "everything in biology has exceptions" is something I often find myself saying), but they are descended from eukaryotes which did have mitochondria.

The opposite seems true, though - true sexual reproduction seems to be exclusive to eukaryotes. So you could also say that sex makes mitochondria necessary. There seem to be a couple of good jokes in there...

One other pedantic note to add to this generally excellent article is that non-eukaryotic organisms also have methods to mix their genes, what with bacterial conjugation or viral recombination, without the dimorphism. 

Comment by mruwnik on Is AI Safety dropping the ball on privacy? · 2023-09-18T17:58:38.124Z · LW · GW

It requires you to actively manage long-lived sessions which would otherwise be handled by the site you're using. You can often get back to where you were by just logging in again, but there are many places (especially travel or official sites) where that pretty much resets the whole flow.

There are also a lot more popups, captchas and other hoops to jump through when you don't have a cookies trail.

The average user is lazy and doesn't think about these things, so the web as a whole is moving in the direction of making things easier (but not simpler). This is usually viewed as a good thing by those who then only need to click a single button. Though it's at the cost of those who want to have more control. 

It might not be inconvenient to you, especially as it's your basic flow. It's inconvenient for me, but worth the cost; for most of the people I know it would be basically unusable (compared to the default flow).

Comment by mruwnik on The purpose of the (Mosaic) law · 2023-09-05T16:25:49.964Z · LW · GW

I thought all of these were obvious and well known. But yes, those are the things I was pointing at.

Comment by mruwnik on Assume Bad Faith · 2023-09-03T16:40:29.032Z · LW · GW

there is "something else" going on besides both parties just wanting to get the right answer

 

There are also different priors. While in general you might very well be right (or at least this post makes a lot of sense to me), I often have conversations where I'm pretty sure both my interlocutor and I are discussing things in good faith, but where we still can't agree on pretty basic things (usually about religion).

Comment by mruwnik on AGI Safety FAQ / all-dumb-questions-allowed thread · 2023-09-02T10:53:56.340Z · LW · GW

I'm assuming you're not asking about the mechanism (i.e. natural selection + mutations)? A trite answer would be something like "the same way it created wings, mating dances, exploding beetles, and parasites requiring multiple hosts".

Thinking about the meaning of life might be a spandrel, but a quick consideration of it comes up with various evo-psych style reasons why it's actually very useful, e.g. it can propel people to greatness, which can massively increase their genetic fitness. Fitness is an interesting thing, in that it can be very non-obvious. Everything is a trade-off, where the only goal is for your genes to propagate. So if thinking about the meaning of life will get your genes spread more (e.g. because you decide that your children have inherent meaning, because you become a high status philosopher and your sister can marry well, because it's a social sign that you have enough resources to waste them on fruitless pondering) then it's worth having around.

Comment by mruwnik on AGI Safety FAQ / all-dumb-questions-allowed thread · 2023-09-02T10:36:07.388Z · LW · GW

Frankenstein is a tale about misalignment. Asimov wrote a whole book about it. Vernor Vinge also writes about it. People have been trying to get their children to behave in certain ways forever. But before LW the alignment problem was just the domain of SF.

20 years ago the alignment problem wasn't a thing, so much so that MIRI started out as an org to create a Friendly AI.

Comment by mruwnik on AGI Safety FAQ / all-dumb-questions-allowed thread · 2023-09-02T10:29:38.082Z · LW · GW

The first issue that comes to mind is having an incentive that would achieve that. The one you suggest doesn't incentivize truth - it incentivizes collaboration in order to guess the password, which would be fine in training, but then you're going into deceptive alignment land: Ajeya Cotra has a good story illustrating that.

Comment by mruwnik on AGI Safety FAQ / all-dumb-questions-allowed thread · 2023-09-02T10:16:16.726Z · LW · GW

You could, but should you? English in particular seems a bad choice. The problem with natural languages is their ambiguity. When you're providing a utility function, you want it to be as precise and robust as possible. This is actually an interesting case where folklore/mythology has known about these issues for millennia. There are all kinds of stories about genies, demons, monkey's paws etc. where wishes were badly phrased or twisted. This is a story explanation of the issue.

Comment by mruwnik on AGI Safety FAQ / all-dumb-questions-allowed thread · 2023-09-02T10:06:28.121Z · LW · GW

You're adding a lot of extra assumptions here, a couple being:

  • there is a problem with having arbitrary goals
  • it has a pleasure-pain axis
  • it notices it has a pleasure-pain axis
  • it cares about its pleasure-pain axis
  • its pleasure-pain axis is independent of its understanding of the state of the environment

The main problem of inner alignment is making an agent want to do what you want it to do (as opposed to even understanding what you want it to do). Which is an unsolved problem. 

Although I'm criticizing your specific criticism, my main issue with it is that it's a very specific failure mode, which is unlikely to appear, because it requires a lot of other things which are also unlikely. That being said, you've provided a good example of WHY inner alignment is a big problem, i.e. it's very hard to keep something following the goals you set for it, especially when it can think for itself and change its mind.

Comment by mruwnik on AGI Safety FAQ / all-dumb-questions-allowed thread · 2023-09-02T09:56:29.466Z · LW · GW

Drug addicts tend to be frowned upon not because they have a bad life, or even for evo-psych reasons, but because their lifestyle is bad for the rest of society, in that they tend to have various unfortunate externalities.

Comment by mruwnik on Monthly Roundup #9: August 2023 · 2023-08-08T21:19:02.553Z · LW · GW

It can also be retaliation, which sort of makes sense - there's a reason tit-for-tat is so successful. That being said, it's generally very unfortunate that they're being introduced, on all sides. I can sort of understand why countries would want to limit people from poor countries (which is not the same as agreeing with the reasoning). Enforcing visas for short term, touristy style visits doesn't seem like a good idea however I look at it. As Zvi notes, it's about the friction.

ESTA is very much a visa (I filled it out yesterday), but under a different name and purely electronic. 

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-08-04T13:12:48.025Z · LW · GW

Not being able to directly communicate with the others would be an issue in the beginning, but I'm guessing you would be able to use the setup to work out what the others think. 

A bigger issue is that this would probably result in a very homogeneous group of minds. They're optimizing not for correct answers, but for consensus answers. It's the equivalent of studying for the exams. A fun example is the Polish equivalent of the SAT exams (this probably generalizes, but I don't know about other countries). I know quite a few people who went to study biology, and then decided to retake the biology exam (as one can do). Most retakers had worse results the second time round, because they had more up-to-date knowledge - the exam is at least 10 years behind the current state of knowledge, so they give answers that are correct as of today, but have them marked as incorrect. I'd expect the group of AIs to eventually converge on a set of acceptable beliefs, rather than correct ones.

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-08-04T12:52:27.632Z · LW · GW

That very much depends on how you understand "safe". Which is a large part of the differences between ethical AI people (safe means that it doesn't offend anyone, leak private information, give biased answers etc.) and the notkilleveryoneism people (safe means that it doesn't decide to remove humanity). These aren't mutually incompatible, but they require focusing on different things.

There is also safe in the PR sense, which means that no output will cause the LLM producer/supplier/whoever to get sued or in any other kind of trouble.

"Safe" is one of those funny words which everyone understands differently, but also assume that everyone else understands the same way.

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-08-04T12:44:22.501Z · LW · GW

A couple come to mind:

The problem with them being that it takes a bit of explaining to even understand the issue.

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-08-04T12:40:54.969Z · LW · GW

Think of reward not as "here's an ice cream for being a good boy" but more as "you passed my test; I will now do neurosurgery on you to make you more likely to behave the same way in the future". The result of applying the "reward" in both cases is that you're more likely to act as desired next time. In humans it's because you expect to get something nice out of being good; in computers it's because they've been modified to do so. It's hard to directly change how humans think and behave, so you have to do it via ice cream and beatings, while with computers you can just modify their memory.
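
To make the "neurosurgery" framing concrete, here's a toy sketch (my own illustration, nothing from the thread): the "reward" is just a number the training loop uses to decide how hard to nudge the agent's parameter toward repeating whatever it just did.

```python
import math
import random

def prob_press(theta):
    # sigmoid: the agent's probability of pressing the button
    return 1 / (1 + math.exp(-theta))

theta = 0.0          # the agent's single "brain" parameter
learning_rate = 0.5

for step in range(1000):
    p = prob_press(theta)
    pressed = random.random() < p
    reward = 1.0 if pressed else 0.0   # the environment rewards pressing
    # REINFORCE-style update: the reward isn't something the agent "enjoys";
    # it just scales how much we shift theta toward repeating the action taken.
    grad_log_prob = (1 - p) if pressed else -p
    theta += learning_rate * reward * grad_log_prob

print(f"P(press) after training: {prob_press(theta):.2f}")   # close to 1.0
```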

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-08-04T12:28:10.830Z · LW · GW

It depends a lot on how much it values self-preservation in comparison to solving the tests (putting aside the matter of minimal computation). Self-preservation is an instrumental goal, in that you can't bring the coffee if you're dead. So it seems likely that any intelligent enough AI will value self-preservation, if only in order to make sure it can achieve its goals.

That being said, having an AI that is willing to do its task and then shut itself down (or to shut down when triggered) is an incredibly valuable thing to have - it's already finished, but you could have a go at the shutdown problem.

A more general issue is that this will handle a lot of cases, but not all of them - an AI that does lie (for whatever reason) will not be shut down. It still sounds like something worth having, in a Swiss cheese (defense in depth) sort of way.

(The whole point of these posts is to assume everyone is asking sincerely, so no worries.)

Comment by mruwnik on Cryonics and Regret · 2023-07-28T16:31:08.787Z · LW · GW

Depends where - which is the whole issue. For the US average wage, yes. For non-US people, no. I agree that it's a matter of priorities. But it's also a matter of earnings minus costs, both of which depend a lot on where you live.

A lot of people certainly could save a lot more. But usually at the cost of quality of life. You could say that they should work a job that pays more, or live somewhere where there is a lower cost of living, but both of those can be hard.

I'm not saying you're wrong that it's doable. The problem is that the feasibility is highly dependent on your circumstances (same as e.g. having an electric car or whatever), which can make it very hard for people who aren't in affluent places.

Comment by mruwnik on Cryonics and Regret · 2023-07-25T12:05:53.980Z · LW · GW

Which is a bit over 3 years of saving up every penny of the average wage where I live. If you subtract the average rent and starvation rations from that income, you're up to 5.5 years. The first info I could find on Google (from 2018) claims the average person here saves around $100 monthly, which gives you over 40 years of saving. This is only for one person. If you have multiple children, an SO, etc., that starts ballooning quickly. This is in a country which, while not yet classified as developed, is almost there (Poland).

50k is a lot for pretty much most of the world. It's the cost of a not very nice flat (i.e. middling location, or bad condition) here.
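
For concreteness, the arithmetic behind those durations (the ~$50k price is the one discussed in the thread; the monthly amounts are rough assumptions back-solved from the years I quoted):

```python
cost = 50_000                  # USD, rough cryonics price discussed in the thread

avg_net_wage = 1_400           # USD/month, assumed average net wage here
after_rent_and_food = 760      # USD/month left after rent and basic food (assumption)
typical_savings = 100          # USD/month, the 2018 figure I found

print(cost / avg_net_wage / 12)         # ~3 years saving every penny
print(cost / after_rent_and_food / 12)  # ~5.5 years after rent and food
print(cost / typical_savings / 12)      # ~42 years at typical savings rates
```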

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-07-25T11:44:03.752Z · LW · GW

It's not that it can't come up with ways to not stamp on us. But why should it? Yes, it might only be a tiny, tiny inconvenience to leave us alone. But why even bother doing that much? It's very possible that we would be of total insignificance to an AI. Just like the ants that get destroyed at a construction site - no one even notices them. It still doesn't turn out too well for them.

Though that's when there are massive differences of scale. When the differences are smaller, you get into inter-species competition dynamics. Which also is what the OP was pointing at, if I understand correctly.

A superintelligence might just ignore us. It could also e.g. strip mine the whole earth for resources, coz why not? "The AI does not hate you, nor does it love you, but you are made of atoms which it can use for something else".

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-07-25T11:29:25.837Z · LW · GW

In your example, can it just lie? You'd have to make sure it either doesn't know the consequences of your interlocks, or doesn't care about them (this is the problem of corrigibility).

If the tests are obvious tests, your AI will probably notice that and react accordingly - if it has enough intelligence it can notice that they're hard and are probably going to be used to gauge its level, which then feeds into the whole thing about biding your time and not showing your cards until you can take over.

If they're not obvious, then you're in a security type situation, where you hope your defenses are good enough. Which should be fine on weak systems, but they're not the problem. The whole point of this is to have systems that are much more intelligent than humans, so you'd have to be sure they don't notice your traps. It's like a 5-year-old setting up booby traps for you - how confident are you that the 5-year-old will trap you?

This is a story of how that looks at the limit. A similar issue is boxing. In both cases you're assuming that you can contain something that is a lot smarter than you. It's possible in theory (I'm guessing?), but how sure are you that you can outsmart it in the long run?

Comment by mruwnik on A Friendly Face (Another Failure Story) · 2023-06-22T10:33:41.631Z · LW · GW

How do the hard limits of intelligence help? My current understanding is that the hard limits are likely to be something like Jupiter brains, rather than mentats. If each step is only slightly better, won't that result in a massive number of tiny steps (even taking into account the nonlinearity of it)?

Small value drifts are a large problem, if compounded. That's sort of the premise of a whole load of fiction, where characters change their value systems after sequences of small updates. And that's just in humans - adding in alien (as in different) minds could complicate this further (or not - that's the thing about alien minds).

Comment by mruwnik on A Friendly Face (Another Failure Story) · 2023-06-22T09:13:06.843Z · LW · GW

The foom problem is worse because of how hard it is to trust the recursion. Foomability is weakly correlated with whether the foomed entity is aligned. At least from our perspective. That's why there's the whole emphasis on getting it right on the first try.

How can you estimate how many iterations of RSA will happen?

How does interpretability align an AI? It can let you know when things are wrong, but that doesn't mean it's aligned.

QACI can potentially solve outer alignment by giving you a rigorous and well specified mathematical target to aim for. That still leaves the other issues (though they are being worked on). 

Comment by mruwnik on A Friendly Face (Another Failure Story) · 2023-06-21T21:46:06.072Z · LW · GW

To a certain extent it doesn't matter. Or rather it's a question of expected utility. If 10% of outcomes are amazing, but 60% horrible, that sort of suggests you might want to avoid that route.

Comment by mruwnik on A Friendly Face (Another Failure Story) · 2023-06-21T21:42:51.816Z · LW · GW

Assuming it can scale with capabilities, that doesn't help you if alignment is scaling at y=2x and capabilities at y=123x (for totally random numbers, but you get the point). A quick google search found an article from 2017 claiming that there are some 300k AI researchers worldwide. I see claims around here that there are like 300 alignment researchers. Those numbers can be taken with a large grain of salt, but even so, that's 1000:1. 

As to recursive improvement, nope - check out the tiling problem. Also, "only" is doing a massive amount of work in "only need to align", seeing as no-one (as far as I can tell) has a good idea how to do that (though there are interesting approaches to some subproblems)

Comment by mruwnik on Book Review: How Minds Change · 2023-06-02T09:50:47.268Z · LW · GW

More that you get as many people as possible to read the sequences, which will change their thinking so they make fewer mistakes, which in turn will make more people aware both of the real risks underlying superintelligence and of the plausibility and utility of AI. I wasn't around then, so this is just my interpretation of what I read after the fact, but I get the impression that people were a lot less doomish then. There was a hope that alignment was totally solvable.

The focus didn't seem to be on getting people into alignment so much as on it generally being better for people to think better. AI isn't pushed as something everyone should work on - rather it's presented as the thing EY happens to know about, and as something worth investigating. There are various places where it's said that everyone could use more rationality, that it's an instrumental goal like earning more money. There's an idea of creating Rationality Dojos, as places to learn rationality the way people learn martial arts. I believe that's the source of CFAR.

It's not that the one and only goal of the rationalist community was to stop an unfriendly AGI. It's just that that's the obvious result of it. It's a matter of taking the idea seriously, then shutting up and multiplying - assuming that AI risk is a real issue, it's pretty obvious that it's the most pressing problem facing humanity, which means that if you can actually help, you should step up.

Business/economic/social incentives can work, no doubt about that. The issue is that they only work as long as they're applied. Actually caring about an issue (as in really caring, like oppressed-Christian level, not performative cultural-Christian level) is a lot more lasting, in that if the incentives disappear, they'll keep on doing what you want. Convincing is a lot harder, though, which I'm guessing is your point? I agree that convincing is less effective numerically speaking, but it seems a lot more good (in the moral sense), which also seems important. Though this is admittedly a lot more of an aesthetics thing...

I most certainly recommend reading the sequences, but by no means meant to imply that you must. Just that stopping an unfriendly AGI (or rather the desirability of creating a friendly AI) permeates the sequences. I don't recall if it's stated explicitly, but it's obvious that they're pushing you in that direction. I believe Scott Alexander described the sequences as being totally mind-blowing the first time he read them, but totally obvious on rereading them - I don't know which would be your reaction. You can try the highlights rather than the whole thing, which should be a lot quicker.

Comment by mruwnik on Reacts now enabled on 100% of posts, though still just experimenting · 2023-06-01T17:35:01.482Z · LW · GW

Right - now I see it. I was testing it on the reactions of @Sune's comment, so it was hidden far away to the right.

All in all, nice feature though.

Comment by mruwnik on Reacts now enabled on 100% of posts, though still just experimenting · 2023-06-01T14:55:51.437Z · LW · GW

But there is no way to downvote a reaction? E.g. if you add the paperclip reaction, then all I can do is bump it by one and/or later remove my own reaction, but there is no way to influence yours? So reactions are strictly additive?

Comment by mruwnik on Book Review: How Minds Change · 2023-06-01T14:34:46.789Z · LW · GW

The answer is to read the sequences (I'm not being facetious). They were written with the explicit goal of producing people with EY's rationality skills in order for them to go into producing Friendly AI (as it was called then). They provide a basis for people to realize why most approaches will by default lead to doom.

At the same time, it seems like a generally good thing for people to be as rational as possible, in order to avoid the myriad cognitive biases and problems that plague humanity's thinking, and therefore its actions. My impression is that the hope was to make the world more similar to Dath Ilan.

Comment by mruwnik on Book Review: How Minds Change · 2023-06-01T14:22:06.507Z · LW · GW

It depends what you mean by political. If you mean something like "people should act on their convictions" then sure. But you don't have to actually go in to politics to do that, the assumption being that if everyone is sane, they will implement sane policies (with the obvious caveats of Moloch, Goodhart etc.).

If you mean something like "we should get together and actively work on methods to force (or at least strongly encourage) people to be better", then very much no. Or rather it gets complicated fast. 

Comment by mruwnik on Book Review: How Minds Change · 2023-06-01T14:15:26.753Z · LW · GW

Jehovah's Witnesses are what first came to mind when reading the OP. They're sort of synonymous with going door to door in order to have conversations with people, often saying that they're willing for their minds to be changed through respectful discussions. They are also one of the few Christian-adjacent sects (for lack of a more precise description) to actually show large growth (at least in the West).

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [May 2023] · 2023-05-09T09:13:17.663Z · LW · GW

No. 

Atheism is totally irrelevant. A deist would come to exactly the same conclusions. A Christian might not be convinced of it, but mainly because of eschatological reasons. Unless you go the route of saying that AGI is the antichrist or something, which would be fun. Or that God(s) will intervene if things get too bad?

Reductive materialism is also irrelevant. It might sort of play into the question of whether an AGI is conscious, but that whole topic is a red herring - you don't need a conscious system for it to kill everyone.

This feeds into the computational theory of mind - it makes it a lot easier to posit the possibility of a conscious AGI if you don't require a soul for it, but again - consciousness isn't really needed for an unsafe AI.

I have fundamentalist Christian friends who are ardent believers, but who also recognize the issues behind AGI safety. They might not think it that much of a problem (pretty much everything pales in comparison to eternal heaven and hell), but they can understand and appreciate the issues.

Comment by mruwnik on a narrative explanation of the QACI alignment plan · 2023-04-14T19:55:14.067Z · LW · GW

1GB of text is a lot. Naively, that's a billion letters, much more if you use compression. Or you could maybe just do some kind of magic with the question containing a link to a wiki on the (simulated) internet?

If you have infinite time, you can go the monkeys on typewriters route - one of them will come up with something decent, unless an egregore gets them, or something. Though that's very unlikely to be needed - assuming that alignment is solvable by a human level intelligence (this is doing a lot of work), then it should eventually be solved.

Comment by mruwnik on All AGI Safety questions welcome (especially basic ones) [April 2023] · 2023-04-14T10:18:30.655Z · LW · GW

This seems to be mixing 2 topics. Existing programs are more or less a set of steps to execute. A glorified recipe. The set of steps can be very complicated, and have conditionals etc., but you can sort of view them that way. Like a car rolling down a hill, it follows specific rules. An AI is (would be?) fundamentally different in that it's working out what steps to follow in order to achieve its goal, rather than working towards its goal by following prepared steps. So continuing the car analogy, it's like a car driving uphill, where it's working to forge a path against gravity.

An AI doesn't have to be a utility maximiser. If it has a single coherent utility function (pretty much a goal), then it will probably be a utility maximiser. But that's by no means the only way of making them. LLMs don't seem to be utility maximisers

Comment by mruwnik on 10 reasons why lists of 10 reasons might be a winning strategy · 2023-04-07T14:00:15.671Z · LW · GW

worker bees are infertile

Only for social bees, like honey bees or bumblebees - over 90% of bee species are solitary, and most certainly fertile (if they are to have any chance of being evolutionarily successful). Which I suppose only serves to support your point even more...

Comment by mruwnik on Misgeneralization as a misnomer · 2023-04-07T13:53:52.131Z · LW · GW

It seems a bit more subtle than that. These are both cases of outer misalignment, or rather goal misspecification. The second case is not so much that it ends up with an incorrect goal (which happens in both cases), but that you have multiple smaller goals that initially resulted in the correct behavior; when the conditions change (training -> deployment) the delicate balance breaks down and a different equilibrium is reached, which from the outside looks like a different goal.

It might be useful to think of it in terms of alliances, e.g. during WW2 the Allies' goal was to defeat the Nazis, but once that was achieved, they ended up in a different equilibrium.

Comment by mruwnik on Beren's "Deconfusing Direct vs Amortised Optimisation" · 2023-04-07T13:32:25.902Z · LW · GW

Right. That's on me for skimming the commentary section...

Comment by mruwnik on Beren's "Deconfusing Direct vs Amortised Optimisation" · 2023-04-07T13:18:34.204Z · LW · GW

Is this sort of the difference between System 1 and 2 thinking?

Comment by mruwnik on Eliezer Yudkowsky’s Letter in Time Magazine · 2023-04-06T13:12:47.773Z · LW · GW

You can add questions to Stampy - if you click "I'm asking something else" it'll show you 5 unanswered questions that sound similar, whose priority you can then bump. If none of them match, click on "None of these: Request an answer to my exact question above" for it to be added to the queue.

Comment by mruwnik on [deleted post] 2023-04-04T13:04:25.453Z

You have to be able to stop human coordination

This could actually be quite quick - if you have the throughput to find or generate kompromat on everyone that could resist you, as well as a mechanism for negotiating with massive numbers of people at once, you could conceivably cow or talk most people into submission, with a couple of highly efficient information attacks (or even physical "accidents") targeting those that turn out to have a backbone.

Comment by mruwnik on [deleted post] 2023-04-04T13:02:54.043Z

be worth distinguishing

An additional question is whether this could also explain the issues people have with takeoff speeds? I wonder how often someone says “I think it will take an AI a few days from waking up to putting unstoppable plans to take over the world into action” while the listeners hear “I think we’ll all fall over dead a few days after the AI wakes up”?