Posts

Snake Eyes Paradox 2023-06-11T04:10:38.733Z

Comments

Comment by Martin Randall (martin-randall) on What I expected from this site: A LessWrong review · 2024-12-20T17:14:23.805Z · LW · GW

There’s also a question about cross-domain transferability of good takes.

Agreed. That isn't a difference between contributing "considerations" and "predictions" (using Habryka's reported distinction). There are people who contribute good analysis about geopolitics. Others contribute good analysis about ML innovations. Does that transfer to analysis about AGI / ASI? Time will tell - mostly when it's already too late. We will try anyway.

In terms of predicting the AI revolution the most important consideration is what will happen to power. Will it be widely or narrowly distributed? How much will be retained by humans? More importantly, can we act in the world to change any of this? These are similar to geopolitical questions, so I welcome analysis and forecasts from people with a proven track record in geopolitics.

The industrial revolution is a good parallel. Nobody in 1760 (let alone 1400) predicted the detailed impacts of the industrial revolution. Some people predicted that population and economic growth would increase. Adam Smith had some insights into power shifts (Claude adds Benjamin Franklin, François Quesnay and James Steuart). That's about the best I expect to see for the AI revolution. It's not nothing.

Comment by Martin Randall (martin-randall) on How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage · 2024-12-19T21:59:02.305Z · LW · GW

I don't think we disagree on culture. I was specifically disagreeing with the claim that Metaculus doesn't have this problem "because it is not a market and there is no cost to make a prediction". Your point that culture can override or complement incentives is well made.

Comment by Martin Randall (martin-randall) on How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage · 2024-12-19T20:29:29.940Z · LW · GW

The cost to make a prediction is time. The incentive of making it look like "Metaculus thinks X" is still present. The incentive to predict correctly is attenuated to the extent that it's a long-shot conditional or a far future prediction. So Metaculus can still have the same class of problem.

Comment by Martin Randall (martin-randall) on Iron deficiencies are very bad and you should treat them · 2024-12-19T11:54:17.741Z · LW · GW

If anyone's tracking impact, my family had five people tested, in large part due to this post, and all five were low and started supplementing. We're not even vegan.

Comment by Martin Randall (martin-randall) on An Equilibrium of No Free Energy · 2024-12-19T03:45:58.113Z · LW · GW

Would you, failing to observe anything on the subject after a couple of hours of Googling, conclude that your civilization must have some unknown good reason why not everyone was doing this already?

No, but not for "Civilizational Adequacy" reasons. In a hypothetical civilization that is Adequate in the relevant ways, but otherwise like ours, I would also not conclude that there was an unknown good reason why not everyone was doing this already. Here's a simple model to apply for many values of X:

  • If civilization has an unknown-to-me reason why X is a good idea, I expect to observe search results saying that X is a good idea and giving the reason.
  • If civilization has an unknown-to-me reason why X is a bad idea, I expect to observe search results saying that X is a bad idea and giving the reason.
  • If civilization does not know if X is a good or bad idea, I expect to observe no search results, or mixed search results.

I don't see any way for me to conclude, from a lack of search results, that civilization must have some unknown good reason why not everyone was doing this already.

I tried to disprove this by thinking of a really random and stupid X. Something so randomly stupid that I could be the first person in civilization to think of it. The idea of "inject bleach to cure covid" was already taken. What if I generalize to injecting some other thing to cure some other thing? My brain generated "cure cancer by injecting hydrogen peroxide". No, sorry, not random and stupid enough: the internet contains Does Hydrogen Peroxide Therapy Work?, which was the first search result.

More randomness and stupidity needed! How about putting cats in the freezer to reduce their metabolism and therefore save money on cat food? Well, yes, this appears to be a new innovation in the field of random stupid ideas. On the other hand, the first ten search results included Cat survives 19-hour ordeal in freezer, so civilization has a reason why that is a bad idea, and it's available to anyone who searches for it.

I'm obviously not creative enough, so I asked Claude. After a few failed attempts (emojis in variable names! jumping jacks before meetings!) we got to:

Improve remote work meetings by requiring everyone to keep one finger touching their nose at all times while speaking, with their video on. Missing the nose means you lose speaking privileges for 30 seconds.

Success: a truly novel, truly terrible idea. In this case, civilization has multiple good reasons why not everyone is doing this already, but there aren't any specific search results on this specific idea. Even then, if I spend a couple of hours searching and reading, I'm going to hit some tangential search results that will give me some hints.

Comment by Martin Randall (martin-randall) on Understanding Shapley Values with Venn Diagrams · 2024-12-19T01:18:50.360Z · LW · GW

I don't think this proposal satisfies Linearity (sorry, didn't see kave's reply before posting). Consider two days, two players.

Day 1:

  • A => $200
  • B => $0
  • A + B => $400

Result: $400 to A, $0 to B.

Day 2:

  • A => $100
  • B => $100
  • A + B => $200

Result: $100 to A, $100 to B.

Combined:

  • A => $300
  • B => $100
  • A + B => $600
  • So: Synergy(A+B) => $200

Result: $450 to A, $150 to B. Whereas if you add the results for day 1 and day 2, you get $500 to A, $100 to B.
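
For concreteness, here is a minimal sketch of that check in Python, assuming the proposal gives each player their solo value plus a share of the two-player synergy proportional to solo values (that rule reproduces the figures above; the function name is mine):

```python
def proposed_split(v_a, v_b, v_ab):
    """Solo values plus synergy shared pro rata to solo values (my reading of the proposal)."""
    synergy = v_ab - v_a - v_b
    total_solo = v_a + v_b
    share_a = v_a / total_solo if total_solo else 0.5
    return (v_a + synergy * share_a, v_b + synergy * (1 - share_a))

day1 = proposed_split(200, 0, 400)        # (400.0, 0.0)
day2 = proposed_split(100, 100, 200)      # (100.0, 100.0)
combined = proposed_split(300, 100, 600)  # (450.0, 150.0)

summed = (day1[0] + day2[0], day1[1] + day2[1])  # (500.0, 100.0)
print(summed, combined)  # Linearity would require these to be equal; they aren't.
```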

Comment by Martin Randall (martin-randall) on Aumann-agreement is common · 2024-12-16T19:28:18.229Z · LW · GW

This is pretty common in any joint planning exercise. My friend and I are deciding which movie to see together. We share relevant information about what movies are available and what movies we each like and what movies we have seen. We both conclude that this movie here is the best one to see together.

Comment by Martin Randall (martin-randall) on Aumann-agreement is common · 2024-12-16T19:23:59.843Z · LW · GW

This is excellent. Before reading this post in 2023, I had the confusion described. Roughly, that Aumann agreement is rationally correct, but this mostly doesn't happen, showing that mostly people aren't rational. After reading this post, I understood that Aumann agreement is extremely common, and the exceptions where it doesn't work are best understood as exceptions. Coming back to read it in 2024, it seems obvious. This is a symptom of the post doing its job in 2023.

This is part of a general pattern. When I think that human behavior is irrational, I know nothing. When I see how human behavior can be modeled as rational, I have learned something. Another example is how people play The Ultimatum Game. When I was shown how turning down an "unfair" share can be modeled as a rational response to coercion, I had a better model with better predictions and a better appreciation of my fellow humans.

The post is short, clearly written, seeks to establish a single thing, establishes it, and moves on without drama. Perhaps this is why it didn't get a lot of engagement when it was posted. The 2023 review is a chance to revisit this.

I could build on this post by describing how Aumann agreement occurs in prediction markets. On Manifold there are frequently markets where some group of people think "90% YES" and others think "90% NO" and there are big feelings. If this persists over a long enough period, with no new evidence coming in, the market settles at some small percentage range with people on both sides hiding behind walls of limit orders and scowling at each other. To some extent this is because both sides have built up whatever positions satisfy their risk tolerance. But a lot of it is the horrible feeling that the worst people in the world may be making great points.

Comment by Martin Randall (martin-randall) on GPTs are Predictors, not Imitators · 2024-12-16T15:14:04.084Z · LW · GW

Does this look like a motte-and-bailey to you?

  1. Bailey: GPTs are Predictors, not Imitators (nor Simulators).
  2. Motte: The training task for GPTs is a prediction task.

The title and the concluding sentence both plainly advocate for (1), but it's not really touched by the overall post, and I think it's up for debate (related: reward is not the optimization target). Instead there is an argument for (2). Perhaps the intention of the final sentence was to oppose Simulators? If that's the case, cite it, be explicit. This could be a really easy thing for an editor to fix.


Does this look like a motte-and-bailey to you?

  1. Bailey: The task that GPTs are being trained on is ... harder than the task of being a human.
  2. Motte: Being an actual human is not enough to solve GPT's task.

As I read it, (1) is false, the task of being a human doesn't cap out at human intelligence. More intelligent humans are better at minimizing prediction error, achieving goals, inclusive genetic fitness, whatever you might think defines "the task of being a human". In the comments, Yudkowsky retreats to (2), which is true. But then how should I understand this whole paragraph from the post?

And since the task that GPTs are being trained on is different from and harder than the task of being a human, it would be surprising - even leaving aside all the ways that gradient descent differs from natural selection - if GPTs ended up thinking the way humans do, in order to solve that problem.

If we're talking about how natural selection trained my genome, why are we talking about how well humans perform the human task? Evolution is optimizing over generations. My human task is optimizing over my lifetime. Also, if we're just arguing for different thinking, surely it mostly matters whether the training task is different, not whether it is harder?


Overall I think "Is GPT-N bounded by human capabilities? No." is a better post on the mottes and avoids staking out unsupported baileys. This entire topic is becoming less relevant because AIs are getting all sorts of synthetic data and RLHF and other training techniques thrown at them. The 2022 question of the capabilities of a hypothetical GPT-N that was only trained on the task of predicting human text is academic in 2024. On the other hand, it's valuable for people to practice on this simpler question before moving on to harder ones.

Comment by Martin Randall (martin-randall) on grey goo is unlikely · 2024-12-16T04:13:58.450Z · LW · GW

Does the recent concern about mirror life change your mind? It's not nano, but it does imply there's a design space not explored by bio life, which implies there could be others, even if specifically diamonds don't work.

Comment by Martin Randall (martin-randall) on [Fiction] A Disneyland Without Children · 2024-12-16T01:31:18.739Z · LW · GW

I enjoyed this but I didn't understand the choice of personality for Alice and Charlie, it felt distracting. I would have liked A&C to have figured out why this particular Blight didn't go multi-system.

Comment by Martin Randall (martin-randall) on Understanding Shapley Values with Venn Diagrams · 2024-12-15T16:09:19.320Z · LW · GW

Playing around with the math, it looks like Shapley Values are also cartel-independent, which was a bit of a surprise to me given my prior informal understanding. Consider a lemonade stand where Alice (A) has the only lemonade recipe and Bob (B1) and Bert (B2) have the only lemon trees. Let's suppose that the following coalitions all make $100 (all others make $0):

  • A+B1
  • A+B2
  • A+B1+B2 (excess lemons get you nothing)

Then the Shapley division is:

  • A: $50
  • B1: $25
  • B2: $25

If Bob and Bert form a cartel/union/merger and split the profits then the fair division is the same.

Previously I was expecting that if there are a large number of Bs and they don't coordinate, then Alice would get a higher proportion of the profits, which is what we see in real life. This also seems to be the instinct of others (example).

I think I'm still missing something, not sure what.

Comment by Martin Randall (martin-randall) on My Mental Model of Infohazards · 2024-12-15T15:43:14.491Z · LW · GW

A dissenting voice on info-hazards. I appreciate the bulleted list starting from premises and building towards conclusions. Unfortunately, I don't think all the reasoning holds up to close scrutiny. For example, the conclusion that "infohoarders are like black holes for infohazards" conflicts with the premise that "two people can keep a secret if one of them is dead". The post would have been stronger if it had stopped before getting into community dynamics.

Still, this post moved and clarified my thinking. My sketch of a better argument for a similar conclusion is below:

Definitions:

  • hard-info-hazard: information that reliably causes catastrophe, no mitigation possible.
  • soft-info-hazard: information that risks catastrophe, but can be mitigated.

Premises:

  1. Two people can keep a secret if one of them is dead.
  2. If there are hard-info-hazards then we are already extinct, we just don't know it.
  3. You, by yourself, are not smart enough to tell if an info-hazard is hard or soft.
  4. Authorities with the power to mitigate info-hazards are not aligned with your values.

Possible strategies on discovering an infohazard:

  1. Tell nobody.
  2. Tell everybody.
  3. Follow a responsible disclosure process.

Expected Value calculations left as an exercise for the reader, but responsible disclosure seems favored. The main exception is if we are in Civilizational Hospice where we know we are going extinct in the next decade anyway and are just trying to live our last few years in peace.

Comment by Martin Randall (martin-randall) on Noting an error in Inadequate Equilibria · 2024-12-15T03:35:28.137Z · LW · GW

Sometimes when I re-read Yudkowsky's older writings I am still comfortable with the model and conclusion, but the evidence seems less solid than on first reading. In this post, Matthew Barnett poses problems for the evidence from Japan in Yudkowsky's Inadequacy and Modesty. Broadly, he argues that Haruhiko Kuroda's policy was not as starkly beneficial as Yudkowsky claims, although he doesn't say the policy was a mistake.

LessWrong doesn't have a great system for handling (alleged) flaws in older posts. Higher rated posts have become more visible with the "enriched" feed, which is good, but there isn't an active mechanism for revising them in the face of critiques. In this case the author is trying to make our extinction more dignified and revisiting Japan's economy in 2013 isn't an optimal use of time. In general, authors shouldn't feel that posting to LessWrong obliges them to defend their writings in detail years or decades later.

I don't know that Barnett's critique is important enough to warrant a correction or a footnote. But it makes me wish for an editor or librarian to make that judgment, or for someone to make a second edition, or some other way that I could recommend "Read the Sequences" without disclaimers.

Comment by Martin Randall (martin-randall) on Inadequacy and Modesty · 2024-12-15T03:25:54.444Z · LW · GW

I'm an epistemically modest person, I guess. My main criticism is one that is already quoted in the text, albeit with more exclamation points than I would use:

You aren’t so specially blessed as your priors would have you believe; other academics already know what you know! Civilization isn’t so inadequate after all!

It's not just academics. I recall having a similar opinion to Yudkowsky-2013. This wasn't a question of careful analysis of econobloggers, I just read The Economist, the most mainstream magazine to cover this type of question, and I deferred to their judgment. I started reading The Economist because my school and university had subscriptions. The reporting is paywalled but I'll cite Revolution in the Air (2013-04-13) and Odd men in (1999-05-13) for anyone with a subscription, or just search for Haruhiko Kuroda's name.

Japan 2013 monetary policy is a win for epistemic modesty. Instead of reading econblogs and identifying which ones make the most sense, or deciding which Nobel laureates and prestigious economists have the best assessment of the situation, you can just upload conventional economic wisdom into your brain as an impressionable teenager and come to good conclusions.

Disclaimer: Yudkowsky argues this doesn't impact his thesis about civilizational adequacy, defined later in this sequence. I'm not arguing that thesis here, better to take that up where it is defined and more robustly defended.

Comment by Martin Randall (martin-randall) on The Hidden Complexity of Wishes · 2024-12-13T03:56:08.355Z · LW · GW

I liked this discussion but I've reread the text a few times now, and I don't think this fictional Outcome Pump can be sampling from the quantum wavefunction. The post gives examples that work with classical randomness, and not so much with quantum randomness. Most strikingly:

... maybe a powerful enough Outcome Pump has aliens coincidentally showing up in the neighborhood at exactly that moment.

The aliens coincidentally showing up in the neighborhood is a surprise to the user of the Outcome Pump, but not to the aliens who have been traveling for a thousand years to coincidentally arrive at this exact moment. They could be from the future, but the story allows time rewinding, not time travel. It's not sampling from the user's prior, because the user didn't even consider the gas main blowing up.

I think the simplest answer consistent with the text is that the Outcome Pump is magic, and samples from what the user's prior "should be", given their observations.

Comment by Martin Randall (martin-randall) on The Hidden Complexity of Wishes · 2024-12-13T03:32:20.022Z · LW · GW

Yes, and. The post is about the algorithmic complexity of human values and it is about powerful optimizers ("genies") and it is about the interaction of those two concepts. The post makes specific points about genies, including intelligent genies, that it would not make if it was not also about genies. Eg:

There are three kinds of genies: Genies to whom you can safely say "I wish for you to do what I should wish for"; genies for which no wish is safe; and genies that aren't very powerful or intelligent.

You wrote, "the Outcome Pump is a genie of the second class". But the Time Travel Outcome Pump is fictional. The genie of the second class that Yudkowsky-2007 expects to see in reality is an AI. So the Outcome Pump is part of a parable for this aspect of powerful & intelligent AIs, despite being unintelligent.

There's lots of evidence I could give here: the tags ("Parables & Fables"), a comment from Yudkowsky-2007 on this post, and the way others have read it, both in the comments and in other posts like Optimality is the Tiger. Also, the Time Travel Outcome Pump is easy to use safely; it's not the case that "no wish is safe", and that attitude only makes sense parabolically. I don't think that's a valuable discussion topic, and I'm not sure you would even disagree.

However, when reading parables, it's important to understand what properties transfer and what properties do not. Jesus is recorded as saying "The Kingdom of Heaven is like a pearl of great price". If I read that and go searching for heaven inside oysters then I have not understood the parable. Similarly, if someone reads this parable and concludes that an AI will not be intelligent then they have not understood the parable or the meaning of AI.

I don't really see people making that misinterpretation of this post, it's a pretty farcical take. I notice you disagree here and elsewhere. Given that, I understand your desire for a top-of-post clarification. Adding this type of clarification is usually the job of an editor.

Comment by Martin Randall (martin-randall) on Making a conservative case for alignment · 2024-12-10T02:57:19.963Z · LW · GW

Also, I will refer to them using the name they actually used at that time. (If I talk about the Ancient Rome, I don't call it Italian Republic either.)

A closer comparison than Ancient Rome is that all types of people change their names on occasion, e.g. on marriage, so we have lots of precedent for referring to people whose names have changed. This includes cases where they strongly dislike their former names. Those traditions balance niceness, civilization, rationality, and free speech.

Disclaimer: not a correction, just a perspective.

Comment by Martin Randall (martin-randall) on Ayn Rand’s model of “living money”; and an upside of burnout · 2024-12-09T03:54:04.230Z · LW · GW

Thanks for the extra information. Like you, my plans and my planning can be verbal, non-verbal, or a mix.

Why refer to it as a "verbal conscious planner" - why not just say "conscious planner"? Surely the difference isn't haphazard?

I can't speak for the author, but thinking of times where I've "lacked willpower" to follow a plan, or noticed that it's "draining willpower" to follow a plan, it's normally verbal plans and planning. Where "willpower" here is the ability to delay gratification rather than to withstand physical pain. My model here is that verbal plans are shareable and verbal planning is more transparent, so it's more vulnerable to hostile telepaths and so to self-deception and misalignment. A verbal plan is more likely to be optimized to signal virtue.

Suppose I'm playing chess and I plan out a mate in five, thinking visually. My opponent plays a move that lets me capture their queen but forgoes the mate. I don't experience "temptation" to take the queen, or have to use "willpower" to press ahead with the mate. Whereas a verbal plan like "I'm still a bit sick, I'll go to bed early" is more likely to be derailed by temptation. This could of course be confounded by the different situations.

I think you raise a great question, and the more I think about it the less certain I am. This model predicts that people who mostly think visually have greater willpower than those who think verbally. Which I instinctively doubt, it doesn't sound right. But then I read about the power of visualization and maybe I shouldn't? Eg Trigger-Action Planning specifically calls out rehearsed visualization as helping to install TAPs.

Comment by Martin Randall (martin-randall) on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser · 2024-12-07T04:01:34.478Z · LW · GW

Thanks for explaining. I now understand you to mean that LessWrong and Lighthaven are dramatically superior to the alternatives, in several ways. You don't see other groups trying to max out the quality level in the same ways. Other projects may be similar in type, but they are dissimilar in results.

To clarify on my own side, when I say that there are lots of similar projects to Lighthaven, I mean that many people have tried to make conference spaces that are comfortable and well-designed, with great food and convenient on-site accommodation. Similarly, when I say that there are lots of similar projects to LessWrong, I mean that there are many forums with a similar overall design and moderation approach. I wasn't trying to say that the end results are similar in terms of quality. These are matters of taste, anyway.

Sorry for the misunderstanding.

Comment by Martin Randall (martin-randall) on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser · 2024-12-06T23:55:41.677Z · LW · GW

I don't understand. LessWrong is a discussion forum, there are many discussion forums. LightHaven is a conference center, there are many conference centers. There are lots of similar projects. I'm confident that Lightcone provides value, and plan to donate, but I don't understand this frame.

Comment by Martin Randall (martin-randall) on Noting an error in Inadequate Equilibria · 2024-12-06T02:59:47.690Z · LW · GW

I agree it's not worldwide. An alternative read is that Japan's GDP in 2013 was ~5 trillion US dollars, and so there were trillions of dollars "at stake" in monetary policy, but that doesn't mean that any particular good policy decision has an expected value in the trillions. If the total difference between good policy and typical policy is +1% GDP growth then these are billion dollar decisions, not trillion dollar decisions.
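
As a rough back-of-the-envelope check, using the ~$5 trillion GDP figure and the assumed +1% policy gap from above:

```python
japan_gdp_2013 = 5e12  # ~5 trillion US dollars (rough figure from above)
policy_delta = 0.01    # assumed gap between good and typical policy: +1% of GDP

stake = japan_gdp_2013 * policy_delta
print(f"${stake / 1e9:.0f} billion per year")  # ~$50 billion: billions, not trillions
```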

By contrast "trillions of euros of damage" is wrong (or hyperbole). The EU's GDP is about 5x Japan's but the ECB has stronger constraints on its actions, including its scope for quantitative easing. I expect those also to be billion dollar decisions in general.

Comment by Martin Randall (martin-randall) on Noting an error in Inadequate Equilibria · 2024-12-06T02:36:25.645Z · LW · GW

If someone doesn't reliably know what Japan's monetary policy currently is, then they probably also don't reliably know what Japan's monetary policy should be. If your map has "you are here" in the wrong place, then your directions are suspect.

Comment by Martin Randall (martin-randall) on Evaluating the historical value misspecification argument · 2024-12-05T04:32:28.479Z · LW · GW

If memory serves, I had a convo with some openai (or maybe anthropic?) folks about this in late 2021 or early 2022ish, where they suggested testing whether language models have trouble answering ethical Qs, and I predicted in advance that that'd be no harder than any other sort of Q. As makes me feel pretty good about me being like "yep, that's just not much evidence, because it's just not surprising."

This makes sense. Barnett is talking about an update between 2007 and 2023. GPT-3 was 2020. So by 2021/2022 you had finished making the update and were not surprised further by GPT-4.

Comment by Martin Randall (martin-randall) on Evaluating the historical value misspecification argument · 2024-12-05T04:20:29.312Z · LW · GW

Barnett is talking about what GPT-4 can do. GPT-4 is not a superintelligence. Quotes about what superintelligence can do are not relevant.

Where does Barnett say "AI is good at NLP, therefore alignment is easy"? I don't see that claim.

Evidence that MIRI believed "X is hard" is not relevant when discussing whether MIRI believed "Y is hard". Many things are hard about AI Alignment.

Comment by Martin Randall (martin-randall) on Evaluating the historical value misspecification argument · 2024-12-05T04:13:39.512Z · LW · GW

In this post Matthew Barnett notices that we updated our beliefs between ~2007 and ~2023. I say "we" rather than MIRI or "Yudkowsky, Soares, and Bensinger" because I think this was a general update, but also to defuse the defensive reactions I observe in the comments.

What did we change our minds about? Well, in 2007 we thought that safely extracting approximate human values into a convenient format would be impossible. We knew that a superintelligence could do this, but a superintelligence would kill us, so that wasn't helpful. We knew that human values are more complex than fake utility functions or magical categories, so we couldn't hard-code human values into a utility function. So we looked for alternatives like corrigibility.

By 2023, we learned that a correctly trained LLM can extract approximate human values without causing human extinction (yet). This post points to GPT-4 as conclusive evidence, which is fair. But GPT-3 was an important update and many people updated then. I imagine that MIRI and other experts figured it out earlier. This update has consequences for plans to avoid extinction or die with more dignity.

Unfortunately, much of the initial commentary was defensive, attacking Barnett for claims he did not make. Yudkowsky placed a disclaimer on Hidden Complexity of Wishes implausibly denying that it is an AI parable. This is surprising, given that Yudkowsky's Coming of Age and How to Actually Change Your Mind sequences are excellent. What went wrong?

An underappreciated sub-skill of rationality is noticing that I have, in the past, changed my mind. For me, this is pretty easy when I think back to my teenage years. But I'm in my 40s now, and I find it harder to think of major updates during my 20s and 30s, despite the world (and me) changing a lot in that time. Seeing this pattern of defensiveness in other people made me realize that it's probably common, and I probably have it too. I wish I had a guide to middle-aged rationality. In middle age, my experience is supposed to be my value-add, but conveniently forgetting my previous beliefs throws some of that away.

Comment by Martin Randall (martin-randall) on Magical Categories · 2024-12-05T03:24:16.104Z · LW · GW

I shall call this the fallacy of magical categories - simple little words that turn out to carry all the desired functionality of the AI.  Why not program a chess-player by running a neural network (that is, a magical category-absorber) over a set of winning and losing sequences of chess moves, so that it can generate "winning" sequences?  Back in the 1950s it was believed that AI might be that simple, but this turned out not to be the case.

And then in the 2020s it turned out to be the case again! Eg ChessGPT. Today I learned that Stockfish now uses a neural network for its evaluation function (trained on board positions, not move sequences).

This in no way cuts against the point of this post, but it stood out when I read this 16 years after it was posted.

Comment by Martin Randall (martin-randall) on Evaluating the historical value misspecification argument · 2024-12-05T02:56:01.703Z · LW · GW

This is good news because this is more in line with my original understanding of your post. It's a difficult topic because there are multiple closely related problems of varying degrees of lethality and we had updates on many of them between 2007 and 2023. I'm going to try to put the specific update you are pointing at into my own words.

From the perspective of 2007, we don't know if we can lossily extract human values into a convenient format using human intelligence and safe tools. We know that a superintelligence can do it (assuming that "human values" is meaningful), but we also know that if we try to do this with an unaligned superintelligence then we all die.

If this problem is unsolvable then we potentially have to create a seed AI using some more accessible value, such as corrigibility, and try to maintain that corrigibility as we ramp up intelligence. This then leads us to the problem of specifying corrigibility, and we see "Corrigibility is anti-natural to consequentialist reasoning" on List of Lethalities.

If this problem is solvable then we can use human values sooner and this gives us other options. Maybe we can find a basin of attraction around human values for example.

The update between 2007 and 2023 is that the problem appears solvable. GPT-4 is a safe tool (it exists and we aren't extinct yet) and does a decent job. A more focused AI could do the task better without being riskier.

This does not mean that we are not going to die. Yudkowsky has 43 items on List of Lethalities. This post addresses part of item 24. The remaining items are sufficient to kill us ~42.5 times. It's important to be able to discuss one lethality at a time if we want to die with dignity.

Comment by Martin Randall (martin-randall) on Welcome to The Research Triangle SSC Meetup!!! · 2024-12-04T22:18:44.736Z · LW · GW

Is this still active?

Comment by Martin Randall (martin-randall) on Evaluating the historical value misspecification argument · 2024-12-04T18:22:41.031Z · LW · GW

My read of older posts from Yudkowsky is that he anticipated a midrange level of complexity of human values, on your scale from simple mathematical function to perfect simulation of human experts.

Yudkowsky argued against very-low-complexity human values in a few places. There's an explicit argument against Fake Utility Functions that are simple mathematical functions. The Fun Theory Sequence is much too big for human values to be a 100-line Python program.

But also Yudkowsky's writing is incompatible with extremely complicated human values that require a perfect simulation of human experts to address. This argument is more implicit, I think because that was not a common position. Look at Thou Art Godshatter and how it places the source of human values in the human genome, downstream of the "blind idiot god" of Evolution. If true, human values must be far less complicated than the human genome.

GPT-4 is about 1,000x bigger than the human genome. Therefore when we see that GPT-4 can represent human values with high fidelity this is not a surprise to Godshatter Theory. It will be surprising if we see that very small AI models, much smaller than the human genome, can represent human values accurately.

Disclaimers: I'm not replying to the thread about fragility of value, only complexity. I disagree with Godshatter Theory on other grounds. I agree that it is a small positive update that human values are less complex than GPT-4.

Comment by Martin Randall (martin-randall) on The Hidden Complexity of Wishes · 2024-12-04T17:45:55.986Z · LW · GW

The example in the post below is not about an Artificial Intelligence literally at all! If the post were about what AIs supposedly can't do, the central example would have used an AI!

Contra this assertion, Yudkowsky-2007 was very capable of using parables. The "genie" in this article is easily recognized as metaphorically referring to an imagined AI. For example, here is Yudkowsky-2007 in Lost Purposes, linking here:

I have seen many people go astray when they wish to the genie of an imagined AI, dreaming up wish after wish that seems good to them, sometimes with many patches and sometimes without even that pretense of caution.

Similarly, portions of Project Lawful are about AI, That Alien Message is about AI, and so forth.

I'm very sympathetic to the claim that this parable has been misinterpreted. This is a common problem with parables! They are good at provoking thought, they are good at strategic ambiguity, and they are bad at clear communication.

I'm not sympathetic to the claim that this post is not about AI literally at all.

Comment by Martin Randall (martin-randall) on Universal Basic Income and Poverty · 2024-12-04T15:00:25.814Z · LW · GW
Comment by Martin Randall (martin-randall) on 2024 Unofficial LessWrong Census/Survey · 2024-12-03T22:42:52.128Z · LW · GW

Should non-theistic religions, such as Buddhism, go under "Deist/Pantheist/etc" or "Atheist but spiritual"?

Some of the probability questions are awkward given the recent argument that (raw) probabilities are cursed and given that P("god") is higher in simulations and there is an explicit P(simulation). I weighted the probabilities by leverage in an unprincipled way.

It would be nice to have an "undefined" answer for some probability questions, eg the P(Cryonics) and P(Anti-Agathics) questions mostly gave me a divide-by-zero exception. I suppose the people who believe in epsilon probabilities have to suck it up as well. But there was an N/A exception given for the singularity, so maybe that could be a general note.

Comment by Martin Randall (martin-randall) on Universal Basic Income and Poverty · 2024-12-03T21:03:40.910Z · LW · GW

Aphyer was discussing "working 60-hour weeks, at jobs where they have to smile and bear it when their bosses abuse them", not specifically "poor" people. My experience of people working such hours, even on a low wage, is that they are proud of their work ethic and their ability to provide and that because of their hard work they have nice things and a path to retirement. They don't consider themselves poor - they are working hard precisely to not be poor. As a concrete example, people in the armed forces have to smile and bear it when their bosses send them into war zones, never mind lower level abuse like being yelled at and worked past the point of exhaustion and following deliberately stupid orders.

That said, your question prompted me to get some statistics regarding the consumption patterns of low income households. I found the US BLS expenditure by income decile, and looked at the lowest decile.

This is emphatically not the same thing as either "poor" or "working 60-hour weeks". People in this decile are not employed for 60hrs/week, because 60hrs/week for 40 weeks at federal minimum wage is $17,400 and puts someone in the second decile for income. Most of these people are retired or unemployed and spending down savings, which is why mean expenditure is $31,000/year vs mean income of $10,000/year. I welcome better data, I could not find it.

Those caveats aside, the bottom decile spent, on average (mean):

  • 0.4% on sugar/sweets, $116/yr
  • 0.8% on alcohol, $236/yr.
  • 4.7% on food away from home, $1,458/yr
  • 3.8% on entertainment, $1,168/yr
  • 1.2% on nicotine, $383/yr

We're looking at ~10% spending on these categories. From experience and reading I expect some fraction of spending in other categories to be "luxury" in the sense of not being strictly required, perhaps ~10%. This is in no way a criticism. Small luxuries are cheap and worth it. Few people would agree to work ~20% fewer hours if it meant living in abject poverty.

I'm curious what answer you would give to your own questions.

Comment by Martin Randall (martin-randall) on My experience using financial commitments to overcome akrasia · 2024-12-03T15:53:01.803Z · LW · GW

Update: I'm now familiar with the term "demand avoidance". One recommendation for caregivers is "declarative language". On LessWrong we might call it "guess culture" or perhaps "tell culture". Aesthetically I dislike it, but it works for this child (in combination with other things, including your good advice of persuasion and positive reinforcement).

Comment by Martin Randall (martin-randall) on Ayn Rand’s model of “living money”; and an upside of burnout · 2024-12-02T02:33:15.711Z · LW · GW

Oh, I remember and liked that comment! But I didn't remember your username. I have a bit more information about that now, but I'll write it there.

From the model in this article, I think the way this should work in the high-willpower case is that your planner gets credit aka willpower for accurate short-term predictions and that gives it credit for long-term predictions like "if I get good grades then I will get into a good college and then I will get a good job and then I will get status, power, sex, children, etc".

In your case it sounds like your planner was predicting "if I don't get good grades then I will be homeless" and this prediction was wrong, because your parents supported you. Also it was predicting "if I get a good job then it will be horrifying", which isn't true for most people. Perhaps it was mis-calibrated and overly prone to predicting doom? You mention depression in the linked comment. From the model in this article, someone's visceral processes will respond to a mis-calibrated planner by reducing its influence aka willpower.

I don't mean to pry. The broader point is that improving the planner should increase willpower, with some lag while the planner gets credit for its improved plans. The details of how to do that will be different for each person.

Comment by Martin Randall (martin-randall) on Ayn Rand’s model of “living money”; and an upside of burnout · 2024-12-01T23:24:07.606Z · LW · GW

What should happen is that you occasionally fail to do homework and instead play video games. Then there are worse negative consequences as predicted. And then your verbal planner gets more credit and so you have more willpower.

Comment by Martin Randall (martin-randall) on Ayn Rand’s model of “living money”; and an upside of burnout · 2024-12-01T23:17:45.768Z · LW · GW

Our conscious thought processes are all the ones we are conscious of. Some of them are verbal, in words, eg thinking about what you want to say before saying it. Some of them are nonverbal, like a conscious awareness of guilt.

Most people have some form of inner monologue, aka self-talk, but not all. It sounds like you may be one of those with limited or no self-talk. Whereas I might think, in words, "I should get up or I'll be late for work", perhaps you experience a rising sense of guilt.

To benefit from this article you'll need to translate it to fit your brain patterns.

Comment by Martin Randall (martin-randall) on A shot at the diamond-alignment problem · 2024-11-29T17:31:10.743Z · LW · GW

It's looking like the values of humans are far, far simpler than a lot of evopsych literature and Yudkowsky.

I've missed this. Any particular link to get me started reading about this update? Shard theory seems to imply complex values in individual humans, though certainly less fragile than Yudkowsky proposed.

Comment by Martin Randall (martin-randall) on Visible Thoughts Project and Bounty Announcement · 2024-11-28T15:58:43.290Z · LW · GW

But I don't think these came about through training on synthetic thought-annotated texts.

Comment by Martin Randall (martin-randall) on Unnatural Categories Are Optimized for Deception · 2024-11-27T04:10:43.845Z · LW · GW

I find this hypothetical about neural fireplaces curious, because the ambiguity exists in real fireplaces; speculative fiction is not needed. Please excuse any inaccuracies in this brief history of fireplaces:

  1. Wood-burning fireplaces
  2. Gas-burning fireplaces
  3. Central heating
  4. Electric heaters
  5. Decorative fireplaces (no heat)

The original fireplaces produced both heat and a decorative flame effect. With each new type of invention there was a question of what to do with our previous terms. We've ended up with "heaters" to refer to things that heat a room and "fireplace" to refer to things that have a decorative flame effect. Both of these things are slightly fuzzy natural categories in the sense of this post.

Except... maybe we should say that "decorative" is a privative adjective and so a "decorative fireplace" isn't really a fireplace? For the sake of the thought experiment, let's say that practical rural folk place a higher value on having a secondary heat source because it takes longer to restore electricity after a storm. Meanwhile snobby urbanites place a higher value on decorative flame effects because they value gaining status through conspicuous consumption.

I see that someone could say "well, it's not a real fireplace, is it?" in order to signal that they share the values of practical rural folks. If they're actually a snobby urbanite politician and they don't actually have those practical rural values then they are being deceptive. That would be a deception about values, not about heat sources.

If a practical rural person says "well, it's not a real fireplace, is it?", then that could indeed be a true signal of their values. But my guess is the more restrictive meaning of fireplace came first. The causal diagram is something like:

  • Practical Rural Values -> Categorize functional fireplaces separately to decorative fireplaces -> Use the short word "fireplace" for functional fireplaces (for communication and signaling)

Not:

  • Practical Rural Values -> Use the short word "fireplace" for functional fireplaces (for signaling) -> Categorize functional fireplaces separately to decorative fireplaces

Because until practical rural folks have settled on a common meaning of "fireplace", they can't reliably use that meaning to signal their values to each other or to outsiders.

Except... maybe if it got caught up in the modern culture war there could be a flood of fireplace-related memes and then everyone would have very strong opinions about the best definition of "fireplace" a few months later for no real reason? Wow, that sure would suck for the CEO of Decorative Fireplaces Inc.

Comment by Martin Randall (martin-randall) on Making a conservative case for alignment · 2024-11-24T03:57:59.818Z · LW · GW

I saw the EA Forum's policy. If someone repeatedly and deliberately misgenders on the EA Forum, they will be banned from that forum. But you don't need to post on the EA Forum at all in order to be part of the rationalist community. On the provided evidence, it is false that:

You are required to say certain things or you will be excluded from the community.

I want people of all political beliefs, including US conservative-coded beliefs, to feel welcome in the rationalist community. It's important to that goal to distinguish between policies and norms, because changing policies requires a different process to changing norms, and because policies and norms are unwelcoming in different ways and to different extents.

It's because of that goal that I'm encouraging you to change these incorrect/misleading/unclear statements. If newcomers incorrectly believe that they are required to say certain things or they will be excluded from the community, then they will feel less welcome, for nothing. Let's avoid that.

Comment by Martin Randall (martin-randall) on The Worst Form Of Government (Except For Everything Else We've Tried) · 2024-11-24T00:53:05.793Z · LW · GW

Yes, the UK govt is sometimes described as "an elected dictatorship". To the extent this article's logic applies, it works almost exactly the opposite of the description given.

  • The winning party is determined by democracy (heavily distorted by first-past-the-post single-winner constituencies).
  • Once elected, factions within the winning party have the ability to exert veto power in the House of Commons. The BATNA is to bring down the government and force new elections.

The civil service and the judiciary also serve as checks on the executive, as does the UK's status as a signatory to various international treaties.

Also the UK is easy mode, with a tradition of common law rights stretching back centuries. Many differences with Iraq.

Comment by Martin Randall (martin-randall) on Rationality Quotes - Fall 2024 · 2024-11-23T23:55:43.929Z · LW · GW

Is the quote here: "we're here to devour each other alive"?

Comment by Martin Randall (martin-randall) on Making a conservative case for alignment · 2024-11-21T04:38:10.662Z · LW · GW

Thanks for clarifying. By "policy" and "standards" and "compelled speech" I thought you meant something more than community norms and customs. This is traditionally an important distinction to libertarians and free speech advocates. I think the distinction carves reality at the joints, and I hope you agree. I agree that community norms and customs can be unwelcoming.

Comment by Martin Randall (martin-randall) on Social events with plausible deniability · 2024-11-20T03:24:46.539Z · LW · GW

As described, this type of event would not make me unrestrained in sharing my opinions.

The organizers have additional information regarding what opinions are in the bowl, so are probably in a position to determine which expressed opinions are genuinely held. This is perhaps solvable but it doesn't sound like an attempt was made to solve this. That's fine if I trust the organizers, but if I trust the organizers to know my opinions then I could just express my opinions to the organizers directly and I don't need this idea.

I find it unlikely that someone can pass an Ideological Turing Test for a random opinion that they read off a piece of paper a few minutes ago, especially compared to a genuine opinion they hold. It would be rather depressing if they could, because it implies that their genuine opinions have little grounding. An attendee could deliberately downplay their level of investment and knowledge to increase plausible deniability. But such conversations sound unappealing.

There are other problems. My guess is that most of the work was done by filtering for "a certain kind of person".

Comment by Martin Randall (martin-randall) on Don't Dismiss on Epistemics · 2024-11-20T02:49:58.335Z · LW · GW

Besides, my appeal to authority trumps yours. Yes, they successfully lobbied the American legal system for the title of doctor - arguably this degrades the meaning of the word. Do you take physicians or the American legal system to be the higher authority on matters of health?

The AMA advocates for US physicians, so it has the obvious bias. Adam Smith:

People of the same trade seldom meet together, even for merriment and diversion, but the conversation ends in a conspiracy against the public, or in some contrivance to raise prices.

I do not consider the AMA an impartial authority on matters such as:

  • Are chiropractors doctors?
  • Can AIs give medical advice?
  • How many new doctors should be trained in the US?
  • Can nurses safely provide more types of medical care?
  • How much should doctors be paid?
  • How much training should new doctors receive?
  • Should non-US doctors practice medicine in the US?
  • Should US medical insurance cover medical treatments outside the US?
  • Should we spend more on healthcare in general?

I therefore tend to hug the query and seek other evidence.

Comment by Martin Randall (martin-randall) on Social events with plausible deniability · 2024-11-19T23:59:34.352Z · LW · GW

The example here is that I'm working for an NGO that opposes iodizing salt in developing countries on the grounds that it is racist, for reasons. I've been reading online that it raises IQ and that raising IQ is good, actually. I want to discuss this in a safe space.

I can do this by having any friends or family who don't work for the NGO. This seems more likely to work than attending a cancellation party at the NGO. If the NGO prevents me from having outside friends or talking to family then it's dangerous and I should get out regardless of its opinion on iodization.

There are better examples; I could offer suggestions if you like, though you can probably also think of many.

Comment by Martin Randall (martin-randall) on "The Solomonoff Prior is Malign" is a special case of a simpler argument · 2024-11-19T20:17:29.109Z · LW · GW

We can't reliably kill agents with the St Petersburg Paradox, because if they keep winning we run out of resources and can no longer double their utility. This doesn't take long: the statistical value of a human life is in the millions, and doubling compounds very quickly.
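
A quick sketch of "doesn't take long", with both dollar figures below as illustrative assumptions rather than anything from the original discussion:

```python
import math

value_of_life = 10_000_000             # assumed statistical value of a life, ~$10M
world_resources = 100_000_000_000_000  # assumed total world resources, ~$100 trillion

# Consecutive wins before the payout can no longer be doubled.
doublings = math.ceil(math.log2(world_resources / value_of_life))
print(doublings)  # ~24: a couple dozen wins and there is nothing left to double
```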

It's a stronger argument for Pascal's Mugging.

Comment by Martin Randall (martin-randall) on Making a conservative case for alignment · 2024-11-18T22:46:15.551Z · LW · GW

cited thread.

Gilliland's idea is that it is the proportion of trans people that dissuades some right-wing people from joining. That seems plausible to me; it matches the "Big Sort" thesis and my personal experience. I agree that his phrasing is unwelcoming.

I tried to find an official pronoun policy for LessWrong, LessOnline, EA Global, etc., and couldn't. If you're thinking of something specific, could you say what? As well as the linked X thread, I have read the X thread linked from Challenges to Yudkowsky's pronoun reform proposal. But these are the opinions of one person; they don't amount to politically-coded compelled speech. I'm not part of the rationalist community and this is a genuine question. Maybe such policies exist but are not advertised.