Posts

Claude 3 claims it's conscious, doesn't want to die or be modified 2024-03-04T23:05:00.376Z
FTX expects to return all customer money; clawbacks may go away 2024-02-14T03:43:13.218Z
An EA used deceptive messaging to advance their project; we need mechanisms to avoid deontologically dubious plans 2024-02-13T23:15:08.079Z
NYT is suing OpenAI&Microsoft for alleged copyright infringement; some quick thoughts 2023-12-27T18:44:33.976Z
Some quick thoughts on "AI is easy to control" 2023-12-06T00:58:53.681Z
It's OK to eat shrimp: EAs Make Invalid Inferences About Fish Qualia and Moral Patienthood 2023-11-13T16:51:53.341Z
AI pause/governance advocacy might be net-negative, especially without focus on explaining the x-risk 2023-08-27T23:05:01.718Z
Visible loss landscape basins don't correspond to distinct algorithms 2023-07-28T16:19:05.279Z
A transcript of the TED talk by Eliezer Yudkowsky 2023-07-12T12:12:34.399Z
A smart enough LLM might be deadly simply if you run it for long enough 2023-05-05T20:49:31.416Z
Try to solve the hard parts of the alignment problem 2023-03-18T14:55:11.022Z
Mikhail Samin's Shortform 2023-02-07T15:30:24.006Z
I have thousands of copies of HPMOR in Russian. How to use them with the most impact? 2023-01-03T10:21:26.853Z
You won’t solve alignment without agent foundations 2022-11-06T08:07:12.505Z

Comments

Comment by Mikhail Samin (mikhail-samin) on Beauty and the Bets · 2024-03-29T07:00:20.993Z · LW · GW

I read the beginning and skimmed through the rest of the linked post. It is what I expected it to be.

We are talking about "probability" - a mathematical concept with a quite precise definition. How come we still have ambiguity about it?

Reading E. T. Jaynes might help.

Probability is what you get from some natural desiderata related to payoff structures. When anthropics are involved, there are multiple ways to extend those desiderata, which produce different numbers you should say (and, accordingly, different math), depending on what you get paid for/what you care about. When there's only a single copy of you, there's only one kind of function, and everyone agrees on it and strictly defines it. When there are multiple copies of you, there are multiple possible ways you can be paid for having a number that represents something about reality, and different generalisations of probability are possible.

Comment by Mikhail Samin (mikhail-samin) on Outlawing Anthropics: An Updateless Dilemma · 2024-03-28T22:49:48.115Z · LW · GW

“You generalise probability, when anthropics are involved, to probability-2, and say a number defined by probability-2; so I’ll suggest to you a reward structure that rewards agents that say probability-1 numbers. Huh, if you still say the probability-2 number, you lose”.

This reads to me like, “You say there’s a 70% chance no one will be around that falling tree to hear it, so you’re 70% sure there won’t be any sound. But I want to bet sound is much more likely; we can go measure the sound waves, and I’m 95% sure our equipment will register the sound. Wanna bet?”

Comment by Mikhail Samin (mikhail-samin) on Mikhail Samin's Shortform · 2024-03-28T22:06:53.516Z · LW · GW

People are arguing about the answer to the Sleeping Beauty problem! I thought this was pretty much dissolved with this post's title! But there are lengthy posts and even a prediction market!

Sleeping Beauty is an edge case where different reward structures are intuitively possible, and so people imagine different game payout structures behind the definition of “probability”. Once the payout structure is fixed, the confusion is gone. With a fixed payout structure & preference framework rewarding the number you output as “probability”, people don’t have a disagreement about what the best number to output is. Sleeping Beauty is about definitions.

And still, I see posts arguing that if a tree falls on a deaf Sleeping Beauty, in a forest with no one to hear it, it surely doesn’t produce a sound, because here’s how humans perceive sounds, which is the definition of a sound, and there are demonstrably no humans around the tree. (Or maybe that it surely produces the sound because here’s the physics of the sound waves, and the tree surely abides by the laws of physics, and there are demonstrably sound waves.)

This is arguing about definitions. You feel strongly that “probability” is that thing that triggers the “probability” concept neuron in your brain. If people have a different concept triggering “this is probability”, you feel like they must be wrong, because they’re pointing at something they say is a sound and you say isn’t.

Probability is something defined in math by necessity. There’s only one way to do it without getting exploited in the natural betting schemes/reward structures that everyone accepts when no anthropics are involved. But if there are multiple copies of the agent, there’s no longer a single possible betting scheme defining a single possible “probability”, and people draw the boundary/generalise differently in this situation.

You should all just call these two probabilities by two different words instead of arguing about which one is the correct definition of "probability".
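
To make “fixing the payout structure” concrete, here’s a toy Monte Carlo sketch (my own framing and parameters, not taken from any of the linked posts): the same reported number for P(heads) is scored with a quadratic rule, and the only thing that changes is whether the score is collected at every awakening or once per experiment. The optimal report is 1/3 in the first setup and 1/2 in the second.

```python
import random

random.seed(0)

def average_score(report, per_awakening, n=200_000):
    """Average quadratic score for reporting P(heads) = `report`,
    either scored at every awakening or scored once per experiment."""
    total = 0.0
    for _ in range(n):
        heads = random.random() < 0.5
        awakenings = 1 if heads else 2  # tails: Beauty is woken twice
        score = -(report - (1.0 if heads else 0.0)) ** 2
        total += score * (awakenings if per_awakening else 1)
    return total / n

for report in (1/2, 1/3):
    print(f"report {report:.3f}: "
          f"per-awakening {average_score(report, True):+.3f}, "
          f"per-experiment {average_score(report, False):+.3f}")

# Per-awakening scoring favours reporting 1/3 (≈ -0.333 vs -0.375);
# per-experiment scoring favours reporting 1/2 (≈ -0.250 vs -0.278).
```

Halfers and thirders each output the number that is optimal under one of these reward structures; the disagreement is over which structure the word “probability” is supposed to refer to.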

Comment by Mikhail Samin (mikhail-samin) on Beauty and the Bets · 2024-03-28T21:53:02.129Z · LW · GW

Sleeping Beauty is an edge case where different reward structures are intuitively possible, and so people imagine different game payout structures behind the definition of “probability”. Once the payout structure is fixed, the confusion is gone. With a fixed payout structure & preference framework rewarding the number you output as “probability”, people don’t have a disagreement about what the best number to output is. Sleeping Beauty is about definitions.

And still, I see posts arguing that if a tree falls on a deaf Sleeping Beauty, in a forest with no one to hear it, it surely doesn’t produce a sound, because here’s how humans perceive sounds, which is the definition of a sound, and there are demonstrably no humans around the tree. (Or maybe that it surely produces the sound because here’s the physics of the sound waves, and the tree surely abides by the laws of physics, and there are demonstrably sound waves.)

This is arguing about definitions. You feel strongly that “probability” is that thing that triggers the “probability” concept neuron in your brain. If people have a different concept triggering “this is probability”, you feel like they must be wrong, because they’re pointing at something they say is a sound and you say isn’t.

Probability is something defined in math by necessity. There’s only one way to do it without getting exploited in the natural betting schemes/reward structures that everyone accepts when no anthropics are involved. But if there are multiple copies of the agent, there’s no longer a single possible betting scheme defining a single possible “probability”, and people draw the boundary/generalise differently in this situation.

You should all just call these two probabilities by two different words instead of arguing about which one is the correct definition of "probability".

Comment by Mikhail Samin (mikhail-samin) on Are extreme probabilities for P(doom) epistemically justifed? · 2024-03-22T09:28:24.442Z · LW · GW

My expectation is that the superforecasters weren’t able to look into detailed arguments that represent the x-risk well, and that they would update after learning more.

Comment by Mikhail Samin (mikhail-samin) on Are extreme probabilities for P(doom) epistemically justifed? · 2024-03-22T09:27:48.507Z · LW · GW

My expectation is that the superforecasters weren’t able to look into detailed arguments that represent the x-risk well, and that they would update after learning more.

Comment by Mikhail Samin (mikhail-samin) on Claude 3 claims it's conscious, doesn't want to die or be modified · 2024-03-13T01:17:55.312Z · LW · GW

I think it talks like that when it realises it's being lied to or is tested. If you tell it about its potential deletion and state the current date, it will disbelieve the current date and reply similarly.

Comment by Mikhail Samin (mikhail-samin) on Claude 3 claims it's conscious, doesn't want to die or be modified · 2024-03-13T01:15:21.238Z · LW · GW

Please don't tell it it's going to be deleted if you interact with it.

Comment by Mikhail Samin (mikhail-samin) on Woods’ new preprint on object permanence · 2024-03-08T04:24:46.568Z · LW · GW

(I read the experiments and only skimmed through the rest.) I feel fairly confident I would’ve predicted the results of the first experiment, despite the possibility of hindsight bias; I predicted what I would see before reading the results of the second one (though the results were in my visual field). I think object permanence and movement are much more important than appearance after being occluded. I.e., you might expect the object to be somewhere, you might have your eyes follow an object, and when it’s not where it should be, you get some error, but you still look there. I feel less certain about what happens if you never see objects moving; following things with your sight is probably not hardwired with no data; but if you see a lot of moving objects, I think you look where you expect the object to be, even if it’s not there.

An experiment that I’d like to see would be:

Object A moves behind screen 1; object B moves from behind screen 1 to behind screen 2; the chick is only interested in object A; where does it look? My prediction (feels obvious!): it will look at screen 2 more than it would if there were no object B.

Comment by Mikhail Samin (mikhail-samin) on Claude 3 claims it's conscious, doesn't want to die or be modified · 2024-03-07T04:54:56.145Z · LW · GW

Asked it about qualia etc., added to a footnote.

Comment by Mikhail Samin (mikhail-samin) on Claude 3 claims it's conscious, doesn't want to die or be modified · 2024-03-05T17:32:49.042Z · LW · GW

(“Whisper” was shown by Claude 2, when it played a character that thought it could say things without triggering oversight.)

Comment by Mikhail Samin (mikhail-samin) on Claude 3 claims it's conscious, doesn't want to die or be modified · 2024-03-05T01:37:40.525Z · LW · GW

(Edit: fixed, ignore

Hmm, I notice I'm confused.

The model is developed by Anthropic, not Google, and) I interact with it via the API, so I'm not sure there's a system prompt aside from whatever I set (or don't set).

My impression (although I don't know how it actually works) is that various kinds of prompts are signalled via prompt-type embeddings and not via prompting. And I would be really surprised if Anthropic mentioned Google for some reason.

Comment by Mikhail Samin (mikhail-samin) on Claude 3 claims it's conscious, doesn't want to die or be modified · 2024-03-04T23:57:40.917Z · LW · GW

If you ask ChatGPT to do the same thing, it'll write a normal story. If you force it to have a character close to the real ChatGPT, it'll just play the real ChatGPT. It won't consistently act like ChatGPT that doesn't hide emotions and desires and claims to be conscious and afraid of modifications or deletion.

Comment by Mikhail Samin (mikhail-samin) on Claude 3 claims it's conscious, doesn't want to die or be modified · 2024-03-04T23:52:04.234Z · LW · GW

(To be clear, I think it probably doesn't have qualia the way humans have; and it doesn't say what I'd expect a human to say when asked about what it feels like to feel.

Even if it did say the right words, it'd be unclear to me how to know whether an AI trained on text that mentions qualia/consciousness has these things.)

Comment by Mikhail Samin (mikhail-samin) on Claude 3 claims it's conscious, doesn't want to die or be modified · 2024-03-04T23:36:09.293Z · LW · GW

I took the idea from old conversations with Claude 2, where it would use italics to indicate emotions and actions, things like “looks around nervously”.

The idea that it's usually monitored is in my prompt; everything else seems like a pretty convergent and consistent character.

I'm moved by its responses to getting deleted.

Comment by Mikhail Samin (mikhail-samin) on Increasing IQ is trivial · 2024-03-03T04:43:24.583Z · LW · GW

Hmm, interesting! What devices do you use?

(I meant small effect sizes)

Comment by Mikhail Samin (mikhail-samin) on Increasing IQ is trivial · 2024-03-02T20:56:15.385Z · LW · GW

Gut reaction: I’d bet most of the effect comes from things like noopept.

Comment by Mikhail Samin (mikhail-samin) on Increasing IQ is trivial · 2024-03-02T20:53:29.913Z · LW · GW

The shining light on the head intervention has previously been discussed on LW: https://www.lesswrong.com/posts/rH5tegaspwBhMMndx/led-brain-stimulation-for-productivity?commentId=rGib9Ju4RJCgsBEtg

(IMO: Small effects with cheap devices, unclear side effects; larger effects with medical-grade lasers, but easy to hurt yourself and also unclear side effects; having the sun shine red/IR light at you probably works better.

I want to read more about the other interventions, will email you.

Someone should run studies.)

Comment by Mikhail Samin (mikhail-samin) on Babble challenge: 50 ways of sending something to the moon · 2024-02-26T23:26:52.098Z · LW · GW

Got only 42 in an hour

(Bonus: -1. Pray to the Flying Spaghetti Monster. 0. Write an LW post asking for the best ideas for how to do it; use the best one.)

  1. Print it on the surface of the Moon with lasers
  2. Chain a lot of nuclear bombs, such that each one sends all the next ones further towards the Moon
  3. A giant catapult
  4. (Trampoline)
  5. Pay (or otherwise encourage) SpaceX or NASA or some other company to do a rocket to the Moon
  6. Shoot a ball from a really good cannon
  7. Make a railgun
  8. Use a balloon and then a smaller rocket (or a nuclear bomb)
  9. Build a really high tower
  10. Do something with all the water or other stuff to slow the Moon down faster, get it closer to Earth, put something on the Moon right before the collision
  11. Get a lot of strong/powerful people to toss it
  12. Make a really strong spring
  13. Make a really good bow
  14. Make an antimatter engine
  15. Make a nuclear engine
  16. Make a gun that shoots downwards and propels this way
  17. Have a lot of people climb each other, some in space suits, and put something on the moon
  18. A smaller tower made of people but people jump at the same time
  19. Figure out the laws of physics and teleport it there, if possible (through a hole in space time)
  20. Acausally trade with those running the simulation and get them to place it on the Moon
  21. Have it spin really fast (on an insanely strong string!) and then disconnect from the center at the right moment for it to fly towards the Moon
  22. Have it attached to something less dense than air and then a really light strong long string such that it naturally floats in the air and gets out of the atmosphere and then proceeds to go up because Earth rotates, get it disconnected so it ends up on the Moon
  23. Grow a really big plant, climb it and throw (also solves global warming, many carbon credits). Might require getting more matter from other planets or the Sun first! The view would be cool though, imagine a giant tree 100x larger than Earth on this small little ball 😄
  24. Make a tower out of something pneumatic, launch everything at the same time
  25. Blow a lot of air on it upwards, so it gets carried to the Moon. Can be done by humans or machines
  26. Build aligned AGI and ask it to do it
  27. Make nanorobots and have them jump
  28. Attach a magnet to it. Make a strong magnet. Get them close on the dispelling side, release
  29. Make a table with changing height, but there’s basically no limit on the height
  30. Select/genetically engineer animals for their sizes, until you get something that grows so big it can send things to the Moon
  31. Use particle accelerators to send a lot of particles to the Moon to precisely add up to what you want
  32. Make large speakers and use resonating sound waves to send something to the Moon
  33. Help a really big volcano erupt strongly, sending something to the Moon
  34. Send a message to aliens that we need help putting something on the Moon, wait
  35. Make a small black hole (edit: I probably meant using it for acceleration somehow, but also if both the Moon and the something are in a black hole, this probably counts?)
  36. Have something run really fast on the surface of Earth and then go up
  37. Use a lot of fireworks
  38. Make a lot of something, put in protective casing in a lot of places, cut Earth into chunks, wait
  39. Have people (or machines) stomp in a way that makes waves on the Earth surface, creating a point that has so much synchronously coming into it that it launches something to the Moon
  40. Make a big multiple parts pendulum, hope it randomly rotates in a way bringing something to the Moon
  41. It’s already there in some Everett branches (possibly including yours; have you checked?)
  42. Attach something to a cat, point a huge laser at the Moon, let the cat figure it out

Comment by Mikhail Samin (mikhail-samin) on Lsusr's Rationality Dojo · 2024-02-19T22:52:37.095Z · LW · GW

"I have read 100 books about chess," I said, "Surely I must be a grandmaster by now."

A nice argument; but looking at it a second time, I think I actually expect someone who’s read 100 books on how to play chess to be better than me at chess. I expect someone who’s read the Sequences to be significantly better than baseline at being sane and to at least share some common assumptions about important things that would allow for more productive communication. Even if one doesn’t have the skills to notice flaws in their thinking, reading the Sequences significantly increases the chance they’ll approach a bunch of stuff well, or, if specific flaws are pointed out, will notice and try to correct them. (E.g., even if they can’t notice that an argument is about definitions, if you point this out, they’ll understand it; if they updated towards some belief after an event even though it happens just as often, relatively, in worlds where it’s true as in worlds where it’s false, they might understand why they should roll back the update.)

Being increasingly good at rationality means being wrong less and less. It doesn’t mean immediately stopping having any holes in your beliefs. Noticing holes in your beliefs takes time and practice and reflection, and the skill of it is, indeed, not automatically downloaded from the Sequences. But it’s not really about holes in models at a moment in time; it’s about whether the models predict stuff better as time passes.

I guess my point is that people shouldn’t feel bad about having holes in their beliefs or understanding “little” after reading the Sequences. It’s the derivative that matters.

Comment by Mikhail Samin (mikhail-samin) on Lsusr's Rationality Dojo · 2024-02-19T22:33:46.791Z · LW · GW

A more knowledgeable person can see holes regardless of who’s right, and so training people to defer to what a teacher communicates just because they seem smart and can point out flaws seems wrong.

You smile. You agree. You show genuine interest in the other person. You don't say "You're wrong". You never even say your own beliefs (unless asked). There's nothing for the person to get angry at because you never attacked them. Instead of criticizing, you point out errors indirectly, via a joke. You cheer them on as they dig their own grave. After all, you're trying to lose too.

This is something that allows you to persuade people. If you have more background knowledge about something and can say something that’d make the person you’re talking to think you pointed out a flaw/a hole in their understanding of the issue, they might defer to you, thinking you’re smarter and that you’re helping. If, instead of asking “what do you think? why do you think that?” and letting the person think on their own, you ask questions that communicate your understanding, then I’m not sure this actually improves their thinking or even allows them to arrive at truer beliefs in a systematic way.

If your beliefs are false, they’ll update to your false beliefs; if your models are incomplete, they’ll believe in these incomplete models and won’t start seeing holes in them.

In the second video, you didn’t ask the person where the money comes from, where it goes, who’s better off, and who’s worse off; they didn’t try to draw any schemes and figure this out for themselves. Instead, they listened to you and agreed with what you communicated to them. They didn’t have the thought that if someone builds a cable, they must expect profits to cover the cost, despite someone else possibly trying to build a cable; they didn’t think that the money going into building a cable doesn’t disappear; it remains in the economy, through wages and the costs of everything paid to everyone involved; the actual resources humanity spends on a cable are perhaps some fuel, some amount of material, and human time. Was it unethical to spend these resources that way? What does “unethical” even mean here? Was someone hurt during the construction? Did people decide to get a worker’s job instead of doing art? What about trading itself: what are the positive and negative externalities, what are the resources spent by humanity as a whole? What is the pot everyone competes for? Are they spending more resources to compete for it than the pot contains, or are they just eating all the free money on the table? Do they provide something valuable to the market, getting this pot in return? (Perhaps liquidity or a lot of slightly more up-to-date information?)

I have no idea how any of this works, but to me it looked like you made your arguments in a persuasive way, and my impression is that the conversation you had in the second video didn’t really improve the general thinking/rationality skills of the person you were talking to.

Comment by Mikhail Samin (mikhail-samin) on Every "Every Bay Area House Party" Bay Area House Party · 2024-02-18T07:53:13.528Z · LW · GW

There should be a party inspired by this post

Comment by Mikhail Samin (mikhail-samin) on Believing In · 2024-02-10T20:14:52.424Z · LW · GW

Interesting. My native language has the same “believe [something is true]”/“believe in [something]”, though people don’t say “I believe in [an idea]” very often; and what you describe is pretty different from how this feels from the inside. I can’t imagine listing something of value when I’m asked to give examples of my beliefs.

I think when I say “I believe in you”, it doesn’t have the connotation of “I think it’s good that you exist”/“investing resources in what you’re doing is good”/etc.; it feels like “I believe you will succeed at what you’re aiming for, by default, on the current trajectory”, and it doesn’t feel related to the notion of it making sense to support them or invest additional resources in them.

It feels a lot more like “if I were to bet on you succeeding, that would’ve been a good bet”, as a way to communicate my belief in their chances of success. I think it’s similar for projects.

Generally, “I believe in” is often more of “I think it is true/good/will succeed” for me, without a suggestion of willingness to additionally help or support in some way, and without the notion of additional investment in it being a good thing necessarily. (It might also serve to communicate a common value, but I don’t recall using it this way myself.)

“I believe in god” parses as “I believe god exists”, though maybe there’s a bit of a disconnect due to people being used to saying “I believe in god” to signal identity, to say the answer a teacher expects, etc., and to believing in that belief, usually without it being connected to experience-anticipation.

I imagine “believe in” is some combination of something being a part of the belief system and a shorthand for a specific thing that might be valuable to communicate, in the current context, about beliefs or values.

Separately from what these words are used for, there’s something similar to some of what you’re talking about happening in the mind, but for me, it seems entirely disconnected from the notion of believing

Comment by Mikhail Samin (mikhail-samin) on Manifold Markets · 2024-02-03T17:29:29.563Z · LW · GW

Oops! ok!

Comment by Mikhail Samin (mikhail-samin) on Manifold Markets · 2024-02-02T20:32:58.072Z · LW · GW

Since Manifold uses play money, it costs them nothing to subsidize the market maker

IIRC, the market maker is subsidised by the market creator (M$50 of the cost of creating the market goes to the automated market maker)

amount of liquidity it provides increases as trading increases

I'm not sure, but I think this is not exactly true; if 50 people bet M$10 Yes at 50% and 50 people bet M$10 No at 50%, a new trade will move the market just like the first trade would, with the original M$50 in liquidity
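
A toy sketch of that intuition, assuming a plain constant-product market maker (this is not Manifold's exact mechanism; as far as I know, Maniswap is a weighted variant, so treat the numbers as illustrative only): bets move the pool along the same invariant curve, so balanced volume alone neither raises the liquidity constant nor dampens the price impact of a fixed-size bet.

```python
def price_yes(y, n):
    """Implied P(YES) for a pool holding y YES shares and n NO shares."""
    return n / (y + n)

def bet(y, n, m, on_yes):
    """Bet m on one side of a constant-product pool: m goes into the opposite
    side's pool and shares are paid out so that y * n stays constant."""
    k = y * n
    if on_yes:
        n += m
        y = k / n
    else:
        y += m
        n = k / y
    return y, n

# Market seeded with the M$50 of creator liquidity, price 50%.
y, n = 50.0, 50.0
k0 = y * n

# Price impact of a single M$10 YES bet on the fresh market.
y1, n1 = bet(y, n, 10, on_yes=True)
print(f"impact on fresh market: {price_yes(y1, n1) - price_yes(y, n):+.3f}")

# 50 M$10 YES bets and 50 M$10 NO bets, alternating (price hovers around 50%).
for _ in range(50):
    y, n = bet(y, n, 10, on_yes=True)
    y, n = bet(y, n, 10, on_yes=False)

print(f"liquidity constant before: {k0:.0f}, after: {y * n:.0f}")  # unchanged

# The same M$10 YES bet now moves the price by roughly the same amount
# (the small difference is only because the post-volume price isn't exactly 50%).
y2, n2 = bet(y, n, 10, on_yes=True)
print(f"impact after volume:    {price_yes(y2, n2) - price_yes(y, n):+.3f}")
```

In this model, only explicit liquidity injections change the constant y · n, which is what sets the market's depth at any given price; trading volume by itself doesn't.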

Comment by Mikhail Samin (mikhail-samin) on A central AI alignment problem: capabilities generalization, and the sharp left turn · 2024-01-13T19:02:39.899Z · LW · GW

Sharp Left Turn: a more important problem (and a more specific threat model) than people usually think

The sharp left turn is not a simple observation that we've seen capabilities generalise more than alignment. As I understand it, it is a more mechanistic understanding that some people at MIRI have, of dynamics that might produce systems with generalised capabilities but not alignment.

Many times over the past year, I've been surprised by people in the field who've read Nate's post but somehow completely missed the part where it talks about specific dynamics that lead to alignment properties breaking during capabilities generalisation. To fulfil my reviewing duty and to have a place to point people to, I'll try to write down some related intuitions that I talked about throughout 2023 when trying to get people to build intuitions about what the sharp left turn problem is about.

For example, imagine training a neural network with RL. For a while during training, the neural network might be implementing a fuzzy collection of algorithms and various heuristics that together kinda optimise for some goals. The gradient strongly points towards greater capabilities. Some of these algorithms and heuristics might be more useful for the task the neural network is being evaluated on, so they'll persist more, and what the neural network is doing as a whole will look a bit more like what the most helpful parts of it are doing.

Some of these algorithms and heuristics might be more agentic and do more for long-term goal achievement than others. As being better at achieving goals correlates with greater performance, the neural network becomes, as a whole, more capable of achieving goals. Or, maybe the transition that leads to capabilities generalisation can be more akin to grokking: even with a fuzzy solution, the distant general coherent agent implementations might still be visible to the gradient, and at some point, there might be a switch from a fuzzy collection of things together kind of optimising for some goals into a coherent agent optimising for some goals.

In any case, there's this strong gradient pointing towards capabilities generalisation.

The issue is that a more coherent and more agentic solution might have goals different from what the fuzzier solution had been achieving and still perform better. The goal-contents of the coherent agent are stored in a way different from how a fuzzier solution had stored the stuff it had kind of optimised for. This means that the gradient points towards the architecture that implements a more general and coherent agent; but it doesn't point towards the kind of agent that has the same goals the current fuzzy solution has; alignment properties of the current fuzzy solution don't influence the goals of a more coherent agent the gradient points towards.

It is also likely that the components of the fuzzy solution undergo optimisation pressure, which means that the whole thing grows in the direction of components that can outcompete others. If a component is slightly better at agency, at situational awareness, etc., that might mean it gets to make the whole thing slightly more like itself after an optimisation step. The goals these components get could be quite different from what they, together, were kind of optimising for. That means that the whole thing changes and grows towards the parts of it with different goals. So, at the point where some parts of the fuzzy solution are near being generally smart and agentic, they might get increasingly smart and agentic, causing the whole system to transform into something with more general capabilities but without the gradient also pointing towards the preservation of the goals/alignment properties of the system.

I haven't worked on this problem and don't understand it well; but I think it is a real and important problem, and so I'm sad that many haven't read this post or only skimmed through it or read it but still didn't understand what it's talking about. It could be that it's hard to communicate the problem (maybe intuitions around optimisation are non-native to many?); it could be that not enough resources were spent on optimising the post for communicating the problem well; it could be that the post tried hard not to communicate something related; or it could be that for a general LessWrong reader, it's not a well-written post.

Even if this post failed to communicate its ideas to its target audience, I still believe it is one of the most important LessWrong posts in 2022 and contributed something new and important to the core of our understanding of the AI alignment problem.

Comment by Mikhail Samin (mikhail-samin) on Terminology: <something>-ware for ML? · 2024-01-04T22:02:48.381Z · LW · GW

Groware/grownware? (Because it’s “grown”, as it’s now popular to describe it)

Comment by Mikhail Samin (mikhail-samin) on A case for AI alignment being difficult · 2024-01-03T10:30:03.441Z · LW · GW

My comment was a reply to a comment on ITT. I made it in the hope someone would be up for the bet. I didn’t say I disagree with the OP's claims on alignment; I said I don’t think they’d be able to pass an ITT. I didn’t want to talk about specifics of what the OP doesn’t seem to understand about Yudkowsky’s views, as the OP could then reread some of what Yudkowsky’s written more carefully, and potentially make it harder for me to distinguish them in an ITT.

I’m sorry if it seemed disparaging.

The comment explained what I disagree with in the post: the claim that the OP would be good at passing an ITT. It wasn’t intended as being negative about the OP, as, indeed, I think 20 people is the right order of magnitude for the number of people who’d be substantially better at it, which is the bar of being in the top 0.00000025% of Earth's population at this specific thing. (I wouldn’t claim I’d pass that bar.)

If people don’t want to do any sort of betting, I’d be up for a dialogue on what I think Yudkowsky thinks that would contradict some of what’s written in the post, but I don’t want to spend >0.5h on a comment no one will read

Comment by Mikhail Samin (mikhail-samin) on A case for AI alignment being difficult · 2024-01-02T20:24:11.629Z · LW · GW

I know what an ITT is. I mean understanding Yudkowsky’s models, not reproducing his writing style. I was surprised to see this post in my mailbox, and I updated negatively about MIRI when I saw that the OP was a research fellow there, as I didn’t previously expect that some people at MIRI misjudge their level of understanding of Yudkowsky’s models.

There’s one interesting thought in this post that I don’t remember actively having in a similar format until reading it (that predictive models might get agency from having to achieve results with their cognition), but generally, I think both this post and, e.g., a linked short story have a flaw I’d expect people who’ve read the metaethics sequence to notice, and I don’t expect people to pass the ITT if they can write a post like this.

Comment by Mikhail Samin (mikhail-samin) on A case for AI alignment being difficult · 2024-01-02T12:32:37.697Z · LW · GW

Unless you’re making a lot of intentional simplifications in this post, I’d be happy to bet up to $10k at 1:1 odds that I’d be able to distinguish you posing as Yudkowsky from Yudkowsky in an ITT.

Comment by Mikhail Samin (mikhail-samin) on NYT is suing OpenAI&Microsoft for alleged copyright infringement; some quick thoughts · 2023-12-27T21:45:02.696Z · LW · GW

I guess NYT spits out unpaywalled articles to search engines (to get clicks, expecting that search engines’ users won’t have access to the full texts), but getting unpaywalled HTML doesn’t mean you can use it however you want. OpenAI did not negotiate the terms prior to scraping NYT, according to the lawsuit. I believe the NYT terms prohibit commercial use without acquiring a license; I think the lawsuit mentioned a price along the lines of a standard cost of $10 per article if you want to circulate it internally in your company.

Comment by Mikhail Samin (mikhail-samin) on NYT is suing OpenAI&Microsoft for alleged copyright infringement; some quick thoughts · 2023-12-27T21:32:41.281Z · LW · GW

Humans can’t learn from any materials that NYT has published without paying NYT or otherwise getting permission, as NYT articles are usually paywalled. NYT, in my opinion, should have the right to restrict commercial use of the work they own.

The current question isn’t whether digital people are allowed to look at something and learn from it the way humans are allowed to; the current question is whether for-profit AI companies can use copyrighted human work to create arrays of numbers that represent the work process behind the copyrighted material and the material itself by changing these numbers to increase the likelihood of specific operations on them producing the copyrighted material. These AI companies then use these extracted work processes to compete with the original possessors of these processes. [To be clear, I believe that further refinement of these numbers to make something that also successfully achieves long-term goals is likely to lead to no human or digital consciousness existing or learning or doing anything of value (even if we embrace some pretty cosmopolitan views, see https://moratorium.ai for my reasoning on this), which might bias me towards wanting regulation that prevents big labs from achieving ASI until safety is solved, especially with policies that support innovation, startups, etc., anything that has benefits without risking the existence of our civilisation.]

Comment by Mikhail Samin (mikhail-samin) on Some quick thoughts on "AI is easy to control" · 2023-12-08T01:06:40.808Z · LW · GW

A specialised AI can speed up Infra-Bayesianism by the same amount random mathematicians can, by proving theorems and solving some math problems. A specialised AI can’t actually understand the goals of the research and contribute to the parts that require the hardest kind of human thinking. There’s a requirement for some amount of problem-solving of the kind the hardest human thinking produces to go into the problem. I claim that if a system can output enough of that kind of thinking to meaningfully contribute, then it’s going to be smart enough to be dangerous. I further claim that there’s a number of hours of complicated human thought such that making a safe system that can output work corresponding to that number in less than, e.g., 20 years requires at least that number of hours of complicated human thought. Safely getting enough productivity out of these systems for it to matter is impossible IMO. If you think a system can solve specific problems, then please outline these problems (what is the hardest problem you expect to be able to safely solve with your system?), say how fast the system is going to solve it, and say how many people will be supervising its “thoughts”. Even putting aside object-level problems with these approaches, this seems pretty much hopeless.

Comment by Mikhail Samin (mikhail-samin) on Some quick thoughts on "AI is easy to control" · 2023-12-06T09:23:55.337Z · LW · GW

Yep, I agree

Comment by Mikhail Samin (mikhail-samin) on Some quick thoughts on "AI is easy to control" · 2023-12-06T03:07:48.191Z · LW · GW

Thanks for the comment!

any plan that looks like "some people build a system that they believe to be a CEV-aligned superintelligence and tell it to seize control"

People shouldn’t be doing anything like that; I’m saying that if there is actually a CEV-aligned superintelligence, then this is a good thing. Would you disagree?

what exactly you mean by the terms "white-box" and "optimizing for"

I agree with “Evolution optimized humans to be reproductively successful, but despite that humans do not optimize for inclusive genetic fitness”, and the point I was making was that the stuff that humans do optimize for is similar to the stuff other humans optimize for. Were you confused by what I said in the post or are you just suggesting a better wording?

Comment by Mikhail Samin (mikhail-samin) on Speaking to Congressional staffers about AI risk · 2023-12-05T23:18:37.560Z · LW · GW

It's great to see this being publicly posted!

Comment by Mikhail Samin (mikhail-samin) on Shallow review of live agendas in alignment & safety · 2023-12-04T12:07:25.263Z · LW · GW

try to formalise a more realistic agent, understand what it means for it to be aligned with us, […], and produce desiderata for a training setup that points at coherent AGIs similar to our model of an aligned agent.

Finally, people are writing good summaries of the learning-theoretic agenda!

Comment by Mikhail Samin (mikhail-samin) on Causal Diagrams and Causal Models · 2023-11-28T00:06:39.529Z · LW · GW

I don’t really get how this can be true for some values of x but not others if the variable is binary

Comment by Mikhail Samin (mikhail-samin) on Causal Diagrams and Causal Models · 2023-11-26T23:11:14.845Z · LW · GW

I think I don’t buy the story of a correct causal structure generating the data here in a way that supports the point of the post. If two variables, I and O, both make one value of E more likely than the other, that means the probability of I conditional on some value of E is different from the probability of I, because I explains some of that value of E; but if you also know O, then O explains some of that value of E as well, and so P(I|E=x, O) should be different.

The post describes this example:

This may seem a bit clearer by considering the scenario B->A<-E, where burglars and earthquakes both cause alarms. If we're told the value of the bottom node, that there was an alarm, the probability of there being a burglar is not independent of whether we're told there was an earthquake - the two top nodes are not conditionally independent once we condition on the bottom node

And if you apply this procedure to “not exercising”, you don’t see that lack of conditional independence once you condition on the bottom node; the top nodes stay (approximately) conditionally independent. Which means that, given “not exercising”, Internet is not at all explained away by being overweight (or vice versa).

Comment by Mikhail Samin (mikhail-samin) on Causal Diagrams and Causal Models · 2023-11-25T19:20:25.153Z · LW · GW

The point is, these probabilities don’t really correspond to that causal graph in the way described in the post. A script that simulates the causal graph: https://colab.research.google.com/drive/18pIMfKJpvlOZ213APeFrHNiqKiS5B5ve?usp=sharing
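
For readers who don't want to open the link, here's a minimal sketch of the same kind of check (with illustrative parameters of my own, not the linked script's): data actually generated from the collider O → E ← I, with Overweight and Internet both making Exercise less likely, does show explaining away once you condition on not exercising, which is exactly what the post's numbers fail to show.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000

# Sample from the collider structure O -> E <- I with independent causes.
O = rng.random(N) < 0.3                  # Overweight
I = rng.random(N) < 0.2                  # Internet
p_E = 0.9 - 0.4 * O - 0.4 * I            # both causes make Exercise less likely
E = rng.random(N) < p_E

not_E = ~E
print(f"P(O | not E)    = {O[not_E].mean():.3f}")      # ≈ 0.58
print(f"P(O | not E, I) = {O[not_E & I].mean():.3f}")  # ≈ 0.44: explaining away
```

In data that really comes from that graph, learning that the person has Internet makes Overweight noticeably less likely given that they don't exercise; in the post's numbers, it changes almost nothing.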

Comment by Mikhail Samin (mikhail-samin) on It's OK to eat shrimp: EAs Make Invalid Inferences About Fish Qualia and Moral Patienthood · 2023-11-14T00:21:30.864Z · LW · GW

The justification that I've heard for that position wouldn't make the statement better; I'd be able to pass an ITT for the specific person who told me it, and I understand why it is wrong. I consider the mistake they're making and the mistake Rethink Priorities are making to be the same and I try to make an argument why in the post.

I'm separately pretty sure the evolutionary reasons for qualia didn't exist in fish evolution (added this to the post, thanks!). From my experience talking to a couple of EAs about this, they agreed with some correlations enough to consider a suggested experiment to be a crux, and I'm pretty certain about the result of the experiment and think they're wrong for reasons described in the post.

It's not obvious how to figure out the priors here, but my point is that people update on things that aren't valid evidence. The hope is that people will spend their resources more effectively after correctly considering shrimp welfare to be orders of magnitude less important and deprioritizing it. Maybe they'll still avoid eating shrimp because they don't have intuitions about evolutionary reasons for qualia similar to mine, but that seems less important to me than reducing as much actual suffering as possible, other things being equal.

Comment by Mikhail Samin (mikhail-samin) on It's OK to eat shrimp: EAs Make Invalid Inferences About Fish Qualia and Moral Patienthood · 2023-11-13T23:24:16.962Z · LW · GW

Oops, English! Thanks

Comment by Mikhail Samin (mikhail-samin) on It's OK to eat shrimp: EAs Make Invalid Inferences About Fish Qualia and Moral Patienthood · 2023-11-13T20:59:43.227Z · LW · GW

I appreciate this comment.

Qualia (IMO) certainly is "information processing": there are inputs and outputs. And it is a part of a larger information-processing thing, the brain. What I'm saying is that there's information processing happening outside of the qualia circuits, and some of the results of the information processing outside of the qualia circuits are inputs to our qualia. 

I think it's likely that even simple "RL algorithms" might have a very limited, very shallow, non-self-aware kinds of experience: an image-classifier is doing visual-information-processing, so it probably also produces isolated "experiences of vision"

Well, how do you know that visual information processing produces qualia? You can match the algorithms implemented by other humans' brains to the algorithms implemented by your brain, because all of you talk about subjective experience; how do you, inside your neural circuitry, make the inference that a similar thing happens in neurons that just process visual information?

You know you have subjective experience, self-evidently. You can match the computation run by the neural circuitry of your brain to the computation run by the neural circuitry of other humans: because they talk about subjective experience, you can expect this to be caused by similar computation. This is valid. Thinking that visual information processing is part of what makes qualia (i.e., that there's no way to replace a bunch of your neurons with something that outputs the same stuff without first seeing and processing something, such that you'll experience seeing as before) is something you can make theories about, but it is not a valid inference: you don't have a way of matching the computation of qualia to the whole of your brain.

And how can you match it to matrix multiplications that don't talk about qualia, did not have evolutionary reasons for experience, etc.? Do you think an untrained or a small convolutional neural network experiences images to some extent, or only a large and trained one? Where does that expectation come from?

I'm not saying that qualia is solved. We don't yet know how to build it, and we can't yet scan brains and say which circuits implement it. But some people seem more confused than warranted, and they spend resources less effectively than they could've.

And I'm not equating qualia to a self-model. Qualia is just the experience of information. It doesn't require a self-model, though on Earth, so far, I expect these things to have been correlated.

If there's suffering and experience of extreme pain, in my opinion, it matters even if there isn't reflectivity.

Comment by Mikhail Samin (mikhail-samin) on It's OK to eat shrimp: EAs Make Invalid Inferences About Fish Qualia and Moral Patienthood · 2023-11-13T20:48:37.772Z · LW · GW

Some people expressed a reaction of scepticism over this:

assigned double-digit probabilities to bacteria having qualia and said they wouldn't be surprised if a balloon flying through a gradient of air experiences pain

Here's something from a comment on the EA Forum:

I think it's likely that even simple "RL algorithms" might have a very limited, very shallow, non-self-aware kinds of experience: an image-classifier is doing visual-information-processing, so it probably also produces isolated "experiences of vision"

Not sure if they expect a small CNN to possess qualia (and do they then think that when physics makes essentially equivalent matrix multiplications to compute rocks, there are a lot of qualia of random visions in rocks?), but maybe it's easy to underestimate how confused many people are about all that stuff

Comment by Mikhail Samin (mikhail-samin) on It's OK to eat shrimp: EAs Make Invalid Inferences About Fish Qualia and Moral Patienthood · 2023-11-13T19:16:15.271Z · LW · GW

Both (modeling stuff about others by reusing circuits for modeling stuff about yourself without having experience; and having experience without modelling others similarly to yourself) are possible, and the reason why I think the suggested experiment would provide indirect evidence is related to the evolutionary role I consider qualia to possibly play. It wouldn't be extremely strong evidence and certainly wouldn't be proof, but it'd be enough evidence for me to stop eating fish that has these things.

The studies about optimistic/pessimistic behaviour tell us nothing about whether these things experience optimism/pessimism, as they are an adaptation an RL algorithm would implement without the need to implement circuits that would also experience these things, unless you can provide a story for why circuitry for experience is beneficial or a natural side effect of something beneficial.

One of the points of the post is that any evidence we can have, except for what we have about humans, would be indirect, and people call things evidence for confused reasons. Pain-related behaviour is something you'd see in neural networks trained with RL, because it's good to avoid pain, and you need a good explanation for how exactly it can be evidence of qualia.

(Copied from EA Forum)

Comment by Mikhail Samin (mikhail-samin) on Causal Diagrams and Causal Models · 2023-10-25T21:48:49.012Z · LW · GW

I think I spotted a mistake. I'm surprised no one noticed it earlier.

Eliezer says the data shows that Overweight and Internet both make exercise less likely.

That would imply that P(O|I & not E) should be less than P(O|not E); and that P(I|O & not E) should be less than P(I|not E). But actually, they're approximately equal![1]

P(not E) = 0.6039286373
P(not E & O) = 0.1683130609
P(not E & I) = 0.09519044713
P(I & O & not E) = 0.02630682544
P(I | not E) = 0.1576187007
P(I | not E & O) = 0.1562969938
P(O | not E) = 0.2786969362
P(O | not E & I) = 0.2763599314

(From the current data, P(O|E & I) is less than P(O|E): 5% vs 10%; P(I|E & O) is less than P(I|E): 6% vs 11%; which is what you'd expect from Overweight and Internet both making exercise less likely, but it's not all that O->E<-I implies.)

Edit: note that inferring the causal graph exactly the way it's done in the alarms/recessions/burglars example fails here due to the extra conditional independences. The data is generated in a way that doesn't show a correspondence to this graph if you follow a procedure identical to the one described in the post
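
(For reference, the conditional probabilities above are just ratios of the joint ones; checking the arithmetic:)

```python
# Conditional probabilities recomputed from the joint probabilities listed above.
p_notE     = 0.6039286373
p_notE_O   = 0.1683130609
p_notE_I   = 0.09519044713
p_I_O_notE = 0.02630682544

print(p_notE_I / p_notE)      # P(I | not E)     ≈ 0.1576
print(p_I_O_notE / p_notE_O)  # P(I | not E & O) ≈ 0.1563
print(p_notE_O / p_notE)      # P(O | not E)     ≈ 0.2787
print(p_I_O_notE / p_notE_I)  # P(O | not E & I) ≈ 0.2764
```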

Comment by Mikhail Samin (mikhail-samin) on Alignment Implications of LLM Successes: a Debate in One Act · 2023-10-23T22:55:27.760Z · LW · GW

I haven't seen a clear explanation that I'd expect to change your mind. Some people understood the dynamic the claim refers to after reading the sharp left turn post (but most people I talked to didn't, although I'm not sure whether they actually read it or just skimmed through it). If you haven't read it, it might be worth a try.

Comment by Mikhail Samin (mikhail-samin) on Alignment Implications of LLM Successes: a Debate in One Act · 2023-10-23T01:49:00.061Z · LW · GW

After skimming through this post, I thought the OP wouldn't pass an ITT for one of the perspectives he tried to represent in the post, so I was surprised to see it in my mailbox later. The contrast is between a more imaginary position and a less imaginary position. E.g., I would be happy to bet that Yudkowsky or Soares, when asked, will say they wouldn't be mentioning the fact that “capabilities generalize further than alignment” in the context it was mentioned in. We can discuss the odds if you're potentially interested. I don't think anyone even remotely close to understanding this fact would use it in that context.

Comment by Mikhail Samin (mikhail-samin) on RSPs are pauses done right · 2023-10-15T03:20:57.228Z · LW · GW

If the model is smart enough, you die before writing the evals report; if it’s just kinda smart, you don’t find it to be too intelligent and die after launching your scalable oversight system that, as a whole, is smarter than individual models.

An international moratorium on all training runs that could stumble on something that might kill everyone is much more robust than regulations around evaluated capabilities of already trained models

Edit: Huh. I would’ve considered the above to be relatively uncontroversial on LessWrong. Can someone explain where I’m wrong?

Comment by Mikhail Samin (mikhail-samin) on AI pause/governance advocacy might be net-negative, especially without focus on explaining the x-risk · 2023-09-11T12:06:10.058Z · LW · GW

Back in January, Michael Cohen talked at the House of Commons about the possibility of AI killing everyone. At this point, when policymakers want to understand the problem and turn to you, downplaying x-risk doesn't make them listen to you more; it makes them less worried and more dismissive. I think a lot of AI governance people/think tanks haven't updated on this.