Posts

Roman Malov's Shortform 2024-12-19T21:14:54.805Z
Visual demonstration of Optimizer's curse 2024-11-30T19:34:07.700Z

Comments

Comment by Roman Malov on Roman Malov's Shortform · 2024-12-19T21:14:55.985Z · LW · GW

I recently prepared an overview lecture about research directions in AI alignment for the Moscow AI Safety Hub. I had limited time, so I did the following: I reviewed all the sites on the AI safety map, examined the 'research' sections, and attempted to classify the problems they tackle and the research paths they pursue. I encountered difficulties in this process, partly because most sites lack a brief summary of their activities and objectives (Conjecture is one of the counterexamples). I believe that the field of AI safety would greatly benefit from improved communication, and providing a brief summary of a research direction seems like low-hanging fruit.

Comment by Roman Malov on Visual demonstration of Optimizer's curse · 2024-12-13T22:03:38.002Z · LW · GW

So,  is a random variable in the sense that it is drawn from a distribution of functions, and the expected value of those functions at each point  is equal to . Am I understanding you correctly?
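
In case it helps make this concrete, here is a minimal sketch of the setup I have in mind (my own illustration, with made-up numbers, not the post's notation): each option gets an unbiased noisy estimate, yet the estimate of the option chosen by maximizing is systematically too high.

```python
import numpy as np

rng = np.random.default_rng(0)

true_values = np.linspace(0.0, 1.0, 10)   # true value of each option
n_trials = 10_000

selected_true = np.empty(n_trials)
selected_estimate = np.empty(n_trials)

for t in range(n_trials):
    # Unbiased noisy estimates: E[estimate] equals the true value for every option.
    estimates = true_values + rng.normal(0.0, 0.5, size=true_values.shape)
    best = np.argmax(estimates)           # optimize over the estimates
    selected_true[t] = true_values[best]
    selected_estimate[t] = estimates[best]

# The estimate of the chosen option is biased upward even though each
# individual estimate is unbiased: that is the optimizer's curse.
print("mean estimate of chosen option:", selected_estimate.mean())
print("mean true value of chosen option:", selected_true.mean())
```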
 

Comment by Roman Malov on Deep Deceptiveness · 2024-11-04T22:37:49.703Z · LW · GW

I read it as part of the Agent Foundations course, and I found this post really effective and clarifying. It got me thinking: can this generalize to other failure modes? For example, if programmers notice that an AI spends too many resources on self-preservation and then train against that behavior, the failure mode would still arise, because self-preservation is an instrumental goal: it is a fact about the world and about the ways goals can be achieved in it.

Comment by Roman Malov on Hell is wasted on the evil · 2024-10-18T21:46:19.687Z · LW · GW

I'm not a native speaker; can someone please explain the meaning of "Hell is wasted on the evil" in simpler terms?

Comment by Roman Malov on [deleted post] 2024-09-01T21:03:31.184Z

Thank you, that seems to be the clarification I needed. It also reminded me of a good video that touches on the subject.

Comment by Roman Malov on [deleted post] 2024-09-01T19:46:23.824Z

Thanks for your answer, I will read the linked post.

I said in the text that I was going to try to convey the "process" in the comments, and I'll try to do that now.

all sophisticated-enough minds

I think the recursive buck is passed to the word "enough". You need some stratification of minds by sophistication, and a cutoff for when they reach an acceptable level of sophistication.

Comment by Roman Malov on [deleted post] 2024-09-01T19:28:10.428Z

So in a universe with only bosons (so the Pauli principle doesn't apply), everything is the Same?

When I imagine a room full of photons, I see a lot of things that can be Different: for example, the photons' coordinates, wavelengths, polarizations, and their number.

Or are you saying that the Pauli principle is sufficient, but not necessary?

Comment by Roman Malov on [deleted post] 2024-09-01T17:44:44.919Z

If you read further, you can see how this is also passing the recursive buck. 

You: "There are no clear separation between objects, I only use this to increase my utility function"

Me: "How are you deciding on where to stop dividing reality?"

You: "Well, I calculate my marginal utility from creating an additional concept and then Compare it to zer... ah, yeah, there is the recursive buck. It even capitalized as I said it."

So yeah, while this is a desirable point to stop at, the method still relies on your ability to Differentiate between the usefulness of two models, and as far as I can tell, in the end, we can only feel it.

Comment by Roman Malov on Chapter 91: Roles, Pt 2 · 2024-08-26T02:55:17.603Z · LW · GW

Sebz n gval fcbg ba gur raq bs Uneel'f jnaq, n phovp zvyyvzrgre bs napube, fgergpurq bhg n guva yvar bs Genafsvtherq fcvqre-fvyx.

sebz gur puncgre 114

Comment by Roman Malov on Chapter 90: Roles, Pt 1 · 2024-08-25T18:00:34.433Z · LW · GW

Or if I'd - if I'd only gone with - if, that night -

I'm guessing he is talking about the night he lost his potential phoenix.

Comment by Roman Malov on Chapter 89: Time Pressure, Pt 2 · 2024-08-25T17:45:44.343Z · LW · GW

I think that's an intentional choice by the author: what Harry saw was too terrible to acknowledge. Or maybe it's just to create more suspense.

Comment by Roman Malov on Chapter 27: Empathy · 2024-08-06T01:07:29.014Z · LW · GW

Snape told him that he wanted to check if Harry resembled his father, and the test consisted of stopping bullies, so that might be the reason for Harry's guess.

Comment by Roman Malov on Ilya Sutskever created a new AGI startup · 2024-06-19T20:36:45.814Z · LW · GW

safety always remains ahead

When was it ever ahead? I mean, to be sure that safety is ahead, you first need to make advances in safety comparable to those in capabilities. And to do that, you shouldn't be advancing capabilities.

Comment by Roman Malov on [Aspiration-based designs] Outlook: dealing with complexity · 2024-05-02T20:37:34.741Z · LW · GW

maybe you meant pairwise linearly independent (by looking at the graph)

Comment by Roman Malov on [Aspiration-based designs] Outlook: dealing with complexity · 2024-05-02T20:33:22.700Z · LW · GW

Pick  many linearly independent linear combinations  
Aren't there at most  linearly independent linear combinations of ?
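
For what it's worth, here is a quick numerical check of the general fact I have in mind (illustrative only; the dimensions are made up, not taken from the post): any collection of linear combinations of d vectors has rank at most d, so at most d of them can be linearly independent.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 4                                # number of base vectors
base = rng.normal(size=(d, 10))      # d vectors in a 10-dimensional space

# Take many more than d random linear combinations of the base vectors.
coeffs = rng.normal(size=(25, d))
combinations = coeffs @ base

# Their rank never exceeds d, so at most d of them are linearly independent.
print(np.linalg.matrix_rank(combinations))  # prints 4, i.e. at most d
```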

Comment by Roman Malov on My thoughts on the Beff Jezos - Connor Leahy debate · 2024-02-03T21:19:04.986Z · LW · GW

The current population size that Mars can support is 0, so even one person would be overpopulation. To complete the analogy: we are currently sending the entire population to Mars, and someone says, "But what about oxygen? We don't know if there's any on Mars; maybe we should work on spacesuits?" and another replies, "Nah, we'll figure it out when we get there."