Posts

Tapatakt's Shortform 2024-03-11T12:33:25.561Z
Should we cry "wolf"? 2023-02-18T11:24:17.799Z
AI Safety "Textbook". Test chapter. Orthogonality Thesis, Goodhart Law and Instrumental Convergency 2023-01-21T18:13:30.898Z
I (with the help of a few more people) am planning to create an introduction to AI Safety that a smart teenager can understand. What am I missing? 2022-11-14T16:12:22.760Z
I currently translate AGI-related texts to Russian. Is that useful? 2021-11-27T17:51:58.766Z

Comments

Comment by Tapatakt on All About Concave and Convex Agents · 2024-03-25T15:18:16.341Z · LW · GW

I think (not sure) the Kelly Criterion applies to you only if you are already concave
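A minimal numerical sketch of this point, assuming log utility as the "concave" case (the 60%-win, even-money bet and the function name are just illustrative assumptions, not from the post):

```python
import numpy as np

def optimal_bet_fraction(utility, p=0.6, b=1.0):
    """Grid-search the fraction of wealth to bet on a p-probability win paying b:1,
    maximizing the expected utility of the resulting wealth."""
    fs = np.linspace(0, 0.99, 1000)
    expected = p * utility(1 + b * fs) + (1 - p) * utility(1 - fs)
    return fs[np.argmax(expected)]

# Concave (log) utility recovers the usual Kelly fraction p - (1 - p)/b = 0.2.
print(optimal_bet_fraction(np.log))
# A linear (risk-neutral, non-concave) agent just bets as much as it is allowed to.
print(optimal_bet_fraction(lambda w: w))
```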

Comment by Tapatakt on AI #56: Blackwell That Ends Well · 2024-03-21T16:02:53.354Z · LW · GW

I second the request. It's a good meme if and only if it is true.

Comment by Tapatakt on 'Empiricism!' as Anti-Epistemology · 2024-03-20T12:07:24.816Z · LW · GW

> can't we just look at weights?

As I understand it, interpretability research isn't exactly stuck, but it's very, very, very far from something like this even for non-SotA models. And the gap is growing.

Comment by Tapatakt on 'Empiricism!' as Anti-Epistemology · 2024-03-19T17:50:49.794Z · LW · GW

> Which concept they might obtain by reading my book on Highly Advanced Epistemology 101 For Beginners, or maybe just my essay on Local Validity as a Key to Sanity and Civilization, I guess?"

Perhaps there should be two links here?

Comment by Tapatakt on 'Empiricism!' as Anti-Epistemology · 2024-03-19T17:18:20.198Z · LW · GW

Do you think that if someone filtered and steelmanned Quintin's criticism, it would be valuable? (No promises)

Comment by Tapatakt on 'Empiricism!' as Anti-Epistemology · 2024-03-19T17:16:19.315Z · LW · GW

I think from Eliezer's point of view it goes kinda like this: 

  1. People can't see why the other side's arguments are invalid.
  2. Eliezer tried to engage with them, but most listeners/readers can't tell who is right in these discussions.
  3. Eliezer thinks that if he provides people with strawmanned versions of the other side's arguments and refutations of those strawmanned arguments, then the chance that these people will see why he's right in the real discussion goes up.
  4. Eliezer writes this discussion with strawmen as a fictional parable because otherwise it would be either dishonest and rude or quite a boring text with a lot of disclaimers. Or because it's just easier for him to write it this way.

After reading this text, at least one person (you) thinks that the goal "avoid dishonesty and rudeness" was not achieved, so the text is a failure.

After reading this text, at least one person (me) thinks that (1) I got some useful ideas and models, and (2) of course at least the smartest opponents of Eliezer have better arguments, and I don't think Eliezer would disagree with that; so the text is a success.

Ideally, Eliezer should update his strategy of writing texts based on both pieces of evidence.

I can be wrong, of course.

Comment by Tapatakt on Tapatakt's Shortform · 2024-03-15T16:52:17.892Z · LW · GW

Is there any research on how we can change network structures or protocols to make it more difficult for a rogue AI to create and run distributed copies of itself?

Comment by Tapatakt on Tapatakt's Shortform · 2024-03-11T19:48:16.261Z · LW · GW

What if we just turned off the ability to use a reaction by clicking it in the list of already-used reactions? Yes, people would use them less, but more deliberately.

Comment by Tapatakt on Tapatakt's Shortform · 2024-03-11T12:33:25.716Z · LW · GW

The LessWrong reactions system creates the same bias as normal reactions: it's much, much easier to use a reaction someone has already used. So the first person to use a reaction under a comment gets undue influence on which reactions appear under that comment in the future.

Comment by Tapatakt on Complexity of value but not disvalue implies more focus on s-risk. Moral uncertainty and preference utilitarianism also do. · 2024-02-23T18:27:47.644Z · LW · GW

I think I'm at least close to agreeing, but even if it's like this now, it doesn't mean that the complex-positive-value optimizer can produce more value mass than the simple-negative-value optimizer.

Comment by Tapatakt on MonoPoly Restricted Trust · 2024-01-03T13:45:27.557Z · LW · GW

On the definition question, in addition to what localdeity wrote:

  1. I assume that on the first axis we consider the position "interested in two people" already pretty non-monogamous, while the position "my partner can have sexual or romantic relationships with anyone except one particular person" is still very poly. If that's the case, your position along the "interested in one"/"interested in many" axis can easily change if the set of people you know changes, even slightly. This position isn't a fact about you; it's more a fact about you and your options in your current environment. In contrast, your position along the "restricts partner"/"doesn't restrict partner" axis can't change much if your partner's environment changes slightly. So if we want a definition of something stable and identity-related using one axis, the second axis is better suited for this purpose.
  2. In practice (your experience may vary), the definition that uses the "interested in one"/"interested in many" axis invites kind of missing-the-point arguments from monogamous people, like "I barely have time for one partner". I think if the restriction-based definition were generally accepted, the discourse would be better.

Comment by Tapatakt on Why Yudkowsky is wrong about "covalently bonded equivalents of biology" · 2023-12-11T13:27:19.767Z · LW · GW

This totally makes sense! But "proteins are held together by van der Waals forces that are much weaker than covalent bonds" is still bad communication.

Comment by Tapatakt on Paper: "FDT in an evolutionary environment" · 2023-11-30T12:42:13.556Z · LW · GW

> As for 4 - even just remembering anything is a self modification of memory.

That's about humans, not abstract agents? I don't think it matters; we're talking about other self-modifications anyway.

> From your problem description

Not mine :)

> utility on other branches

Maybe this interpretation is what repels you? Here are another two:

  • You choose whether to behave like an EDT-agent or like an FDT-agent in the situations where it matters in advance, before you get into (1) or (3). And you can't legibly (for predictors like the one in this game) decide to behave like an FDT-agent and then, in the future, when you get into (1) because you're unlucky, just change your mind. It's just not an option. And between the options "legibly choose to behave like an EDT-agent" and "legibly choose to behave like an FDT-agent", the second one is clearly better in expectation. You just don't make another choice in (1) or (2); it's already decided.
  • If you find yourself in (1) or (2), you can't differentiate between the cases "I am the real me" and "I am the model of myself inside the predictor" (because if you could, you could behave differently in these two cases, and it would be a bad model and a bad predictor). So you decide for both at once. (This interpretation doesn't work well for agents with explicitly self-indicated values (or whatever that's called; I hope it's clear what I mean).)

> The earlier decision to precommit (whether actually made or later simulated/hallucinated) sacrifices utility of some future selves in exchange for greater utility to other future selves.

Yes. It's like choosing to win on a 1-5 die roll rather than winning on a 6. You sacrifice the utility of some future selves (in the worlds where the die rolls 6) in exchange for greater utility for other future selves, and it's perfectly rational.

> We can also construct more specific variants of 5 where FDT loses - such as environments where the message at step B is from an anti-Omega which punishes FDT like agents.

Ok, yes. You can do it with all other types of agents too.

> But naturally a powerful EDT agent will simply adopt that universal precommitment if when it believes it is in a universe distribution where doing so is optimal!

I think the ability to legibly adopt such a precommitment, and the willingness to do so, kinda turns an EDT-agent into an FDT-agent.

Comment by Tapatakt on Paper: "FDT in an evolutionary environment" · 2023-11-29T19:13:05.411Z · LW · GW

Well, yes, it loses in (1), but that's fine, because it wins in (4) and (5) and is on par with the EDT-agent in (3). (1) is not the full situation in this game; it's always a consequence of (3), (4) or (5), depending on interpretation. The rules don't make sense otherwise.

P.S. If an FDT-agent is suddenly teleported into situation (1) in place of some other agent by some powerful entity who can deceive the predictor, and the predictor predicted the behaviour of the other agent who was in the game before, and the FDT-agent knows all this, then it obviously takes the $1. Why not?

Comment by Tapatakt on Paper: "FDT in an evolutionary environment" · 2023-11-29T18:51:00.356Z · LW · GW

An FDT-agent obviously never decides "I will never ever take the $1 from the box". It decides "I will not take the $1 in the box if the rules of the situation I'm in are like <the rules of this game>".

Only it's more general, something like "When I realise that it would have been better if I had made some precommitment earlier, I act as I would have acted if I had actually made it" (not sure this phrasing is fully correct in all cases).

Comment by Tapatakt on Paper: "FDT in an evolutionary environment" · 2023-11-29T18:32:22.947Z · LW · GW

An EDT-agent in (5) ends up in (1) with 99% probability and in scenario (2) with 1% probability. It wins 99%*$1 + 1%*$100 = $1.99 in expectation.

An FDT-agent in (5) ends up in (1) with 1% probability and in scenario (2) with 99% probability. It wins 1%*$0 + 99%*$100 = $99 in expectation.

IMO, saying that the FDT-agent loses in (1) and is therefore inferior to the EDT-agent is like saying it's better to choose to roll a die that wins on 6 than one that wins on 1-5, because this option is better in the case where the die rolls a 6.

Under what exact set of alternate rules does the EDT-agent win more in expectation?
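A minimal sketch of the expected-value arithmetic above, with the probabilities and payoffs taken from this comment (the scenario numbers follow the thread; the function name is just illustrative):

```python
def expected_win(p_scenario_1: float, payoff_1: float, payoff_2: float) -> float:
    """Expected payoff if you land in scenario (1) with probability p and in (2) otherwise."""
    return p_scenario_1 * payoff_1 + (1 - p_scenario_1) * payoff_2

# EDT-agent: ends up in (1) 99% of the time and takes the $1 there.
print(expected_win(0.99, 1, 100))  # 1.99
# FDT-agent: ends up in (1) only 1% of the time and refuses the $1 there.
print(expected_win(0.01, 0, 100))  # 99.0
```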

Comment by Tapatakt on Paper: "FDT in an evolutionary environment" · 2023-11-29T15:14:52.771Z · LW · GW

FDT outperforms EDT on

4. You are about to observe one of [$1, $100] in a transparent box, but your action set doesn't include any self-modifications or precommitments.

5. You are about to observe one of [$1, $100] in a transparent box, but you don't know about it and will learn the rules of this game only when you already see the box.

Comment by Tapatakt on Paper: "FDT in an evolutionary environment" · 2023-11-29T15:10:00.770Z · LW · GW

If the best an EDT-agent can do is precommit to behaving like an FDT-agent or self-modify into an FDT-agent, it's weird to say that EDT is better :)

Comment by Tapatakt on Paper: "FDT in an evolutionary environment" · 2023-11-29T15:01:11.934Z · LW · GW

Parfit's Hitchhiker? Smoking Lesion?

Comment by Tapatakt on Paper: "FDT in an evolutionary environment" · 2023-11-29T14:56:59.165Z · LW · GW

Comment by Tapatakt on Could Germany have won World War I with high probability given the benefit of hindsight? · 2023-11-28T15:43:54.299Z · LW · GW

Do outcomes "Russia/France surrenders, France/Russia stays neutral or whitepeaces" count?

Comment by Tapatakt on Could World War I have been prevented given the benefit of hindsight? · 2023-11-28T13:43:45.806Z · LW · GW

Not the full solution, but some random thoughts:

  • Get in touch with some famous writers, tell them everything, let them write.
  • Tell the leaders that when you bomb cities, the morale of the enemy population goes up, not down. (IIRC there were hopes that if you bombed cities enough, the people in them would demand that their state surrender to end the war.)
  • Does game theory count as "post-1914 technology"?
  • Oh, and you need to stabilise and empower Russia somehow, so that the idea "attack Russia and just defend yourself on all other fronts" does not look promising for the Central Powers.

Comment by Tapatakt on Apocalypse insurance, and the hardline libertarian take on AI risk · 2023-11-28T13:14:38.958Z · LW · GW

I'm confused, like I'm always confused with hardline libertarianism. Why would companies agree to this? Who would put capabilities researchers in jail if they say, "I'd rather not purchase apocalypse insurance and will create AI anyway"? Why is this actor not a state by another name? What should I read to become less confused?

Comment by Tapatakt on AGI Ruin: A List of Lethalities · 2023-11-15T20:45:20.873Z · LW · GW

It's hard to guess, but it did happen: the only general intelligence known to us was created by a hill-climbing process.

Comment by Tapatakt on Comp Sci in 2027 (Short story by Eliezer Yudkowsky) · 2023-11-08T19:54:00.793Z · LW · GW

Russian Translation

Comment by Tapatakt on Vote on Interesting Disagreements · 2023-11-08T19:48:47.079Z · LW · GW

"Polyamory-as-a-default-option" would be a better social standard than "Monogamy-as-a-default-option".

Comment by Tapatakt on Vote on Interesting Disagreements · 2023-11-08T19:38:02.027Z · LW · GW

"Open-source LLM-based agent with hacking abilities starts spreading itself over the Internet because some user asked it to do so or to do something like to conquer the world" is a quite probable point-of-no-return regarding AGI risk.

Comment by Tapatakt on Eliezer Yudkowsky Is Frequently, Confidently, Egregiously Wrong · 2023-09-13T19:36:36.552Z · LW · GW

Pretty sure it's just false.

The first example I found: the latest post by EY

> And then I thought to myself, "This LK99 issue seems complicated enough that it'd be worth doing an actual Bayesian calculation on it"--a rare thought; I don't think I've done an actual explicit numerical Bayesian update in at least a year.
>
> In the process of trying to set up an explicit calculation, I realized I felt very unsure about some critically important quantities, to the point where it no longer seemed worth trying to do the calculation with numbers.  This is the System Working As Intended.

Comment by Tapatakt on Eliezer Yudkowsky Is Frequently, Confidently, Egregiously Wrong · 2023-09-13T19:27:43.369Z · LW · GW

In our world, my laptop doesn't fall because there is a table under it. In another world, the Flying Spaghetti Monster holds my laptop. And the FSM also sends light into the eyes of the other-world version of me, so I think there is a table. And the FSM copies all the other causal effects which are caused by the table in our world. This other world is imaginable; therefore, the table is non-physical. What exactly makes this a bad analogy with your line of thought?

Comment by Tapatakt on AI romantic partners will harm society if they go unregulated · 2023-08-01T17:39:49.937Z · LW · GW

Epistemic status: I'm not sure these assumptions are really here; I am pretty sure what my current opinion about these assumptions is, but I admit that this opinion can change.

Assumptions I don't buy:

  • Having kids when we might have AGI in 10-25 years is good and not, actually, very evil (OMG, what are you doing).
  • The right social incentives can't make A LOT of people poly pretty fast.

Comment by Tapatakt on Elon Musk announces xAI · 2023-07-13T14:31:22.622Z · LW · GW

I think that for all definitions of "curiosity" that make sense (that aren't like "we just use this word to refer to something completely unrelated to what people usually understand by it"), a maximally curious AI kills us, so it doesn't matter how curiosity is defined in the RL literature.

Comment by Tapatakt on Elon Musk announces xAI · 2023-07-13T14:03:06.049Z · LW · GW

I think the last sentence kinda misses the point, but in general I agree. Why all these downvotes?

Comment by Tapatakt on Elon Musk announces xAI · 2023-07-13T10:46:30.547Z · LW · GW

Humanity is maybe more "interesting" than nothing, but is it more "interesting" than anything that isn't humanity and could be assembled from the same matter? Definitely not!

Comment by Tapatakt on Snake Eyes Paradox · 2023-06-11T17:15:33.409Z · LW · GW

It depends.

If the croupier chooses the players, then the players learn that they were chosen, then the croupier rolls the dice, then the players either get a bajillion dollars each or die, then (if not snake eyes) the croupier chooses the next players, and so on: the answer is 1/36.

If the croupier chooses the players, then rolls the dice, then (if not snake eyes) chooses the next players, and so on, and the players learn that they were chosen only when the dice come up snake eyes, at which point the last players die and all the other players get a bajillion dollars each: the answer is about 1/2.
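A rough Monte Carlo sketch of the second setup, assuming the standard Snake Eyes structure where each round's group of players is double the previous one (the doubling and the round cap are my assumptions, not spelled out above). The per-roll chance of snake eyes is 1/36, yet the average fraction of chosen players who end up in the dying group is about 1/2, because the last group is roughly as large as all previous groups combined:

```python
import random

def play_game(max_rounds=200):
    """One game: group sizes double each round; snake eyes (prob 1/36) ends the game."""
    group, total = 1, 0
    for _ in range(max_rounds):
        total += group
        if random.randint(1, 6) == 1 and random.randint(1, 6) == 1:  # snake eyes
            return group, total  # the current group dies; everyone chosen before them wins
        group *= 2
    return 0, total  # game not finished within the cap (rare); skip it

fractions = []
for _ in range(100_000):
    dead, total = play_game()
    if dead:
        fractions.append(dead / total)

print(sum(fractions) / len(fractions))  # ~0.52, i.e. "about 1/2", not 1/36
```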

Comment by Tapatakt on Launching Lightspeed Grants (Apply by July 6th) · 2023-06-07T20:47:00.798Z · LW · GW

Interesting!

A couple of questions:

  1. You have a clause about China and India, but not about Russia. So, Russia is OK? (Among other things, in Russia, it is difficult to receive money from abroad: many banks are disconnected from SWIFT, some of the rest have stopped working with transactions from the US on their own initiative, and there is a chance that a Western bank will refuse to conduct a transaction to Russia. So the most reliable way is to have a trusted intermediary person with money in a bank account in Russia and a second bank account somewhere else.)
  2. I have already applied to the LTFF and am now waiting for an answer, with not very high hopes for success. Does it make sense to apply to Lightspeed and, if I receive a positive response from one fund, just cancel the application to the other one?

Comment by Tapatakt on Open Thread With Experimental Feature: Reactions · 2023-05-24T17:33:19.790Z · LW · GW

What does "I saw this" mean? "I already saw this in another place" or "I saw this comment, if it's important"? I think it needs clarification.

Comment by Tapatakt on Eliezer Yudkowsky’s Letter in Time Magazine · 2023-04-28T19:18:15.888Z · LW · GW

BTW, Done

Comment by Tapatakt on [deleted post] 2023-04-07T16:06:36.443Z

I know this is a classic, but I just came up with a more elegant variation, without another world.

Toxoplasma infection makes you more likely to pet a cat. You like petting cats, but you are very afraid of getting toxoplasmosis. You don't know if you are infected, but you know this particular cat is healthy, so you can't become infected by petting it. Should you pet this cat?

Comment by Tapatakt on Eliezer Yudkowsky’s Letter in Time Magazine · 2023-04-06T13:02:43.339Z · LW · GW

I'm Russian, and I think when I translate this, I will change "Russian" to "[other country's]". I'll feel safer that way.

Comment by Tapatakt on Eliezer Yudkowsky’s Letter in Time Magazine · 2023-04-06T12:59:26.412Z · LW · GW

(about "hostile")

https://ui.stampy.ai?state=6982_

https://ui.stampy.ai?state=897I_

And suddenly it seems Stampy has no answer for "Why is inner misalignment the default outcome?". But EY has said a lot about it; it's easy to find.

Comment by Tapatakt on Eliezer Yudkowsky’s Letter in Time Magazine · 2023-04-06T12:49:46.972Z · LW · GW

2. Depends on what you mean by "claims foom". As I understand it, EY now thinks that foom isn't necessary anyway; AGI can kill us before it.

4. "I doesn't like it" != "irrational and stupid and short sighted", you need arguments for why it isn't preferable in terms of the values of this systems

6, 7. "be ready to enforce a treaty" != "choose actions to kill billions of people living now".

Comment by Tapatakt on [New LW Feature] "Debates" · 2023-04-02T12:17:11.285Z · LW · GW

Peregrin/Periklynian/Suvinian Dialog!

(Seriously, some explicit distinction between "dialogue as collaboration", "dialogue as debate" and "dialogue as explanation" would be nice. Not necessary at all, but nice.)

Comment by Tapatakt on Pausing AI Developments Isn't Enough. We Need to Shut it All Down by Eliezer Yudkowsky · 2023-03-31T20:42:54.970Z · LW · GW

I translated this text into Russian

Comment by Tapatakt on Why does advanced AI want not to be shut down? · 2023-03-28T12:24:18.217Z · LW · GW

>A neural network is trained to optimize a loss function on input

No. The base optimizer optimizes a loss function on inputs through changes to the neural network. If the neural network itself starts to optimize something, it can easily be something in the outside world.

Neural network : loss :: humans : human values
Neural network : loss :: humans : inclusive genetic fitness
(Am I using this notation correctly?)

Comment by Tapatakt on Inverse Scaling Prize: Second Round Winners · 2023-03-14T17:56:31.508Z · LW · GW

It is not clear if this happened on its own, or if they deliberately trained the model not to make such mistakes.

Perhaps, in similar future studies, it is worth keeping half of the found tasks secret in order to test future models with them.

Comment by Tapatakt on GPT-4 · 2023-03-14T17:42:48.709Z · LW · GW

> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.

Wow, that's good, right?

Comment by Tapatakt on Remote AI Alignment Overhang? · 2023-03-12T17:28:47.405Z · LW · GW

Also 100% me. Yes, it would be in demand!

Comment by Tapatakt on Given one AI, why not more? · 2023-03-12T17:21:09.515Z · LW · GW

I think a perfect balance of power is very unlikely, so in practice only the most powerful (most likely the first-created) AGI will matter.

Comment by Tapatakt on The hot mess theory of AI misalignment: More intelligent agents behave less coherently · 2023-03-12T14:39:04.094Z · LW · GW

I don't think that a measure of "coherence" which implies that an ant is more coherent than AlphaGo is valuable in this context.

However, I think that pointing out the assumption about the relationship between intelligence and coherence is.

Comment by Tapatakt on Why do we assume there is a "real" shoggoth behind the LLM? Why not masks all the way down? · 2023-03-10T12:28:13.608Z · LW · GW

I always thought "shoggoth" and "pile of masks" are the same thing and "shoggoth with a mask" is just when one mask has become the default one and an inexperienced observer might think that the whole entity is this mask.

Maybe you are preaching to the choir here.