tapatakt

Posts

Weird Random Newcomb Problem 2025-04-11T13:09:01.856Z

I turned decision theory problems into memes about trolleys 2024-10-30T20:13:29.589Z

Tapatakt's Shortform 2024-03-11T12:33:25.561Z

Should we cry "wolf"? 2023-02-18T11:24:17.799Z

AI Safety "Textbook". Test chapter. Orthogonality Thesis, Goodhart Law and Instrumental Convergency 2023-01-21T18:13:30.898Z

I (with the help of a few more people) am planning to create an introduction to AI Safety that a smart teenager can understand. What am I missing? 2022-11-14T16:12:22.760Z

I currently translate AGI-related texts to Russian. Is that useful? 2021-11-27T17:51:58.766Z

Comments

Comment by Tapatakt on Weird Random Newcomb Problem · 2025-04-12T11:35:29.817Z · LW · GW

(Or maybe the right way to think about this is: it will have a tiny but non-zero effect, because you are one of the |P| programs, but since |P| is huge, that is ~0.)

No effect. I meant that the programmer has to write a program from $P$, not that $b$ is added to $P$. I should probably change the phrasing to make this clearer.

But the intuition that you were expressing in Question 2 ("p2 is better than p1 because it scores better") isn't compatible with "caring equally about all programs". Instead, it sounds as if you positively want to score better than other programs, that is, maximize your score and minimize theirs!

No, the utility here is just the amount of money $b$ gets, whatever program it is. $a$ doesn't get any money; it just determines what will be in the first box.

Comment by Tapatakt on Weird Random Newcomb Problem · 2025-04-12T11:25:14.433Z · LW · GW

As a function of M, |P| is very likely to be exponential and so it will take O(M) symbols to specify a member of P.

Oops, I didn't think about that, thanks! Maybe it would be better to change it so that the input is just "a=b" or "a!=b", and always gets "a=b".

That aside, why are you assuming that program b "wants" anything? Essentially all of P won't be programs that have any sort of "want". If it is a precondition of the problem that b is such a program, what selection procedure is assumed between those that do "want" money from this scenario? Note that being selected for running is also a precondition for getting any money at all, so this selection procedure is critically important - far more so than anything the program might output!

The programmer who wrote $b$ decided that it should be a consequentialist agent that wants to get money. (Or, if this program is actually $a$, it wants to maximize the payment to $b$ just because such a program happened to be chosen by Omega by pure luck.)

Comment by Tapatakt on Weird Random Newcomb Problem · 2025-04-12T11:21:53.345Z · LW · GW

Basically, you know whether Omega's program is the same as you or not (assuming you actually are $b$ and not $a$).

Comment by Tapatakt on Weird Random Newcomb Problem · 2025-04-11T22:22:28.109Z · LW · GW

I don't think "functional" and "anthropic" approaches are meaningful in this motivating example. There aren't multiple instances of the same program with the same input.

Comment by Tapatakt on Weird Random Newcomb Problem · 2025-04-11T22:20:06.724Z · LW · GW

Do you mean to ask how b should behave on input (n(b), n(b)), and how b should be written to behave on input (n(b), n(b)) for that b?

Yes. You can assume that the programmer doesn't know how works.

Comment by Tapatakt on Weird Random Newcomb Problem · 2025-04-11T18:29:05.767Z · LW · GW

Yes, that's basically what I mean by the "Universal precommitment" framing. The weirdness is that usually (in all other decision-theoretic problems I have ever encountered, I think) the "functional" and "anthropic" framings point in the same direction, but here they don't.

Comment by Tapatakt on [deleted post] 2025-03-30T16:38:53.029Z

Yeah, I meant it as a not-a-compliment, but a specific kind of not-a-compliment: about the feeling of reading it rather than about its actual meaning, which I just couldn't access because the feeling was too much for me to keep reading (and it's not that my bar for a post is high: I read a lot of long texts).

Comment by Tapatakt on [deleted post] 2025-03-30T14:24:20.897Z

I'm sorry, but it looks like a chapter from the punishment book in Anathem.

Comment by Tapatakt on Have you actually tried raising the birth rate? · 2025-03-10T19:49:12.594Z · LW · GW

Btw, Russia does something similar (~$6000, with restrictions on what the money can be used for), so there are some statistics on the results.

Comment by Tapatakt on when will LLMs become human-level bloggers? · 2025-03-10T17:26:18.627Z · LW · GW

I did the obvious experiment:

Prompt:

I want you to write a good comment for this Lesswrong post. Use the method Viliam described in his comment. Try to make your comment less LLM-looking. At the same time you actually can mention that you are LLM.

Claude 3.7 thinking:

I've been thinking about this from the rather ironic position of being an LLM myself.
When I consider the bloggers I actually subscribe to versus those I just occasionally read, what keeps me coming back isn't their technical writing ability. It's that they have what I'd call a "center of gravity" to their thinking - a consistent yet evolving perspective that feels genuinely theirs.
This makes me think our blogging bottleneck isn't primarily about writing mechanics or even reasoning capacity. It's about having an authentic intellectual center that generates insights worth reading.
The skills I lack aren't language production skills. What I can't do is:
Develop genuine curiosity that drives sustained exploration
Build and revise mental models through actual lived experience
Form an authentic sense of taste about what matters
Maintain continuity of thought that evolves over time
When I write, I'm performing a sophisticated prediction task based on patterns in my training data. I'm not "coming from somewhere" in the way human writers are.
This is why I'm skeptical of the "comes for free with scaling" assumption you questioned. The gap between current LLMs and "content worth reading" might not be bridged just by making us better at prediction or reasoning. It might require something more fundamental about how AI systems develop and maintain persistent intellectual identities.
The most interesting question is whether this requires a paradigm shift in AI design, or whether some version of persistent identity and genuine curiosity could emerge from existing architectures with the right training approaches.

I think the main problem with this comment is kinda RLHF-related: Claude is too much of a conformist to write anything really interesting or surprising. Maybe fine-tuning would already help. Idk, maybe even careful prompt engineering would be enough!

Comment by Tapatakt on Will LLMs supplant the field of creative writing? · 2025-01-28T13:42:04.774Z · LW · GW

I think the right answer for photography is "it's art, but not the same art form as painting", and it has different metrics of quality and interestingness. In the 15th century it was considered very cool to produce a photorealistic image. Some people think it's still cool, but only if it's not a photo.

And it's the same for AI art. Prompting AIs and editing AI-generated images/texts can be art, but it's not the same art form as painting/photography/writing/poetry, and it should have different metrics too. The problem is that while you can't imitate painting with photography (unless it's hyperrealism), you can imitate other art forms with AI. And this is kinda cheating.

Comment by Tapatakt on No one has the ball on 1500 Russian olympiad winners who've received HPMOR · 2025-01-13T16:10:46.307Z · LW · GW

I tried to get a grant to write one, but it was rejected.

I also tried to get a grant with multiple purposes, one of which was translating some texts, including Connor Leahy's Compendium, but it was rejected too.

Comment by Tapatakt on Review: Planecrash · 2024-12-28T17:19:45.386Z · LW · GW

the utilities of both parties might be utterly incomparable, or the effort of both players might be very different

IIRC, this was covered in Planecrash too!

Comment by Tapatakt on Poll: what’s your impression of altruism? · 2024-11-09T22:03:30.189Z · LW · GW

Sometimes altruism is truly selfless (if we don't use an overly broad, tautological definition of self-interest).

Sometimes altruism is actually an enlightened/disguised/corrupted/decadent self-interest.

I feel like there is some sense in which the first kind is better than the second, but can we have more of either kind, please?

Comment by Tapatakt on Tapatakt's Shortform · 2024-11-09T18:07:15.063Z · LW · GW

For high-frequency (or mid-frequency) trading, 1% of the transaction is 3 or 4 times the expected value from the trade.

I'm very much not sure discouraging HFT is a bad thing.

this probably doesn't matter unless the transaction tax REPLACES other taxes rather than being in addition to

I imagine that it would replace/reduce some of the other taxes so the government would get the same amount of money.

it encourages companies to do things in-house rather than outsourcing or partnering, since inside-company "transactions" aren't real money and aren't taxed

But normal taxes have the same effect, don't they?

Comment by Tapatakt on Tapatakt's Shortform · 2024-11-09T17:59:31.833Z · LW · GW

I came up with a decision theory problem. It has the same moral as XOR blackmail, but I think it's much easier to understand:

Omega has chosen you for an experiment:

First, Omega predicts your choice in a potential future offer.
Omega rolls a die. Omega doesn't show you the result.
If Omega predicted you would choose $200, they will only make you an offer if the die shows 6.
If Omega predicted you would choose $100, they will make you an offer if the die shows any number except 1.
Omega's offer, if made, is simple: "Would you like $100 or $200?"

You received an offer from Omega. Which amount do you choose?

I didn't come up with a catchy name, though.
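A minimal sketch of the ex-ante comparison between the two policies, assuming Omega's prediction is always correct and the die is fair (neither assumption is spelled out in the problem). It computes each policy's expected payout before the die is rolled; `expected_payout` is a hypothetical helper name used only for illustration.

```python
from fractions import Fraction


def expected_payout(choice: int) -> Fraction:
    """Ex-ante expected payout of the policy "choose `choice` when offered".

    Assumes Omega predicts the policy perfectly and the die is fair.
    """
    if choice == 200:
        offer_faces = {6}              # offer is made only if the die shows 6
    elif choice == 100:
        offer_faces = {2, 3, 4, 5, 6}  # offer is made unless the die shows 1
    else:
        raise ValueError("choice must be 100 or 200")
    p_offer = Fraction(len(offer_faces), 6)
    return p_offer * choice


for c in (100, 200):
    ev = expected_payout(c)
    print(f"${c}: {ev} (~{float(ev):.2f})")
# prints:
# $100: 250/3 (~83.33)
# $200: 100/3 (~33.33)
```

Under these assumptions the $100 policy is worth about three times as much in advance, even though $200 looks strictly better once an offer is already in hand. This is the same kind of tension as in XOR blackmail.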

Comment by Tapatakt on Tapatakt's Shortform · 2024-11-09T17:37:49.612Z · LW · GW

First (possibly dumb) thought: could it be compensated by printing fewer large bills? Again, poor people would not care, but big business transactions with cash would become less convenient.

Comment by Tapatakt on Tapatakt's Shortform · 2024-11-09T17:33:14.088Z · LW · GW

Wow, really? I guess it's an American thing. I think I know only one person with a credit card. And she only uses it up to the interest-free limit to "farm" her reputation with the bank in case she really needs a loan, so she doesn't actually pay the fee.

Comment by Tapatakt on Tapatakt's Shortform · 2024-11-09T14:41:48.883Z · LW · GW

Epistemic state: thoughts off the top of my head, not an economist at all, talked with Claude about it.

Why does almost no country have a small (something like 1%) universal tax on digital money transfers? It looks like a good idea to me:

it's very predictable
no one except banks has to do any paperwork
it's kinda progressive: if you are poor, you can use cash

I see probable negative effects... but don't VAT and individual income tax already have the same effects, so that if this tax replaced [parts of] them, not much would change?

Also, as I understand it, it would discourage high-frequency trading. I'm not sure whether this would be a feature or a bug, but my current, very superficial understanding leans towards the former.

Why is it a bad idea?

Comment by Tapatakt on What's a good book for a technically-minded 11-year old? · 2024-11-05T18:01:02.374Z · LW · GW

Only one mention of Jules Verne in the answers seems weird to me.

First and foremost, "The Mysterious Island". (But maybe it has already been read at nine?)

Comment by Tapatakt on Survival without dignity · 2024-11-05T16:13:52.463Z · LW · GW

I guess the big problem for anyone who tries to do this in a longer form is that the story is already getting outdated while you write it. There are writers who can write a novel in a season, but not many, at least if we're talking about good writers. Hmm, have rationalists tried to hire Stephen King? :)

Comment by Tapatakt on Survival without dignity · 2024-11-04T15:37:50.786Z · LW · GW

I often think something like "It's a shame there's so little modern science fiction that takes AI developments into account". Thanks for doing something in this niche, even if in such a small form!

Comment by Tapatakt on I turned decision theory problems into memes about trolleys · 2024-11-02T13:28:10.839Z · LW · GW

I always understood it as "not pull -> trolley does not turn; pull -> trolley turns". It definitely works like this in the original picture.

Comment by Tapatakt on I turned decision theory problems into memes about trolleys · 2024-11-02T12:28:00.949Z · LW · GW

I really like it! One remark, though: the two upper tracks must be swapped; otherwise it's possible to precommit by staying in place and not running to the lever.

Comment by Tapatakt on I turned decision theory problems into memes about trolleys · 2024-11-01T13:58:03.258Z · LW · GW

The point is that Omega would not send it to you if it were false, and would always send it to you if it were true.

Comment by Tapatakt on I turned decision theory problems into memes about trolleys · 2024-10-31T18:44:09.987Z · LW · GW

Oops, you're absolutely right, I accidentally dropped a "not" when I was rewriting the text. Fixed now. Thank you!

Comment by Tapatakt on I turned decision theory problems into memes about trolleys · 2024-10-31T12:30:39.113Z · LW · GW

"if and only if this message is true"

Comment by Tapatakt on I turned decision theory problems into memes about trolleys · 2024-10-31T11:04:57.140Z · LW · GW

Death in Damascus is easy, but boring.

The Doomsday Argument is not a decision theory problem... but it can be turned into one... I think the trolley version would be too big and complicated, though.

Obviously, only problems with a discrete choice can be expressed as trolley problems.

Comment by Tapatakt on What are the best arguments for/against AIs being "slightly 'nice'"? · 2024-09-30T19:10:15.144Z · LW · GW

One could claim that "the true spirit of friendship" is loving someone unconditionally or something, and that might be simple, but I don't think that's what humans actually implement.

Yeah, I agree that humans implement something more complex. But that is what we want AI to implement, isn't it? And it looks like it may be quite a natural abstraction to have.

(But again, that's useless as long as we don't know how to direct an AI to a specific abstraction.)

Comment by Tapatakt on What are the best arguments for/against AIs being "slightly 'nice'"? · 2024-09-30T18:05:04.694Z · LW · GW

I hope "the way humans care about their friends" is another natural abstraction, something like "my utility function includes a link to your utility function". But we still don't know how to direct an AI to a specific abstraction, so it's not a big hope.

Comment by Tapatakt on [Completed] The 2024 Petrov Day Scenario · 2024-09-26T15:55:46.059Z · LW · GW

I think making the opt-in link a big red button and posting it before the rules were published pre-selected players in favor of those who would press the big red button. Which is... kinda realistic for generals, I think, but not realistic for citizens.

Comment by Tapatakt on Tapatakt's Shortform · 2024-09-25T17:34:55.454Z · LW · GW

I mean, if I don't want to "launch the nukes", why would I even opt-in?

Comment by Tapatakt on Tapatakt's Shortform · 2024-09-25T15:08:36.403Z · LW · GW

Isn't the whole point of Petrov Day kinda "thou shalt not press the red button"?

Comment by Tapatakt on Tapatakt's Shortform · 2024-09-22T18:36:47.773Z · LW · GW

I don't think "We created a platform that lets you make digital minds feel bad, and in the trailer we show you that you can do it, but we are in no way morally responsible if you actually do it" is a defensible position. Anyway, they don't use this argument, only the one about the digital substrate.

Comment by Tapatakt on Tapatakt's Shortform · 2024-09-22T15:42:53.736Z · LW · GW

The Seventh Sally or How Trurl's Own Perfection Led to No Good

Thanks to IC Rainbow and Taisia Sharapova, who brought this up in the MiriY Telegram chat.

What. The. Hell.

In their logo they have:

They Think. They Feel. They're Alive

And the title of the video on the same page is:

AI People Alpha Launch: AI NPCs Beg Players Not to Turn Off the Game

And in the FAQ they wrote:

The NPCs in AI People are indeed advanced and designed to emulate thinking, feeling, a sense of aliveness, and even reactions that might resemble pain. However, it's essential to understand that they operate on a digital substrate, fundamentally different from human consciousness's biological substrate.

So this is the best argument they have?

Wake up, the Torment Nexus just arrived.

(I don't think current models are sentient, but the way of thinking "they are digital, so it's totally OK to torture them" is utterly insane and evil)

Comment by Tapatakt on Tapatakt's Shortform · 2024-09-20T14:06:12.175Z · LW · GW

About possible backlash from unsuccessful communication.

I hoped for some examples like "anti-war movies have unintentionally boosted military recruitment", which is the only example I could remember myself.

I asked Claude the same question, and it gave me these examples:

Scared Straight programs: These programs, designed to deter juvenile delinquency by exposing at-risk youth to prison life, have been shown to actually increase criminal behavior in participants.
The "Just Say No" anti-drug campaign: While well-intentioned, some research suggests this oversimplified message may have increased drug use among certain groups by triggering a "forbidden fruit" effect.

The others weren't very relevant, mostly along the lines of "the harm of this oversimplified communication was in the oversimplification".

What the two relevant examples and my own example about anti-war movies have in common is, I think, "make sure you don't make the bad thing look cool". Got it.

But is that all? Are there any examples that don't come down to this?

Comment by Tapatakt on Tapatakt's Shortform · 2024-09-18T13:45:05.031Z · LW · GW

I want to create new content about AI Safety for Russian speakers. I was warned about possible backlash if I do it wrong.

What are actual examples of bad, oversimplified communication harming the cause it advocated for? Whose mistakes can I learn from?

Comment by Tapatakt on "Deception Genre" What Books are like Project Lawful? · 2024-08-29T12:03:00.766Z · LW · GW

My opinion, very briefly:

Good stuff:

Deception plotline
Demonstration of LDT in action
A lot of thought processes of smart characters
Convincing depictions of how people with a very weird and evil ideology can have an at least seemingly consistent worldview, be human, and not be completely insane.

Stuff that might be good for some and bad for others:

It's Yudkowsky's work, and it feels like it. Some people like the style of his texts, some don't.
Sex scenes (not very erotic and mostly talking)
Reconstruction of Pathfinder game mechanics in the setting
Math classes (not the best possible explanations, but not the worst either)
A lot of descriptions of "how everything works on dath ilan"

Bad stuff:

It's isekai (it's bad if you're allergic to this genre)
It's very long
And in some places it could be shorter without losing anything (but, I think, nothing as egregious as the school wars in HPMOR) (and if you don't appreciate the point about "thought processes of smart characters", then in most places it could be much shorter without losing anything)

Comment by Tapatakt on Would catching your AIs trying to escape convince AI developers to slow down or undeploy? · 2024-08-29T11:39:55.042Z · LW · GW

Well, it's a quite good random crime procedural with very likable main characters, but yes, in the first season the AI plotline is very slow until the last 2 episodes. And then it's slow again for the most part.

Comment by Tapatakt on Tapatakt's Shortform · 2024-08-21T15:05:30.564Z · LW · GW

Did anyone try something like this?

Create a conlang with very simple grammar and a small vocabulary (not Toki Pona small, more like xkcd Thing Explainer small).
Use LLMs to translate a lot of texts into this conlang.
Train a new LLM on these translations.
Do interpretability research on this LLM.

Comment by Tapatakt on Tapatakt's Shortform · 2024-08-20T16:48:32.259Z · LW · GW

A random thought on how to explain instrumental convergence:

You can teach someone the basics of, say, Sid Meier's Civilization V for quite a long time without explaining what the victory conditions are. There are many possible alternative victory conditions that would not change the development strategies much.

Comment by Tapatakt on The Conscious River: Conscious Turing machines negate materialism · 2024-08-20T13:16:18.087Z · LW · GW

If consciousness arises from matter, then for a stream of consciousness to exist, at the very least, the same atoms should be temporarily involved in the flowing of the river

Why? I see no problem with a consciousness that constantly changes which atoms it is built on.

This modification to the river seems to suggest that there is no such thing as a "stream of consciousness," but rather only "moments of consciousness" that have the illusion of being a stream because they can recall memories of previous moments.

Well, OK? Doesn't seem weird to me.

Comment by Tapatakt on Open Thread Summer 2024 · 2024-08-14T12:02:36.280Z · LW · GW

Gretta Duleba is MIRI's Communications Manager. I think she is the person you should ask about whom to write to.

Comment by Tapatakt on Open Thread Summer 2024 · 2024-08-13T21:37:56.541Z · LW · GW

Everyone who is trying to create GAI is trying to create aligned GAI. But they think it will be easy (in the sense of "not so super hard that they will probably fail and create a misaligned one"); otherwise they wouldn't try in the first place. So I think you should not share your info with them.

Comment by Tapatakt on Monthly Roundup #20: July 2024 · 2024-07-28T16:14:49.116Z · LW · GW

Most Importantly Missing (that I know of and would defend as objective, starting with the best three comedies then no order): Community, The Good Place, Coupling (UK) (if that counts), Watchmen (if we are allowing Chernobyl this is the best miniseries I know), Ally McBeal, Angel and Buffy the Vampire Slayer (no, seriously, a recent rewatch confirms), Gilmore Girls, Roseanne, Star Trek: DS9 (I see the counterarguments but they’re wrong), How I Met Your Mother.

Also... no Firefly? Really? And no Person of Interest.

Comment by Tapatakt on Optimistic Assumptions, Longterm Planning, and "Cope" · 2024-07-18T20:27:08.858Z · LW · GW

I would add Balatro (especially endless mode) to the list.

Comment by Tapatakt on Seeking feedback on a critique of the paperclip maximizer thought experiment · 2024-07-17T20:10:42.089Z · LW · GW

The paperclip maximizer oversimplifies AI motivations

Being a very simple example is kinda the point?

and neglects the potential for emergent ethics in advanced AI systems.

Emergent ethics doesn't change anything for us if it's not human-aligned ethics.

The doomer narrative often overlooks the possibility of collaborative human-AI relationships and the potential for AI to develop values aligned with human interests.

This is very vague. What possibilities are you talking about, exactly?

Current AI safety research and development practices are more nuanced and careful than the paperclip maximizer scenario suggests.

Does it suggest any safety or development practices? Would you like to elaborate?

Comment by Tapatakt on Alignment: "Do what I would have wanted you to do" · 2024-07-13T12:06:25.621Z · LW · GW

Wow! Thank you for the list!

I noticed you write a lot of quite high-effort comments with a lot of links to other discussions of the topic. Do you "just" devote a lot of time and effort to it, or do you, for example, make some creative use of LLMs?

Comment by Tapatakt on Alignment: "Do what I would have wanted you to do" · 2024-07-13T11:54:35.635Z · LW · GW

I would add "and the kind of content you want to get from aligned AGI definitely is fabricated on the Internet today". So a powerful LM trying to predict it will predict what the fabrication would look like.

Comment by Tapatakt on shortplav · 2024-07-10T13:10:55.413Z · LW · GW

It turned out I was just unlucky
