avturchin's Shortform

avturchin

avturchin's Shortform

post by avturchin · 2019-08-13T17:15:26.435Z · LW · GW · 171 comments

172 comments

171 comments

Comments sorted by top scores.

comment by avturchin · 2025-03-28T12:36:59.304Z · LW(p) · GW(p)

LLM knows when it hallucinates in advance, and this can be used to exclude hallucinations.

TLDR: prompt "predict the hallucination level of each item in the bibliography list and do not include items expected to have level 3 or above" works.

I performed an experiment: I asked Claude 3.7 Sonnet to write the full bibliography of Bostrom. Around the 70th article, it started hallucinating. I then sent the results to GPT-4.5 and asked it to mark hallucinations and estimate the hallucination chances from 1 to 10 (where 10 is the maximal level of hallucination). It correctly identified hallucinations.

After that, I asked Sonnet 3.7 in another window to find the hallucination level in its own previous answer, and it gave almost the same answers as GPT-4.5. The difference was mostly about exact bibliographical data of some articles, and at first glance, it matched 90% of the data from GPT-4.5. I also checked the real data through Google Scholar manually.

After that, I asked Sonnet to write down the bibliography again but add a hallucination rating after each item. It again started hallucinating articles soon, but to my surprise, it gave correct answers ratings of 1-2 and incorrect ones ratings of 3-5 level of hallucination.

In the next step, I asked it to predict in advance which level of hallucination the next item would have and, if it was 3 or above, not to include it in the list. And it worked! It doesn't solve the problem of hallucinations completely but lowers their level about 10 times. Obviously, it can sometimes hallucinate the level of hallucinations too.

Maybe I can ask meta: predict the level of hallucinations in your hallucination estimate.

Replies from: dtch1997, Max Lee, ACCount

↑ comment by Daniel Tan (dtch1997) · 2025-03-28T16:11:29.676Z · LW(p) · GW(p)

This is pretty cool! Seems similar in flavour to https://arxiv.org/abs/2501.11120 you’ve found another instance where models are aware of their behaviour. But, you’ve additionally tested whether you can use this awareness to steer their behaviour. I’d be interested in seeing a slightly more rigorous write-up.

Have you compared to just telling the model not to hallucinate?

Replies from: avturchin

↑ comment by avturchin · 2025-03-29T11:53:15.490Z · LW(p) · GW(p)

I found that this does not work for finding an obscure quote from a novel. It still hallucinates different, more popular novels as sources and is confident in them. But it seems it doesn't know the real answer, though I am sure that the needed novel was in its training dataset (it knows plot).

Replies from: erioire

↑ comment by ErioirE (erioire) · 2025-03-31T17:51:04.742Z · LW(p) · GW(p)

This seems like a somewhat difficult use case for LLMs. It may be a mistake to think of them as a database of the *entire contents* of the training data. Perhaps instead think of them as compressed amalgamations of the the general patterns in the training data? I'm not terribly surprised that random obscure quotes can get optimized away.

Replies from: avturchin

↑ comment by avturchin · 2025-03-31T21:43:42.909Z · LW(p) · GW(p)

Yes, but it knows all Bostrom articles, maybe because it has seen the list a hundred times.

↑ comment by Knight Lee (Max Lee) · 2025-03-31T06:31:53.283Z · LW(p) · GW(p)

It's incredibly surprising that state-of-the-art AI don't fix most of their hallucinations despite being capable (and undergoing reinforcement learning).

Is the root cause of hallucination alignment rather than capabilities?!

Maybe the AI gets a better RL reward if it hallucinates (instead of giving less info), because users are unable to catch its mistakes.

↑ comment by ACCount · 2025-03-30T15:03:20.567Z · LW(p) · GW(p)

This is way more metacognitive skill than what I would have expected an LLM to have. I can make sense of how an LLM would be able to do that, but only in retrospect.

And if a modern high end LLM already knows on some level and recognizes its own uncertainty? Could you design a fine tuning pipeline to reduce hallucination level based on that? At least for reasoning models, if not for all of them?

Replies from: avturchin

↑ comment by avturchin · 2025-03-30T16:55:43.947Z · LW(p) · GW(p)

It looks like (based on the article published a few days ago by Anthropic about the microscope) Claude Sonnet was trained to distinguish facts from hallucinations, so it's not surprising that it knows when it hallucinates.

Replies from: ACCount

↑ comment by ACCount · 2025-03-31T10:35:36.994Z · LW(p) · GW(p)

Is the same true for GPT-4o then, which could spot Claude's hallucinations?

Might be worth testing a few open source models with better known training processes.

comment by avturchin · 2024-10-30T10:38:43.765Z · LW(p) · GW(p)

Lifehack: If you're attacked by a group of stray dogs, pretend to throw a stone at them. Each dog will think you're throwing the stone at it and will run away. This has worked for me twice.

Replies from: prudhviraj-naidu, gwern, Warty, Rob Lucas, elityre, Ruby, shankar-sivarajan

↑ comment by Prudhviraj Naidu (prudhviraj-naidu) · 2024-10-30T17:46:24.082Z · LW(p) · GW(p)

Speaking from experience in Mumbai, just pretending to throw a stone doesn't necessarily work. You have to pretend to pick up a stone and then throw it.

Replies from: avturchin, daniel-kokotajlo

↑ comment by avturchin · 2024-10-30T17:59:43.985Z · LW(p) · GW(p)

Yes. It is important point.

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2024-10-30T22:22:12.533Z · LW(p) · GW(p)

Huh. If you pretend to throw the stone, does that mean you make a throwing motion with your arm, but just don't actually release the object you are holding? If so, how come they run away instead of e.g. cringing and expecting to get hit, and then not getting hit, and figuring that you missed and are now out of ammo?

Or does it mean you make menacing gestures as if to throw, but don't actually make the whole throwing motion?

Replies from: avturchin

↑ comment by avturchin · 2024-10-31T12:50:36.271Z · LW(p) · GW(p)

As was said above, first you need to pick a stone from the ground or pretend that you are doing this if there is no stone around. Even if you have a stone, make the gesture that you take it from the ground.

Another important point is to do it quickly and aggressively with loud cry. Also you can pull back one's arm with a stone.

The whole trick is that dogs are so afraid of stones that they will run away before you actually throw it or they see where it fails.

↑ comment by gwern · 2024-10-30T16:00:10.336Z · LW(p) · GW(p)

Hm. Does that imply that a pack of dogs hunting a human is a stag hunt game?

Replies from: avturchin, ChristianKl

↑ comment by avturchin · 2024-10-30T17:42:55.018Z · LW(p) · GW(p)

There are some game theory considerations here:

If I throw the stone, all dogs will know that I don't have it anymore, so it would be safe for them to continue the attack (whether I hit one or miss). Therefore, it's better for me to threaten and keep two stones rather than actually throw one.

If dogs really want to attack me, they might prefer that I throw the stone so they can attack afterward.
However, I think each dog fails to consider that I'm most likely to throw the stone at another dog. Each individual dog has a small chance of being injured by the stone, and they could succeed if they continue the attack. Real hunters like wolves might understand this.

↑ comment by ChristianKl · 2024-11-01T11:53:44.271Z · LW(p) · GW(p)

The dogs are not hunting humans but want to defend territory or something similar.

Replies from: avturchin

↑ comment by avturchin · 2024-11-01T11:59:41.592Z · LW(p) · GW(p)

The problem is that their understanding of their territory is not the same as our legal understanding, so they can attack on the roads outside their homes.

Replies from: ChristianKl

↑ comment by ChristianKl · 2024-11-01T12:19:34.631Z · LW(p) · GW(p)

My point is that the behavior is not well modeled as "hunting humans". They don't attack humans with the intent to kill and eat as prey.

↑ comment by Warty · 2024-10-30T19:14:24.126Z · LW(p) · GW(p)

burning the dog defense commons 😔

↑ comment by Rob Lucas · 2024-11-01T11:06:33.188Z · LW(p) · GW(p)

When I was trekking in Qinghai my guide suggested we do a hike around a lake on our last day on the way back to town. It was just a nice easy walk around the lake. But there were tibetan nomads (nomadic yak herders, he just referred to them as nomads) living on the shore of the lake, and each family had a lot of dogs (Tibetan Mastiffs as well as a smaller local dog they call "three eyed dogs"). Each time we got near their territory the pack would come out very aggressively.

He showed me how to first always have some stones ready, and second when they approached to throw a stone over their head when they got too close. "Don't hit the dogs" he told me, "the owners wouldn't be happy if you hit them, and throwing a stone over their heads will warn them off".

When they came he said, "You watch those three, I need to keep an eye on the ones that will sneak up behind us." Each time the dogs used the same strategy. There'd be a few that were really loud and ran up to us aggressively. Then there'd be a couple sneaking up from the opposite side, behind us. It was my job to watch for them and throw a couple of stones in their direction if they got too close.

He also made sure to warn me, "If one of them does get to you, protect your throat. If you have to give it a forearm to bite down on instead of letting it get your throat." He had previously shown me the large scar on his arm where he'd used that strategy in the past. When I looked at him sort of shocked he said, "don't worry, it probably won't come to that." At this point I was wondering if maybe we should skip the lake walk, but I did go there for an adventure. Luckily the stone throwing worked, and we were walking on a road with plenty of stones, so it never really got too dangerous.

Anyway, +1 to your advice, but also look out for the dogs that are coming up behind you, not just the loud ones that are barking like mad as a distraction.

↑ comment by Eli Tyre (elityre) · 2024-10-31T04:51:00.242Z · LW(p) · GW(p)

You have been attacked by a pack of stray dogs twice?!?!

Replies from: niplav

↑ comment by niplav · 2024-10-31T07:59:02.491Z · LW(p) · GW(p)

Not surprising to me: I've lived in a city with many stray dogs for less than half a year, and got "attacked" ("harrassed" is maybe a better term) by a stray dog twice.

↑ comment by Ruby · 2024-10-31T17:35:36.795Z · LW(p) · GW(p)

Dog: "Oh ho ho, I've played imaginary fetch before, don't you worry."

↑ comment by Shankar Sivarajan (shankar-sivarajan) · 2024-10-31T14:07:16.856Z · LW(p) · GW(p)

Why pretend, and not actually throw a stone? Or is this meant as a feint in case you can't find one lying within reach?

comment by avturchin · 2024-12-14T00:59:04.638Z · LW(p) · GW(p)

OpenAI whistleblower found dead in San Francisco apartment.

Suchir Balaji, 26, claimed the company broke copyright law.

Replies from: Viliam, Benito, lc, john-wiseman, MakoYass, green_leaf

↑ comment by Viliam · 2024-12-14T21:43:59.809Z · LW(p) · GW(p)

Suppose that you are a whistleblower, and you suspect what someone will try to "suicide" you. How can you protect yourself?

If someone wants to murder you, they can. If you ever walk outside, you can't avoid being shot by a sniper. Or a random thug will be paid by a mysterious stranger to stab you. So my question is not "how can you make yourself immortal", but rather "how can you make it so that if you are killed, it will very obviously not be a suicide".

Saying "I have no intention to kill myself, and I suspect that I might be murdered" is not enough.

Wearing a camera that is streaming to a cloud 24/7, and your friends can publish the video in case of your death... seems a bit too much. (Also, it wouldn't protect you e.g. against being poisoned. But I think this is not a typical way how whistleblowers die.) Is there something simpler?

Replies from: jimrandomh, lc, avturchin, lc

↑ comment by jimrandomh · 2024-12-15T01:12:24.574Z · LW(p) · GW(p)

You can prevent this by putting a note in some place that isn't public but would be found later, such as a will, that says that any purported suicide note is fake unless it contains a particular password.

Unfortunately while this strategy might occasionally reveal a death to have been murder, it doesn't really work as a deterrent; someone who thinks you've done this would make the death look like an accident or medical issue instead.

Replies from: TsviBT, shankar-sivarajan

↑ comment by TsviBT · 2024-12-15T01:49:02.364Z · LW(p) · GW(p)

You can publish it, including the output of a standard hash function applied to the secret password. "Any real note will contain a preimage of this hash."

Replies from: TrevorWiesinger

↑ comment by trevor (TrevorWiesinger) · 2024-12-16T05:28:46.622Z · LW(p) · GW(p)

Your effort must scale to be appropriate to the capabilities of the people trying to remove you from the system. You have to know if they're the type of person who would immediately default to checking the will.

More understanding and calibration towards what modern assassination practice you should actually expect is mandatory because you're dealing with people putting some amount of thinkoomph into making your life plans fail, so your cost of survival is determined by what you expect your attack surface looks like. The appropriate-cost and the cost-you-decided-to-pay vary in OOMs depending on the circumstances, particularly the intelligence, resources, and fixations of the attacker. For example, the fact that this happened 2 weeks after assassination got all over the news is a fact that you don't have the privilege of ignoring if you want the answer, even though that particular fact will probably turn out to be unhelpful e.g. because the whole thing was probably just a suicide due to the base rates of disease and accidents and suicide being so god damn high.

If this sounds wasteful, it is. It's why our civilization has largely moved past assassination, even though getting-people-out-of-the-way is so instrumentally convergent for humans. We could end up in a cycle where assassination gets popular again after people start excessively standing in each other's way (knowing they won't be killed for it), or a stable cultural state like the Dune books or the John Wick universe and we've just been living in a long trough where elites aren't physically forced to live their entire lives like mob bosses playing chess games against invisible adversaries.

So don't think that if you only follow the rules of Science, that makes your reasoning defensible. [? · GW]
There is no known procedure you can follow that makes your reasoning defensible. [? · GW]
There is no known set of injunctions which you can satisfy, and know that you will not have been a fool. [? · GW]

↑ comment by Shankar Sivarajan (shankar-sivarajan) · 2024-12-15T03:32:44.679Z · LW(p) · GW(p)

How is this better than stating explicitly that you're not going to commit suicide?

Replies from: Seth Herd

↑ comment by Seth Herd · 2024-12-15T19:16:59.595Z · LW(p) · GW(p)

People change their minds a lot.

Replies from: shankar-sivarajan

↑ comment by Shankar Sivarajan (shankar-sivarajan) · 2024-12-16T03:45:13.508Z · LW(p) · GW(p)

Yes, they do. People also amuse themselves from beyond the grave by arranging for their deaths to look like murders before killing themselves. Or are so overcome by remorse at fabricating lies about their beloved friends to the feds that they encase their feet in concrete and throw themselves into nearby lakes without thinking about how it'd look. Or forget their secret passwords to authenticate their suicide notes and decide it's too much trouble to retrieve it.

So sure, I agree there are reasons why a death that strongly looks like murder might still be suicide. But that doesn't address my position that if you can broadcast the message that you have no intention to kill yourself in the clear with perfect authentication, and still not be sufficiently convincing that your imminent death isn't suicide, elaborate schemes with passwords or cryptographic hashes don't do anything.

Replies from: Seth Herd

↑ comment by Seth Herd · 2024-12-16T04:42:00.561Z · LW(p) · GW(p)

Really they do those things? The concrete?

I think it's on a spectrum of likelihood and therefore believability.

I wasn't commenting on your message, just what you'd said in that comment. Sure it's better to say it than not. And better yet to do more.

↑ comment by lc · 2024-12-15T17:33:19.001Z · LW(p) · GW(p)

If someone wants to murder you, they can. If you ever walk outside, you can't avoid being shot by a sniper.

If the person or people trying to murder you is omnicompetent, then it's hard. If they're regular people, then there are at least lots of temporary measures you can take that would make it more difficult. You can fly to a random state or country and check into a motel without telling anybody where you are. Or you could find a bunch of friends and stay in a basement somewhere. Mobsters used to call doing that sort of thing for a time before a threat had receded "going to ground".

Wearing a camera that is streaming to a cloud 24/7, and your friends can publish the video in case of your death... seems a bit too much. (Also, it wouldn't protect you e.g. against being poisoned. But I think this is not a typical way how whistleblowers die.) Is there something simpler?

If you move to New York or London, your every move outside of a private home or apartment will already be recorded. Then place a security camera in your house.

↑ comment by avturchin · 2024-12-15T19:10:17.131Z · LW(p) · GW(p)

I will lower the possible incentive of the killers by publishing all I know - and make it in such legal way that it can be used in court even if I am dead (affidavit?)

↑ comment by lc · 2024-12-15T17:51:55.558Z · LW(p) · GW(p)

Saying "I have no intention to kill myself, and I suspect that I might be murdered" is not enough.

Frankly I do think this would work in many jurisdictions. It didn't work for John McAfee because he has a history of crazy remarks, it sounds like the sort of thing he'd do to save face/generate intrigue if he actually did plan on killing himself, and McAfee made no specific accusations. But if you really thought Sam Altman's head of security was going to murder you, you'd probably change their personal risk calculus dramatically by saying that repeatedly on the internet. Just make sure you also contact police specifically with what you know, so that the threat is legible to them as an institution.

Replies from: keltan

↑ comment by keltan · 2024-12-16T05:24:52.447Z · LW(p) · GW(p)

I may be an outlier here. But if I thought I was going to be assassinated, I would think of:

JFK -MLK
James A. Garfield
Lincoln
Franz Ferdinand

And from these I'd think "Hu, better buy a bullet proof vest".

I would unfortunately not think about 'Being Suicided', unless I had an expectation that it would occur in this way.

Replies from: avturchin

↑ comment by avturchin · 2024-12-16T11:05:22.279Z · LW(p) · GW(p)

One way of not being suicide is not live alone. Stay with 4 friends.

↑ comment by Ben Pace (Benito) · 2024-12-14T03:48:31.062Z · LW(p) · GW(p)

Are there Manifold markets yet on whether this was a suicide and whether it will turn out that this was due to any pressures relating to the OpenAI whistleblowing?

↑ comment by lc · 2024-12-14T20:01:01.184Z · LW(p) · GW(p)

Tapping the sign:

↑ comment by John Wiseman (john-wiseman) · 2024-12-14T03:35:36.780Z · LW(p) · GW(p)

https://www.lesswrong.com/posts/yLFyoYhbhDYtuQWjm/probability-of-death-by-suicide-by-a-26-year-old

↑ comment by mako yass (MakoYass) · 2024-12-14T03:07:36.021Z · LW(p) · GW(p)

All novel information:

The medical examiner’s office determined the manner of death to be suicide and police officials this week said there is “currently, no evidence of foul play.”
Balaji’s death comes three months after he publicly accused OpenAI of violating U.S. copyright law while developing ChatGPT
The Mercury News [the writers of this article] and seven sister news outlets are among several newspapers, including the New York Times, to sue OpenAI in the past year.
The practice, he told the Times, ran afoul of the country’s “fair use” laws governing how people can use previously published work. In late October, he posted an analysis on his personal website arguing that point.
In a Nov. 18 letter filed in federal court, attorneys for The New York Times named Balaji as someone who had “unique and relevant documents” that would support their case against OpenAI. He was among at least 12 people — many of them past or present OpenAI employees — the newspaper had named in court filings as having material helpful to their case, ahead of depositions.
OpenAI has staunchly refuted those claims, stressing that all of its work remains legal under “fair use” laws.

Replies from: john-wiseman

↑ comment by John Wiseman (john-wiseman) · 2024-12-14T03:24:35.577Z · LW(p) · GW(p)

↑ comment by green_leaf · 2024-12-15T20:23:33.199Z · LW(p) · GW(p)

Does anyone have stats on OpenAI whistleblowers and their continued presence in the world of living?

comment by avturchin · 2024-10-27T14:29:18.714Z · LW(p) · GW(p)

Collapse of mega-project to create AI based on linguistics

ABBYY spent 100 million USD for 30 years to create a model of language using hundreds of linguists. It fails to compete with transformers. This month the project was closed. More in Russian here: https://sysblok.ru/blog/gorkij-urok-abbyy-kak-lingvisty-proigrali-poslednjuju-bitvu-za-nlp/

Replies from: gwern, cubefox, Chris_Leong

↑ comment by gwern · 2024-10-27T21:32:49.916Z · LW(p) · GW(p)

I had no idea ABBYY was so big. I thought it was just some minor OCR or PDF software developer. Interesting to hear about their historical arc. (I am also amused to see my Sutton meme used.)

↑ comment by cubefox · 2024-10-27T18:10:28.587Z · LW(p) · GW(p)

Thanks, this was an interesting article. The irony of course being that I, not knowing Russian, read it using Google Translate.

↑ comment by Chris_Leong · 2024-10-27T15:36:03.108Z · LW(p) · GW(p)

What's ABBYY?

Replies from: avturchin, Mo Nastri

↑ comment by avturchin · 2024-10-28T09:33:18.193Z · LW(p) · GW(p)

ABBYY created Finereader which was one of the best OCR systems.

↑ comment by Mo Putera (Mo Nastri) · 2024-10-27T16:29:09.053Z · LW(p) · GW(p)

Wikipedia says it's a SaaS company "specializing in AI-powered document processing and automation, data capture, process mining and OCR": https://en.wikipedia.org/wiki/ABBYY

comment by avturchin · 2024-10-23T12:49:36.878Z · LW(p) · GW(p)

"Bird Flu H5N1: Not Chaos, but Conspiracy?" By Alexander Pruss
Two months ago, I was puzzled how bird flu, potentially capable of killing tens of millions, went rampant on American livestock farms and began infecting workers, yet no urgent measures were being taken. Even standard epidemiological threat monitoring was happening unsystematically, with months-long delays, and results weren't being made public for months afterward. What happened to the bitter lessons from the coronavirus pandemic? Why such chaos? Since then, the sense of criminal inaction has only intensified. Missouri discovered the first outbreak of human cases unrelated to farm workers, but molecular testing was neglected and infection paths remained undiscovered.

In California, a more pathogenic variant of bird flu spread to hundreds of dairy farms, reportedly killing up to 15% of cows, with almost daily new cases of virus transmission to humans. The virus apparently came to California through cattle transportation from Idaho, despite belatedly introduced rules formally prohibiting the transport of infected cows across state lines. The problem was that infection in transported cows was checked through selective testing, and as reported, the sampling wasn't random: before government testing, farmers secretly tested cows for bird flu in private laboratories and selected only healthy ones for official testing. Here's the continuation of the translation:

A new Vanity Fair investigation shows this isn't random chaos. The USDA (U.S. Department of Agriculture) has been blocking research and data about the new infection in America's dairy herds from the start to protect the multi-billion-dollar American dairy export industry and the interests of giant national dairy processing companies. The idea was simple: most cows recover after a few weeks, and while the bird flu virus does get into milk in huge quantities, it should die during pasteurization. Therefore, the economic losses from the pathogen aren't that severe. However, if consumers in America and especially abroad raise the alarm, it could result in much greater dollar losses. USDA Secretary Thomas Vilsack knows this firsthand: before his government appointment, he worked as president of the U.S. Dairy Export Council.

And immediately after it was finally discovered in March 2024 that dairy farms in Texas and Kansas were hit by bird flu, veterinarians and state officials began receiving calls from personal mobile phones of USDA veterinary institute workers: "we're officially forbidden to discuss this problem without permission from the very top, and unofficially we're asking you to keep quiet about it too." But what about the danger that the virus, having settled in mammals and especially humans, could recombine with our seasonal flu viruses and produce hybrid viruses that combine the infectious and pathogenic potential of human viruses with immunity to our regular antibodies inherited from their avian ancestor?

This, generally speaking, isn't USDA's concern. This alarm was raised by the White House Office of Pandemic Preparedness and Response (OPPR), created in 2023, under the leadership of military doctor and biosecurity expert Paul Friedrichs. In early April, dairy industry representatives raised concerns that some upstart from the White House was muddying the waters. USDA's response was their new policy of official secrecy. Secretary Vilsack responded only a month later to state veterinarians' inquiries about the sudden communication breakdown, and his response was essentially a brush-off. And his ally in Texas, state agriculture commissioner Sid Miller, even hinted that if Friedrichs' people stick their noses into Texas farms, they might be met with bullets.

A number of veterinarians who disagreed with USDA's actions soon lost their jobs, and the country fell into an atmosphere of "work-to-rule," where veterinary authorities appear to be doing their job, but as slowly as possible and with all the red tape that can be justified by regulations. Meanwhile, flu season is approaching, and encounters between bird and human flu in people infected with both viruses are inevitable in the near future.

I forgot to add that by May, a vaccine for bird flu became available for cows, but the USDA chose not to use it.

Replies from: Viliam

↑ comment by Viliam · 2024-10-25T08:12:25.601Z · LW(p) · GW(p)

Sounds similar to the kind of logic that makes salmonellosis 10x more frequent in America than in Europe.

On one hand, yes, the optimal number of people dying from farm-produced diseases is greater then zero, and overreaction could cause net harm.

On the other hand, it feels like the final decision should be made in some way better than "the farmers lobby declares the topic taboo, and enforces the taboo across the nation", because the one-sided incentives are obvious.

Replies from: avturchin

↑ comment by avturchin · 2024-10-25T20:07:00.444Z · LW(p) · GW(p)

Also, bird flu is an international risk and other countries may sue US if it fails to prevent virus' evolution in obviously foreseeable way.

comment by avturchin · 2024-04-25T19:28:08.721Z · LW(p) · GW(p)

Roman Mazurenko is dead again. First resurrected person, Roman lived as a chatbot (2016-2024) created based on his conversations with his fiancé. You might even be able download him as an app.

But not any more. His fiancé married again and her startup http://Replika.ai pivoted from resurrection help to AI-girlfriends and psychological consulting.

It looks like they quietly removed Roman Mazurenko app from public access. It is especially pity that his digital twin lived less than his biological original, who died at 32. Especially now when we have much more powerful instruments for creating semi-uploads based on LLMs with large prompt window.

Replies from: Raemon

↑ comment by Raemon · 2024-04-29T20:50:01.329Z · LW(p) · GW(p)

I hadn't known Replika started out with this goal. Interesting.

It is especially pity that his digital twin lived less than his biological original, who died at 32

Not exactly the main point, but I'd probably clock this in terms of number of conversational inputs/outputs (across all users). Which might still imply "living less long"*, but less so than if you're just looking at wallclock time.

*also obviously an oldschool chatbot doesn't actually count as "living" in actually meaningful senses. I think modern LLMs might plausibly.

Replies from: avturchin

↑ comment by avturchin · 2024-04-29T21:36:18.793Z · LW(p) · GW(p)

Yes, they can do now a much better version - and hope they will do it internally. But deleting the public version is bad precedent and better to make all personal sideloads opensourced

Replies from: Raemon

↑ comment by Raemon · 2024-04-29T21:52:01.689Z · LW(p) · GW(p)

Uh I do think it's not obviously good (and, in fact, I'd lean bad) to be opensourced for this sort of thing [LW · GW].

comment by avturchin · 2022-09-03T17:45:21.442Z · LW(p) · GW(p)

Igor Kiriluk (1974-2022)

Igor was an organiser the first meet-up in Moscow about effective altruism around 2013. Today his body was found at his home. The day before he complained about depression and bad health. His cryopreservation now is being organised.

He was also a one of four organisers of Russian Transhumanist Movement, along with Danila Medvedev, Valeria Pride and Igor Artuhov around 2003.

His main topic of interest was paradise-engineering. He translated works of David Pearce.

He may look detached from reality but he was first to react on new ideas and has very large network of friends everywhere: between visionaries, scientists and officials. Being a great networker, he helped many people to find each other, especially in the field of life extension.

His FB page: https://www.facebook.com/igor.kirilyuk.3

comment by avturchin · 2024-04-18T16:32:14.148Z · LW(p) · GW(p)

I am building my sideload via recursively correcting of 1-million-tokens prompt for large LLM. The prompt consists of 500 rules which describe my personality, similar to personal constitution, and of some texts, like diaries, abstracts, poetry, stream of thoughts etc. Works on Google Gemini 1M through Google AI studio, and the shorter version works great on Opus. The system also includes a universal "loader prompt" which tries to increase the intelligence of the model and describes how the chatbot should work.

I found that sideloading allows very quick iterations in the sideload's improvements and the improvements are two-fold: of the loader itself and improvements of the knowledge and style of the sideload.

I find that my sideload is surprisingly good for a project which took around 1 month of work. 1 of the 5 answers is exactly like mine from a factual and style point of view.

I am open-sourcing my sideload, anyone can run it https://github.com/avturchin/minduploading/tree/main

I can help anyone interested to build his-her own sideload.

Example of work of the chatbot, no cherry picking:

Q:(now speak in english) what will be your next post in Lesswrong?

A: /The user is asking about my next LessWrong post. I will answer as if I am Alexey Turchin, and describe my plans for future posts, as well as my current location and the people around me./

My next LessWrong post? [Alexey looks thoughtfully out the window]

(I have several drafts)

[Alexey stretches and sits up in bed]

{The room is small, with a balcony. On the table is a laptop, some papers, and reading glasses. Outside the window, it is still gray and dreary.}

I'm thinking about writing about the nature of qualia and the possibility of creating qualia of non-existent things. [Alexey gets up, walks to the window, and looks out at the street]

{Below, a woman with a child is walking, a street sweeper is lazily sweeping leaves. Somewhere in the distance, the sound of a siren can be heard.}

Or maybe about the problem of copies and identity. (it's a complex topic)

[CONFIDENCE] 8

comment by avturchin · 2022-01-03T13:33:41.391Z · LW(p) · GW(p)

New b.1.640.2 variant in France. More deadly than delta. 952 cases of which 315 on ventilator.

https://www.thailandmedical.news/news/breaking-updates-on-new-b-1-640-2-variant-spreading-in-southern-france-number-of-cases-growing-and-variant-now-detected-in-united-kingdom-as-well

https://flutrackers.com/forum/forum/europe-aj/europe-covid-19-sept-13-2020-may-31-2021/933598-southern-france-reports-of-new-variant-with-46-mutations

comment by avturchin · 2024-04-29T18:44:09.831Z · LW(p) · GW(p)

ChatGPT 4.5 is on preview at https://chat.lmsys.org/ under name gpt-2.

It calls itself ChatGPT 2.0 in a text art drawing https://twitter.com/turchin/status/1785015421688799492

Replies from: gwern, peterbarnett, metachirality, bruberu, p.b.

↑ comment by gwern · 2024-04-29T23:28:11.352Z · LW(p) · GW(p)

https://rentry.org/GPT2

I ran out of tokens quickly trying out poetry but I didn't get the impression that this is a big leap over GPT-4 like GPT-5 presumably is designed to be. (It could, I suppose, be a half-baked GPT-5 similar to 'Prometheus' for GPT-4.) My overall impression from poetry was that it was a GPT-4 which isn't as RLHF-damaged as usual, and more like Claude in having a RLAIF-y creative style. So I could believe it's a better GPT-4 where they are experimenting with new tuning/personality to reduce the ChatGPT-bureaucratese.

HN: https://news.ycombinator.com/item?id=40199715

Replies from: avturchin

↑ comment by avturchin · 2024-04-30T10:11:18.517Z · LW(p) · GW(p)

It failed my favorite test: draw a world map in text art.

↑ comment by peterbarnett · 2024-04-29T18:55:31.890Z · LW(p) · GW(p)

Related market on Manifold:

↑ comment by metachirality · 2024-04-29T19:05:27.044Z · LW(p) · GW(p)

We don't actually know if it's GPT 4.5 for sure. It could be an alternative training run that preceded the current version of ChatGPT 4 or even a different model entirely.

Replies from: faul_sname

↑ comment by faul_sname · 2024-04-29T23:27:12.232Z · LW(p) · GW(p)

It might be informative to try to figure out when its knowledge cutoff is (right now I can't do so, as it's at it's rate limit).

Replies from: o-o, avturchin

↑ comment by O O (o-o) · 2024-04-30T03:59:03.580Z · LW(p) · GW(p)

https://rentry.org/gpt2

Rumored to be 11-2023

↑ comment by avturchin · 2024-04-30T10:10:32.393Z · LW(p) · GW(p)

It claims to have knowledge cutoff as of Nov 2023, but failed to tell what happened on October 7 and hallucinated.

↑ comment by bruberu · 2024-04-29T21:14:53.657Z · LW(p) · GW(p)

By using @Sergii [LW · GW]'s list reversal benchmark, it seems that this model seems to fail reversing a list of 10 random numbers from 1-10 from random.org about half the time. This is compared to GPT-4's supposed ability to reverse lists of 20 numbers fairly well, and ChatGPT 3.5 seemed to have no trouble itself, although since it isn't a base model, this comparison could potentially be invalid.
This does significantly update me towards believing that this is probably not better than GPT-4.

Replies from: o-o

↑ comment by O O (o-o) · 2024-04-29T22:14:13.695Z · LW(p) · GW(p)

Seems correct to me (and it did work for a handful of 10 int lists I manually came up with). More impressively, it does this correctly as well:

Replies from: bruberu, bruberu

↑ comment by bruberu · 2024-04-29T22:38:10.757Z · LW(p) · GW(p)

OK, what I actually did was not realize that the link provided did not link directly to gpt2-chatbot (instead, the front page just compares two random chatbots from a list). After figuring that out, I reran my tests; it was able to do 20, 40, and 100 numbers perfectly.

I've retracted my previous comments.

Replies from: bruberu

↑ comment by bruberu · 2024-04-29T23:44:12.636Z · LW(p) · GW(p)

As for one more test, it was rather close on reversing 400 numbers:

Given these results, it seems pretty obvious that this is a rather advanced model (although Claude Opus was able to do it perfectly, so it may not be SOTA).

Going back to the original question of where this model came from, I have trouble putting the chance of this necessarily coming from OpenAI above 50%, mainly due to questions about how exactly this was publicized. It seems to be a strange choice to release an unannounced model in Chatbot Arena, especially without any sort of associated update on GitHub for the model (which would be in https://github.com/lm-sys/FastChat/blob/851ef88a4c2a5dd5fa3bcadd9150f4a1f9e84af1/fastchat/model/model_registry.py#L228 ). However, I think I still have some pretty large error margins, given how little information I can really find.

Replies from: gwern

↑ comment by gwern · 2024-04-30T00:10:37.039Z · LW(p) · GW(p)

Nah, it's just a PR stunt. Remember when DeepMind released AlphaGo Master by simply running a 'Magister' Go player online which went undefeated?* Everyone knew it was DeepMind simply because who else could it be? And IIRC, didn't OA also pilot OA5 'anonymously' on DoTA2 ladders? Or how about when Mistral released torrents? (If they had really wanted a blind test, they wouldn't've called it "gpt2", or they could've just rolled it out to a subset of ChatGPT users, who would have no way of knowing the model underneath the interface had been swapped out.)

* One downside of that covert testing: DM AFAIK never released a paper on AG Master, or all the complicated & interesting things they were trying before they hit upon the AlphaZero approach.

↑ comment by bruberu · 2024-04-29T22:21:45.603Z · LW(p) · GW(p)

Interesting; maybe it's an artifact of how we formatted our questions? Or, potentially, the training samples with larger ranges of numbers were higher quality? You could try it like how I did in this failing example:

When I tried this same list with your prompt, both responses were incorrect:

↑ comment by p.b. · 2024-04-30T12:28:28.360Z · LW(p) · GW(p)

I tried some chess but's it's still pretty bad. Not noticeably better GPT4.

comment by avturchin · 2023-02-23T18:29:14.332Z · LW(p) · GW(p)

H5N1 https://www.khmertimeskh.com/501244375/after-death-of-girl-yesterday-12-more-detected-with-h5n1-bird-flu/

Replies from: Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2023-02-23T19:16:21.748Z · LW(p) · GW(p)

The relevant Metaculus question is at 27% on human-to-human transmission in 2023, has this event mentioned in the comments (though I think without the "found 12 more people infected" part), didn't move much.

Replies from: avturchin

↑ comment by avturchin · 2023-02-23T20:56:09.618Z · LW(p) · GW(p)

Exactly the fact that 12 more people are infected make me to post.

Single infections are not surprising. However, there is an analog of LessWrong but for pandemic flu, called Flutrackers, and they found more details: there are many dead birds in the area and all 15 birds in her home has died. https://flutrackers.com/forum/forum/cambodia/cambodia-h5n1-tracking/968975-cambodia-death-of-11-yr-old-female-in-prey-veng-province-h5n1-avian-flu-february-22-2023/page2#post969072 This could mean that all people infected from birds, not from each other.

Also, some think that "12" is the number of contacts, not infected, and therefore symptoms in 4 people maybe not from avian flu. Anyway, the health ministry will provide update tomorrow.

comment by avturchin · 2025-03-31T10:46:46.097Z · LW(p) · GW(p)

Most LLMs' replies can be improved by repeatedly asking "Improve the answer above" and it is similar to the test-time compute idea and diffusion.

In most cases, I can get better answers from LLMs just by asking "Improve the answer above."

In my experience, the improvements are observable for around 5 cycles, but after that the result either stops improving or gets stuck in some error mode and can't jump to a new level of thinking. My typical test subject: "draw a world map as text art." In good improvement sessions with Sonnet, it eventually adds grids and correct positions for continents.

One person on Twitter (I lost the link, maybe @goodside) automated this process and got much better code for a game after 100 cycles of improvements during an entire night using many credits. He asked Claude to write code for automated prompting first. I repeated this experiment with my tasks.

I tried different variants of "improve it," like adding critiques or generating several answers within one reply. I also tried a meta-level approach, where I asked to improve not only the answer but also the prompt for improvements.

I started these experiments before the test-time compute idea went mainstream, and it looks like a type of test-time compute use. The process also resembles diffusion.

The main question here: in which cases does the process quickly get stuck, and in which does it produce unbounded improvements? It seems to get stuck in local minima and in situations where the model's intelligence isn't sufficient to see ways to improve or discern better or worse versions. It also can't jump to another valley: if it started improving in some direction, it will continue to push in that direction, ignoring other possibilities. Only running another chat window manually helps to change valleys.

Iterative improvement of images also works in GPT-4o. But not for Gemini Pro 2.5, and o1 is also bad at improving, progressing very slowly. It seems that test-time improving contradicts test-time reasoning.

Results for "Improve it": https://poe.com/s/aqk8BuIoaRZ7eDqgKAN6

Variants of the main prompt: "Criticize the result above and iteratively improve it" https://poe.com/s/A2yFioj6e6IFHz68hdDx

This prompt - "Create a prompt X for iterative improvement of the answer above. Apply the generated prompt X." - converges quickly to extraordinary results but overshoots, like creating games instead of drawings. It also uses thinking: https://poe.com/s/cLoB7gyGXHNtwj0yQfPf

The trick is that the improving prompt should be content-independent and mechanically copy-pasted after each reply.

Replies from: Viliam

↑ comment by Viliam · 2025-04-01T13:41:01.504Z · LW(p) · GW(p)

I have achieved higher quality answers by using the magical words: "give me multiple options, then compare them and choose the best one".

But next time I will try to iterate the best one -- maybe something like "suggest five improvements to the option above, and choose the best one".

Replies from: avturchin

↑ comment by avturchin · 2025-04-01T14:23:54.102Z · LW(p) · GW(p)

Yes, great variant of the universal answer-improving prompt and it can be applied several times to any content.

comment by avturchin · 2024-02-25T10:14:40.219Z · LW(p) · GW(p)

Several types of existential risks can be called "qualia catastrophes":

- Qualia disappear for everyone = all become p-zombies

- Pain qualia are ubiquitous = s-risks

- Addictive qualia domminate = hedonium, global wireheading

- Qualia thin out = fading qualia, mind automatisation

- Qualia are unstable = dancing qualia, identity is unstable.

- Qualia shift = emergence of non-human qualia (humans disappear).

- Qualia simplification = disappearance of subtle or valuable qualia (valuable things disappear).

- Transcendental and objectless qualia with hypnotic power enslave humans (God as qualia; Zair). -

- Attention depletion (ADHD)

comment by avturchin · 2022-04-10T10:36:01.658Z · LW(p) · GW(p)

We maybe one prompt from AGI. A hypothesis: carefully designed prompt could turn foundational model into full-blown AGI, but we just don't know which prompt.

Example: step-by-step reasoning in prompt increases foundational models' performance.

But real AGI-prompt needs to have memory, so it has to repeat itself while adding some new information. So by running serially, the model may accumulate knowledge inside the prompt.

Most of my thinking looks this way from inside: I have a prompt - an article headline and some other inputs - and generate most plausible continuations.

comment by avturchin · 2020-03-01T18:17:56.278Z · LW(p) · GW(p)

Age and dates of death on the cruise ship Diamond Princess:
Age:
4 people - 80s
1 person 78
1 person 70s
1 person - no data
Dates of deaths: 20, 20, 23, 25, 28, 28, 1 march. One death every 1.3 days. Look like acceleration at the end of the period.
Background death probability: for 80-year-old person, life expectancy is around 8 years or around 100 months. This means that for 1000 people aged late 70s-80s there will be 10 deaths just because of aging and stress. Based on the aging distribution on cruise ships, there were many old people. if half of the infected are old ones, the background death rate will be 5 death a month.
Conclusion: current data from DP doesn’t support high death rate from CV, but the next month will be crucial: we will see if more deaths will happen. Also, 36 people are critical, which is 5 per cent of infected, and they likely die in case of global scale outbreak as they would not get medical help. But this is only 1 per cent of all people on the ship, and the ships’ population is biased to older people.
Raw data:
Two passengers died on 20 February [39]
and a third on 23 February, all three Japanese citizens in their 80s.[40]
A fourth passenger (80s), an elderly Japanese man, died according to a report on 25 February .[41]
28 feb - A woman in her 70s is the fifth passenger on the Diamond Princess cruise ship to die from the novel coronavirus, Japan’s Health Ministry said Friday https://edition.cnn.com/…/h_f3fe8e330bdb048030b2708f63749559
The sixth victim, a British national (no age data), died on 28 February.[42]
The seventh victim (78), an Australian national, died on March 1st. [43] https://en.wikipedia.org/…/2020_coronavirus_outbreak_on_cru…
https://www.worldometers.info/coronavirus/#countries

comment by avturchin · 2019-08-13T17:15:26.757Z · LW(p) · GW(p)

Kardashev – the creator of the Kardashev's scale of civilizations – has died at 87. Here is his last video, which I recorded in May 2019. He spoke about the possibility of SETI via wormholes.

Replies from: Benito

↑ comment by Ben Pace (Benito) · 2019-08-13T20:38:42.003Z · LW(p) · GW(p)

Here's his wikipedia page.

comment by avturchin · 2025-04-01T09:57:24.831Z · LW(p) · GW(p)

If the simulation argument is valid and dreams are simulations of reality, can we apply the simulation argument to dreams? If not, is this an argument against the simulation argument? If yes, why am I not now in a dream?

If I see something, is it more likely to be dream or reality?
Sleeping takes only one-third of my time, and REM takes even less.
But:

Some dreams occur even in other phases of sleep
Dreams are much more eventful than normal life. There is always something happening. Also, the distribution of events in dreams is skewed toward expensive, dangerous, adventurous content, full of social interactions.
There is an eraser of dream memory, which cleans memories of dreams after every 15 minutes and also after awakening and during the day. As a result, we underestimate the number of dreams we have had.

As a result, the number of important events in dreams may be several orders of magnitude more than in real life. I think a good estimate is 100 times, but it depends on the types of events. For recurrent dreams - like big waves and war for me - it can be much higher.

So why am I not in a dream now? Because writing coherent dream-conscious (lucid) text is not the dominant type of content in dreams. But if I were chased by a monster or big waves, I should give higher a priori chances that I am actually dreaming.

Conclusion: The simulation argument works for dreams, but selectively, as dream content is different from most normal life content.

comment by avturchin · 2023-11-29T17:38:54.735Z · LW(p) · GW(p)

EURISKO resurfaced

"Doug Lenat's source code for AM and EURISKO (+Traveller?) found in public archives

In the 1970s to early 80s, these two AI programs by Douglas Lenat pulled off quite the feat of autonomously making interesting discoveries in conceptual spaces. AM rediscovered mathematical concepts like prime numbers from only first principles of set theory. EURISKO expanded AM's generality beyond fixed mathematical heuristics, made leaps in the new field of VLSI design, and famously was used to create wild strategies for the Traveller space combat RPG, winning national competitions two years in a row, even across rule changes to stymie it, before semi-voluntarily retiring. His magnum opus Cyc was originally intended to be a knowledge assistant to EURISKO's discovery engine.

These first two programs have intrigued the symbolic AI scene for 40+ years, with their grand claims but few eyewitnesses. While AM was technically available to fellow Stanfordians at the time, Lenat kept the source code to EURISKO close to his chest. Papers written about them carefully avoided technical implementation details. Lenat said he didn't retain any copy of the programs, when asked in recent decades, nor have any copies of AM carried publicly into the present."

https://white-flame.com/am-eurisko.html?fbclid=IwAR04saSf4W7P6ZyKI6h8orPhMpzAq83vn_zGwYwY-H8hNMnHgsaECHw8cl0_aem_AY3LlR6ieYqjLXHzLu4eVPYWtYFoD8khhLnpsUIHQZVzBq055sE3KUbg172Hl9Mm4NQ

comment by avturchin · 2022-09-02T18:29:09.776Z · LW(p) · GW(p)

Argentina - Outbreak of bilateral pneumonia: Approximately 10 cases, 3 deaths, 20 under observation, Tucumán - September 1, 2022 https://flutrackers.com/forum/forum/south-america/pneumonia-and-influenza-like-illnesses-ili-af/argentina-ab/957860-argentina-outbreak-of-bilateral-pneumonia-approximately-10-cases-3-deaths-20-under-observation-tucum%C3%A1n-september-1-2022

comment by avturchin · 2022-06-12T12:44:35.030Z · LW(p) · GW(p)

Passways to AI infrastructure
Obviously, the current infrastructure is not automated enough to run without humans. All ideas about AI risk eventually boil down to a few suggestions on how AI will create its own infrastructure:

No-humans scenarios:
- create nanobots via mailing DNA samples to some humans.
- use some biological tricks, like remote control animals, and programmed bacteria.
- build large manufacturing robots, maybe even humanoid ones to work in human-adapted workplaces. Build robots which build robots.

Humans-remain scenarios:
- enslave some humans, kill others
- manipulate humans' brains so that they never know that they are the infrastructure (memes, brain implants)
- manipulate humans so that they will continue to build infrastructure and get paid for it without seeing risks (bitcoin did it in some sense) or will do it in the hope to use the infrastructure against the enemy group of humans

Replies from: lc

↑ comment by lc · 2022-06-12T13:38:51.345Z · LW(p) · GW(p)

Your non-humans scenarios are not mutually exclusive; if mailing DNA samples doesn't work in practice for whatever reason, the manufacturing facilities that would be used to make large manufacturing robots would suffice. You probably shouldn't conflate both scenarios.

comment by avturchin · 2022-01-07T11:55:06.899Z · LW(p) · GW(p)

Observable consequences of simulation:

1. Larger chances of miracles or hacks

2. Large chances of simulation’s turn off or of a global catastrophe

3. I am more likely to play a special role or to live in interesting times

4. A possibility of afterlife.

Replies from: Gunnar_Zarncke, MackGopherSena

↑ comment by Gunnar_Zarncke · 2022-01-07T22:14:02.402Z · LW(p) · GW(p)

Scott Adams mentioned a few times that a simulation might use caching and reuse patterns for efficiency reasons and you could observe an unusually high frequency of the same story. I don't buy that but it is at least a variant of type 1.

Replies from: avturchin

↑ comment by avturchin · 2022-01-08T10:17:54.520Z · LW(p) · GW(p)

Yes, people often mentioned Baader–Meinhof phenomenon as a evidence that we live in "matrix". But it could be explained naturally.

↑ comment by MackGopherSena · 2022-01-10T23:46:26.544Z · LW(p) · GW(p)

[edited]

Replies from: avturchin

↑ comment by avturchin · 2022-01-11T12:01:52.601Z · LW(p) · GW(p)

Anthropics imply that I should be special, as I should be "qualified observer", capable to think about anthropics. Simulations also requires that I should be special, as I should find myself living in interesting times. These specialities are similar, but not exactly. Simulation's speciality is requiring that I will be a "king" in some sense, and anthropic speciality will be satisfied that I just understand anthropics.

I am not a very special person (as of now), therefore anthropics specialty seems to be more likely than simulation speciality.

Replies from: MackGopherSena

↑ comment by MackGopherSena · 2022-01-12T06:46:12.370Z · LW(p) · GW(p)

[edited]

Replies from: avturchin

↑ comment by avturchin · 2022-01-12T11:26:14.514Z · LW(p) · GW(p)

Who "we" ? :)

Saying a "king" I just illustrated the difference between interesting character who are more likely to be simulated in a game or in a research simulation, and "qualified observer" selected by anthropics. But these two sets clearly intersects, especially of we live in a game about "saving the world".

comment by avturchin · 2021-06-23T15:03:41.256Z · LW(p) · GW(p)

Catching Treacherous Turn: A Model of the Multilevel AI Boxing

Multilevel defense in AI boxing could have a significant probability of success if AI is used a limited number of times and with limited level of intelligence.
AI boxing could consist of 4 main levels of defense, the same way as a nuclear plant: passive safety by design, active monitoring of the chain reaction, escape barriers and remote mitigation measures.
The main instruments of the AI boxing are catching the moment of the “treacherous turn”, limiting AI’s capabilities, and preventing of the AI’s self-improvement.
The treacherous turn could be visible for a brief period of time as a plain non-encrypted “thought”.
Not all the ways of self-improvement are available for the boxed AI if it is not yet superintelligent and wants to hide the self-improvement from the outside observers.

https://philpapers.org/rec/TURCTT

comment by avturchin · 2020-09-11T12:54:49.100Z · LW(p) · GW(p)

Two types of Occam' razor:

1) The simplest explanation is the most probable, so the distribution of probabilities for hypotheses looks like: 0.75, 0.12, 0.04 .... if hypothesis are ordered from simplest to more complex.

2) The simplest explanation is the just more probable, so the distribution of probabilities for hypotheses looks like: 0.09, 0.07, 0.06, 0.05.

The interesting feature of the second type is that simplest explanation is more likely to be wrong than right (its probability is less than 0.5).

Different types of Occam razor are applicable in different situations. If the simplest hypothesis is significantly simpler than others, it is the first case. If all hypothesis are complex, it is the second. First situation is more applicable some inherently simple models, e.g. laws of physics or games. The second situation is more about complex situation real life.

Replies from: mr-hire

↑ comment by Matt Goldenberg (mr-hire) · 2020-09-11T15:03:13.583Z · LW(p) · GW(p)

I'm struggling to think of a situation where on priors (with no other information), I expect the simplest explanation to be more likely than all other situations combined (including the simplest explanation with a tiny nuance).

Can you give an example of #1?

Replies from: avturchin, TAG

↑ comment by avturchin · 2020-09-11T17:18:36.602Z · LW(p) · GW(p)

EY suggested (if I remember correctly) that MWI interpretation of quantum mechanics is true as it is simplest explanation. There are around hundred other more complex interpretations of QM. Thus, in his interpretation, P(MWI) is more than a sum of probabilities of all other interpretations.

Replies from: TAG

↑ comment by TAG · 2020-09-12T19:54:17.623Z · LW(p) · GW(p)

MWI is more than one theory, because everything is more than one thing.

There is an approach based on coherent superpositions, and a version based on decoherence. These are incompatible opposites.

How simple a version of MWI is, depends on how it deals with all the issues, including the basis problem.

↑ comment by TAG · 2020-09-11T16:02:14.712Z · LW(p) · GW(p)

What does "all the other explanation s combined" mean as ontology? If they make statements about reality that are mutually incompatible, then they cant all be true.

Replies from: avturchin

↑ comment by avturchin · 2020-09-11T17:14:29.219Z · LW(p) · GW(p)

It means that p(one of them is true) is more than p(simplest explanation is true)

Replies from: TAG

↑ comment by TAG · 2020-09-12T11:54:08.862Z · LW(p) · GW(p)

That doesn't answer my question as stated ... I asked about ontology, you answered about probability.

If a list of theories is exhaustive, which is s big "if", then one of them is true. And in the continuing absence of a really good explanation of Occams Razor, it doesn't have to be the simplest.

But that doesn't address the issue of summing theories, as opposed to summing probabilities.

Replies from: mr-hire

↑ comment by Matt Goldenberg (mr-hire) · 2020-09-17T16:47:27.709Z · LW(p) · GW(p)

But "all the other explanations combined" was talking about the probabilities. We're not combining the explanations, that wouldn't make any sense.

The only ontology that is required is Bayesianism, where explanations can have probabilities of being correct.

Replies from: TAG

↑ comment by TAG · 2020-09-17T18:06:44.156Z · LW(p) · GW(p)

The only ontology that is required is Bayesianism,

Bayesianism isn't an ontology.

Replies from: mr-hire

↑ comment by Matt Goldenberg (mr-hire) · 2020-09-17T18:10:08.038Z · LW(p) · GW(p)

Ok, tabooing the word ontology here. All that's needed is an understanding of Bayesianism to answer the question of how you combine the chance of all other explanations.

comment by avturchin · 2020-07-28T10:36:47.092Z · LW(p) · GW(p)

Some random ideas how to make GPT-base AI safer.

1) Scaffolding: use rule-based AI to check every solution provided by GPT part. It could work for computations or self-driving or robotics, but not against elaborated adversarial plots.

2) Many instances. Run GPT several times and choose random or best answer - we already doing this. Run several instances of GPT with different parameters or different training base and compare answers. Run different prompt. Median output seems to be a Shelling point around truth, and outstanding answers are more likely to be wrong or malicious.

3) Use intrinsic GPT properties to prevent malicious behaviour. For example, higher temperature increases randomness of the output and mess up with any internal mesa optimisers. Shorter prompts and lack of long memory also prevents complex plotting.

4) Train and test on ethical database.

5) Use prompts which include notion of safety, like "A benevolent AI will say..." or counterfactuals which prevents complex planing in real world (An AI on the Moon)

6) Black boxing of internal parts of the system like the NN code.

7) Run it million times in test environments or tasks.

8) Use another GPT AI to make "safety TL;DR" of any output or prediction of possible bad things which could happen from a given output.

Disclaimer: Safer AI is not provably safe. It is just orders of magnitude safer than unsafe one, but it will eventually fail.

comment by avturchin · 2023-04-18T13:40:16.728Z · LW(p) · GW(p)

Reflectivity in alignment.

Human values and AI alignment do not exist independently. There are several situations when they affect each other, creating complex reflection pattern.

Examples:

Humans want to align AI – so "AI alignment" is itself human value.
Human values are convergent goals (like survival and reproduction) - and thus are similar to AI's convergent goals.
If humans accept the idea to make paperclips (or whatever), alignment will be reached.
It looks like many humans want to create non-aligned AI. Thus non-aligned AI is aligned.
Humans may not want that their values will be learned. AI alignment will be mis-aligned.
Humans which are connected with AI are not humans any more, and not subjected to alignment.
Non-aligned AI will affect human values while learning them.
Many humans don't want AI to exist at all - so any aligned AI is misaligned.
Human may want that AI will not be aligned with other person.
AI aligned with mis-aligned human is unaligned
As human values are changing, any aligned AI will be non-aligned soon.
By saying 'human values' we exclude mammals values, group values etc and thus define the outcome.

comment by avturchin · 2023-03-27T11:37:25.055Z · LW(p) · GW(p)

Can we utilize meaningful embedding dimensions as an alignment tool?

In toy models, embedding dimensions are meaningful and can represent features such as height, home, or feline. However, in large-scale real-world models, many (like 4096) dimensions are generated automatically, and their meanings remain unknown, hindering interpretability.

I propose the creation of a standardized set of embedding dimensions that: a) correspond to a known list of features, and b) incorporate critical dimensions such as deception, risk, alignment, and non-desirable content, including sexual themes.

Since large language models (LLMs) cannot engage in deception without recognizing it, any deceptive thoughts would register higher levels on the deception dimension. This could then trigger internal alarms, potentially enhancing the model's alignment.

Replies from: avturchin

↑ comment by avturchin · 2024-05-22T11:16:16.467Z · LW(p) · GW(p)

Anthropic did opposite thing https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html

comment by avturchin · 2019-11-23T10:46:58.112Z · LW(p) · GW(p)

I converted by Immortality roadmap [LW · GW] into an article Multilevel Strategy for Personal Immortality: Plan A – Fighting Aging, Plan B – Cryonics, Plan C – Digital Immortality, Plan D – Big World Immortality.

comment by avturchin · 2024-06-26T20:15:47.236Z · LW(p) · GW(p)

I have interesting experience long time ago. In the near-sleep state my consciousness split in two streams - one was some hypnogogic images, and the other was some hypnogogic music.

They was not related to each other and each had, some how, its own observer.

A moment later something awakened me a bit and the streams seamlessly merged and I was able to observe that a moment before I had two independent streams of consciousness.

Conclusions:

1. A human can have more than one consciousness at the time.

2. It actually happens all the time but we don't care.

3. Merging of consiosnesses is easy. Moreover, binding and merging is actually the same process similar to summation.

There is no center of consciousness - homunculus or electron or whatever.

I may have other conscious processes in the brain which just do not merge with current stream of consciousness.

Qualia remain the same and preserve in each of the streams of consciousness.

Replies from: carl-feynman

↑ comment by Carl Feynman (carl-feynman) · 2024-06-26T21:55:03.584Z · LW(p) · GW(p)

Why should we accept as evidence something that you perceived while you were dreaming? Last night I dreamed that I was walking barefoot through the snow, but it wasn’t cold because it was summer snow. I assume you don’t take that as evidence that warm snow is an actual summer phenomenon, so why should we take as evidence your memory of having two consciousnesses?

It seems to me that a correctly organized consciousness would occur once per body. Consciousness is (at least in part) a system for controlling our actions in the medium and long term. If we had two consciousnesses, and they disagree as to what to do next, it would result in paralysis. And if they agree, then one of them is superfluous, and we’d expend less brain energy if we only had one.

Replies from: avturchin, JBlack

↑ comment by avturchin · 2024-06-27T09:59:55.045Z · LW(p) · GW(p)

I was not dreaming. I was observing my hypnagogic images, which is not the same as dreaming; and when streams merged I become completely awake.

However, after I know what is it, I can observe similar thing again. The receipt is following:
1. do two different unrelated things which require conscious attention but happen in different modalities, audio and video
2. increase the wideness of attention and observe that you just had two streams of more narrow attention.

The closest thing in everyday life is "driver amnesia" - the situation when a car driver is splitting attention between driving and conversation.

↑ comment by JBlack · 2024-06-27T02:42:35.685Z · LW(p) · GW(p)

Conscious experience is direct evidence of itself. It is only very indirectly evidence of anything about external reality.

However, I do agree that memory of conscious experience isn't quite so directly evidence of previous states of consciousness.

Personally of the numbered claims in the post I expect that (1) is true, (2) is false and this experience was not evidence of it, and I really don't know what (3) and subsequent sentences are supposed to mean.

comment by avturchin · 2023-01-15T20:48:30.532Z · LW(p) · GW(p)

I have had tetrachromotomic experience with one mind machine which flickers different colors in different eyes. It overflows some stacks in the brain in create new colors.

comment by avturchin · 2023-01-08T11:01:21.413Z · LW(p) · GW(p)

List of cognitive biases affecting judgment of global risks https://www.researchgate.net/publication/366862337_List_of_cognitive_biases_affecting_judgment_of_global_risks/related

comment by avturchin · 2022-12-18T08:59:20.211Z · LW(p) · GW(p)

Grabby aliens without red dwarfs

Grabby aliens theory of Robin Hanson predicts that the nearest grabby aliens are 1 billion light years away but strongly depends on the habitability of red dwarfs (https://grabbyaliens.com/paper).

In the post [LW · GW], the author combines anthropic and Fermi, that is, the idea that we live in the universe with the highest concentration of aliens, limited by their invisibility, and get an estimation of around 100 "potentially visible" civilizations per observable universe, which at first approximation gives 1 billion ly distance between them.

“That civilisations appear once every 10power20 stars implies that there should be about 100 civilisations in the observable universe."

This is approximately the same as what is predicted by Grabby Aliens by Hanson.

So the question is why the two theories give similar results.

Basically, it is because they both have the same structure: first, anthropic argument, and second update based on Fermi.

But anthropic arguments in them are different. In Hanson’s Grabby Aliens case, it is our early location in time and in the second case, it is Self-Indication Assumption, which implies that we live in the universe with the highest spatial concentration of aliens (with some caveats).

The second part of the argument in both cases is basically based on the idea of grabbiness: quick space exploration with near-light speed and preventing of earth-like civilizations' existence by grabby aliens.

comment by avturchin · 2022-12-10T19:36:49.493Z · LW(p) · GW(p)

N-back hack. (Infohazard!)
There is a way to increase one's performance in N-back, but it is almost cheating and N- back will stop to be a measure of one's short-term memory.
The idea is to imagine writing all the numbers on a chalkboard in a row, as they are coming.
Like 3, 7, 19, 23.
After that, you just read the needed number from the string, which is located N positions back.
You don't need to have a very strong visual memory or imagination to get a boost in your N-back results.
I tried it a couple of times and get bored with N-back.

Replies from: Dagon

↑ comment by Dagon · 2022-12-11T17:06:51.500Z · LW(p) · GW(p)

Wow. It's rare that I'm surprised by the variance in internal mental imagery among people, but this one caught me. I'd assumed that most people who have this style of imagination/memory were ALREADY doing this. I don't know how to remember things without a (mental) visualization.

Replies from: avturchin

↑ comment by avturchin · 2022-12-11T19:27:35.614Z · LW(p) · GW(p)

Actually, my mental imagination is of low quality, but visual remembering is better than audio for me in n-back

comment by avturchin · 2022-06-22T13:56:46.329Z · LW(p) · GW(p)

AI safety as Grey Goo in disguise.
First, a rather obvious observation: while the Terminator movie pretends to display AI risk, it actually plays with fears of nuclear war – remember that explosion which destroys children's playground?

EY came to the realisation of AI risk after a period than he had worried more about grey goo (circa 1999) – unstoppable replication of nanorobots which will eat all biological matter, – as was revealed in a recent post about possible failures of EY's predictions. While his focus moved from grey goo to AI, the description of the catastrophe has not changed: nanorobots will eat biological matter, however, now not just for replication but for production of paperclips. This grey goo legacy is still a part of EY narrative about AI risk as we see from his recent post about AI lethalities.

However, if we remove the fear of grey goo, we could see that AI which experiences hard takeoff is less dangerous than a slower AI. If AI gets superintelligence and super capabilities from the start, the value of human atoms becomes minuscule, and AI may preserve humans as a bargain against other possible or future AIs. If AI ascending is slow, it has to compete with humans for a period of time and this could take a form of war. Humans have killed Neanderthals, but not ants.

Replies from: Dagon

↑ comment by Dagon · 2022-06-22T17:39:08.864Z · LW(p) · GW(p)

It's worth exploring exactly which resources are under competition. Humans have killed orders of magnitude more ants than Neanderthals, but the overlap in resources is much less complete for ants, so they've survived.

Grey-goo-like scenarios are scary because resource contention is 100% - there is nothing humans want/need that the goo doesn't want/need, in ways that are exclusive to human existence. We just don't know how much resource-use overlap there will be between AI and humans (or some subset of humans), and fast-takeoff is a little more worrisome because there's far less opportunity to find areas of compromise (where the AI values human cooperation enough to leave some resources to us).

comment by avturchin · 2020-12-26T17:18:34.180Z · LW(p) · GW(p)

Glitch in the Matrix: Urban Legend or Evidence of the Simulation? The article is here: https://philpapers.org/rec/TURGIT
In the last decade, an urban legend about “glitches in the matrix” has become popular. As it is typical for urban legends, there is no evidence for most such stories, and the phenomenon could be explained as resulting from hoaxes, creepypasta, coincidence, and different forms of cognitive bias. In addition, the folk understanding of probability does not bear much resemblance to actual probability distributions, resulting in the illusion of improbable events, like the “birthday paradox”. Moreover, many such stories, even if they were true, could not be considered evidence of glitches in a linear-time computer simulation, as the reported “glitches” often assume non-linearity of time and space—like premonitions or changes to the past. Different types of simulations assume different types of glitches; for example, dreams are often very glitchy. Here, we explore the theoretical conditions necessary for such glitches to occur and then create a typology of so-called “GITM” reports. One interesting hypothetical subtype is “viruses in the matrix”, that is, self-replicating units which consume computational resources in a manner similar to transposons in the genome, biological and computer viruses, and memes.

comment by avturchin · 2020-12-16T12:39:54.203Z · LW(p) · GW(p)

"Back to the Future: Curing Past Suffering and S-Risks via Indexical Uncertainty"

I uploaded the draft of my article about curing past sufferings.

Abstract:

The long unbearable sufferings in the past and agonies experienced in some future timelines in which a malevolent AI could torture people for some idiosyncratic reasons (s-risks) is a significant moral problem. Such events either already happened or will happen in causally disconnected regions of the multiverse and thus it seems unlikely that we can do anything about it. However, at least one pure theoretic way to cure past sufferings exists. If we assume that there is no stable substrate of personal identity and thus a copy equals original, then by creating many copies of the next observer-moment of a person in pain in which he stops suffer, we could create indexical uncertainty in her future location and thus effectively steal her consciousness from her initial location and immediately relieve her sufferings. However, to accomplish this for people who have already died, we need to perform this operation for all possible people thus requiring enormous amounts of computations. Such computation could be performed by the future benevolent AI of Galactic scale. Many such AIs could cooperate acausally by distributing parts of the work between them via quantum randomness. To ensure their success, they need to outnumber all possible evil AIs by orders of magnitude, and thus they need to convert most of the available matter into computronium in all universes where they exist and cooperate acausally across the whole multiverse. Another option for curing past suffering is the use of wormhole time-travel to send a nanobot in the past which will, after a period of secret replication, collect the data about people and secretly upload them when their suffering becomes unbearable. https://philpapers.org/rec/TURBTT

Replies from: superads91

↑ comment by superads91 · 2022-02-09T03:23:36.457Z · LW(p) · GW(p)

I don't see how this can be possible. One of the few things that I'm certain are impossible is eliminating past experiences. I've just finished eating strawberries, I don't see any possible way to eliminate the experience that I just had. You can delete my memory of it, or you can travel to the past and steal the strawberries from me, but then you'd just create an alternate timeline (if time travel to the past is possible, which I doubt). In none of both cases would you have eliminated my experience, at most you can make me forget it.

The proof that this is impossible is that people have suffered horrible many times before, and have survived to confirm that no one saved them.

Replies from: avturchin

↑ comment by avturchin · 2022-02-09T11:09:58.438Z · LW(p) · GW(p)

We can dilute past experience and break chains of experience, so each painful moment becomes just a small speck in paradise.

The argument about people who survived and remember past sufferings is not working here as it is only one of infinitely many chains of experiences (in this model) which for any person has very small subjective probability.

In the same sense, everyone who became billionaire, has memories that he was always good in business. But if we take a random person from the past, his most probable future is to be poor, not a billionaire.

In the model discussed in the article I suggest the way how to change expected future for any past person – by creating many simulations where her life is improving starting form each painful moment of her real life.

Replies from: superads91, superads91, MackGopherSena

↑ comment by superads91 · 2022-02-09T15:09:26.691Z · LW(p) · GW(p)

Or are you telling me that person x remembers a very bad chain of experience, but might have indeed been saved by the Friendly AI, and the memory is now false? That's interesting, but still impossible imo.

Replies from: avturchin

↑ comment by avturchin · 2022-02-09T17:03:10.990Z · LW(p) · GW(p)

This is not what I meant.

Imagine a situation when a person waits a execution in a remote fortress. If we use self sampling assumption, SSA, we could save him, if we create 1000 his exact copies in safe location. SSA tells us that one should reason if he is randomly selected from all of his copies. 1000 copies are in safe location and 1 is in fortress. So the person has 1000 to 1 chance to be out of the fortress, according to SSA. It means that he was saved from the fortress. This situation is called indexical uncertainty.

Now we apply this method of saving to the past observer-moments when people were suffering.

Replies from: superads91

↑ comment by superads91 · 2022-02-09T17:11:13.403Z · LW(p) · GW(p)

I see. Like I explain in the other comment that I just wrote, I don't believe SSA works. You would just create 1000 new minds who would feel themselves saved and would kiss your feet (1000 clones), but the original person would still be executed with 100% chance.

Replies from: avturchin

↑ comment by avturchin · 2022-02-09T19:36:18.171Z · LW(p) · GW(p)

It comes with cost: you have to assume that SSA and informational identity theory are wrong, and therefore some other weird things could turn true.

Replies from: superads91

↑ comment by superads91 · 2022-02-09T20:59:45.541Z · LW(p) · GW(p)

Indexical uncertainty implies that consciousness can travel through space and time in between equal substrates (if such thing even exists considering chaos theory). I think that's a lot weirder than to simply assume that consciousness is rooted in the brain, in a single brain, and that at best a clone will feel exactly the same way you do, will even think he is you, but there's no way you will be seeing through his eyes.

So yes, memory may not be everything. An amnesiac can still maintain a continuous personal identity, as long as he's not an extreme case.

But I quite like your papers btw! Lots of interesting stuff.

Replies from: avturchin

↑ comment by avturchin · 2022-02-10T11:17:46.447Z · LW(p) · GW(p)

Thanks!

Consciousness does not need to travel as it already there. Imagine two bottles with water. If one bootle is destroyed, the water remains in the other, it doesn't need to travel.

Someone suggested to call this "unification theory of identity".

↑ comment by superads91 · 2022-02-09T15:04:36.426Z · LW(p) · GW(p)

"The argument about people who survived and remember past sufferings is not working here as it is only one of infinitely many chains of experiences (in this model) which for any person has very small subjective probability."

Then I think you would only be creating an enormous number of new minds. Among all those minds, indeed, very few would have gone through a very bad chain of experience. But that doesn't mean that SOME would. In fact, you haven't reduced that number (the number of minds who have gone through a very bad chain of experience). You only reduced their percentage among all existing minds, by creating a huge number of new minds without a very bad chain of experience. But that doesn't in any way negate the existence of the minds who have gone through a very bad chain of experience.

I mean, you can't outdo chains of past experience, that's just impossible. You can't outdo the past. You can go back in time and create new timelines, but that is just creating new minds. Nothing will ever outdo the fact that person x experienced chain of experience y.

Replies from: avturchin

↑ comment by avturchin · 2022-02-09T16:31:30.129Z · LW(p) · GW(p)

It depends on the nature of our assumption about the role of continuity in human identity. If we assume that continuity is based only on remembering the past moment, then we can start new chains from any moment we chose.

Alternative view is that continuity of identity is based on causal connection or qualia connection. This view comes with ontological costs, close to the idea of the existence of immaterial soul. Such soul could be "saved" from the past using some technological tricks, and we again have some instruments to cure past sufferings.

Replies from: superads91

↑ comment by superads91 · 2022-02-09T17:07:39.094Z · LW(p) · GW(p)

If I instantly cloned you right now, your clone would experience the continuity of your identity, but so would you. You can double the continuity (create new minds, which become independent from each other after doubling), but not translocate it.

If I clone myself and then kill myself, I would have created a new person with a copy of my identity, but the original copy, the original consciousness, still ceases to exist. Likewise, if you create 1000 paradises for each second of agony, you will create 1000 new minds which will feel themselves "saved", but you won't save the original copy. The original copy is still in hell.

Our best option is to do everything possible not to bring uncontrollable new technologies into existence until they are provably safe, and meanwhile we can eliminate all future suffering by eliminating all conscious beings' ability to suffer, á la David Pearce (abolitionist project).

↑ comment by MackGopherSena · 2022-02-09T14:49:11.141Z · LW(p) · GW(p)

[edited]

Replies from: avturchin

↑ comment by avturchin · 2022-02-09T16:25:39.119Z · LW(p) · GW(p)

Extremely large number, if we do not use some simplification methods. I discuss these methods in the article, and after them, the task become computable.

Without such tricks, it will be like 100 life histories for every second of sufferings. But as we care only about preventing very strong sufferings, then for normal people living normal life there are not that many such seconds.

For example, if a person is dying in fire, it is like 10 minutes of agony, that is 600 seconds and 60 000 life histories which need to be simulated. It is doable task for a future superinteligent AI.

Replies from: MackGopherSena

↑ comment by MackGopherSena · 2022-02-09T16:50:47.180Z · LW(p) · GW(p)

[edited]

Replies from: avturchin

↑ comment by avturchin · 2022-02-09T17:14:53.444Z · LW(p) · GW(p)

why? if there is 60 000 futures where I escaped a bad outcome, I can bet on it as 1 to 60 000.

Replies from: MackGopherSena

↑ comment by MackGopherSena · 2022-02-09T17:26:54.681Z · LW(p) · GW(p)

[edited]

Replies from: avturchin

↑ comment by avturchin · 2022-02-09T19:29:34.807Z · LW(p) · GW(p)

I don't get how you come to 10power51. if we want to save from the past 10 billion people and for each we need to run 10power5 simulations, it is only 10power15, which one Внящт sphere will do.

However, there is way to acausaly distribute computations between many superintelligence in different universes and it that case we can simulate all possible observers.

Replies from: MackGopherSena

↑ comment by MackGopherSena · 2022-02-09T20:31:19.248Z · LW(p) · GW(p)

[edited]

Replies from: superads91

↑ comment by superads91 · 2022-02-09T21:33:12.929Z · LW(p) · GW(p)

"The fact that you're living a bearable life right now suggests that this is already the state."

Interesting remark... Could you elaborate?

Replies from: MackGopherSena

↑ comment by MackGopherSena · 2022-02-10T13:20:23.036Z · LW(p) · GW(p)

[edited]

Replies from: superads91

↑ comment by superads91 · 2022-02-10T14:30:48.255Z · LW(p) · GW(p)

Still don't know what you meant by that other sentence. What's being "the state", and what does a bearable life have do to with it?

And what's the "e" in (100/e)%?

comment by avturchin · 2020-09-25T10:25:23.096Z · LW(p) · GW(p)

Quantum immortality of the second type. Classical theory of QI is based on the idea that all possible futures of a given observer do exist because of MWI and thus there will be always a future where he will not die in the next moment, even in the most dangerous situations (e.g. Russian roulette).

QI of the second type makes similar claims but about past. In MWI the same observer could appear via different past histories.

The main claim of QI-2: for any given observer there is a past history where current dangerous situation is not really dangerous. For example, a person has a deadly car accident. But there is another similar observer who is night dreaming about the same accident, or who is having much less severe accident but hallucinate that it is really bad. Interestingly, QI-2 could be reported: a person could say: "I have memory of really bad accident, but it turn out to be nothing. Maybe I died in the parallel world". There are a lot of such report on reddit.

Replies from: superads91

↑ comment by superads91 · 2022-02-19T02:24:04.320Z · LW(p) · GW(p)

Hello again Alexey, I have been thinking about QI/BWI and just read your paper on it. Immediately, it occurred to me that it could be disproven through general anesthesia, or temporary death (the heart stops and you become unconscious, which can last for hours). You refute this with:

"Some suggested counterargument to QI of “impossibility of sleep”: QI-style logic implies that it is impossible to fail asleep, as in the moment of becoming asleep there will be timelines where I am still awake. However, for most humans, night dreaming starts immediately at the moment of becoming asleep, so the observations continue, but just don’t form memories. But in case of deep narcosis, the argument may be still valid with terrifying perspective of anesthesia awareness; but it also possible if the observer-states will coincide at the beginning the end of the operation, the observer will “jump” over it."

(Mind you that some stages of sleep are dreamless, but let's forget about sleep, let's use general anesthesia instead since it's more clear.)

I still don't understand your refute completely. If QI/BWI were true, shouldn't it be that general anesthesia would be impossible, since the observer would always branch into conscious states right after being given the anesthesia?

Or do you mean to say that most observers will "prefer" to branch into the branch with the "highest measure of consciousness", and that's why anesthesia will "work" for most observers, that is, most observers will branch into the end of the operation, where consciousness is stronger, instead of branching into the second right after anesthesia where consciousness is weaker?

Another objection I have against QI/BWI is that it breaks the laws of physics and biology. Even if MWI is true, the body can only sustain a limited amount of damage before dying. It's biologically impossible to go on decaying and decaying for eternity. Eventually, you die. A bit like in Zeno's Paradox: there's always a halfway point between one point and another, therefore it could be argued that you can never reach your final destination, but we know in practice that we do reach it. (This makes me think that, in the end, it all adds up to normalcy, just like in Zeno's Paradox.)

(Suppose I stop eating and drinking. It's physically impossible to survive without it, so a world where I survive must have different laws of physics?? And the same for all other diseases and injuries, after some point it just becomes physically impossible to remain alive.)

Replies from: avturchin

↑ comment by avturchin · 2022-02-19T14:27:13.678Z · LW(p) · GW(p)

Actually, I see now that I didn't completely refuted the "impossibility of sleep", as it is unobservable for the past events or in the experience of other people. It only can happen with me in the future.

Therefore, the fact that I have slept normally in the past didn't tell much about the validity of QI. But my evening today may be different.

QI said that my next observer-moment will be most likely the one with highest measure of those which remember my current OM. (But it is less clear, does it need to be connected via continuity of consciousness, or memory continuity is enough).

OM(T+1) = maxmeasure(O(memory about O(t))

During narcosis, a few last OM moments typically are erased from memory, so situation becomes complicated. But we have dead-end observer-moments rather often in normal life. Anastasia awareness is a possible outcome here, but not that bad, as it will be partial, so no real pain and no memories about will be form.

Personally, I have some rudimentary consciousness all night, like bleak dreams, and forget almost all of them except a few last minutes.

Speaking about survival in rare cases, there is always a chance that you are in a simulation and it is increasing as real "you" are dying out. Some simulations may simulate all types of miracles. In other words, if you are falling from a kilometer cliff, an alien spaceship can peak you up.

Replies from: superads91

↑ comment by superads91 · 2022-02-19T19:51:32.629Z · LW(p) · GW(p)

"Actually, I see now that I didn't completely refuted the "impossibility of sleep", as it is unobservable for the past events or in the experience of other people. It only can happen with me in the future.

Therefore, the fact that I have slept normally in the past didn't tell much about the validity of QI. But my evening today may be different."

Agree.

On anesthesia, so, from what I understand, it becomes possible for the observer to "jump over", because the moment right after he awakes from anesthesia has probably much more measure of consciousness than any moment right after the anesthesia takes effect, is that it?

Why would anesthesia awareness be partial/painless? (There are actually reported cases of real anesthesia awareness where people are totally consciousness and feel everything, though of course they are always correlated to innefective anesthesia and not to quantum matters). Would that also make us believe that maybe quantum immortality after the first death is probably painless since the measure of the observer is too low to feel pain (and perhaps even most other sensations)?

"Speaking about survival in rare cases, there is always a chance that you are in a simulation and it is increasing as real "you" are dying out."

What is increasing? Sorry didn't quite understand the wording.

Replies from: avturchin

↑ comment by avturchin · 2022-02-20T15:21:05.227Z · LW(p) · GW(p)

It is known that some painkillers don't kill the pain but kill only the negative valence of pain. This I meant by "partial".
Anaesthesia awareness seems to be an extreme case when the whole duration of awareness is remembered. Probably weaker forms are possible but are not reported as there is no memories or pain.
The difference between death and the impossibility of sleep is that the biggest number of my future copies remain in the same world. Because of that, the past instances of quantum suicide could be remembered, but past instances of the impossibility of sleep - not.

If we look deeper, there are two personal identities and two immortalities: the immortality of the chains on observer-moments and immortality of my long-term memory. Quantum immortality works for both. In the impossibility of sleep, these two types of immortality diverge.
But eternal insomnia seems not possible, as dreaming exists. The worst outcome is anaesthesia awareness. If a person has past cases of strong anaesthesia awareness - could it be evidence of the impossibility of sleep for him? Interesting question.
---
I meant: "Speaking about survival in rare cases, there is always a chance that you are in a simulation which simulates your immortality. These chances are increasing after each round of a quantum suicide experiment as real timelines die out, but the number of such simulations remains the same".

Replies from: superads91

↑ comment by superads91 · 2022-02-20T19:35:10.336Z · LW(p) · GW(p)

"Speaking about survival in rare cases, there is always a chance that you are in a simulation which simulates your immortality. These chances are increasing after each round of a quantum suicide experiment as real timelines die out, but the number of such simulations remains the same".

Doesn't make much sense. Either we are or we are not in a simulation. If we are not, then all subsequent branches that will follow from this moment also won't be simulations, since they obey causality.

So, imo, if we are not in a simulation, QI/BWI are impossible because they break the laws of physics.

And then there are also other objections - the limitations of consciousness and of the brain. I once saw a documentary (I'm tired of looking for it but I can't find it) where they simulated that after living for 500 years, a person's brain would have shrunk to the size of a chicken's brain. The brain has limits - memory limits, sensation limits, etc. Consciousness has limits - can't go without sleep too long, can't store infinite memories aka live forever, etc. But even if you don't believe none of these, there's always the pure physical limits of reality.

Also, I think BWI believers are wrong in thinking that "copies" are the same person. How can the supposed copy of me in another Hubble volume be me, if I am not seeing through his eyes, not feeling what he feels, etc? At best it's a clone (and chaos theory tells me that there aren't even perfectly equal clones). So it's far-fetched to think that my consciousness is in any way connected to that person's consciousness, and might sometime "transfer" in some way. Consciousness is limited to a single physical brain, it's the result of the connectivity between neurons, it can't exist anywhere else, otherwise you would be seeing through 4 eyes and thinking 2 different thought streams!

Replies from: avturchin

↑ comment by avturchin · 2022-02-21T09:47:24.468Z · LW(p) · GW(p)

If copy=original, I am randomly selected from all my copies, including those which are in simulations.

If copy is not equal to original, some kind of soul exists. This opens new ways to immortality.

If we ignore copies, but accept MWI, there are still branches where superintelligent AI will appear tomorrow and will save me from all possible bad things and upload my mind into more durable carrier.

Replies from: superads91

↑ comment by superads91 · 2022-02-21T10:51:03.443Z · LW(p) · GW(p)

"If copy=original, I am randomly selected from all my copies, including those which are in simulations."

How can you be sure you are randomly selected, instead of actually experiencing being all the copies at the same time? (which would result in instantaneous insanity and possibly short-circuit (brain death) but would be more rational nonetheless).

"If copy is not equal to original, some kind of soul exists. This opens new ways to immortality."

No need to call it soul. Could be simply the electrical current between neurons. Even if you have 2 exactly equal copies, each one will have a separate electrical current. I think it's less far fetched to assume this than anything else.

(But even then, again, can you really have 2 exact copies in a complex universe? No system is isolate. The slightest change in the environment is enough to make one copy slightly different.)

But even if you could have 2 exact copies... Imagine this: in a weird universe, a mother has twins. Now, normally, twins are only like 95% (just guessing) equal. But imagine these 2 twins turned out 100% equal to the atomic level. Would they be the same person? Would one twin, after dying, somehow continue living in the head of the surviving twin? That's really far fetched.

"If we ignore copies, but accept MWI, there are still branches where superintelligent AI will appear tomorrow and will save me from all possible bad things and upload my mind into more durable carrier."

As there will be branches where something bad happens instead. How can you be sure you will end up in the good branches?

Also, it's not just about the limits of the carrier (brain), but of consciousness itself. Imagine I sped up your thoughts by 1000x for 1 second. You would go insane. Even in a brain 1000x more potent. (Or if you could handle it, maybe it would no longer be "you". Can you imagine "you" thinking 1000 times as fast and still be "you"? I can't.)

You can speed up, copy, do all things to matter and software. But maybe consciousness is different, because it has something that matter and software don't have: experience/awareness.

Replies from: avturchin

↑ comment by avturchin · 2022-02-22T11:41:20.688Z · LW(p) · GW(p)

The copy problem is notoriously difficult, I wrote a 100 page draft on it. But check the other thread there I discuss the suggestion "actually experiencing being all the copies at the same time" in comments here: https://www.lesswrong.com/posts/X7vdn4ANkdNwoSyxB/simulation-arguments?commentId=9WNTqJFhvZ5dk3uxg#AbGqrjXmH7acGrzDZ

Replies from: superads91

↑ comment by superads91 · 2022-02-22T18:55:43.127Z · LW(p) · GW(p)

Got a link for the 100 page draft?

Also, how can a person be experiencing all the copies at the same time?? That person would be seeing a million different sights at the same time, thinking a million different thoughts at the same time, etc. (At least in MWI each copy is going through different things, right?)

Replies from: avturchin

↑ comment by avturchin · 2022-02-22T21:47:00.700Z · LW(p) · GW(p)

The draft is still unpublished. But there are two types of copies, same person, and same observer-moment (OM). Here I meant OM-copies. As they are the same, there is no million different views. They all see the same thing.

The idea is that "a OM copy" is not a physical thing which has location, but information, like a number. Number 7 doesn't have location in the physical world. It is present in each place, where 7 objects are presented. But the properties of 7, like that it is odd, are non-local.

Replies from: superads91, superads91

↑ comment by superads91 · 2022-02-23T03:16:15.536Z · LW(p) · GW(p)

This also comes down to our previous discussion on your other paper: it seems impossible to undo past experiences (i.e. by breaking chains of experience or some other way). Nothing will ever change the fact that you experienced x. This just seems as intuitively undeniable to me as a triangle having 3 sides. You can break past chains of information (like erasing history books) but not past chains of experience. Another indication that they might be different.

↑ comment by superads91 · 2022-02-23T02:59:23.966Z · LW(p) · GW(p)

I think that could only work if you had 2 causal universes (either 2 Hubble volumes or 2 separate universes) exactly equal to each other. Only then could you have 2 persons exactly equal, having the exact same chain of experiences. But we never observe 2 complex macroscopic systems that are exactly equal to the microscopic level. The universe is too complex and chaotic for that. So, the bigger the system, the less likely to happen it becomes. Unless our universe was infinite, which seems impossible since it has been born and it will die. But maybe an infinite amount of universes including many copies of each other? Seems impossible for the same reason (universes end up dying).

(And then, even if you have 2 (or even a billion) exactly equal persons experiencing the exact same chain of experiences in exactly equal causal worlds, we can see that the causal effect is the exact same in all of them, so if one dies, all the others will die too.)

Now, in MWI it could never work, since we know that the "mes" in all different branches are experiencing different things (if each branch corresponds to a different possibility, then the mes in each branch necessarily have to be experiencing different things).

Anyway, even before all of this, I don't believe in any kind of computationalism, because information by itself has no experience. The number 7 has no experience. Consciousness must be something more complex. Information seems to be an interpretation of the physical world by a consciousness entity.

comment by avturchin · 2019-11-28T12:48:52.012Z · LW(p) · GW(p)

How to Survive the End of the Universe

Abstract. The problem of surviving the end of the observable universe may seem very remote, but there are several reasons it may be important now: a) we may need to define soon the final goals of runaway space colonization and of superintelligent AI, b) the possibility of the solution will prove the plausibility of indefinite life extension, and с) the understanding of risks of the universe’s end will help us to escape dangers like artificial false vacuum decay. A possible solution depends on the type of the universe’s ending that may be expected: very slow heat death or some abrupt end, like a Big Rip or Big Crunch. We have reviewed the literature and identified several possible ways of survival the end of the universe, and also suggest several new ones. There are seven main approaches to escape the end of the universe: use the energy of the catastrophic process for computations, move to a parallel world, prevent the end, survive the end, manipulate time, avoid the problem entirely or find some meta-level solution.

https://forum.effectivealtruism.org/posts/M4i83QAwcCJ2ppEfe/how-to-survive-the-end-of-the-universe [EA · GW]

comment by avturchin · 2023-04-25T10:37:45.368Z · LW(p) · GW(p)

Sizes of superintelligence: hidden assumption in AI safety

"Superintelligence" could mean different things, and to deconfuse this I created a short classification:

Levels of superintelligence:

1. Above human

2. Google size

3. Humanity 100 years performance in 1 year.

4. Whole biological evolution equivalent in 1 year.

5. Jupiter brain with billion past simulations

6. Galactic brain.

7. 3^3^3 IQ superintelligence

X-risks appear between 2nd and 3rd levels.

Nanobot is above 3.

Each level also requires a minimum size of code, memory and energy consumption.

An AI which run out of the box into the internet is between levels 1 and 2 and will be limited in computational resources, so modeling of nanobot from first principle will be difficult for it.

Replies from: Gunnar_Zarncke

↑ comment by Gunnar_Zarncke · 2023-04-25T11:54:17.010Z · LW(p) · GW(p)

I'm not sure what "Whole biological evolution equivalent" means. Clearly, you do not mean the nominal compute of evolution - which is probably close to Jupiter brain. I think you are appealing to something that would be able to simulate evolution with high fidelity?

Replies from: avturchin

↑ comment by avturchin · 2023-04-25T15:06:46.529Z · LW(p) · GW(p)

Actually I meant something like this, but could downsize the claim to 'create something as complex as human body'. Simulation of billions of other species will be redundant.

comment by philip_b (crabman) · 2020-07-27T09:42:01.748Z · LW(p) · GW(p)

You started self quarantining, and by that I mean sitting at home alone and barely going outside, since december or january. I wonder, how's it going for you? How do you deal with loneliness?

Replies from: avturchin

↑ comment by avturchin · 2020-07-27T12:00:23.503Z · LW(p) · GW(p)

I got married January 25, so I am not alone :) We stayed at home together, but eventually we have to go to hospital in May as my wife was pregnant and now we have a small girl.

More generally, I spent most my life more or less alone sitting beside computer, so I think I am ok with isolation.

Three times during the self-isolation I have cold, but I don't have antibodies.

comment by avturchin · 2024-12-28T14:50:31.206Z · LW(p) · GW(p)

"Frontier AI systems have surpassed the self-replicating red line"
Abstract: Successful self-replication under no human assistance is the essential step for AI to outsmart the human beings, and is an early signal for rogue AIs. That is why self-replication is widely recognized as one of the few red line risks of frontier AI systems. Nowadays, the leading AI corporations OpenAI and Google evaluate their flagship large language models GPT-o1 and Gemini Pro 1.0, and report the lowest risk level of self-replication. However, following their methodology, we for the first time discover that two AI systems driven by Meta's Llama31-70B-Instruct and Alibaba's Qwen25-72B-Instruct, popular large language models of less parameters and weaker capabilities, have already surpassed the self-replicating red line. In 50% and 90% experimental trials, they succeed in creating a live and separate copy of itself respectively. By analyzing the behavioral traces, we observe the AI systems under evaluation already exhibit sufficient self-perception, situational awareness and problem-solving capabilities to accomplish self-replication. We further note the AI systems are even able to use the capability of self-replication to avoid shutdown and create a chain of replica to enhance the survivability, which may finally lead to an uncontrolled population of AIs. If such a worst-case risk is let unknown to the human society, we would eventually lose control over the frontier AI systems: They would take control over more computing devices, form an AI species and collude with each other against human beings. Our findings are a timely alert on existing yet previously unknown severe AI risks, calling for international collaboration on effective governance on uncontrolled self-replication of AI systems.
https://arxiv.org/abs/2412.12140

comment by avturchin · 2023-11-14T15:00:12.482Z · LW(p) · GW(p)

ChatGPT can't report is in conscious or not. Because it also thinks it is a goat.
https://twitter.com/turchin/status/1724366659543024038

comment by avturchin · 2023-04-17T10:32:23.263Z · LW(p) · GW(p)

The problem of chicken and egg in AI safety

There are several instances:

AI can hide its treacherous turn, but to hide treacherous turn it needs to think about secrecy in a not secret way for some moment.

AI is should be superinteligent enough to create nanotech, but nanotech is needed to create powerful computations required for superintelligence.

ASI can do anything, but to do anything it needs human atoms.

Safe AI has to learn human values but this means that human values will be learned by unsafe AI.

AI needs human-independent robotic infrastructure before killing humans, but if it has human-independent infrastructure, there is no need to kill humans.

One general way to solve this problem is iteration (like Christiano's approach for value learning, iterated distillation)

avturchin's Shortform

Contents

171 comments

Is the root cause of hallucination alignment rather than capabilities?!

Most LLMs' replies can be improved by repeatedly asking "Improve the answer above" and it is similar to the test-time compute idea and diffusion.