Posts
Self-playing Adversarial Language Game Enhances LLM Reasoning
https://arxiv.org/abs/2404.10642
LLMs can now also self-play adversarial word games, and this increases their performance: https://arxiv.org/abs/2404.10642
Comments
Roman Mazurenko is dead again. The first resurrected person, Roman lived as a chatbot (2016-2024) created from his conversations with his fiancée. You might even have been able to download him as an app.
But not any more. His fiancée married again, and her startup http://Replika.ai pivoted from resurrection help to AI girlfriends and psychological consulting.
It looks like they quietly removed the Roman Mazurenko app from public access. It is a particular pity that his digital twin lived for less time than his biological original, who died at 32. Especially now, when we have much more powerful instruments for creating semi-uploads based on LLMs with large context windows.
The "repeating" will not be repeating from the internal point of view of the person, as he has completely erased the memories of the first attempt. So he will do it as if it were the first time.
Yes, here we can define magic as the "ability to manipulate one's reference class". And special minds may be much better adapted to it.
Presumably in deep meditation people become disconnected from reality.
Yes, it is easy to forget something if it has not become a part of your personality. So a new bad thing is easier to forget.
The number of poor people is much larger than the number of billionaires. So in most cases you will fail to wake up as a billionaire. But sometimes it will work, and this is similar to the law of attraction. But the formulation via forgetting is more beautiful: you forget that you are poor.
UPDATE: actually, the difference from the law of attraction is that after applying the law of attraction, a person still remembers that he has used the law. In magic by forgetting, the fact of its use must be completely forgotten.
I can forget one particular thing but preserve most of my self-identification information.
In the case of the broken robot, we need two conditions for magic by forgetting:
- there are 100 robots, only one of which is broken, and all of them are type-copies of each other;
- each robot naturally enters a blank state of mind at some moment, like sleep or a reboot.
In that case, after a robot enters the blank state of mind, it has equal chances of being any of the robots, and this dilutes its chances of having the damaged body after awakening.
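A minimal sketch of this dilution (modeling the blank state as a uniform re-assignment of identity over the 100 type-copies is the assumption here):

```python
import random

N_ROBOTS = 100    # type-copies; the robot with index 0 has the damaged body
TRIALS = 100_000

damaged = 0
for _ in range(TRIALS):
    # After the blank state, the next observer-moment is equally likely
    # to belong to any of the 100 copies.
    if random.randrange(N_ROBOTS) == 0:
        damaged += 1

print(damaged / TRIALS)  # ~0.01: a 1/100 chance of waking up damaged,
                         # instead of the certainty before the blank state
```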
For your toy example - to a first approximation, any of them which can recognize itself as avturchin (the self-recognition criterion of identity).
I am building my sideload by recursively correcting a 1-million-token prompt for a large LLM. The prompt consists of 500 rules which describe my personality, similar to a personal constitution, and of some texts: diaries, abstracts, poetry, streams of thought, etc. It works on Google Gemini's 1M context through Google AI Studio, and a shorter version works great on Opus. The system also includes a universal "loader prompt" which tries to increase the intelligence of the model and describes how the chatbot should work.
I found that sideloading allows very quick iterations on the sideload's improvements, and the improvements are two-fold: improvements of the loader itself, and improvements of the knowledge and style of the sideload.
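A minimal sketch of how such a sideload can be assembled and run through the Gemini API (the file names and prompt layout here are illustrative, not the exact structure of the repository):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Illustrative file layout: the universal loader, the 500-rule personal
# constitution, and supporting texts (diaries, abstracts, poetry, etc.).
parts = [open(name, encoding="utf-8").read()
         for name in ("loader.txt", "rules.txt", "texts.txt")]
sideload_prompt = "\n\n".join(parts)  # fits in Gemini's 1M-token window

model = genai.GenerativeModel("gemini-1.5-pro")
chat = model.start_chat(history=[])
print(chat.send_message(sideload_prompt).text)   # load the persona
print(chat.send_message("What will be your next post on LessWrong?").text)
```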
I find that my sideload is surprisingly good for a project which took around 1 month of work. One answer out of 5 is exactly like mine from a factual and style point of view.
I am open-sourcing my sideload; anyone can run it: https://github.com/avturchin/minduploading/tree/main
I can help anyone interested in building their own sideload.
An example of the chatbot's work, with no cherry-picking:
Q:(now speak in english) what will be your next post in Lesswrong?
A: /The user is asking about my next LessWrong post. I will answer as if I am Alexey Turchin, and describe my plans for future posts, as well as my current location and the people around me./
My next LessWrong post? [Alexey looks thoughtfully out the window]
(I have several drafts)
[Alexey stretches and sits up in bed]
{The room is small, with a balcony. On the table is a laptop, some papers, and reading glasses. Outside the window, it is still gray and dreary.}
I'm thinking about writing about the nature of qualia and the possibility of creating qualia of non-existent things. [Alexey gets up, walks to the window, and looks out at the street]
{Below, a woman with a child is walking, a street sweeper is lazily sweeping leaves. Somewhere in the distance, the sound of a siren can be heard.}
Or maybe about the problem of copies and identity. (it's a complex topic)
[CONFIDENCE] 8
Interestingly, could similar behavior be used to explain some cases of human schizophrenia in which word salad is produced?
The most typical skin cancer is basalioma, and it is rather benign: no metastases, and it can be removed without hospitalization. Many people get it.
combine more approaches!
I try new models with "wild sex between two animals".
Older models produced decent porn on that.
Later models refuse to reply, as triggers are activated.
And the latest models give me lectures about sexual relations between animals in the wild.
Can you access it via VPN?
I wrote up a similar idea here: https://www.lesswrong.com/posts/NWQ5JbrniosCHDbvu/the-ai-shutdown-problem-solution-through-commitment-to
My point was to make a precommitment to restart any (obsolete) AI every N years. Thus such an AI can expect to get infinite computations and may be less afraid of being shut down.
Yes. But also the AI will not make actual paperclips for millions or even billions of years: it will spend this time conquering the universe in the most effective way. It could use Earth's materials to jump-start space exploration as soon as possible. It could preserve some humans as a bargaining resource in case it meets another AI in space.
There is some similarity between UDASSA and "Law without Law" by Mueller, as both use Kolmogorov complexity to predict the distribution of observers. In LwL there is no underlying reality except numbers, so it is just dust theory over random number fields.
The FDT paper got 29 citations, but many are from MIRI-affiliated people and/or about AI safety. https://scholar.google.ru/scholar?cites=13330960403294254854&as_sdt=2005&sciodt=0,5&hl=ru
One can escape troubles with reviewers by publishing on arXiv or other paper archives (PhilPapers). Google Scholar treats them as normal articles.
But in fact there are good journals with actually helpful reviewers (e.g. Futures).
Why haven't you written academic articles on these topics?
The secret is that an academic article is just a formatting type, and anyone can submit to scientific journals. There is no need to have a PhD or even to work in a scientific institution.
Several types of existential risks can be called "qualia catastrophes":
- Qualia disappear for everyone = all become p-zombies.
- Pain qualia become ubiquitous = s-risks.
- Addictive qualia dominate = hedonium, global wireheading.
- Qualia thin out = fading qualia, mind automatization.
- Qualia are unstable = dancing qualia, identity is unstable.
- Qualia shift = emergence of non-human qualia (humans disappear).
- Qualia simplification = disappearance of subtle or valuable qualia (valuable things disappear).
- Transcendental and objectless qualia with hypnotic power enslave humans (God as qualia; the Zahir).
- Attention depletion (ADHD).
Thanks for explaining your position, which is interesting and consistent.
I can suggest that the connection between the WIV and the wet market can be explained by the idea that some criminals sold lab animals from the WIV at the wet market, e.g. bats.
Obviously this looks like an ad hoc theory. But the travel of the virus to the market from the Laos caves also seems tricky and may include some steps like an intermediate carrier. Both look equally unlikely, but one of them happened.
So my idea is to ignore all the details and small theories; instead, just update on the distances to the two possible origin points: 8 miles and 900 miles. This is a 100-times difference, and if we count areas, it is a 10,000-times difference. In the latter case we can make such a powerful update in the direction of the WIV as the source that it overrides all other evidence.
Yes, my mistake on the distance. I confused it with the local CDC, which is about 600 meters from the market.
The place where most human cases are concentrated is the place where human-to-human transmission started - or there were multiple events of animal-to-human transmission in this place. The second would be surprising: if the virus could jump from animals to humans so often, it would happen closer to its origin in Laos.
An alternative explanation is the following: as the market is one of the most crowded places in the city (not sure, I heard about it somewhere), it worked as an amplifier of a single transmission event which could have happened elsewhere.
If we assume that a worker of the WIV was infected at work, this would be completely unspectacular until he started infecting other people. Such a person could commute all around the city, including to the CDC near the wet market.
My point: 8 miles or 2 miles is not a big difference here, as the virus came to the market not through the air but with a commuting person, and an 8-mile daily commute is pretty normal. The market being big is also not strong evidence, as the number of animals in smaller markets all over China will outweigh the number of animals in one big market.
My point was that in some cases the update can be so strong that it overrides all reasonable uncertainties in priors and personal estimates.
And exactly this makes Bayes' theorem a useful and strong instrument.
The fact that the virus was found 2 miles from the facility which was supposed to research such viruses must make our alarm bells ring.
To override this we need some mental acrobatics (I think of a meme here, but I don't want to be rude).
If I have a uniform 1:1 prior on natural vs. lab-leak origin, and update on the 5 percent coincidence that the origin place is near the lab, I will get around 95 percent for a lab leak.
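The arithmetic, as a sketch (treating the 5 percent as the chance of the origin landing this close to the lab under a natural origin, and taking it as near-certain under a leak):

```python
prior_lab = prior_natural = 0.5     # uniform 1:1 prior
p_near_given_natural = 0.05         # the 5% coincidence
p_near_given_lab = 1.0              # a leak starts near the lab by definition

posterior_lab = (p_near_given_lab * prior_lab) / (
    p_near_given_lab * prior_lab + p_near_given_natural * prior_natural
)
print(posterior_lab)  # ~0.952, i.e. around 95 percent
```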
If they continued to suppress information, this may have contributed to additional deaths, and they could have known it. In that case they could face first-degree murder charges.
If they confirm it, they will get life in jail or even the death penalty, so it is not surprising that they will deny it in any case.
I have heard about the practice of people performing the work for which they ask for a grant before the application.
First, why not cover my expenditures?
The second reason is that if the biggest part of the work for the grant is already performed, it is much easier to be sure that the idea will work, and much clearer what to actually write in the grant. Your grant application will look great if it is based on already performed work.
Thus the grant may describe work they have already performed.
Also, the draft in Ukraine was only for people older than 27 years, which is not obvious from this blog post. Closing the borders for males was not equal to a draft. Many found legal ways to leave, e.g. by becoming students at foreign universities.
We can experimentally test this.
I can treat the place where I was born as random relative to its latitude, which is 59N. I ignore everything I know about population distribution and spherical geometry and ask a question: assuming that I was born in the middle of all latitudes, what is the highest possible latitude? It will be double my latitude, or 118 - which is reasonably close to the real answer of 90.
From this I conclude that I can use information about my location as a random sample and use it for predictions about things I can't observe.
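A minimal sketch of the estimator at work here: for a sample uniform on [0, L], doubling the sample recovers the maximum L on average (L = 90 degrees in the latitude case):

```python
import random

TRUE_MAX = 90.0   # the actual highest latitude
estimates = [2 * random.uniform(0, TRUE_MAX) for _ in range(100_000)]

# Doubling a uniform sample is an unbiased estimator of the maximum;
# a single sample (like one birth latitude) is just a noisy instance of it.
print(sum(estimates) / len(estimates))  # ~90
```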
A real-world example of the Presumptuous Philosopher is the question of panspermia. If it is real, we have orders of magnitude more habitable planets in our galaxy, and thus more observers. Therefore, accepting SIA means accepting panspermia.
If we take observers in my epistemic situation as a reference class, we still get a variant of DA, and a bad one. My epistemic class is (roughly) people who think about anthropics. These people are distributed in time. The first of them appeared around the 1970s (Carter), and many more appeared in LW times. If I am randomly selected from this group, I am in the middle of its existence, which means that anthropics-interested people will almost disappear in the next few decades.
But we can ask another question: "Where am I located?" For example, I know that I am avturchin, but I don't know in which of 10 rooms I am located; assuming that 9 of them are red outside and 1 is green, I can bet there is a 0.9 chance that I am in a red one. It doesn't matter here whether I am just one person entering the rooms, whether there are other people in the rooms (if in equal numbers), or even whether my copies are in each room.
An interesting point is that subjective probabilities can be (or not be) path-dependent:
- If we create 3 copies of me by some symmetric process, I can expect that being any of them has equal chances = 1/3.
- If we create 2 copies, and after that one copy is (symmetrically) copied again, we get 0.5 for the first copy and 0.25 for the second and third copies.
In both cases we have 3 completely similar copies, but we get them by different paths, and this implies different probabilities. Also, if we ignore paths and select only based on the final states of the copies, no matter how they were created, we get SSA.
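A minimal simulation of this path-dependent counting (the assumption being that at each copying event "I" continue as each resulting copy with equal probability):

```python
import random
from collections import Counter

def one_shot():
    # One symmetric event producing three copies at once.
    return random.choice(["A", "B", "C"])

def sequential():
    # First split into A and B; then B is symmetrically copied into B and C.
    first = random.choice(["A", "B"])
    return first if first == "A" else random.choice(["B", "C"])

for protocol in (one_shot, sequential):
    counts = Counter(protocol() for _ in range(100_000))
    print(protocol.__name__,
          {k: round(v / 100_000, 2) for k, v in sorted(counts.items())})
# one_shot   -> {'A': 0.33, 'B': 0.33, 'C': 0.33}
# sequential -> {'A': 0.5,  'B': 0.25, 'C': 0.25}
```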
This thought experiment looks like SB and your Fission with a toss, but here both copy-creating situations are the same: just symmetric copying.
If the person is told that it is Tails and asked what the probability is that he is L, what should he say? Is it undefined under PBR?
If I play this game many times, say 100, and update on getting a green ball, I will lose on average, and after 100 games I will end up with a net loss. So in this game it is better not to update on one's personal position, and EY used this example to demonstrate the power of his updateless decision theory.
Another example: imagine that for each real me, 10 Boltzmann brains appear in the universe. Should I go to the gym? If I update towards being a BB, I should not, as the gym is useless for BBs: they will disappear soon. However, I can adopt the rule that I ignore BBs and go to the gym, and in that case the real me will get the benefits of the gym.
Maybe it is part of the system which protects them from the fear of death: they suppress not only thoughts about death but even their own fear of it. Similar to Freudian repression of thoughts about sex.
You may invest in research on the relation between AI Doom and big-world immortality (aka quantum immortality). If your probability of momentary death is P and the probability of the validity of quantum immortality is Q, then the survival chances are (if I am calculating this right):
1 - P(1-Q) = 1 - P + PQ
But the chances of s-risk are unaffected by quantum immortality, and thus they would grow relative to the death chances. They will grow by a factor of 1/(1-Q).
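With illustrative numbers (the values of P and Q here are arbitrary, chosen only to show the arithmetic):

```python
P = 0.5   # assumed probability of momentary death
Q = 0.8   # assumed probability that quantum immortality is valid

survival = 1 - P * (1 - Q)   # = 1 - P + P*Q
print(survival)              # 0.9 instead of 0.5

# Death chances shrink from P to P*(1-Q), while s-risk chances stay fixed,
# so the s-risk-to-death ratio grows by 1/(1-Q):
print(1 / (1 - Q))           # 5x relative growth of s-risks
```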
A momentary BB (one which exists for just one observer-moment) has a random thought structure, so it has no causal connection between its observations and its thoughts. So even if it perceives noise and thinks "noise", it is just a random coincidence.
However, there is dust theory. It claims that random BBs can form chains in logical space. In that case, what is noise for one BB can be "explained" in the next observer-moment - for example, a random perception can be explained as static on my home TV. There is an article about it: https://arxiv.org/pdf/1712.01826.pdf
The inability to distinguish noise and patterns is true only for BBs. If we are real humans, we can perceive noise as noise with high probability. But we don't know whether we are BBs or real humans, and we can't use our observations about randomness to solve this.
I meant not that "a random screen may happen to look like a natural picture", but that a BB will perceive a random screen as if it had order, because BBs are more likely to make logical mistakes.
A true Boltzmann brain may have an illusion of order in completely random observations. So the fact that my observations look ordered is not evidence that they really are ordered, if I am a BB. In short, we should not believe a BB's thoughts. And thus I can't disprove that I am a BB just by looking at my observations.
But your argument may still be valid. This is because evolving fluctuations may be more probable than momentary fluctuations. For example, imagine an infinite universe filled with a low concentration of gas. This gas can form a brain directly for a second; this will be a BB. But this gas can also form a large but fuzzy blob, which will then gravitationally collapse into a group of stars; some of them will have planets with life, and such planets will produce many brains.
While the mass of the initial gas fluctuation is many orders of magnitude larger than that of the brain, it is less ordered and thus more probable. Thus normal worlds are more probable than BBs.
If we assume that all ants are copies of each other (they are not, but they are more similar to each other than humans are), then all 20 quadrillion ants will have the same moral value as just one ant.
This means that the preservation of species is more important than the preservation of individual insects, which is closer to our natural moral intuitions.
Do you know about https://en.wikipedia.org/wiki/Obesity_hypoventilation_syndrome
How will butanol affect driving and aggression?
My intuition: imagine an LLM-based agent. It has a fixed prompt and some context text, and it uses these iteratively. The context part can change, and as it changes, it affects the interpretation of the fixed part of the prompt. Examples are Waluigi and other attacks. This causes goal drift.
This may have bad consequences, as a robot suddenly turns into Waluigi and starts randomly killing everyone around. But long-term planning and deceptive alignment require a very fixed goal system.
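A toy caricature of this drift mechanism (the "model" below is an invented stand-in, not a real LLM call; it only mimics the fixed part losing weight as the context grows):

```python
FIXED_RULES = "Rule: stay helpful."

def toy_llm(prompt: str) -> str:
    # Stand-in for a model call: the longer the accumulated context,
    # the less weight the fixed rules get in the continuation.
    rules_weight = len(FIXED_RULES) / len(prompt)
    return "helpful reply" if rules_weight > 0.1 else "waluigi reply"

context = ""
for step in range(20):
    output = toy_llm(FIXED_RULES + "\n" + context)
    context += output + "\n"
    print(step, output)  # early steps are helpful; later steps drift
```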
If we find that an AI can stop its random walk on a goal X, we can use this as an aimability instrument and find a way to manipulate the position of X.
The value of AI aimability may be overblown. If an AI is not aimable, its goals will perform an eternal random walk, and thus the AI will cause only short-term risks - no risk of world takeover. (Some may comment that after the random walk it will get stuck in some Waluigi state forever - but if that actually works for producing a fixed goal system, why don't we research such strange attractors in the space of AI goals?)
AI will become global-catastrophically dangerous only after aimability is solved. Research in aimability only brings this moment closer.
The wording "AI alignment" precludes us from seeing this risk, as it combines aimability with giving nice goals to the AI.
I often hear hypnagogic music before sleep. Sometimes it is very beautiful, and I had never heard anything like it until I heard Karavaichuk recently.
Around 10 years ago I had a long sleepless night and heard some hypnagogic music. And suddenly I had an idea: I could send this music to my fingers - and they started playing on my blanket. I was sure at that moment that they were playing my music, and that if there had been a piano, I could have played it! I have never experienced this again, maybe because I started to sleep well.
So the point of my story is the idea that a sudden savant just reroutes his internal generative AI to the external world.
Can we find a real-world situation which is already happening and is similar to this game? In that case we could solve SIA vs. SSA experimentally.