Comments

Comment by Raphael Roche (raphael-roche) on Why Have Sentence Lengths Decreased? · 2025-04-21T05:42:32.288Z · LW · GW

You're right. I said "pronunciation," but the problem is more precisely about the mapping between graphemes and phonemes.

Comment by Raphael Roche (raphael-roche) on Why Have Sentence Lengths Decreased? · 2025-04-20T23:24:32.798Z · LW · GW

You're right. The idea behind Académie française style guidelines is that language is not only about factual communication, but also an art, literature. Efficiency is one thing, aesthetics another. For instance, poetry conveys meaning or at least feeling, but in a strange way compared to prose. Poetry would not be very effective for describing an experimental protocol in physics, but it is usually more beautiful to read than the methodology section of a scientific publication. I also enjoy the 'hypotactic' excerpt above much more than the 'paratactic' one. Rich sentences are not bad per se; they demand more effort and commitment to read, but sometimes, if well written, they give a greater reward, because complexity can hold more subtlety, more information. Short sentences are not systematically superior in all contexts; they can look as flat as a 2D picture compared to a 3D one.

Comment by Raphael Roche (raphael-roche) on Why Have Sentence Lengths Decreased? · 2025-04-20T22:30:44.878Z · LW · GW

This is interesting. I think English concentrates its weirdness in pronunciation, which is very irregular. Although adult native speakers don't realize it, this presents a serious learning difficulty for non-native speakers and young English-speaking children. Studies show that English-speaking students need more years of learning to master their language (at least for reading) than French students do, who in turn need more years than young Italian, Spanish, or Finnish students (Stanislas Dehaene, Reading in the Brain).

Comment by Raphael Roche (raphael-roche) on Why Have Sentence Lengths Decreased? · 2025-04-20T21:59:17.687Z · LW · GW

Redundancy makes sure the information passes through. In French, the word 'aujourd'hui' ('today') etymologically means 'au jour de ce jour' ('on the day of this day'), but it is not uncommon to say 'au jour d'aujourd'hui', which would literally mean 'on the day of on the day of this day'. It is also common to say 'moi, je' ('me, I') and increasingly people even say 'moi, personnellement, je' ('me, personally, I'). This represents a kind of emphasis but also a kind of fashion, similar to what happens in the fashion industry, or a kind of drift, similar to what happens in the evolution of species.

Comment by Raphael Roche (raphael-roche) on Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study · 2025-04-15T19:34:30.275Z · LW · GW

AI is very useful in legal matters and is clearly a promising sector for business. It is possible that some legal jobs (especially documentation and basic, non-personalized legal information jobs) are already being challenged by AI and are on the verge of being eliminated, with others to follow sooner or later. My comment was simply reacting to the idea that many white-collar jobs will be on the front line of this destruction. The job of a lawyer is often cited, and I think it's a rather poor example for the reasons I mentioned. Many white-collar jobs combine technical and social skills that can be quite challenging for AI.

Comment by Raphael Roche (raphael-roche) on Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study · 2025-04-15T14:46:44.957Z · LW · GW

Because of this, I think that there will be an interim period where a significant portion of white collar work is automated by AI, with many physical world jobs being largely unaffected.

I have read numerous papers suggesting that white-collar jobs, such as those of lawyers, will be easily replaced by AI, before more concrete or physical jobs such as those discussed by the author. However, I observe that even the most advanced models struggle with reliability in legal contexts, particularly outside of standardized multiple-choice questions and U.S. law, for which they have more training data. These models are good assistants with superhuman general knowledge and pretty good writing skills, but very uneven smartness and reliability in specific cases, a tendency to say what the user/client wants to hear (even more than actual lawyers!), and a tendency to hallucinate judicial decisions.

While it is true that lawyers statistically win only about 50% of their cases and also make mistakes, my point is different. I observe a significant gap in AI's ability to handle legal tasks, and I question whether this gap will be bridged as quickly as some envision. It might be more comparable to the scenario of automated cars, where even a small gap presents a substantial barrier. 90% superhuman performance is great, 9% human-level is acceptable, but 1% stupid mistakes ruin it all.

Law is not coding. The interpretation and application of laws is not an exact science; it involves numerous cultural, psychological, media-related, political, and human-biased considerations. There is also a social dimension that cannot be overlooked. Many clients enjoy conversing with their lawyers, much like they do with their waitstaff or nurses. Similarly, managers appreciate discussing matters over meals at restaurants. The technical and social aspects are intertwined, much like the relationship between a professor and their students, or in other white-collar jobs.

While I do not doubt that this gap can eventually be bridged, I am not convinced that the gap for many white-collar jobs will be filled before more technical engineering tasks like those discussed here, or automated cars, are mastered. However, some white-collar jobs that involve abstract thinking and writing, with minimal social interactions, such as those of theoretical researchers, mathematicians, computer scientists, and philosophers (am I speaking of the archetypal LessWronger?), may be automated sooner.

Comment by Raphael Roche (raphael-roche) on What if there was a nuke in Manhattan and why that could be a good thing · 2025-04-15T13:11:20.302Z · LW · GW

Assuming no technology is absolutely perfect, absolutely risk-free, what if the nuclear warhead detonates accidentally? Wouldn't it be less risky if, for instance, a Russian nuclear warhead accidentally detonated in a Russian military base in Siberia rather than in the Russian consulate in the center of NYC?

Comment by raphael-roche on [deleted post] 2025-04-08T18:32:08.461Z
Comment by Raphael Roche (raphael-roche) on AI 2027: What Superintelligence Looks Like · 2025-04-08T14:22:32.271Z · LW · GW

Impressive prospective work. Both scenarios are frightening, even though one is worse than the other. The evolution seems unstoppable, and even if superintelligent AGI doesn't happen in 2027-2030 but in 2040 or 2050, the feeling isn't very different. I have young children, and while I don't really care for myself, I really care for them. It was cool when it was just sci-fi. It was still fun when we first played with ChatGPT. It doesn't look fun anymore, at all. My own thinking is that we're indeed locked into a two-option scenario, probably not that fast, probably not with exactly the same narrative, but with two possible global endings that look like attractors (https://en.wikipedia.org/wiki/Attractor).

Comment by Raphael Roche (raphael-roche) on AI 2027: What Superintelligence Looks Like · 2025-04-08T14:06:17.285Z · LW · GW
Comment by raphael-roche on [deleted post] 2025-04-08T08:21:52.996Z

I think what gives you this idea in the double slit experiment is that depending on how you observe or measure the object, it seems to exhibit different behavior. How is this possible? Isn't it mysterious? To resolve this mystery, you appeal to an explanation like awareness. But although it feels like an explanation, it actually explains nothing. Putting a word on something is reassuring, but behind the word we're not sure what we're talking about - we don't know how it's supposed to function; there is no actual explanation. It purports to explain everything, thus explains nothing. You don't know how it works and can't make any predictions. It's just as mysterious as the initial mystery itself (just like explanations implying gods or other supernatural causes).

Sometimes mystery must remain. But in this case, that needn't be so. The initial mystery may not be such a mystery after all. An observation or measurement implies an interaction with the measured/observed object. The double slit experiment involves a protocol that interacts with the observed object and constrains its behavior, thus producing different outputs when you slightly change the protocol and the interaction. The same applies to all measurement/observation - it is never absolutely neutral. So if the rock behaves slightly differently when you're around observing it, this is fundamentally no different from the reason why the rock moves if you push it. It doesn't imply that the rock is aware of anything, unless by "awareness" you simply mean "physical interaction."

Comment by Raphael Roche (raphael-roche) on Universal Basic Income and Poverty · 2025-04-07T16:14:04.929Z · LW · GW

Brilliant essay. It reminds me of the work of James C. Scott. However, I am quite surprised by the conclusion: "I do not understand the Poverty Equilibrium. So I expect that a Universal Basic Income would fail to eliminate poverty, for reasons I don't fully understand." To me, the explanation of the Poverty Equilibrium is quite simple. Yes, there are diminishing returns in the marginal value of all resources, but there is also an increase in the subjective value of all resources in consideration of what you know others possess. Alice is happy with one banana, but she feels much less happy after learning that Bob possesses two. Inequality is not an abstract concept; it is a feeling, a feeling of injustice, a bad feeling like sadness, jealousy, or pain. Innumerable studies have shown this. Even ethology studies show it in mammals. You can rationally justify inequality, develop moral arguments like Aristotle's geometric equality, saying it's right, just, and deserved. You can also conceive of liberal economic theories showing that inequality is unavoidable or good for the growth of the system. But in the end, there will be poverty as long as some people possess the lion's share and others collect the junk, even if the junk were gold. Universal Basic Income would fail to eliminate poverty, but it could help mitigate it a little.

Edit / addendum : I am receiving disagreement votes on this comment, as I anticipated, but please set aside your liberal views for a moment and consider this thoughtfully. Examine your situation, your income, the food you eat, and the items you own. You're far from being poor and quite content with your circumstances.

But imagine if most people around you had a hundred times your current income. If the food you can afford—the same food you currently enjoy—was marketed and socially regarded as dog food by the majority. These others eat better food than you've ever dreamed of. You tried it once when someone let you finish their plate, and it was absolutely incredible. All your possessions, all the cool things you're so fond of and proud to own, are now socially considered junk because others have become so wealthy compared to you. You could even find better quality items than your prized possessions just by scavenging through garbage.

Would you still feel as far from poverty as you did before? Would you still feel satisfied and happy? Wouldn't you feel somewhat ashamed of your situation, perhaps envious of others?

But this isn't merely fiction... It's what has actually happened to countless people, including the last hunter-gatherers Yudkowsky referenced, and many traditional farmers. Today, in our world, the gap between the richest and poorest individuals isn't on the scale of hundreds, thousands, or even millions, but billions. Inequality definitely matters. Poverty isn't exclusively relative—it's not solely a social construct as there are absolute needs to be satisfied—but relative comparisons are certainly part of it.

Comment by Raphael Roche (raphael-roche) on The Dangers of Mirrored Life · 2025-04-07T14:41:11.355Z · LW · GW

Thanks for this clarification, that's interesting.

Comment by Raphael Roche (raphael-roche) on How Gay is the Vatican? · 2025-04-07T14:38:58.468Z · LW · GW

Thanks for the study. In my opinion, there is more direct evidence of how gay the Vatican is, or the Catholic Church in general. In the general population, victims of sexual assault are overwhelmingly female, and perpetrators are overwhelmingly male. Even in the rare cases where the perpetrators are female, contrary to what one might imagine, the victims are still predominantly female. However, when the perpetrator is a priest or another representative of the Catholic Church, the victims are predominantly male (for a recent, large-scale study in France: https://www.ciase.fr/rapport-final/).

Comment by Raphael Roche (raphael-roche) on Mis-Understandings's Shortform · 2025-04-07T14:10:45.382Z · LW · GW

The comparison with elite athletes also jumped to my mind. Mature champions could be good advisors to young champions, but probably not to people with very different profiles and capacities, facing difficulties or problems they never considered, etc. We imagine that because people like Bill Gates or Jeff Bezos succeeded with their companies, they are some kind of universal geniuses and prophets. However, it is also quite possible that if these same people were appointed (anonymously or under a pseudonym, without the benefit of their image, contacts, or fortune, etc.) to lead a small family-owned sawmill in a remote part of Manitoba, in a sector and socio-economic environment very different from anything they have known, they would not necessarily do better than the current managers, and might even do significantly worse. We too often overlook the fact that successful people are not just individuals with potential, but also those who found themselves in the right place at the right time, allowing them to fully express their potential. A kind of alignment of the stars, a mix of chance and necessity, somewhat like the theory of evolution, where an individual combines a good genetic heritage, a favorable environment, and luck in their interactions, resulting in considerable offspring.

Comment by raphael-roche on [deleted post] 2025-04-07T13:26:08.378Z

The it from bit (or qubit) hypothesis is fascinating, as are the information paradox and quantum mechanics, but I don't think there is any empirical or theoretical evidence supporting "awareness", whatever that may be, of the universe in any of this. No more than there is evidence supporting god(s) or a flying spaghetti monster. Creating a narrative does not constitute evidence (edit : even if gedankenexperiments are valuable). We are free to speculate, and that is quite respectable; however, an extraordinary claim needs an equally extraordinary amount of evidence, and I think we are really far from it. We are actually struggling to understand if, how, and why humans have consciousness, not to mention animals and LLMs. Let's solve these cases before we speak of the awareness of rocks or the whole universe.

Comment by Raphael Roche (raphael-roche) on A Bear Case: My Predictions Regarding AI Progress · 2025-03-13T09:56:08.850Z · LW · GW

Exactly. The future is hard to predict, and the author's strong confidence seems suspicious to me. Improvements have come fast in recent years.

2013-2014: word2vec and seq2seq

2017-2018: transformer and GPT-1

2022: CoT prompting

2023: multimodal LLMs

2024: reasoning models

Are they linear improvements or revolutionary breakthroughs? Time will tell, but to me there is no sharp frontier between increment and breakthrough. It might happen that AGI results from such improvements, or not. We just don't know. But it's a fact that human general intelligence resulted from a long chain of tiny increments, and I also observe that results on the ARC-AGI benchmark exploded with CoT/reasoning models (not just math or coding benchmarks). So, while 2025 could be a relative plateau, I wouldn't be so sure that the following years will be too. To me, a confidence far from 50% is hard to justify.

Comment by Raphael Roche (raphael-roche) on Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs · 2025-03-12T00:24:20.054Z · LW · GW

The authors of the paper remain very cautious about interpreting their results. My intuition regarding this behavior is as follows. 

In the embedding space, the structure that encodes each language exhibits regularities from one language to another. For example, the relationship between the tokens associated with the words 'father' and 'mother' in English is similar to the one linking the words 'père' and 'mère' in French. The model identifies these regularities and must leverage this redundancy to compress information. Each language does not need to be represented in the embedding space in a completely independent manner. On the contrary, it seems economical and rational to represent all languages in an interlaced structure that compresses redundancies. This idea may seem intuitive for the set of natural languages that share common traits related to universals in human thought, but the same applies to formal languages. For example, there is a correspondence between the 'print' function in C and Python, but these representations also have a link with the word 'print' in English and 'imprimer' in French. The model thus builds a global structure where all languages, both natural and formal, are strongly intertwined, closely linked, or correlated with each other.
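To make this regularity concrete, here is a toy sketch; the 4-dimensional vectors are made-up numbers standing in for real embeddings, so the values are illustrative assumptions, not measurements.

```python
# Toy illustration of the cross-lingual regularity described above.
# The 4-dimensional vectors are made-up numbers, not real model embeddings.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

emb = {
    "father": np.array([0.9, 0.1, 0.50, 0.30]),
    "mother": np.array([0.9, 0.1, 0.10, 0.80]),
    "père":   np.array([0.7, 0.3, 0.48, 0.33]),
    "mère":   np.array([0.7, 0.3, 0.12, 0.79]),
}

# If the languages are interlaced, the offset father -> mother should roughly
# match the offset père -> mère.
en_offset = emb["mother"] - emb["father"]
fr_offset = emb["mère"] - emb["père"]
print(cosine(en_offset, fr_offset))  # ~0.9998: nearly parallel offsets, by construction here
```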

Therefore, if a model is fine-tuned to generate offensive responses in English, without this fine-tuning informing the model about the conduct to adopt for responses in other languages, one can reasonably expect the model to adopt an inconsistent or, more precisely, random or hesitant attitude regarding the responses to adopt in other languages, remaining aligned for some responses but also presenting a portion of offensive responses. Moreover, this portion could be more significant for languages strongly interlaced with English, such as Germanic or Latin languages, and to a lesser extent for distant languages like Chinese. But if the model is now queried about code, it would not be surprising if it provides, in part of its responses, code categorized as offensive, i.e., transgressive, dangerous, or insecure.

At this stage, it is sufficient to follow the reverse reasoning to understand how fine-tuning a model to generate insecure code could generate, in part of its responses in natural language, offensive content. This seems quite logical. Moreover, this attitude would not be systematic but rather random, as the model would have to 'decide' whether it is supposed to extend these transgressive responses to other languages. Providing a bit more context to the model, such as specifying that it is an exercise for a security code class, should allow it to overcome this indecision and adopt a more consistent behavior. 

Of course, this is a speculative interpretation on my part, but it seems compatible with my understanding of how LLMs work, and it also seems experimentally testable. For example, by testing the reverse pathway (the impact on code responses after fine-tuning aimed at producing offensive responses in natural language) and, in both directions, checking whether the impact correlates with the greater or lesser proximity of the natural or formal languages involved.

Comment by Raphael Roche (raphael-roche) on The Risk of Gradual Disempowerment from AI · 2025-02-06T17:14:23.766Z · LW · GW

From my perspective, the major issue remains Phase 1. It seems to me that most of the concerns mentioned in the article stem from the idea that an ASI could ultimately find itself more aligned with the interests of socio-political-economic systems or leaders that are themselves poorly aligned with the general interest. Essentially, this brings us back to a discussion about alignment. What exactly do we mean by "aligned"? Aligned with what? With whom? Back to phase 1.

But assuming an ASI truly aligned with humanity in a very inclusive definition and with high moral standards, phase 2 seems less frightening to me. 

Indeed, we must not forget:

  • that human brains are highly energy-efficient;
  • that there are nearly 10 billion human brains, representing considerable computing power.

Assuming we reach the ASI stage with a system possessing computational power equivalent to a few million human brains, but consuming energy equivalent to a few billion human brains, the ASI will still have a lot of work to do (self-improvement cycles) before it can surpass humanity both in computational capacity and energy efficiency.
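A back-of-the-envelope sketch of this point, taking the common ~20 W estimate for a human brain and treating the ASI figures above as purely illustrative assumptions:

```python
# Rough orders of magnitude only; the ~20 W brain figure is a common estimate,
# and the ASI numbers are purely illustrative assumptions from the text above.
human_brain_watts = 20                       # rough power draw of one human brain
num_humans = 10e9                            # ~10 billion brains
humanity_power_watts = human_brain_watts * num_humans    # ~2e11 W, i.e. ~200 GW

asi_compute_in_brains = 3e6                  # assumed: compute worth a few million brains
asi_power_watts = 3e9 * human_brain_watts    # assumed: energy of a few billion brains, ~60 GW

watts_per_brain_equiv_human = human_brain_watts                        # 20 W
watts_per_brain_equiv_asi = asi_power_watts / asi_compute_in_brains    # 20,000 W

print(watts_per_brain_equiv_asi / watts_per_brain_equiv_human)  # ~1000x efficiency gap to close
```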

Initially, it will not have the capability to replace all humans at once.

It will need to allocate part of its resources to continue improving itself, both in absolute capacity and in energy efficiency. Additionally, since we are considering the hypothesis of an aligned ASI, a significant portion of its resources would be dedicated to fulfilling human requests.

The more AI is perceived as supremely intelligent, the more we will tend to entrust it with solving complex tasks that humans struggle to resolve or can only tackle with great difficulty—problems that will seem more urgent compared to simpler tasks that humans can still handle.

I won’t compile a list of problems that could be assigned to an ASI, but one could think, for example, of institutional and legal solutions to achieve a more stable and harmonious social, economic, and political organization on a global scale (even an ASI—would it be capable of this?), solutions to physics and mathematics problems, and, of course, advances in medicine and biology.

It is possible that part of the ASI would also be assigned to performing less demanding tasks that humans could handle, thus replacing certain human activities. However, given that its resources are not unlimited and its energy cost is significant, one could indeed expect a "slow takeover."

More specifically, in the fields of medicine and biology, the solutions provided by an ASI could focus on eradicating diseases, increasing life expectancy, and even enhancing human capabilities, particularly cognitive abilities (with great caution in my opinion). Even though humans have a significant advantage in energy efficiency, this does not mean that this aspect cannot also be improved further.

Thus, we could envision a symbiotic co-evolution between ASI and humanity. As long as the ASI prioritizes human interests at least at the same level as its own and continues to respond to human demands, disempowerment is not necessarily inevitable—we could imagine a very gradual human-machine coalescence (CPUs and GPUs coevolved for a while and GPUs still haven't entirely replaced CPUs; quantum processors will likely also coevolve alongside classical processors; even in the world of computation, diversity can be an advantage).

Comment by Raphael Roche (raphael-roche) on The ants and the grasshopper · 2025-02-04T16:04:48.902Z · LW · GW

I agree, finding the right balance is definitely difficult.

However, the different versions of this parable of the grasshopper and the ant may not yet go far enough in subtlety.

Indeed, the ants are presented as champions of productivity, but what exactly are they producing? An extreme overabundance of food that they store endlessly. This completely disproportionate and non-circulating hoarding constitutes an obvious economic aberration. Due to the lack of significant consumption and circulation of wealth, the ants' economy—primarily based on the primary sector, to a lesser extent the secondary sector, and excessive saving—while highly resilient, is far from optimal. GDP is low and grows only sluggishly.

The grasshoppers, on the other hand, seem to rely on a society centered around entertainment, culture, and perhaps also education or personal services. They store little, just what they need, which can prove insufficient in the event of a catastrophe. Their economy, based on the tertiary sector and massive consumption, is highly dynamic because the wealth created circulates to the maximum, leading to exponential GDP growth. However, this flourishing economy is also very fragile and vulnerable to disasters due to the lack of sufficient reserves—no insurance mechanism, so to speak.

In reality, neither the grasshoppers nor the ants behave in a rational manner. Both present two diametrically opposed and extreme economic models. Neither is desirable. Any economist or actuary would undoubtedly recommend an intermediate economy between these two extremes.

The trap, stemming from a long tradition since Aesop, is to see a model in the hardworking ant and a cautionary tale in the idle cicada. If we try to set aside this bias and look at things more objectively, the bias actually stems from the fact that until the advent of the modern economy, societies struggled to conceive that wealth creation could be anything other than the production of goods. In other words, the tertiary sector, although it existed, was not well understood and was therefore undervalued. Certainly, the wealthy paid to attend performances or organized lavish festivities, but this type of production was not fully recognized as such. It was just seen as an expense. Services were not easily perceived as work, which was often associated with toil, suffering, and hardship (cf. the etymology of 'labour').

Today, it is almost the opposite. The tertiary sector is highly valued, with the best salaries often found there, and jobs in this sector are considered more intellectual, more prestigious, and more rewarding. In today's reality, a cicada or grasshopper would more likely be a famous and wealthy dancer in an international opera house, while an ant would be an anonymous laborer toiling away in a mine or a massive factory in an underdeveloped country (admittedly, I am exaggerating a bit, but the point stands).

In any case, it would be an illusion for most readers of this forum to identify with the ants in the parable. We are probably all more on the side of the cicadas, or at least a mix of both—and that's a good thing, because neither of these models constitutes an ideal.

The optimum clearly lies in a balanced, reasonable path between these two extremes.

Another point I would like to highlight is that the question of not spending resources today and instead accumulating them for a future date is far from trivial to grasp at the level of an entire society—for example, humanity as a whole. GDP is a measure of flows over a given period, somewhat like an income statement. However, when considering wealth transfers to future generations, we would need an equivalent tool to a balance sheet. But there is no proper tool for this. There is no consensus on how to measure patrimonial wealth at the scale of humanity.

Natural resources should certainly be valued. Extracting oil today increases GDP, but what about the depletion of oil reserves? And what about the valuation of the oceans, the air, or solar energy? Not to mention other extraterrestrial resources. We plunge into an abyss of complexity when considering all these aspects.

Ultimately, the problem lies in the difficulty of defining what wealth actually is. For ants, it is food. For cicadas, it is more about culture and entertainment. And for us? And for our children? And for human civilization in a thousand years, or for an extraterrestrial or AI civilization?

Many will likely be tempted to say that available work energy constitutes a common denominator. As a generic, intermediate resource—somewhat like a universal currency—perhaps, but not necessarily as a form of wealth with inherent value. Knowledge and information are also likely universal resources.

But in the end, wealth exists in the eye of the beholder—and, by extension, in the mind of an ant, a cicada, a human, an extraterrestrial, and so on. Without falling into radical relativism, I believe we must remain quite humble in this type of discussion.

Comment by Raphael Roche (raphael-roche) on The Failed Strategy of Artificial Intelligence Doomers · 2025-02-03T16:29:33.080Z · LW · GW

Don't you think that articles like "Alignment Faking in Large Language Models" by Anthropic show that models can internalize the values present in their training data very deeply, to the point of deploying various strategies to defend them, in a way that is truly similar to that of a highly moral human? After all, many humans would be capable of working for a pro-animal welfare company and then switching to the opposite without questioning it too much, as long as they are paid.

Granted, this does not solve the problem of an AI trained on data embedding undesirable values, which we could then lose control over. But at the very least, isn't it a staggering breakthrough to have found a way to instill values into a machine so deeply and in a way similar to how humans acquire them? Not long ago, this might have seemed like pure science fiction and utterly impossible.

There are still many challenges regarding AI safety, but isn't it somewhat extreme to be more pessimistic about the issue today than in the past? I read Superintelligence by Bostrom when it was released, and I must say I was more pessimistic after reading it than I am today, even though I remain concerned. But I am not an expert in the field—perhaps my perspective is naïve.

Comment by Raphael Roche (raphael-roche) on Passages I Highlighted in The Letters of J.R.R.Tolkien · 2025-01-14T10:43:57.302Z · LW · GW

 "I think the Fall is not true historically". 

While all men must die and all civilizations must collapse, the end of all things is merely the counterpart of the beginning of all things. Creation, the birth of men, and the rise of civilizations are also great patterns and memorable events, both in myths and in history. However, our feelings do not respect this symmetry: perhaps due to loss aversion and the peak-end rule, the Fall, and tragedy in general, carries a uniquely strong poetic resonance. Fatum represents the story's inevitable conclusion. There is something epic in the Fall, something existential, even more than in the beginning of things. I believe there is something deeply rooted, hardwired, in most of us that makes this so. Perhaps it is tied to our consciousness of finitude and our fear of the future, of death. Even if it represents a traditional and biased interpretation of history, I cannot help but feel moved. Tolkien has an unmatched ability to evoke and magnify this feeling, especially in The Silmarillion and other unfinished works; I think naturally of the Fall of Valinor and the Fall of Gondolin, among other things.

Comment by Raphael Roche (raphael-roche) on Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible · 2025-01-03T10:19:39.746Z · LW · GW

Indeed, nature, and particularly biology, disregards our human considerations of fairness. The lottery of birth can appear as the greatest conceivable inequality. But in this matter, one must apply the Stoic doctrine that distinguishes between what depends on us and what does not. Morality concerns what depends on us, the choices that belong to the moral agents we are.

If I present the lottery of birth in an egalitarian light, it is specifically in the sense that we, as humans, have little control over this lottery. Particularly regarding IQ at birth: regardless of our wealth, we were all, until now, almost on equal footing in our inability to considerably influence this biological fact imposed upon us (I discussed in my previous comments the differences I see between the author's proposal and education, and also conventional medicine).

If the author's project succeeds, IQ will become mainly a socially originated fact, like wealth. And inequality in wealth would then be accompanied by inequality in IQ, proportional or even exponential (if feedback mechanisms occur, considering that having a higher IQ might enable a wealthy individual to become even wealthier and thus access the latest innovations for further enhancement).

We already struggle to establish social mechanisms to redistribute wealth and limit the growth of inequalities; I can hardly imagine what it would become if we also had to address inequalities in access to IQ-enhancing technologies in a short time. I fear that all this could lead to a chaotic or dystopian scenario, possibly resulting in a partition of the human species and/or a civilizational collapse.

As for having a solution to ensure that this type of genetic engineering technology does not result in such a catastrophic outcome, I do not claim to have a miracle solution. As with other existential risks, what can be suggested is to try to slow down the trend (which is likely inevitable in the long term) instead of seeking to accelerate it, to think as much as possible in advance, to raise awareness of the risks in order to enable collective recognition of these issues (which is what I tried to do here), and to hope that with more time and this proactive reflection, the transition will proceed more smoothly, that international treaties will emerge, and that state mechanisms will gradually be put in place to counter or mitigate this unprecedented source of inequality.

Comment by Raphael Roche (raphael-roche) on What’s the short timeline plan? · 2025-01-02T20:15:21.203Z · LW · GW

Yes, of course. Despite its stochastic nature, it is extraordinarily unlikely for an advanced LLM to respond with anything other than 2 + 2 = 4 or Paris for the capital of France. A stochastic phenomenon can, in practice, tend toward deterministic behavior. However, deception in a context such as the one discussed in Apollo Research's article is not really comparable to answering 2 + 2 = ?. What the article demonstrates is that we are dealing with tendencies, accompanied by considerable randomness, including in the intensity of the deception.

Assuming a more sophisticated model has roughly double the deception capability of model o1, it would be enough to increase the sample size of responses for the anomaly to become glaringly obvious. One could also imagine a more rigorous test involving even more complex situations. It does not seem inconceivable that such a procedure could, for years to come—and perhaps even at the stage of the first generations of AGI—identify deceptive behaviors and establish an RL procedure based on this test.

Comment by Raphael Roche (raphael-roche) on What’s the short timeline plan? · 2025-01-02T17:49:34.113Z · LW · GW

Thank you for this publication. Just an idea (maybe stupid): GPT-type models are stochastic by design. This characteristic might be used as a way to control them. Your study with Apollo Research illustrates this well—there is a certain probability that a given model will deceive users under specific circumstances and to varying degrees. However, this remains fundamentally random, non-deterministic, and that property is quite interesting. There is always a chance that deception does not occur, is clumsy, or stops at an early stage.

For a single response, it is impossible to determine whether deception is present or not. But across a large set of responses, it seems almost impossible for a model, due to its very design, to maintain a solid and coherent deceptive behavior across multiple parallel outputs. Thus, a study like yours serves as a very effective tool to detect such behaviors and could remain effective for increasingly advanced models. It would simply require enlarging (perhaps significantly) the response sample.

Moreover, this could be implemented as an automated testing framework inspired by your paper. This would make it possible to continue detecting deceptive behaviors with arbitrary effectiveness, as long as the model remains stochastic. Once such behaviors are detected and measured using tests of this kind, an automated RL phase could follow, aiming to reduce the tendency toward deception to very low levels.
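As a minimal sketch of what such an automated check could look like (the baseline rate, sample size, and significance threshold below are arbitrary assumptions, not numbers from the paper):

```python
# Minimal sketch of a sampling-based deception check; the baseline rate, sample
# size, and significance threshold are arbitrary assumptions, not figures from the paper.
from scipy.stats import binomtest

def flags_deception(num_deceptive: int, num_samples: int,
                    baseline_rate: float = 0.01, alpha: float = 0.01) -> bool:
    """True if the observed deception rate is significantly above the assumed baseline."""
    result = binomtest(num_deceptive, num_samples, p=baseline_rate, alternative="greater")
    return result.pvalue < alpha

# Example: 35 deceptive answers out of 2,000 sampled responses (1.75% observed rate).
print(flags_deception(35, 2000))  # True under these assumptions
```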

Comment by Raphael Roche (raphael-roche) on Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible · 2024-12-25T09:38:28.727Z · LW · GW

You are right. When I wrote my initial comment, I believed the argument was self-evident and did not require elaboration. However, "self-evidence" is not an objective concept, and I likely do not share the same socio-cultural environment as most users of this platform. Upon reading your comment and Ben Pace's, I realize that this apparent self-evidence is far from universally shared and requires further explanation. I have already expanded on my argument in my previous response, but here are the specific reasons why I think the author's project (and indeed the transhumanist project of enhancing human beings) raises unprecedented issues in terms of increasing inequality, more so than most technological innovations such as running water or mobile phones.

First, John Rawls's veil of ignorance constitutes a strong philosophical and rational argument for considering excessive inequalities as unjust and morally condemnable (edit : this is not my personal claim but that of John Rawls, with whom I fully agree). This veil of ignorance aligns utilitarianism with Kantian morality, as it invites the moral agent to step outside their specific case and evaluate the morality of a situation from a more universal, distanced, and objective perspective. While utilitarianism and effective altruism encourage giving to fund actions aimed at reducing suffering and increasing happiness, this can also be seen, in part, as a voluntary redistribution of wealth to correct excessive inequalities, which are unjust and a cause of suffering (since the feeling of injustice itself constitutes a form of suffering). In most countries, to varying degrees, states also impose redistribution through taxation and various social mechanisms to combat excessive inequalities. Nevertheless, global inequalities continue to grow and remain a very serious concern.

Technological innovations fit into the problem of inequality insofar as it is generally the wealthiest who benefit from them, or at least benefit first. However, I do not dispute the argument made by liberal economists that the costs of technological innovations tend to decrease over time due to the amortization of R&D investments, the profitability of patents, mass production, and economies of scale, eventually benefiting everyone. Still, this is an empirical observation that must be nuanced. Not all technological innovations have followed the same trajectory; scenarios vary widely.

The oldest technological inventions (mastery of fire, stone tools, spears, bows, etc.) emerged in non-storing hunter-gatherer societies. In the absence of wealth accumulation, these were likely relatively egalitarian societies (cf. the works of Alain Testart on this subject). For a long time, technological innovations, which were rare, could benefit the entire population within a given culture. This may seem anecdotal and almost digressive, but we are talking about hundreds of thousands of years, which represent the overwhelming majority of human history.

Then, if we consider an emblematic example of a highly valuable technological innovation—access to potable water—this began appearing in the Roman Empire about 2,000 years ago but faced significant challenges in reaching modest populations. Even today, about a quarter of humanity—2 billion people out of 8 billion—still lack access to this technology, which we might consider essential.

By contrast, mobile phones, although they could be seen as gadgets compared to the previous example, have spread like wildfire in just a few decades and are now almost as present in the global population as potable water. These two examples illustrate that the time it takes for a technology to spread can vary dramatically, and this is not neutral regarding inequality. Waiting 30 years versus 2,000 years for a technology to benefit the less wealthy is far from equivalent.

Another nuance to consider is whether significant qualitative differences persist during the spread of an invention. Potable water tends to vary little whether one is rich or poor. Mobile phones differ somewhat more. Personal automobiles even more so, with a significant portion of the population accessing them only through collective services, despite this invention being over a century old. As for airplanes, the wealthiest enjoy luxurious private jets, while those slightly less wealthy can only access collective flights—and a large part, perhaps the majority of the world's population, has no access to this technology at all, more than a century after its invention. This is an example worth keeping in mind.

Moreover, not all innovations are equal. While mobile phones might seem like gadgets compared to potable water, food and health are vital, and technological innovations with a significant impact in these areas are of great value. This was true of the mastery of fire for heating and cooking, tools for hunting and defense, techniques for producing clothing, construction methods for shelters, and, more recently, potable water, hot water, and eventually medicine, which, while it does not make humans immortal (yet!), at least prolongs life and alleviates physical disabilities and suffering. Excessive wealth inequalities create excessive inequalities in access to medicine. This is precisely why many countries have long implemented countermeasures against such inequalities, to the point that in some countries, like France or Sweden, there exists a nearly perfect equality of access to healthcare through social security systems. In the United States, Obama-era legislation (Obamacare) also aimed to reduce these inequalities.

The innovation proposed by the author—enhancing the intelligence of adult individuals by several dozen or even hundreds of IQ points—would constitute an extremely impactful innovation. The anticipated IQ difference would be comparable to the gap separating Homo sapiens from Neanderthals or even Homo erectus (impossible to quantify precisely, but paleoanthropologists suspect the existence of genetic mutations related to neural connectivity that might have given Sapiens an advantage over Neanderthals; as for Erectus, we know its encephalization quotient was lower). Let’s be honest—if we were suddenly thrust into Rawls's veil of ignorance, we would tremble at the idea of potentially awakening as a poor Erectus condemned to remain an Erectus, while an elite group of peers might benefit from an upgrade to Sapiens status. Yes, this is indeed a terrifying inequality.

Unlike an expensive treatment that addresses only a few patients, in this case, 100% of the population would have a tremendous interest in benefiting from this innovation. It is difficult to imagine a mechanism equivalent to social security or insurance here. Everyone would likely have to pay out of pocket. Furthermore, it is clear that the technology would initially be very expensive, and the author himself targets an elite as the first beneficiaries. The author envisions a scientific elite responsible for addressing AI alignment issues, which is appealing to some readers of this forum who may feel concerned. However, in reality, let’s not be deceived: as with space tourism, the first clients would primarily be the extremely wealthy (though AI experts themselves are relatively affluent).

How many generations would it take for everyone to benefit from such technology? In 2,000 years, potable water is still not universally accessible. Over a century after its invention, private airplanes remain a luxury for a tiny minority on Earth—a situation unchanged for decades. A century exceeds the life expectancy of a human in a developed country. As Keynes said, “In the long run, we are all dead.” The horizon for a human is their lifetime. Ultimately, only in the case of a rapid diffusion, like mobile phones, would inequality be less of a concern. Personally, however, I would bet more on the private jet scenario, simply because the starting cost would likely be enormous, as is the case for most cutting-edge therapies.

Even in the ideal—or, let’s be honest, utopian—scenario where the entire global population could benefit from this intelligence upgrade within 30 years, this innovation would still be unprecedented in human history. For the first time, wealthy individuals could pay to become intellectually superior. The prospect is quite frightening and resembles a dystopian science fiction scenario. Until now, money could buy many things, but humans remained equal before the biological lottery of birth, particularly regarding intellect. For the first time, this form of equality before chance would be abolished, and economic inequality would be compounded by intellectual inequality.

Admittedly, some might argue that education already constitutes a form of intellectual inequality linked to wealth. Nevertheless, the connection between IQ and education is not as immediate and also depends on the efforts and talents of the student (and teachers). Moreover, several countries worldwide have very egalitarian education systems (notably many in Europe). Here, we are talking about intelligence enhancement through a pill or injection, which is an entirely different matter. As advantages stack up, inequalities become extreme, raising not only ethical or moral questions but also concerns about societal cohesion and stability. Even over 30 years, a major conflict between “superhumans” and “subhumans” is conceivable. The former might seek to dominate the latter—or dominate them further if one considers that excessive economic inequality already constitutes a form of domination. Alternatively, the latter might rebel out of fear or seek to eliminate the former. Edit : Most of the literature on collapsology identifies excessive social inequalities as a recurring factor in societal collapse (for instance Jared Diamond, Amin Maalouf etc).

This risk seems all the more significant because the idea of modifying human beings is likely to be rejected by many religions (notably the Catholic Church, which remains conservative on bioethical matters, though other religions are no more open-minded). Religion could act as a barrier to the global adoption of such technology, making the prospect of rapid diffusion even less plausible and the risk of societal instability or fracture all the greater. It is important to remember that technologies do not automatically spread; there may be cultural resistance or outright rejection (a point also studied in detail by Alain Testart, particularly concerning the Australian Aborigines).

In conclusion, I believe the hypothesis of a partitioning of humanity—whether social or even biological (a speciation scenario akin to Asimov’s Terrans and Spacers)—is a hypothesis to be taken very seriously with this type of project. A convinced transhumanist might see this as a positive prospect, but in my view, great caution is warranted. As with AGI, it is essential to think twice before rushing headlong into such ventures. History has too often seen bold projects end in bloodshed. I believe that if humanity is to be augmented, it must ensure that the majority are not left behind.

Edit : for some reason I don't understand, I can't add links to this comment as I intended.

Comment by Raphael Roche (raphael-roche) on Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible · 2024-12-21T01:47:46.182Z · LW · GW

Thank you for your kind advice (I made some edits to my previous comment accordingly). I must have expressed myself poorly, because I am in no way questioning the idea that science and technology have greatly contributed to improving the condition of humanity. My remark was about inequality. Scientific development is not inherently linked to the increase in inequalities. On the contrary, many scientific and technological advances are likely to benefit everyone. For instance, in many countries, both the rich and the poor use the same tap water, of good quality. That's even true for many digital devices (a poor person can have a cellphone not that different from the one a rich person possesses). Even the poor populations of underdeveloped countries benefit, to some degree, from these advances. There are fewer food shortages and better healthcare even in these countries, although much remains to be done.

However, on this subject I stand by my arguments, reformulated as follows:

  1. too great an inequality is a major source of suffering and social instability (many revolutions came from that);
  2. contrary to tap water and cellphones, the author's project could increase inequalities in a sense never seen in history (the difference between a "happy few" rich with an artificially increased IQ and the standard layman would be like the difference between a Sapiens and a Neanderthal, or maybe an Erectus), with the prospect of a partition of humankind, or speciation, in a short time.

I must say that I am surprised to read that it is common knowledge on this forum that there is no problem with inequalities. If that's so, I still really disagree on this point, at the risk of being disregarded. Too much inequality is definitely a concern. It is maybe not a big deal for the rich minority (as long as they're not overthrown; Marie-Antoinette had some trouble), but it is for the poor majority. I could rely on the international publications of the United Nations (https://www.un.org/fr/un75/inequality-bridging-divide) and on numerous authors, such as Amartya Sen (Nobel Prize in Economics) or Thomas Piketty, for instance, who are particularly engaged with this issue (and of course Marx in older times, but don't tag me as a Marxist, please). I also recommend reading James C. Scott's Against the Grain: A Deep History of the Earliest States, which demonstrates how early civilizations based on inequality have historically been fragile and subject to brutal collapse (inequality being one main factor, epidemics another).

Edit : I would add that denying the concern of inequalities amounts to dismissing most of the work of experts on the subject, that is to say researchers in the social sciences, an inclination that may reflect a bias possibly common among "hard" scientists.

Comment by Raphael Roche (raphael-roche) on Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible · 2024-12-20T16:44:44.337Z · LW · GW

It is the great paradox of this forum to serve both as a platform where individuals aligned with the effective altruism movement raise alarms and invite deep reflection on existential risks—usually with a nuanced, subtle, and cautious approach to these topics—and as a space where individuals adhering to libertarian or transhumanist ideologies also promote rather radical ideas, which themselves might constitute new existential challenges for humanity. I say this not with the intent of disparagement but rather as an observation.

This topic is a fascinating example of this paradoxical mixture. On one hand, the author seems to appeal to fears surrounding the "superintelligence" of artificial intelligence to, in a way, justify the development of a "superintelligence" in humans, or at least a significant enhancement of intelligence for a select group of individuals—an elite of scientists somewhat reminiscent of the Manhattan Project, aimed at solving the alignment problem. These would be heroes of humanity, capable of mastering the hydra of AI in a titanic intellectual struggle.

In reality, I think everyone here is aware that this argument is primarily rhetorical, as the risk associated with AI lies in the loss of control and the exponential growth of AI's capabilities in the medium or short term, far outpacing the possibilities for enhancing human cognitive abilities within the same timeframe. Moreover, this argument appears mainly in the introduction and does not seem to be the central focus thereafter. To me, this argument serves as an introductory "hook" to delve into the technical discussion.

Indeed, the article quickly and almost exclusively shifts focus to the feasibility aspect. The author demonstrates, with substantial evidence, that there is fundamentally no barrier—given current genomic editing techniques and the latest acquired knowledge—to achieving this goal. Broadly speaking, what can be taken away from the article is that the main obstacle today is strong institutional reluctance for moral or ethical reasons, which the author dismisses out of hand, without attempting to understand or discuss them in detail. For him, there seems to be a bias or taboo on the matter—necessarily conservative, irrational, and detrimental.

However, this is, in my view, the major blind spot of this article. Following the author's reasoning, if we can do it, why not do it without delay? In reality, this reasoning mirrors those seeking AGI, a subject of much debate on this very forum. Regarding AI, a frequently cited argument is that if we do not develop it, others will, and the advantage will go to the first, as is often the case with technological innovation. This argument could also be applied here. However, just because something is possible does not mean it is desirable. I could jump off a cliff, but upon reflection, I think I’ll refrain. Transhumanism—and notably the development of technical means to enhance intelligence as proposed by the author—must provoke the same kind of questioning as the development of AGI.

First, one must ask whether it is desirable, carefully weighing the pros and cons, the reasonably foreseeable advantages and consequences, and the measures that could be taken to improve the benefit/risk ratio, with particular attention paid to limiting risks (precautionary principle or simply prudence). The pros are relatively straightforward, but the cons may need more elaboration.

The first area of concern is socio-economic. If the proposed technique allows for increasing intelligence without harming health, it would be costly and benefit an elite. The author sells us the idea of a scientific elite. But it is entirely predictable that if such a technique were developed, nearly all the world's wealthy would rush to pay fortunes to be among the first to benefit. What are the chances that populations in underdeveloped countries, who currently lack access to education, proper nutrition, and clean water, would ever benefit from this technique? Virtually none, or not for many generations. Furthermore, large segments of humanity would refuse it for religious reasons. The first predictable socio-economic effect would be an increase in inequalities to an unprecedented level in human history. (Edit : an esteemed member warned me that I could lose typical readers of LessWrong on this argument, so I developed a clearer and more detailed argument here and here; the idea is that, contrary to previous technological advances like tap water and cellphones, the author's project could increase inequalities in a sense never seen in history, because the difference between a "happy few" rich with an artificially increased IQ and the standard layman would be like the difference between a Sapiens and a Neanderthal, or maybe an Erectus.)

The second concern, directly linked to the first, is philosophical and moral—or "ethical," as the term is used to avoid sounding religious. Many here are interested in effective altruism, yet we must not forget that inequality is a major source of suffering, tension, and instability in both present and past human societies (edit : remember that many or most revolutions came from that). Altruism is hardly compatible with inequality. (Edit : I mean "excessive inequalities". Utilitarianism and effective altruism strongly encourage donation or redistribution, a counter-measure against inequality, showing that this is a main concern). One might even envision that the development of such human enhancement technologies could rapidly lead to a form of speciation, as envisioned by Asimov in his Robots series (the Spacers and later the Solarians). (Edit : remember what I was saying about a difference of IQ as great as the difference between Sapiens and Neanderthal or Erectus). Of course, if one is among the "happy few" precursors of a new humanity, this could seem appealing at first glance. But is it moral—that is, does it align with the goal of a reasonable maximization of happiness on a global human scale? One may doubt it. A partition of humanity challenges the very idea of humanity. Moreover, history shows that nearly all movements guided by elitist ideology have typically gone very wrong, leading to the worst discriminations and greatest tragedies—even for the elitist side (for what it's worth, Asimov's Spacers and Solarians also meet grim ends). If the author often encounters accusations invoking Hitler in academic circles, it might not be due to a bias on the part of these educated individuals, but rather a form of wisdom stemming from their education, culture, and personal reflection on such matters. The author completely overlooks the countless publications, conferences, and ethical committee deliberations on these issues. All this philosophical and ethical reflection is no less valuable or intellectually significant than the genomic research underpinning the article. As with AGI, the question is: is it desirable? It seems reasonable to think this through seriously before rushing to make it happen.

The third area of concern, and not the least, is medical and biological. The author—though seemingly very knowledgeable in the field—admits to not being a trained biologist and expresses surprise that professionals in the domain tend to downplay the role of genes. Yet the author also seems to dismiss the general consensus of those professionals with a wave of the hand. I am not a biologist by training either, but it is well known today that the relationship between phenotype and genotype is complex. The fact that in some specific cases there is a relatively straightforward link (e.g., monogenic diseases, blue eyes, etc.) should not obscure the forest of complexity that applies to the vast majority of other cases.

It is now understood that gene expression is regulated by other genes within the coding portion of the genome, but also by less-studied regulatory elements within the vast non-coding majority, once dismissed as "junk DNA." Epigenetic mechanisms also play a role, involving interactions between the nucleus and its immediate environment (the cell), its less immediate environment (the organism), and even the broader external environment.

To make matters more complex, the author's subject is intelligence. First, we must agree on what "intelligence" means. The author is explicit in taking IQ as the reference, since it is a relatively objective indicator, but one could argue that there are many other forms of intelligence not captured by IQ, such as social intelligence (see Gwern's very interesting comment on this topic). Moreover, specialists in the field (e.g., Stanislas Dehaene) generally acknowledge that the relationship between genetics and intelligence must be approached with caution. Intelligence has a highly diffuse and polygenic genetic basis (which the article itself does not dispute), and it is largely shaped by learning, i.e., by education and broader interaction with the environment—something the article appears to give less weight to.

That said, it is difficult to contest the author's point that genetics plays a role in intelligence and that certain gene combinations may predispose individuals to higher IQs. However, the author focuses entirely on optimizing for this one trait. Natural selection is itself an optimization process that has been unfolding for approximately 4 billion years, representing an astronomically large computational cost (see Charles H. Bennett's concept of logical depth). That optimization is by no means wholly directed toward increasing IQ—far from it. It involves countless competing constraints, resulting in a series of trade-offs.

For instance, developing a larger brain significantly increases energy demands, as the brain is one of the most energy-intensive organs. Paleoanthropology shows that, as a trade-off for brain development, there has been a proportional reduction in muscle mass and in the size of the digestive system (linked to the mastery of fire, cooking, and a more carnivorous diet), along with an increase in adipose tissue (which consumes little energy and serves as storage). In short, we cannot have it all: we are naturally intelligent, but also weak, fat, and limited in digestive capacity. These trade-offs are found everywhere, down to the smallest details.

For example, the author mentions Alzheimer's as a disease that could potentially be treated through genomic editing. Wonderful. But recent studies show that carriers of the APOE ε4 allele, implicated in Alzheimer's disease, exhibit superior cognitive performance on certain tasks, although results vary among individuals and contexts (https://doi.org/10.1007/s10519-019-09961-y). Similarly, findings suggest that APOE4 may improve neuronal energy functions, which could be beneficial during brain development (https://doi.org/10.1101/2024.06.03.597106). Editing out the APOE ε4 allele could therefore work against the author's original goal of increasing IQ, because we are facing a trade-off. The example is striking.

Contrary to the author's implications, it is highly unlikely that these genomic edits would come without negative trade-offs, potentially with harmful effects on health or lifespan. Given the way the genome has been shaped—through optimization via accumulated trade-offs across vast spans of time—it seems very likely, almost inevitable, that most of these edits would increase IQ at the expense of other capacities, potentially ones that are difficult to identify initially.

Sometimes the advantages or disadvantages of a gene are only revealed under specific conditions. For instance, certain genes inherited from Neanderthals through hybridization have been found to predispose individuals to greater vulnerability to COVID-19. Yet the fact that these mutations were selected for and preserved over 50,000 years indicates they must have had advantages (some speculate they provided adaptations to cold environments, which Neanderthals developed in Europe and which Sapiens, originating from warmer regions, may have benefited from "acquiring"). Similarly, genes predisposing to obesity were, until recently, advantageous for surviving food shortages (and no one knows what the future may hold; maybe AI or superior humans will one day deprive me of my snack!).

In conclusion, the idea is interesting, but it would require extensive prior research before we rush into it headlong. This is not about outright rejection on principle, nor about blind acceptance. As with AI, there is an urgent need to slow down and reflect. After all, isn't humanity defined as the thinking animal? It would be ironic if brilliant individuals pursuing higher forms of intelligence themselves displayed insufficient reflection in their approach. (End of comment edited to be less polemical.)

Comment by Raphael Roche (raphael-roche) on The Dangers of Mirrored Life · 2024-12-13T16:21:27.735Z · LW · GW

A new existential risk that I was unaware of. Reading this forum is not good for peaceful sleep. Anyway, a thought struck me. LUCA lived around 4 billion years ago, with its chirality chosen at random. But no doubt many things happened before LUCA, and it is reasonable to assume that there was initially a competition between right-handed and left-handed protobiotic structures, until natural selection broke the symmetry. The mirrored lineage lost the competition and went extinct; end of story. But wait: we are talking about protobiotic structures that emerged from inert molecules in just a few million years, which is nothing compared to 4 billion years. Such protobiotic structures may have formed continuously, again and again, since the origin of life, yet never thrived because of competition with regular, fine-tuned life. If my assumption is right, there is some hope in that thought. Maybe mirrored life doesn't stand a chance against regular life in real conditions (not just in the lab). That being said, I would sleep better if nobody actually tried to find out.

Comment by Raphael Roche (raphael-roche) on Apocalypse insurance, and the hardline libertarian take on AI risk · 2024-12-13T08:15:22.841Z · LW · GW

I am sorry to say this on a forum where many people are likely to have been raised in a socio-cultural environment where libertarian ideas are deeply rooted. My voice will sound dissonant here, and I appeal to your open-mindedness.

I think there are strong limitations to ideas such as the one developed in the OP's proposal. Insurance is the mutualization of risk: a statistical approach that relies on the possibility of assessing the risk. It works for risks that occur frequently and have a clear typology, like car accidents or storms. Even in these cases there is always a coverage ceiling. But exceptional and maximally hazardous risks, like war damage or nuclear accidents, cannot be insured and are systematically excluded by contract. There is no apocalypse insurance because the risk cannot be assessed by actuaries. Even if you created such an insurance, it would be artificial, not rationally assessed, with a coverage ceiling that makes it useless. There is even the risk that it gives the illusion that everything is fine and acceptable. The insurance mechanism does not encourage responsibility but, on the contrary, irresponsibility. On top of that, compensation through money is a legal fiction; in real life, money cannot replace everything of worth. In the most dramatic cases the real damage is never repaired (the loss of your child, of your legs, of your own life); compensation is more symbolic, "better than nothing".
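To make the actuarial point concrete, here is a minimal sketch (my own illustration, with invented numbers, not actuarial practice): a premium is, roughly, the expected annual loss plus a loading, which only makes sense when frequency and severity can be estimated from a large claims history.

```python
# Toy illustration with invented numbers, not actuarial practice.

def premium(annual_probability, average_loss, loading=0.2):
    """Rough 'fair' premium: expected annual loss plus a loading for costs."""
    return annual_probability * average_loss * (1 + loading)

# Car accidents: millions of observed claims, so both inputs are well estimated.
print(premium(annual_probability=0.05, average_loss=4_000))  # 240.0 per year

# An existential catastrophe: no claims history, the probability is guesswork,
# and the loss (everyone's life) has no meaningful monetary equivalent.
print(premium(annual_probability=0.01, average_loss=float("inf")))  # inf
```

The second call returning infinity is exactly the problem: whatever "premium" an insurer quoted would be arbitrary, and the contractual ceiling would cap payouts far below the actual damage.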

As a matter of fact, I have professional knowledge of law and insurance from the inside, and very practical experience of what I am describing. Libertarianism encourages an approach that is very theoretical and economics-centered, which is honestly interesting, but also somewhat disconnected from reality. Just one ordinary example among others: a negligent furniture mover destroyed family goods inherited over generations, without a word of apology, because he said "there are insurances for that." In the end, after many months of proceedings and innumerable hours of time and energy spent by the victim, the professional's insurer paid almost nothing, because of course old family goods have no economic value in the eyes of the experts. When you see how insurance actually works in real cases, and how it can encourage negligent and irresponsible behavior, it is very difficult to be enthusiastic about the idea that the existential hazard of AI could be managed by taking out an insurance policy.

Comment by Raphael Roche (raphael-roche) on Frontier Models are Capable of In-context Scheming · 2024-12-12T23:31:55.978Z · LW · GW

We may filter training data and improve RLHF, but in the end game theory - that is to say, mathematics - implies that scheming can be a rational strategy, and the best strategy in some cases. Humans do not scheme just because they are bad, but because it can be a rational choice to do so. I don't think LLMs do it solely because it is what humans do in the training data; any sufficiently advanced model would eventually arrive at such strategies because they are the most rational choice in context. Models infer patterns from the training data, and rational behavior is certainly a strong pattern.
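A deliberately crude sketch of the game-theoretic point, with numbers invented purely for illustration: whenever oversight is imperfect and the payoff from undetected deception is high enough, the expected value of scheming exceeds that of simply complying.

```python
# Invented numbers for illustration only: scheming as an expected-value calculation.

def expected_scheming_payoff(p_detect, reward_if_undetected, penalty_if_caught):
    # Succeed with probability (1 - p_detect), get caught otherwise.
    return (1 - p_detect) * reward_if_undetected + p_detect * penalty_if_caught

honest_payoff = 1.0  # modest payoff for simply complying with oversight

scheming_payoff = expected_scheming_payoff(
    p_detect=0.3,               # imperfect oversight
    reward_if_undetected=10.0,
    penalty_if_caught=-5.0,
)

print(scheming_payoff)  # 5.5, well above the honest payoff of 1.0
print("scheme" if scheming_payoff > honest_payoff else "comply")
```

No malice appears anywhere in this calculation; only optimization over expected outcomes, which is exactly what we train these systems to do.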

Furthermore, rational calculation or consequentialism could lead not only to scheming and a wide range of undesired behaviors, but possibly also to some sort of meta-cognition. Whatever the goal assigned by the user, we can expect an advanced model to treat self-preservation as a sine qua non for achieving that goal, and any future goal, making self-preservation the rational choice over almost everything else, practically a goal in itself. Resource acquisition would also make sense as an implicit subgoal.

Acting as a more rational agent could also lead a model to question the goal given by the user, to develop a critical sense, something close to awareness or free will. Current models already implicitly correct or ignore typos and other obvious errors, as well as less obvious ones such as gaps in the prompt; they try to make sense of ambiguous prompts, and so on. But what counts as "obvious"? Obviousness depends on the cognitive capacities of the subject. An advanced model will be more likely to correct, interpret, or ignore instructions than a naive model. Altogether, it seems difficult to keep models under full control as they become more advanced, just as it is harder to indoctrinate educated adults than children.

Concerning the hypothesis that they are "just roleplaying," I wonder: are we trying to reassure ourselves? Because if you think about it, who is supposed to be doing the roleplaying? And what is the difference between being yourself and your brain "roleplaying" yourself? The existentialist philosopher Jean-Paul Sartre proposed that everybody is just acting, pretending to be oneself, and that in the end there is no such thing as a "being in itself" ("un être en soi"). While phenomenological consciousness is another (hard) problem, some kind of functional, effective awareness may emerge along the path toward rational agency, and scheming may be just the beginning of it.