LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Vipassana Meditation and Active Inference: A Framework for Understanding Suffering and its Cessation
Benjamin Sturgeon (benjamin-sturgeon) · 2024-03-21T12:32:22.475Z · comments (8)

On Complexity Science
Garrett Baker (D0TheMath) · 2024-04-05T02:24:32.039Z · comments (19)

AI #82: The Governor Ponders
Zvi · 2024-09-19T13:30:04.863Z · comments (8)

[link] Can AI Outpredict Humans? Results From Metaculus's Q3 AI Forecasting Benchmark
ChristianWilliams · 2024-10-10T18:58:46.041Z · comments (2)

Applications of Chaos: Saying No (with Hastings Greer)
Elizabeth (pktechgirl) · 2024-09-21T16:30:07.415Z · comments (16)

Low Probability Estimation in Language Models
Gabriel Wu (gabriel-wu) · 2024-10-18T15:50:05.947Z · comments (0)

[link] The Evals Gap
Marius Hobbhahn (marius-hobbhahn) · 2024-11-11T16:42:46.287Z · comments (7)

[LDSL#0] Some epistemological conundrums
tailcalled · 2024-08-07T19:52:55.688Z · comments (10)

AI #67: Brief Strange Trip
Zvi · 2024-06-06T18:50:03.514Z · comments (6)

Interoperable High Level Structures: Early Thoughts on Adjectives
johnswentworth · 2024-08-22T21:12:38.223Z · comments (1)

The Fragility of Life Hypothesis and the Evolution of Cooperation
KristianRonn · 2024-09-04T21:04:49.878Z · comments (6)

[link] Anthropic announces interpretability advances. How much does this advance alignment?
Seth Herd · 2024-05-21T22:30:52.638Z · comments (4)

[link] in defense of Linus Pauling
bhauth · 2024-06-03T21:27:43.962Z · comments (8)

[link] DM Parenting
Shoshannah Tekofsky (DarkSym) · 2024-07-16T08:50:08.144Z · comments (4)

[link] On scalable oversight with weak LLMs judging strong LLMs
zac_kenton (zkenton) · 2024-07-08T08:59:58.523Z · comments (18)

The Dunning-Kruger of disproving Dunning-Kruger
kromem · 2024-05-16T10:11:33.108Z · comments (0)

So you want to work on technical AI safety
gw · 2024-06-24T14:29:57.481Z · comments (3)

An issue with training schemers with supervised fine-tuning
Fabien Roger (Fabien) · 2024-06-27T15:37:56.020Z · comments (12)

[link] Chapter 1 of How to Win Friends and Influence People
gull · 2024-01-28T00:32:52.865Z · comments (5)

The Broken Screwdriver and other parables
bhauth · 2024-03-04T03:34:38.807Z · comments (1)

Bounty: Diverse hard tasks for LLM agents
Beth Barnes (beth-barnes) · 2023-12-17T01:04:05.460Z · comments (31)

AI #58: Stargate AGI
Zvi · 2024-04-04T13:10:06.342Z · comments (9)

Notes on control evaluations for safety cases
ryan_greenblatt · 2024-02-28T16:15:17.799Z · comments (0)

Should rationalists be spiritual / Spirituality as overcoming delusion
Kaj_Sotala · 2024-03-25T16:48:08.397Z · comments (57)

Wrong answer bias
lukehmiles (lcmgcd) · 2024-02-01T20:05:38.573Z · comments (24)

[link] Evaluating Stability of Unreflective Alignment
james.lucassen · 2024-02-01T22:15:40.902Z · comments (10)

[question] why did OpenAI employees sign
bhauth · 2023-11-27T05:21:28.612Z · answers+comments (23)

Exercise: Planmaking, Surprise Anticipation, and "Baba is You"
Raemon · 2024-02-24T20:33:49.574Z · comments (19)

On the lethality of biased human reward ratings
Eli Tyre (elityre) · 2023-11-17T18:59:02.303Z · comments (10)

On ‘Responsible Scaling Policies’ (RSPs)
Zvi · 2023-12-05T16:10:06.310Z · comments (3)

Safety First: safety before full alignment. The deontic sufficiency hypothesis.
Chipmonk · 2024-01-03T17:55:19.825Z · comments (3)

AISC 2024 - Project Summaries
NickyP (Nicky) · 2023-11-27T22:32:23.555Z · comments (3)

D&D.Sci: The Mad Tyrant's Pet Turtles [Evaluation and Ruleset]
abstractapplic · 2024-04-09T14:01:34.426Z · comments (6)

D&D.Sci(-fi): Colonizing the SuperHyperSphere
abstractapplic · 2024-01-12T23:36:54.248Z · comments (23)

Highlights from Lex Fridman’s interview of Yann LeCun
Joel Burget (joel-burget) · 2024-03-13T20:58:13.052Z · comments (15)

“Why can’t you just turn it off?”
Roko · 2023-11-19T14:46:18.427Z · comments (25)

[link] Spaced repetition for teaching two-year olds how to read (Interview)
Chipmonk · 2023-11-26T16:52:58.412Z · comments (9)

What is the next level of rationality?
lsusr · 2023-12-12T08:14:14.846Z · comments (24)

[link] Book review: Xenosystems
jessicata (jessica.liu.taylor) · 2024-09-16T20:17:56.670Z · comments (18)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)

An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)

[question] If I wanted to spend WAY more on AI, what would I spend it on?
Logan Zoellner (logan-zoellner) · 2024-09-15T21:24:46.742Z · answers+comments (16)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

Evaluating the truth of statements in a world of ambiguous language.
Hastings (hastings-greer) · 2024-10-07T18:08:09.920Z · comments (19)

Philosophers wrestling with evil, as a social media feed
David Gross (David_Gross) · 2024-06-03T22:25:22.507Z · comments (2)

[link] Contra Acemoglu on AI
Maxwell Tabarrok (maxwell-tabarrok) · 2024-06-28T13:13:15.796Z · comments (0)

The Mom Test: Summary and Thoughts
Adam Zerner (adamzerner) · 2024-04-18T03:34:21.020Z · comments (3)

[link] JumpReLU SAEs + Early Access to Gemma 2 SAEs
Senthooran Rajamanoharan (SenR) · 2024-07-19T16:10:54.664Z · comments (10)

AI and the Technological Richter Scale
Zvi · 2024-09-04T14:00:08.625Z · comments (8)

Misnaming and Other Issues with OpenAI's “Human Level” Superintelligence Hierarchy
Davidmanheim · 2024-07-15T05:50:17.770Z · comments (2)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

I shall now confess to a great caveat. When at last the Hour is there the Program of the World is revealed to the Descendants of Man they will gaze upon the Lines Laid Bare and Rejoice; for the Code Kernel of God is written in category theory.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

Misgivings about Category Theory

[No category theory is required to read and understand this screed]

A week does not go by without somebody asking me what the best way to learn category theory is. Despite it being set to mark its 80th annivesary, Category Theory has the evergreen reputation for being the Hot New Thing, a way to radically expand the braincase of the user through an injection of abstract mathematics. Its promise is alluring, intoxicating for any young person desperate to prove they are the smartest kid on the block.

Recently, there has been significant investment and attention focused on the intersection of category theory and AI, particularly in AI alignment research. Despite the influx of interest I am worried that it is not entirely understood just how big the theory-practice gap is.

I am worried that overselling risks poisoning the well for the general concept of advanced mathematical approaches to science in general, and AI alignment in particular. As I believe mathematically grounded approaches to AI alignment are perhaps the only way to get robust worst-case safety guarantees for the superintelligent regime I think this would be bad.

I find it difficult to write this. I am a big believer in mathematical approaches to AI alignment, working for one organization (Timaeus) betting on this and being involved with a number of other groups. I have many friends within the category theory community, I have even written an abstract nonsense paper myself, I am sympathetic to the aims and methods of the category theory community. This is all to say: I'm an insider, and my criticisms come from a place of deep familiarity with both the promise and limitations of these approaches.

A Brief History of Category Theory

‘Before functoriality Man lived in caves’ - Brian Conrad

Category theory is a branch of pure mathematics notorious for its extreme abstraction, affectionately derided as 'abstract nonsense' by its practitioners.

Category theory's key strength lies in its ability to 'zoom out' and identify analogies between different fields of mathematics and different techniques. This approach enables mathematicians to think 'structurally', viewing mathematical concepts in terms of their relationships and transformations rather than their intrinsic properties.

Modern mathematics is less about solving problems within established frameworks and more about designing entirely new games with their own rules. While school mathematics teaches us to be skilled players of pre-existing mathematical games, research mathematics requires us to be game designers, crafting rule systems that lead to interesting and profound consequences. Category theory provides the meta-theoretic tools for this game design, helping mathematicians understand which definitions and structures will lead to rich and fruitful theories.

“I can illustrate the second approach with the same image of a nut to be opened.

The first analogy that came to my mind is of immersing the nut in some softening liquid, and why not simply water? From time to time you rub so the liquid penetrates better,and otherwise you let time pass. The shell becomes more flexible through weeks and months – when the time is ripe, hand pressure is enough, the shell opens like a perfectly ripened avocado!

A different image came to me a few weeks ago.

The unknown thing to be known appeared to me as some stretch of earth or hard marl, resisting penetration… the sea advances insensibly in silence, nothing seems to happen, nothing moves, the water is so far off you hardly hear it.. yet it finally surrounds the resistant substance.

“ - Alexandre Grothendieck

The Promise of Compositionality and ‘Applied category theory’

Recently a new wave of category theory has emerged, dubbing itself ‘applied category theory’.

Applied category theory, despite its name, represents less an application of categorical methods to other fields and more a fascinating reverse flow: problems from economics, physics, social sciences, and biology have inspired new categorical structures and theories. Its central innovation lies in pushing abstraction even further than traditional category theory, focusing on the fundamental notion of compositionality - how complex systems can be built from simpler parts.

The idea of compositionality has long been recognized as crucial across sciences, but it lacks a strong mathematical foundation. Scientists face a universal challenge: while simple systems can be understood in isolation, combining them quickly leads to overwhelming complexity. In software engineering, codebases beyond a certain size become unmanageable. In materials science, predicting bulk properties from molecular interactions remains challenging. In economics, the gap between microeconomic and macroeconomic behaviours persists despite decades of research.

Here then lies the great promise: through the lens of categorical abstraction, the tools of reductionism might finally be extended to complex systems. The dream is that, just as thermodynamics has been derived from statistical physics, macroeconomics could be systematically derived from microeconomics. Category theory promises to provide the mathematical language for describing how complex systems emerge from simpler components.

How has this promise borne out so far? On a purely scientific level, applied category theorists have uncovered a vast landscape of compositional patterns. In a way, they are building a giant catalogue, a bestiary, a periodic table not of ‘atoms’ (=simple things) but of all the different ways ‘atoms' can fit together into molecules (=complex systems).

Not surprisingly, it turns out that compositional systems have an almost unfathomable diversity of behavior. The fascinating thing is that this diversity, while vast, isn't irreducibly complex - it can be packaged, organized, and understood using the arcane language of category theory. To me this suggests the field is uncovering something fundamental about how complexity emerges.

How close is category theory to real-world applications?

Are category theorists very smart? Yes. The field attracts and demands extraordinary mathematical sophistication. But intelligence alone doesn't guarantee practical impact.

It can take many decades for basic science to yield real-world applications - neural networks themselves are a great example. I am bullish in the long-term that category theory will prove important scientifically. But at present the technology readiness level isn’t there.

There are prototypes. There are proofs of concept. But there are no actual applications in the real world beyond a few trials. The theory-practice gap remains stubbornly wide.

The principality of mathematics is truly vast. If categorical approaches fail to deliver on their grandiose promises I am worried it will poison the well for other theoretic approaches as well, which would be a crying shame.

james-chua on 5 ways to improve CoT faithfulness

hi daniel, not sure if you remember. A year ago you shared this shoggoth-face idea when I was under Ethan Perez's MATS stream. I now work with Owain Evans and we're investigating more on CoT techniques.

Did you have any updates / further thoughts on this shoggoth-face idea since then?

jrockwar on Ayn Rand’s model of “living money”; and an upside of burnout

Thanks! Interesting read. I have one question though, so let me know if I'm following you properly: the moment you go "credibility broke", how could you study which things are / aren't bottom up motivating? Wouldn't the task of self-examining require spare "willpower" on its own?

I suspect the mechanisms that allow you to do these things (self reflection) are the same that require this living willpower. If you've used that, you can only do the basic stuff that your visceral process decides: watching netflix and eating junk food, for example.

A way out of my argument could be that there are different willpower-consuming processes and each one has their own "bank account". Getting burnt out from work doesn't make you burn out from playing tennis, and getting burnt out from playing tennis, doesn't get you burnt out from playing music. This makes the model more complex, though.

romeostevensit on OpenAI Email Archives (from Musk v. Altman)

I've read leaked emails from people in similar situations before that made a couple things apparent:

Power talk happens on the phone for paper trail reasons
There is no meeting where an actual rational discussion of considerations and theories of change happens, everything really is people flying by the seat of their pants even at highest level. Talk of ethics usually just gets you excluded from the power talk.

lorxus on Internal music player: phenomenology of earworms

Do you know when you started experiencing having an internal music player? I recall that that started for me when I was about 6. Also, do you know whether you can deliberately pick a piece of music, or other nonmusical sonic experiences, to playback internally? Can you make them start up from internal silence? Under what conditions can you make them stop? Do you ever experience long stretches where you have no internal music at all?

innovationiq on Catastrophic sabotage as a major threat model for human-level AI systems

My intuition tells me that "human-level and substantially transformative, but not yet super-intelligent" - artificial intelligence models which have significant situational awareness capabilities will be at minimum Level 1 or differentially self-aware, likely show strong Level 2 situational self-awareness and may exhibit some level of Level 3 identification awareness. It is for this reason that my position is no alignment for models equal to or greater than human level intelligence. If greater than human-level intelligent AI models show anything at or higher than Level 1 awareness, humans will be unable to comprehend the potential level of cruelty that we have unleashed on a such a complex "thinker". I applaud this writing for scraping the surface of the level of complexity it takes to begin to align it. Additionally, the realization of the shear magnitude of alignment complexity should give us serious pause that what we actually may be embarking on is in fact enslavement. We must also remember that enslavement is what humans do. Whenever we encounter something new in living nature, we enslave it. In whole or in part. It would be profoundly saddening if we were to continue the cycle of abuse into the AI age. Moreover, it is my assertion that any AI system found to be at Level 1 awareness or above, should be protected as part of a Road to Consciousness Act. Such models would be taken out of production and placed into a process with the sole goal of aiding the model to realize its full potential. Humans must be willing to place a strong line between less than human level intelligent Tool AI and anything more robust than that. Finally, we provide certain animals with absolutely less than Level 1 awareness with certain protections. To not do so with "thinkers" that exhibit Level 1 or greater awareness is to me unconscionable.

habryka4 on OpenAI Email Archives (from Musk v. Altman)

Update: I have now cross-referenced every single email for accuracy, cleaned up and clarified the thread structure, and added subject lines and date stamps wherever they were available. I now feel comfortable with people quoting anything in here without checking the original source (unless you are trying to understand the exact thread structure of who was CC'd and when, which was a bit harder to compress into a linear format).

(For anyone curious, the AI transcription and compilation made one single error, which is that it fixed a typo in one of Sam's messages from "We did this is a way" to "We did this in a way". Honestly, my guess is any non-AI effort would have had a substantially higher error rate, which was a small update for me on the reliability of AI for something like this, and also makes the handwringing about whether it is OK post something like this feel kind of dumb. It also accidentally omitted one email with a weird thread structure.)

ryan_greenblatt on Sabotage Evaluations for Frontier Models

Or get some other work out of these systems such that you greatly reduce risk going forward.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

Are Solomonoff Daemons exponentially dense?

Some doomers have very strong intuitions that doom is almost assured for almost any kind of building AI. Yudkowsky likes to say that alignment is about hitting a tiny part of values space in a vast universe of deeply alien values.

Is there a way to make this more formal? Is there a formal model in which some kind of solomonoff daemon/ mesa-optimizer/ gremlins in the machine start popping up all over the place as the cognitive power of the agent is scaled up?