LessWrong 2.0 Reader

AI #88: Thanks for the Memos
Zvi · 2024-10-31T15:00:07.412Z · comments (5)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (8)

[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)

The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

Looking back on the Future of Humanity Institute - Asterisk
jakeeaton · 2024-11-19T00:44:40.928Z · comments (0)

AI #91: Deep Thinking
Zvi · 2024-11-21T14:30:06.930Z · comments (9)

~80 Interesting Questions about Foundation Model Agent Safety
RohanS · 2024-10-28T16:37:04.713Z · comments (4)

Work with me on agent foundations: independent fellowship
Alex_Altair · 2024-09-21T13:59:16.706Z · comments (5)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (47)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

AI #80: Never Have I Ever
Zvi · 2024-09-10T17:50:08.074Z · comments (20)

[link] Dangerous capability tests should be harder
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:20:50.610Z · comments (3)

[link] Literacy Rates Haven't Fallen By 20% Since the Department of Education Was Created
Maxwell Tabarrok (maxwell-tabarrok) · 2024-11-22T20:53:59.007Z · comments (0)

[link] The Choice Transition
owencb · 2024-11-18T12:30:56.198Z · comments (4)

Start an Upper-Room UV Installation Company?
jefftk (jkaufman) · 2024-10-19T02:00:10.691Z · comments (9)

Economics Roundup #3
Zvi · 2024-09-10T13:50:06.955Z · comments (9)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)

Monthly Roundup #24: November 2024
Zvi · 2024-11-18T13:20:06.086Z · comments (14)

Reading RFK Jr so that you don’t have to
braces · 2024-11-22T00:59:19.583Z · comments (0)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]
Ruby · 2024-09-19T01:35:02.999Z · comments (12)

How difficult is AI Alignment?
Sammy Martin (SDM) · 2024-09-13T15:47:10.799Z · comments (6)

MATS AI Safety Strategy Curriculum v2
DanielFilan · 2024-10-07T22:44:06.396Z · comments (6)

Startup Success Rates Are So Low Because the Rewards Are So Large
AppliedDivinityStudies (kohaku-none) · 2024-10-10T20:22:01.557Z · comments (6)

[link] Things I learned talking to the new breed of scientific institution
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-29T14:00:14.844Z · comments (6)

AI #89: Trump Card
Zvi · 2024-11-07T16:30:05.684Z · comments (12)

Time Efficient Resistance Training
romeostevensit · 2024-10-07T15:15:44.950Z · comments (10)

Australian AI Safety Forum 2024
Liam Carroll (liam-carroll) · 2024-09-27T00:40:11.451Z · comments (0)

Formalizing the Informal (event invite)
abramdemski · 2024-09-10T19:22:53.564Z · comments (0)

[link] Point of Failure: Semiconductor-Grade Quartz
Annapurna (jorge-velez) · 2024-09-30T15:57:40.495Z · comments (8)

[link] IAPS: Mapping Technical Safety Research at AI Companies
Zach Stein-Perlman · 2024-10-24T20:30:41.159Z · comments (12)

D&D Sci Coliseum: Arena of Data
aphyer · 2024-10-18T22:02:54.305Z · comments (23)

[link] An Interactive Shapley Value Explainer
James Stephen Brown (james-brown) · 2024-09-28T05:01:21.169Z · comments (9)

Open Thread Fall 2024
habryka (habryka4) · 2024-10-05T22:28:50.398Z · comments (119)

Reflections on the Metastrategies Workshop
gw · 2024-10-24T18:30:46.255Z · comments (5)

[link] Programming Refusal with Conditional Activation Steering
Bruce W. Lee (bruce-lee) · 2024-09-11T20:57:08.714Z · comments (0)

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures
Sahil · 2024-11-01T17:24:09.957Z · comments (2)

[Linkpost] Play with SAEs on Llama 3
Tom McGrath · 2024-09-25T22:35:44.824Z · comments (2)

Doing Research Part-Time is Great
casualphysicsenjoyer (hatta_afiq) · 2024-11-22T19:01:15.542Z · comments (7)

instruction tuning and autoregressive distribution shift
nostalgebraist · 2024-09-05T16:53:41.497Z · comments (5)

2025 Color Trends
sarahconstantin · 2024-10-07T21:20:03.962Z · comments (7)

[link] Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake
TurnTrout · 2024-11-19T18:36:20.721Z · comments (5)

Winners of the Essay competition on the Automation of Wisdom and Philosophy
AI Impacts (AI Imacts) · 2024-10-28T17:10:04.272Z · comments (3)

[question] Implications of China's recession on AGI development?
Eric Neyman (UnexpectedValues) · 2024-09-28T01:12:36.443Z · answers+comments (3)

Are we dropping the ball on Recommendation AIs?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-10-23T17:48:00.000Z · comments (17)

How to use bright light to improve your life.
Nat Martin (nat-martin) · 2024-11-18T19:32:10.667Z · comments (8)

[link] FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
Tamay · 2024-11-14T06:13:22.042Z · comments (0)

Anthropic rewrote its RSP
Zach Stein-Perlman · 2024-10-15T14:25:12.518Z · comments (19)

Recent comments

lucid_levi_ackerman on Which things were you surprised to learn are metaphors?

Spoiler warning.

https://attackontitan.fandom.com/wiki/Rumbling

shankar-sivarajan on John Fisher's Shortform

See also: Great Moments in Unintended Consequences playlist.

gunnar_zarncke on Hierarchical Agency: A Missing Piece in AI Alignment

In the section about existing theories that could be related, I was missing Luhmann's Social Systems theory. It is not a mathematical theory, but it otherwise fits the desiderata. This is what o1-preview says about it (after being fed the above dialog up to the theories suggested by Claude):

Introducing Luhmann's Social Systems Theory into our discussion is an intriguing idea. Let's explore how it might help us address the challenge of modeling hierarchical agency with a suitable mathematical formalism.
Overview of Luhmann's Social Systems Theory:
Niklas Luhmann's theory conceptualizes society as a complex set of self-referential social systems composed not of individuals or actions, but of communications. In his framework:
Autopoietic Systems: Social systems are autopoietic, meaning they are self-producing and maintain their boundaries through their own operations. They continuously reproduce the elements (communications) that constitute them.
Functional Differentiation: Society is differentiated into various subsystems (e.g., economy, law, politics), each operating based on its own binary code (profit/non-profit, legal/illegal, power/no power).
Communication as the Fundamental Unit: Communications are the basic elements, and meaning is generated through the network of communications within a system.
Operative Closure and Cognitive Openness: Systems are operationally closed—they can only refer to themselves—but cognitively open, as they observe and are influenced by their environment.
Relating Luhmann's Theory to Hierarchical Agency:
Type Consistency Across Levels:
Agents as Systems: If we consider both subagents and superagents as autopoietic systems of communications, we maintain type consistency. Both levels are constituted by the same fundamental processes.
Scale-Free Structure: Since Luhmann's theory doesn't privilege any particular scale, it aligns with your desideratum for a scale-free formalism.
Expressiveness for Real-World Situations:
Conflict and Integration: Luhmann's concept of structural coupling explains how different systems interact and influence each other without losing their autonomy. This could model conflicts between layers or how a superagent gains agency at the expense of subagents.
Emergent Properties: The emergent nature of social systems in Luhmann's theory parallels how superagents might exhibit properties not reducible to their subagents.
Representation of Intentionality:
System Intentionality: While individual intentions are abstracted away, systems exhibit purposive behavior through their operations. This could be adapted to represent the goals and beliefs of agents at different levels.
Communication of Intentions: Intentionality can be embedded in the communications that constitute the system, allowing us to model beliefs and desires.
Towards a Mathematical Formalism:
While Luhmann's theory is primarily sociological and lacks a mathematical framework, we can attempt to formalize its concepts:
Agent-Based Modeling:
Agents as Communication Processes: Model agents (both subagents and superagents) as processes that generate and respond to communications.
Network Dynamics: Use graph theory to represent communications as edges and agents as nodes, allowing us to analyze the structure and dynamics of interactions.
Category Theory:
Objects and Morphisms: Utilize category theory to model agents as objects and communications as morphisms, preserving type consistency and composability.
Functorial Relationships: Define functors that map between categories of agents at different hierarchical levels.
Dynamic Systems Theory:
Differential Equations: Represent the evolution of communications and agent states over time using differential equations.
Attractors and Stability: Analyze how certain communication patterns lead to stable superagent formations.
Information Theory:
Entropy and Redundancy: Measure the information content of communications, which could correlate with the intentionality and decision-making processes of agents.
Mutual Information: Assess how much information is shared between subagents and superagents, indicating the degree of coupling.
Potential Benefits:
Unified Framework: By treating communications as the foundational elements, we can model both agents and superagents within the same formal structure.
Emergence and Self-Organization: The formalism could capture how complex behaviors emerge from simple interactions, which is crucial for understanding hierarchical agency.
Applicability to AI Systems: For AI alignment, this approach might help in designing AI agents that can form or integrate into higher-level systems without losing alignment with human values.
Challenges:
Abstract Nature: Luhmann's theory is highly abstract, which might make direct mathematical translation difficult.
Intentionality Representation: Modeling intentionality purely through communications may not capture all aspects of beliefs and desires inherent in agents.

joseph-miller on leogao's Shortform

Unconferences are a thing for this reason.

alexander-gietelink-oldenziel on John Fisher's Shortform

Sounds like a great idea! With the best of intentions! What could possibly go wrong?

john-fisher on John Fisher's Shortform

Law proposal: After a company makes over 500 million USD in revenue two years in a row, the citizens of the country it operates in automatically become the company's shareholders and board members. There could be a more nuanced version of this that gives the public only some percentage, say 50%, of the voting power and of the company's profit, so that there is still a strong incentive for the existing shareholders to stick around.

ben-lang on A very strange probability paradox

Yes, it's a bit weird. I was replying because I thought (perhaps getting the wrong end of the stick) that you were confused about what the question was, not (as it seems now) pointing out that the question (in your view) is open to being confused.

In probability theory the phrase "given that" is very important, and it is (as far as I know) always used in the way it is used here. ["given that X happens" means "X may or may not happen, but we are thinking about the cases where it does", which is very different from meaning "X always happens"]

A more common use would be "What is the probability that a person is sick, given that they are visiting a doctor right now?". This doesn't mean "everyone in the world is visiting a doctor right now", it means that the people who are not visiting a doctor right now exist, but we are not talking about them. Similarly, the original post's imagined world involves cases where odd numbers are rolled, but we are talking about the set without odds. It is weird to think about how proposing a whole set of imaginary situations (odd and even rolls) then talking only about a subset of them (only evens) is NOT the same as initially proposing the smaller set of imaginary events in the first place (your D3 labelled 2,4,6).

But yes, I can definitely see how the phrase "given that" could be interpreted the other way.
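The difference between conditioning on "all rolls were even" and starting from a three-sided die labelled 2, 4, 6 can be checked with exact arithmetic. The sketch below assumes a setup the comment does not spell out: a fair six-sided die is rolled until the first 6, and we ask for the expected number of rolls. The function names are illustrative only.

```python
from fractions import Fraction


def expected_rolls_given_all_even():
    """E[N | every roll is even] for a fair d6 rolled until the first 6.

    Assumed setup (not stated in the comment): N is the number of rolls
    up to and including the first 6.
    """
    p_six = Fraction(1, 6)       # this roll ends the sequence
    r = Fraction(2, 6)           # a 2 or a 4: keep rolling, still all even
    # P(N = n and all rolls even) = r**(n - 1) * p_six, so by the
    # geometric-series identities
    #   sum_n r**(n - 1)     = 1 / (1 - r)
    #   sum_n n * r**(n - 1) = 1 / (1 - r)**2
    p_condition = p_six / (1 - r)         # P(all rolls even)
    weighted_sum = p_six / (1 - r) ** 2   # sum of n * P(N = n, all even)
    return weighted_sum / p_condition


def expected_rolls_d3():
    """E[N] for a fair three-sided die labelled 2, 4, 6, rolled until a 6."""
    # Plain geometric distribution with success probability 1/3.
    return 1 / Fraction(1, 3)


print(expected_rolls_given_all_even())  # conditioned d6
print(expected_rolls_d3())              # the smaller die proposed up front
```

Under these assumptions the two answers differ, because conditioning discards the long sequences (which are likely to contain an odd roll) far more often than the short ones.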

joachim-bartosik on You are not too "irrational" to know your preferences.

If they can’t do that, why on earth should you give up on your preferences? In what bizarro world would that sort of acquiescence to someone else’s self-claimed authority be “rational?”

Well if they consistently make recommendations that in retrospect end up looking good then maybe you're bad at understanding. Or maybe they're bad at explaining. But trusting them when you don't understand their recommendation is exploitable so maybe they're running a strategy where they deliberately make good recommendations with poor explanations so when you start trusting them they can start mixing in exploitative recommendations (which you can't tell apart because all recommendations have poor explanations).

So I'd really rather not do that in a community context. There are ways to work with that. E.g., a boss can skip some details of an employee's recommendations and, if results are bad enough, fire the employee. On the other hand, I think it's pretty common for an employee to act in their own interest. But yeah, we're talking about the principal-agent problem at that point, and trade-offs about what's more efficient...

mfar on The Queen’s Dilemma: A Paradox of Control

W. Ross Ashby's Law of Requisite Variety (1956) suggests fundamental limits to human control over more capable systems.

This law sounds super enticing and I want to understand it more. Could you spell out how the law suggests this?

I did a quick search of LessWrong and Wikipedia regarding this law.

"... Ashby's "Law of requisite variety", which roughly speaking states that a system can only remain in homeostasis if it has more internal states than the external states it encounters." from Yuxi_Liu, "Cybernetic dreams".
"Either the AI is too simple to be an independent robust agent in human society, or it needs to be approximately as complex as humans themselves. Cf. the law of requisite variety." from Roman Leventov, "For alignment, we should simultaneously use multiple theories of cognition and value".
"This law (of which Shannon's theorem 10 relating to the suppression of noise is a special case) says that if a certain quantity of disturbance is prevented by a regulator from reaching some essential variables, then that regulator must be capable of exerting at least that quantity of selection." from W. R. Ashby (1960), "Design for a Brain", p. 229, quoted via Wikipedia page.

Enough testimonials. The Wikipedia page itself describes the law as based on the following observation: in a two-player game between the environment (disturber) and a system trying to maintain stasis (regulator), if the environment has D moves that all lead to different outcomes (given any move from the system), and the system has R possible responses, then the best the system can do is restrict the number of outcomes to D/R.

I can see the link between this and the descriptions from Yuxi_Liu, Roman Leventov, and Ashby. Your reading is a couple of steps removed. How did you get from D/R outcomes in this game to "fundamental limits to human control over more capable systems"? My guess is that you simply mean that if the more capable system is more complex / has more available moves / more "variety" than humans, then the law will apply with the human as the regulator and the AI as the disturber. Is that right? Could you comment on how you see capability in terms of variety?
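The D/R bound described above can be checked by brute force on a toy game. This is only a sketch under stated assumptions: the outcome table `(d - r) % D` is a hypothetical choice that satisfies the condition "different disturbances give different outcomes for any fixed response", and the regulator is assumed to see the disturbance before responding.

```python
from itertools import product


def outcome(d, r, num_disturbances):
    # Hypothetical outcome table: for a fixed response r, distinct
    # disturbances d always map to distinct outcomes.
    return (d - r) % num_disturbances


def fewest_outcomes(num_disturbances, num_responses):
    """Smallest number of distinct outcomes any regulator strategy achieves.

    A strategy assigns one response to each disturbance; all
    num_responses ** num_disturbances strategies are enumerated.
    """
    best = num_disturbances
    for strategy in product(range(num_responses), repeat=num_disturbances):
        outcomes = {
            outcome(d, r, num_disturbances) for d, r in enumerate(strategy)
        }
        best = min(best, len(outcomes))
    return best


for num_responses in (1, 2, 3):
    print(num_responses, fewest_outcomes(6, num_responses))
```

With D = 6 disturbances, no strategy gets below D/R outcomes: each outcome can be reached from at most R different disturbances, one per response. When D is not divisible by R, the same argument gives the bound rounded up.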

daystareld on You are not too "irrational" to know your preferences.

I'm a little confused. Do the examples in the post all seem purely hypothetical to you?

Whether or not it's rational to have ice cream.
Whether or not wanting your partner to do housework is reasonable.
Whether or not you want to receive unfiltered criticism or judgements.
Whether being mono vs open vs poly is a sign of rationality.
Whether your career preference is a sign of rationality.

Or are they not sufficiently detailed, or...? They're all real things I have encountered. Obviously they are not all equally detailed, and I could always add more, but if it doesn't seem concrete enough yet, I'm not sure what else to add or in how much detail.