LessWrong 2.0 Reader

Meetup In a Box: Year In Review
Czynski (JacobKopczynski) · 2024-02-14T01:18:28.259Z · comments (0)
Exploring OpenAI's Latent Directions: Tests, Observations, and Poking Around
Johnny Lin (hijohnnylin) · 2024-01-31T06:01:27.969Z · comments (4)
Distillation of 'Do language models plan for future tokens'
TheManxLoiner · 2024-06-27T20:57:34.351Z · comments (2)
$250K in Prizes: SafeBench Competition Announcement
ozhang (oliver-zhang) · 2024-04-03T22:07:41.171Z · comments (0)
Vote in the LessWrong review! (LW 2022 Review voting phase)
habryka (habryka4) · 2024-01-17T07:22:17.921Z · comments (9)
Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)
How do LLMs give truthful answers? A discussion of LLM vs. human reasoning, ensembles & parrots
Owain_Evans · 2024-03-28T02:34:21.799Z · comments (0)
[link] Positive visions for AI
L Rudolf L (LRudL) · 2024-07-23T20:15:26.064Z · comments (4)
[link] my favourite Scott Sumner blog posts
DMMF · 2024-06-11T14:40:43.093Z · comments (0)
[link] Let's Design A School, Part 2.1 School as Education - Structure
Sable · 2024-05-02T22:04:30.435Z · comments (2)
Optimizing Repeated Correlations
SatvikBeri · 2024-08-01T17:33:23.823Z · comments (1)
[link] what becoming more secure did for me
Chipmonk · 2024-08-22T17:44:48.525Z · comments (5)
Evaluating Solar
jefftk (jkaufman) · 2024-02-17T21:50:04.783Z · comments (5)
[link] Care Doesn't Scale
stavros · 2024-10-28T11:57:38.742Z · comments (1)
[link] Death notes - 7 thoughts on death
Nathan Young · 2024-10-28T15:01:13.532Z · comments (1)
Option control
Joe Carlsmith (joekc) · 2024-11-04T17:54:03.073Z · comments (0)
Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (8)
[question] Which things were you surprised to learn are metaphors?
Gordon Seidoh Worley (gworley) · 2024-11-22T03:46:02.845Z · answers+comments (16)
[link] A brief history of the automated corporation
owencb · 2024-11-04T14:35:04.906Z · comments (1)
Trying Bluesky
jefftk (jkaufman) · 2024-11-17T02:50:04.093Z · comments (16)
Sleeping on Stage
jefftk (jkaufman) · 2024-10-22T00:50:07.994Z · comments (3)
[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)
[link] overengineered air filter shelving
bhauth · 2024-11-08T22:04:39.987Z · comments (2)
SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)
Dave Kasten's AGI-by-2027 vignette
davekasten · 2024-11-26T23:20:47.212Z · comments (5)
Chat Bankman-Fried: an Exploration of LLM Alignment in Finance
claudia.biancotti · 2024-11-18T09:38:35.723Z · comments (4)
SAEs you can See: Applying Sparse Autoencoders to Clustering
Robert_AIZI · 2024-10-28T14:48:16.744Z · comments (0)
A Triple Decker for Elfland
jefftk (jkaufman) · 2024-10-11T01:50:02.332Z · comments (0)
The new ruling philosophy regarding AI
Mitchell_Porter · 2024-11-11T13:28:24.476Z · comments (0)
Improving Model-Written Evals for AI Safety Benchmarking
Sunishchal Dev (sunishchal-dev) · 2024-10-15T18:25:08.179Z · comments (0)
How to put California and Texas on the campaign trail!
Yair Halberstadt (yair-halberstadt) · 2024-11-06T06:08:25.673Z · comments (4)
No Electricity in Manchuria
winstonBosan · 2024-11-19T01:11:58.661Z · comments (0)
You're Playing a Rough Game
jefftk (jkaufman) · 2024-10-17T19:20:06.251Z · comments (2)
Abstractions are not Natural
Alfred Harwood · 2024-11-04T11:10:09.023Z · comments (21)
AI Safety University Organizing: Early Takeaways from Thirteen Groups
agucova · 2024-10-02T15:14:00.137Z · comments (0)
[question] When engaging with a large amount of resources during a literature review, how do you prevent yourself from becoming overwhelmed?
corruptedCatapillar · 2024-11-01T07:29:49.262Z · answers+comments (2)
On The Rationalist Megameetup
Screwtape · 2024-11-23T09:08:26.897Z · comments (3)
[link] Conventional footnotes considered harmful
dkl9 · 2024-10-01T14:54:01.732Z · comments (16)
[link] SB 1047 gets vetoed
ryan_b · 2024-09-30T15:49:38.609Z · comments (1)
[link] UK AISI: Early lessons from evaluating frontier AI systems
Zach Stein-Perlman · 2024-10-25T19:00:21.689Z · comments (0)
A suite of Vision Sparse Autoencoders
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-10-27T04:05:20.377Z · comments (0)
[link] Transformer Debugger
Henk Tillman (henk-tillman) · 2024-03-12T19:08:56.280Z · comments (0)
Proving the Geometric Utilitarian Theorem
StrivingForLegibility · 2024-08-07T01:39:10.920Z · comments (0)
[link] Structured Transparency: a framework for addressing use/mis-use trade-offs when sharing information
habryka (habryka4) · 2024-04-11T18:35:44.824Z · comments (0)
[link] Was a Subway in New York City Inevitable?
Jeffrey Heninger (jeffrey-heninger) · 2024-03-30T00:53:21.314Z · comments (4)
[link] An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs
Adam Karvonen (karvonenadam) · 2024-06-25T15:57:16.872Z · comments (0)
Decent plan prize announcement (1 paragraph, $1k)
lemonhope (lcmgcd) · 2024-01-12T06:27:44.495Z · comments (19)
[link] In defence of Helen Toner, Adam D'Angelo, and Tasha McCauley
mrtreasure · 2023-12-06T02:02:32.004Z · comments (3)
[link] Announcing Open Philanthropy's AI governance and policy RFP
Julian Hazell (julian-hazell) · 2024-07-17T02:02:39.933Z · comments (0)
D&D.Sci Hypersphere Analysis Part 4: Fine-tuning and Wrapup
aphyer · 2024-01-18T03:06:39.344Z · comments (5)