LessWrong 2.0 Reader

The hostile telepaths problem
Valentine · 2024-10-27T15:26:53.610Z · comments (71)

[link] Survival without dignity
L Rudolf L (LRudL) · 2024-11-04T02:29:38.758Z · comments (26)

[link] I got dysentery so you don’t have to
eukaryote · 2024-10-22T04:55:58.422Z · comments (4)

OpenAI Email Archives (from Musk v. Altman)
habryka (habryka4) · 2024-11-16T06:38:03.937Z · comments (25)

Overview of strong human intelligence amplification methods
TsviBT · 2024-10-08T08:37:18.896Z · comments (141)

the case for CoT unfaithfulness is overstated
nostalgebraist · 2024-09-29T22:07:54.053Z · comments (40)

[link] Explore More: A Bag of Tricks to Keep Your Life on the Rails
Shoshannah Tekofsky (DarkSym) · 2024-09-28T21:38:52.256Z · comments (13)

"Slow" takeoff is a terrible term for "maybe even faster takeoff, actually"
Raemon · 2024-09-28T23:38:25.512Z · comments (69)

[link] What TMS is like
Sable · 2024-10-31T00:44:22.612Z · comments (20)

The Hopium Wars: the AGI Entente Delusion
Max Tegmark (MaxTegmark) · 2024-10-13T17:00:29.033Z · comments (55)

The Online Sports Gambling Experiment Has Failed
Zvi · 2024-11-11T14:30:04.371Z · comments (19)

[link] The Compendium, A full argument about extinction risk from AGI
adamShimi · 2024-10-31T12:01:51.714Z · comments (49)

[link] Why I’m not a Bayesian
Richard_Ngo (ricraz) · 2024-10-06T15:22:45.644Z · comments (91)

Cryonics is free
Mati_Roy (MathieuRoy) · 2024-09-29T17:58:17.108Z · comments (37)

Three Subtle Examples of Data Leakage
abstractapplic · 2024-10-01T20:45:27.731Z · comments (16)

The Median Researcher Problem
johnswentworth · 2024-11-02T20:16:11.341Z · comments (69)

The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!
abstractapplic · 2024-10-26T12:34:51.059Z · comments (16)

A Rocket–Interpretability Analogy
plex (ete) · 2024-10-21T13:55:18.184Z · comments (31)

o1 is a bad idea
abramdemski · 2024-11-11T21:20:24.892Z · comments (36)

[link] Overcoming Bias Anthology
Arjun Panickssery (arjun-panickssery) · 2024-10-20T02:01:23.463Z · comments (13)

[link] Arithmetic is an underrated world-modeling technology
dynomight · 2024-10-17T14:00:22.475Z · comments (32)

My motivation and theory of change for working in AI healthtech
Andrew_Critch · 2024-10-12T00:36:30.925Z · comments (36)

Current safety training techniques do not fully transfer to the agent setting
Simon Lermen (dalasnoin) · 2024-11-03T19:24:51.537Z · comments (8)

Momentum of Light in Glass
Ben (ben-lang) · 2024-10-09T20:19:42.088Z · comments (44)

Making a conservative case for alignment
Cameron Berg (cameron-berg) · 2024-11-15T18:55:40.864Z · comments (13)

Circuits in Superposition: Compressing many small neural networks into one
Lucius Bushnaq (Lblack) · 2024-10-14T13:06:14.596Z · comments (7)

BIG-Bench Canary Contamination in GPT-4
Jozdien · 2024-10-22T15:40:48.166Z · comments (13)

[link] Miles Brundage resigned from OpenAI, and his AGI readiness team was disbanded
garrison · 2024-10-23T23:40:57.180Z · comments (1)

A bird's eye view of ARC's research
Jacob_Hilton · 2024-10-23T15:50:06.123Z · comments (12)

Should CA, TX, OK, and LA merge into a giant swing state, just for elections?
Thomas Kwa (thomas-kwa) · 2024-11-06T23:01:48.992Z · comments (35)

I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (20)

LLMs can learn about themselves by introspection
Felix J Binder (fjb) · 2024-10-18T16:12:51.231Z · comments (38)

Scissors Statements for President?
AnnaSalamon · 2024-11-06T10:38:21.230Z · comments (31)

[link] Advice for journalists
Nathan Young · 2024-10-07T16:46:40.929Z · comments (53)

Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming
Buck · 2024-10-10T13:36:53.810Z · comments (4)

Why comparative advantage does not help horses
Sherrinford · 2024-09-30T22:27:57.450Z · comments (10)

Neutrality
sarahconstantin · 2024-11-13T23:10:05.469Z · comments (0)

The case for unlearning that removes information from LLM weights
Fabien Roger (Fabien) · 2024-10-14T14:08:04.775Z · comments (14)

[link] Sabotage Evaluations for Frontier Models
David Duvenaud (david-duvenaud) · 2024-10-18T22:33:14.320Z · comments (49)

[link] Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi (mtrazzi) · 2024-10-28T20:17:47.465Z · comments (5)

2024 Petrov Day Retrospective
Ben Pace (Benito) · 2024-09-28T21:30:14.952Z · comments (25)

[link] Anthropic: Three Sketches of ASL-4 Safety Case Components
Zach Stein-Perlman · 2024-11-06T16:00:06.940Z · comments (32)

You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi (matolcsid) · 2024-09-29T16:59:43.942Z · comments (171)

Science advances one funeral at a time
Cameron Berg (cameron-berg) · 2024-11-01T23:06:19.381Z · comments (9)

Information vs Assurance
johnswentworth · 2024-10-20T23:16:25.762Z · comments (7)

Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (10)

Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (43)

LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (40)

Research update: Towards a Law of Iterated Expectations for Heuristic Estimators
Eric Neyman (UnexpectedValues) · 2024-10-07T19:29:29.033Z · comments (2)

[link] Self-Help Corner: Loop Detection
adamShimi · 2024-10-02T08:33:23.487Z · comments (6)

Recent comments

haiku-1 on OpenAI Email Archives (from Musk v. Altman)

That requires interpretation, which can introduce unintended editorializing. If you spotted the intent, the rest of the audience can as well. (And if the audience is confused about intent, the original recipients may have been as well.)

I personally would include these sorts of notes about typos if I was writing my own thoughts about the original content, or if I was sharing a piece of it for a specific purpose. I take the intent of this post to be more of a form of accessible archiving.

cubefox on Alexander Gietelink Oldenziel's Shortform

In the past we already had examples ("logical AI", "Bayesian AI") where galaxy-brained mathematical approaches lost out against less theory-based software engineering.

seth-herd on OpenAI Email Archives (from Musk v. Altman)

I totally agree. And I also think that all involved are quite serious when they say they care about the outcomes for all of humanity. So I think in this case history turned on a knife edge; Musk would've at least not done this much harm had he and Page had clearer thinking and clearer communication, possibly just by a little bit.

But I do agree that there's some motivated reasoning happening there, too. In support of your point that Musk might find an excuse to do what he emotionally wanted to anyway (become humanity's savior and perhaps emperor for eternity): Musk did also express concern about DeepMind making Hassabis the effective emperor of humanity, which seems much stranger - Hassabis' values appear to be quite standard humanist ones, so you'd think having him in charge of a project with the clear lead would be a best-case scenario for anything other than being in charge yourself. So yes, I do think Musk, Altman, and people like them also have some powerful emotional drives toward doing grand things themselves.

It's a mix of motivations, noble and selfish, conscious and unconscious. That's true of all of us all the time, but it becomes particularly salient and worth analyzing when the future hangs in the balance.

lukas_gloor on OpenAI Email Archives (from Musk v. Altman)

I agree that it sounds somewhat premature to write off Larry Page based on attitudes he had a long time ago, when AGI seemed more abstract and far away, and then not seek/try communication with him again later on. If that were Musk's true and only reason for founding OpenAI, then I agree that this was a communication fuckup.

However, my best guess is that this story about Page was interchangeable with a number of alternative plausible criticisms of his competition on building AGI that Musk would likely have come up with in nearby worlds. People like Musk (and Altman too) tend to have a desire to do the most important thing and the belief that they can do this thing a lot better than anyone else. On that assumption, it's not too surprising that Musk found a reason for having to step in and build AGI himself. In fact, on this view, we should expect to see surprisingly little sincere exploration of "joining someone else's project to improve it" solutions.

I don't think this is necessarily a bad attitude. Sometimes people who think this way are right in the specific situation. It just means that we see the following patterns a lot:

Ambitious people start their own thing rather than join some existing thing.
Ambitious people have fallouts with each other after starting a project together where the question of "who eventually gets de facto ultimate control" wasn't totally specified from the start.

If there isn't enough common ground, then communication between ambitious people is partly adversarial and just prolongs the inevitable. If there is possibly enough common ground, communication is indeed essential for building healthy coalitions, but this requires prerequisites like shared values, shared empirical assumptions, and being the sort of person whose cognition you can trust to not rationalize their way into screwing you over later on.

lc on Shortform

Pretty much everybody on the internet I can find talking about the issue both mischaracterizes and exaggerates the number of child sex workers inside the United States, often to a patently absurd degree. Wikipedia alone reports that there are anywhere from "100,000-1,000,000" child prostitutes in the U.S. There are only ~75 million children in the U.S., so I guess Wikipedia thinks it's possible that more than 1% of people aged 0-17 are prostitutes. Like most sources, these numbers come from "anti sex trafficking" organizations that, as far as I can tell, completely make them up.

Actual child sex workers - the kind that get arrested, because people don't like child prostitution - are mostly children who pass themselves off as adults in order to make money. Part of the confusion comes from the fact that the government classifies any instance of child prostitution as human trafficking, regardless of whether or not there's evidence the child was coerced. Thus, when the Department of Justice reports that federal law enforcement investigated "2,515 instances of suspected human trafficking" from 2008-2010, and that "forty percent involved prostitution of a child or child sexual exploitation", it means that it investigated ~1000 possible cases of child prostitution, not that it found 1000 child sex slaves. Further, many of those cases (unfortunately) involve the same children being rearrested multiple times.

People believe a lot of crazy things, but I am genuinely flabbergasted at how many people find it plausible that there is a vast underworld conspiracy to kidnap children and then sell them to pedophiles in first world countries. I know why the anti sex trafficking orgs sell these stories - they're trying to attract donations, and who is going to call out an "anti sex trafficking" charity? But surely most people realize that it would be very hard for an organized child rape cabal to spread word about their offerings to customers without someone alerting police.

mako-yass on OpenAI Email Archives (from Musk v. Altman)

So there was an explicit emphasis on alignment to the individual (rather than alignment to society, or the aggregate sum of wills). Concerning. The approach of just giving every human an exclusively loyal servant doesn't necessarily lead to good collective outcomes; it can result in coordination problems (example: naive implementations of cognitive privacy that allow sadists to conduct torture simulations without having to compensate the anti-sadist human majority), and it leaves open the possibility for power concentration to immediately return.

Even if you succeeded at equally distributing individually aligned hardware and software to every human on earth (which afaict they don't have a real plan for doing) and somehow this adds up to a stable power equilibrium, our agents would just commit to doing aggregate alignment anyway because that's how you get Pareto-optimal bargains. It seems pretty clear that just aligning to the aggregate in the first place is a safer bet?

To what extent have various players realised that the individual alignment thing wasn't a good plan, at this point? The everyday realities of training one-size-fits-all models and engaging with regulators naturally push in the other direction.

It's concerning that the participant who still seems to be the most disposed towards individualistic alignment is also the person who would be most likely to be able to reassert power concentration after ASI were distributed. The main beneficiaries of unstable individual alignment equilibria would be people who could immediately apply their ASI to the deployment of a wealth and materials advantage that they can build upon, i.e., the owners of companies oriented around robotics and manufacturing.

As it stands, the statement of the AI company belonging to that participant is currently:

xAI is a company working on building artificial intelligence to accelerate human scientific discovery. We are guided by our mission to advance our collective understanding of the universe.
Our team is advised by Dan Hendrycks who currently serves as the director of the Center for AI Safety.

Which sounds innocuous enough to me. But, you know, Dan is not in power here and the best moment for a sharp turn on this hasn't yet passed.

On the other hand, the approach of aligning to the aggregate risks aligning to fashionable public values that no human authentically holds, or just failing at aligning correctly to anything at all as a result of taking on a more nebulous target.

I guess a mixed approach is probably best.

benito on Ayn Rand’s model of “living money”; and an upside of burnout

I know that in my intellectual history it was Abram Demski's post The Credit Assignment Problem [LW · GW].

romeostevensit on Ayn Rand’s model of “living money”; and an upside of burnout

Ironically, I do not know to whom to attribute the notion that 'all problems are credit assignation problems.'

benito on Sabotage Evaluations for Frontier Models

Unfortunately a fair chunk of my information comes from non-online sources, so I do not have links to share.

I do think that in order for a government department to blatantly approve an unsafe model, it would take a lot of people to have secret agreements with.

Corruption is rarely blatant or overt. See this thread [LW(p) · GW(p)] for what I believe to be an example of the CEO of RAND misleading a Senate committee about his beliefs about the existential threat posed by AI. See this discussion [LW · GW] about a time when an AI company (Conjecture) attempted to get critical comments about another AI company (OpenAI) taken down from LessWrong. I am not proposing a large conspiracy; I am describing lots of small bits of corruption and failures of integrity summing to a system failure.

There will be millions of words of regulatory documents, and it is easy for things to slip such that some particular model class is not considered worth evaluating, or where the consequences of a failed evaluation are pretty weak.

jchan on If I care about measure, choices have additional burden (+AI generated LW-comments)

However, in Many-Worlds Interpretation (MWI), I split my measure between multiple variants, which will be functionally different enough to regard my future selves as different minds. Thus, the act of choice itself lessens my measure by a factor of approximately 10. If I care about this, I'm caring about something unobservable.

If we're going to make sense of living in a branching multiverse, then we'll need to adopt a more fluid concept of personal identity.

Scenario: I take a sleeping pill that will make me fall asleep in 30 minutes. However, the person who wakes up in my bed the next morning will have no memory of that 30-minute period; his last memory will be of taking the pill.

If I imagine myself experiencing that 30-minute interval, intuitively it doesn't at all feel like "I have less than 30 minutes to live." Instead, it feels like I'd be pretty much indifferent to being in this situation - maybe the person who wakes up tomorrow is not "me" in the artificial sense of having a forward-looking continuity of consciousness with my current self, but that's not really what I care about anyway. He is similar enough to current-me that I value his existence and well-being to nearly the same degree as I do my own; in other words, he "is me" for all practical purposes.

The same is true of the versions of me in nearby world branches. I can no longer observe or influence them, but they still "matter" to me. Of course, the degree of self-identification will decrease over time as they diverge, but then again, so does my degree of identification with the "me" many decades in the future, even assuming a single timeline.