LessWrong 2.0 Reader

[link] [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij (teun-van-der-weij) · 2024-06-13T10:04:49.556Z · comments (10)

Newsom Vetoes SB 1047
Zvi · 2024-10-01T12:20:06.127Z · comments (6)

[link] Hardshipification
Jonathan Moregård (JonathanMoregard) · 2024-05-28T20:02:29.709Z · comments (17)

[link] [Paper] Stress-testing capability elicitation with password-locked models
Fabien Roger (Fabien) · 2024-06-04T14:52:50.204Z · comments (10)

[link] A Universal Emergent Decomposition of Retrieval Tasks in Language Models
Alexandre Variengien (alexandre-variengien) · 2023-12-19T11:52:27.354Z · comments (3)

Some for-profit AI alignment org ideas
Eric Ho (eh42) · 2023-12-14T14:23:20.654Z · comments (19)

MATS Winter 2023-24 Retrospective
utilistrutil · 2024-05-11T00:09:17.059Z · comments (28)

Self-prediction acts as an emergent regularizer
Cameron Berg (cameron-berg) · 2024-10-23T22:27:03.664Z · comments (4)

[link] Nietzsche's Morality in Plain English
Arjun Panickssery (arjun-panickssery) · 2023-12-04T00:57:42.839Z · comments (13)

[link] What are you getting paid in?
Austin Chen (austin-chen) · 2024-07-17T19:23:04.219Z · comments (14)

A very strange probability paradox
notfnofn · 2024-11-22T14:01:36.587Z · comments (20)

Why you should be using a retinoid
GeneSmith · 2024-08-19T03:07:41.722Z · comments (59)

[link] Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims
garrison · 2024-11-13T17:00:01.005Z · comments (13)

AI #51: Altman’s Ambition
Zvi · 2024-02-20T19:50:07.439Z · comments (5)

Actually, Power Plants May Be an AI Training Bottleneck.
Lao Mein (derpherpize) · 2024-06-20T04:41:33.567Z · comments (13)

Retirement Accounts and Short Timelines
jefftk (jkaufman) · 2024-02-19T18:50:05.231Z · comments (35)

OpenAI o1, Llama 4, and AlphaZero of LLMs
Vladimir_Nesov · 2024-09-14T21:27:41.241Z · comments (25)

[link] What Depression Is Like
Sable · 2024-08-27T17:43:22.549Z · comments (23)

Sparse Autoencoders Work on Attention Layer Outputs
Connor Kissane (ckkissane) · 2024-01-16T00:26:14.767Z · comments (9)

AI #83: The Mask Comes Off
Zvi · 2024-09-26T12:00:08.689Z · comments (19)

An Introduction To The Mandelbrot Set That Doesn't Mention Complex Numbers
Yitz (yitz) · 2024-01-17T09:48:07.930Z · comments (11)

Release: Optimal Weave (P1): A Prototype Cohabitive Game
mako yass (MakoYass) · 2024-08-17T14:08:18.947Z · comments (21)

Secular interpretations of core perennialist claims
zhukeepa · 2024-08-25T23:41:02.683Z · comments (32)

Constructability: Plainly-coded AGIs may be feasible in the near future
Épiphanie Gédéon (joy_void_joy) · 2024-04-27T16:04:45.894Z · comments (13)

Some Vacation Photos
johnswentworth · 2024-01-04T17:15:01.187Z · comments (0)

[link] Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes
owencb · 2024-04-16T10:10:13.338Z · comments (12)

AISafety.com – Resources for AI Safety
Søren Elverlin (soren-elverlin-1) · 2024-05-17T15:57:11.712Z · comments (3)

Decomposing the QK circuit with Bilinear Sparse Dictionary Learning
keith_wynroe · 2024-07-02T13:17:16.352Z · comments (7)

[question] What are the good rationality films?
Ben Pace (Benito) · 2024-11-20T06:04:56.757Z · answers+comments (50)

[link] New voluntary commitments (AI Seoul Summit)
Zach Stein-Perlman · 2024-05-21T11:00:41.794Z · comments (17)

How to prevent collusion when using untrusted models to monitor each other
Buck · 2024-09-25T18:58:20.693Z · comments (6)

[link] Palworld development blog post
bhauth · 2024-01-28T05:56:19.984Z · comments (12)

Refusal mechanisms: initial experiments with Llama-2-7b-chat
Andy Arditi (andy-arditi) · 2023-12-08T17:08:01.250Z · comments (7)

Values Are Real Like Harry Potter
johnswentworth · 2024-10-09T23:42:24.724Z · comments (17)

3C's: A Recipe For Mathing Concepts
johnswentworth · 2024-07-03T01:06:11.944Z · comments (5)

The Gemini Incident
Zvi · 2024-02-22T21:00:04.594Z · comments (19)

Survey of 2,778 AI authors: six parts in pictures
KatjaGrace · 2024-01-06T04:43:34.590Z · comments (1)

Self-Referential Probabilistic Logic Admits the Payor's Lemma
Yudhister Kumar (randomwalks) · 2023-11-28T10:27:29.029Z · comments (14)

Studying The Alien Mind
Quentin FEUILLADE--MONTIXI (quentin-feuillade-montixi) · 2023-12-05T17:27:28.049Z · comments (10)

[link] Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more
Michael Cohn (michael-cohn) · 2024-09-15T05:27:36.691Z · comments (39)

[link] Gwern Branwen interview on Dwarkesh Patel’s podcast: “How an Anonymous Researcher Predicted AI's Trajectory”
Said Achmiz (SaidAchmiz) · 2024-11-14T23:53:34.922Z · comments (0)

Quick look: applications of chaos theory
Elizabeth (pktechgirl) · 2024-08-18T15:00:07.853Z · comments (51)

[link] My thesis (Algorithmic Bayesian Epistemology) explained in more depth
Eric Neyman (UnexpectedValues) · 2024-05-09T19:43:16.543Z · comments (4)

Graceful Degradation
Screwtape · 2024-11-05T23:57:53.362Z · comments (8)

LessWrong Community Weekend 2024, open for applications
UnplannedCauliflower · 2024-05-01T10:18:21.992Z · comments (2)

[link] MIRI's May 2024 Newsletter
Harlan · 2024-05-15T00:13:30.153Z · comments (1)

[Intuitive self-models] 2. Conscious Awareness
Steven Byrnes (steve2152) · 2024-09-25T13:29:02.820Z · comments (48)

[link] Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety
titotal (lombertini) · 2024-09-18T13:07:40.754Z · comments (3)

[link] The Cognitive-Theoretic Model of the Universe: A Partial Summary and Review
jessicata (jessica.liu.taylor) · 2024-03-27T19:59:27.893Z · comments (36)

A couple productivity tips for overthinkers
Steven Byrnes (steve2152) · 2024-04-20T16:05:50.332Z · comments (13)


Recent comments

simon on Magic by forgetting

Here are some things one might care about:

1. what happens to your physical body
2. the access to working physical bodies of cognitive algorithms, across all possible universes, that are within some reference class containing the cognitive algorithm implemented by your physical body
3. ... etc, etc...
4. what happens to the physical body selected by the following process:
   a. start with your physical body
   b. go forward to some later time selected by the cognitive algorithm implemented by your physical body, allowing (or causing) the knowledge possessed by the cognitive algorithm implemented by your physical body to change in the interim
   c. at that later time, randomly sample from all the physical bodies, among all universes, that implement cognitive algorithms having the same knowledge as the cognitive algorithm implemented by your physical body at that later time
   d. (optionally) return to step b, but with the physical body whose changes of cognitive algorithm are tracked and whose decisions are used being the new physical body selected from step c
   e. stop whenever the cognitive algorithm implemented by the physical body selected in some step decides to stop.

For 1, 2, and I expect for the vast majority of possibilities for 3, your procedure will not work. It will work for 4, which is apparently what you care about.

Terminal values are arbitrary, so that's entirely valid. However, 4 is not something that seems, to me, like a particularly privileged or "rational" thing to care about.

jmh on Hell is wasted on the evil

If I'm reading this correctly, then generally we're seeing a rather flat payoff curve over most "do good opportunities" and the rare max should stand out like a sore thumb when taking a good look. So those really should be things do-gooders will jump on quickly. (Note, that doesn't mean they are done quickly or that additional assistance is not important.)

While not as obvious, it probably also means that a lot of more mundane opportunities are getting ignored. That comes from an insight offered in one of my classes from years back asking why so much clumping (think fad type stuff here) exists when the marginal utility of the consumed good is pretty much equal to all the other goods that could have been consumed. In other words, when the opportunity cost is zero why is everyone doing the same thing?

I suspect we could see something like that in the "do good" space. Therefore, taking the path not followed could be a very good thing.

yams on yams's Shortform

Folks using compute overhang to 4D chess their way into supporting actions that differentially benefit capabilities.

I'm often tempted to comment this in various threads, but it feels like a rabbit hole, it's not an easy one to convince someone of (because it's an argument they've accepted for years), and I've had relatively little success talking about this with people in person (there's some change I should make in how I'm talking about it, I think).

More broadly, I've started using quick takes to catalog random thoughts, because sometimes when I'm meeting someone for the first time, they have heard of me, and are mistaken about my beliefs, but would like to argue against their straw version. Having a public record I can point to of things I've thought feels useful for combatting this.

benito on Making a conservative case for alignment

I could believe it, but my (weak) guess is that in most settings people care about which pronoun they use far less than they care about people not being confused about who is being referred to.

christiankl on Shortform

Wikipedia currently writes "Epstein installed concealed cameras in numerous places on his properties to allegedly record sexual activity with underage girls of prominent people for criminal purposes such as blackmail."

This suggests that more than one underage girl was passed around. The fact that the others don't have the same courage as Virginia Giuffre to speak publicly about it does not mean that there wasn't a problem.

It does show that the system to suppress information about it is working well, which is also shown by the fact that the FBI did not lose that video stack as material for prosecuting all those people.

I submit that this industry in particular does not exist, or at least would be a terrible way to make money on a risk-adjusted basis compared to drug dealing.

The blackmail potential existing matters in addition to being directly paid. J. Edgar Hoover could be effectively blackmailed by the mafia into saying that there's no mafia by having photos of his homosexual activities, but these days that wouldn't be enough to blackmail anybody.

Understanding how organizations like Epstein's operate is hard because they do everything they can to avoid being well understood.

As far as industry goes, it's quite old but https://wikileaks.org/wiki/an_insight_into_child_porn is a good read about how child pornography worked two decades ago. It's not the same as child prostitution but it's a good article about ground realities.

sting on Making a conservative case for alignment

Is there literally any scene that has openly transgender people in it and does 3, 4, or 5?

If you can use "they" without problems, that sounds a lot like 4.

As for 3 and 5, not to my knowledge. Compromises like this would be more likely in settings with a mix of Liberals and Conservatives, but such places are becoming less common. Perhaps some family reunions would have similar rules or customs?

elityre on yams's Shortform

To whom are you talking?

benito on Making a conservative case for alignment

My rough take: the rationalist scene in Berkeley used to be very bad at maintaining boundaries. Basically the boundaries were "who gets invited to parties by friends". The one Berkeley community space ("REACH") was basically open-access. In recent years the Lightcone team (of which I am a part) has hosted spaces and events and put in the work to maintain actual boundaries (including getting references on people and checking out suspicion of bad behavior, but mostly just making it normal for people to have events with standards for entry) and this has substantially improved the ability for rationalist spaces to have culture that is distinct from the local Berkeley culture.

abandon on Which things were you surprised to learn are not metaphors?

I enjoy being embodied, and I'd describe what I enjoy as the sensation rather than the fact. Proprioception feels pleasant, touch (for most things one is typically likely to touch) feels pleasant, it is a joy to have limbs and to move them through space. So many joints to flex, so many muscles to tense and untense. (Of course, sometimes one feels pain, but this is thankfully the exception rather than the rule.)

gwern on Lao Mein's Shortform

I don't believe there are any details about the restructuring, so a detailed analysis is impossible. There have been a few posts by lawyers and quotes from lawyers, and it is about what you would expect: this is extremely unusual, the OA nonprofit has a clear legal responsibility to sell the for-profit for the maximum $$$ it can get or else some even more valuable other thing which assists its founding mission, it's hard to see how the economics is going to work here, and aspects of this like Altman getting equity (potentially worth billions) render any conversion extremely suspect as it's hard to see how Altman's handpicked board could ever meaningfully authorize or conduct an arms-length transaction, and so it's hard to see how this could go through without leaving a bad odor (even if it does ultimately go through because the CA AG doesn't want to try to challenge it).