LessWrong 2.0 Reader

[link] Seven lessons I didn't learn from election day
Eric Neyman (UnexpectedValues) · 2024-11-14T18:39:07.053Z · comments (33)
The case for unlearning that removes information from LLM weights
Fabien Roger (Fabien) · 2024-10-14T14:08:04.775Z · comments (15)
[link] Anthropic: Three Sketches of ASL-4 Safety Case Components
Zach Stein-Perlman · 2024-11-06T16:00:06.940Z · comments (33)
[question] What are the best arguments for/against AIs being "slightly 'nice'"?
Raemon · 2024-09-24T02:00:19.605Z · answers+comments (58)
[link] Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi (mtrazzi) · 2024-10-28T20:17:47.465Z · comments (5)
The nihilism of NeurIPS
charlieoneill (kingchucky211) · 2024-12-20T23:58:11.858Z · comments (7)
2024 Petrov Day Retrospective
Ben Pace (Benito) · 2024-09-28T21:30:14.952Z · comments (25)
[link] Sabotage Evaluations for Frontier Models
David Duvenaud (david-duvenaud) · 2024-10-18T22:33:14.320Z · comments (55)
Science advances one funeral at a time
Cameron Berg (cameron-berg) · 2024-11-01T23:06:19.381Z · comments (9)
2024 Unofficial LessWrong Census/Survey
Screwtape · 2024-12-02T05:30:53.019Z · comments (42)
Catastrophic sabotage as a major threat model for human-level AI systems
evhub · 2024-10-22T20:57:11.395Z · comments (11)
LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (45)
Zvi’s Thoughts on His 2nd Round of SFF
Zvi · 2024-11-20T13:40:08.092Z · comments (2)
A very strange probability paradox
notfnofn · 2024-11-22T14:01:36.587Z · comments (26)
[question] What are the strongest arguments for very short timelines?
Kaj_Sotala · 2024-12-23T09:38:56.905Z · answers+comments (64)
Anvil Problems
Screwtape · 2024-11-13T22:57:41.974Z · comments (13)
Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (44)
[Intuitive self-models] 1. Preliminaries
Steven Byrnes (steve2152) · 2024-09-19T13:45:27.976Z · comments (20)
(Salt) Water Gargling as an Antiviral
Elizabeth (pktechgirl) · 2024-11-22T18:00:02.765Z · comments (6)
[link] Self-Help Corner: Loop Detection
adamShimi · 2024-10-02T08:33:23.487Z · comments (6)
Parable of the vanilla ice cream curse (and how it would prevent a car from starting!)
Mati_Roy (MathieuRoy) · 2024-12-08T06:57:45.783Z · comments (21)
A breakdown of AI capability levels focused on AI R&D labor acceleration
ryan_greenblatt · 2024-12-22T20:56:00.298Z · comments (2)
Research update: Towards a Law of Iterated Expectations for Heuristic Estimators
Eric Neyman (UnexpectedValues) · 2024-10-07T19:29:29.033Z · comments (2)
[link] Should you be worried about H5N1?
gw · 2024-12-05T21:11:06.996Z · comments (2)
There is a globe in your LLM
jacob_drori (jacobcd52) · 2024-10-08T00:43:40.300Z · comments (4)
GPT-o1
Zvi · 2024-09-16T13:40:06.236Z · comments (34)
Circling as practice for “just be yourself”
Kaj_Sotala · 2024-12-16T07:40:04.482Z · comments (5)
5 homegrown EA projects, seeking small donors
Austin Chen (austin-chen) · 2024-10-28T23:24:25.745Z · comments (4)
Self-prediction acts as an emergent regularizer
Cameron Berg (cameron-berg) · 2024-10-23T22:27:03.664Z · comments (4)
Newsom Vetoes SB 1047
Zvi · 2024-10-01T12:20:06.127Z · comments (6)
[link] Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims
garrison · 2024-11-13T17:00:01.005Z · comments (14)
JargonBot Beta Test
Raemon · 2024-11-01T01:05:26.552Z · comments (55)
AIs Will Increasingly Fake Alignment
Zvi · 2024-12-24T13:00:07.770Z · comments (0)
OpenAI o1, Llama 4, and AlphaZero of LLMs
Vladimir_Nesov · 2024-09-14T21:27:41.241Z · comments (25)
Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-12-03T21:19:42.333Z · comments (7)
[question] What are the good rationality films?
Ben Pace (Benito) · 2024-11-20T06:04:56.757Z · answers+comments (53)
AI #83: The Mask Comes Off
Zvi · 2024-09-26T12:00:08.689Z · comments (20)
AI #92: Behind the Curve
Zvi · 2024-11-28T14:40:05.448Z · comments (7)
Values Are Real Like Harry Potter
johnswentworth · 2024-10-09T23:42:24.724Z · comments (17)
Remap your caps lock key
bilalchughtai (beelal) · 2024-12-15T14:03:33.623Z · comments (17)
[Intuitive self-models] 2. Conscious Awareness
Steven Byrnes (steve2152) · 2024-09-25T13:29:02.820Z · comments (48)
How to prevent collusion when using untrusted models to monitor each other
Buck · 2024-09-25T18:58:20.693Z · comments (11)
Scaffolding for "Noticing Metacognition"
Raemon · 2024-10-09T17:54:13.657Z · comments (4)
[link] Gwern Branwen interview on Dwarkesh Patel’s podcast: “How an Anonymous Researcher Predicted AI's Trajectory”
Said Achmiz (SaidAchmiz) · 2024-11-14T23:53:34.922Z · comments (0)
[link] Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more
Michael Cohn (michael-cohn) · 2024-09-15T05:27:36.691Z · comments (39)
Testing which LLM architectures can do hidden serial reasoning
Filip Sondej · 2024-12-16T13:48:34.204Z · comments (9)
Graceful Degradation
Screwtape · 2024-11-05T23:57:53.362Z · comments (8)
Should there be just one western AGI project?
rosehadshar · 2024-12-03T10:11:17.914Z · comments (72)
[link] Gwern: Why So Few Matt Levines?
kave · 2024-10-29T01:07:27.564Z · comments (10)
[link] Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety
titotal (lombertini) · 2024-09-18T13:07:40.754Z · comments (3)