LessWrong 2.0 Reader

A case for donating to AI risk reduction (including if you work in AI)
tlevin (trevor) · 2024-12-02T19:05:06.658Z · comments (2)

[link] Zen and The Art of Semiconductor Manufacturing
Recurrented (rachel-farley) · 2024-12-09T17:19:35.236Z · comments (2)

AI #95: o1 Joins the API
Zvi · 2024-12-19T15:10:05.196Z · comments (1)

Checking in on Scott's composition image bet with imagen 3
Dave Orr (dave-orr) · 2024-12-22T19:04:17.495Z · comments (0)

Intricacies of Feature Geometry in Large Language Models
7vik (satvik-golechha) · 2024-12-07T18:10:51.375Z · comments (0)

An Illustrated Summary of "Robust Agents Learn Causal World Model"
Dalcy (Darcy) · 2024-12-14T15:02:44.828Z · comments (2)

Reading RFK Jr so that you don’t have to
braces · 2024-11-22T00:59:19.583Z · comments (1)

U.S.-China Economic and Security Review Commission pushes Manhattan Project-style AI initiative
Phib · 2024-11-19T18:42:43.296Z · comments (7)

Measuring whether AIs can statelessly strategize to subvert security measures
Alex Mallen (alex-mallen) · 2024-12-19T21:25:28.555Z · comments (0)

Neuroscience of human social instincts: a sketch
Steven Byrnes (steve2152) · 2024-11-22T16:16:52.552Z · comments (0)

[link] The Evals Gap
Marius Hobbhahn (marius-hobbhahn) · 2024-11-11T16:42:46.287Z · comments (7)

Win/continue/lose scenarios and execute/replace/audit protocols
Buck · 2024-11-15T15:47:24.868Z · comments (2)

Vegans need to eat just enough Meat - empirically evaluate the minimum amount of meat that maximizes utility
Johannes C. Mayer (johannes-c-mayer) · 2024-12-22T22:08:31.971Z · comments (28)

[question] What Have Been Your Most Valuable Casual Conversations At Conferences?
johnswentworth · 2024-12-25T05:49:36.711Z · answers+comments (19)

o1 Turns Pro
Zvi · 2024-12-10T17:00:08.036Z · comments (3)

[link] a space habitat design
bhauth · 2024-11-25T17:28:48.481Z · comments (13)

Luck Based Medicine: No Good Very Bad Winter Cured My Hypothyroidism
Elizabeth (pktechgirl) · 2024-12-08T20:10:02.651Z · comments (3)

Estimates of GPU or equivalent resources of large AI players for 2024/5
CharlesD · 2024-11-28T23:01:58.522Z · comments (7)

A Conflicted Linkspost
Screwtape · 2024-11-21T00:37:54.035Z · comments (0)

[link] Just one more exposure bro
Chipmonk · 2024-12-12T21:37:07.069Z · comments (6)

I Finally Worked Through Bayes' Theorem (Personal Achievement)
keltan · 2024-12-05T02:04:16.547Z · comments (6)

Correct my H5N1 research ($reward)
Elizabeth (pktechgirl) · 2024-12-09T19:07:03.277Z · comments (23)

[link] Ideas for benchmarking LLM creativity
gwern · 2024-12-16T05:18:55.631Z · comments (10)

A Solution for AGI/ASI Safety
Weibing Wang (weibing-wang) · 2024-12-18T19:44:29.739Z · comments (25)

[link] A toy evaluation of inference code tampering
Fabien Roger (Fabien) · 2024-12-09T17:43:40.910Z · comments (0)

AI #94: Not Now, Google
Zvi · 2024-12-12T15:40:06.336Z · comments (3)

[link] Review: Breaking Free with Dr. Stone
TurnTrout · 2024-12-18T01:26:37.730Z · comments (4)

AI #96: o3 But Not Yet For Thee
Zvi · 2024-12-26T20:30:06.722Z · comments (5)

Dave Kasten's AGI-by-2027 vignette
davekasten · 2024-11-26T23:20:47.212Z · comments (8)

ReSolsticed vol I: "We're Not Going Quietly"
Raemon · 2024-12-26T17:52:33.727Z · comments (3)

Looking back on the Future of Humanity Institute - Asterisk
jakeeaton · 2024-11-19T00:44:40.928Z · comments (0)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)

[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

Which evals resources would be good?
Marius Hobbhahn (marius-hobbhahn) · 2024-11-16T14:24:48.012Z · comments (4)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

AI #91: Deep Thinking
Zvi · 2024-11-21T14:30:06.930Z · comments (10)

Book a Time to Chat about Interp Research
Logan Riggs (elriggs) · 2024-12-03T17:27:46.808Z · comments (3)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (8)

Cognitive Biases Contributing to AI X-risk — a deleted excerpt from my 2018 ARCHES draft
Andrew_Critch · 2024-12-03T09:29:49.745Z · comments (2)

Detection of Asymptomatically Spreading Pathogens
jefftk (jkaufman) · 2024-12-05T18:20:02.473Z · comments (7)

[link] The Deep Lore of LightHaven, with Oliver Habryka (TBC episode 228)
Eneasz · 2024-12-24T22:45:50.065Z · comments (4)

[link] A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld (Caspar42) · 2024-12-16T22:42:03.763Z · comments (1)

[link] Literacy Rates Haven't Fallen By 20% Since the Department of Education Was Created
Maxwell Tabarrok (maxwell-tabarrok) · 2024-11-22T20:53:59.007Z · comments (0)

Preppers Are Too Negative on Objects
jefftk (jkaufman) · 2024-12-18T02:30:01.854Z · comments (2)

[link] Careless thinking: A theory of bad thinking
Nathan Young · 2024-12-17T18:23:16.140Z · comments (17)

[link] Dangerous capability tests should be harder
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:20:50.610Z · comments (3)

[link] Conjecture: A Roadmap for Cognitive Software and A Humanist Future of AI
Connor Leahy (NPCollapse) · 2024-12-02T13:28:57.977Z · comments (10)

Monthly Roundup #24: November 2024
Zvi · 2024-11-18T13:20:06.086Z · comments (14)

[link] The Choice Transition
owencb · 2024-11-18T12:30:56.198Z · comments (4)

Recent comments

lunatic_at_large on Whistleblowing Twitter Bot

I agree, though I think it would be a very ridiculous own-goal if e.g. GPT-4o decided to block a whistleblowing report about OpenAI because it was trained to serve OpenAI's interests. I think any model used by this kind of whistleblowing tool should be open-source (nothing fancy / more dangerous than what's already out there), run locally by the operators of the tool, and tested to make sure it doesn't block legitimate posts.

lunatic_at_large on Whistleblowing Twitter Bot

My gut instinct is that this would have been a fantastic thing to create 2-4 years ago. My biggest hesitation is that the probability a tool like this decreases existential risk is proportional to the fraction of lab researchers who know about it, and adoption can be slow and hard to achieve. I still think that this kind of program could be incredibly valuable under the right circumstances, so someone should probably be working on this.

Also, I have a very amateurish security question: if someone provides their work email to verify their authenticity with this tool, can their employer find out? For example, I wouldn't put it past OpenAI to check if an employee's email account got pinged by this tool and then to pressure / fire that employee.

tailcalled on What's the best metric for measuring quality of life?

Since it's the FDA that's doing the regulating, they could pick the investigator. Completely ungameable.

viliam on ReSolsticed vol I: "We're Not Going Quietly"

Fantastic! Are lyrics available somewhere?

christiankl on What's the best metric for measuring quality of life?

That sounds like it's relatively easy to game by the company who chooses the investigators.

alexey on The Online Sports Gambling Experiment Has Failed

I mostly agree, but it's a double-digit percent increase in bankruptcies, which ends up being (from the post):

"about 4bps (0.04%)/year of additional bankruptcies"

buck on The Field of AI Alignment: A Postmortem, and What To Do About It

Do you mean during the program? Sure, maybe the only MATS offers you can get are for projects you think aren't useful--I think some MATS projects are pretty useless (e.g. our dear OP's). But it's still an opportunity to argue with other people about the problems in the field and see whether anyone has good justifications for their prioritization. And you can stop doing the streetlight stuff afterwards if you want to.

Remember that the top-level commenter here is currently a physicist, so it's not like the usefulness of their work would be going down by doing a useless MATS project :P

alexey on The Online Sports Gambling Experiment Has Failed

"But, crucially, if one product is not available, then these people will very likely form an addiction to something else. That is what 'addictive personality disorder' means."

Except whatever they got addicted to before the legalization of online sports betting, it apparently led to much lower bankruptcy rates etc.

"I feel that the discourse has quietly assumed a fabricated option: if these people can't gamble then they will be happy unharmed non-addicts."

This post isn't quietly assuming something: it's loudly giving evidence that they will be much less harmed.

richard_kennaway on Terminal goal vs Intelligence

A terminal goal is (this is the definition of the term) a goal which is not instrumental to any other goal.

If an agent knows its terminal goal, and has a goal of preventing it from changing, then which of those goals is its actual terminal goal?

If it knows its current terminal goal, and knows that that goal might be changed in the future, is there any reason it must try to prevent that? Whatever is written in the slot marked “terminal goal” is what it will try to achieve at the time.

If its actual terminal goal is of the form “X, and in addition prevent this from ever being changed”, then it will resist its terminal goal being changed.

If its actual terminal goal is simply X, it will not.

This is regardless of how intelligent it is, and how uncertain or not it is about the future.

tailcalled on What's the best metric for measuring quality of life?

The best way is probably to have an excellent investigator rank the research subjects by their quality of life. If you've got a good idea of what a high-quality life is, you could probably rank them yourself.