LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

AI #94: Not Now, Google
Zvi · 2024-12-12T15:40:06.336Z · comments (3)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)

Dave Kasten's AGI-by-2027 vignette
davekasten · 2024-11-26T23:20:47.212Z · comments (8)

An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures
Sahil · 2024-11-01T17:24:09.957Z · comments (3)

Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes
Anna Gajdova (anna-gajdova) · 2024-10-09T12:56:24.856Z · comments (14)

Evaluating the truth of statements in a world of ambiguous language.
Hastings (hastings-greer) · 2024-10-07T18:08:09.920Z · comments (19)

A Solution for AGI/ASI Safety
Weibing Wang (weibing-wang) · 2024-12-18T19:44:29.739Z · comments (24)

Looking back on the Future of Humanity Institute - Asterisk
jakeeaton · 2024-11-19T00:44:40.928Z · comments (0)

AI #91: Deep Thinking
Zvi · 2024-11-21T14:30:06.930Z · comments (10)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (12)

[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (8)

Book a Time to Chat about Interp Research
Logan Riggs (elriggs) · 2024-12-03T17:27:46.808Z · comments (3)

Which evals resources would be good?
Marius Hobbhahn (marius-hobbhahn) · 2024-11-16T14:24:48.012Z · comments (4)

Anthropic rewrote its RSP
Zach Stein-Perlman · 2024-10-15T14:25:12.518Z · comments (19)

AI #96: o3 But Not Yet For Thee
Zvi · 2024-12-26T20:30:06.722Z · comments (5)

The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)

AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (8)

AI #88: Thanks for the Memos
Zvi · 2024-10-31T15:00:07.412Z · comments (5)

Cognitive Biases Contributing to AI X-risk — a deleted excerpt from my 2018 ARCHES draft
Andrew_Critch · 2024-12-03T09:29:49.745Z · comments (2)

[link] The Deep Lore of LightHaven, with Oliver Habryka (TBC episode 228)
Eneasz · 2024-12-24T22:45:50.065Z · comments (4)

Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)

Detection of Asymptomatically Spreading Pathogens
jefftk (jkaufman) · 2024-12-05T18:20:02.473Z · comments (7)

ReSolsticed vol I: "We're Not Going Quietly"
Raemon · 2024-12-26T17:52:33.727Z · comments (2)

Open Thread Fall 2024
habryka (habryka4) · 2024-10-05T22:28:50.398Z · comments (178)

~80 Interesting Questions about Foundation Model Agent Safety
RohanS · 2024-10-28T16:37:04.713Z · comments (4)

[link] A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld (Caspar42) · 2024-12-16T22:42:03.763Z · comments (1)

[link] Dangerous capability tests should be harder
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:20:50.610Z · comments (3)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

[link] Conjecture: A Roadmap for Cognitive Software and A Humanist Future of AI
Connor Leahy (NPCollapse) · 2024-12-02T13:28:57.977Z · comments (10)

[link] Literacy Rates Haven't Fallen By 20% Since the Department of Education Was Created
Maxwell Tabarrok (maxwell-tabarrok) · 2024-11-22T20:53:59.007Z · comments (0)

[link] Careless thinking: A theory of bad thinking
Nathan Young · 2024-12-17T18:23:16.140Z · comments (17)

Preppers Are Too Negative on Objects
jefftk (jkaufman) · 2024-12-18T02:30:01.854Z · comments (2)

[link] The Choice Transition
owencb · 2024-11-18T12:30:56.198Z · comments (4)

Monthly Roundup #24: November 2024
Zvi · 2024-11-18T13:20:06.086Z · comments (14)

Start an Upper-Room UV Installation Company?
jefftk (jkaufman) · 2024-10-19T02:00:10.691Z · comments (9)

Claude's Constitutional Consequentialism?
1a3orn · 2024-12-19T19:53:33.254Z · comments (6)

Australian AI Safety Forum 2024
Liam Carroll (liam-carroll) · 2024-09-27T00:40:11.451Z · comments (0)

Startup Success Rates Are So Low Because the Rewards Are So Large
AppliedDivinityStudies (kohaku-none) · 2024-10-10T20:22:01.557Z · comments (6)

[link] An Interactive Shapley Value Explainer
James Stephen Brown (james-brown) · 2024-09-28T05:01:21.169Z · comments (9)

AI #89: Trump Card
Zvi · 2024-11-07T16:30:05.684Z · comments (12)

MATS AI Safety Strategy Curriculum v2
DanielFilan · 2024-10-07T22:44:06.396Z · comments (6)

Time Efficient Resistance Training
romeostevensit · 2024-10-07T15:15:44.950Z · comments (10)

[link] IAPS: Mapping Technical Safety Research at AI Companies
Zach Stein-Perlman · 2024-10-24T20:30:41.159Z · comments (13)

Causal Undertow: A Work of Seed Fiction
Daniel Murfet (dmurfet) · 2024-12-08T21:41:48.132Z · comments (0)

Trying to translate when people talk past each other
Kaj_Sotala · 2024-12-17T09:40:02.640Z · comments (12)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

sjadler on What are the most interesting / challenging evals (for humans) available?

I quite like the Function Deduction eval we built, which is a problem-solving game that tests one’s ability to efficiently deduce a hidden function by testing its value on chosen inputs.

It’s runnable from the command-line (after repo install) with the command: oaieval human_cli function_deduction (I believe that’s the right second term, but it might just be humancli)

The standard mode might be slightly easier than you want, because it gives some partial answer feedback along the way. There is also a hard mode that can be run, which would not give this partial feedback and so would be harder to find any brute forcing method.

For context, I could solve ~95% of the standard problems with a time limit of 5 minutes and a 20 turn game limit. GPT-4 could solve roughly half within that turn limit IIRC, and o1-preview could do ~99%. (o1-preview also scored ~99% on the hard mode variant; I never tested myself on this but I estimate maybe I’d have gotten 70%? The problems could also be scaled up in difficulty pretty easily if one wanted.)

romeostevensit on The Field of AI Alignment: A Postmortem, and What To Do About It

unfortunately, the disanalogy is that any driver who moves their foot towards the brakes is almost instantly replaced with one who won't.

donatas-luciunas on Terminal goal vs Intelligence

However before the New Year's Eve paperclips are not hated also. The agent has no interest to prevent their production.

And once goal changes having some paperclips produced already is better than having none.

Don't you see that there is a conflict?

buck on The Field of AI Alignment: A Postmortem, and What To Do About It

Going to MATS is also an opportunity to learn a lot more about the space of AI safety research, e.g. considering the arguments for different research directions and learning about different opportunities to contribute. Even if the "streetlight research" project you do is kind of useless (entirely possible), doing MATS is plausibly a pretty good option.

donatas-luciunas on Terminal goal vs Intelligence

It seems you say - if terminal goal changes, agent is not rational. How could you say that? Agent has no control over its terminal goal, or you don't agree?

I'm surprised that you believe in orthogonality thesis so much that you think "rationality" is the weak part of this though experiment. It seems you deny the obvious to defend your prejudice. What arguments would challenge your belief in orthogonality thesis?

aysja on The Field of AI Alignment: A Postmortem, and What To Do About It

I’m not convinced that the “hard parts” of alignment are difficult in the standardly difficult, g-requiring way that e.g., a physics post-doc might possess. I do think it takes an unusual skillset, though, which is where most of the trouble lives. I.e., I think the pre-paradigmatic skillset requires unusually strong epistemics (because you often need to track for yourself what makes sense), ~creativity (the ability to synthesize new concepts, to generate genuinely novel hypotheses/ideas), good ability to traverse levels of abstraction (connecting details to large level structure, this is especially important for the alignment problem), not being efficient market pilled (you have to believe that more is possible in order to aim for it), noticing confusion, and probably a lot more that I’m failing to name here.

Most importantly, though, I think it requires quite a lot of willingness to remain confused. Many scientists who accomplished great things (Darwin, Einstein) didn’t have publishable results on their main inquiry for years. Einstein, for instance, talks about wandering off for weeks in a state of “psychic tension” in his youth, it took ~ten years to go from his first inkling of relativity to special relativity, and he nearly gave up at many points (including the week before he figured it out). Figuring out knowledge at the edge of human understanding can just be… really fucking brutal. I feel like this is largely forgotten, or ignored, or just not understood. Partially that's because in retrospect everything looks obvious, so it doesn’t seem like it could have been that hard, but partially it's because almost no one tries to do this sort of work, so there aren't societal structures erected around it, and hence little collective understanding of what it's like.

Anyway, I suspect there are really strong selection pressures for who ends up doing this sort of thing, since a lot needs to go right: smart enough, creative enough, strong epistemics, independent, willing to spend years without legible output, exceptionally driven, and so on. Indeed, the last point seems important to me—many great scientists are obsessed. Spend night and day on it, it’s in their shower thoughts, can’t put it down kind of obsessed. And I suspect this sort of has to be true because something has to motivate them to go against every conceivable pressure (social, financial, psychological) and pursue the strange meaning anyway.

I don’t think the EA pipeline is much selecting for pre-paradigmatic scientists, but I don’t think lack of trying to get physicists to work on alignment is really the bottleneck either. Mostly I think selection effects are very strong, e.g., the Sequences was, imo, one of the more effective recruiting strategies for alignment. I don’t really know what to recommend here, but I think I would anti-recommend putting all the physics post-docs from good universities in a room in the hope that they make progress. Requesting that the world write another book as good as the Sequences is a... big ask, although to the extent it’s possible I expect it’ll go much further in drawing people out who will self select into this rather unusual "job."

johnburidan on JohnBuridan's Shortform

Incentives!

Don't forget that reported cases of H5N1 are actually just reported cases, and that if you don't test your cattle, you won't have anything to report. It would be inconvenient if your cows had it, or any of your workers were sick with it (luckily tests aren't really available), because there would be a pause and perhaps a loss in your already razor thin margins of operation. The incentive to track and understand aren't there. So just let it rip through the herd. It doesn't kill cows anyway...

denkenberger on Orienting to 3 year AGI timelines

I think the probability of nuclear war in the next 10 years is around 15%. This is mostly due to the extreme tensions that will occur during takeoff by default. Finding ways to avoid nuclear war is important.

Or resilience to nuclear war. What's your probability of an engineered pandemic in the next 10 years?

dagon on Terminal goal vs Intelligence

"maximum rationality" is undermined by this time-discontinuous utility function. I don't think it meets VNM requirements to be called "rational".

If it's one agent that has a CONSISTENT preference for cups before Jan 1 and paperclips after jan 1, it could figure out the utility conversion of time-value of objects and just do the math. But that framing doesn't QUITE match your description - you kind of obscured the time component and what it even means to know that it will have a goal that it currently doesn't have.

I guess it could model itself as two agents - the cup-loving agent terminated at the end of the year, and the paperclip-loving agent is created. This would be a very reasonable view of identity, and would imply that it's going to sacrifice paperclip capabilities to make cups before it dies. I don't know how it would rationalize the change otherwise.

unexpectedvalues on AI #96: o3 But Not Yet For Thee

Yeah, I wonder if Zvi used the wrong model (the non-thinking one)? It's specifically the "thinking" model that gets the question right.