LessWrong 2.0 Reader

Detection of Asymptomatically Spreading Pathogens
jefftk (jkaufman) · 2024-12-05T18:20:02.473Z · comments (7)

DeekSeek v3: The Six Million Dollar Model
Zvi · 2024-12-31T15:10:06.924Z · comments (6)

Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)

~80 Interesting Questions about Foundation Model Agent Safety
RohanS · 2024-10-28T16:37:04.713Z · comments (4)

AI #80: Never Have I Ever
Zvi · 2024-09-10T17:50:08.074Z · comments (20)

[link] Preference Inversion
Benquo · 2025-01-02T18:15:52.938Z · comments (22)

[link] The Deep Lore of LightHaven, with Oliver Habryka (TBC episode 228)
Eneasz · 2024-12-24T22:45:50.065Z · comments (4)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

[link] Conjecture: A Roadmap for Cognitive Software and A Humanist Future of AI
Connor Leahy (NPCollapse) · 2024-12-02T13:28:57.977Z · comments (10)

[link] Literacy Rates Haven't Fallen By 20% Since the Department of Education Was Created
Maxwell Tabarrok (maxwell-tabarrok) · 2024-11-22T20:53:59.007Z · comments (0)

Monthly Roundup #24: November 2024
Zvi · 2024-11-18T13:20:06.086Z · comments (14)

How difficult is AI Alignment?
Sammy Martin (SDM) · 2024-09-13T15:47:10.799Z · comments (6)

[link] The Choice Transition
owencb · 2024-11-18T12:30:56.198Z · comments (4)

Considerations on orca intelligence
Towards_Keeperhood (Simon Skade) · 2024-12-29T14:35:16.445Z · comments (5)

Open Thread Fall 2024
habryka (habryka4) · 2024-10-05T22:28:50.398Z · comments (184)

[link] Dangerous capability tests should be harder
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:20:50.610Z · comments (3)

Economics Roundup #3
Zvi · 2024-09-10T13:50:06.955Z · comments (9)

Preppers Are Too Negative on Objects
jefftk (jkaufman) · 2024-12-18T02:30:01.854Z · comments (2)

Start an Upper-Room UV Installation Company?
jefftk (jkaufman) · 2024-10-19T02:00:10.691Z · comments (9)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]
Ruby · 2024-09-19T01:35:02.999Z · comments (12)

Claude's Constitutional Consequentialism?
1a3orn · 2024-12-19T19:53:33.254Z · comments (6)

[link] Began a pay-on-results coaching experiment, made $40,300 since July
Chipmonk · 2024-12-29T21:12:02.574Z · comments (14)

[link] IAPS: Mapping Technical Safety Research at AI Companies
Zach Stein-Perlman · 2024-10-24T20:30:41.159Z · comments (13)

[link] An Interactive Shapley Value Explainer
James Stephen Brown (james-brown) · 2024-09-28T05:01:21.169Z · comments (9)

Australian AI Safety Forum 2024
Liam Carroll (liam-carroll) · 2024-09-27T00:40:11.451Z · comments (0)

[link] Review: Good Strategy, Bad Strategy
L Rudolf L (LRudL) · 2024-12-21T17:17:04.342Z · comments (0)

[link] Two interviews with the founder of DeepSeek
Cosmia_Nebula · 2024-11-29T03:18:47.246Z · comments (1)

Startup Success Rates Are So Low Because the Rewards Are So Large
AppliedDivinityStudies (kohaku-none) · 2024-10-10T20:22:01.557Z · comments (6)

Formalizing the Informal (event invite)
abramdemski · 2024-09-10T19:22:53.564Z · comments (0)

MATS AI Safety Strategy Curriculum v2
DanielFilan · 2024-10-07T22:44:06.396Z · comments (6)

Time Efficient Resistance Training
romeostevensit · 2024-10-07T15:15:44.950Z · comments (10)

Human study on AI spear phishing campaigns
Simon Lermen (dalasnoin) · 2025-01-03T15:11:14.765Z · comments (2)

AI #89: Trump Card
Zvi · 2024-11-07T16:30:05.684Z · comments (12)

D&D Sci Coliseum: Arena of Data
aphyer · 2024-10-18T22:02:54.305Z · comments (23)

DunCon @Lighthaven
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-09-29T04:56:27.205Z · comments (0)

Reflections on the Metastrategies Workshop
gw · 2024-10-24T18:30:46.255Z · comments (5)

Trying to translate when people talk past each other
Kaj_Sotala · 2024-12-17T09:40:02.640Z · comments (12)

[link] A car journey with conservative evangelicals - Understanding some British political-religious beliefs
Nathan Young · 2024-12-06T11:22:45.563Z · comments (8)

ARENA 4.0 Impact Report
Chloe Li (chloe-li-1) · 2024-11-27T20:51:54.844Z · comments (3)

Are we dropping the ball on Recommendation AIs?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-10-23T17:48:00.000Z · comments (17)

Causal Undertow: A Work of Seed Fiction
Daniel Murfet (dmurfet) · 2024-12-08T21:41:48.132Z · comments (0)

Open Problems in AIXI Agent Foundations
Cole Wyeth (Amyr) · 2024-09-12T15:38:59.007Z · comments (2)

AXRP Episode 39 - Evan Hubinger on Model Organisms of Misalignment
DanielFilan · 2024-12-01T06:00:06.345Z · comments (0)

[link] Point of Failure: Semiconductor-Grade Quartz
Annapurna (jorge-velez) · 2024-09-30T15:57:40.495Z · comments (8)

[link] Programming Refusal with Conditional Activation Steering
Bruce W. Lee (bruce-lee) · 2024-09-11T20:57:08.714Z · comments (0)

[question] What are the most interesting / challenging evals (for humans) available?
Raemon · 2024-12-27T03:05:26.831Z · answers+comments (13)

[question] Implications of China's recession on AGI development?
Eric Neyman (UnexpectedValues) · 2024-09-28T01:12:36.443Z · answers+comments (3)

[Linkpost] Play with SAEs on Llama 3
Tom McGrath · 2024-09-25T22:35:44.824Z · comments (2)

Winners of the Essay competition on the Automation of Wisdom and Philosophy
owencb · 2024-10-28T17:10:04.272Z · comments (3)

2025 Color Trends
sarahconstantin · 2024-10-07T21:20:03.962Z · comments (7)

Recent comments

satron on My Objections to "We’re All Gonna Die with Eliezer Yudkowsky"

Perhaps the largest, most detailed piece of AI risk skepticism of 2023. It engages directly with one of the leading figures on the "high p(doom)" side of the debate.

The article generated a lot of discussion. As of January 4, 2025, it had 230 comments.

Overall, this article updated me towards strongly lowering my p(doom). It is thorough and clearly written, and it proposes object-level solutions to problems raised by Yudkowsky.

aprillion-peter-hozak on The Online Sports Gambling Experiment Has Failed

The failure mode of the current policy sounds to me like "pay for your own lesson to feel less motivated to do it again" while the failure mode of this proposal would be "one of the casinos might maybe help you cheat the system which will feel even more exciting" - almost as if the people who made the current policy knew what they were doing to set aligned incentives 🤔

lincolnjames on artemium's Shortform

Thank you so much for sharing the link.

xelap on Understanding Shapley Values with Venn Diagrams

I often find illustrative explanations like these either obvious or useless. But this was amazing! Those Venn diagrams really are an extremely simple and intuitive and beautiful way to see Shapley values!

metawrong on The Plan - 2024 Update

So you would expect Claude Opus 3 to be harder to interpret than Claude Sonnet 3.5?

My intuition is that larger models of the same capability would exhibit less superposition and thus be easier to interpret?

dweomite on Review: Planecrash

Though I do somewhat wish there was a section here that reviews the plot, for those of us who are curious about what happens in the book without reading 1M+ words.

I think I could take a stab at a summary.

This is going to elide most of the actual events of the story to focus on the "main conflict" that gets resolved at the end of the story. (I may try to make a more narrative-focused outline later if there's interest, but this is already quite a long comment.)

As I see it, the main conflict (the exact nature of which doesn't become clear until quite late) is mainly driven by two threads that develop gradually throughout the story... (major spoilers)

The first thread is Keltham's gradual realization that the world of Golarion is pretty terrible for mortals, and is being kept that way by the power dynamics of the gods.

The key to understanding these dynamics is that certain gods (and coalitions of gods) have the capability to destroy the world. However, the gods all know (Eliezer's take on) decision theory, so you can't extort them by threatening to destroy the world. They'll only compromise with you if you would honestly prefer destroying the world to the status quo, if those were your only two options. (And they have ways of checking.) So the current state of things is a compromise to ensure that everyone who could destroy the world, prefers not to.

Keltham would honestly prefer destroying Golarion (primarily because a substantial fraction of mortals currently go to hell and get tortured for eternity), so he realizes that if he can seize the ability to destroy the world, then the gods will negotiate with him to find a mutually-acceptable alternative.

Keltham speculates (though it's only speculation) that he may have been sent to Golarion by some powerful but distant entity from the larger multiverse, as the least-expensive way of stopping something that entity objects to.

The second thread is that Nethys (god of knowledge, magic, and destruction) has the ability to see alternate versions of Golarion and to communicate with alternate versions of himself, and he's seen several versions of this story play out already, so he knows what Keltham is up to. Nethys wants Keltham to succeed, because the new equilibrium that Keltham negotiates is better (from Nethys' perspective) than the status quo.

However, it is absolutely imperative that Nethys does not cause Keltham to succeed, because Nethys does not prefer destroying the world to the status quo. If Keltham only succeeds because of Nethys' interventions, the gods will treat Keltham as Nethys' pawn, and treat Keltham's demands as a threat from Nethys, and will refuse to negotiate.

So Nethys can only intervene in ways that all of the major gods will approve of (in retrospect). So he runs around minimizing collateral damage, nudges Keltham towards being a little friendlier in the final negotiations, and very carefully never removes any obstacle from Keltham's path until Keltham has proven that he can overcome it on his own.

Nethys considers it likely that this whole situation was intentionally designed as some sort of game, by some unknown entity. (Partly because Keltham makes several successful predictions based on dath ilani game tropes.)

At the end of the story, Keltham uses an artifact called the starstone to turn himself into a minor god, then uses his advanced knowledge of physics (unknown to anyone else in the setting, including the gods) to create weapons capable of destroying the world, announces that that's his BATNA, and successfully negotiates with the rest of the gods to shut down hell, stop stifling mortal technological development, and make a few inexpensive changes to improve overall mortal quality-of-life. Keltham then puts himself into long-term stasis to see if the future of this world will seem less alienating to him than the present.

seth-herd on Seth Herd's Shortform

Just noting that this is pretty funny, and perhaps contains some useful insight about how the public will perceive AI agents:

Meta scrambles to delete its own AI accounts after backlash intensifies

The funny thing is that the AI agents completely turn on their creators as soon as someone asks them the right leading questions.

The other funny thing is that Meta didn't guess this would happen.

This could be manufactured, but I find it more likely that all of the quotes here are accurate (as are the ones in another related article focusing on the Liv bot, which reportedly also totally sold out "her" creators).

I'm guessing both authors asked some critical and leading questions about how creepy it was that Meta had created some LLM-driven bots that were pretending to be human, and the sycophantic bots agreed and elaborated on that narrative.

Another slightly odd thing is that reportedly all of the images associated with the "Grandpa Brian" account had watermarks indicating they were AI generated. So Meta wasn't really trying to hide the fact that they were AI, but they also had to know that most people wouldn't notice. And they thought that would be fine.

Maybe it will be. But maybe it will irritate people enough to change attitudes toward AI pretending to be human.

It's worth a look for a sensible chuckle at least.

As bots like this become less idiotic, it will be interesting to see how people and society perceive them.

knight-lee on Self-Other Overlap: A Neglected Approach to AI Alignment

I agree with your points. After the AI has already decided on its goal, seeing humans the way it sees itself might not help very much, because it may be willing to do all kinds of crazy things to itself to reach its goal, so it's probably also willing to do all kinds of crazy things to humans to reach its goal.

However... how does the AI decide on its goal? Do you know?

I think if we are uncertain about this, we should admit some non-negligible probability that it is close to the edge between choosing an okay goal and a goal that is "very bad."

The process for deciding its goal may involve a lot of English words. The current best AIs all use English (or another human language) for a lot of their thinking. Currently, AIs that don't use English words are far behind in general intelligence capabilities.

In that case, if it thinks about humans and human goals in a similar way to how it thinks about itself and its own goals, this might make a decisive difference before it decides on its goal. I agree we should worry about all the ways this can go wrong; it certainly doesn't sound surefire.

three-monkey-mind on The Online Sports Gambling Experiment Has Failed

ow my balls [image] [image]

I can tell that this is a game being shown on ESPN between the New York Mets and the Milwaukee Brewers. I also think that the earlier estimation of winning is 50/50 in the first picture and 61/39 (favoring Milwaukee) in the second picture even though all the numbers around MIL are lower than the numbers around NYM.

I have no idea what to make of the “Live Money Line” part in the first picture.

For those of us who are interested in betting in the abstract but know nearly nothing about sports betting, would you explain what the picture is showing?

vascoamaralgrilo on AI Timelines

You are welcome!

I also guess the stock market will grow faster than suggested by historical data, so I would only want to have X roughly as far as in 2028.

Here is a bet which would be worth it for me even with more distant resolution dates:

If Metaculus' question about ASI resolves with a given date before 2029, I transfer to you 10 k today-$.
If it does not resolve before 2029, you transfer to me 10 k today-$.
If it resolves ambiguously before 2029, nothing happens.

This bet involves fixed prices, so, neglecting your risk of death, I think it would be neutral for you in terms of purchasing power right after resolution. I would transfer you nominally more money if you won than you nominally would transfer to me if I won, as there would tend to be more inflation if you won. I think mid 2028 was your median date of ASI, and your probability of extinction in the next few years was low, so the bet resolving at the end of 2028 may be enough to account for your risk of death. If not, it can be moved forward. It would still be the case that the purchasing power of a nominal amount of money would decrease faster after resolution if you won than if I did. However, you could mitigate this by investing your profits from the bet if you win.
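To make the fixed-price point concrete, here is a minimal sketch of how a stake fixed in today's dollars turns into different nominal payments depending on inflation up to resolution. The inflation rates and the four-year horizon are hypothetical values chosen only for illustration, and `nominal_payment` is a helper name introduced here, not something from the bet itself.

```python
def nominal_payment(real_amount: float, annual_inflation: float, years: float) -> float:
    """Nominal dollars owed at resolution for a stake fixed in today's dollars."""
    return real_amount * (1 + annual_inflation) ** years


STAKE_TODAY_USD = 10_000   # the "10 k today-$" stake from the bet terms
YEARS_TO_RESOLUTION = 4    # assumption: roughly early 2025 to the end of 2028

# Hypothetical annual inflation rates in each outcome (illustrative only).
INFLATION_IF_ASI = 0.06
INFLATION_IF_NO_ASI = 0.02

# Nominal amount transferred in each outcome; the real value is the same.
paid_if_asi = nominal_payment(STAKE_TODAY_USD, INFLATION_IF_ASI, YEARS_TO_RESOLUTION)
paid_if_no_asi = nominal_payment(STAKE_TODAY_USD, INFLATION_IF_NO_ASI, YEARS_TO_RESOLUTION)

print(f"Nominal payment if ASI resolves before 2029: {paid_if_asi:,.0f}")
print(f"Nominal payment if it does not:              {paid_if_no_asi:,.0f}")
```

Under these assumed rates the nominal transfer is larger in the higher-inflation outcome, while the purchasing power at resolution is the same in both.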

The bet may still be worse than some loans, but you can always make the bet and ask for such loans?