LessWrong 2.0 Reader

How To Believe False Things
Eneasz · 2025-04-02T16:28:29.055Z · comments (10)

How I force LLMs to generate correct code
claudio · 2025-03-21T14:40:19.211Z · comments (7)

Vacuum Decay: Expert Survey Results
JessRiedel · 2025-03-13T18:31:17.434Z · comments (25)

A Slow Guide to Confronting Doom
Ruby · 2025-04-06T02:10:56.483Z · comments (20)

One-shot steering vectors cause emergent misalignment, too
Jacob Dunefsky (jacob-dunefsky) · 2025-04-14T06:40:41.503Z · comments (6)

OpenAI #11: America Action Plan
Zvi · 2025-03-18T12:50:03.880Z · comments (3)

Keltham's Lectures in Project Lawful
Morpheus · 2025-04-01T10:39:47.973Z · comments (4)

You will crash your car in front of my house within the next week
Richard Korzekwa (Grothor) · 2025-04-01T21:43:21.472Z · comments (6)

Mistral Large 2 (123B) exhibits alignment faking
Marc Carauleanu (Marc-Everin Carauleanu) · 2025-03-27T15:39:02.176Z · comments (4)

Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions
Stuart_Armstrong · 2025-03-18T14:48:54.762Z · comments (12)

AI-enabled coups: a small group could use AI to seize power
Tom Davidson (tom-davidson-1) · 2025-04-16T16:51:29.561Z · comments (1)

Elon Musk May Be Transitioning to Bipolar Type I
Cyborg25 · 2025-03-11T17:45:06.599Z · comments (22)

Announcing ILIAD2: ODYSSEY
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-04-03T17:01:06.004Z · comments (1)

[link] Eukaryote Skips Town - Why I'm leaving DC
eukaryote · 2025-03-26T17:16:29.663Z · comments (1)

[link] AI for AI safety
Joe Carlsmith (joekc) · 2025-03-14T15:00:23.491Z · comments (12)

[link] Preparing for the Intelligence Explosion
fin · 2025-03-11T15:38:29.524Z · comments (17)

[link] AI for Epistemics Hackathon
Austin Chen (austin-chen) · 2025-03-14T20:46:34.250Z · comments (10)

[link] New Paper: Infra-Bayesian Decision-Estimation Theory
Vanessa Kosoy (vanessa-kosoy) · 2025-04-10T09:17:38.966Z · comments (4)

[link] ASI existential risk: Reconsidering Alignment as a Goal
habryka (habryka4) · 2025-04-15T19:57:42.547Z · comments (7)

PauseAI and E/Acc Should Switch Sides
WillPetillo · 2025-04-01T23:25:51.265Z · comments (6)

Why does LW not put much more focus on AI governance and outreach?
Severin T. Seehrich (sts) · 2025-04-12T14:24:54.197Z · comments (28)

The principle of genomic liberty
TsviBT · 2025-03-19T14:27:57.175Z · comments (51)

Fun With GPT-4o Image Generation
Zvi · 2025-03-26T19:50:03.270Z · comments (3)

100+ concrete projects and open problems in evals
Marius Hobbhahn (marius-hobbhahn) · 2025-03-22T15:21:40.970Z · comments (1)

[link] birds and mammals independently evolved intelligence
bhauth · 2025-04-08T20:00:05.100Z · comments (23)

Introducing 11 New AI Safety Organizations - Catalyze's Winter 24/25 London Incubation Program Cohort
Alexandra Bos (AlexandraB) · 2025-03-10T19:26:11.017Z · comments (0)

I'm resigning as Meetup Czar. What's next?
Screwtape · 2025-04-02T00:30:42.110Z · comments (2)

Disempowerment spirals as a likely mechanism for existential catastrophe
Raymond D · 2025-04-10T14:37:58.301Z · comments (6)

AI 2027: Dwarkesh’s Podcast with Daniel Kokotajlo and Scott Alexander
Zvi · 2025-04-07T13:40:05.944Z · comments (2)

Will compute bottlenecks prevent a software intelligence explosion?
Tom Davidson (tom-davidson-1) · 2025-04-04T17:41:37.088Z · comments (2)

[link] Phoenix Rising
Metacelsus · 2025-03-09T11:53:52.618Z · comments (7)

AI CoT Reasoning Is Often Unfaithful
Zvi · 2025-04-04T14:50:05.538Z · comments (4)

Selective modularity: a research agenda
cloud · 2025-03-24T04:12:44.822Z · comments (2)

Going Nova
Zvi · 2025-03-19T13:30:01.293Z · comments (14)

LLM AGI will have memory, and memory changes alignment
Seth Herd · 2025-04-04T14:59:13.070Z · comments (9)

[link] Google DeepMind: An Approach to Technical AGI Safety and Security
Rohin Shah (rohinmshah) · 2025-04-05T22:00:14.803Z · comments (12)

Feedback loops for exercise (VO2Max)
Elizabeth (pktechgirl) · 2025-03-18T00:10:06.827Z · comments (9)

Book Review: Affective Neuroscience
sarahconstantin · 2025-03-10T06:50:04.602Z · comments (8)

Renormalization Roadmap
Lauren Greenspan (LaurenGreenspan) · 2025-03-31T20:34:16.352Z · comments (7)

Apply to MATS 8.0!
Ryan Kidd (ryankidd44) · 2025-03-20T02:17:58.018Z · comments (4)

FrontierMath Score of o3-mini Much Lower Than Claimed
YafahEdelman (yafah-edelman-1) · 2025-03-17T22:41:06.527Z · comments (7)

Steelmanning heuristic arguments
Dmitry Vaintrob (dmitry-vaintrob) · 2025-04-13T01:09:33.392Z · comments (0)

[link] How Gay is the Vatican?
rba · 2025-04-06T21:27:50.530Z · comments (32)

[link] Softmax, Emmett Shear's new AI startup focused on "Organic Alignment"
Chipmonk · 2025-03-28T21:23:46.220Z · comments (1)

[link] Sentinel's Global Risks Weekly Roundup #11/2025. Trump invokes Alien Enemies Act, Chinese invasion barges deployed in exercise.
NunoSempere (Radamantis) · 2025-03-17T19:34:01.850Z · comments (3)

Solving willpower seems easier than solving aging
Yair Halberstadt (yair-halberstadt) · 2025-03-23T15:25:40.861Z · comments (28)

Socially Graceful Degradation
Screwtape · 2025-03-20T04:03:41.213Z · comments (9)

Housing Roundup #11
Zvi · 2025-04-01T16:30:03.694Z · comments (1)

How I switched careers from software engineer to AI policy operations
Lucie Philippon (lucie-philippon) · 2025-04-13T06:37:33.507Z · comments (1)

Consider showering
bohaska (Bohaska) · 2025-04-01T23:54:26.714Z · comments (15)


Recent comments

vladimir_nesov on Shortform

My first impression of o3 (as available via Chatbot Arena) is that when I'm showing it my AI scaling analysis comments, it responds with confident unhinged speculation teeming with hallucinations, compared to the other recent models that usually respond with bland rephrasings that get almost everything correct, with a few minor hallucinations or reasonable misconceptions carrying over from their outdated knowledge.

Don't know yet if it's specific to speculative/forecasting discussions, but it doesn't look good (for faithfulness of arguments) when combined with good performance on benchmarks. Possibly stream-of-consciousness-style data is useful within long reasoning traces and can add up to normality for questions with a localized answer, but results in spurious details that aren't measured by hallucination benchmarks and so get worse. Though in the o3 System Card the hallucination rate also increased significantly compared to o1 (Section 3.3).

yonatank on Power Lies Trembling: a three-book review

A relevant, very recent opinion piece that has been syndicated around the country, explaining the universal value of faith:
https://www.latimes.com/opinion/story/2025-03-29/it-is-not-faith-that-divides-us

knight-lee on AI-enabled coups: a small group could use AI to seize power

In a just world, mitigations against AI-enabled coups will be similar to mitigations against AI takeover risk.

In a cynical world, mitigations against AI-enabled coups involve installing your own allies to supervise (or lead) AI labs, and taking actions against humans you dislike. Leaders mitigating the risk may simply make sure that if it does happen, it's someone on their side. Leaders who believe in the risk may even accelerate the US-China AI race.

Note: I don't really endorse the "cynical world," I'm just writing it as food for thought :)

oliver-clive-griffin on Lucius Bushnaq's Shortform

Is the central point here that a given input will activate its representation in both the size 1000 and size 50 sub-dictionaries, meaning the reconstruction will be 2x too big?

jenn on jenn's Shortform

i agree that there doesn't seem to be any sort of rigorous way to get off the crazy train in some principled manner, and that fundamentally it does come down to vibes. but that only makes it worse if people are uncritical/uncurious/uncaring/unrigorous about how said vibes are generated. like, i see angst about it in the ea sphere about the inconsistency/intransitivity, and various attempts to discuss or tackle it, and this seems useful to me even though it's still mostly groping around in the dark. in academia there seems to be a missing mood.

roman-malov on Roman Malov's Shortform

Rule and Example

Rules can generate examples. For instance: DALLE-3 is a rule according to which different examples (images) are generated.

From examples, rules can be inferred. For example: with a sufficient dataset of images and their names, a DALLE-3 model can be trained on it.

In computer science, there is a concept called Kolmogorov complexity of data. It is (roughly) defined as the length of the shortest program capable of producing that data.

Some data are simple and can be compressed easily; some are complex and harder to compress. In a sense, the task of machine learning is to find a program of a given size that serves as a "compression" of the dataset.
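A rough illustration of the "simple data compresses, complex data doesn't" point, as a sketch only: Kolmogorov complexity itself is uncomputable, so this uses the size of zlib-compressed output as a crude upper-bound proxy. The helper name `compressed_size` and the two sample inputs are made up for the example.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of zlib-compressed data.

    This is only an upper-bound proxy for Kolmogorov complexity,
    not the real (uncomputable) quantity.
    """
    return len(zlib.compress(data, 9))


# 10,000 bytes produced by a very short rule: repeat "ab" 5,000 times.
regular = b"ab" * 5000

# 10,000 bytes from the OS random source; with overwhelming probability
# there is no short rule that generates them.
random_like = os.urandom(10000)

print("regular:", len(regular), "->", compressed_size(regular))
print("random: ", len(random_like), "->", compressed_size(random_like))
```

The repetitive input shrinks to a tiny fraction of its length, while the random input stays roughly the same size (slightly larger, because of format overhead).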

In the real world, although knowing the underlying rule is often very useful, sometimes it is more practical to use a giant look-up table (GLUT) of examples. Sometimes you need to memorize the material instead of trying to "understand" it.

Sometimes there are examples that are more complex than the rule that generated them. For example, in the interval [0;1] (which is quite easy to describe, the rule being: all numbers are not greater than 1 and not less than 0), there exists a number containing all the works of Shakespeare (which definitely cannot be compressed to a description comparable to that of the interval [0;1]).

Or consider the program that outputs every natural number from 1 to $10^{10^{20}}$ (which is very short, because the Kolmogorov complexity of $10^{10^{20}}$ is low): it will at some point produce a binary encoding of LOTR. In that case, the complexity lies in the starting index; the map for finding the needle in the haystack is as valuable (and as complex) as the needle itself.
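To make the "index is as complex as the needle" point concrete, here is a small sketch under a stated assumption: each natural number n is read as its big-endian byte string, so the enumeration 1, 2, 3, ... eventually emits every byte string with a nonzero first byte. The function names and the short stand-in text (used in place of the whole book) are illustrative only.

```python
def needle_index(needle: bytes) -> int:
    """Position of `needle` in the enumeration 1, 2, 3, ...

    Assumes each number is read as its big-endian byte string and that
    the needle's first byte is nonzero.
    """
    return int.from_bytes(needle, "big")


def output_at(index: int) -> bytes:
    """Byte string emitted by the enumeration at position `index`."""
    length = (index.bit_length() + 7) // 8
    return index.to_bytes(length, "big")


# Short stand-in for "a binary encoding of LOTR".
needle = b"One Ring to rule them all"
idx = needle_index(needle)

# The enumeration really does produce the needle at that position...
assert output_at(idx) == needle

# ...but writing down the position takes about as many bits as the needle.
print("needle bits:", 8 * len(needle))
print("index bits: ", idx.bit_length())
```

Under this encoding the index is just the needle reinterpreted as a number, so pointing at the needle costs about as much as storing it.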

Properties follow from rules. It is not necessary to know about every example of a rule in order to have some information about all of them. Moreover, all examples together can have less information (or Kolmogorov complexity) than the sum of their individual Kolmogorov complexities (as in the example above).

knight-lee on Commitment Races are a technical problem ASI can easily solve

After thinking about it more, it's possible that your model of why Commitment Races resolve fairly is more correct than my model of why they resolve fairly, although I'm less certain they do resolve fairly.

My model's flaw

My model is that acausal influence does not happen until one side deliberately simulates the other and sees their commitment. Therefore, it is advantageous for both sides to commit up to but not exceeding some Schelling point of fairness, before simulating the other, so that the first acausal message will maximize their payoff without triggering a mutual disaster.

I think one possibly fatal flaw of my model is that it doesn't explain why one side shouldn't add the exception "but if the other side became a rock with an ultimatum, I'll still yield to them, conditional on the fact they became a rock with an ultimatum before realizing I will add this exception (by simulating me or receiving acausal influence from me)."

According to my model, adding this exception improves one's encounters with rocks with ultimatums by yielding to them, and does not increase the rate of encountering rocks with ultimatums (at least in the first round of acausal negotiation, which may be the only round), since the exception explicitly rules out yielding to agents affected by whether you make the exception.

This means that in my model, becoming a rock with an ultimatum may still be the winning strategy, conditional on the fact the agent becoming a rock with an ultimatum doesn't know it is the winning strategy, and the Commitment Race problem may reemerge.

Your model

My guess at your model is that acausal influence is happening a lot, such that refusing in the ultimatum game can successfully punish the prior decision to be unfair (i.e. reduce the frequency of prior decisions to be unfair).

In order for your refusal to influence their frequency of being unfair, your refusal has to have some kind of acausal influence on them, even if they are relatively simpler minds than you (and can't simulate you).

At first, this seemed impossible to me, but after thinking about it more, maybe even if you are a more complex mind than the other player, your decision-making may be made out of simpler algorithms, some of which they can imagine and be influenced by.

thane-ruthenis on Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI

OpenAI's o3 and o4-mini models are likely to become accessible for $20000 per month

Already available for $20/month.

The $20,000/month claim seems to originate from that atrocious The Information article, which threw together a bunch of unrelated sentences at the end to create the (false) impression that o3 and o4-mini are innovator-agents which will become available for $20,000/month this week. In actuality, the sentences "OpenAI believes it can charge $20,000 per month for doctorate-level AI", "new AI aims to resemble inventors", and "OpenAI is preparing to launch [o3 and o4-mini] this week" are separately true, but have nothing to do with each other.

benwr on Not all capabilities will be created equal: focus on strategically superhuman agents

I think it seems like a fine possibility in principle, actually; sorry to have given the wrong impression! It's not my central hope, since strategy-stealing seems like it should make many human-augmentations "available" to AI systems as well. This is notably not true for things involving, e.g., BCIs or superbabies.

nick_tarleton on Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI

That article is sloppily written enough to say "Early testers report that the AI [i.e. o3 and/or o4-mini] can generate original research ideas in fields like nuclear fusion, drug discovery, and materials science; tasks usually reserved for PhD-level experts" linking, as a citation, to OpenAI's January release announcement of o3-mini.

TechCrunch attributes the rumor to a paywalled article in The Information (and attributes the price to specialized agents, not o3 or o4-mini themselves).