LessWrong 2.0 Reader


Among Us: A Sandbox for Agentic Deception
7vik (satvik-golechha) · 2025-04-05T06:24:49.000Z · comments (4)
AGI Safety & Alignment @ Google DeepMind is hiring
Rohin Shah (rohinmshah) · 2025-02-17T21:11:18.970Z · comments (19)
[link] Detecting Strategic Deception Using Linear Probes
Nicholas Goldowsky-Dill (nicholas-goldowsky-dill) · 2025-02-06T15:46:53.024Z · comments (9)
Fake thinking and real thinking
Joe Carlsmith (joekc) · 2025-01-28T20:05:06.735Z · comments (11)
How do you deal w/ Super Stimuli?
Logan Riggs (elriggs) · 2025-01-14T15:14:51.552Z · comments (25)
My model of what is going on with LLMs
Cole Wyeth (Amyr) · 2025-02-13T03:43:29.447Z · comments (49)
[link] A short course on AGI safety from the GDM Alignment team
Vika · 2025-02-14T15:43:50.903Z · comments (1)
C'mon guys, Deliberate Practice is Real
Raemon · 2025-02-05T22:33:59.069Z · comments (25)
AI Control May Increase Existential Risk
Jan_Kulveit · 2025-03-11T14:30:05.972Z · comments (13)
[link] What the Headlines Miss About the Latest Decision in the Musk vs. OpenAI Lawsuit
garrison · 2025-03-06T19:49:02.145Z · comments (0)
Third-wave AI safety needs sociopolitical thinking
Richard_Ngo (ricraz) · 2025-03-27T00:55:30.548Z · comments (23)
Timaeus in 2024
Jesse Hoogland (jhoogland) · 2025-02-20T23:54:56.939Z · comments (1)
The Lizardman and the Black Hat Bobcat
Screwtape · 2025-04-06T19:02:01.238Z · comments (13)
How I talk to those above me
Maxwell Peterson (maxwell-peterson) · 2025-03-30T06:54:59.869Z · comments (13)
Show, not tell: GPT-4o is more opinionated in images than in text
Daniel Tan (dtch1997) · 2025-04-02T08:51:02.571Z · comments (41)
We probably won't just play status games with each other after AGI
Matthew Barnett (matthew-barnett) · 2025-01-15T04:56:38.330Z · comments (21)
The Rising Sea
Jesse Hoogland (jhoogland) · 2025-01-25T20:48:52.971Z · comments (2)
How training-gamers might function (and win)
Vivek Hebbar (Vivek) · 2025-04-11T21:26:18.669Z · comments (4)
Six Thoughts on AI Safety
boazbarak · 2025-01-24T22:20:50.768Z · comments (55)
[link] Towards a scale-free theory of intelligent agency
Richard_Ngo (ricraz) · 2025-03-21T01:39:42.251Z · comments (22)
[link] Elite Coordination via the Consensus of Power
Richard_Ngo (ricraz) · 2025-03-19T06:56:44.825Z · comments (15)
Dear AGI,
Nathan Young · 2025-02-18T10:48:15.030Z · comments (11)
Three Months In, Evaluating Three Rationalist Cases for Trump
Arjun Panickssery (arjun-panickssery) · 2025-04-18T08:27:27.257Z · comments (11)
Agent Foundations 2025 at CMU
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2025-01-19T23:48:22.569Z · comments (10)
Thoughts on the conservative assumptions in AI control
Buck · 2025-01-17T19:23:38.575Z · comments (5)
Tips and Code for Empirical Research Workflows
John Hughes (john-hughes) · 2025-01-20T22:31:51.498Z · comments (14)
Implications of the inference scaling paradigm for AI safety
Ryan Kidd (ryankidd44) · 2025-01-14T02:14:53.562Z · comments (70)
We should start looking for scheming "in the wild"
Marius Hobbhahn (marius-hobbhahn) · 2025-03-06T13:49:39.739Z · comments (4)
[link] Five Recent AI Tutoring Studies
Arjun Panickssery (arjun-panickssery) · 2025-01-19T03:53:47.714Z · comments (0)
[link] Anthropic releases Claude 3.7 Sonnet with extended thinking mode
LawrenceC (LawChan) · 2025-02-24T19:32:43.947Z · comments (8)
On Emergent Misalignment
Zvi · 2025-02-28T13:10:05.973Z · comments (5)
[link] Wired on: "DOGE personnel with admin access to Federal Payment System"
Raemon · 2025-02-05T21:32:11.205Z · comments (45)
How To Believe False Things
Eneasz · 2025-04-02T16:28:29.055Z · comments (10)
Training AGI in Secret would be Unsafe and Unethical
Daniel Kokotajlo (daniel-kokotajlo) · 2025-04-18T12:27:35.795Z · comments (2)
How I force LLMs to generate correct code
claudio · 2025-03-21T14:40:19.211Z · comments (7)
Voting Results for the 2023 Review
Raemon · 2025-02-06T08:00:37.461Z · comments (3)
The Risk of Gradual Disempowerment from AI
Zvi · 2025-02-05T22:10:06.979Z · comments (15)
Vacuum Decay: Expert Survey Results
JessRiedel · 2025-03-13T18:31:17.434Z · comments (26)
[link] The Manhattan Trap: Why a Race to Artificial Superintelligence is Self-Defeating
Corin Katzke (corin-katzke) · 2025-01-21T16:57:00.998Z · comments (11)
Stargate AI-1
Zvi · 2025-01-24T15:20:18.752Z · comments (1)
What goals will AIs have? A list of hypotheses
Daniel Kokotajlo (daniel-kokotajlo) · 2025-03-03T20:08:31.539Z · comments (19)
One-shot steering vectors cause emergent misalignment, too
Jacob Dunefsky (jacob-dunefsky) · 2025-04-14T06:40:41.503Z · comments (6)
A Slow Guide to Confronting Doom
Ruby · 2025-04-06T02:10:56.483Z · comments (20)
OpenAI #11: America Action Plan
Zvi · 2025-03-18T12:50:03.880Z · comments (3)
Ambiguous out-of-distribution generalization on an algorithmic task
Wilson Wu (wilson-wu) · 2025-02-13T18:24:36.160Z · comments (6)
How might we safely pass the buck to AI?
joshc (joshua-clymer) · 2025-02-19T17:48:32.249Z · comments (58)
The Mask Comes Off: A Trio of Tales
Zvi · 2025-02-14T15:30:15.372Z · comments (1)
[link] ASI existential risk: Reconsidering Alignment as a Goal
habryka (habryka4) · 2025-04-15T19:57:42.547Z · comments (14)
On the OpenAI Economic Blueprint
Zvi · 2025-01-15T14:30:06.773Z · comments (2)
Keltham's Lectures in Project Lawful
Morpheus · 2025-04-01T10:39:47.973Z · comments (4)