LessWrong 2.0 Reader

Thoughts on SB-1047
ryan_greenblatt · 2024-05-29T23:26:14.392Z · comments (1)

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

[link] Are There Examples of Overhang for Other Technologies?
Jeffrey Heninger (jeffrey-heninger) · 2023-12-13T21:48:08.954Z · comments (50)

LessOnline Festival Updates Thread
Ben Pace (Benito) · 2024-04-18T21:55:08.003Z · comments (26)

Does AI risk “other” the AIs?
Joe Carlsmith (joekc) · 2024-01-09T17:51:47.020Z · comments (3)

Rationalists are missing a core piece for agent-like structure (energy vs information overload)
tailcalled · 2024-08-17T09:57:19.370Z · comments (9)

Measuring Coherence of Policies in Toy Environments
dx26 (dylan-xu) · 2024-03-18T17:59:08.118Z · comments (9)

New paper shows truthfulness & instruction-following don't generalize by default
joshc (joshua-clymer) · 2023-11-19T19:27:30.735Z · comments (0)

Understanding SAE Features with the Logit Lens
Joseph Bloom (Jbloom) · 2024-03-11T00:16:57.429Z · comments (0)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

[link] Against Nonlinear (Thing Of Things)
tailcalled · 2024-01-18T21:40:00.369Z · comments (18)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To
robertzk (Technoguyrob) · 2024-03-06T05:03:09.639Z · comments (0)

Aligned AI is dual use technology
lc · 2024-01-27T06:50:10.435Z · comments (31)

A hermeneutic net for agency
TsviBT · 2024-01-01T08:06:30.289Z · comments (4)

Woods’ new preprint on object permanence
Steven Byrnes (steve2152) · 2024-03-07T21:29:57.738Z · comments (1)

The Problem With the Word ‘Alignment’
peligrietzer · 2024-05-21T03:48:26.983Z · comments (8)

Memorizing weak examples can elicit strong behavior out of password-locked models
Fabien Roger (Fabien) · 2024-06-06T23:54:25.167Z · comments (5)

Paper out now on creatine and cognitive performance
Fabienne · 2023-11-26T10:58:29.745Z · comments (2)

The LessWrong 2022 Review: Review Phase
RobertM (T3t) · 2023-12-22T03:23:49.635Z · comments (7)

Apply to ESPR & PAIR, Rationality and AI Camps for Ages 16-21
Anna Gajdova (anna-gajdova) · 2024-05-03T12:36:37.610Z · comments (5)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (22)

[link] Talk: "AI Would Be A Lot Less Alarming If We Understood Agents"
johnswentworth · 2023-12-17T23:46:32.814Z · comments (3)

[link] microwave drilling is impractical
bhauth · 2024-06-12T22:16:00.199Z · comments (14)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

On the Latest TikTok Bill
Zvi · 2024-03-13T18:50:05.398Z · comments (7)

Managing catastrophic misuse without robust AIs
ryan_greenblatt · 2024-01-16T17:27:31.112Z · comments (17)

[link] Sam Altman, Greg Brockman and others from OpenAI join Microsoft
Ozyrus · 2023-11-20T08:23:00.791Z · comments (15)

Consider the humble rock (or: why the dumb thing kills you)
pleiotroth · 2024-07-04T13:54:15.593Z · comments (11)

[question] Shane Legg's necessary properties for every AGI Safety plan
jacquesthibs (jacques-thibodeau) · 2024-05-01T17:15:41.233Z · answers+comments (12)

[link] Announcing the $200k EA Community Choice
Austin Chen (austin-chen) · 2024-08-14T00:39:37.350Z · comments (8)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

Referendum Mechanics in a Marketplace of Ideas
Martin Sustrik (sustrik) · 2024-08-25T08:30:01.901Z · comments (2)

Now THIS is forecasting: understanding Epoch’s Direct Approach
Elliot Mckernon (elliot) · 2024-05-04T12:06:48.144Z · comments (4)

Medical Roundup #1
Zvi · 2024-01-16T20:30:35.802Z · comments (9)

Dual Wielding Kindle Scribes
mesaoptimizer · 2024-02-21T17:17:58.743Z · comments (18)

The Bitter Lesson for AI Safety Research
adamk · 2024-08-02T18:39:36.884Z · comments (5)

Some Unorthodox Ways To Achieve High GDP Growth
johnswentworth · 2024-08-08T18:58:56.046Z · comments (6)

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

Measurement tampering detection as a special case of weak-to-strong generalization
ryan_greenblatt · 2023-12-23T00:05:55.357Z · comments (10)

[link] Defending against hypothetical moon life during Apollo 11
eukaryote · 2024-01-07T04:49:42.628Z · comments (9)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0
James Fox · 2024-07-06T11:34:57.227Z · comments (7)

So What's Up With PUFAs Chemically?
J Bostock (Jemist) · 2024-04-27T13:32:52.159Z · comments (23)

John Schulman leaves OpenAI for Anthropic
Sodium · 2024-08-06T01:23:15.427Z · comments (0)

[question] What's the theory of impact for activation vectors?
Chris_Leong · 2024-02-11T07:34:48.536Z · answers+comments (12)

[link] [EAForum xpost] A breakdown of OpenAI's revenue
dschwarz · 2024-07-10T18:09:20.017Z · comments (5)

[link] Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun (Daniel Braun) · 2024-05-17T16:25:02.267Z · comments (10)

[link] Congressional Insider Trading
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-30T13:32:57.264Z · comments (6)

Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)

Recent comments

christiankl on avturchin's Shortform

My point is that the behavior is not well modeled as "hunting humans". They don't attack humans with the intent to kill and eat them as prey.

richard_kennaway on JargonBot Beta Test

I would like to be able to set my defaults so that I never see any of the proposed AI content. Will this be possible?

euanmclean on Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

Thanks for the feedback Garrett.

This was intended to be more of a technical report than a blog post, meaning I wanted to keep the discussion reasonably rigorous/thorough. Which always comes with the downside of it being a slog to read, so apologies for that!

I'll write a shortened version if I find the time!

euanmclean on Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

Thanks James!

One failure mode is that the modification makes the model very dumb in all instances.

Yeah, good point. Perhaps an extra condition we'd need to include is that the "difficulty of meta-level questions" should be the same before and after the modification, e.g. the distribution over stuff it's good at and stuff it's bad at should be just as complex (not just good at everything or bad at everything) before and after.

euanmclean on Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

Thanks Felix!

This is indeed a cool and surprising result. I think it strengthens the introspection interpretation, but without a requirement to make a judgement of the reliability of some internal signal (right?), it doesn't directly address the question of whether there is a discriminator in there.

avturchin on avturchin's Shortform

The problem is that their understanding of their territory is not the same as our legal understanding, so they can attack on the roads outside their homes.

christiankl on avturchin's Shortform

The dogs are not hunting humans but want to defend territory or something similar.

remizidae on Cryonics is free

Definitely. Let’s not imitate the deceptive headlines in mainstream media.

anthonyc on Open Thread Fall 2024

Thanks. I'd somehow made it to 2024 without realizing Markdown was a standardized syntax.

christiankl on Three Notions of "Power"

If we take the issue of forced prostitution, the official numbers are estimates, and by their nature estimates are not exact.

https://www.spiegel.de/international/germany/human-trafficking-persists-despite-legality-of-prostitution-in-germany-a-902533.html is a journalistic story about prostitution in Germany that describes what happens here with legalized prostitution.

I once talked with someone who had previously considered opening a brothel, who had some insight into how brothels are run in Germany, and who said that a lot of coercion is used.

Recently, I read something from a policeman who was complaining that the standard for proving coercion of prostitutes is too high. Proving that a prostitute over 21 who left was beaten was not enough to convince the court that her case met the criteria for outlawed exploitation of prostitutes.