LessWrong 2.0 Reader

A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team
Lee Sharkey (Lee_Sharkey) · 2024-07-18T14:15:50.248Z · comments (18)

AI Alignment Metastrategy
Vanessa Kosoy (vanessa-kosoy) · 2023-12-31T12:06:11.433Z · comments (13)

[link] My Number 1 Epistemology Book Recommendation: Inventing Temperature
adamShimi · 2024-09-08T14:30:40.456Z · comments (18)

[question] Which skincare products are evidence-based?
Vanessa Kosoy (vanessa-kosoy) · 2024-05-02T15:22:12.597Z · answers+comments (47)

[link] Anthropic release Claude 3, claims >GPT-4 Performance
LawrenceC (LawChan) · 2024-03-04T18:23:54.065Z · comments (41)

The LessWrong 2022 Review
habryka (habryka4) · 2023-12-05T04:00:00.000Z · comments (43)

Talent Needs of Technical AI Safety Teams
yams (william-brewer) · 2024-05-24T00:36:40.486Z · comments (64)

Why I funded PIBBSS
Ryan Kidd (ryankidd44) · 2024-09-15T19:56:33.018Z · comments (20)

8 examples informing my pessimism on uploading without reverse engineering
Steven Byrnes (steve2152) · 2023-11-03T20:03:50.450Z · comments (12)

Anthropic's Certificate of Incorporation
Zach Stein-Perlman · 2024-06-12T13:00:30.806Z · comments (4)

BIG-Bench Canary Contamination in GPT-4
Jozdien · 2024-10-22T15:40:48.166Z · comments (12)

Current AIs Provide Nearly No Data Relevant to AGI Alignment
Thane Ruthenis · 2023-12-15T20:16:09.723Z · comments (155)

Mapping the semantic void: Strange goings-on in GPT embedding spaces
mwatkins · 2023-12-14T13:10:22.691Z · comments (31)

[link] Gender Exploration
sapphire (deluks917) · 2024-01-14T18:57:32.893Z · comments (25)

[link] introduction to cancer vaccines
bhauth · 2024-05-05T01:06:16.972Z · comments (19)

The Pareto Best and the Curse of Doom
Screwtape · 2024-02-21T23:10:01.359Z · comments (21)

Rationality Research Report: Towards 10x OODA Looping?
Raemon · 2024-02-24T21:06:38.703Z · comments (21)

How much to update on recent AI governance moves?
habryka (habryka4) · 2023-11-16T23:46:01.601Z · comments (5)

Four visions of Transformative AI success
Steven Byrnes (steve2152) · 2024-01-17T20:45:46.976Z · comments (22)

Social status part 1/2: negotiations over object-level preferences
Steven Byrnes (steve2152) · 2024-03-05T16:29:07.143Z · comments (15)

[link] Practically A Book Review: Appendix to "Nonlinear's Evidence: Debunking False and Misleading Claims" (ThingOfThings)
tailcalled · 2024-01-03T17:07:13.990Z · comments (25)

The Pearly Gates
lsusr · 2024-05-30T04:01:14.198Z · comments (6)

The Parable Of The Fallen Pendulum - Part 1
johnswentworth · 2024-03-01T00:25:00.111Z · comments (32)

Simple versus Short: Higher-order degeneracy and error-correction
Daniel Murfet (dmurfet) · 2024-03-11T07:52:46.307Z · comments (6)

The case for more ambitious language model evals
Jozdien · 2024-01-30T00:01:13.876Z · comments (30)

Introduction to French AI Policy
Lucie Philippon (lucie-philippon) · 2024-07-04T03:39:45.273Z · comments (12)

Ten arguments that AI is an existential risk
KatjaGrace · 2024-08-13T17:00:03.397Z · comments (41)

[link] A primer on the current state of longevity research
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-22T17:14:57.990Z · comments (6)

What I Would Do If I Were Working On AI Governance
johnswentworth · 2023-12-08T06:43:42.565Z · comments (32)

You should go to ML conferences
Jan_Kulveit · 2024-07-24T11:47:52.214Z · comments (13)

A Selection of Randomly Selected SAE Features
CallumMcDougall (TheMcDouglas) · 2024-04-01T09:09:49.235Z · comments (2)

Experiences and learnings from both sides of the AI safety job market
Marius Hobbhahn (marius-hobbhahn) · 2023-11-15T15:40:32.196Z · comments (4)

' petertodd'’s last stand: The final days of open GPT-3 research
mwatkins · 2024-01-22T18:47:00.710Z · comments (16)

Being nicer than Clippy
Joe Carlsmith (joekc) · 2024-01-16T19:44:23.893Z · comments (32)

Please stop using mediocre AI art in your posts
Raemon · 2024-08-25T00:13:52.890Z · comments (24)

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)
Neel Nanda (neel-nanda-1) · 2023-12-23T02:44:24.270Z · comments (8)

Attitudes about Applied Rationality
Camille Berger (Camille Berger) · 2024-02-03T14:42:22.770Z · comments (18)

[link] Please support this blog (with money)
Elizabeth (pktechgirl) · 2024-08-17T15:30:05.641Z · comments (2)

Clarifying METR's Auditing Role
Beth Barnes (beth-barnes) · 2024-05-30T18:41:56.029Z · comments (1)

The Leopold Model: Analysis and Reactions
Zvi · 2024-06-14T15:10:03.480Z · comments (19)

"AI Alignment" is a Dangerously Overloaded Term
Roko · 2023-12-15T14:34:29.850Z · comments (100)

OthelloGPT learned a bag of heuristics
jylin04 · 2024-07-02T09:12:56.377Z · comments (10)

[link] Most smart and skilled people are outside of the EA/rationalist community: an analysis
titotal (lombertini) · 2024-07-12T12:13:56.215Z · comments (36)

2023 in AI predictions
jessicata (jessica.liu.taylor) · 2024-01-01T05:23:42.514Z · comments (35)

[link] Perplexity wins my AI race
Elizabeth (pktechgirl) · 2024-08-24T19:20:10.859Z · comments (12)

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks (samuel-marks) · 2024-04-18T16:17:39.136Z · comments (10)

Stuxnet, not Skynet: Humanity's disempowerment by AI
Roko · 2023-11-04T22:23:55.428Z · comments (24)

[question] How do you feel about LessWrong these days? [Open feedback thread]
jacobjacob · 2023-12-05T20:54:42.317Z · answers+comments (281)

A bird's eye view of ARC's research
Jacob_Hilton · 2024-10-23T15:50:06.123Z · comments (12)

Picking Mentors For Research Programmes
Raymond D · 2023-11-10T13:01:14.197Z · comments (8)

Recent comments

christiankl on avturchin's Shortform

My point is that the behavior is not well modeled as "hunting humans". They don't attack humans with the intent to kill and eat as prey.

richard_kennaway on JargonBot Beta Test

I would like to be able to set my defaults so that I never see any of the proposed AI content. Will this be possible?

euanmclean on Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

Thanks for the feedback Garrett.

This was intended to be more of a technical report than a blog post, meaning I wanted to keep the discussion reasonably rigorous/thorough. Which always comes with the downside of it being a slog to read, so apologies for that!

I'll write a shortened version if I find the time!

euanmclean on Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

Thanks James!

One failure mode is that the modification makes the model very dumb in all instances.

Yea, good point. Perhaps an extra condition we'd need to include is that the "difficulty of meta-level questions" should be the same before and after the modification, e.g. the distribution over stuff it's good at and stuff it's bad at should be just as complex (not just good at everything or bad at everything) before and after.

euanmclean on Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

Thanks Felix!

This is indeed a cool and surprising result. I think it strengthens the introspection interpretation, but without a requirement to make a judgement of the reliability of some internal signal (right?), it doesn't directly address the question of whether there is a discriminator in there.

avturchin on avturchin's Shortform

The problem is that their understanding of their territory is not the same as our legal understanding, so they can attack on the roads outside their homes.

christiankl on avturchin's Shortform

The dogs are not hunting humans but want to defend territory or something similar.

remizidae on Cryonics is free

Definitely. Let’s not imitate the deceptive headlines in mainstream media.

anthonyc on Open Thread Fall 2024

Thanks. I'd somehow made it to 2024 without realizing Markdown was a standardized syntax.

christiankl on Three Notions of "Power"

Take the issue of forced prostitution: the official numbers are estimates, and estimates are by their nature not exact.

https://www.spiegel.de/international/germany/human-trafficking-persists-despite-legality-of-prostitution-in-germany-a-902533.html is a journalistic story about prostitution in Germany that describes what happens here under legalized prostitution.

I once talked with someone who had thought about opening a brothel in the past and had some insight into how brothels are run in Germany. He said that a lot of coercion is used.

Recently, I read something from a policeman who was complaining that the standard for proving coercion of prostitutes is too high. Proving that a prostitute over 21 who had left was beaten was not enough to convince the court that her case met the criteria for outlawed exploitation of prostitutes.