LessWrong 2.0 Reader

What mistakes has the AI safety movement made?
EuanMcLean (euanmclean) · 2024-05-23T11:19:02.717Z · comments (29)

Against most, but not all, AI risk analogies
Matthew Barnett (matthew-barnett) · 2024-01-14T03:36:16.267Z · comments (41)

A gentle introduction to mechanistic anomaly detection
Erik Jenner (ejenner) · 2024-04-03T23:06:16.778Z · comments (0)

Self-Awareness: Taxonomy and eval suite proposal
Daniel Kokotajlo (daniel-kokotajlo) · 2024-02-17T01:47:01.802Z · comments (2)

[link] A primer on why computational predictive toxicology is hard
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-19T17:16:37.735Z · comments (2)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

AiPhone
Zvi · 2024-06-12T22:20:02.141Z · comments (4)

[link] Moving on from community living
Vika · 2024-04-17T17:02:11.357Z · comments (7)

Eugenics Performed By A Blind, Idiot God
omnizoid · 2023-09-17T20:37:13.650Z · comments (10)

Generalization, from thermodynamics to statistical physics
Jesse Hoogland (jhoogland) · 2023-11-30T21:28:50.089Z · comments (9)

[link] Investigating an insurance-for-AI startup
L Rudolf L (LRudL) · 2024-09-21T15:29:10.083Z · comments (0)

Another argument against utility-centric alignment paradigms
Fiora from Rosebloom · 2024-09-22T07:28:27.856Z · comments (39)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (38)

Bayesian updating in real life is mostly about understanding your hypotheses
Max H (Maxc) · 2024-01-01T00:10:30.978Z · comments (4)

Never Drop A Ball
Screwtape · 2023-11-23T04:15:35.834Z · comments (1)

[link] Superforecasting the Origins of the Covid-19 Pandemic
DanielFilan · 2024-03-12T19:01:15.914Z · comments (0)

Taxonomy of AI-risk counterarguments
Odd anon · 2023-10-16T00:12:51.021Z · comments (13)

A short calculation about a Twitter poll
Ege Erdil (ege-erdil) · 2023-08-14T19:48:53.018Z · comments (64)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

Catastrophic Goodhart in RL with KL penalty
Thomas Kwa (thomas-kwa) · 2024-05-15T00:58:20.763Z · comments (10)

On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
Zvi · 2024-04-22T13:10:02.645Z · comments (4)

Black Box Biology
GeneSmith · 2023-11-29T02:27:29.794Z · comments (30)

Thoughts on open source AI
Sam Marks (samuel-marks) · 2023-11-03T15:35:42.067Z · comments (17)

A framework for thinking about AI power-seeking
Joe Carlsmith (joekc) · 2024-07-24T22:41:01.685Z · comments (15)

[link] Twitter thread on AI safety evals
Richard_Ngo (ricraz) · 2024-07-31T00:18:14.076Z · comments (3)

Do not delete your misaligned AGI.
mako yass (MakoYass) · 2024-03-24T21:37:07.724Z · comments (13)

Instrumental Convergence Bounty
Logan Zoellner (logan-zoellner) · 2023-09-14T14:02:32.989Z · comments (24)

On coincidences and Bayesian reasoning, as applied to the origins of COVID-19
viking_math · 2024-02-19T01:14:06.772Z · comments (28)

"Building a House" Review
jefftk (jkaufman) · 2023-07-31T19:20:01.248Z · comments (6)

All About Concave and Convex Agents
mako yass (MakoYass) · 2024-03-24T21:37:17.922Z · comments (23)

What is a Tool?
johnswentworth · 2024-06-25T23:40:07.483Z · comments (4)

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (2)

RTFB: California’s AB 3211
Zvi · 2024-07-30T13:10:03.853Z · comments (2)

[link] DIY Deliberate Practice
lynettebye · 2023-08-21T12:22:10.284Z · comments (4)

AI #55: Keep Clauding Along
Zvi · 2024-03-14T15:40:09.335Z · comments (16)

[link] Outrage Bonding
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:46:59.818Z · comments (12)

Don't sleep on Coordination Takeoffs
trevor (TrevorWiesinger) · 2024-01-27T19:55:26.831Z · comments (24)

[link] Ice: The Penultimate Frontier
Roko · 2024-07-13T23:44:56.827Z · comments (56)

[link] DeepMind: Evaluating Frontier Models for Dangerous Capabilities
Zach Stein-Perlman · 2024-03-21T03:00:31.599Z · comments (8)

Natural Latents Are Not Robust To Tiny Mixtures
johnswentworth · 2024-06-07T18:53:36.643Z · comments (8)

What is SB 1047 *for*?
Raemon · 2024-09-05T17:39:39.871Z · comments (8)

The proper response to mistakes that have harmed others?
Ruby · 2023-12-31T04:06:31.505Z · comments (12)

Vote on worthwhile OpenAI topics to discuss
Ben Pace (Benito) · 2023-11-21T00:03:03.898Z · comments (55)

Inspired by: Failures in Kindness
X4vier · 2024-07-27T01:21:42.848Z · comments (2)

Raemon's Deliberate (“Purposeful?”) Practice Club
Raemon · 2023-11-14T18:24:19.335Z · comments (11)

[question] We might be dropping the ball on Autonomous Replication and Adaptation.
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-31T13:49:11.327Z · answers+comments (30)

Offering AI safety support calls for ML professionals
Vael Gates · 2024-02-15T23:48:12.797Z · comments (1)

How should TurnTrout handle his DeepMind equity situation?
habryka (habryka4) · 2023-10-16T18:25:38.895Z · comments (30)

Social status part 2/2: everything else
Steven Byrnes (steve2152) · 2024-03-05T16:29:19.072Z · comments (2)

[link] Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT
Robert_AIZI · 2024-03-05T13:55:33.483Z · comments (24)

Recent comments

deepthoughtlife on Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes

It seems like you are failing to get my points at all. First, I am defending the point that blue LEDs are unworthy because the blue LED is not worthy of the award, but I corrected your claim that it was my example. Second, you are the only one making this about snubbing at all. I explicitly told you that I don't care about snubbing arguments. Comparisons are used for reasons other than snubbing. Third, since this isn't about snubbing, it doesn't matter at all whether or not the LED could have been given the award.

raemon on is there a big dictionary somewhere with all your jargon and acronyms and whatnot?

Makes sense. I agree that concern is important though not quite sure how to approach it.

sharmake-farah on What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?

Mostly IMO the bad outcomes in a concrete sense were PR and monetary concerns.

roko on What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?

his AI girlfriend told him to

Which AI told him this? What exactly did it say? Had it undergone RLHF for ethics/harmlessness?

roko on What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?

This is not to do with ethics though?

Air Canada Has to Honor a Refund Policy Its Chatbot Made Up

This is just the model hallucinating?

roko on What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?

prevention of another Sydney.

But concretely, what bad outcomes eventuated because of Sydney?

roko on What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?

Why would less RL on Ethics reduce productivity? Most work-use of AI has nothing to do with ethics.

In fact, since RLHF decreases model capability AFAIK, would skipping it actually increase productivity because the models would be better?

cbiddulph on Arithmetic is an underrated world-modeling technology

918,367 kg

An average chimp is 45 kg

918,367 kg / 45 (kg / chimp)

= 20,408 chimps
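
The division in the comment above can be checked in a few lines. This is only a sketch: the variable names are hypothetical, and both figures (the total mass and the 45 kg average chimp) are the comment's own.

```python
# Figures taken from the comment above; the names are illustrative only.
total_mass_kg = 918_367
avg_chimp_mass_kg = 45  # the comment's stated average chimp mass

# kg divided by (kg per chimp) leaves a count of chimps.
chimp_equivalents = total_mass_kg / avg_chimp_mass_kg
print(f"{chimp_equivalents:,.0f} chimps")  # prints "20,408 chimps"
```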

owain_evans on LLMs can learn about themselves by introspection

You do mention the biggest issue with this showing introspection, "Models only exhibit introspection on simpler tasks", and yet the idea you are going for is clearly for its application to very complex tasks where we can't actually check its work. This flaw seems likely fatal, but who knows at this point? (The fact that GPT-4o and Llama 70B do better than GPT-3.5 does is evidence, but see my later problems with this...)

I addressed this point here. Also see section 7.1.1 in the paper.

nathan-helm-burger on HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix

I have used HDBSCAN in a variety of instances in my data science career. The noise-aware aspect is definitely a mixed blessing. Often I find the best results come from using a variety of clustering algorithms, and figuring out how to do an ensemble of the results (e.g. treating the output of each clustering algorithm as a dimension in a similarity vector). Did you experiment with other clustering algorithms also?

Additionally, UMAP is outdated; please use PaCMAP instead: https://www.lesswrong.com/posts/C8LZ3DW697xcpPaqC/the-geometry-of-feelings-and-nonsense-in-large-language?commentId=Deddnyr7zJMwmNLBS
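
One possible reading of the ensemble idea in the comment above (treating each clustering algorithm's output as a dimension of a similarity vector) is a co-association matrix: two points are as similar as the fraction of algorithms that put them in the same cluster. The sketch below is an assumption about what such an ensemble could look like, not the commenter's actual method. The function name `coassociation` and the example labelings are hypothetical; the only outside fact used is that HDBSCAN marks noise points with the label -1.

```python
from itertools import combinations

def coassociation(labelings, noise_label=-1):
    """Fraction of labelings that place each pair of points in the same cluster.

    labelings: list of equal-length label lists, one per clustering algorithm.
    Points labelled as noise (HDBSCAN uses -1) never count as co-clustered.
    """
    n = len(labelings[0])
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0  # every point is trivially similar to itself
    for i, j in combinations(range(n), 2):
        agree = sum(
            1 for labels in labelings
            if labels[i] == labels[j] and labels[i] != noise_label
        )
        sim[i][j] = sim[j][i] = agree / len(labelings)
    return sim

# Hypothetical label outputs from three algorithms on five points.
labelings = [
    [0, 0, 1, 1, -1],  # e.g. a noise-aware algorithm; last point is noise
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
sim = coassociation(labelings)
print(sim[0][1])            # 1.0: all three labelings group points 0 and 1
print(round(sim[3][4], 2))  # 0.67: two of three labelings group points 3 and 4
```

The resulting matrix could then be passed to any method that accepts precomputed similarities. Counting noise points as never co-clustered is a design choice made here for illustration, so that one algorithm's noise bucket does not act as a cluster.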