LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

A Case for Superhuman Governance, using AI
ozziegooen · 2024-06-07T00:10:10.902Z · comments (0)
AI Constitutions are a tool to reduce societal scale risk
Sammy Martin (SDM) · 2024-07-25T11:18:17.826Z · comments (2)
AI #85: AI Wins the Nobel Prize
Zvi · 2024-10-10T13:40:07.286Z · comments (6)
Two Tales of AI Takeover: My Doubts
Violet Hour · 2024-03-05T15:51:05.558Z · comments (8)
Understanding Subjective Probabilities
Isaac King (KingSupernova) · 2023-12-10T06:03:27.958Z · comments (16)
Adversarial Robustness Could Help Prevent Catastrophic Misuse
aogara (Aidan O'Gara) · 2023-12-11T19:12:26.956Z · comments (18)
AI #62: Too Soon to Tell
Zvi · 2024-05-02T15:40:04.364Z · comments (8)
Running the Numbers on a Heat Pump
jefftk (jkaufman) · 2024-02-09T03:00:04.920Z · comments (12)
Putting multimodal LLMs to the Tetris test
Lovre · 2024-02-01T16:02:12.367Z · comments (5)
[link] When scientists consider whether their research will end the world
Harlan · 2023-12-19T03:47:06.645Z · comments (4)
Some comments on intelligence
Viliam · 2024-08-01T15:17:07.215Z · comments (5)
[link] The origins of the steam engine: An essay with interactive animated diagrams
jasoncrawford · 2023-11-29T18:30:36.315Z · comments (1)
AI #74: GPT-4o Mini Me and Llama 3
Zvi · 2024-07-25T13:50:06.528Z · comments (6)
[link] AISN #28: Center for AI Safety 2023 Year in Review
aogara (Aidan O'Gara) · 2023-12-23T21:31:40.767Z · comments (1)
AI #59: Model Updates
Zvi · 2024-04-11T14:20:06.339Z · comments (2)
Taking features out of superposition with sparse autoencoders more quickly with informed initialization
Pierre Peigné (pierre-peigne) · 2023-09-23T16:21:42.799Z · comments (8)
Announcing SPAR Summer 2024!
laurenmarie12 · 2024-04-16T08:30:31.339Z · comments (2)
[question] What's your standard for good work performance?
Chi Nguyen · 2023-09-27T16:58:16.114Z · answers+comments (3)
The Math of Suspicious Coincidences
Roko · 2024-02-07T13:32:35.513Z · comments (3)
Some additional SAE thoughts
Hoagy · 2024-01-13T19:31:40.089Z · comments (4)
Interpreting the Learning of Deceit
RogerDearnaley (roger-d-1) · 2023-12-18T08:12:39.682Z · comments (14)
Verifiable private execution of machine learning models with Risc0?
mako yass (MakoYass) · 2023-10-25T00:44:48.643Z · comments (2)
[link] Baking vs Patissing vs Cooking, the HPS explanation
adamShimi · 2024-07-17T20:29:09.645Z · comments (16)
Against "argument from overhang risk"
RobertM (T3t) · 2024-05-16T04:44:00.318Z · comments (11)
[question] What are things you're allowed to do as a startup?
Elizabeth (pktechgirl) · 2024-06-20T00:01:59.257Z · answers+comments (9)
[link] One: a story
Richard_Ngo (ricraz) · 2023-10-10T00:18:31.604Z · comments (0)
AI Alignment Breakthroughs this week (10/08/23)
Logan Zoellner (logan-zoellner) · 2023-10-08T23:30:54.924Z · comments (14)
The Intentional Stance, LLMs Edition
Eleni Angelou (ea-1) · 2024-04-30T17:12:29.005Z · comments (3)
RA Bounty: Looking for feedback on screenplay about AI Risk
Writer · 2023-10-26T13:23:02.806Z · comments (6)
Sparse MLP Distillation
slavachalnev · 2024-01-15T19:39:02.926Z · comments (3)
[link] Anthropic: Reflections on our Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-05-20T04:14:44.435Z · comments (21)
[link] Managing AI Risks in an Era of Rapid Progress
Algon · 2023-10-28T15:48:25.029Z · comments (3)
[link] How "Pause AI" advocacy could be net harmful
Tamsin Leake (carado-1) · 2023-12-26T16:19:20.724Z · comments (10)
Glomarization FAQ
Zane · 2023-11-15T20:20:49.488Z · comments (5)
[question] Current AI safety techniques?
Zach Stein-Perlman · 2023-10-03T19:30:54.481Z · answers+comments (2)
[link] 2024 State of the AI Regulatory Landscape
Deric Cheng (deric-cheng) · 2024-05-28T11:59:06.582Z · comments (0)
"Full Automation" is a Slippery Metric
ozziegooen · 2024-06-11T19:56:49.855Z · comments (1)
Differential Optimization Reframes and Generalizes Utility-Maximization
J Bostock (Jemist) · 2023-12-27T01:54:22.731Z · comments (2)
AI Safety 101 : Reward Misspecification
markov (markovial) · 2023-10-18T20:39:34.538Z · comments (4)
Let's talk about Impostor syndrome in AI safety
Igor Ivanov (igor-ivanov) · 2023-09-22T13:51:18.482Z · comments (4)
Reviewing the Structure of Current AI Regulations
Deric Cheng (deric-cheng) · 2024-05-07T12:34:17.820Z · comments (0)
[question] Potential alignment targets for a sovereign superintelligent AI
Paul Colognese (paul-colognese) · 2023-10-03T15:09:59.529Z · answers+comments (4)
Investigating Bias Representations in LLMs via Activation Steering
DawnLu · 2024-01-15T19:39:14.077Z · comments (4)
End-to-end hacking with language models
tchauvin (timot.cool) · 2024-04-05T15:06:53.689Z · comments (0)
Big-endian is better than little-endian
Menotim · 2024-04-29T02:30:48.053Z · comments (17)
[question] How does it feel to switch from earn-to-give?
Neil (neil-warren) · 2024-03-31T16:27:22.860Z · answers+comments (4)
[question] [link] Is Bjorn Lomborg roughly right about climate change policy?
yhoiseth · 2023-09-27T20:06:30.722Z · answers+comments (14)
Throughput vs. Latency
alkjash · 2024-01-12T21:37:07.632Z · comments (2)
Dishonorable Gossip and Going Crazy
Ben Pace (Benito) · 2023-10-14T04:00:35.591Z · comments (31)
Results from the Turing Seminar hackathon
Charbel-Raphaël (charbel-raphael-segerie) · 2023-12-07T14:50:38.377Z · comments (1)