LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Australian AI Safety Forum 2024
Liam Carroll (liam-carroll) · 2024-09-27T00:40:11.451Z · comments (0)
A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)
0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)
Book Review: On the Edge: The Business
Zvi · 2024-09-25T12:20:06.230Z · comments (0)
[link] Characterizing stable regions in the residual stream of LLMs
Jett Janiak (jett) · 2024-09-26T13:44:58.792Z · comments (4)
COT Scaling implies slower takeoff speeds
Logan Zoellner (logan-zoellner) · 2024-09-28T16:20:00.320Z · comments (56)
I'm creating a deep dive podcast episode about the original Leverage Research - would you like to take part?
spencerg · 2024-09-22T14:03:22.164Z · comments (1)
[link] On Fables and Nuanced Charts
Niko_McCarty (niko-2) · 2024-09-08T17:09:07.503Z · comments (2)
Open Problems in AIXI Agent Foundations
Cole Wyeth (Amyr) · 2024-09-12T15:38:59.007Z · comments (2)
Book Review: On the Edge: The Gamblers
Zvi · 2024-09-24T11:50:06.065Z · comments (1)
Glitch Token Catalog - (Almost) a Full Clear
Lao Mein (derpherpize) · 2024-09-21T12:22:16.403Z · comments (3)
Eye contact is effortless when you’re no longer emotionally blocked on it
Chipmonk · 2024-09-27T21:47:01.970Z · comments (19)
Augmenting Statistical Models with Natural Language Parameters
jsteinhardt · 2024-09-20T18:30:10.816Z · comments (0)
Proveably Safe Self Driving Cars [Modulo Assumptions]
Davidmanheim · 2024-09-15T13:58:19.472Z · comments (24)
[link] An Interactive Shapley Value Explainer
James Stephen Brown (james-brown) · 2024-09-28T05:01:21.169Z · comments (6)
My disagreements with "AGI ruin: A List of Lethalities"
Noosphere89 (sharmake-farah) · 2024-09-15T17:22:18.367Z · comments (38)
RLHF is the worst possible thing done when facing the alignment problem
tailcalled · 2024-09-19T18:56:27.676Z · comments (10)
Apply to MATS 7.0!
Ryan Kidd (ryankidd44) · 2024-09-21T00:23:49.778Z · comments (0)
DunCon @Lighthaven
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-09-29T04:56:27.205Z · comments (0)
Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)
[link] Peak Human Capital
PeterMcCluskey · 2024-09-30T21:13:30.421Z · comments (2)
Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)
[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (4)
[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)
[link] Point of Failure: Semiconductor-Grade Quartz
Annapurna (jorge-velez) · 2024-09-30T15:57:40.495Z · comments (7)
[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)
The case for more Alignment Target Analysis (ATA)
Chi Nguyen · 2024-09-20T01:14:41.411Z · comments (13)
[question] Any real toeholds for making practical decisions regarding AI safety?
lukehmiles (lcmgcd) · 2024-09-29T12:03:08.084Z · answers+comments (6)
[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)
Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)
[question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?
ChristianKl · 2024-09-26T09:17:39.088Z · answers+comments (21)
[link] Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data
jefftk (jkaufman) · 2024-09-23T17:25:58.380Z · comments (0)
[Linkpost] Play with SAEs on Llama 3
Tom McGrath · 2024-09-25T22:35:44.824Z · comments (1)
[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)
[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)
[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)
[link] Introduction to Super Powers (for kids!)
Shoshannah Tekofsky (DarkSym) · 2024-09-20T17:17:27.070Z · comments (0)
[question] What's the Deal with Logical Uncertainty?
Ape in the coat · 2024-09-16T08:11:43.588Z · answers+comments (21)
[link] SB 1047 gets vetoed
ryan_b · 2024-09-30T15:49:38.609Z · comments (1)
Fun With The Tabula Muris (Senis)
sarahconstantin · 2024-09-20T18:20:01.901Z · comments (0)
Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities
c.trout (ctrout) · 2024-09-11T15:09:48.019Z · comments (2)
Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)
[link] The Offense-Defense Balance of Gene Drives
Maxwell Tabarrok (maxwell-tabarrok) · 2024-09-27T16:47:25.976Z · comments (1)
[question] When can I be numerate?
FinalFormal2 · 2024-09-12T04:05:27.710Z · answers+comments (2)
[link] Foundations - Why Britain has stagnated [crosspost]
Nathan Young · 2024-09-23T10:43:20.411Z · comments (1)
Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (2)
[link] Hyperpolation
Gunnar_Zarncke · 2024-09-15T21:37:00.002Z · comments (4)
[link] on Science Beakers and DDT
bhauth · 2024-09-05T03:21:21.382Z · comments (13)
[link] To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-19T16:13:55.835Z · comments (1)
[link] [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
Yohan Mathew (ymath) · 2024-09-25T14:52:48.263Z · comments (0)
← previous page (newer posts) · next page (older posts) →