LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

COT Scaling implies slower takeoff speeds
Logan Zoellner (logan-zoellner) · 2024-09-28T16:20:00.320Z · comments (56)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

[link] A Percentage Model of a Person
Sable · 2024-10-12T17:55:07.560Z · comments (3)

Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition
cmathw · 2024-04-08T11:14:43.268Z · comments (4)

[question] Is there software to practice reading expressions?
lsusr · 2024-04-23T21:53:00.679Z · answers+comments (10)

UDT1.01: The Story So Far (1/10)
Diffractor · 2024-03-27T23:22:35.170Z · comments (6)

The Defence production act and AI policy
[deleted] · 2024-03-01T14:26:09.064Z · comments (0)

[link] WSJ: Inside Amazon’s Secret Operation to Gather Intel on Rivals
trevor (TrevorWiesinger) · 2024-04-23T21:33:08.049Z · comments (5)

AI #66: Oh to Be Less Online
Zvi · 2024-05-30T14:20:03.334Z · comments (6)

Your LLM Judge may be biased
Henry Papadatos (henry) · 2024-03-29T16:39:22.534Z · comments (9)

Deconfusing In-Context Learning
Arjun Panickssery (arjun-panickssery) · 2024-02-25T09:48:17.690Z · comments (1)

Thousands of malicious actors on the future of AI misuse
Zershaaneh Qureshi (zershaaneh-qureshi) · 2024-04-01T10:08:42.357Z · comments (0)

[link] [Fiction] A Confession
Arjun Panickssery (arjun-panickssery) · 2024-04-18T16:28:48.194Z · comments (2)

AI #49: Bioweapon Testing Begins
Zvi · 2024-02-01T15:30:04.690Z · comments (11)

Medical Roundup #2
Zvi · 2024-04-09T13:40:05.908Z · comments (18)

[link] A High Decoupling Failure
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-14T19:46:09.552Z · comments (5)

On DeepMind’s Frontier Safety Framework
Zvi · 2024-06-18T13:30:21.154Z · comments (4)

AI #47: Meet the New Year
Zvi · 2024-01-13T16:20:10.519Z · comments (7)

AI Safety Camp final presentations
Linda Linsefors · 2024-03-29T14:27:43.503Z · comments (3)

Childhood and Education Roundup #5
Zvi · 2024-04-17T13:00:03.015Z · comments (4)

[link] Claude 3 Opus can operate as a Turing machine
Gunnar_Zarncke · 2024-04-17T08:41:57.209Z · comments (2)

The Evolution of Humans Was Net-Negative for Human Values
Zack_M_Davis · 2024-04-01T16:01:10.037Z · comments (1)

Finding the Wisdom to Build Safe AI
Gordon Seidoh Worley (gworley) · 2024-07-04T19:04:16.089Z · comments (10)

(Appetitive, Consummatory) ≈ (RL, reflex)
Steven Byrnes (steve2152) · 2024-06-15T15:57:39.533Z · comments (1)

Closeness To the Issue (Part 5 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-09T00:36:47.388Z · comments (0)

[link] Toki pona FAQ
dkl9 · 2024-03-17T21:44:21.782Z · comments (8)

Drone Wars Endgame
RussellThor · 2024-02-01T02:30:46.161Z · comments (71)

[link] Searching for the Root of the Tree of Evil
Ivan Vendrov (ivan-vendrov) · 2024-06-08T17:05:53.950Z · comments (14)

Eye contact is effortless when you’re no longer emotionally blocked on it
Chipmonk · 2024-09-27T21:47:01.970Z · comments (24)

Deeply Cover Car Crashes?
jefftk (jkaufman) · 2023-12-10T22:20:01.133Z · comments (31)

AI companies' commitments
Zach Stein-Perlman · 2024-05-29T11:00:31.339Z · comments (0)

Mental Masturbation and the Intellectual Comfort Zone
Declan Molony (declan-molony) · 2024-05-07T05:47:05.257Z · comments (2)

But Where do the Variables of my Causal Model come from?
Dalcy (Darcy) · 2024-08-09T22:07:57.395Z · comments (1)

On Dwarkesh’s 3rd Podcast With Tyler Cowen
Zvi · 2024-02-02T19:30:05.974Z · comments (9)

Good job opportunities for helping with the most important century
HoldenKarnofsky · 2024-01-18T17:30:03.332Z · comments (0)

Introduce a Speed Maximum
jefftk (jkaufman) · 2024-01-11T02:50:04.284Z · comments (28)

Debate: Is it ethical to work at AI capabilities companies?
Ben Pace (Benito) · 2024-08-14T00:18:38.846Z · comments (21)

A Socratic dialogue with my student
lsusr · 2023-12-05T09:31:05.266Z · comments (14)

[link] UC Berkeley course on LLMs and ML Safety
Dan H (dan-hendrycks) · 2024-07-09T15:40:00.920Z · comments (1)

[question] What are your cruxes for imprecise probabilities / decision rules?
Anthony DiGiovanni (antimonyanthony) · 2024-07-31T15:42:27.057Z · answers+comments (29)

We’re not as 3-Dimensional as We Think
silentbob · 2024-08-04T14:39:16.799Z · comments (16)

[link] "Model UN Solutions"
Arjun Panickssery (arjun-panickssery) · 2023-12-08T23:06:33.490Z · comments (5)

[link] Scaling laws for dominant assurance contracts
jessicata (jessica.liu.taylor) · 2023-11-28T23:11:07.631Z · comments (5)

An anti-inductive sequence
Viliam · 2024-08-14T12:28:54.226Z · comments (10)

[link] Shifting Headspaces - Transitional Beast-Mode
Jonathan Moregård (JonathanMoregard) · 2024-08-12T13:02:06.120Z · comments (9)

Please Bet On My Quantified Self Decision Markets
niplav · 2023-12-01T20:07:38.284Z · comments (6)

[link] Learning coefficient estimation: the details
Zach Furman (zfurman) · 2023-11-16T03:19:09.013Z · comments (0)

Is the Power Grid Sustainable?
jefftk (jkaufman) · 2024-10-26T02:30:06.612Z · comments (38)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

niplav on Announcing turntrout.com, my new digital home

I remember writing a note a few years ago on who I wished would to create a long site, and your pseudonym was on the list. Happy to see that this has happened, even if for unfortunate reasons.

mako-yass on Trying Bluesky

In my understanding of english, when people say algorithm about social media systems, it doesn't encompass very simple, transparent ones. It would be like calling a rock a spirit.

Maybe we should call those recommenders?

o-o on O O's Shortform

I guess in the real world the rules aren’t harder per se but just less clear and not written down. I think both the rules and tools needed to solve contest math questions at least feel harder than the vast majority of rules and tools human minds deal with. Someone like Terrence Tao, who is a master of these, excelled in every subject when he was a kid (iirc).

I think LLMs have a pretty good model of human behavior, so for anything related to human judgement, in theory this isn’t why it’s not doing well.

And where rules are unwritten/unknown (say biology), are the rules not at least captured by current methods? The next steps are probably like baking the intuitions of something like alphafold into something like o1. Whatever that means. R&D is what’s important and there is generally vast sums of data there.

bogdan-ionut-cirstea on Circuits in Superposition: Compressing many small neural networks into one

Any thoughts on potential connections with task arithmetic?

alexander-gietelink-oldenziel on Announcing turntrout.com, my new digital home

I think I speak for all of the LessWrong commentariat when I say I am sad to see you go.

That said, congratulations for building such a wonderfully eigen website!

quinn-dougherty on Alexander Gietelink Oldenziel's Shortform

my dude, top level post- this does not read like a shortform

seth-herd on OpenAI Email Archives (from Musk v. Altman)

I agree that there was a lot more to that exchange than that quick summary.

My point was that there wasn't enough or it wasn't careful enough.

buck on Win/continue/lose scenarios and execute/replace/audit protocols

Thanks, this is closely related!

nathan-helm-burger on Alexander Gietelink Oldenziel's Shortform

The two suggestions that come to mind after brief thought are:

Search the internet for prompts others have found to work for this. I expect a fairly lengthy and complicated prompt would do better than a short straightforward one.
Use a base model as a source of creativity, then run that output through a chat model to clean it up (grammar, logical consistency, etc)

gwern on Happiness and Goodness as Universal Terminal Virtues

Coffee culture in America doesn't have much to do with the Revolutionary War. The rise of coffee is much later than the American Revolution. The brief boycott didn't last (after all, Americans - infamous smugglers in general - were smuggling plenty of tea because of the taxes, so sourcing tea was not a problem) and there was enormous consumption of tea consistently throughout: https://en.wikipedia.org/wiki/American_tea_culture#Colonial_and_Revolutionary_eras In fact, I was surprised to learn recently that American tea was overwhelmingly green tea in the 1800s, and one of the biggest export markets for green tea worldwide.

(This was really surprising to me, because if you look around the 1900s, even as late as the 1990s, black tea is the standard American tea; all iced tea is of course black tea, and your local grocery store would be full of mostly just black teas with a few token green teas, and exactly one oolong tea if you were lucky - as I found out the hard way when I became interested in non-black teas.)