LessWrong 2.0 Reader


[link] Win Friends and Influence People Ch. 2: The Bombshell
gull · 2024-01-28T21:40:47.986Z · comments (13)

[link] Jailbreak steering generalization
Sarah Ball · 2024-06-20T17:25:24.110Z · comments (2)

[link] Tinker
Richard_Ngo (ricraz) · 2024-04-16T18:26:38.679Z · comments (0)

Inducing Unprompted Misalignment in LLMs
Sam Svenningsen (sven) · 2024-04-19T20:00:58.067Z · comments (6)

How To Do Patching Fast
Joseph Miller (Josephm) · 2024-05-11T20:13:52.424Z · comments (6)

Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.
Josh Levy (josh-levy) · 2024-06-04T15:45:54.399Z · comments (0)

Mud and Despair (Part 4 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-07T00:14:23.975Z · comments (0)

Are we so good to simulate?
KatjaGrace · 2024-03-04T05:20:03.535Z · comments (24)

[link] The consistent guessing problem is easier than the halting problem
jessicata (jessica.liu.taylor) · 2024-05-20T04:02:03.865Z · comments (5)

[link] On what research policymakers actually need
MondSemmel · 2024-04-23T19:50:12.833Z · comments (0)

International Scientific Report on the Safety of Advanced AI: Key Information
Aryeh Englander (alenglander) · 2024-05-18T01:45:10.194Z · comments (0)

[question] How would you navigate a severe financial emergency with no help or resources?
Tigerlily · 2024-05-02T18:27:51.329Z · answers+comments (22)

Losing Faith In Contrarianism
omnizoid · 2024-04-25T20:53:34.842Z · comments (44)

When Are Results from Computational Complexity Not Too Coarse?
Dalcy (Darcy) · 2024-07-03T19:06:44.953Z · comments (7)

Requirements for a Basin of Attraction to Alignment
RogerDearnaley (roger-d-1) · 2024-02-14T07:10:20.389Z · comments (9)

[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)

Whiteboard Pen Magazines are Useful
Johannes C. Mayer (johannes-c-mayer) · 2024-07-12T17:15:33.200Z · comments (6)

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI
WillPetillo · 2023-12-04T22:58:40.005Z · comments (0)

[question] Is there software to practice reading expressions?
lsusr · 2024-04-23T21:53:00.679Z · answers+comments (10)

Free Will and Dodging Anvils: AIXI Off-Policy
Cole Wyeth (Amyr) · 2024-08-29T22:42:24.485Z · comments (12)

[link] A High Decoupling Failure
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-14T19:46:09.552Z · comments (5)

[link] Turning 22 in the Pre-Apocalypse
testingthewaters · 2024-08-22T20:28:25.794Z · comments (14)

[link] The Hippie Rabbit Hole -Nuggets of Gold in Rivers of Bullshit
Jonathan Moregård (JonathanMoregard) · 2024-01-05T18:27:01.769Z · comments (20)

AI #49: Bioweapon Testing Begins
Zvi · 2024-02-01T15:30:04.690Z · comments (11)

Turning Your Back On Traffic
jefftk (jkaufman) · 2024-07-17T01:00:08.627Z · comments (7)

Thousands of malicious actors on the future of AI misuse
Zershaaneh Qureshi (zershaaneh-qureshi) · 2024-04-01T10:08:42.357Z · comments (0)

Striking Implications for Learning Theory, Interpretability — and Safety?
RogerDearnaley (roger-d-1) · 2024-01-05T08:46:58.915Z · comments (4)

[link] An AI Manhattan Project is Not Inevitable
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-06T16:42:35.920Z · comments (25)

Medical Roundup #2
Zvi · 2024-04-09T13:40:05.908Z · comments (18)

[Interim research report] Evaluating the Goal-Directedness of Language Models
Rauno Arike (rauno-arike) · 2024-07-18T18:19:04.260Z · comments (4)

The Defence production act and AI policy
[deleted] · 2024-03-01T14:26:09.064Z · comments (0)

AI #66: Oh to Be Less Online
Zvi · 2024-05-30T14:20:03.334Z · comments (6)

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition
cmathw · 2024-04-08T11:14:43.268Z · comments (4)

[link] [Fiction] A Confession
Arjun Panickssery (arjun-panickssery) · 2024-04-18T16:28:48.194Z · comments (2)

Your LLM Judge may be biased
Henry Papadatos (henry) · 2024-03-29T16:39:22.534Z · comments (9)

[link] WSJ: Inside Amazon’s Secret Operation to Gather Intel on Rivals
trevor (TrevorWiesinger) · 2024-04-23T21:33:08.049Z · comments (5)

Debate: Get a college degree?
Ben Pace (Benito) · 2024-08-12T22:23:34.744Z · comments (14)

[link] Neel Nanda on the Mechanistic Interpretability Researcher Mindset
Michaël Trazzi (mtrazzi) · 2023-09-21T19:47:02.745Z · comments (1)

[link] Dark Skies Book Review
PeterMcCluskey · 2023-12-29T18:28:59.352Z · comments (3)

Review Report of Davidson on Takeoff Speeds (2023)
Trent Kannegieter · 2023-12-22T18:48:55.983Z · comments (11)

[question] Is a random box of gas predictable after 20 seconds?
Thomas Kwa (thomas-kwa) · 2024-01-24T23:00:53.184Z · answers+comments (35)

[link] Alignment Workshop talks
Richard_Ngo (ricraz) · 2023-09-28T18:26:30.250Z · comments (1)

I designed an AI safety course (for a philosophy department)
Eleni Angelou (ea-1) · 2023-09-23T22:03:00.036Z · comments (15)

Possible OpenAI's Q* breakthrough and DeepMind's AlphaGo-type systems plus LLMs
Burny · 2023-11-23T03:16:09.358Z · comments (25)

Super-Exponential versus Exponential Growth in Compute Price-Performance
moridinamael · 2023-10-06T16:23:56.714Z · comments (25)

Principles For Product Liability (With Application To AI)
johnswentworth · 2023-12-10T21:27:41.403Z · comments (55)

Enhancing intelligence by banging your head on the wall
Bezzi · 2023-12-12T21:00:48.584Z · comments (26)

Deconfusing In-Context Learning
Arjun Panickssery (arjun-panickssery) · 2024-02-25T09:48:17.690Z · comments (1)

[link] Dall-E 3
p.b. · 2023-10-02T20:33:18.294Z · comments (9)

UDT1.01: The Story So Far (1/10)
Diffractor · 2024-03-27T23:22:35.170Z · comments (6)


Recent comments

hastings-greer on Applications of Chaos: Saying No (with Hastings Greer)

This is a good point! As a result of this effect and Jensen's¹ inequality, chaos is a much more significant limit on testing CUDA programs than on testing, for example, C++ programs.

¹ Huang

hastings-greer on Applications of Chaos: Saying No (with Hastings Greer)

I enjoyed doing this interview. I haven't done much extemporaneous public speaking, and it was a weird but wonderful experience being on the other side of the YouTube camera. Thanks Elizabeth!

owencb on Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes

I've now sent emails contacting all of the prize-winners.

atlasofcharts on Singular learning theory: exercises

I think there's a mistake in exercise 17: $\sin(x)$ is not a diffeomorphism between $(-\pi, \pi)$ and $(-1, 1)$ (for example, it is not bijective between these domains). Either you mean $\sin(x/2)$, or the interval bounds should be $(-\pi/2, \pi/2)$.

bohaska on How to teach things well

We call on our knowledge when something related triggers, so in order for a lesson to be useful, you need to build those connections and triggers in the student’s mind.

Seems related to trigger-action plans...

stephen-fowler on Laziness death spirals

I think I previously overvalued the model in which laziness, motivation, and mood are primarily internal states that require internal solutions. For me, this model also generated a lot of guilt, because failing to be productive felt like a personal failure.

But is the problem a lack of "willpower" or is your brain just operating sub-optimally because you're making a series of easily fixable health blunders?

Are you eating healthy?
Are you consuming large quantities of sugar?
Are you sleeping with your phone on your bedside table?
Are you deficient in any vitamins?
Is your sleep trash because you have been consuming alcohol?
Are you waking up at a consistent time?
Are you doing at least some exercise?

I find time spent addressing this and other similar deficits is usually more productive than trying to think your way out of a laziness spiral.

None of this is medical advice. My experience may not be applicable to you. Do your own research. I ate half a tub of ice cream 30 minutes ago.

neel-nanda-1 on Showing SAE Latents Are Not Atomic Using Meta-SAEs

Interesting thought! I expect there are systematic differences, though it's not quite obvious what they are. Your example seems pretty plausible to me. Meta-SAEs are also, I think, more incentivized to learn features that tend to split a lot, since those are useful for predicting many latents. Though features that don't split may also be useful, since they can entirely explain a latent that is otherwise hard to explain.

Anyway, we haven't checked yet, but I expect many of the results in this post would look similar for, e.g., sparse linear regression over a smaller SAE's decoder. As for why meta-SAEs are interesting at all: they're much cheaper to train than a smaller SAE, and BatchTopK gives you more control over the L0 than you could easily get with sparse linear regression. Those are some mild advantages, but you may have a small SAE lying around anyway. I see the interesting point of this post more as "SAE latents are not atomic, as shown by one method, but other methods would probably work well too."

zac-hatfield-dodds on A Rational Company - Seeking Advisors

This sounds to me like the classic rationalist failure mode of doing stuff which is unusually popular among rationalists, rather than studying what experts or top performers are doing and then adopting the techniques, conceptual models, and ways of working that actually lead to good results.

Or in other words, the primary thing when thinking about how to optimize a business is not being rationalist; it is to succeed in business (according to your chosen definition).

Happily there's considerable scholarship on business, and CommonCog has done a fantastic job organizing and explaining the good parts. I highly recommend reading and discussing and reflecting on the whole site - it's a better education in business than any MBA program I know of.

hector-perez-arenas on My hopes for YouCongress.com

Thank you for sharing your thoughtful ideas and vision for YouCongress, Nathan!

Regarding auto-updates of current legislation, I've been considering building something akin to subreddits (let's call them "halls" here) on top of our current topics. Some would be automated by us (e.g. AI polls, climate polls, bills in the US Congress), while others would be managed by users. The latter would allow the owner(s) of a hall to add new or existing polls manually or via the API. For example, someone could create a hall about their local region or topic of interest. This user-driven approach could offer wider coverage.

The potential downside is that this could lead to duplicated polls.

What are your thoughts on this?

bohaska on Making intentions concrete - Trigger-Action Planning

Such as this one!