LessWrong 2.0 Reader

[link] Recommendations for Technical AI Safety Research Directions
Sam Marks (samuel-marks) · 2025-01-10T19:34:04.920Z · comments (1)

[link] Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses
TurnTrout · 2025-01-16T02:14:35.098Z · comments (3)

AI Craftsmanship
abramdemski · 2024-11-11T22:17:01.112Z · comments (7)

Retrospective: PIBBSS Fellowship 2024
DusanDNesic · 2024-12-20T15:55:24.194Z · comments (1)

An Illustrated Summary of "Robust Agents Learn Causal World Model"
Dalcy (Darcy) · 2024-12-14T15:02:44.828Z · comments (2)

Neuroscience of human social instincts: a sketch
Steven Byrnes (steve2152) · 2024-11-22T16:16:52.552Z · comments (0)

[link] RL, but don't do anything I wouldn't do
Gunnar_Zarncke · 2024-12-07T22:54:50.714Z · comments (5)

Perils of Generalizing from One's Social Group
localdeity · 2024-11-24T15:31:18.332Z · comments (1)

Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)

[link] Zen and The Art of Semiconductor Manufacturing
Recurrented (rachel-farley) · 2024-12-09T17:19:35.236Z · comments (2)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

[link] "We know how to build AGI" - Sam Altman
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-06T02:05:05.134Z · comments (5)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes (steve2152) · 2024-10-22T13:23:08.836Z · comments (8)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

ReSolsticed vol I: "We're Not Going Quietly"
Raemon · 2024-12-26T17:52:33.727Z · comments (4)

[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)

Cognitive Work and AI Safety: A Thermodynamic Perspective
Daniel Murfet (dmurfet) · 2024-12-08T21:42:17.023Z · comments (9)

Training AI agents to solve hard problems could lead to Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-11-19T00:10:55.522Z · comments (12)

Why imperfect adversarial robustness doesn't doom AI control
Buck · 2024-11-18T16:05:06.763Z · comments (25)

Checking in on Scott's composition image bet with imagen 3
Dave Orr (dave-orr) · 2024-12-22T19:04:17.495Z · comments (0)

A case for donating to AI risk reduction (including if you work in AI)
tlevin (trevor) · 2024-12-02T19:05:06.658Z · comments (2)

o3, Oh My
Zvi · 2024-12-30T14:10:05.144Z · comments (17)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

[link] Funding Case: AI Safety Camp 11
Remmelt (remmelt-ellen) · 2024-12-23T08:51:55.255Z · comments (4)

Tips and Code for Empirical Research Workflows
John Hughes (john-hughes) · 2025-01-20T22:31:51.498Z · comments (6)

Retrospective: 12 [sic] Months Since MIRI
james.lucassen · 2025-01-21T02:52:06.271Z · comments (0)

Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)

Measuring whether AIs can statelessly strategize to subvert security measures
Alex Mallen (alex-mallen) · 2024-12-19T21:25:28.555Z · comments (0)

Read The Sequences As If They Were Written Today
Peter Berggren (peter-berggren) · 2025-01-02T02:51:36.537Z · comments (7)

o1 Turns Pro
Zvi · 2024-12-10T17:00:08.036Z · comments (3)

[link] Testing for Scheming with Model Deletion
Guive (GAA) · 2025-01-07T01:54:13.550Z · comments (20)

[link] new chinese stealth aircraft
bhauth · 2025-01-01T00:19:10.644Z · comments (3)

[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (22)

Reading RFK Jr so that you don’t have to
braces · 2024-11-22T00:59:19.583Z · comments (1)

AI #95: o1 Joins the API
Zvi · 2024-12-19T15:10:05.196Z · comments (1)

AI #96: o3 But Not Yet For Thee
Zvi · 2024-12-26T20:30:06.722Z · comments (8)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)

Timaeus is hiring researchers & engineers
Jesse Hoogland (jhoogland) · 2025-01-17T19:13:14.739Z · comments (2)

Seeking Collaborators
abramdemski · 2024-11-01T17:13:36.162Z · comments (15)

[link] The Alignment Trap: AI Safety as Path to Power
crispweed · 2024-10-29T15:21:26.545Z · comments (17)

AI #87: Staying in Character
Zvi · 2024-10-29T07:10:08.212Z · comments (3)

AI #99: Farewell to Biden
Zvi · 2025-01-16T14:20:05.768Z · comments (5)

U.S.-China Economic and Security Review Commission pushes Manhattan Project-style AI initiative
Phib · 2024-11-19T18:42:43.296Z · comments (7)

[link] Ideas for benchmarking LLM creativity
gwern · 2024-12-16T05:18:55.631Z · comments (11)

Win/continue/lose scenarios and execute/replace/audit protocols
Buck · 2024-11-15T15:47:24.868Z · comments (2)

[link] a space habitat design
bhauth · 2024-11-25T17:28:48.481Z · comments (13)

Some lessons from the OpenAI-FrontierMath debacle
7vik (satvik-golechha) · 2025-01-19T21:09:17.990Z · comments (9)

Toward Safety Case Inspired Basic Research
Lucas Teixeira · 2024-10-31T23:06:32.854Z · comments (3)

Vegans need to eat just enough Meat - emperically evaluate the minimum ammount of meat that maximizes utility
Johannes C. Mayer (johannes-c-mayer) · 2024-12-22T22:08:31.971Z · comments (35)

AI Assistants Should Have a Direct Line to Their Developers
Jan_Kulveit · 2024-12-28T17:01:58.643Z · comments (6)


Recent comments

kajus on Doing a self-randomized study of the impacts of glycine on sleep (Science is hard)

There are apps that can measure when you go to sleep based on your breath or something. Maybe that could be helpful?

viliam on leogao's Shortform

yep. doing it and then redoing it can still be much faster than procrastinating on it

niplav on Doing a self-randomized study of the impacts of glycine on sleep (Science is hard)

Cool to see people doing self-blinded & randomized QS experiments :-)

Two tips (which you might have considered already): (1) You can buy empty pill capsules and fill them with whatever you want. That makes it a lot easier to blind for taste, and far less annoying to consume with eyes closed. (2) I've found it useful to use a wearable tracker for data collection, especially for sleep, since I can't be arsed to write all that down. ~All trackers allow for data export (thanks GDPR!), I use a cheap fitbit.
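For the randomization itself, a minimal Python sketch of a balanced schedule might look like the following. The function name `make_schedule`, the 30-day length, and the seed are illustrative assumptions, not details from the study being discussed.

```python
import random


def make_schedule(n_days: int, seed: int) -> list[str]:
    """Return a shuffled list with equal numbers of 'active' and 'placebo' days."""
    if n_days % 2 != 0:
        raise ValueError("n_days must be even for a balanced schedule")
    # Start with half active and half placebo days, then shuffle the order.
    schedule = ["active"] * (n_days // 2) + ["placebo"] * (n_days // 2)
    # A dedicated, seeded generator makes the schedule reproducible for later analysis.
    random.Random(seed).shuffle(schedule)
    return schedule


# Hypothetical parameters: a 30-day trial with an arbitrary seed.
schedule = make_schedule(n_days=30, seed=2025)
```

To keep the blinding intact, the schedule should be written to a file you do not look at until the trial is over, or generated by someone else who labels the capsules by day.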

mikbp on mikbp's Shortform

There is an Avaaz signature campaign to "Establish National Licensing systems for AGI before it is fully achieved" (started by the director of the Millennium Project, Jerome Glenn) you may want to sign.

I'm not sure how fitting such a petition is for LW, which is why I put it here. If somebody more involved (forum admins, maybe?) thinks it deserves a real post, please make one.

fdrocha on Mechanisms too simple for humans to design

Taking your Tetris example, sure 6KB seems small -- as long as you restrict yourself to a space of all possible programs for Gameboy or whichever platform you took this example from. But if your goal is to encode Tetris for a computer engineer who has no knowledge about Gameboy, you will have to include, at the very least, the documentation on the CPU ISA, the hardware architecture of the device and the details on the quirks of its I/O hardware. That would already bring the "size of Tetris" to 10s of megabytes. Describing it for a person from 1950s, I suspect, would require a decent chunk of Internet in addition.

I don't think this is making it a fairer comparison. For bacteria, doesn't that mean you'd have to include descriptions of DNA, amino acids, proteins in general and everything known about the specific proteins used by the bacteria, etc? You quickly end up with a decent chunk of the Internet as well.

Kolmogorov complexity is not about how much background knowledge or computational effort was required to produce some output from first principles. It is about how much, given infinite knowledge and time, you can compress a complete description of the output. Which maybe means it's not the right metric to use here...

knight-lee on MONA: Managed Myopia with Approval Feedback

This is a very beautiful idea! It feels like the kind of clever discovery that we need.

I think one possible generalization of MONA is that a relatively trusted but weaker model makes the decisions, while a stronger but untrusted model gets trained to give ideas/advice to the weaker model. Its RL goal is not how well the weaker model performs, just whether the weaker model likes its ideas/advice.

This generalization preserves MONA's advantage over scalable oversight: if the stronger model's reasons are hidden or incomprehensible to the weaker model, the stronger model can't get away with it. It won't be rewarded for learning such reasons in the first place.

Just like scalable oversight, the weaker model might have an architecture which improves alignment at a capability cost.

It's more general than MONA in the sense that the approval feedback can be swapped for any trusted but weaker model, which doesn't just judge ideas but uses them. It is allowed to learn over time which ideas work better, but its learning process is relatively safer (due to its architecture or whatever other reason we trust it more).

Do you think this is a next step worth exploring?

mikbp on [Link] A community alert about Ziz

Agree.

I consider myself a rationalist, but if I were going to recommend this post to anyone, it would be to show how dangerous fanaticism/sectarianism is. This is not only about the actions of Ziz and co.; many comments here are also kind of scary. It shows how single-mindedness and taking one's convictions to the extreme, even when one tries very hard to be correct, often goes extremely badly. We humans are dumb, so just don't take anything too literally or too far to its extreme, take yourself and your thoughts with a grain of salt, and be nuanced (at least nuancedly nuanced).

No wonder some people hate or are afraid of rationalists when they see stuff like this... and when they see that it has resurfaced after several years for no apparent reason.

fdrocha on Mechanisms too simple for humans to design

I don't think it affects the essence of your argument, but I would say that you cannot get a good estimate of the Kolmogorov complexity of Word or other modern software from binary size. The Kolmogorov complexity of Word should properly be the size of the smallest binary that would execute in a way indistinguishable from Word. There are very good reasons to think that the existing Word binary is significantly larger than that.

Modern software development practices optimize for a combination of factors in which binary size has very little weight. Development and maintenance time and cost are usually the biggest factors; absence of bugs and performance are relatively smaller concerns in most cases; and size is not a factor except in some special cases.

Sometimes human programmers do optimize primarily for size, and the tricks they come up with have similar vibes to some of the biology tricks described in the post. For a charming example from early constrained computer systems, see http://www.catb.org/jargon/html/story-of-mel.html. For a community of people doing this type of thing for toy problems, see https://codegolf.stackexchange.com. If you just want to be dazzled by how much people can pack into 4 KB binaries when they are really trying, look at https://www.youtube.com/playlist?list=PLjxyPjW-DeNWennaPMEPBDoj5GFTj3rbz.
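As a rough illustration of why raw size overstates complexity, here is a minimal Python sketch using `zlib` from the standard library. The compressed length is itself only an upper bound on Kolmogorov complexity (up to the fixed size of a decompressor), not a measurement of it, and the two example inputs are arbitrary choices of mine.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed form of data at maximum compression."""
    return len(zlib.compress(data, 9))


# Highly regular data: one 16-byte pattern repeated 4096 times (64 KiB total).
regular = b"0123456789abcdef" * 4096
# Random data of the same length, which a general-purpose compressor cannot shrink.
noisy = os.urandom(len(regular))

regular_size = compressed_size(regular)
noisy_size = compressed_size(noisy)

print(f"raw size:           {len(regular)} bytes")
print(f"regular compressed: {regular_size} bytes")
print(f"random compressed:  {noisy_size} bytes")
```

Both inputs have the same raw size, but the regular one compresses to a tiny fraction of it. A shipped binary is in the same position: its size on disk says how it happens to be stored, not how short the shortest equivalent program could be.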

vanessa-kosoy on Announcement: Learning Theory Online Course

Not sure these are the best textbooks, but you can try:

"Naive Set Theory" by Halmos
"Probability Theory" by Jaynes
"Introduction to the Theory of Computation" by Sipser

neel-nanda-1 on Tips and Code for Empirical Research Workflows

I've been really enjoying voice to text + LLMs recently, via a great Mac App called Super Whisper (which can work with local speech to text models, so could also possibly be used for confidential stuff) - combining Super Whisper and Claude and Cursor means I can just vaguely ramble at my laptop about what experiments should happen and they happen, it's magical!