LessWrong 2.0 Reader

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser
habryka (habryka4) · 2024-11-30T02:55:16.077Z · comments (257)

OpenAI Email Archives (from Musk v. Altman and OpenAI blog)
habryka (habryka4) · 2024-11-16T06:38:03.937Z · comments (80)

Alignment Faking in Large Language Models
ryan_greenblatt · 2024-12-18T17:19:06.665Z · comments (53)

The hostile telepaths problem
Valentine · 2024-10-27T15:26:53.610Z · comments (85)

[link] Survival without dignity
L Rudolf L (LRudL) · 2024-11-04T02:29:38.758Z · comments (29)

[link] Biological risk from the mirror world
jasoncrawford · 2024-12-12T19:07:06.305Z · comments (36)

What’s the short timeline plan?
Marius Hobbhahn (marius-hobbhahn) · 2025-01-02T14:59:20.026Z · comments (47)

[link] Review: Planecrash
L Rudolf L (LRudL) · 2024-12-27T14:18:33.611Z · comments (40)

[link] I got dysentery so you don’t have to
eukaryote · 2024-10-22T04:55:58.422Z · comments (4)

What Goes Without Saying
sarahconstantin · 2024-12-20T18:00:06.363Z · comments (19)

The Field of AI Alignment: A Postmortem, and What To Do About It
johnswentworth · 2024-12-26T18:48:07.614Z · comments (154)

The Online Sports Gambling Experiment Has Failed
Zvi · 2024-11-11T14:30:04.371Z · comments (59)

[link] By default, capital will matter more than ever after AGI
L Rudolf L (LRudL) · 2024-12-28T17:52:58.358Z · comments (99)

Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (47)

You are not too "irrational" to know your preferences.
DaystarEld · 2024-11-26T15:01:42.996Z · comments (50)

Ayn Rand’s model of “living money”; and an upside of burnout
AnnaSalamon · 2024-11-16T02:59:07.368Z · comments (58)

[link] Understanding Shapley Values with Venn Diagrams
Carson L · 2024-12-06T21:56:43.960Z · comments (34)

[link] What TMS is like
Sable · 2024-10-31T00:44:22.612Z · comments (23)

Frontier Models are Capable of In-context Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-12-05T22:11:17.320Z · comments (23)

Communications in Hard Mode (My new job at MIRI)
tanagrabeast · 2024-12-13T20:13:44.825Z · comments (25)

Making a conservative case for alignment
Cameron Berg (cameron-berg) · 2024-11-15T18:55:40.864Z · comments (68)

[link] The Compendium, A full argument about extinction risk from AGI
adamShimi · 2024-10-31T12:01:51.714Z · comments (52)

Information vs Assurance
johnswentworth · 2024-10-20T23:16:25.762Z · comments (17)

Shallow review of technical AI safety, 2024
technicalities · 2024-12-29T12:01:14.724Z · comments (33)

[link] When Is Insurance Worth It?
kqr · 2024-12-19T19:07:32.573Z · comments (65)

[link] Overcoming Bias Anthology
Arjun Panickssery (arjun-panickssery) · 2024-10-20T02:01:23.463Z · comments (14)

The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!
abstractapplic · 2024-10-26T12:34:51.059Z · comments (16)

The Median Researcher Problem
johnswentworth · 2024-11-02T20:16:11.341Z · comments (71)

o1 is a bad idea
abramdemski · 2024-11-11T21:20:24.892Z · comments (38)

[link] o1: A Technical Primer
Jesse Hoogland (jhoogland) · 2024-12-09T19:09:12.413Z · comments (17)

Neutrality
sarahconstantin · 2024-11-13T23:10:05.469Z · comments (27)

[link] Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud · 2024-12-06T22:19:26.717Z · comments (12)

Current safety training techniques do not fully transfer to the agent setting
Simon Lermen (dalasnoin) · 2024-11-03T19:24:51.537Z · comments (8)

o3
Zach Stein-Perlman · 2024-12-20T18:30:29.448Z · comments (155)

Subskills of "Listening to Wisdom"
Raemon · 2024-12-09T03:01:18.706Z · comments (22)

"It's a 10% chance which I did 10 times, so it should be 100%"
egor.timatkov · 2024-11-18T01:14:27.738Z · comments (57)

A Rocket–Interpretability Analogy
plex (ete) · 2024-10-21T13:55:18.184Z · comments (31)

OpenAI #10: Reflections
Zvi · 2025-01-07T17:00:07.348Z · comments (6)

Repeal the Jones Act of 1920
Zvi · 2024-11-27T15:00:06.801Z · comments (23)

Activation space interpretability may be doomed
bilalchughtai (beelal) · 2025-01-08T12:49:38.421Z · comments (28)

“Alignment Faking” frame is somewhat fake
Jan_Kulveit · 2024-12-20T09:51:04.664Z · comments (13)

Maximizing Communication, not Traffic
jefftk (jkaufman) · 2025-01-05T13:00:02.280Z · comments (7)

How will we update about scheming?
ryan_greenblatt · 2025-01-06T20:21:52.281Z · comments (19)

[link] China Hawks are Manufacturing an AI Arms Race
garrison · 2024-11-20T18:17:51.958Z · comments (42)

What Indicators Should We Watch to Disambiguate AGI Timelines?
snewman · 2025-01-06T19:57:43.398Z · comments (48)

[question] Which things were you surprised to learn are not metaphors?
Eric Neyman (UnexpectedValues) · 2024-11-21T18:56:18.025Z · answers+comments (79)

What o3 Becomes by 2028
Vladimir_Nesov · 2024-12-22T12:37:20.929Z · comments (15)

Why Don't We Just... Shoggoth+Face+Paraphraser?
Daniel Kokotajlo (daniel-kokotajlo) · 2024-11-19T20:53:52.084Z · comments (55)

Capital Ownership Will Not Prevent Human Disempowerment
beren · 2025-01-05T06:00:23.095Z · comments (11)

[link] Parkinson's Law and the Ideology of Statistics
Benquo · 2025-01-04T15:49:21.247Z · comments (6)

Recent comments

wassname on New, improved multiple-choice TruthfulQA

Owen, have you looked at the GitHub issues in your repo? There are other issues too. I submitted one here about wrong labels.

I really think it's worth making TruthfulQA 2.0, given the amount of usage it sees and the room for improvement.

wassname on Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses

TruthfulQA is actually quite bad. I don't blame the authors, as no one has made anything better, but we really should. It's only ~800 samples. And many of them are badly labelled.

wassname on Nathan Helm-Burger's Shortform

I agree, it shows the ease of shoddy copying. But it doesn't show the ease of reverse engineering or parallel engineering.

It's just distillation, though. It doesn't reveal how o1 could be constructed, it just reveals how to efficiently copy from o1-like outputs (not from scratch). This recipe won't be able to make o1, unless o1 already exists. That means this method of copying lets someone catch up to the leader, but not surpass them.

There are some papers that attempt to replicate o1, though so far they don't quite get there: they either rely on distillation from a larger model (math-star, huggingface TTC) or fail to match the results (see my post). Maybe we will see an open-source replication in a couple of months? That would mean only a short lag.

It's worth noting that Silicon Valley leaks like a sieve, and this is a feature, not a bug. Part of the reason it became the techno-VC centre of the world is that non-competes were banned there, so you can take your competitor's trade secrets if you are willing to pay millions to poach some of their engineers. This is why some ML engineers get paid millions: it's not the skill, it's the trade secrets that competitors are paying for (and sometimes the brand name). This has been great for tech and civilisation, but it's not so great for maintaining a technology lead.

christiankl on Unregulated Peptides: Does BPC-157 hold its promises?

That's not a good data point. If you want to provide anecdotal data, it would be good to provide more of the observations. How long did he have a shoulder issue before taking BPC-157? How fast did it go away afterward?

benquo on Rough Sketch for Product to Enhance Citizen Participation in Politics

Your proposal is well-structured and interesting but has a fundamental flaw that needs to be addressed. Interest keyword-based filtering will primarily encourage politics-as-identity, which is actively harmful - it directs attention towards zero-sum thinking and performative identities, rather than creative problem solving. As Bryan Caplan demonstrates in The Myth of the Rational Voter, people already tend to vote to express identities and affiliations rather than to achieve better outcomes. We shouldn't build tools that further entrench this destructive pattern.

Instead, imagine a tool that:

Has users journal daily about their life - activities, hopes, problems, and worries
Uses AI to identify where their constraints are plausibly caused by or could be alleviated by government action, especially local government
Maps them to specific opportunities for formal recourse, with guidance on process, likely outcomes, and practical assistance (like drafting letters or legal documents)
For issues requiring collective action, connects users facing similar constraints and helps coordinate through mechanisms like dominant assurance contracts where appropriate

This approach would ground political participation in the solving of one's own problems rather than identity expression. While technically more challenging to implement than interest-based filtering, it would generate higher-quality engagement that expands our collective problem-solving capacity rather than just reallocating political power between existing interest groups.

The patterns emerging from aggregated user experiences would naturally reveal systemic issues and preventive opportunities, especially in how regulations and policies interact to shape people's choices and planning horizons. While building reliable AI judgment about political causation is challenging, it's better to attempt something hard that would be beneficial if feasible, than to facilitate the destructive forces of identity-based politics simply because they're easier to implement.

waterlubber on Unregulated Peptides: Does BPC-157 hold its promises?

Anecdotal data point: an (online) friend of mine with EDS successfully used BPC-157 to treat a shoulder ligament injury, although apparently it promoted scar tissue formation as well. He claims that it produced a significant improvement in his symptoms.

yonatan-cale-1 on Yonatan Cale's Shortform

More on starting early:

Imagine a lab starts working in an air-gapped network, and one of the 1000 problems that come up is working from home.

If that problem comes up now (early), then we can say "okay, working from home is allowed", and we'll add that problem to the queue of things that we'll prioritize and solve. We can also experiment with it: Maybe we can open another secure office closer to the employee's house, would they like that? If so, we could discuss fancy ways to secure the communication between the offices. If not, we can try something else.

If that problem comes up when security is critical (if we wait), then the solution will be "no more working from home, period". The security staff will be too overloaded with other problems to experiment with having another office or to sign a deal with Cursor.

anthonyc on Passages I Highlighted in The Letters of J.R.R.Tolkien

Edit to add: Just thinking about the converse, you could also make it sound more ridiculous by rewriting it with more obscure parts of the legendarium, too.

Conquer Morgoth with Ungoliant. Turn Maiar into balrogs. Glamdring among the morgul-blades.

sharmake-farah on What Is The Alignment Problem?

Third reason “patterns not holding” is less central an issue than it might seem: the Generalized Correspondence Principle. When quantum mechanics or general relativity came along, they still had to agree with classical mechanics in all the (many) places where classical mechanics worked. More generally: if some pattern in fact holds, then it will still be true that the pattern held under the original context even if later data departs from the pattern, and typically the pattern will generalize in some way to the new data. Prototypical example: maybe in the blegg/rube example, some totally new type of item is introduced, a gold donut (“gonut”). And then we’d have a whole new cluster, but the two old clusters are still there; the old pattern is still present in the environment.

While a trivial version of something like this holds true, the correspondence principle doesn't apply everywhere. There are two positive results showing that a correspondence theorem holds, but there is also a negative result: the correspondence principle is false in the general case of physical laws/rules whose only requirement is that they be Turing-computable. So there's no way to make theories all add up to normality in all cases.

More here:

https://www.lesswrong.com/posts/XMGWdfTC7XjgTz3X7/a-correspondence-theorem-in-the-maximum-entropy-framework

https://www.lesswrong.com/posts/FWuByzM9T5qq2PF2n/a-correspondence-theorem

https://www.lesswrong.com/posts/74crqQnH8v9JtJcda/egan-s-theorem#oZNLtNAazf3E5bN6X

https://www.lesswrong.com/posts/74crqQnH8v9JtJcda/egan-s-theorem#M6MfCwDbtuPuvoe59

https://www.lesswrong.com/posts/74crqQnH8v9JtJcda/egan-s-theorem#XQDrXyHSJzQjkRDZc

anthonyc on Passages I Highlighted in The Letters of J.R.R.Tolkien

I would assume that his children in particular would be quite familiar with their usage, though, and that seems to be who a lot of the legendarium-heavy letters are written to.

I also think that it sounds at least slightly less ridiculous to rewrite that passage in the language of Star Wars rather than Starcraft. Conquer the Emperor with the Dark Side. Turn Jedi into Sith. An X-Wing among the TIE fighters. Probably because it's more culturally established, with a more deeply developed mythos.