LessWrong 2.0 Reader

What You Can Give Instead of Advice
Karl Faulks (karl-faulks) · 2024-10-24T23:10:48.014Z · comments (2)

[link] AISN #45: Center for AI Safety 2024 Year in Review
Corin Katzke (corin-katzke) · 2024-12-19T18:15:56.416Z · comments (0)

Reflections on ML4Good
james__p · 2024-11-25T02:40:32.586Z · comments (0)

LLM Psychometrics and Prompt-Induced Psychopathy
Korbinian K. (korbinian-koch) · 2024-10-18T18:11:24.256Z · comments (2)

Approaches to Group Singing
jefftk (jkaufman) · 2025-01-01T12:50:01.877Z · comments (1)

Motte-and-Bailey: a Short Explanation
Lorec · 2024-10-23T22:29:55.074Z · comments (0)

[link] Markov's Inequality Explained
criticalpoints · 2025-01-08T00:31:55.125Z · comments (2)

A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More
Sharat Jacob Jacob (sharat-jacob-jacob) · 2024-10-29T12:41:30.337Z · comments (0)

Sideloading: creating a model of a person via LLM with very large prompt
avturchin · 2024-11-22T16:41:28.293Z · comments (4)

GPT-4o Can In Some Cases Solve Moderately Complicated Captchas
dirk (abandon) · 2024-11-09T04:04:37.782Z · comments (2)

Basics of Bayesian learning
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-14T10:00:46.000Z · comments (0)

Simple Steganographic Computation Eval - gpt-4o and gemini-exp-1206 can't solve it yet
Filip Sondej · 2024-12-19T15:47:05.512Z · comments (2)

AXRP Episode 38.1 - Alan Chan on Agent Infrastructure
DanielFilan · 2024-11-16T23:30:09.098Z · comments (0)

[link] Linkpost: Rat Traps by Sheon Han in Asterisk Mag
Chris_Leong · 2024-12-03T03:22:45.424Z · comments (5)

[question] Who are the worthwhile non-European pre-Industrial thinkers?
Lorec · 2024-12-03T01:45:31.445Z · answers+comments (4)

No Internally-Crispy Mac and Cheese
jefftk (jkaufman) · 2024-12-20T03:20:01.798Z · comments (5)

A good way to build many air filters on the cheap
winstonBosan · 2024-12-08T01:47:58.236Z · comments (5)

curate
technicalities · 2025-01-14T14:40:30.510Z · comments (0)

(My) self-referential reason to believe in free will
jacek (jacek-karwowski) · 2025-01-06T23:35:02.809Z · comments (5)

ML4Good (AI Safety Bootcamp) - Experience report
JanEbbing · 2024-11-05T01:18:43.554Z · comments (0)

Preliminary Thoughts on Flirting Theory
Alice Blair (Diatom) · 2024-12-24T07:37:47.045Z · comments (6)

Commenting Patterns by Platform
jefftk (jkaufman) · 2024-12-01T11:50:06.932Z · comments (0)

Playing with Otamatones
jefftk (jkaufman) · 2025-01-02T19:50:01.781Z · comments (0)

Exploring the petertodd / Leilan duality in GPT-2 and GPT-J
mwatkins · 2024-12-23T13:17:53.755Z · comments (1)

How Much to Give is a Pragmatic Question
jefftk (jkaufman) · 2024-12-24T04:20:01.480Z · comments (1)

[link] AI Prejudices: Practical Implications
PeterMcCluskey · 2024-10-19T02:19:58.695Z · comments (0)

[link] My AI timelines
samuelshadrach (xpostah) · 2024-12-22T21:06:41.722Z · comments (2)

[question] AI for medical care for hard-to-treat diseases?
CronoDAS · 2025-01-10T23:55:39.902Z · answers+comments (0)

[question] How counterfactual are logical counterfactuals?
Donald Hobson (donald-hobson) · 2024-12-15T21:16:40.515Z · answers+comments (10)

[link] The Computational Complexity of Circuit Discovery for Inner Interpretability
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-10-17T13:18:46.378Z · comments (2)

How I'd like alignment to get done (as of 2024-10-18)
TristanTrim · 2024-10-18T23:39:03.107Z · comments (4)

Conversational Signposts—An Antidote to Dull Social Interactions
Declan Molony (declan-molony) · 2024-10-22T05:37:56.175Z · comments (6)

[link] OpenAI’s cybersecurity is probably regulated by NIS Regulations
Adam Jones (domdomegg) · 2024-10-25T11:06:38.392Z · comments (2)

Substituting Talkbox for Breath Controller
jefftk (jkaufman) · 2024-10-27T19:10:03.768Z · comments (0)

Updating the NAO Simulator
jefftk (jkaufman) · 2024-10-30T13:50:06.908Z · comments (0)

Spooky Recommendation System Scaling
phdead · 2024-10-31T22:00:51.728Z · comments (0)

[link] Anthropic - The case for targeted regulation
anaguma · 2024-11-05T07:07:48.174Z · comments (0)

LDT (and everything else) can be irrational
Christopher King (christopher-king) · 2024-11-06T04:05:36.932Z · comments (7)

Fundamental Uncertainty: Chapter 9 - How do we live with uncertainty?
Gordon Seidoh Worley (gworley) · 2024-11-07T18:15:45.049Z · comments (2)

The Three Warnings of the Zentradi
Trevor Hill-Hand (Jadael) · 2024-11-21T20:28:45.567Z · comments (1)

Rethinking Laplace's Rule of Succession
Cleo Nardo (strawberry calm) · 2024-11-22T18:46:25.156Z · comments (5)

Reward Bases: A simple mechanism for adaptive acquisition of multiple reward type
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-11-23T12:45:01.067Z · comments (0)

[link] LLMs Do Not Think Step-by-step In Implicit Reasoning
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-11-28T09:16:57.463Z · comments (0)

Launching Applications for the Global AI Safety Fellowship 2025!
Aditya_SK (team-ai-safety) · 2024-11-30T14:02:16.537Z · comments (5)

[link] Picking favourites is hard
dkl9 · 2024-12-04T20:46:47.470Z · comments (3)

Rethink Wellbeing’s Year 2 Update: Foster Sustainable High Performance for Ambitious Altruists
Inga G. (inga-g) · 2024-12-08T14:32:39.902Z · comments (1)

[link] My Mental Model of AI Creativity – Creativity Kiki
Adam Newgas (BorisTheBrave) · 2024-12-09T22:24:23.096Z · comments (0)

[link] Forecast With GiveWell
ChristianWilliams · 2024-12-11T17:52:32.293Z · comments (0)

[question] Do infinite alternatives make AI alignment impossible?
Dakara (chess-ice) · 2024-12-16T18:11:51.233Z · answers+comments (2)

Apply now to SPAR!
agucova · 2024-12-19T22:29:58.963Z · comments (0)


Recent comments

wassname on New, improved multiple-choice TruthfulQA

Owen, have you looked at the GitHub issues in your repo? There are other issues too. I submitted one here about wrong labels.

I really think it's worth making TruthfulQA 2.0, given the amount of usage it sees and the room for improvement.

wassname on Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses

TruthfulQA is actually quite bad. I don't blame the authors, as no one has made anything better, but we really should. It's only ~800 samples. And many of them are badly labelled.

wassname on Nathan Helm-Burger's Shortform

I agree, it shows the ease of shoddy copying. But it doesn't show the ease of reverse engineering or parallel engineering.

It's just distillation, though. It doesn't reveal how o1 could be constructed, it just reveals how to efficiently copy from o1-like outputs (not from scratch). This recipe won't be able to make o1, unless o1 already exists. That means this method of copying lets someone catch up to the leader, but not surpass them.

There are some papers that attempt to replicate o1, though so far they don't quite get there: they either use distillation from a larger model (math-star, huggingface TTC) or fail to match the results (see my post). Maybe we will see an open source replication in a couple of months? That would mean only a short lag.

It's worth noting that Silicon Valley leaks like a sieve. And this is a feature, not a bug. Part of the reason it became the techno-VC centre of the world is that they banned non-competes. So you can take your competitor's trade secrets if you are willing to pay millions to poach some of their engineers. This is why some ML engineers get paid millions: it's not the skill, it's the trade secrets that competitors are paying for (and sometimes the brand name). This has been great for tech and civilisation, but it's not so great for maintaining a technology lead.

christiankl on Unregulated Peptides: Does BPC-157 hold its promises?

That's not a good data point. If you want to provide anecdotal data, it would be good to provide more of the observations. How long did he have a shoulder issue before taking BPC-157? How fast did it go away afterward?

benquo on Rough Sketch for Product to Enhance Citizen Participation in Politics

Your proposal is well-structured and interesting but has a fundamental flaw that needs to be addressed. Interest keyword-based filtering will primarily encourage politics-as-identity, which is actively harmful - it directs attention towards zero-sum thinking and performative identities, rather than creative problem solving. As Bryan Caplan demonstrates in The Myth of the Rational Voter, people already tend to vote to express identities and affiliations rather than to achieve better outcomes. We shouldn't build tools that further entrench this destructive pattern.

Instead, imagine a tool that:

Has users journal daily about their life - activities, hopes, problems, and worries
Uses AI to identify where their constraints are plausibly caused by or could be alleviated by government action, especially local government
Maps them to specific opportunities for formal recourse, with guidance on process, likely outcomes, and practical assistance (like drafting letters or legal documents)
For issues requiring collective action, connects users facing similar constraints and helps coordinate through mechanisms like dominant assurance contracts where appropriate

This approach would ground political participation in the solving of one's own problems rather than identity expression. While technically more challenging to implement than interest-based filtering, it would generate higher-quality engagement that expands our collective problem-solving capacity rather than just reallocating political power between existing interest groups.

The patterns emerging from aggregated user experiences would naturally reveal systemic issues and preventive opportunities, especially in how regulations and policies interact to shape people's choices and planning horizons. While building reliable AI judgment about political causation is challenging, it's better to attempt something hard that would be beneficial if feasible, than to facilitate the destructive forces of identity-based politics simply because they're easier to implement.

waterlubber on Unregulated Peptides: Does BPC-157 hold its promises?

Anecdotal data point: an (online) friend of mine with EDS successfully used BPC-157 to treat a shoulder ligament injury, although apparently it promoted scar tissue formation as well. He claims that it produced a significant improvement in his symptoms.

yonatan-cale-1 on Yonatan Cale's Shortform

More on starting early:

Imagine a lab starts working in an air-gapped network, and one of the 1000 problems that come up is working from home.

If that problem comes up now (early), then we can say "okay, working from home is allowed", and we'll add that problem to the queue of things that we'll prioritize and solve. We can also experiment with it: Maybe we can open another secure office closer to the employee's house, would they like that? If so, we could discuss fancy ways to secure the communication between the offices. If not, we can try something else.

If that problem comes up when security is critical (if we wait), then the solution will be "no more working from home, period". The security staff will be too overloaded with other problems to solve, and won't be available to experiment with having another office or to sign a deal with Cursor.

anthonyc on Passages I Highlighted in The Letters of J.R.R.Tolkien

Edit to add: Just thinking about the converse, you could also make it sound more ridiculous by rewriting it with more obscure parts of the legendarium.

Conquer Morgoth with Ungoliant. Turn Maiar into balrogs. Glamdring among the morgul-blades.

sharmake-farah on What Is The Alignment Problem?

Third reason “patterns not holding” is less central an issue than it might seem: the Generalized Correspondence Principle. When quantum mechanics or general relativity came along, they still had to agree with classical mechanics in all the (many) places where classical mechanics worked. More generally: if some pattern in fact holds, then it will still be true that the pattern held under the original context even if later data departs from the pattern, and typically the pattern will generalize in some way to the new data. Prototypical example: maybe in the blegg/rube example, some totally new type of item is introduced, a gold donut (“gonut”). And then we’d have a whole new cluster, but the two old clusters are still there; the old pattern is still present in the environment.

While a trivial version of something like this holds true, the Correspondence Principle doesn't apply everywhere. There are 2 positive results on a correspondence theorem holding, but there is also a negative result: the correspondence principle is false in the general case of physical laws/rules whose only requirement is that they be Turing-computable. That means there's no way to make theories all add up to normality in all cases.

More here:

https://www.lesswrong.com/posts/XMGWdfTC7XjgTz3X7/a-correspondence-theorem-in-the-maximum-entropy-framework

https://www.lesswrong.com/posts/FWuByzM9T5qq2PF2n/a-correspondence-theorem

https://www.lesswrong.com/posts/74crqQnH8v9JtJcda/egan-s-theorem#oZNLtNAazf3E5bN6X

https://www.lesswrong.com/posts/74crqQnH8v9JtJcda/egan-s-theorem#M6MfCwDbtuPuvoe59

https://www.lesswrong.com/posts/74crqQnH8v9JtJcda/egan-s-theorem#XQDrXyHSJzQjkRDZc

anthonyc on Passages I Highlighted in The Letters of J.R.R.Tolkien

I would assume that his children in particular would be quite familiar with their usage, though, and that seems to be who a lot of the legendarium-heavy letters are written to.

I also think that it sounds at least slightly less ridiculous to rewrite that passage in the language of Star Wars rather than Starcraft. Conquer the Emperor with the Dark Side. Turn Jedi into Sith. An X-Wing among the TIE fighters. Probably because it's more culturally established, with a more deeply developed mythos.