LessWrong 2.0 Reader

OpenAI o1, Llama 4, and AlphaZero of LLMs
Vladimir_Nesov · 2024-09-14T21:27:41.241Z · comments (24)

Self-prediction acts as an emergent regularizer
Cameron Berg (cameron-berg) · 2024-10-23T22:27:03.664Z · comments (4)

[link] What Depression Is Like
Sable · 2024-08-27T17:43:22.549Z · comments (23)

Why you should be using a retinoid
GeneSmith · 2024-08-19T03:07:41.722Z · comments (57)

AI #83: The Mask Comes Off
Zvi · 2024-09-26T12:00:08.689Z · comments (19)

Release: Optimal Weave (P1): A Prototype Cohabitive Game
mako yass (MakoYass) · 2024-08-17T14:08:18.947Z · comments (21)

Decomposing the QK circuit with Bilinear Sparse Dictionary Learning
keith_wynroe · 2024-07-02T13:17:16.352Z · comments (7)

Values Are Real Like Harry Potter
johnswentworth · 2024-10-09T23:42:24.724Z · comments (17)

3C's: A Recipe For Mathing Concepts
johnswentworth · 2024-07-03T01:06:11.944Z · comments (5)

[link] Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more
Michael Cohn (michael-cohn) · 2024-09-15T05:27:36.691Z · comments (39)

The case for a negative alignment tax
Cameron Berg (cameron-berg) · 2024-09-18T18:33:18.491Z · comments (20)

Quick look: applications of chaos theory
Elizabeth (pktechgirl) · 2024-08-18T15:00:07.853Z · comments (46)

[Intuitive self-models] 2. Conscious Awareness
Steven Byrnes (steve2152) · 2024-09-25T13:29:02.820Z · comments (48)

How to prevent collusion when using untrusted models to monitor each other
Buck · 2024-09-25T18:58:20.693Z · comments (5)

[link] Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety
titotal (lombertini) · 2024-09-18T13:07:40.754Z · comments (3)

Corrigibility = Tool-ness?
johnswentworth · 2024-06-28T01:19:48.883Z · comments (8)

Secular interpretations of core perennialist claims
zhukeepa · 2024-08-25T23:41:02.683Z · comments (32)

Catastrophic sabotage as a major threat model for human-level AI systems
evhub · 2024-10-22T20:57:11.395Z · comments (4)

Secondary forces of debt
KatjaGrace · 2024-06-27T21:10:06.131Z · comments (18)

[link] Gwern: Why So Few Matt Levines?
kave · 2024-10-29T01:07:27.564Z · comments (9)

What is malevolence? On the nature, measurement, and distribution of dark traits
David Althaus (wallowinmaya) · 2024-10-23T08:41:33.197Z · comments (13)

Value fragility and AI takeover
Joe Carlsmith (joekc) · 2024-08-05T21:28:07.306Z · comments (5)

Scaffolding for "Noticing Metacognition"
Raemon · 2024-10-09T17:54:13.657Z · comments (4)

Darwinian Traps and Existential Risks
KristianRonn · 2024-08-25T22:37:14.142Z · comments (14)

Bitter lessons about lucid dreaming
avturchin · 2024-10-16T21:27:04.725Z · comments (61)

On the CrowdStrike Incident
Zvi · 2024-07-22T12:40:05.894Z · comments (14)

Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)
Elizabeth (pktechgirl) · 2024-10-22T18:20:01.194Z · comments (77)

JargonBot Beta Test
Raemon · 2024-11-01T01:05:26.552Z · comments (52)

My 10-year retrospective on trying SSRIs
Kaj_Sotala · 2024-09-22T20:30:02.483Z · comments (10)

The Obliqueness Thesis
jessicata (jessica.liu.taylor) · 2024-09-19T00:26:30.677Z · comments (16)

Rationality Quotes - Fall 2024
Screwtape · 2024-10-10T18:37:55.013Z · comments (22)

Introducing Transluce — A Letter from the Founders
jsteinhardt · 2024-10-23T18:10:02.526Z · comments (2)

Dentistry, Oral Surgeons, and the Inefficiency of Small Markets
GeneSmith · 2024-11-01T17:26:06.466Z · comments (16)

A Simple Toy Coherence Theorem
johnswentworth · 2024-08-02T17:47:50.642Z · comments (17)

Could randomly choosing people to serve as representatives lead to better government?
John Huang · 2024-10-21T17:10:20.920Z · comments (12)

Joshua Achiam Public Statement Analysis
Zvi · 2024-10-10T12:50:06.285Z · comments (14)

Interpreting Preference Models w/ Sparse Autoencoders
Logan Riggs (elriggs) · 2024-07-01T21:35:40.603Z · comments (12)

[question] Interest in Leetcode, but for Rationality?
Gregory (gregory-eales) · 2024-10-16T17:54:25.578Z · answers+comments (20)

[link] Soft Nationalization: how the USG will control AI labs
Deric Cheng (deric-cheng) · 2024-08-27T15:11:14.601Z · comments (7)

[link] A Narrow Path: a plan to deal with AI extinction risk
Andrea_Miotti (AndreaM) · 2024-10-07T13:02:15.229Z · comments (10)

D&D.Sci Scenario Index
aphyer · 2024-07-23T02:00:43.483Z · comments (0)

[link] LK-99 in retrospect
bhauth · 2024-07-07T02:06:27.660Z · comments (21)

In Defense of Open-Minded UDT
abramdemski · 2024-08-12T18:27:36.220Z · comments (27)

[link] Excerpts from "A Reader's Manifesto"
Arjun Panickssery (arjun-panickssery) · 2024-09-06T22:37:40.254Z · comments (1)

AI for Bio: State Of The Field
sarahconstantin · 2024-08-30T18:00:02.187Z · comments (2)

Guide to SB 1047
Zvi · 2024-08-20T13:10:07.408Z · comments (18)

FarmKind's Illusory Offer
jefftk (jkaufman) · 2024-08-09T11:30:07.082Z · comments (5)

The Mask Comes Off: At What Price?
Zvi · 2024-10-21T23:50:05.247Z · comments (16)

[link] If far-UV is so great, why isn't it everywhere?
Austin Chen (austin-chen) · 2024-10-19T18:56:58.910Z · comments (23)

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream
Diego Caples (diego-caples) · 2024-09-06T17:55:34.265Z · comments (7)

Recent comments

keith_wynroe on A Simple Toy Coherence Theorem

The constant bound isn't that relevant, and not only because of its in-principle unbounded size: it also doesn't constrain the induced probabilities in the second coding scheme much at all. It's an upper bound on the maximum length, so the weightings in coding scheme B can still differ in relative length by a ton, leading to wildly different priors.

And of the encoding schemes that remain on the table, virtually all of them will behave identically with respect to the description lengths they assign to "natural" versus "unnatural" optimization criteria.

I have no idea how you're getting to this; I'm not sure if it's claiming a formal result or is just a hunch. But I disagree that there is a neat correspondence between a system being physically realizable and its having a concise implementation as a TM. And even granting that point, I don't think that nearly all or even most of these physically realizable systems will behave identically or even similarly w.r.t. how they assign codes to "natural" optimization criteria.

james-stephen-brown on Both-Sidesism—When Fair & Balanced Goes Wrong

Thanks Hastings,

I think at that time you could reason much better if you could recognize that the separation between left and right was not natural.

I think you're saying it was easier in the past to see unorthodox or contradictory views within parties because the wings were more clearly delineated. I'd agree, it was a divided time, but a less chaotic divided time.

The effective left right split is mono-factor: you are right exactly in proportion to your personal loyalty to one Donald J. Trump

Absolutely. His tariff policy is also bizarre: it is wholly anti-free-market, a point that was obvious to me but that the left didn't pick up on (because of the chaos, I imagine). As a left-wing (pro-taxation) person who also believes in free markets, I find his approach antithetical to my own views, as if he took the last good idea on the right (free markets) and abandoned it in order to create a party based on all the bad ideas. This sort of contrarianism is something I've read Steven Pinker write about as a loyalty test (to despots and cult leaders): the inducement to followers to knowingly lie or act contrary to their own interests as a statement of loyalty to each other through joint faith in the dear leader.

alexey on The unreasonable effectiveness of plasmid sequencing as a service

He’d walk on over to nearby industry labs with candy and a sales pitch for why they should use his services. He primarily targeted top, Nobel-prize-winning research groups

and

Plasmidsaurus has historically done very little ‘traditional’ marketing — no brochures, few cold reach-outs

seem to be a bit contradictory?

archimedes on Tapatakt's Shortform

It already happens indirectly. Most digital money transfers are things like credit card transactions. For these, the credit card company takes a percentage fee and pays the government tax on its profit.

sharmake-farah on LLMs Look Increasingly Like General Reasoners

I think that the main conclusion is that large amounts of compute are still necessary for reasoning well OOD. Even though o1 is doing a little reasoning, it's a pretty small-scale search (usually seconds or minutes of search), which means that the fact that it's a derivative GPT-4o model matters a lot for its incapacity, as it's pretty low on the compute scale compared to other models.

shankar-sivarajan on Tapatakt's Shortform

One consideration is the government wouldn't want to encourage (harder-to-tax) cash transactions.

vladimir_nesov on LLMs Look Increasingly Like General Reasoners

Performance after post-training degrades if behavior gets too far from that of the base/SFT model (see Figure 1). Solving this issue would be an entirely different advancement from what o1-like post-training appears to do. So I expect that the model remains approximately as smart as the base model and the corresponding chatbot, it's just better at packaging its intelligence into relevant long reasoning traces.

archimedes on LLMs Look Increasingly Like General Reasoners

Additional data points:

o1-preview and the new Claude Sonnet 3.5 both significantly improved over prior models on SimpleBench.

The math, coding, and science benchmarks in the o1 announcement post:

[Image: BMs]

zy on Lessons learned from talking to >100 academics about AI safety

I strongly agree with almost all of these points, and they are very consistent with my observations. As I am still relatively new to LessWrong, one big thing I still see today (based on my experience) is concepts, definitions, and terminology that are disconnected from academic language. Sometimes I see terminology that already exists in academia, and introducing new concepts under the same name may be confusing when it is done without using the channels academics are used to. There are some terms that I try to search for on Google, for example, but the only relevant results are from LessWrong or blog posts (which I then still read personally). I think this is getting better: in one of the recent conference reviews, I saw a significant increase in AI safety submissions working on X-risks.

Another point, as you have mentioned, is the reverse: ingestion of papers from academia. There are rich papers in interpretability, for example, and one concrete confusion I saw from professors or people already in that field is why there feels like a lack of connection with these papers or concepts, even though they seem to be pretty related.

About actions: many of the people in my usual professional group who are concerned about AI safety risks are people who are concerned about, or working on, current intentional risks like misuse. Those are also real risks that have already started (CSAM, deepfake porn with real people's faces, privacy, potential bio/chem weapons), and they need to be worked on as well. It is hard to stop working on them and transition directly to X-risks.

However, I do think it is beneficial to keep merging the academic and AI safety groups, which I see is already underway, with examples like more papers, some PhD positions on AI safety, industry positions, etc. This will increase awareness of AI safety, and, as you have mentioned, the interest in the technical parts is shared, as they could potentially be applied to many kinds of safety, and hopefully not that much to capabilities (though sometimes the two are not separable).

npostavs on Active Recall and Spaced Repetition are Different Things

by saying their name aloud: [...] …but it’s a lot more difficult to use active recall to remember people’s names.

I'm confused, isn't saying their name in a sentence an example of active recall?