LessWrong 2.0 Reader


New page: Integrity
Zach Stein-Perlman · 2024-07-10T15:00:41.050Z · comments (3)

How to train your own "Sleeper Agents"
evhub · 2024-02-07T00:31:42.653Z · comments (11)

[link] Introducing METR's Autonomy Evaluation Resources
Megan Kinniment (megan-kinniment) · 2024-03-15T23:16:59.696Z · comments (0)

Prediction Markets aren't Magic
SimonM · 2023-12-21T12:54:07.754Z · comments (29)

LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (45)

Zvi’s Thoughts on His 2nd Round of SFF
Zvi · 2024-11-20T13:40:08.092Z · comments (2)

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
hugofry · 2024-04-29T20:57:35.127Z · comments (8)

Partial value takeover without world takeover
KatjaGrace · 2024-04-05T06:20:03.961Z · comments (23)

Review: Conor Moreton's "Civilization & Cooperation"
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-05-26T19:32:43.131Z · comments (8)

AI #73: Openly Evil AI
Zvi · 2024-07-18T14:40:05.770Z · comments (20)

Stagewise Development in Neural Networks
Jesse Hoogland (jhoogland) · 2024-03-20T19:54:06.181Z · comments (1)

Based Beff Jezos and the Accelerationists
Zvi · 2023-12-06T16:00:08.380Z · comments (29)

story-based decision-making
bhauth · 2024-02-07T02:35:27.286Z · comments (11)

[link] New report: Safety Cases for AI
joshc (joshua-clymer) · 2024-03-20T16:45:27.984Z · comments (14)

Singular learning theory: exercises
Zach Furman (zfurman) · 2024-08-30T20:00:03.785Z · comments (5)

Covert Malicious Finetuning
Tony Wang (tw) · 2024-07-02T02:41:51.698Z · comments (4)

Anvil Problems
Screwtape · 2024-11-13T22:57:41.974Z · comments (12)

[Intuitive self-models] 1. Preliminaries
Steven Byrnes (steve2152) · 2024-09-19T13:45:27.976Z · comments (20)

[link] Debating with More Persuasive LLMs Leads to More Truthful Answers
Akbir Khan (akbir-khan) · 2024-02-07T21:28:10.694Z · comments (14)

Teaching CS During Take-Off
andrew carle (andrew-carle) · 2024-05-14T22:45:39.447Z · comments (13)

Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (43)

On the abolition of man
Joe Carlsmith (joekc) · 2024-01-18T18:17:06.201Z · comments (18)

[link] Re: Anthropic's suggested SB-1047 amendments
RobertM (T3t) · 2024-07-27T22:32:39.447Z · comments (13)

I'm a bit skeptical of AlphaFold 3
Oleg Trott (oleg-trott) · 2024-06-25T00:04:41.274Z · comments (14)

We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming"
Lukas_Gloor · 2024-05-09T15:43:11.490Z · comments (36)

How well do truth probes generalise?
mishajw · 2024-02-24T14:12:19.729Z · comments (11)

[link] More Hyphenation
Arjun Panickssery (arjun-panickssery) · 2024-02-07T19:43:29.086Z · comments (19)

[link] Detecting Genetically Engineered Viruses With Metagenomic Sequencing
jefftk (jkaufman) · 2024-06-27T14:01:34.868Z · comments (10)

Natural Latents: The Concepts
johnswentworth · 2024-03-20T18:21:19.878Z · comments (18)

[link] Self-Help Corner: Loop Detection
adamShimi · 2024-10-02T08:33:23.487Z · comments (6)

Solving adversarial attacks in computer vision as a baby version of general AI alignment
Stanislav Fort (stanislavfort) · 2024-08-29T17:17:47.136Z · comments (8)

Research update: Towards a Law of Iterated Expectations for Heuristic Estimators
Eric Neyman (UnexpectedValues) · 2024-10-07T19:29:29.033Z · comments (2)

There is a globe in your LLM
jacob_drori (jacobcd52) · 2024-10-08T00:43:40.300Z · comments (4)

GPT-o1
Zvi · 2024-09-16T13:40:06.236Z · comments (34)

A Crisper Explanation of Simulacrum Levels
Thane Ruthenis · 2023-12-23T22:13:52.286Z · comments (13)

The Aspiring Rationalist Congregation
maia · 2024-01-10T22:52:54.298Z · comments (23)

[Valence series] 2. Valence & Normativity
Steven Byrnes (steve2152) · 2023-12-07T16:43:49.919Z · comments (5)

Addressing Feature Suppression in SAEs
Benjamin Wright (Benw8888) · 2024-02-16T18:32:51.927Z · comments (4)

Apply to be a Safety Engineer at Lockheed Martin!
yanni kyriacos (yanni) · 2024-03-31T21:02:08.499Z · comments (3)

"The Solomonoff Prior is Malign" is a special case of a simpler argument
David Matolcsi (matolcsid) · 2024-11-17T21:32:34.711Z · comments (23)

OpenAI: Helen Toner Speaks
Zvi · 2024-05-30T21:10:02.938Z · comments (8)

A simple case for extreme inner misalignment
Richard_Ngo (ricraz) · 2024-07-13T15:40:37.518Z · comments (41)

Rejecting Television
Declan Molony (declan-molony) · 2024-04-23T04:59:50.253Z · comments (10)

[link] Anxiety vs. Depression
Sable · 2024-03-17T00:15:08.255Z · comments (35)

Scalable oversight as a quantitative rather than qualitative problem
Buck · 2024-07-06T17:42:41.325Z · comments (11)

[link] Environmentalism in the United States Is Unusually Partisan
Jeffrey Heninger (jeffrey-heninger) · 2024-05-13T21:23:10.755Z · comments (26)

Fluent, Cruxy Predictions
Raemon · 2024-07-10T18:00:06.424Z · comments (14)

5 homegrown EA projects, seeking small donors
Austin Chen (austin-chen) · 2024-10-28T23:24:25.745Z · comments (4)

Reflections on Less Online
Error · 2024-07-07T03:49:44.534Z · comments (15)

[link] [Paper] Stress-testing capability elicitation with password-locked models
Fabien Roger (Fabien) · 2024-06-04T14:52:50.204Z · comments (10)


Recent comments

yanling-guo on How Universal Basic Income Could Help Us Build a Brighter Future

Thank you for the explanation.

By actively co-shaping UBI, businesses can make it more effective and efficient by training the reserve workforce in the way the economy needs, with more cost control. Of course, if businesses prefer to pay tax and let the government do it, that’s also OK, and it can even be more efficient if businesses trust the government’s expertise. It’s analogous to consumers buying from businesses: it’s always more efficient to have specialized companies produce everything, but we also observe DIY projects, and it’s good that they are not forbidden. If you DIY something, you gain knowledge and can better discern good products from bad ones, so you can make informed purchases. Doing DIY also helps you understand the effort companies make and why they deserve to be paid. And if some companies misuse their expertise and overcharge you, DIY is a fallback option. Analogously, it’s a good idea to let businesses and other taxpayers participate in designing political programs like UBI. They can certainly opt to pay tax and let the government do everything, though I think the government should consult businesses and other stakeholders so that UBI is better aligned with the needs of society.

As far as I know, UBI isn’t a real policy yet: it’s not yet determined how much UBI everyone should get, whether it’s paid out in dollars or in vouchers for training programs or other things, whether the amount everyone gets should depend on their personal effort, etc. Thus, I used UBI as an abstract, philosophical term capturing society’s promise to support individuals in need. I personally think this support should also contain incentives for recipients to improve themselves, and if UBI is realized, it should be well coordinated with other existing benefits, training programs, philanthropic support, etc., lest someone get less than others merely because they are covered by fewer support programs.

That UBI can generate a stable consumer base for businesses is well known, but coupled with training programs, it can also support a reserve workforce pool. The market only dictates layoffs during economic downturns and rehiring during recoveries, but does nothing for the time in between, when part of the workforce, if lacking proper support, may drift off and be lost to mental health problems, alcohol or drug problems, or radicalization. So when businesses start to rehire, they can find it hard to locate qualified staff. Taking this into account, maintaining and supporting a reserve workforce during the downturn can even save costs, because it makes it easier for businesses to find qualified workers, and also suppliers, once the economy recovers. So by “reserve workforce” I don’t only mean potential salary earners, but also the self-employed, like Uber drivers, or startup founders who deliver services and products.

cole-wyeth on mishka's Shortform

I’d like to see the x-axis on this plot scaled by a couple of OOMs on a task that doesn’t saturate: https://metr.org/assets/images/nov-2024-evaluating-llm-r-and-d/score_at_time_budget.png My hunch (and a timeline crux for me) is that human performance actually scales in a qualitatively different way with time and doesn’t just asymptote like LLM performance. And even the LLM scaling with time that we do see is an artifact of careful scaffolding. I am a little surprised to see good performance up to the 2-hour mark, though. That’s longer than I expected.

sharmake-farah on Benito's Shortform Feed

To talk about the education example: while I do think the education system can have a lot of problems, I'd say a crux here is that easy classes anti-predict learning, and addressing a lot of kids' complaints about schooling would probably make kids learn worse, because difficulty is correlated with learning:

https://www.oneusefulthing.org/p/post-apocalyptic-education

https://x.com/emollick/status/1756396139623096695

adele-lopez-1 on Are You More Real If You're Really Forgetful?

Well, I'm very forgetful, and I notice that I do happen to be myself so... :p

But yeah, I've bitten this bullet too, in my case, as a way to avoid the Boltzmann brain problem. (Roughly: "you" includes lots of information generated by a lawful universe. Any specific branch has small measure, but if you aggregate over all the places where "you" exist (say your exact brain state, though the real thing that counts might be more or less broad than this), you get more substantial measure from all the simple lawful universes that only needed 10^X coincidences to make you instead of the 10^Y coincidences required for you to be a Boltzmann brain.)

I think that what anthropically "counts" is most likely somewhere between conscious experience (I've woken up as myself after anesthesia), and exact state of brain in local spacetime (I doubt thermal fluctuations or path dependence matter for being "me").

cata on Yonatan Cale's Shortform

I'm not confident, but I am avoiding working on these tools because I think that "scaffolding overhang" in this field may well be most of the gap towards superintelligent autonomous agents.

If you imagine an o1-level entity with "perfect scaffolding", i.e. it can get any info on a computer into its context whenever it wants, it can choose to invoke any computer functionality that a human could invoke, it can store and retrieve knowledge for itself at will, and its training includes the use of those functionalities, it's not completely clear to me that it wouldn't already be able to do a slow self-improvement takeoff by itself, although the cost might currently be prohibitive in practice.

I don't think building that scaffolding is a trivial task at all, though.

philip_b on A very strange probability paradox

At first I disbelieved. I thought A > B. Then I wrote code myself and checked, and got that B > A. I believed this result. Then I thought about it and realized why my reason for A > B was wrong. But I still didn't understand (and still don't) why the described random process is not equivalent to randomly choosing 2, 4, or 6 on every roll. I thought some more, and now I have some doubts. My first doubt is whether there exists some kind of standard way of describing random processes and conditioning on them, and whether the problem as stated by notfnofn is fully specified. Perhaps the problem is just underspecified? Anyway, this is very interesting.

cata on Making a conservative case for alignment

I don't have a bunch of citations, but I spend time in multiple rationalist social spaces, and it seems to me that I would in fact be excluded from many of them if I stuck to sex-based pronouns, because, as stated above, there are many trans people in the community, many of whom hold to the consensus progressive norms on this. The EA Forum policy is not unrepresentative of the typical sentiment.

So I don't agree that the statements are misleading.

(I note that my typical habit is to use singular they for visibly NB/trans people, and I am not excluded for that. So it's not precisely a kind of compelled speech.)

algon on How Universal Basic Income Could Help Us Build a Brighter Future

RE: "something ChatGPT might right", sorry for the error. I wrote the comment quickly, as otherwise I wouldn't have written it at all.
Using ChatGPT to improve your writing is fine. I just want you to be aware that there's an aversion to its style here.
Kennaway was quoting what I said, probably so he could make his reply more precise.
I didn't down-vote your post, for what it's worth.
There's a LW norm, which seems to hold less force in recent years, for people to explain why they downvote something. I thought it would've been dispiriting to get negative feedback with no explanation, so I figured I'd explain in place of the people who downvoted you.
I don't understand why businesses would be co-financing UBI instead of some government tax. Nor do I get why it would be desirable or even feasible, given the co-ordination issues.
If companies get to make UBI conditional on people learning certain things, then it's not a UBI. Instead, it's a peculiar sort of training program.
What does economic recovery have to do with UBI?

rogerdearnaley on DeepSeek beats o1-preview on math, ties on coding; will release weights

A number of papers have been published over the last year on how to do this kind of training, and for roughly a year now there have been rumors that OpenAI was working on it. If converting that into a working version is possible for a Chinese company like DeepSeek, as it appears to be, then why haven't Anthropic and Google released versions yet? There doesn't seem to be any realistic possibility that DeepSeek actually has more compute or better researchers than both Anthropic and Google.

One possible interpretation would be that this has significant safety implications, and Anthropic and Google are both still working through these before releasing.

Another possibility would be that Anthropic has in fact released, in the sense that their Claude models' recent advances in agentic behavior (while not using inference-time scaling) are distilled from reasoning traces generated by an internal-only model of this type that is using inference-time scaling.

shankar-sivarajan on Making a conservative case for alignment

incorrect/misleading/unclear statements

I disagree that his statements are misleading: the impression someone who believed them true would have is far more accurate than someone who believed them false. Is that not more relevant, and a better measure of honesty, than whether or not they're "incorrect"?