LessWrong 2.0 Reader

Defining alignment research
Richard_Ngo (ricraz) · 2024-08-19T20:42:29.279Z · comments (23)

Information vs Assurance
johnswentworth · 2024-10-20T23:16:25.762Z · comments (7)

Science advances one funeral at a time
Cameron Berg (cameron-berg) · 2024-11-01T23:06:19.381Z · comments (9)

New page: Integrity
Zach Stein-Perlman · 2024-07-10T15:00:41.050Z · comments (3)

[link] Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi (mtrazzi) · 2024-10-28T20:17:47.465Z · comments (5)

AI #73: Openly Evil AI
Zvi · 2024-07-18T14:40:05.770Z · comments (20)

Dragon Agnosticism
jefftk (jkaufman) · 2024-08-01T17:00:06.434Z · comments (60)

Covert Malicious Finetuning
Tony Wang (tw) · 2024-07-02T02:41:51.698Z · comments (4)

Singular learning theory: exercises
Zach Furman (zfurman) · 2024-08-30T20:00:03.785Z · comments (5)

Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (10)

[link] Executable philosophy as a failed totalizing meta-worldview
jessicata (jessica.liu.taylor) · 2024-09-04T22:50:18.294Z · comments (40)

Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (43)

[link] Re: Anthropic's suggested SB-1047 amendments
RobertM (T3t) · 2024-07-27T22:32:39.447Z · comments (13)

I'm a bit skeptical of AlphaFold 3
Oleg Trott (oleg-trott) · 2024-06-25T00:04:41.274Z · comments (14)

Research update: Towards a Law of Iterated Expectations for Heuristic Estimators
Eric Neyman (UnexpectedValues) · 2024-10-07T19:29:29.033Z · comments (2)

Solving adversarial attacks in computer vision as a baby version of general AI alignment
Stanislav Fort (stanislavfort) · 2024-08-29T17:17:47.136Z · comments (8)

[link] Self-Help Corner: Loop Detection
adamShimi · 2024-10-02T08:33:23.487Z · comments (6)

[link] Detecting Genetically Engineered Viruses With Metagenomic Sequencing
jefftk (jkaufman) · 2024-06-27T14:01:34.868Z · comments (10)

GPT-o1
Zvi · 2024-09-16T13:40:06.236Z · comments (34)

[Intuitive self-models] 1. Preliminaries
Steven Byrnes (steve2152) · 2024-09-19T13:45:27.976Z · comments (20)

There is a globe in your LLM
jacob_drori (jacobcd52) · 2024-10-08T00:43:40.300Z · comments (4)

Reflections on Less Online
Error · 2024-07-07T03:49:44.534Z · comments (15)

[question] What are the best arguments for/against AIs being "slightly 'nice'"?
Raemon · 2024-09-24T02:00:19.605Z · answers+comments (51)

LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (39)

5 homegrown EA projects, seeking small donors
Austin Chen (austin-chen) · 2024-10-28T23:24:25.745Z · comments (4)

Fluent, Cruxy Predictions
Raemon · 2024-07-10T18:00:06.424Z · comments (14)

A simple case for extreme inner misalignment
Richard_Ngo (ricraz) · 2024-07-13T15:40:37.518Z · comments (41)

Scalable oversight as a quantitative rather than qualitative problem
Buck · 2024-07-06T17:42:41.325Z · comments (11)

Newsom Vetoes SB 1047
Zvi · 2024-10-01T12:20:06.127Z · comments (6)

[link] [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij (teun-van-der-weij) · 2024-06-13T10:04:49.556Z · comments (10)

Self-prediction acts as an emergent regularizer
Cameron Berg (cameron-berg) · 2024-10-23T22:27:03.664Z · comments (4)

Actually, Power Plants May Be an AI Training Bottleneck.
Lao Mein (derpherpize) · 2024-06-20T04:41:33.567Z · comments (13)

[link] What are you getting paid in?
Austin Chen (austin-chen) · 2024-07-17T19:23:04.219Z · comments (14)

[link] What Depression Is Like
Sable · 2024-08-27T17:43:22.549Z · comments (23)

Why you should be using a retinoid
GeneSmith · 2024-08-19T03:07:41.722Z · comments (57)

OpenAI o1, Llama 4, and AlphaZero of LLMs
Vladimir_Nesov · 2024-09-14T21:27:41.241Z · comments (24)

AI #83: The Mask Comes Off
Zvi · 2024-09-26T12:00:08.689Z · comments (19)

Release: Optimal Weave (P1): A Prototype Cohabitive Game
mako yass (MakoYass) · 2024-08-17T14:08:18.947Z · comments (21)

Decomposing the QK circuit with Bilinear Sparse Dictionary Learning
keith_wynroe · 2024-07-02T13:17:16.352Z · comments (7)

Values Are Real Like Harry Potter
johnswentworth · 2024-10-09T23:42:24.724Z · comments (17)

[link] Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more
Michael Cohn (michael-cohn) · 2024-09-15T05:27:36.691Z · comments (39)

3C's: A Recipe For Mathing Concepts
johnswentworth · 2024-07-03T01:06:11.944Z · comments (5)

Quick look: applications of chaos theory
Elizabeth (pktechgirl) · 2024-08-18T15:00:07.853Z · comments (46)

[Intuitive self-models] 2. Conscious Awareness
Steven Byrnes (steve2152) · 2024-09-25T13:29:02.820Z · comments (48)

How to prevent collusion when using untrusted models to monitor each other
Buck · 2024-09-25T18:58:20.693Z · comments (5)

The case for a negative alignment tax
Cameron Berg (cameron-berg) · 2024-09-18T18:33:18.491Z · comments (20)

Corrigibility = Tool-ness?
johnswentworth · 2024-06-28T01:19:48.883Z · comments (8)

Secular interpretations of core perennialist claims
zhukeepa · 2024-08-25T23:41:02.683Z · comments (32)

[link] Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety
titotal (lombertini) · 2024-09-18T13:07:40.754Z · comments (3)

Catastrophic sabotage as a major threat model for human-level AI systems
evhub · 2024-10-22T20:57:11.395Z · comments (4)

Recent comments

daniel-v on The lying p value

I'm here to say that this is not some property specific to p-values; it's just about the credibility of the communicator.

If someone makes a bunch of errors all the time, especially errors that change their conclusions, then indeed you can't trust them. It turns out (BW11) that $\text{scientists}_{\text{published in better journals}}$ are more credible than $\text{scientists}_{\text{published in worse journals}}$, and that the errors they make tend not to change the conclusion of the test (i.e., the chance of drawing a wrong conclusion from their data, "gross error" in BW11, was much lower than the headline error rate). And (admittedly I'm going out on a limb here) it is very possible that the errors that change the conclusion of a particular test do not change the overall conclusion about the general theory. For example, if theory says X, Y, and Z should happen, and you find support for X and Y and marginal-support-now-not-significant-support-anymore for Z, the theory is still pretty intact unless you really care about using p-values in a binary fashion. If theory says X, Y, and Z should happen, and you find support for X and Y and now-not-significant-support-anymore for Z, that's more of an issue. But given how many tests are in a paper, it's also possible that theory says X, Y, and Z should happen, you find support for X, Y, and Z, but it turns out your conclusion about W reverses, which may or may not really say something about your theory.

I don't think it is wise to throw the baby out with the bathwater.

eggsyntax on eggsyntax's Shortform

But I also find my own understanding to be a bit confused and in need of better sources.

Mine too, for sure.

And agreed, Chollet's points are really interesting. As much as I'm sometimes frustrated with him, I think that ARC-AGI and his willingness to (get someone to) stake substantial money on it have done a lot to clarify the discourse around LLM generality, and also make it harder for people to move the goalposts and then claim they were never moved.

boris-kashirin on The Online Sports Gambling Experiment Has Failed

Thinking about responsible gambling: wouldn't something like an up-front, long-term commitment solve a lot of problems? You would have to decide right away and lock up the money you're going to spend this month, which separates the decision from the impulse to spend.

seth-herd on Current Attitudes Toward AI Provide Little Data Relevant to Attitudes Toward AGI

Those outcomes sound quite plausible.

I'm particularly concerned with polarization. Becoming a political football was the death knell for sensible discussion on climate change, and it could be the same for AGI x-risk. Public belief in climate change actually fell while the evidence mounted. My older post AI scares and changing public beliefs is actually mostly about polarization.

Having the debate become ideologically/politically motivated seems like it wouldn't be good. I'm still really hoping to avoid polarization on AGI x-risk. It does seem like "AI safety", concerns about bias, deepfakes, and harms from interacting with LLMs are already primarily discussed among liberals in the US.

Neither side has really started worrying about job loss, but that concern would also tend to land on the liberal side, since conservatives are still somewhat more free-market oriented.

While tying concerns about x-risk with calls to slow AI based on mundane harms might seem expedient, I wouldn't take that bargain if it created worse polarization.

I think this is a common attitude among the x-risk worried, especially since it's hard to predict whether a slowdown in the US AGI push would be a net good or bad thing for x-risk.

nathan-helm-burger on What program structures enable efficient induction?

For what it's worth, the human brain (including the cortex) has a fixed modularity. Long range connections are created during fetal development according to genetic rules, and can only be removed, not rerouted or added to.

I believe this is what causes the high degree of functional localization in the cortex.

shankar-sivarajan on gilch's Shortform

Is this a Lisp-to-Python transpiler?

bogdanb on Cryonics is free

You might want to know that I took a look through the site, and was curious, but I just closed the page the moment the “Calculate your contribution” form refused to show me the pricing options unless I gave it an email address.

nathan-helm-burger on eggsyntax's Shortform

I agree with your frustrations, I think his views are somewhat inconsistent and confusing. But I also find my own understanding to be a bit confused and in need of better sources.

I do think the discussion François has in this interview is interesting. He talks about the ways people have tried to apply LLMs to ARC, and I think he makes some good points about the strengths and shortcomings of LLMs on tasks like this.

jjxw on The Online Sports Gambling Experiment Has Failed

Another job market working paper in economics, out of Stanford, attempts to measure the degree to which sports bettors are overly optimistic. The results are largely what you'd expect: people think they're breaking even when they're actually losing ~7%, and a subset of those people have self-control problems.

Funnily enough, the way I found out about this paper is that I was recruited to participate in it through a targeted ad on social media when I took a trip out to Colorado to farm sportsbook new-account sign-up bonuses.

shankar-sivarajan on johnswentworth's Shortform

Is this development unexpected enough to be worth remarking upon? This is just Conquest's Second Law.