LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] "Can AI Scaling Continue Through 2030?", Epoch AI (yes)
gwern · 2024-08-24T01:40:32.929Z · comments (4)

How I started believing religion might actually matter for rationality and moral philosophy
zhukeepa · 2024-08-23T17:40:47.341Z · comments (41)

Circuits in Superposition: Compressing many small neural networks into one
Lucius Bushnaq (Lblack) · 2024-10-14T13:06:14.596Z · comments (7)

Parasites (not a metaphor)
lukehmiles (lcmgcd) · 2024-08-08T20:07:13.593Z · comments (17)

[link] Investigating the Chart of the Century: Why is food so expensive?
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-16T13:21:23.596Z · comments (26)

[link] Miles Brundage resigned from OpenAI, and his AGI readiness team was disbanded
garrison · 2024-10-23T23:40:57.180Z · comments (1)

[link] My Number 1 Epistemology Book Recommendation: Inventing Temperature
adamShimi · 2024-09-08T14:30:40.456Z · comments (18)

BIG-Bench Canary Contamination in GPT-4
Jozdien · 2024-10-22T15:40:48.166Z · comments (12)

A bird's eye view of ARC's research
Jacob_Hilton · 2024-10-23T15:50:06.123Z · comments (12)

Why I funded PIBBSS
Ryan Kidd (ryankidd44) · 2024-09-15T19:56:33.018Z · comments (20)

Ten arguments that AI is an existential risk
KatjaGrace · 2024-08-13T17:00:03.397Z · comments (41)

[link] A primer on the current state of longevity research
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-22T17:14:57.990Z · comments (6)

Please stop using mediocre AI art in your posts
Raemon · 2024-08-25T00:13:52.890Z · comments (24)

[link] Please support this blog (with money)
Elizabeth (pktechgirl) · 2024-08-17T15:30:05.641Z · comments (2)

[link] Perplexity wins my AI race
Elizabeth (pktechgirl) · 2024-08-24T19:20:10.859Z · comments (12)

Danger, AI Scientist, Danger
Zvi · 2024-08-15T22:40:06.715Z · comments (9)

Backdoors as an analogy for deceptive alignment
Jacob_Hilton · 2024-09-06T15:30:06.172Z · comments (2)

LLM Applications I Want To See
sarahconstantin · 2024-08-19T21:10:03.101Z · comments (5)

Refactoring cryonics as structural brain preservation
Andy_McKenzie · 2024-09-11T18:36:30.285Z · comments (14)

LLMs can learn about themselves by introspection
Felix J Binder (fjb) · 2024-10-18T16:12:51.231Z · comments (38)

What happens if you present 500 people with an argument that AI is risky?
KatjaGrace · 2024-09-04T16:40:03.562Z · comments (7)

Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming
Buck · 2024-10-10T13:36:53.810Z · comments (4)

[link] Advice for journalists
Nathan Young · 2024-10-07T16:46:40.929Z · comments (53)

Why comparative advantage does not help horses
Sherrinford · 2024-09-30T22:27:57.450Z · comments (10)

I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (19)

[link] the Giga Press was a mistake
bhauth · 2024-08-21T04:51:24.150Z · comments (26)

[link] Sabotage Evaluations for Frontier Models
David Duvenaud (david-duvenaud) · 2024-10-18T22:33:14.320Z · comments (11)

It's time for a self-reproducing machine
Carl Feynman (carl-feynman) · 2024-08-07T21:52:22.819Z · comments (68)

2024 Petrov Day Retrospective
Ben Pace (Benito) · 2024-09-28T21:30:14.952Z · comments (25)

SB 1047: Final Takes and Also AB 3211
Zvi · 2024-08-27T22:10:07.647Z · comments (11)

You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi (matolcsid) · 2024-09-29T16:59:43.942Z · comments (171)

[question] Am I confused about the "malign universal prior" argument?
nostalgebraist · 2024-08-27T23:17:22.779Z · answers+comments (33)

Information vs Assurance
johnswentworth · 2024-10-20T23:16:25.762Z · comments (7)

[link] Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi (mtrazzi) · 2024-10-28T20:17:47.465Z · comments (5)

The case for unlearning that removes information from LLM weights
Fabien Roger (Fabien) · 2024-10-14T14:08:04.775Z · comments (14)

Defining alignment research
Richard_Ngo (ricraz) · 2024-08-19T20:42:29.279Z · comments (23)

Singular learning theory: exercises
Zach Furman (zfurman) · 2024-08-30T20:00:03.785Z · comments (5)

[link] Executable philosophy as a failed totalizing meta-worldview
jessicata (jessica.liu.taylor) · 2024-09-04T22:50:18.294Z · comments (40)

Research update: Towards a Law of Iterated Expectations for Heuristic Estimators
Eric Neyman (UnexpectedValues) · 2024-10-07T19:29:29.033Z · comments (2)

Solving adversarial attacks in computer vision as a baby version of general AI alignment
Stanislav Fort (stanislavfort) · 2024-08-29T17:17:47.136Z · comments (8)

[link] Self-Help Corner: Loop Detection
adamShimi · 2024-10-02T08:33:23.487Z · comments (6)

There is a globe in your LLM
jacob_drori (jacobcd52) · 2024-10-08T00:43:40.300Z · comments (4)

GPT-o1
Zvi · 2024-09-16T13:40:06.236Z · comments (34)

Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (43)

[question] What are the best arguments for/against AIs being "slightly 'nice'"?
Raemon · 2024-09-24T02:00:19.605Z · answers+comments (51)

5 homegrown EA projects, seeking small donors
Austin Chen (austin-chen) · 2024-10-28T23:24:25.745Z · comments (0)

Newsom Vetoes SB 1047
Zvi · 2024-10-01T12:20:06.127Z · comments (6)

[Intuitive self-models] 1. Preliminaries
Steven Byrnes (steve2152) · 2024-09-19T13:45:27.976Z · comments (20)

Science advances one funeral at a time
Cameron Berg (cameron-berg) · 2024-11-01T23:06:19.381Z · comments (9)

Self-prediction acts as an emergent regularizer
Cameron Berg (cameron-berg) · 2024-10-23T22:27:03.664Z · comments (4)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

remmelt-ellen on If we had known the atmosphere would ignite

Just found your insightful comment. I've been thinking about this for three years. Here are some thoughts expanding on your ideas:

my idea is more about whether alignment could require that the AGI is able to predict its own results and effects on the world (or the results and effects of other AGIs like it, as well as humans)...

In other words, alignment requires sufficient control. Specifically, it requires AGI to have a control system with enough capacity to detect, model, simulate, evaluate, and correct outside effects propagated by the AGI's own components.

... and that proved generally impossible such that even an aligned AGI can only exist in an unstable equilibrium state in which there exist situations in which it will become unrecoverably misaligned, and we just don't know which.

For example, what if AGI is in some kind of convergence basin [LW · GW] where the changing situations/conditions tend to converge outside the ranges humans can survive under?

so we can assume that they will have to be somehow interpreted by the AGI itself who is supposed to hold them

There's a problem you are pointing of somehow mapping the various preferences expressed over time by diverse humans from within their (perceived) contexts – onto reference values. This involves making (irreconcilable) normative assumptions of how to map the dimensionality of the raw expressions of preferences onto internal reference values. Basically, you're dealing with NP-complex combinatorics such as encountered with the knapsack problem.

Further, it raises the question of how to make comparisons across all the possible concrete outside effects the machinery could have against the internal reference values, such to identify misalignments/errors to correct. Ie. just having internalised abstract values is not enough – there would have to be some robust implementation process that translates the values into concrete effects.

dusandnesic on Live Machinery: Interface Design Workshop for AI Safety @ EA Hotel

I'm very sad I cannot attend at that time, but I am hyped about this and believe it to be valuable, so I am writing this endorsement as a signal to others. I've also recommended this to some of my friends, but alas UK visa is hard to get on such short notice. When you run it in Serbia, we'll have more folks from the eastern bloc represented ;)

remmelt-ellen on If we had known the atmosphere would ignite

No actually, assuming the machinery has a hard substrate and is self-maintaining is enough.

remmelt-ellen on If we had known the atmosphere would ignite

we could create aligned ASI by simulating the most intelligent and moral people

This is not an existence proof, because it does not take into account the difference in physical substrates.

Artificial General Intelligence would be artificial, by definition. In fact, what allows for the standardisation of hardware components is the fact that the (silicon) substrate is hard under human living temperatures and pressures. That allows for configurations to stay compartmentalised and stable.

Human “wetware” has a very different substrate. It’s a soup of bouncing organic molecules constantly reacting under living temperatures and pressures

Here’s why the substrate distinction matters [LW · GW].

brambleboy on Matt Goldenberg's Short Form Feed

While the broader message might be good, the study the video is about didn't replicate.

arepo on What TMS is like

Did the ringing go away over time, or was it permanent?

towards_keeperhood on Could orcas be (trained to be) smarter than humans? 

In the wikipedia list, the estimated number of neurons in the neocortex of a blue whale is 5 billion (compared to 43 billion in orcas), even though blue whales are much larger. (Unfortunately the blue whale estimate is just an estimate and not grounded in optical or isotropic fractionation measurements.)

(EDIT: Hm interesting, the linked reddit post mentions 15billion for blue whales. Not sure what is correct.)

gunnar_zarncke on Could orcas be (trained to be) smarter than humans? 

As I commented [LW(p) · GW(p)] on Are big brains for processing sensory input? [LW · GW] I predict that the brain regions of a whale or Orca responsible for spatiotemporal learning and memory are a big part of their encephalization.

tslarm on Update on the Mysterious Trump Buyers on Polymarket

Can't this only be judged in retrospect, and over a decent sample size? If all the markets did was reflect the public expert consensus, they wouldn't be very useful; the possibility that they're doing significantly better is still open.

(I'm assuming that by "every other prediction source" you mean everything other than prediction/betting markets, because it sounds like Polymarket is no longer out of line with the other markets. Betfair is the one I keep an eye on, and that's at 60/40 too.)

aysja on johnswentworth's Shortform

I think I probably agree, although I feel somewhat wary about it. My main hesitations are:

The lack of epistemic modifiers seems off to me, relative to the strength of the arguments they’re making. Such that while I agree with many claims, my imagined reader who is coming into this with zero context is like “why should I believe this?” E.g., “Without intervention, humanity will be summarily outcompeted and relegated to irrelevancy,” which like, yes, but also—on what grounds should I necessarily conclude this? They gave some argument along the lines of “intelligence is powerful,” and that seems probably true, but imo not enough to justify the claim that it will certainly lead to our irrelevancy. All of this would be fixed (according to me) if it were framed more as like “here are some reasons you might be pretty worried,” of which there are plenty, or "here's what I think," rather than “here is what will definitely happen if we continue on this path,” which feels less certain/obvious to me.
Along the same lines, I think it’s pretty hard to tell whether this piece is in good faith or not. E.g., in the intro Connor writes “The default path we are on now is one of ruthless, sociopathic corporations racing toward building the most intelligent, powerful AIs as fast as possible to compete with one another and vie for monopolization and control of both the market and geopolitics.” Which, again, I don’t necessarily disagree with, but my imagined reader with zero context is like “what, really? sociopaths? control over geopolitics?” I.e., I’m expecting readers to question the integrity of the piece, and to be more unsure of how to update on it (e.g. "how do I know this whole thing isn't just a strawman?" etc.).
There are many places where they kind of just state things without justifying them much. I think in the best case this might cause readers to think through whether such claims make sense (either on their own, or by reading the hyperlinked stuff—both of which put quite a lot of cognitive load on them), and in the worst case just causes readers to either bounce or kind of blindly swallow what they’re saying. E.g., “Black-Box Evaluations can only catch all relevant safety issues insofar as we have either an exhaustive list of all possible failure modes, or a mechanistic model of how concrete capabilities lead to safety risks.” They say this without argument and then move on. And although I agree with them (having spent a lot of time thinking this through myself), it’s really not obvious at first blush. Why do you need an exhaustive list? One might imagine, for instance, that a small number of tests would generalize well. And do you need mechanistic models? Sometimes medicines work safely without that, etc., etc. I haven’t read the entire Compendium closely, but my sense is that this is not an isolated incident. And I don't think this is a fatal flaw or anything—they're moving through a ton of material really fast and it's hard to give a thorough account of all claims—but it does make me more hesitant to use it as the default "here's what's happening" document.

All of that said, I do broadly agree with the set of arguments, and I think it’s a really cool activity for people to write up what they believe. I’m glad they did it. But I’m not sure how comfortable I feel about sending it to people who haven’t thought much about AI.