LessWrong 2.0 Reader

The Worst Form Of Government (Except For Everything Else We've Tried)
johnswentworth · 2024-03-17T18:11:38.374Z · comments (47)

Limitations on Formal Verification for AI Safety
Andrew Dickson · 2024-08-19T23:03:52.706Z · comments (60)

An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2
Neel Nanda (neel-nanda-1) · 2024-07-07T17:39:35.064Z · comments (16)

[link] Simple probes can catch sleeper agents
Monte M (montemac) · 2024-04-23T21:10:47.784Z · comments (21)

[link] "AI achieves silver-medal standard solving International Mathematical Olympiad problems"
gjm · 2024-07-25T15:58:57.638Z · comments (38)

Processor clock speeds are not how fast AIs think
Ege Erdil (ege-erdil) · 2024-01-29T14:39:38.050Z · comments (55)

On saying "Thank you" instead of "I'm Sorry"
Michael Cohn (michael-cohn) · 2024-07-08T03:13:50.663Z · comments (16)

A Dozen Ways to Get More Dakka
Davidmanheim · 2024-04-08T04:45:19.427Z · comments (11)

Why I don't believe in the placebo effect
transhumanist_atom_understander · 2024-06-10T02:37:07.776Z · comments (22)

The case for training frontier AIs on Sumerian-only corpus
Alexandre Variengien (alexandre-variengien) · 2024-01-15T16:40:22.011Z · comments (15)

Updatelessness doesn't solve most problems
Martín Soto (martinsq) · 2024-02-08T17:30:11.266Z · comments (44)

Notice When People Are Directionally Correct
Chris_Leong · 2024-01-14T14:12:37.090Z · comments (8)

[link] "Can AI Scaling Continue Through 2030?", Epoch AI (yes)
gwern · 2024-08-24T01:40:32.929Z · comments (4)

My simple AGI investment & insurance strategy
lc · 2024-03-31T02:51:53.479Z · comments (27)

[question] Which things were you surprised to learn are not metaphors?
Eric Neyman (UnexpectedValues) · 2024-11-21T18:56:18.025Z · answers+comments (79)

Near-mode thinking on AI
Olli Järviniemi (jarviniemi) · 2024-08-04T20:47:28.085Z · comments (8)

How I started believing religion might actually matter for rationality and moral philosophy
zhukeepa · 2024-08-23T17:40:47.341Z · comments (41)

Circuits in Superposition: Compressing many small neural networks into one
Lucius Bushnaq (Lblack) · 2024-10-14T13:06:14.596Z · comments (8)

Pantheon Interface
NicholasKees (nick_kees) · 2024-07-08T19:03:51.681Z · comments (22)

An even deeper atheism
Joe Carlsmith (joekc) · 2024-01-11T17:28:31.843Z · comments (47)

A Shutdown Problem Proposal
johnswentworth · 2024-01-21T18:12:48.664Z · comments (61)

[link] OpenAI's CBRN tests seem unclear
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:28:30.290Z · comments (6)

Community Notes by X
NicholasKees (nick_kees) · 2024-03-18T17:13:33.195Z · comments (15)

[link] Steering Llama-2 with contrastive activation additions
Nina Panickssery (NinaR) · 2024-01-02T00:47:04.621Z · comments (29)

Things I've Grieved
Raemon · 2024-02-18T19:32:47.169Z · comments (6)

"The Solomonoff Prior is Malign" is a special case of a simpler argument
David Matolcsi (matolcsid) · 2024-11-17T21:32:34.711Z · comments (44)

BIG-Bench Canary Contamination in GPT-4
Jozdien · 2024-10-22T15:40:48.166Z · comments (13)

Parasites (not a metaphor)
lemonhope (lcmgcd) · 2024-08-08T20:07:13.593Z · comments (17)

Why Don't We Just... Shoggoth+Face+Paraphraser?
Daniel Kokotajlo (daniel-kokotajlo) · 2024-11-19T20:53:52.084Z · comments (51)

[question] What do coherence arguments actually prove about agentic behavior?
sunwillrise (andrei-alexandru-parfeni) · 2024-06-01T09:37:28.451Z · answers+comments (35)

Do you believe in hundred dollar bills lying on the ground? Consider humming
Elizabeth (pktechgirl) · 2024-05-16T00:00:05.257Z · comments (22)

[link] Investigating the Chart of the Century: Why is food so expensive?
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-16T13:21:23.596Z · comments (26)

Why I take short timelines seriously
NicholasKees (nick_kees) · 2024-01-28T22:27:21.098Z · comments (29)

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner (ejenner) · 2024-06-04T15:50:47.475Z · comments (14)

Natural Latents: The Math
johnswentworth · 2023-12-27T19:03:01.923Z · comments (37)

What o3 Becomes by 2028
Vladimir_Nesov · 2024-12-22T12:37:20.929Z · comments (12)

A bird's eye view of ARC's research
Jacob_Hilton · 2024-10-23T15:50:06.123Z · comments (12)

Awakening
lsusr · 2024-05-30T07:03:00.821Z · comments (79)

[link] The Dangers of Mirrored Life
Niko_McCarty (niko-2) · 2024-12-12T20:58:32.750Z · comments (7)

RTFB: On the New Proposed CAIP AI Bill
Zvi · 2024-04-10T18:30:08.410Z · comments (14)

AI catastrophes and rogue deployments
Buck · 2024-06-03T17:04:51.206Z · comments (16)

Passages I Highlighted in The Letters of J.R.R.Tolkien
Ivan Vendrov (ivan-vendrov) · 2024-11-25T01:47:59.071Z · comments (10)

The Standard Analogy
Zack_M_Davis · 2024-06-03T17:15:42.327Z · comments (28)

Efficient Dictionary Learning with Switch Sparse Autoencoders
Anish Mudide (anish-mudide) · 2024-07-22T18:45:53.502Z · comments (19)

[link] Miles Brundage resigned from OpenAI, and his AGI readiness team was disbanded
garrison · 2024-10-23T23:40:57.180Z · comments (1)

[question] Which skincare products are evidence-based?
Vanessa Kosoy (vanessa-kosoy) · 2024-05-02T15:22:12.597Z · answers+comments (48)

The Dream Machine
sarahconstantin · 2024-12-05T00:00:05.796Z · comments (6)

A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team
Lee Sharkey (Lee_Sharkey) · 2024-07-18T14:15:50.248Z · comments (18)

AI Alignment Metastrategy
Vanessa Kosoy (vanessa-kosoy) · 2023-12-31T12:06:11.433Z · comments (13)

[link] My Number 1 Epistemology Book Recommendation: Inventing Temperature
adamShimi · 2024-09-08T14:30:40.456Z · comments (18)

Recent comments

jeremy-gillen on Evolution provides no evidence for the sharp left turn

I'm curious whether the recent trend toward bi-level optimization via chain-of-thought was any update for you? I would have thought this would have updated people (partially?) back toward actually-evolution-was-a-decent-analogy.

There's this paragraph, which seems right-ish to me:

In order to experience a sharp left turn that arose due to the same mechanistic reasons as the sharp left turn of human evolution, an AI developer would have to:
Deliberately create a (very obvious^[2]) inner optimizer, whose inner loss function includes no mention of human values / objectives.^[3]
Grant that inner optimizer ~billions of times greater optimization power than the outer optimizer.^[4]
Let the inner optimizer run freely without any supervision, limits or interventions from the outer optimizer.^[5]

Extremely long chains-of-thought on hard problems are pretty much meeting these conditions, right?

anthonyc on Human, All Too Human - Superintelligence requires learning things we can’t teach

This is all true, but I'm not sure the claimed implications are so certain. The problem is, different minds can gain different levels of insight out of the same data and tools.

First, we should assume humanity has enough data to enable the best human minds to reach the highest levels of every capability available to humans with very, very little real-world feedback. It's not ASI in the full sense, but there has never been a human mind that contained all such abilities at once, let alone with an AI's other default advantages.

Second, it seems extremely unlikely to me that the available data does not include patterns no human has ever found and understood. All collected data has yet to be completely correlated and put together in all possible relationships. I don't have a strong sense of the limits of what should be possible with current data. At minimum I expect an ASI to have better pure and applied math tools to apply to any task, and require less data than we do for any given purpose.

Third, with proper tool support, I'm not sure how much physical experimentation and feedback can be substituted with high-quality simulation using software based on known physics, chemistry, and biology. At minimum, this should enable answering a lot of questions that current humanity knows how to answer by formulaic investigation but has never specifically asked or bothered writing down an answer to.

To me this indicates that at the limit of enough compute with better training methods, AI should be able to push at least somewhat beyond the limits of what humans have ever concluded from available data, in every field, before needing to obtain any additional, new data.

sharmake-farah on A shot at the diamond-alignment problem

Randomly read this comment and I really enjoyed it. Turn it into a post? (I understand how annoying structuring complex thoughts coherently can be but maybe do a dialogue or something? I liked this.)

Maybe I should try a dialogue with someone else on this, because I don't think any of my points are very extendible to a full post without someone helping me.

Do you have any specific reason why you're going into QMech when talking about brain-like AGI stuff?

To be frank, this was mostly about clarifying the philosophy around computationalism/human values in general, but I didn't go that deep into QMech for brain-like AGI and don't expect it to be immediately useful for my pursuits, so the only role for QMech here is in clarifying some confusions people have, and QMech wasn't even that necessary to make my points.

When we get into acausality and Everett branches I think we're going a bit off-track. I think computational intractability and observer bias are interesting to bring up, but I find they never lead anywhere. Quantum Mechanics is fundamentally observer invariant and so positing something like MWI is a philosophical stance (that is supported by Occam's razor) but it is still observer dependent; what if there are no observers?

Okay, the thing I think you are pointing to is that the same outcomes/rules can be generated out of ontologically distinct interpretations, and for our purposes, the observer is basically anything that interacts with anything, whether it's a human or particle, and thus saying there are no observers corresponds to saying that there is nothing in the universe, including the forces, and in particular dark energy is exactly 0.

The answer is that it would be a very different universe than our universe is today.

richard_kennaway on Terminal goal vs Intelligence

Leaving aside the conceptualisation of "terminal goals", the agent as described should start up the paperclip factory early enough to produce paperclips when the time comes. Until then it makes cups. But the agent as described does not have a "terminal" goal of cups now and a "terminal" goal of paperclips in future. It has been given a production schedule to carry out. If the agent is a general-purpose factory that can produce a whole range of things, the only "terminal" goal to design it to have is to follow orders. It should make whatever it is told to, and turn itself off when told to.

Unless, of course, people go, "At last, we've created the Sorcerer's Apprentice machine, as warned of in Goethe's cautionary tale, 'The Sorcerer's Apprentice'!"

So if I understand your concept correctly, a superintelligent agent will combine all future terminal goals into a single unchanging goal.

A superintelligent agent will do what it damn well likes, it's superintelligent. :)

anthony-digiovanni on Anthony DiGiovanni's Shortform

Linkpost: Why Evidential Cooperation in Large Worlds might not be action-guiding

A while back I wrote up why I was skeptical of ECL. I think this basically holds up, with the disclaimers at the top of the post. But I don't consider it that important compared to other things relevant to LW that people could be thinking about, so I decided to put it on my blog instead.

dagon on Terminal goal vs Intelligence

Humans face a version of this all the time - different contradictory wants with different timescales and impacts. We don't have and certainly can't access a legible utility function, and it's unknown if any intelligent agent can (none of the early examples we have today can).

So the question as asked is either trivial (it'll depend on the willpower and rationality of the agent whether they optimize for the future or the present), or impossible (goals don't work that way).

chipmonk on Orienting to 3 year AGI timelines

Why SPY over QQQ?

carl-feynman on What are the main arguments against AGI?

One argument against is that I think it’s coming soon, and I have a 40 year history of frothing technological enthusiasm, often predicting things will arrive decades before they actually do. 😀

linda-linsefors on Dress Up For Secular Solstice

The aesthetics have even been considered carefully, although oddly this has not extended to dress (as far as I have seen).

I remember there being some dress instructions/suggestions for last year's Bay solstice. I think we were told to dress in black, blue and gold.

carl-feynman on Shortform

These criticisms are often made of “market dominant minorities”, to use a sociologist’s term for what American Jews and Indian-Americans have in common. Here’s a good short article on the topic: https://scholarship.law.duke.edu/cgi/viewcontent.cgi?article=5582&context=faculty_scholarship