LessWrong 2.0 Reader


next page (older posts) →

How to Make Superbabies
GeneSmith · 2025-02-19T20:39:38.971Z · comments (331)
[link] AI 2027: What Superintelligence Looks Like
Daniel Kokotajlo (daniel-kokotajlo) · 2025-04-03T16:23:44.619Z · comments (85)
[link] How AI Takeover Might Happen in 2 Years
joshc (joshua-clymer) · 2025-02-07T17:10:10.530Z · comments (136)
A Bear Case: My Predictions Regarding AI Progress
Thane Ruthenis · 2025-03-05T16:41:37.639Z · comments (150)
The Case Against AI Control Research
johnswentworth · 2025-01-21T16:03:10.143Z · comments (80)
LessWrong has been acquired by EA
habryka (habryka4) · 2025-04-01T13:09:11.153Z · comments (45)
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Jan Betley (jan-betley) · 2025-02-25T17:39:31.059Z · comments (91)
[link] Will Jesus Christ return in an election year?
Eric Neyman (UnexpectedValues) · 2025-03-24T16:50:53.019Z · comments (44)
Policy for LLM Writing on LessWrong
jimrandomh · 2025-03-24T21:41:30.965Z · comments (57)
[link] Recent AI model progress feels mostly like bullshit
lc · 2025-03-24T19:28:43.450Z · comments (74)
Murder plots are infohazards
Chris Monteiro (chris-topher) · 2025-02-13T19:15:09.749Z · comments (44)
VDT: a solution to decision theory
L Rudolf L (LRudL) · 2025-04-01T21:04:09.509Z · comments (14)
[link] Good Research Takes are Not Sufficient for Good Strategic Takes
Neel Nanda (neel-nanda-1) · 2025-03-22T10:13:38.257Z · comments (27)
So You Want To Make Marginal Progress...
johnswentworth · 2025-02-07T23:22:19.825Z · comments (42)
Arbital has been imported to LessWrong
RobertM (T3t) · 2025-02-20T00:47:33.983Z · comments (31)
[link] METR: Measuring AI Ability to Complete Long Tasks
Zach Stein-Perlman · 2025-03-19T16:00:54.874Z · comments (92)
[link] The Gentle Romance
Richard_Ngo (ricraz) · 2025-01-19T18:29:18.469Z · comments (46)
[link] Tracing the Thoughts of a Large Language Model
Adam Jermyn (adam-jermyn) · 2025-03-27T17:20:02.162Z · comments (22)
[link] A History of the Future, 2025-2040
L Rudolf L (LRudL) · 2025-02-17T12:03:58.355Z · comments (41)
[link] Trojan Sky
Richard_Ngo (ricraz) · 2025-03-11T03:14:00.681Z · comments (39)
[link] Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?
garrison · 2025-02-11T00:20:41.421Z · comments (8)
Eliezer's Lost Alignment Articles / The Arbital Sequence
Ruby · 2025-02-20T00:48:10.338Z · comments (9)
“Sharp Left Turn” discourse: An opinionated review
Steven Byrnes (steve2152) · 2025-01-28T18:47:04.395Z · comments (26)
Mechanisms too simple for humans to design
Malmesbury (Elmer of Malmesbury) · 2025-01-22T16:54:37.601Z · comments (45)
Why White-Box Redteaming Makes Me Feel Weird
Zygi Straznickas (nonagon) · 2025-03-16T18:54:48.078Z · comments (34)
Will alignment-faking Claude accept a deal to reveal its misalignment?
ryan_greenblatt · 2025-01-31T16:49:47.316Z · comments (28)
Intention to Treat
Alicorn · 2025-03-20T20:01:19.456Z · comments (4)
[link] OpenAI: Detecting misbehavior in frontier reasoning models
Daniel Kokotajlo (daniel-kokotajlo) · 2025-03-11T02:17:21.026Z · comments (25)
Catastrophe through Chaos
Marius Hobbhahn (marius-hobbhahn) · 2025-01-31T14:19:08.399Z · comments (17)
Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals
johnswentworth · 2025-01-24T20:20:28.881Z · comments (61)
What Is The Alignment Problem?
johnswentworth · 2025-01-16T01:20:16.826Z · comments (50)
Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations
Nicholas Goldowsky-Dill (nicholas-goldowsky-dill) · 2025-03-17T19:11:00.813Z · comments (7)
Why Have Sentence Lengths Decreased?
Arjun Panickssery (arjun-panickssery) · 2025-04-03T17:50:29.962Z · comments (47)
How will we update about scheming?
ryan_greenblatt · 2025-01-06T20:21:52.281Z · comments (20)
So how well is Claude playing Pokémon?
Julian Bradshaw · 2025-03-07T05:54:45.357Z · comments (74)
[link] On the Rationality of Deterring ASI
Dan H (dan-hendrycks) · 2025-03-05T16:11:37.855Z · comments (34)
[link] Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development
Jan_Kulveit · 2025-01-30T17:03:45.545Z · comments (52)
I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?
shrimpy · 2025-03-16T16:52:42.177Z · comments (25)
[question] Have LLMs Generated Novel Insights?
abramdemski · 2025-02-23T18:22:12.763Z · answers+comments (36)
It's been ten years. I propose HPMOR Anniversary Parties.
Screwtape · 2025-02-16T01:43:14.586Z · comments (3)
Reducing LLM deception at scale with self-other overlap fine-tuning
Marc Carauleanu (Marc-Everin Carauleanu) · 2025-03-13T19:09:43.620Z · comments (40)
Statistical Challenges with Making Super IQ babies
Jan Christian Refsgaard (jan-christian-refsgaard) · 2025-03-02T20:26:22.103Z · comments (26)
OpenAI #10: Reflections
Zvi · 2025-01-07T17:00:07.348Z · comments (7)
[link] Self-fulfilling misalignment data might be poisoning our AI models
TurnTrout · 2025-03-02T19:51:14.775Z · comments (25)
[link] Quotes from the Stargate press conference
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-22T00:50:14.793Z · comments (7)
[link] Conceptual Rounding Errors
Jan_Kulveit · 2025-03-26T19:00:31.549Z · comments (15)
The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better
Thane Ruthenis · 2025-02-21T20:15:11.545Z · comments (51)
Levels of Friction
Zvi · 2025-02-10T13:10:07.224Z · comments (7)
Methods for strong human germline engineering
TsviBT · 2025-03-03T08:13:49.414Z · comments (28)
Don’t ignore bad vibes you get from people
Kaj_Sotala · 2025-01-18T09:20:17.397Z · comments (50)