LessWrong 2.0 Reader

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser
habryka (habryka4) · 2024-11-30T02:55:16.077Z · comments (212)

OpenAI Email Archives (from Musk v. Altman and OpenAI blog)
habryka (habryka4) · 2024-11-16T06:38:03.937Z · comments (80)

Alignment Faking in Large Language Models
ryan_greenblatt · 2024-12-18T17:19:06.665Z · comments (53)

[link] Biological risk from the mirror world
jasoncrawford · 2024-12-12T19:07:06.305Z · comments (29)

The Online Sports Gambling Experiment Has Failed
Zvi · 2024-11-11T14:30:04.371Z · comments (28)

You are not too "irrational" to know your preferences.
DaystarEld · 2024-11-26T15:01:42.996Z · comments (50)

Ayn Rand’s model of “living money”; and an upside of burnout
AnnaSalamon · 2024-11-16T02:59:07.368Z · comments (58)

What Goes Without Saying
sarahconstantin · 2024-12-20T18:00:06.363Z · comments (10)

Frontier Models are Capable of In-context Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-12-05T22:11:17.320Z · comments (24)

Making a conservative case for alignment
Cameron Berg (cameron-berg) · 2024-11-15T18:55:40.864Z · comments (68)

[link] Understanding Shapley Values with Venn Diagrams
Carson L · 2024-12-06T21:56:43.960Z · comments (32)

Communications in Hard Mode (My new job at MIRI)
tanagrabeast · 2024-12-13T20:13:44.825Z · comments (24)

Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (26)

o1 is a bad idea
abramdemski · 2024-11-11T21:20:24.892Z · comments (38)

Neutrality
sarahconstantin · 2024-11-13T23:10:05.469Z · comments (27)

[link] Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud · 2024-12-06T22:19:26.717Z · comments (12)

[link] When Is Insurance Worth It?
kqr · 2024-12-19T19:07:32.573Z · comments (54)

"It's a 10% chance which I did 10 times, so it should be 100%"
egor.timatkov · 2024-11-18T01:14:27.738Z · comments (57)

o3
Zach Stein-Perlman · 2024-12-20T18:30:29.448Z · comments (145)

[link] o1: A Technical Primer
Jesse Hoogland (jhoogland) · 2024-12-09T19:09:12.413Z · comments (17)

Repeal the Jones Act of 1920
Zvi · 2024-11-27T15:00:06.801Z · comments (23)

“Alignment Faking” frame is somewhat fake
Jan_Kulveit · 2024-12-20T09:51:04.664Z · comments (13)

Subskills of "Listening to Wisdom"
Raemon · 2024-12-09T03:01:18.706Z · comments (16)

[link] China Hawks are Manufacturing an AI Arms Race
garrison · 2024-11-20T18:17:51.958Z · comments (42)

[question] Which things were you surprised to learn are not metaphors?
Eric Neyman (UnexpectedValues) · 2024-11-21T18:56:18.025Z · answers+comments (79)

"The Solomonoff Prior is Malign" is a special case of a simpler argument
David Matolcsi (matolcsid) · 2024-11-17T21:32:34.711Z · comments (44)

[link] OpenAI's CBRN tests seem unclear
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:28:30.290Z · comments (6)

Why Don't We Just... Shoggoth+Face+Paraphraser?
Daniel Kokotajlo (daniel-kokotajlo) · 2024-11-19T20:53:52.084Z · comments (51)

What o3 Becomes by 2028
Vladimir_Nesov · 2024-12-22T12:37:20.929Z · comments (12)

[link] The Dangers of Mirrored Life
Niko_McCarty (niko-2) · 2024-12-12T20:58:32.750Z · comments (7)

Passages I Highlighted in The Letters of J.R.R.Tolkien
Ivan Vendrov (ivan-vendrov) · 2024-11-25T01:47:59.071Z · comments (10)

The Dream Machine
sarahconstantin · 2024-12-05T00:00:05.796Z · comments (6)

The o1 System Card Is Not About o1
Zvi · 2024-12-13T20:30:08.048Z · comments (5)

You should consider applying to PhDs (soon!)
bilalchughtai (beelal) · 2024-11-29T20:33:12.462Z · comments (19)

DeepSeek beats o1-preview on math, ties on coding; will release weights
Zach Stein-Perlman · 2024-11-20T23:50:26.597Z · comments (23)

Hire (or become) a Thinking Assistant / Body Double
Raemon · 2024-12-23T03:58:42.061Z · comments (27)

Sorry for the downtime, looks like we got DDosd
habryka (habryka4) · 2024-12-02T04:14:30.209Z · comments (13)

The Big Nonprofits Post
Zvi · 2024-11-29T16:10:06.938Z · comments (10)

[link] Announcing turntrout.com, my new digital home
TurnTrout · 2024-11-17T17:42:08.164Z · comments (24)

Ablations for “Frontier Models are Capable of In-context Scheming”
AlexMeinke (Paulawurm) · 2024-12-17T23:58:19.222Z · comments (1)

AIs Will Increasingly Attempt Shenanigans
Zvi · 2024-12-16T15:20:05.652Z · comments (2)

Hierarchical Agency: A Missing Piece in AI Alignment
Jan_Kulveit · 2024-11-27T05:49:04.241Z · comments (20)

A shortcoming of concrete demonstrations as AGI risk advocacy
Steven Byrnes (steve2152) · 2024-12-11T16:48:41.602Z · comments (27)

Takes on "Alignment Faking in Large Language Models"
Joe Carlsmith (joekc) · 2024-12-18T18:22:34.059Z · comments (8)

[link] How to replicate and extend our alignment faking demo
Fabien Roger (Fabien) · 2024-12-19T21:44:13.059Z · comments (1)

Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (13)

MIRI’s 2024 End-of-Year Update
Rob Bensinger (RobbBB) · 2024-12-03T04:33:47.499Z · comments (2)

[link] Seven lessons I didn't learn from election day
Eric Neyman (UnexpectedValues) · 2024-11-14T18:39:07.053Z · comments (33)

The "Think It Faster" Exercise
Raemon · 2024-12-11T19:14:10.427Z · comments (13)

The nihilism of NeurIPS
charlieoneill (kingchucky211) · 2024-12-20T23:58:11.858Z · comments (7)

Recent comments

avturchin on The Economics & Practicality of Starting Mars Colonization

The price of Mars colonization is equal to the price of the first fully self-replicating nanorobot. Anything before it is a waste of resources. And such a nanobot will likely be created by advanced AI.

hmys on Vegans need to eat just enough Meat - emperically evaluate the minimum ammount of meat that maximizes utility

I don't really understand. Why wouldn't you just test to see if you are deficient in things?

I did that, and I wasn't deficient in anything.

I've also (somewhat involuntarily) done the thing you suggest, and I unsurprisingly didn't notice any difference. If anything, I feel a lot better on a vegan diet.

If you want to do the thing he's suggesting here, I'd recommend eating bivalves, like blue mussels or oysters. They are very unlikely to be sentient, they are usually quite cheap, and they contain the nutrients you'd be at risk of becoming deficient in as a vegan, plus other beneficial things like DHA.

donatas-luciunas on Terminal goal vs Intelligence

OK, I'm open to discuss this further using your concept.

As I understand it, you agree that the correct answer is the 2nd?

It is not clear to me what any of this has to do with Orthogonality.

I'm not sure how patient you are, but I can reassure you that we will get to Orthogonality if you don't give up 😄

So if I understand your concept correctly, a superintelligent agent will combine all future terminal goals into a single unchanging goal. How does this work with the fact that the future is unpredictable? Will the agent work towards all possible goals? It is possible that in the future grue will mean green, blue or even red.

silentbob on What Have Been Your Most Valuable Casual Conversations At Conferences?

I have the vague impression that this is true for me as well, and I remember having made that same claim (that spontaneous conversations at conferences seem maybe most valuable) to a friend when traveling home from an EAGx. My personal best guess: planned conversations are usually 30 minutes long, and while there is some interest-based filtering going on, there's usually no guarantee you vibe well with the person. Spontaneous encounters, however, have pretty variable length, so the ones where you're not vibing will naturally be over quickly, whereas the better connections will last longer. So my typical "spontaneous encounter minute" tends to be more enjoyable than my typical "planned 1-1 minute". But it's hard to say how this transfers to instrumental value.

scroogemcduck1 on [deleted]

I think it makes sense to include the podcasts that aren't currently updating - for example, Rationally Speaking's old episodes. Affix needs a new link or an archived version, as the episodes are not listed at the current link, and I'm too lazy to track down the episodes.

richard_kennaway on Terminal goal vs Intelligence

Another way of conceptualising this is to say that the agent has the single unchanging goal of "cups until 2025, thenceforth paperclips".

Compare with the situation of being told to make grue cups, where "grue" means "green until 2025, then blue."

If the agent is not informed in advance, it can still be conceptualised as the agent's goal being to produce whatever it is told to produce — an unchanging goal.

At a high enough level, we can conceive that no goal ever changes. These are the terminal goals. At lower levels, we can see goals as changing all the time in service of the higher goals, as in the case of an automatic pilot following a series of waypoints. But this is to play games in our head, inventing stories that give us different intuitions. How we conceptualise things has no effect on what the AI does in response to new orders.

It is not clear to me what any of this has to do with Orthogonality.

ambigram on Enemies vs Malefactors

This is an important distinction; otherwise you risk getting into unproductive discussions about someone's intent instead of focusing on whether a person's patterns are compatible with your needs or those of your group or community.

It doesn't matter if someone was negligent or malicious: if they are bad at reading your nonverbal cues and you are bad at explicitly saying no to boundary crossing behaviors, you are incompatible and that is reason enough to end the relationship. It doesn't matter if someone is trying their best: if their best is still disruptive to your team, that is reason enough to request they be transferred out.

I can't remember if this essay is where I learned this concept. But remembering this distinction protected me in meaningful ways at least twice.

halinaeth on You Get About Five Words

The comments add great nuance, e.g. "systems/processes greatly expand word count".

But I'd say that, assuming a lack of systems and a randomly selected audience, the author's point stands. After all, there's a reason the media values "sound bites" so much, and those are more like 5 syllables.

Think, "grab em by the ****" and "nasty woman" from the 2016 election.

Would love to be corrected though!

halinaeth on Overconfidence

Makes me think of the concept of "reality distortion fields" as it applies to overconfidence in leaders (I read about this applied to Steve Jobs specifically: his ability to get people to also believe in and work towards the impossible).

Does anyone have the link to what I'm referring to? But overall, I do believe charisma has a lot to do with letting go of the need to have an accurate "map" of yourself and your strengths/shortcomings.

halinaeth on Social Dark Matter

Excellent summary! Would be interested in a list of corollaries to this, i.e.:

a) If "condemned" X is necessary for "prestigious" Y, people with Y will mislead and lie to the public about how they achieved Y, despite wanting others to attain success at Y too. Furthermore, the narrative of their path to achieving Y, with X left out of it, will be extremely uniform and coordinated despite any huge differences amongst people with Y. For example, some Y people have X, some don't, some hope for others to attain Y, some don't, but the "public narrative" that everyone with Y tells will still end up extremely uniform.

This corollary was extremely unintuitive to me. I outlined my experience with a "condemned X" that was often needed for a "prestigious Y" in my direct comment on this post, if anyone is curious about how the corollary plays out in practice.