LessWrong 2.0 Reader

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser
habryka (habryka4) · 2024-11-30T02:55:16.077Z · comments (212)

OpenAI Email Archives (from Musk v. Altman and OpenAI blog)
habryka (habryka4) · 2024-11-16T06:38:03.937Z · comments (80)

Alignment Faking in Large Language Models
ryan_greenblatt · 2024-12-18T17:19:06.665Z · comments (53)

The hostile telepaths problem
Valentine · 2024-10-27T15:26:53.610Z · comments (84)

[link] Survival without dignity
L Rudolf L (LRudL) · 2024-11-04T02:29:38.758Z · comments (28)

[link] I got dysentery so you don’t have to
eukaryote · 2024-10-22T04:55:58.422Z · comments (4)

[link] Biological risk from the mirror world
jasoncrawford · 2024-12-12T19:07:06.305Z · comments (29)

Overview of strong human intelligence amplification methods
TsviBT · 2024-10-08T08:37:18.896Z · comments (141)

The Great Data Integration Schlep
sarahconstantin · 2024-09-13T15:40:02.298Z · comments (16)

The Best Lay Argument is not a Simple English Yud Essay
J Bostock (Jemist) · 2024-09-10T17:34:28.422Z · comments (15)

The Online Sports Gambling Experiment Has Failed
Zvi · 2024-11-11T14:30:04.371Z · comments (28)

Laziness death spirals
PatrickDFarley · 2024-09-19T15:58:30.252Z · comments (36)

the case for CoT unfaithfulness is overstated
nostalgebraist · 2024-09-29T22:07:54.053Z · comments (40)

[link] Explore More: A Bag of Tricks to Keep Your Life on the Rails
Shoshannah Tekofsky (DarkSym) · 2024-09-28T21:38:52.256Z · comments (15)

You are not too "irrational" to know your preferences.
DaystarEld · 2024-11-26T15:01:42.996Z · comments (50)

"Slow" takeoff is a terrible term for "maybe even faster takeoff, actually"
Raemon · 2024-09-28T23:38:25.512Z · comments (69)

Ayn Rand’s model of “living money”; and an upside of burnout
AnnaSalamon · 2024-11-16T02:59:07.368Z · comments (58)

[link] What TMS is like
Sable · 2024-10-31T00:44:22.612Z · comments (23)

The Sun is big, but superintelligences will not spare Earth a little sunlight
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-09-23T03:39:16.243Z · comments (141)

What Goes Without Saying
sarahconstantin · 2024-12-20T18:00:06.363Z · comments (10)

Frontier Models are Capable of In-context Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-12-05T22:11:17.320Z · comments (24)

Making a conservative case for alignment
Cameron Berg (cameron-berg) · 2024-11-15T18:55:40.864Z · comments (68)

The Hopium Wars: the AGI Entente Delusion
Max Tegmark (MaxTegmark) · 2024-10-13T17:00:29.033Z · comments (55)

[link] Understanding Shapley Values with Venn Diagrams
Carson L · 2024-12-06T21:56:43.960Z · comments (32)

[link] The Compendium, A full argument about extinction risk from AGI
adamShimi · 2024-10-31T12:01:51.714Z · comments (52)

Communications in Hard Mode (My new job at MIRI)
tanagrabeast · 2024-12-13T20:13:44.825Z · comments (24)

Cryonics is free
Mati_Roy (MathieuRoy) · 2024-09-29T17:58:17.108Z · comments (42)

Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (26)

A basic systems architecture for AI agents that do autonomous research
Buck · 2024-09-23T13:58:27.185Z · comments (15)

[link] Why I’m not a Bayesian
Richard_Ngo (ricraz) · 2024-10-06T15:22:45.644Z · comments (92)

Skills from a year of Purposeful Rationality Practice
Raemon · 2024-09-18T02:05:58.726Z · comments (18)

Information vs Assurance
johnswentworth · 2024-10-20T23:16:25.762Z · comments (17)

Contra papers claiming superhuman AI forecasting
nikos (followtheargument) · 2024-09-12T18:10:50.582Z · comments (16)

[question] Why is o1 so deceptive?
abramdemski · 2024-09-27T17:27:35.439Z · answers+comments (24)

Struggling like a Shadowmoth
Raemon · 2024-09-24T00:47:05.030Z · comments (38)

Did Christopher Hitchens change his mind about waterboarding?
Isaac King (KingSupernova) · 2024-09-15T08:28:09.451Z · comments (22)

Three Subtle Examples of Data Leakage
abstractapplic · 2024-10-01T20:45:27.731Z · comments (16)

My motivation and theory of change for working in AI healthtech
Andrew_Critch · 2024-10-12T00:36:30.925Z · comments (37)

[link] Overcoming Bias Anthology
Arjun Panickssery (arjun-panickssery) · 2024-10-20T02:01:23.463Z · comments (14)

The Median Researcher Problem
johnswentworth · 2024-11-02T20:16:11.341Z · comments (69)

o1 is a bad idea
abramdemski · 2024-11-11T21:20:24.892Z · comments (38)

The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!
abstractapplic · 2024-10-26T12:34:51.059Z · comments (16)

Neutrality
sarahconstantin · 2024-11-13T23:10:05.469Z · comments (27)

Current safety training techniques do not fully transfer to the agent setting
Simon Lermen (dalasnoin) · 2024-11-03T19:24:51.537Z · comments (8)

My takes on SB-1047
leogao · 2024-09-09T18:38:37.799Z · comments (8)

[link] When Is Insurance Worth It?
kqr · 2024-12-19T19:07:32.573Z · comments (54)

[link] Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud · 2024-12-06T22:19:26.717Z · comments (12)

"It's a 10% chance which I did 10 times, so it should be 100%"
egor.timatkov · 2024-11-18T01:14:27.738Z · comments (57)

A Rocket–Interpretability Analogy
plex (ete) · 2024-10-21T13:55:18.184Z · comments (31)

o3
Zach Stein-Perlman · 2024-12-20T18:30:29.448Z · comments (145)

Recent comments

hmys on Vegans need to eat just enough Meat - emperically evaluate the minimum ammount of meat that maximizes utility

I don't really understand. Why wouldn't you just test to see if you are deficient in things?

I did that, and I wasn't deficient in anything.

I've also (somewhat involuntarily) done the thing you suggest, and I unsurprisingly didn't notice any difference. If anything, I feel a lot better on a vegan diet.

If you want to do the thing he's suggesting here, I'd recommend eating bivalves, like blue mussels or oysters. They are very unlikely to be sentient, they are usually quite cheap, and they contain the nutrients you'd be at risk of becoming deficient in as a vegan, along with other beneficial things like DHA.

donatas-luciunas on Terminal goal vs Intelligence

OK, I'm open to discuss this further using your concept.

As I understand it, you agree that the correct answer is the 2nd?

It is not clear to me what any of this has to do with Orthogonality.

I'm not sure how patient you are, but I can reassure you that we will come to Orthogonality if you don't give up 😄

So if I understand your concept correctly, a superintelligent agent will combine all future terminal goals into a single unchanging goal. How does this work given that the future is unpredictable? Will the agent work towards all possible goals? It is possible that in the future grue will mean green, blue or even red.

silentbob on What Have Been Your Most Valuable Casual Conversations At Conferences?

I have the vague impression that this is true for me as well, and I remember having made that same claim (that spontaneous conversations at conferences seem maybe most valuable) to a friend when traveling home from an EAGx. My personal best guess: planned conversations are usually 30 minutes long, and while there is some interest-based filtering going on, there's usually no guarantee you'll vibe well with the person. Spontaneous encounters, however, have pretty variable length, so the ones where you're not vibing will just be over naturally quickly, whereas the better connections will last longer. So my typical "spontaneous encounter minute" tends to be more enjoyable than my typical "planned 1-1 minute". But it's hard to say how this transfers to instrumental value.

scroogemcduck1 on [deleted]

I think it makes sense to include the podcasts that aren't currently updating - for example, Rationally Speaking's old episodes. Affix needs a new link or an archived version, as the episodes are not listed at the current link, and I'm too lazy to track down the episodes.

richard_kennaway on Terminal goal vs Intelligence

Another way of conceptualising this is to say that the agent has the single unchanging goal of "cups until 2025, thenceforth paperclips".

Compare with the situation of being told to make grue cups, where "grue" means "green until 2025, then blue."

If the agent is not informed in advance, it can still be conceptualised as the agent's goal being to produce whatever it is told to produce — an unchanging goal.

At a high enough level, we can conceive that no goal ever changes. These are the terminal goals. At lower levels, we can see goals as changing all the time in service of the higher goals, as in the case of an automatic pilot following a series of waypoints. But this is to play games in our head, inventing stories that give us different intuitions. How we conceptualise things has no effect on what the AI does in response to new orders.

It is not clear to me what any of this has to do with Orthogonality.

ambigram on Enemies vs Malefactors

This is an important distinction; otherwise you risk getting into unproductive discussions about someone's intent instead of focusing on whether a person's patterns are compatible with your needs or your group's/community's needs.

It doesn't matter if someone was negligent or malicious: if they are bad at reading your nonverbal cues and you are bad at explicitly saying no to boundary crossing behaviors, you are incompatible and that is reason enough to end the relationship. It doesn't matter if someone is trying their best: if their best is still disruptive to your team, that is reason enough to request they be transferred out.

I can't remember if this essay is where I learned this concept. But remembering this distinction protected me in meaningful ways at least twice.

halinaeth on You Get About Five Words

Comments have great nuance i.e. "systems/processes greatly expand word count".

But I'd say that, assuming the lack of a system and a randomly selected audience, the author's point stands. After all, there's a reason the media value "sound bites" so much, and those are more like 5 syllables.

Think, "grab em by the ****" and "nasty woman" from the 2016 election.

Would love to be corrected though!

halinaeth on Overconfidence

Makes me think of the concept of "reality distortion fields" as it applies to overconfidence in leaders (I read about this applied to Steve Jobs specifically: his ability to get people to also believe in & work towards the impossible).

Does anyone have the link to what I'm referring to? But overall, I do believe charisma has a lot to do with letting go of the need to have an accurate "map" of yourself and your strengths/shortcomings.

halinaeth on Social Dark Matter

Excellent summary! Would be interested in a list of corollaries to this, i.e.:

a) If "condemned" X is necessary for "prestigious" Y, people with Y will mislead and lie to the public about how they achieved Y, despite wanting others to attain success at Y too. Furthermore, the narrative of their path to achieving Y without anything to do with X will be extremely uniform & coordinated despite any huge differences amongst people with Y. For example, some Y people have X, some don't, some hope for others to attain Y, some don't, but the "public narrative" that everyone with Y tells will still end up extremely uniform.

This corollary was extremely unintuitive to me. I outlined my experience with a "condemned X" that was often needed for a "prestigious Y" in my direct comment on this post, if anyone is curious how the corollary plays out in practice.

halinaeth on Social Dark Matter

Jimmy phrased it really well: the "lizardmen" don't want to let anyone know precisely because, once you know, you will no longer perceive them as rational/moral humans, as you would have without knowing, but rather as "lizardmen".

"how one might ever become justifiably confident a particular piece of dark matter really doesn't exist or is as rare as you'd suspect it is" - as someone in a "lizardman" community myself (commented regarding my own experience), probably one of the only ways to know for sure is to join as a lizardman. Any other way, you'll be inundated with misinformation, speculation, and even red herrings directed at distorting the "map" as much as possible for anyone trying to understand lizardmen.

For my part, after joining the lizardman community, I did realize the prevalence was about 100x what I'd previously assumed, just as OP predicted.

As for trading action for knowledge, I personally wouldn't share my "membership" with anyone in my close social circle unless they gave me something equally taboo. I wouldn't believe anyone's commitment to not be upset, because often disgust or horror is a gut reaction and uncontrollable. Never mind trusting others to keep any secret without a counterweight.