LessWrong 2.0 Reader

How it feels to have your mind hacked by an AI
blaked · 2023-01-12T00:33:18.866Z · comments (219)
On not getting contaminated by the wrong obesity ideas
Natália (Natália Mendonça) · 2023-01-28T20:18:21.322Z · comments (67)
We don’t trade with ants
KatjaGrace · 2023-01-10T23:50:11.476Z · comments (108)
Basics of Rationalist Discourse
[DEACTIVATED] Duncan Sabien (Duncan_Sabien) · 2023-01-27T02:40:52.739Z · comments (180)
My Model Of EA Burnout
LoganStrohl (BrienneYudkowsky) · 2023-01-25T17:52:42.770Z · comments (49)
Thoughts on the impact of RLHF research
paulfchristiano · 2023-01-25T17:23:16.402Z · comments (101)
Recursive Middle Manager Hell
Raemon · 2023-01-01T04:33:29.942Z · comments (45)
What a compute-centric framework says about AI takeoff speeds
Tom Davidson (tom-davidson-1) · 2023-01-23T04:02:07.672Z · comments (29)
Alexander and Yudkowsky on AGI goals
Scott Alexander (Yvain) · 2023-01-24T21:09:16.938Z · comments (52)
What I mean by "alignment is in large part about making cognition aimable at all"
So8res · 2023-01-30T15:22:09.294Z · comments (25)
[link] Neural networks generalize because of this one weird trick
Jesse Hoogland (jhoogland) · 2023-01-18T00:10:36.998Z · comments (28)
Gradient hacking is extremely difficult
beren · 2023-01-24T15:45:46.518Z · comments (22)
Sapir-Whorf for Rationalists
[DEACTIVATED] Duncan Sabien (Duncan_Sabien) · 2023-01-25T07:58:46.794Z · comments (48)
[link] "Heretical Thoughts on AI" by Eli Dourado
DragonGod · 2023-01-19T16:11:56.567Z · comments (38)
[link] Why didn't we get the four-hour workday?
jasoncrawford · 2023-01-06T21:29:38.995Z · comments (34)
[link] How to slow down scientific progress, according to Leo Szilard
jasoncrawford · 2023-01-05T18:26:12.121Z · comments (18)
Basic Facts about Language Model Internals
beren · 2023-01-04T13:01:35.223Z · comments (18)
Wolf Incident Postmortem
jefftk (jkaufman) · 2023-01-09T03:20:03.723Z · comments (13)
Why I'm joining Anthropic
evhub · 2023-01-05T01:12:13.822Z · comments (4)
Compendium of problems with RLHF
Charbel-Raphaël (charbel-raphael-segerie) · 2023-01-29T11:40:53.147Z · comments (16)
How to Bounded Distrust
Zvi · 2023-01-09T13:10:00.942Z · comments (15)
Soft optimization makes the value target bigger
Jeremy Gillen (jeremy-gillen) · 2023-01-02T16:06:50.229Z · comments (20)
AGI and the EMH: markets are not expecting aligned or unaligned AI in the next 30 years
basil.halperin (bhalperin) · 2023-01-10T16:06:52.329Z · comments (44)
Transcript of Sam Altman's interview touching on AI safety
Andy_McKenzie · 2023-01-20T16:14:18.974Z · comments (41)
The Fountain of Health: a First Principles Guide to Rejuvenation
PhilJackson · 2023-01-07T18:34:12.094Z · comments (38)
Touch reality as soon as possible (when doing machine learning research)
LawrenceC (LawChan) · 2023-01-03T19:11:58.915Z · comments (7)
Running by Default
jefftk (jkaufman) · 2023-01-05T13:50:01.254Z · comments (39)
Large language models learn to represent the world
gjm · 2023-01-22T13:10:38.837Z · comments (19)
Vegan Nutrition Testing Project: Interim Report
Elizabeth (pktechgirl) · 2023-01-20T05:50:03.565Z · comments (37)
2022 was the year AGI arrived (Just don't call it that)
Logan Zoellner (logan-zoellner) · 2023-01-04T15:19:55.009Z · comments (59)
Concrete Reasons for Hope about AI
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2023-01-14T01:22:18.723Z · comments (13)
Parameter Scaling Comes for RL, Maybe
1a3orn · 2023-01-24T13:55:46.324Z · comments (3)
2022 Unofficial LessWrong General Census
Screwtape · 2023-01-30T18:36:30.616Z · comments (33)
Induction heads - illustrated
CallumMcDougall (TheMcDouglas) · 2023-01-02T15:35:20.550Z · comments (8)
Iron deficiencies are very bad and you should treat them
Elizabeth (pktechgirl) · 2023-01-12T09:10:01.240Z · comments (29)
Categorizing failures as “outer” or “inner” misalignment is often confused
Rohin Shah (rohinmshah) · 2023-01-06T15:48:51.739Z · comments (21)
Disentangling Shard Theory into Atomic Claims
Leon Lang (leon-lang) · 2023-01-13T04:23:51.947Z · comments (6)
Review AI Alignment posts to help figure out how to make a proper AI Alignment review
habryka (habryka4) · 2023-01-10T00:19:23.503Z · comments (31)
“Endgame safety” for AGI
Steven Byrnes (steve2152) · 2023-01-24T14:15:32.783Z · comments (10)
Book Review: Worlds of Flow
remember · 2023-01-16T20:17:16.022Z · comments (3)
Childhood Roundup #1
Zvi · 2023-01-06T13:00:00.915Z · comments (27)
[link] The Alignment Problem from a Deep Learning Perspective (major rewrite)
[deleted] · 2023-01-10T16:06:05.057Z · comments (8)
On AI and Interest Rates
Zvi · 2023-01-17T15:00:01.267Z · comments (13)
[link] Confusing the ideal for the necessary
adamShimi · 2023-01-16T17:29:06.932Z · comments (6)
Against Boltzmann mesaoptimizers
porby · 2023-01-30T02:55:12.041Z · comments (6)
Spreading messages to help with the most important century
HoldenKarnofsky · 2023-01-25T18:20:07.322Z · comments (4)
Some Thoughts on AI Art
abramdemski · 2023-01-25T14:18:14.507Z · comments (20)
Wentworth and Larsen on buying time
Akash (akash-wasil) · 2023-01-09T21:31:24.911Z · comments (6)
Pessimistic Shard Theory
Garrett Baker (D0TheMath) · 2023-01-25T00:59:33.863Z · comments (13)
Compounding Resource X
Raemon · 2023-01-11T03:14:08.565Z · comments (5)