LessWrong 2.0 Reader

[link] How much do you believe your results?
Eric Neyman (UnexpectedValues) · 2023-05-06T20:31:31.277Z · comments (14)
Steering GPT-2-XL by adding an activation vector
TurnTrout · 2023-05-13T18:42:41.321Z · comments (97)
[link] Statement on AI Extinction - Signed by AGI Labs, Top Academics, and Many Other Notable Figures
Dan H (dan-hendrycks) · 2023-05-30T09:05:25.986Z · comments (77)
How to have Polygenically Screened Children
GeneSmith · 2023-05-07T16:01:07.096Z · comments (108)
Book Review: How Minds Change
bc4026bd4aaa5b7fe (bc4026bd4aaa5b7fe0bdcd47da7a22b453953f990d35286b9d315a619b23667a) · 2023-05-25T17:55:32.218Z · comments (52)
Predictable updating about AI risk
Joe Carlsmith (joekc) · 2023-05-08T21:53:34.730Z · comments (23)
My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI
Andrew_Critch · 2023-05-24T00:02:08.836Z · comments (39)
Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)
Chris Scammell (chris-scammell) · 2023-05-10T19:04:21.138Z · comments (53)
Announcing Apollo Research
Marius Hobbhahn (marius-hobbhahn) · 2023-05-30T16:17:19.767Z · comments (11)
Twiblings, four-parent babies and other reproductive technology
GeneSmith · 2023-05-20T17:11:23.726Z · comments (32)
Decision Theory with the Magic Parts Highlighted
moridinamael · 2023-05-16T17:39:55.038Z · comments (24)
When is Goodhart catastrophic?
Drake Thomas (RavenclawPrefect) · 2023-05-09T03:59:16.043Z · comments (23)
Prizes for matrix completion problems
paulfchristiano · 2023-05-03T23:30:08.069Z · comments (51)
Request: stop advancing AI capabilities
So8res · 2023-05-26T17:42:07.182Z · comments (23)
[link] Conjecture internal survey: AGI timelines and probability of human extinction from advanced AI
Maris Sala (maris-sala) · 2023-05-22T14:31:59.139Z · comments (5)
A brief collection of Hinton's recent comments on AGI risk
Kaj_Sotala · 2023-05-04T23:31:06.157Z · comments (9)
Sentience matters
So8res · 2023-05-29T21:25:30.638Z · comments (96)
Dark Forest Theories
Raemon · 2023-05-12T20:21:49.052Z · comments (48)
LeCun’s “A Path Towards Autonomous Machine Intelligence” has an unsolved technical alignment problem
Steven Byrnes (steve2152) · 2023-05-08T19:35:19.180Z · comments (37)
AGI safety career advice
Richard_Ngo (ricraz) · 2023-05-02T07:36:09.044Z · comments (24)
Clarifying and predicting AGI
Richard_Ngo (ricraz) · 2023-05-04T15:55:26.283Z · comments (42)
Trust develops gradually via making bids and setting boundaries
Richard_Ngo (ricraz) · 2023-05-19T22:16:38.483Z · comments (12)
Advice for newly busy people
Severin T. Seehrich (sts) · 2023-05-11T16:46:15.313Z · comments (2)
[link] Who regulates the regulators? We need to go beyond the review-and-approval paradigm
jasoncrawford · 2023-05-04T22:11:17.465Z · comments (29)
Some background for reasoning about dual-use alignment research
Charlie Steiner · 2023-05-18T14:50:54.401Z · comments (20)
Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1
StefanHex (Stefan42) · 2023-05-09T19:41:10.528Z · comments (1)
Investigating Fabrication
LoganStrohl (BrienneYudkowsky) · 2023-05-18T17:46:52.783Z · comments (14)
From fear to excitement
Richard_Ngo (ricraz) · 2023-05-15T06:23:18.656Z · comments (8)
Retrospective: Lessons from the Failed Alignment Startup AISafety.com
Søren Elverlin (soren-elverlin-1) · 2023-05-12T18:07:20.857Z · comments (9)
Open Thread With Experimental Feature: Reactions
jimrandomh · 2023-05-24T16:46:39.367Z · comments (189)
A Case for the Least Forgiving Take On Alignment
Thane Ruthenis · 2023-05-02T21:34:49.832Z · comments (82)
Geoff Hinton Quits Google
Adam Shai (adam-shai) · 2023-05-01T21:03:47.806Z · comments (14)
Shah (DeepMind) and Leahy (Conjecture) Discuss Alignment Cruxes
OliviaJ (olivia-jimenez-1) · 2023-05-01T16:47:41.655Z · comments (10)
AI Safety in China: Part 2
Lao Mein (derpherpize) · 2023-05-22T14:50:54.482Z · comments (28)
Most people should probably feel safe most of the time
Kaj_Sotala · 2023-05-09T09:35:11.911Z · comments (28)
[link] DeepMind: Model evaluation for extreme risks
Zach Stein-Perlman · 2023-05-25T03:00:00.915Z · comments (11)
[link] What if they gave an Industrial Revolution and nobody came?
jasoncrawford · 2023-05-17T19:41:32.198Z · comments (10)
[link] Yoshua Bengio: How Rogue AIs may Arise
harfe · 2023-05-23T18:28:27.489Z · comments (12)
An artificially structured argument for expecting AGI ruin
Rob Bensinger (RobbBB) · 2023-05-07T21:52:54.421Z · comments (26)
Input Swap Graphs: Discovering the role of neural network components at scale
Alexandre Variengien (alexandre-variengien) · 2023-05-12T09:41:08.800Z · comments (0)
LessWrong Community Weekend 2023 [Applications now closed]
Henry Prowbell · 2023-05-01T09:08:14.502Z · comments (0)
The bullseye framework: My case against AI doom
titotal (lombertini) · 2023-05-30T11:52:31.194Z · comments (35)
Conditional Prediction with Zero-Sum Training Solves Self-Fulfilling Prophecies
Rubi J. Hudson (Rubi) · 2023-05-26T17:44:35.575Z · comments (13)
Bayesian Networks Aren't Necessarily Causal
Zack_M_Davis · 2023-05-14T01:42:24.319Z · comments (36)
Reacts now enabled on 100% of posts, though still just experimenting
Ruby · 2023-05-28T05:36:40.953Z · comments (73)
Coercion is an adaptation to scarcity; trust is an adaptation to abundance
Richard_Ngo (ricraz) · 2023-05-23T18:14:19.117Z · comments (11)
Lessons learned from offering in-office nutritional testing
Elizabeth (pktechgirl) · 2023-05-15T23:20:10.582Z · comments (11)
Judgments often smuggle in implicit standards
Richard_Ngo (ricraz) · 2023-05-15T18:50:07.781Z · comments (4)
[link] Wikipedia as an introduction to the alignment problem
[deleted] · 2023-05-29T18:43:47.247Z · comments (10)