LessWrong 2.0 Reader

Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)
Chris Scammell (chris-scammell) · 2023-05-10T19:04:21.138Z · comments (53)

Dear Self; we need to talk about ambition
Elizabeth (pktechgirl) · 2023-08-27T23:10:04.720Z · comments (25)

On AutoGPT
Zvi · 2023-04-13T12:30:01.059Z · comments (47)

[link] My PhD thesis: Algorithmic Bayesian Epistemology
Eric Neyman (UnexpectedValues) · 2024-03-16T22:56:59.283Z · comments (14)

Pausing AI Developments Isn't Enough. We Need to Shut it All Down
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-04-08T00:36:47.702Z · comments (39)

My Assessment of the Chinese AI Safety Community
Lao Mein (derpherpize) · 2023-04-25T04:21:19.274Z · comments (94)

The case for ensuring that powerful AIs are controlled
ryan_greenblatt · 2024-01-24T16:11:51.354Z · comments (66)

Failures in Kindness
silentbob · 2024-03-26T21:30:11.052Z · comments (27)

The 101 Space You Will Always Have With You
Screwtape · 2023-11-29T04:56:40.240Z · comments (20)

Munk AI debate: confusions and possible cruxes
Steven Byrnes (steve2152) · 2023-06-27T14:18:47.694Z · comments (21)

[link] "No-one in my org puts money in their pension"
Tobes (tobias-jolly) · 2024-02-16T18:33:28.996Z · comments (7)

My views on “doom”
paulfchristiano · 2023-04-27T17:50:01.415Z · comments (34)

Book Review: Going Infinite
Zvi · 2023-10-24T15:00:02.251Z · comments (109)

Yes, It's Subjective, But Why All The Crabs?
johnswentworth · 2023-07-28T19:35:36.741Z · comments (15)

Alignment Implications of LLM Successes: a Debate in One Act
Zack_M_Davis · 2023-10-21T15:22:23.053Z · comments (50)

UFO Betting: Put Up or Shut Up
RatsWrongAboutUAP · 2023-06-13T04:05:32.652Z · comments (207)

My Clients, The Liars
ymeskhout · 2024-03-05T21:06:36.669Z · comments (85)

Policy discussions follow strong contextualizing norms
Richard_Ngo (ricraz) · 2023-04-01T23:51:36.588Z · comments (61)

Self-driving car bets
paulfchristiano · 2023-07-29T18:10:01.112Z · comments (41)

Lessons On How To Get Things Right On The First Try
johnswentworth · 2023-06-19T23:58:09.605Z · comments (56)

[link] Sum-threshold attacks
TsviBT · 2023-09-08T17:13:37.044Z · comments (52)

Catching the Eye of Sauron
Casey B. (Zahima) · 2023-04-07T00:40:46.556Z · comments (68)

Announcing MIRI’s new CEO and leadership team
Gretta Duleba (gretta-duleba) · 2023-10-10T19:22:11.821Z · comments (52)

Brute Force Manufactured Consensus is Hiding the Crime of the Century
Roko · 2024-02-03T20:36:59.806Z · comments (156)

MIRI 2024 Mission and Strategy Update
Malo (malo) · 2024-01-05T00:20:54.169Z · comments (44)

[link] AI presidents discuss AI alignment agendas
TurnTrout · 2023-09-09T18:55:37.931Z · comments (22)

Announcing Apollo Research
Marius Hobbhahn (marius-hobbhahn) · 2023-05-30T16:17:19.767Z · comments (11)

What are the results of more parental supervision and less outdoor play?
juliawise · 2023-11-25T12:52:29.986Z · comments (30)

Ways I Expect AI Regulation To Increase Extinction Risk
1a3orn · 2023-07-04T17:32:48.047Z · comments (32)

Thoughts on responsible scaling policies and regulation
paulfchristiano · 2023-10-24T22:21:18.341Z · comments (33)

CFAR Takeaways: Andrew Critch
Raemon · 2024-02-14T01:37:03.931Z · comments (62)

What I would do if I wasn’t at ARC Evals
LawrenceC (LawChan) · 2023-09-05T19:19:36.830Z · comments (8)

ChatGPT can learn indirect control
Raymond D · 2024-03-21T21:11:06.649Z · comments (23)

Believing In
AnnaSalamon · 2024-02-08T07:06:13.072Z · comments (49)

[link] Cultivating a state of mind where new ideas are born
Henrik Karlsson (henrik-karlsson) · 2023-07-27T09:16:42.566Z · comments (18)

Launching Lightspeed Grants (Apply by July 6th)
habryka (habryka4) · 2023-06-07T02:53:29.227Z · comments (41)

[link] Actually, Othello-GPT Has A Linear Emergent World Representation
Neel Nanda (neel-nanda-1) · 2023-03-29T22:13:14.878Z · comments (24)

[link] Orthogonal: A new agent foundations alignment organization
Tamsin Leake (carado-1) · 2023-04-19T20:17:14.174Z · comments (4)

Modern Transformers are AGI, and Human-Level
abramdemski · 2024-03-26T17:46:19.373Z · comments (89)

My current LK99 questions
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-08-01T22:48:00.733Z · comments (38)

Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense
So8res · 2023-11-24T17:37:43.020Z · comments (83)

Eliezer Yudkowsky’s Letter in Time Magazine
Zvi · 2023-04-05T18:00:01.670Z · comments (86)

Lightcone Infrastructure/LessWrong is looking for funding
habryka (habryka4) · 2023-06-14T04:45:53.425Z · comments (38)

[link] "How could I have thought that faster?"
mesaoptimizer · 2024-03-11T10:56:17.884Z · comments (30)

We're Not Ready: thoughts on "pausing" and responsible scaling policies
HoldenKarnofsky · 2023-10-27T15:19:33.757Z · comments (33)

[link] The Lighthaven Campus is open for bookings
habryka (habryka4) · 2023-09-30T01:08:12.664Z · comments (18)

[link] Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy
garrison · 2024-02-10T19:52:55.191Z · comments (52)

Thoughts on sharing information about language model capabilities
paulfchristiano · 2023-07-31T16:04:21.396Z · comments (34)

If interpretability research goes well, it may get dangerous
So8res · 2023-04-03T21:48:18.752Z · comments (10)

UDT shows that decision theory is more puzzling than ever
Wei Dai (Wei_Dai) · 2023-09-13T12:26:09.739Z · comments (51)

Recent comments

review-bot on 6 non-obvious mental health issues specific to AI safety

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year. Will this post make the top fifty?

cubefox on Mati_Roy's Shortform

I guess for a cat classifier disentanglement is not possible, because it wants to classify things as cats if and only if it believes they are cats. Since values and beliefs are perfectly correlated here, there is no test we could perform which would distinguish what it wants from what it believes.

Though we could assume we don't know what the classifier wants. If it doesn't classify a cat image as "yes", it could be because it is (say) actually a dog classifier, and it correctly believes the image contains something other than a dog. Or it could be because it is indeed a cat classifier, but it mistakenly believes the image doesn't show a cat.

One way to find out would be to give the classifier an image of the same subject, but in higher resolution or from another angle, and check whether it changes its classification to "yes". If it is a cat classifier, it is likely it won't make the mistake again, so it will probably change its classification to "yes". If it is a dog classifier, it will likely stay with "no".

This assumes that mistakes are random and somewhat unlikely, so they will probably disappear when the evidence is better or of a different sort. Beliefs react to changes in evidence of that sort, while values don't.

arisalexis on How to be an amateur polyglot

Yes, I didn't even know the difference :) I thought tap was only for pub beer! Totally disconnected from the exams, where you only dealt with essays.

the-gears-to-ascension on some thoughts on LessOnline

I’m only a year old rationalist

you write really eloquently for your age! and being in uni! wow. I was still learning to walk. kids are so precocious these days

⸮

keltan on some thoughts on LessOnline

“Whiteboards everywhere” and my non-ironic favourite band are debuting songs!!!

But, I’m only a year old rationalist and I live in Australia on a uni student budget. Still… I’m considering flying out. It would be pretty incredible to run some abstract improv workshops with other truth seeking nerds. I think I need to sit down and calculate.

Is this the type of event that a first year rationalist could attend and get value from/be welcome at? What is the likelihood that it will run again next year? Is there a prediction market for that?

the-gears-to-ascension on Raemon's Shortform

a ui on your user page where you get to pick a four letter shortening of your name and a color. the shortening is displayed as

t g
t a

in a tiny color-of-your-choice box. when picking your name, each time you pick a hue and saturation in the color picker (use a standard one, don't build a color picker), it does a query (debounced - I hope you have a standard way to debounce in react elements) for other people on the site who have that initialism, and shows you their colors in a list, along with an indicator min(color_distance(you.color, them.color) for them in other_users).

the color distance indicator could be something like the one from here, which would need transliterating into javascript:

This formula has results that are very close to L*u*v* (with the modified lightness curve) and, more importantly, it is a more stable algorithm: it does not have a range of colours where it suddenly gives far from optimal results. The weights of the formula could be optimized further, but again, the selection of the closest colour is subjective. My goal was to find a reasonable compromise.
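#include <math.h>  /* sqrt */

typedef struct {
   unsigned char r, g, b;
} RGB;

/* Weighted RGB distance; the red and blue weights depend on the mean red level. */
double ColourDistance(RGB e1, RGB e2)
{
  long rmean = ( (long)e1.r + (long)e2.r ) / 2;  /* mean red level, 0..255 */
  long r = (long)e1.r - (long)e2.r;
  long g = (long)e1.g - (long)e2.g;
  long b = (long)e1.b - (long)e2.b;
  /* >>8 divides each weighted term by 256 */
  return sqrt((((512+rmean)*r*r)>>8) + 4*g*g + (((767-rmean)*b*b)>>8));
}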
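
A rough TypeScript transliteration of the C function above, together with the minimum-distance indicator described earlier, might look like the sketch below. The `RGB` and `UserBadge` shapes and the `minDistanceToOthers` name are assumptions made up for illustration, not anything from the site's codebase, and the debounced query that would supply `others` is left out.

```typescript
// Hypothetical shapes for illustration only; not taken from any real codebase.
interface RGB {
  r: number; // each channel is assumed to be an integer in 0..255
  g: number;
  b: number;
}

interface UserBadge {
  initials: string; // the short name shown in the box
  color: RGB;
}

// Transliteration of the C ColourDistance function. Math.floor and >> reproduce
// the integer division and bit shift of the original for inputs in 0..255.
function colourDistance(e1: RGB, e2: RGB): number {
  const rmean = Math.floor((e1.r + e2.r) / 2);
  const r = e1.r - e2.r;
  const g = e1.g - e2.g;
  const b = e1.b - e2.b;
  return Math.sqrt(
    (((512 + rmean) * r * r) >> 8) + 4 * g * g + (((767 - rmean) * b * b) >> 8)
  );
}

// Smallest distance between my colour and the colour of any other user with the
// same initials. Returns null when nobody else uses those initials.
function minDistanceToOthers(mine: UserBadge, others: UserBadge[]): number | null {
  const sameInitials = others.filter((u) => u.initials === mine.initials);
  if (sameInitials.length === 0) {
    return null;
  }
  return Math.min(...sameInitials.map((u) => colourDistance(mine.color, u.color)));
}
```

The largest intermediate product here is 767 * 255 * 255, which fits comfortably in the 32-bit integer range that `>>` works on in JavaScript, so the shift should behave the same as in the C version for valid channel values.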

keltan on Observations on Teaching for Four Weeks

That’s a great question! I’ve been teaching arts classes for a youth charity for 5 years now. Ages range from 5-18. I myself am 23.

I’d say this has happened twice? I’m counting a one-off lesson with some 16-18 year olds a few years ago, and a series of weeks in which I had extremely little control over some 8-10 year olds. In that case I was able to control individuals if they had my full attention, but would ‘lose’ them when I focused on the next kid.

Your question caused me to think of why these things may have happened. Though I’m curious to hear what you think before I spill my guts.

mathieuroy on Let's split the cake, lengthwise, upwise and slantwise

thanks, it worked! https://web.archive.org/web/20150412211654/http://reducing-suffering.org/wp-content/uploads/2015/02/wild-animals_2015-02-28.pdf

mathieuroy on Mati_Roy's Shortform

i want a better conceptual understanding of what "fundamental values" means, and how to disentangle that from beliefs (ex.: in an LLM). like, is there a meaningful way we can say that a "cat classifier" is valuing classifying cats even though it sometimes fails?

nathan-helm-burger on Please stop publishing ideas/insights/research about AI

A bit of a rant, yes, but some good thoughts here.

I agree that unenforceable regulation can be a bad thing. On the other hand, it can also work in some limited ways. For example, the international agreements against heritable human genetic engineering seem to have held up fairly well. But I think that that requires supporting facts about the world to be true. It needs to not be obviously highly profitable to defectors, it needs to be relatively inaccessible to most people (requiring specialized tech and knowledge), it needs to fit with our collective intuitions (bio-engineering humans seems kinda icky to a lot of people).

The trouble is, all of these things fail to help us with the problem of dangerous AI! As you point out, many bitcoin miners have enough GPUs to be dangerous if we get even a couple more orders of magnitude of algorithmic efficiency improvement. So it's accessible. AI and AGI offer many tempting ways to acquire power and money in society. So it's immediately and incrementally profitable. People aren't as widely or instinctively outraged by AI experiments as by bio-engineering experiments. So it's not intuitively repulsive.

So yes, this seems to me to be very much a situation in which we should not place any trust in unenforceable regulation.

I also agree that we probably do need some sort of organization which enforces the necessary protections (detection and destruction) against rogue AI.

And it does seem potentially like a lot of human satisfaction could be bought in the near future with a focus on making sure everyone in the world gets a reasonable minimum amount of satisfaction from their physical and social environments as you describe here:

Usually, the median person is interested in: jobs, a full fridge, rituals, culture, the spread of their opinion leader's information, dopamine, political and other random and inherited values, life, continuation of life, and the like. Provide a universal way of obtaining this and just monitor it calmly.

As Connor Leahy has said, we should be able to build sufficiently powerful tool-AI to not need to build AGI! Stop while we still have control! Use the wealth to buy off those who would try anyway. Also, build an enforcement agency to stop runaway AI or AI misuse.

I don't know how we get there from here though.

Also, the offense-dominant weapons development landscape is looking really grim, and I don't see how to easily patch that.

On the other hand, I don't think we buy ourselves any chance of victory by trying to gag ourselves for fear of speeding up AGI development. It's coming soon regardless of what we do! The race is short now, we need to act fast!

I don't buy the arguments that our discussions here will make a significant impact on the timing of the arrival of AGI. That seems like hubris to me, to imagine we have such substantial effects just from our discussions.

Code? Yes, code can be dangerous and shouldn't be published if so.

Sufficiently detailed technical descriptions of potential advancements? Yeah, I can see that being dangerous.

Unsubstantiated commentary about a published paper being interesting and potentially having both capabilities and alignment value? I am unconvinced that such discussions meaningfully impact the experiments being undertaken in AI labs.