LessWrong 2.0 Reader

[link] Progress links digest, 2023-11-24: Bottlenecks of aging, Starship launches, and much more
jasoncrawford · 2023-11-24T15:25:07.721Z · comments (1)

Californians, tell your reps to vote yes on SB 1047!
Holly_Elmore · 2024-08-12T19:50:09.817Z · comments (24)

Ronny and Nate discuss what sorts of minds humanity is likely to find by Machine Learning
So8res · 2023-12-19T23:39:59.689Z · comments (30)

Planning to build a cryptographic box with perfect secrecy
Lysandre Terrisse · 2023-12-31T09:31:47.941Z · comments (6)

Movie posters
KatjaGrace · 2024-03-06T06:20:03.034Z · comments (0)

Jobs, Relationships, and Other Cults
Ruby · 2024-03-13T05:58:45.043Z · comments (9)

[Linkpost] Play with SAEs on Llama 3
Tom McGrath · 2024-09-25T22:35:44.824Z · comments (2)

[question] Implications of China's recession on AGI development?
Eric Neyman (UnexpectedValues) · 2024-09-28T01:12:36.443Z · answers+comments (3)

2025 Color Trends
sarahconstantin · 2024-10-07T21:20:03.962Z · comments (7)

Anthropic rewrote its RSP
Zach Stein-Perlman · 2024-10-15T14:25:12.518Z · comments (19)

Monthly Roundup #23: October 2024
Zvi · 2024-10-16T13:50:05.869Z · comments (13)

Signaling with Small Orange Diamonds
jefftk (jkaufman) · 2024-11-07T20:20:08.026Z · comments (1)

[link] Legalize butanol?
bhauth · 2023-12-20T14:24:33.849Z · comments (20)

[link] Understanding Gödel’s completeness theorem
jessicata (jessica.liu.taylor) · 2024-05-27T18:55:02.079Z · comments (0)

Logical Line-Of-Sight Makes Games Sequential or Loopy
StrivingForLegibility · 2024-01-19T04:05:44.782Z · comments (0)

Prepsgiving, A Convergently Instrumental Human Practice
JenniferRM · 2023-11-23T17:24:56.784Z · comments (0)

Apply to the PIBBSS Summer Research Fellowship
Nora_Ammann · 2024-01-12T04:06:58.328Z · comments (1)

Stitching SAEs of different sizes
Bart Bussmann (Stuckwork) · 2024-07-13T17:19:20.506Z · comments (12)

Simple distribution approximation: When sampled 100 times, can language models yield 80% A and 20% B?
Teun van der Weij (teun-van-der-weij) · 2024-01-29T00:24:27.706Z · comments (5)

China-AI forecasts
[deleted] · 2024-02-25T16:49:33.652Z · comments (29)

Instrumental deception and manipulation in LLMs - a case study
Olli Järviniemi (jarviniemi) · 2024-02-24T02:07:01.769Z · comments (13)

I’m confused about innate smell neuroanatomy
Steven Byrnes (steve2152) · 2023-11-28T20:49:13.042Z · comments (2)

[link] FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
Tamay · 2024-11-14T06:13:22.042Z · comments (0)

Nitric oxide for covid and other viral infections
Elizabeth (pktechgirl) · 2024-02-07T21:30:03.774Z · comments (6)

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs
Kola Ayonrinde (kola-ayonrinde) · 2024-08-23T18:52:31.019Z · comments (5)

Individually incentivized safe Pareto improvements in open-source bargaining
Nicolas Macé (NicolasMace) · 2024-07-17T18:26:43.619Z · comments (2)

[link] [Paper] Language Models Don't Learn the Physical Manifestation of Language
Bruce W. Lee (bruce-lee) · 2024-02-22T18:52:32.237Z · comments (23)

[link] Linear infra-Bayesian Bandits
Vanessa Kosoy (vanessa-kosoy) · 2024-05-10T06:41:09.206Z · comments (5)

Forget Everything (Statistical Mechanics Part 1)
J Bostock (Jemist) · 2024-04-22T13:33:35.446Z · comments (6)

Natural abstractions are observer-dependent: a conversation with John Wentworth
Martín Soto (martinsq) · 2024-02-12T17:28:38.889Z · comments (13)

[Interim research report] Evaluating the Goal-Directedness of Language Models
Rauno Arike (rauno-arike) · 2024-07-18T18:19:04.260Z · comments (4)

Medical Roundup #3
Zvi · 2024-07-09T13:10:06.862Z · comments (4)

You're a Space Wizard, Luke
lsusr · 2024-08-18T05:35:39.238Z · comments (6)

[link] Increasing IQ is trivial
George3d6 · 2024-03-01T22:43:32.037Z · comments (60)

AI #48: The Talk of Davos
Zvi · 2024-01-25T16:20:26.625Z · comments (9)

[link] On what research policymakers actually need
MondSemmel · 2024-04-23T19:50:12.833Z · comments (0)

Inducing Unprompted Misalignment in LLMs
Sam Svenningsen (sven) · 2024-04-19T20:00:58.067Z · comments (6)

Mud and Despair (Part 4 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-07T00:14:23.975Z · comments (0)

Monthly Roundup #14: January 2024
Zvi · 2024-01-24T12:50:09.231Z · comments (22)

[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)

Stop talking about p(doom)
Isaac King (KingSupernova) · 2024-01-01T10:57:28.636Z · comments (22)

D&D.Sci: Whom Shall You Call?
abstractapplic · 2024-07-05T20:53:37.010Z · comments (6)

[question] How would you navigate a severe financial emergency with no help or resources?
Tigerlily · 2024-05-02T18:27:51.329Z · answers+comments (22)

AI #70: A Beautiful Sonnet
Zvi · 2024-06-27T14:40:08.087Z · comments (0)

[link] Things You're Allowed to Do: At the Dentist
rbinnn · 2024-01-28T18:39:33.584Z · comments (16)

Losing Faith In Contrarianism
omnizoid · 2024-04-25T20:53:34.842Z · comments (44)

Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.
Josh Levy (josh-levy) · 2024-06-04T15:45:54.399Z · comments (0)

The "context window" analogy for human minds
Ruby · 2024-02-13T19:29:10.387Z · comments (0)

Tort Law Can Play an Important Role in Mitigating AI Risk
Gabriel Weil (gabriel-weil) · 2024-02-12T17:17:59.135Z · comments (9)

[link] Elon files grave charges against OpenAI
mako yass (MakoYass) · 2024-03-01T17:42:13.963Z · comments (10)

Recent comments

zane on Monthly Roundup #24: November 2024

He said it was him on Joe Rogan's podcast.

dakara on The Plan - 2023 Version

That's a really good point. I would like to see John address it, because it seems quite crucial for the overall alignment plan.

sodium on Sodium's Shortform

I think^[1] people^[2] probably trust individual tweets way more than they should.

Like, just because someone sounds very official and serious, and it's a piece of information that's in line with your worldview, doesn't mean it's actually true. Or maybe it is true, but missing important context. Or it's saying A causes B when it's more like A and C and D all cause B together, and actually most of the effect is from C but now you're laser-focused on A.

Also, you should be wary that the tweets you're seeing are optimized for piquing the interest of people like you, not for truth.

I'm definitely not the first person to say this, but it feels worth saying again.

^[1] 75% confident, maybe?
^[2] Including some rationalists on here.

rogerdearnaley on Chat Bankman-Fried: an Exploration of LLM Alignment in Finance

Interesting. I'm disappointed to see the Claude models do so badly. Possibly Anthropic needs to extend their constitutional RLAIF to cover not committing financial crimes? The large difference between O1 Preview and O1 Mini is also concerning.

q-home on Q Home's Shortform

My point is that chairs and humans can be considered in a similar way.

Please explain how your point connects to my original message: are you arguing with it, supporting it, or asking how my idea applies to something?

mitchell_porter on Why We Wouldn't Build Aligned AI Even If We Could

Your desire to do good and your specific proposals are valuable. But you seem to be a bit naive about power, human nature, and the difficulty of doing good even if you have power.

For example, you talk about freeing people under oppressive regimes. But every extant political system and major ideology has some corresponding notion of the greater good, and what you are calling oppressive is supposed to protect that greater good, or to protect the system against encroaching rival systems with different values.

You mention China as oppressive and say Chinese citizens "can do [nothing] to cause meaningful improvement from my perspective". So what is it when Chinese bring sanitation or electricity to a village, or when someone in the big cities invents a new technology or launches a new service? That's Chinese people making life better for Chinese. Evidently your focus is on the one-party politics and the vulnerability of the individual to the all-seeing state. But even those have their rationales. The Leninist political system is meant to keep power in the hands of the representatives of the peasants and the workers. And the all-seeing state is just doing what you want your aligned superintelligence to do - using every means it has, to bring about the better world.

Similar defenses can be made of every western ideology, whether conservative or liberal, progressive or libertarian or reactionary. They all have a concept of the greater good, and they all sacrifice something for the sake of it. In every case, such an ideology may also empower individuals, or specific cliques and classes, to pursue their self-interest under the cover of the ideology. But all the world's big regimes have some kind of democratic morality, as well as a persistent power elite.

Regarding a focus on suffering - the easiest way to abolish suffering is to abolish life. All the difficulties arise when you want everyone to have life, and freedom too, but without suffering. Your principles aren't blind to this, e.g. number 3 ("spread empathy") might be considered a way to preserve freedom while reducing the possibility of cruelty. But consider number 4, "respect diversity". This can clash with your moral urgency. Give people freedom, and they may focus on their personal flourishing, rather than the suffering or oppressed somewhere else. Do you leave them to do their thing, so that the part of life's diversity which they embody can flourish, or do you lean on them to take part in some larger movement?

I note that @daijin has already provided a different set of values which are rivals to your own. Perhaps someone could write the story of a transhuman world in which all the old politics has been abolished, and instead there's a cold war between blocs that have embraced these two value systems!

The flip side of these complaints of mine is that it's also not a foregone conclusion that, if some group manages to create superintelligence and actually knows what they're doing - i.e. they can choose its values with confidence that those values will be maintained - we'll just have perpetual oppression worse than death. As I have argued, every serious political ideology has some notion of the greater good that is part of the ruling elite's culture. That elite may contain a mix of cynics, the morally exhausted and self-interested, the genuinely depraved, and those born to power, but it will also contain people who are fighting for an ideal, and new arrivals with bold ideas and a desire for change; and also those who genuinely see themselves as lovers of their country or their people or humanity, but who also have an enormously high opinion of themselves. The dream of the last kind of person is not some grim hellscape; it's a utopia of genuine happiness where they are also worshipped as transhumanity's greatest benefactor.

Another aspect of what I'm saying is that you feel this pessimistic about the world because you are alienated from all the factions who actually wield power. If you were part of one of those elite clubs that actually has a chance of winning the race to create superintelligence, you might have a more benign view of the prospect that they end up wielding supreme power.

jwray on The hostile telepaths problem

My experience is very different. I feel unitary, without any IFS or Jungian shadow or other sort of subconscious parts trying to deceive my conscious self. I violate quite a lot of social norms without feeling any shame or guilt about it, because I've got an 'internal scorecard'. So long as I'm true to my own values/morality, and I can protect myself with some combination of power / occlumency / disengaging, all three of which come easily to me, social norms don't matter in private.

chris-krapu on interpreting GPT: the logit lens

Ah, got it. Thanks a ton!

seth-herd on How to use bright light to improve your life.

Great post, thank you!

SAD: When I did a very brief lit search, the research showed much larger effects of vitamin D supplementation than light exposure therapy. Of course, they weren't using enough dakka on the light, so both should be used. But two of my close friends with severe SAD were dramatically improved when I got them to supplement D regularly. It's handy that you don't need to take it regularly, just in large doses occasionally (probably don't do more than 50k IU at a time for safety). Sorry I didn't keep the references where I can find them!

Again, doing both is probably a good idea, but most people seem to be vit. D deficient, as you'd expect from a light-exposure-synthesized vitamin, with all of this modern unnatural clothes-wearing and indoors-dwelling.

Back to light, as the standard male night owl (particularly on a flexible WFH schedule): am I understanding you correctly that if I wanted to go to bed earlier (not sure I do, but I probably should), I'd wake up earlier and blast my eyeballs with light right away, then avoid bright light for 3-4 hours before bed? Anything else?

vladimir_nesov on Q Home's Shortform

I'm talking about finding world-models in which real objects (such as "strawberries" or "chairs") can be identified.

My point is that chairs and humans can be considered in a similar way.

The most straightforward way of finding a world-model is just predicting your sensory input. But then you're not guaranteed to get a model in which something corresponding to "real objects" can be easily identified.

There's the world as a whole that generates observations, and particular objects on their own. A model that cares about individual objects needs to consider them separately from the world. The same object in a different world/situation should still make sense, so there are many possibilities for the way an object can be when placed in some context and allowed to develop. This can be useful for modularity, but also for formulating properties of particular objects, in a way that doesn't get distorted by the influence of the rest of the world. Human preferences are one such property.