LessWrong 2.0 Reader

Falling fertility explanations and Israel
Yair Halberstadt (yair-halberstadt) · 2024-04-03T03:27:38.564Z · comments (4)
Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”
Miles Turpin (miles) · 2023-10-03T02:22:00.199Z · comments (0)
[link] A Narrative History of Environmentalism's Partisanship
Jeffrey Heninger (jeffrey-heninger) · 2024-05-14T16:51:01.029Z · comments (3)
[link] Aaron Silverbook on anti-cavity bacteria
DanielFilan · 2023-11-20T03:06:19.524Z · comments (3)
[link] Statement from Scarlett Johansson on OpenAI's use of the "Sky" voice, which was shockingly similar to her own voice.
Linch · 2024-05-20T23:50:28.138Z · comments (8)
[link] New report: A review of the empirical evidence for existential risk from AI via misaligned power-seeking
Harlan · 2024-04-04T23:41:26.439Z · comments (5)
[link] introduction to thermal conductivity and noise management
bhauth · 2024-03-06T23:14:02.288Z · comments (1)
Features and Adversaries in MemoryDT
Joseph Bloom (Jbloom) · 2023-10-20T07:32:21.091Z · comments (6)
The Byronic Hero Always Loses
Cole Wyeth (Amyr) · 2024-02-22T01:31:59.652Z · comments (4)
Good Bings copy, great Bings steal
dr_s · 2024-04-21T09:52:46.658Z · comments (6)
Game Theory without Argmax [Part 2]
Cleo Nardo (strawberry calm) · 2023-11-11T16:02:41.836Z · comments (14)
Late-talking kid part 3: gestalt language learning
Steven Byrnes (steve2152) · 2023-10-17T02:00:05.182Z · comments (5)
Quick evidence review of bulking & cutting
jp · 2024-04-04T21:43:48.534Z · comments (5)
[link] Fifty Flips
abstractapplic · 2023-10-01T15:30:43.268Z · comments (14)
UDT1.01: Plannable and Unplanned Observations (3/10)
Diffractor · 2024-04-12T05:24:34.435Z · comments (0)
What Would a Utopia-Maximizer Look Like?
Thane Ruthenis · 2023-12-20T20:01:18.079Z · comments (23)
[link] Anthropic, Google, Microsoft & OpenAI announce Executive Director of the Frontier Model Forum & over $10 million for a new AI Safety Fund
Zach Stein-Perlman · 2023-10-25T15:20:52.765Z · comments (8)
[question] When did Eliezer Yudkowsky change his mind about neural networks?
[deactivated] (Yarrow Bouchard) · 2023-11-14T21:24:00.000Z · answers+comments (15)
Mentorship in AGI Safety (MAGIS) call for mentors
Valentin2026 (Just Learning) · 2024-05-23T18:28:03.173Z · comments (3)
[link] Lying is Cowardice, not Strategy
Connor Leahy (NPCollapse) · 2023-10-24T13:24:25.450Z · comments (73)
Superforecasting the premises in “Is power-seeking AI an existential risk?”
Joe Carlsmith (joekc) · 2023-10-18T20:23:51.723Z · comments (3)
Resolving von Neumann-Morgenstern Inconsistent Preferences
niplav · 2024-10-22T11:45:20.915Z · comments (5)
[link] Stone Age Herbalist's notes on ant warfare and slavery
trevor (TrevorWiesinger) · 2024-11-09T02:40:01.128Z · comments (0)
Balancing Label Quantity and Quality for Scalable Elicitation
Alex Mallen (alex-mallen) · 2024-10-24T16:49:00.939Z · comments (1)
Interpreting Quantum Mechanics in Infra-Bayesian Physicalism
Yegreg · 2024-02-12T18:56:03.967Z · comments (6)
AI Safety 101: Reward Misspecification
markov (markovial) · 2023-10-18T20:39:34.538Z · comments (4)
I played the AI box game as the Gatekeeper — and lost
datawitch · 2024-02-12T18:39:35.777Z · comments (52)
Verifiable private execution of machine learning models with Risc0?
mako yass (MakoYass) · 2023-10-25T00:44:48.643Z · comments (2)
The Math of Suspicious Coincidences
Roko · 2024-02-07T13:32:35.513Z · comments (3)
The Third Gemini
Zvi · 2024-02-20T19:50:05.195Z · comments (2)
Putting multimodal LLMs to the Tetris test
Lovre · 2024-02-01T16:02:12.367Z · comments (5)
Glomarization FAQ
Zane · 2023-11-15T20:20:49.488Z · comments (5)
Understanding Subjective Probabilities
Isaac King (KingSupernova) · 2023-12-10T06:03:27.958Z · comments (16)
Sparse MLP Distillation
slavachalnev · 2024-01-15T19:39:02.926Z · comments (3)
AI Alignment Breakthroughs this week (10/08/23)
Logan Zoellner (logan-zoellner) · 2023-10-08T23:30:54.924Z · comments (14)
[link] One: a story
Richard_Ngo (ricraz) · 2023-10-10T00:18:31.604Z · comments (0)
[link] The origins of the steam engine: An essay with interactive animated diagrams
jasoncrawford · 2023-11-29T18:30:36.315Z · comments (1)
[question] Current AI safety techniques?
Zach Stein-Perlman · 2023-10-03T19:30:54.481Z · answers+comments (2)
RA Bounty: Looking for feedback on screenplay about AI Risk
Writer · 2023-10-26T13:23:02.806Z · comments (6)
[link] There is no IQ for AI
Gabriel Alfour (gabriel-alfour-1) · 2023-11-27T18:21:26.196Z · comments (10)
[link] When scientists consider whether their research will end the world
Harlan · 2023-12-19T03:47:06.645Z · comments (4)
AI #59: Model Updates
Zvi · 2024-04-11T14:20:06.339Z · comments (2)
Some additional SAE thoughts
Hoagy · 2024-01-13T19:31:40.089Z · comments (4)
Adversarial Robustness Could Help Prevent Catastrophic Misuse
aogara (Aidan O'Gara) · 2023-12-11T19:12:26.956Z · comments (18)
[link] Managing AI Risks in an Era of Rapid Progress
Algon · 2023-10-28T15:48:25.029Z · comments (3)
Information-Theoretic Boxing of Superintelligences
JustinShovelain · 2023-11-30T14:31:11.798Z · comments (0)
The Intentional Stance, LLMs Edition
Eleni Angelou (ea-1) · 2024-04-30T17:12:29.005Z · comments (3)
[question] What are things you're allowed to do as a startup?
Elizabeth (pktechgirl) · 2024-06-20T00:01:59.257Z · answers+comments (9)
Running the Numbers on a Heat Pump
jefftk (jkaufman) · 2024-02-09T03:00:04.920Z · comments (12)
[link] AISN #28: Center for AI Safety 2023 Year in Review
aogara (Aidan O'Gara) · 2023-12-23T21:31:40.767Z · comments (1)