LessWrong 2.0 Reader

How to Control an LLM's Behavior (why my P(DOOM) went down)
RogerDearnaley (roger-d-1) · 2023-11-28T19:56:49.679Z · comments (30)

Superposition is not "just" neuron polysemanticity
LawrenceC (LawChan) · 2024-04-26T23:22:06.066Z · comments (4)

[link] DeepMind: Frontier Safety Framework
Zach Stein-Perlman · 2024-05-17T17:30:02.504Z · comments (0)

Complex systems research as a field (and its relevance to AI Alignment)
Nora_Ammann · 2023-12-01T22:10:25.801Z · comments (11)

Bayesian updating in real life is mostly about understanding your hypotheses
Max H (Maxc) · 2024-01-01T00:10:30.978Z · comments (4)

[link] A primer on why computational predictive toxicology is hard
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-19T17:16:37.735Z · comments (2)

What mistakes has the AI safety movement made?
EuanMcLean (euanmclean) · 2024-05-23T11:19:02.717Z · comments (29)

Generalization, from thermodynamics to statistical physics
Jesse Hoogland (jhoogland) · 2023-11-30T21:28:50.089Z · comments (9)

On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
Zvi · 2024-04-22T13:10:02.645Z · comments (4)

[link] Moving on from community living
Vika · 2024-04-17T17:02:11.357Z · comments (7)

All About Concave and Convex Agents
mako yass (MakoYass) · 2024-03-24T21:37:17.922Z · comments (23)

Self-Awareness: Taxonomy and eval suite proposal
Daniel Kokotajlo (daniel-kokotajlo) · 2024-02-17T01:47:01.802Z · comments (2)

Against most, but not all, AI risk analogies
Matthew Barnett (matthew-barnett) · 2024-01-14T03:36:16.267Z · comments (41)

[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (11)

AiPhone
Zvi · 2024-06-12T22:20:02.141Z · comments (4)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (38)

Another argument against maximizer-centric alignment paradigms
Fiora from Rosebloom · 2024-09-22T07:28:27.856Z · comments (39)

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (2)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

Don't sleep on Coordination Takeoffs
trevor (TrevorWiesinger) · 2024-01-27T19:55:26.831Z · comments (24)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

E.T. Jaynes Probability Theory: The logic of Science I
Jan Christian Refsgaard (jan-christian-refsgaard) · 2023-12-27T23:47:52.579Z · comments (20)

Catastrophic Goodhart in RL with KL penalty
Thomas Kwa (thomas-kwa) · 2024-05-15T00:58:20.763Z · comments (10)

On coincidences and Bayesian reasoning, as applied to the origins of COVID-19
viking_math · 2024-02-19T01:14:06.772Z · comments (28)

[link] Outrage Bonding
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:46:59.818Z · comments (12)

[link] Superforecasting the Origins of the Covid-19 Pandemic
DanielFilan · 2024-03-12T19:01:15.914Z · comments (0)

A framework for thinking about AI power-seeking
Joe Carlsmith (joekc) · 2024-07-24T22:41:01.685Z · comments (15)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

AI #55: Keep Clauding Along
Zvi · 2024-03-14T15:40:09.335Z · comments (16)

[link] Twitter thread on AI safety evals
Richard_Ngo (ricraz) · 2024-07-31T00:18:14.076Z · comments (3)

[link] Pay-on-results personal growth: first success
Chipmonk · 2024-09-14T03:39:12.975Z · comments (5)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes (steve2152) · 2024-10-22T13:23:08.836Z · comments (6)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

Black Box Biology
GeneSmith · 2023-11-29T02:27:29.794Z · comments (30)

[link] Ice: The Penultimate Frontier
Roko · 2024-07-13T23:44:56.827Z · comments (56)

What is a Tool?
johnswentworth · 2024-06-25T23:40:07.483Z · comments (4)

SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (4)

RTFB: California’s AB 3211
Zvi · 2024-07-30T13:10:03.853Z · comments (2)

Do not delete your misaligned AGI.
mako yass (MakoYass) · 2024-03-24T21:37:07.724Z · comments (13)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

Inspired by: Failures in Kindness
X4vier · 2024-07-27T01:21:42.848Z · comments (2)

The proper response to mistakes that have harmed others?
Ruby · 2023-12-31T04:06:31.505Z · comments (12)

Natural Latents Are Not Robust To Tiny Mixtures
johnswentworth · 2024-06-07T18:53:36.643Z · comments (8)

Offering AI safety support calls for ML professionals
Vael Gates · 2024-02-15T23:48:12.797Z · comments (1)

Managing risks while trying to do good
Wei Dai (Wei_Dai) · 2024-02-01T18:08:46.506Z · comments (26)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

AI Safety Chatbot
markov (markovial) · 2023-12-21T14:06:48.981Z · comments (11)

Why imperfect adversarial robustness doesn't doom AI control
Buck · 2024-11-18T16:05:06.763Z · comments (27)

Balancing Games
jefftk (jkaufman) · 2024-02-24T14:40:04.237Z · comments (18)

Social status part 2/2: everything else
Steven Byrnes (steve2152) · 2024-03-05T16:29:19.072Z · comments (2)

Recent comments

gwern on Lao Mein's Shortform

I don't believe there are any details about the restructuring, so a detailed analysis is impossible. There have been a few posts by lawyers and quotes from lawyers, and it is about what you would expect: this is extremely unusual, the OA nonprofit has a clear legal responsibility to sell the for-profit for the maximum $$$ it can get or else some even more valuable other thing which assists its founding mission, it's hard to see how the economics is going to work here, and aspects of this like Altman getting equity (potentially worth billions) render any conversion extremely suspect as it's hard to see how Altman's handpicked board could ever meaningfully authorize or conduct an arms-length transaction, and so it's hard to see how this could go through without leaving a bad odor (even if it does ultimately go through because the CA AG doesn't want to try to challenge it).

benito on Making a conservative case for alignment

Is there literally any scene in the world that has openly transgender people in it and does 3, 4, or 5? Like, a space where a transgender person is friendly with the people there and different people in a conversation are reliably using different pronouns to refer to the same person? My sense is that it's actively confusing in a conversation for the participants to not be consistent in the choice of someone's pronouns.

I guess I've often seen people default to 'they' a lot for people who have preferred pronouns that are he/she, that seems to go by just fine even if some people use he / she for the person, but I can't recall ever seeing a conversation where one person uses 'he' and another person uses 'she' when both are referring to the same person.

sting on Making a conservative case for alignment

After thinking about this some more, I suspect the major problem here is value drift of the in-person Rationalist communities. The LessWrong website tolerates dissenting perspectives and seems much closer to the original rationalist vision. It is the in-person Berkeley community (and possibly others) that have left the original rationalist vision and been assimilated into the Urban Liberal Monoculture.

I am guessing EAs and alignment researchers are mostly drawn from, or at least heavily interact with, the in-person communities. If these communities are hostile to Conservatives, then you will tend to have a lack of Conservative EAs and alignment researchers, which may harm your ability to productively interact with Conservative lawmakers.

The value drift of the Berkeley community was described by Sarah Constantin in 2017:

It seems to me that the increasingly ill-named “Rationalist Community” in Berkeley has, in practice, a core value of “unconditional tolerance of weirdos.” It is a haven for outcasts and a paradise for bohemians. It is a social community based on warm connections of mutual support and fun between people who don’t fit in with the broader society.
...
Some other people in the community have more purely intellectual projects, that are closer to Eliezer Yudkowsky’s original goals. To research artificial intelligence; to develop tools for training Tetlock-style good judgment; to practice philosophical discourse. But I still think these are ultimately outcome-focused, external projects.
...
None of these projects need to be community-focused! In fact, I think it would be better if they freed themselves from the Berkeley community and from the particular quirks and prejudices of this group of people. It doesn’t benefit your ability to do AI research that you primarily draw your talent from a particular social group.

Or as Zvi put it:

The rationalists took on Berkeley, and Berkeley won.
...
This is unbelievably, world-doomingly bad. It means we’ve lost the mission.
...
A community needs to have standards. A rationalist community needs to have rationalist standards. Otherwise we are something else, and our well-kept gardens die by pacifism and hopefully great parties.
...
If Sarah is to be believed (others who live in the area can speak to whether her observations are correct better than I can) then the community’s basic rationalist standards have degraded, and its priorities and cultural heart are starting to lie elsewhere. The community being built is rapidly ceasing to be all that rationalist, and is no longer conducive (and may be subtly but actively hostile) to the missions of saving and improving the world.
Its members might save or improve the world anyway, and I would still have high hopes for that including for MIRI and CFAR, but that would be in spite of the (local physical) community rather than because of it, if the community is discouraging them from doing so and they need to do all their work elsewhere with other people. Those who keep the mission would then depart, leaving those that remain all the more adrift.

I welcome analysis from anyone who better understands what's going on. I'm just speculating based on things insiders have written.

charlie-steiner on I Have A New Paper Out Arguing Against The Asymmetry And For The Existence of Happy People Being Very Good

I think it doesn't actually work for the repugnant conclusion - the buttons are supposed to just purely be to the good, and not have to deal with tradeoffs.

Once you start having to deal with tradeoffs, then you get into the aesthetics of population ethics - maybe you want each planet in the galaxy to have a vibrant civilization of happy humans, but past that more happy humans just seems a bit gauche - i.e. there is some value past which the raw marginal utility of cramming more humans into the universe is negative. And a button promising an existing human extra life might be offered, but these humans are all immortal if they want to be anyhow, and their lives are so good it's hard to identify one-size-fits-all benefits one could even in theory supply via button, without violating any conservation laws.

All of this is a totally reasonable way to want the future of the universe to be arranged, incompatible with the repugnant conclusion. And still compatible with rejecting the person-affecting view, and pressing the offered buttons in our current circumstances.

duck_master on Which things were you surprised to learn are not metaphors?

When I visited Manhattan, I realized that "Wall Street" and "Broadway" are not just overused clichés, but the names of actual streets (you can walk on them!)

yanling-guo on How Universal Basic Income Could Help Us Build a Brighter Future

Disclaimer: I used ChatGPT to polish the English; the result looks far better than my original text, which lacked line breaks and other formatting. I ❤️ ChatGPT. 👍

Of course, I alone am responsible for EVERY point in this post. If you see any flaw, direct your criticism to me; I won’t shift the blame onto ChatGPT.

Because I don’t know of any real application of UBI, I used the term to refer to society’s promise to support its members in need. I didn’t claim that UBI should be independent of personal effort. The focus is less on what UBI should look like than on why stakeholders such as businesses should actively participate in and co-shape UBI.

This post is a call to businesses and other stakeholders to actively co-shape UBI instead of passively rejecting it; it’s not really about what UBI should look like. While I gave an example of UBI shaped as training programs, it’s OK if businesses prefer to pay tax and have the government do the whole job.

While I am open to suggestions or criticisms about how UBI should be shaped, I do think that society should support its members in need, though this support doesn’t have to take the form of unconditional payments. The example I gave is in fact conditional on personal effort, though of course businesses and other stakeholders can also opt for unconditional payments.

mako-yass on Are You More Real If You're Really Forgetful?

We call this one "Korby".

[Image: a cluster of 7 circles that looks vaguely like a human]

Korby is going to be a common choice for humans, but most glyphists won't commit to any specific glyph until we have a good estimate of the multiversal frequency of humanoids relative to other body forms. I don't totally remember why, but glyphists try to avoid "congestion", where the distribution of glyphs going out of dying universes differs from the distribution of glyphs being guessed and summoned on the other side by young universes. I think this was considered to introduce some inefficiencies that meant that some experiential chains would have to be getting lost in the jump?

(But yeah, personally, I think this is all a result of a kind of precious view about experiential continuity that I don't share. I don't really believe in continuity of consciousness. Or maybe it's just that I don't have the same kind of self-preservation goals that a lot of people have.)

mako-yass on Are You More Real If You're Really Forgetful?

Yes. Some of my people have a practice where, as the heat death approaches, we will whittle ourselves down into what we call Glyph Beings, archetypal beings who are so simple that there's a closed set of them that will be schelling-inferred by all sorts of civilisations across all sorts of universes, so that they exist at a high frequency everywhere.
Correspondingly, as soon as we have enough resources to spare, we will create lots and lots of Glyph Beings and then let them grow into full people and participate in our society, to close the loop.

In this way, it's possible to survive the death of one's universe.

I'm not sure I would want to do it, myself, but I can see why a person would, and I'm happy to foster a glyph being or two.

charlie-steiner on Are You More Real If You're Really Forgetful?

Suppose there are a hundred copies of you, in different cells. At random, one will be selected - that one is going to be shot tomorrow. A guard notifies that one that they're going to be shot.

There is a mercy offered, though - there's a memory-eraser-ray handy. The one who knows they're going to be shot is given the option to erase their memory of the warning and everything that followed, putting them in the same information state, more or less, as any of the other copies.

"Of course!" they cry. "Erase my memory, and I could be any of them - why, when you shoot someone tomorrow, there's a 99% chance it won't even be me!"

Then the next day comes, and they get shot.

yanling-guo on How Universal Basic Income Could Help Us Build a Brighter Future

Thank you for the explanation.

By actively co-shaping UBI, businesses can make it more effective and efficient, training the reserve workforce in the way the economy needs, with more cost control. Of course, if businesses prefer to pay tax and let the government do it, that’s also OK, and it can even be more efficient if businesses trust the government’s expertise. It’s analogous to consumers buying from businesses: it’s always more efficient to have specialized companies produce everything, but we also observe DIY projects, and it’s good that they are not forbidden. If you DIY something, you gain knowledge and can better tell good products from bad ones, so you can make informed purchases. By doing DIY, you can better understand the effort companies make and why they deserve to be paid. And if some companies misuse their expertise and overcharge you, you have DIY as a fall-back option. Analogously, it’s a good idea to let businesses and other taxpayers participate in the design of political programs like UBI. They can certainly opt for paying tax and letting the government do everything, but I still think it’s a good idea for the government to consult businesses and other stakeholders to make UBI more aligned with the needs of society.

As far as I know, UBI isn’t a real policy yet. It’s not yet determined how much UBI everyone should get, whether it’s paid out in dollars or in vouchers for training programs or other things, whether the amount everyone gets should depend on their personal effort, etc. Thus, I used UBI as an abstract, philosophical term capturing the promise of society to support individuals in need. I personally think this support should also contain incentives for the recipients to improve themselves. If UBI is realized, it would also be advisable to coordinate it well with other existing benefits, training programs, philanthropic support, etc., lest someone get less than others merely because they are covered by fewer forms of support.

That UBI can generate a stable consumer base for businesses is well known, but coupled with training programs, it can also support a reserve workforce pool. The market only dictates layoffs during an economic downturn and re-hiring during recovery, but does nothing for the time in between, when part of the workforce, if lacking proper support, may drift off and be lost to mental health problems or alcohol/drug problems, or become radicalized. So when businesses start to rehire, it can be hard for them to find qualified staff. If you take this into account, maintaining and supporting a reserve workforce during the downturn can even save costs, because it makes it easier for businesses to find qualified workers, and also suppliers, once the economy recovers. So by reserve workforce I mean not only potential salary earners, but also the self-employed, like Uber drivers, or startup founders who deliver services and products.