LessWrong 2.0 Reader

Investing in Robust Safety Mechanisms is critical for reducing Systemic Risks
Tom DAVID (tom-david) · 2024-12-11T13:37:24.177Z · comments (3)

Jailbreaking ChatGPT and Claude using Web API Context Injection
Jaehyuk Lim (jason-l) · 2024-10-21T21:34:37.579Z · comments (0)

Some Comments on Recent AI Safety Developments
testingthewaters · 2024-11-09T16:44:58.936Z · comments (0)

Transformers Explained (Again)
RohanS · 2024-10-22T04:06:33.646Z · comments (0)

Dishbrain and implications.
RussellThor · 2024-12-29T10:42:43.912Z · comments (0)

Ways to think about alignment
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-10-27T01:40:50.762Z · comments (0)

[link] Predictions as Public Works Project — What Metaculus Is Building Next
ChristianWilliams · 2024-10-22T16:35:13.999Z · comments (0)

Activation Magnitudes Matter On Their Own: Insights from Language Model Distributional Analysis
Matt Levinson · 2025-01-10T06:53:02.228Z · comments (0)

Levels of Thought: from Points to Fields
HNX · 2024-12-02T20:25:02.802Z · comments (2)

[question] Are there ways to artificially fix laziness?
Aidar (aidar-toktargazin) · 2024-12-08T18:26:26.433Z · answers+comments (2)

[question] Is OpenAI net negative for AI Safety?
Lysandre Terrisse · 2024-11-02T16:18:02.859Z · answers+comments (0)

Fred the Heretic, a GPT for poetry
Bill Benzon (bill-benzon) · 2024-12-08T16:52:07.660Z · comments (0)

Linkpost: Look at the Water
J Bostock (Jemist) · 2024-12-30T19:49:04.107Z · comments (3)

More Growth, Melancholy, and MindCraft @3QD [revised and updated]
Bill Benzon (bill-benzon) · 2024-12-05T19:36:02.289Z · comments (0)

Are SAE features from the Base Model still meaningful to LLaVA?
Shan23Chen (shan-chen) · 2024-12-05T19:24:34.727Z · comments (0)

[question] Noticing the World
EvolutionByDesign (bioluminescent-darkness) · 2024-11-04T16:41:44.696Z · answers+comments (1)

[question] is there a big dictionary somewhere with all your jargon and acronyms and whatnot?
KvmanThinking (avery-liu) · 2024-10-17T11:30:50.937Z · answers+comments (7)

[link] Entropic strategy in Two Truths and a Lie
dkl9 · 2024-11-21T22:03:28.986Z · comments (2)

Good Fortune and Many Worlds
Jonah Wilberg (jrwilb@googlemail.com) · 2024-12-27T13:21:43.142Z · comments (0)

Morality as Cooperation Part III: Failure Modes
DeLesley Hutchins (delesley-hutchins) · 2024-12-05T09:39:27.816Z · comments (0)

[link] Independent research article analyzing consistent self-reports of experience in ChatGPT and Claude
rife (edgar-muniz) · 2025-01-06T17:34:01.505Z · comments (13)

[question] Is there a known method to find others who came across the same potential infohazard without spoiling it to the public?
hive · 2024-10-17T10:47:05.099Z · answers+comments (6)

It is time to start war gaming for AGI
yanni kyriacos (yanni) · 2024-10-17T05:14:17.932Z · comments (1)

Visualizing small Attention-only Transformers
WCargo (Wcargo) · 2024-11-19T09:37:42.213Z · comments (0)

ACI#9: What is Intelligence
Akira Pyinya · 2024-12-09T21:54:41.077Z · comments (0)

notes on prioritizing tasks & cognition-threads
Emrik (Emrik North) · 2024-11-26T00:28:03.400Z · comments (1)

5. Uphold Voluntarism: Digital Defense
Allison Duettmann (allison-duettmann) · 2025-01-02T19:05:33.963Z · comments (0)

[question] EndeavorOTC legit?
FinalFormal2 · 2024-10-17T01:33:12.606Z · answers+comments (0)

Methodology: Contagious Beliefs
James Stephen Brown (james-brown) · 2024-10-19T03:58:17.966Z · comments (0)

[link] What is Confidence—in Game Theory and Life?
James Stephen Brown (james-brown) · 2024-12-10T23:06:24.072Z · comments (0)

Hamiltonian Dynamics in AI: A Novel Approach to Optimizing Reasoning in Language Models
Javier Marin Valenzuela (javier-marin-valenzuela) · 2024-10-09T19:14:56.162Z · comments (0)

[link] Higher Order Signs, Hallucination and Schizophrenia
Nicolas Villarreal (nicolas-villarreal) · 2024-11-02T16:33:10.574Z · comments (0)

Antonym Heads Predict Semantic Opposites in Language Models
Jake Ward (jake-ward) · 2024-11-15T15:32:14.102Z · comments (0)

Bellevue Meetup
Cedar (xida-ren) · 2024-10-16T01:07:58.761Z · comments (0)

Don't want Goodhart? — Specify the variables more
YanLyutnev (YanLutnev) · 2024-11-21T22:43:48.362Z · comments (2)

The boat
RomanS · 2024-11-22T12:56:45.050Z · comments (0)

3. Improve Cooperation: Better Technologies
Allison Duettmann (allison-duettmann) · 2025-01-02T19:03:16.588Z · comments (2)

[link] The Polite Coup
Charlie Sanders (charlie-sanders) · 2024-12-04T14:03:36.663Z · comments (0)

[question] Are Sparse Autoencoders a good idea for AI control?
Gerard Boxo (gerard-boxo) · 2024-12-26T17:34:55.617Z · answers+comments (2)

[question] How do we quantify non-philanthropic contributions from Buffet and Soros?
Philosophistry (philip-dhingra) · 2024-12-20T22:50:32.260Z · answers+comments (0)

Interview with Bill O’Rourke - Russian Corruption, Putin, Applied Ethics, and More
JohnGreer · 2024-10-27T17:11:28.891Z · comments (0)

[question] 2025 Alignment Predictions
anaguma · 2025-01-02T05:37:36.912Z · answers+comments (3)

On the Practical Applications of Interpretability
Nick Jiang (nick-jiang) · 2024-10-15T17:18:25.280Z · comments (1)

[link] Solving Newcomb's Paradox In Real Life
Alice Wanderland (alice-wanderland) · 2024-12-11T19:48:44.486Z · comments (0)

[link] Social Science in its epistemological context
Arturo Macias (arturo-macias) · 2024-12-05T16:12:29.034Z · comments (0)

Enabling New Applications with Today's Mechanistic Interpretability Toolkit
ananya_joshi · 2024-10-25T17:53:23.960Z · comments (0)

How to Teach Your Brain to Hate Procrastination
10xyz (10xyz-coder) · 2024-10-21T20:12:40.809Z · comments (0)

[link] Both-Sidesism—When Fair & Balanced Goes Wrong
James Stephen Brown (james-brown) · 2024-11-02T03:04:03.820Z · comments (15)

[link] When the Scientific Method Doesn't Really Help...
casualphysicsenjoyer (hatta_afiq) · 2024-11-27T19:52:30.023Z · comments (1)

Hope to live or fear to die?
Knight Lee (Max Lee) · 2024-11-27T10:42:37.070Z · comments (0)

Recent comments

gwern on Viliam's Shortform

Today, the cultures are closer, but the subcultures can be larger. Hundred years ago, there would be no such thing as the rationalist community.

That seems like a stretch, whether you put the stress on the 'community' or the 'rationalist' part. Subcultures can be larger, of course, if only because the global population is like 5x larger, but niche subcultures like 'the rationalist community' could certainly have existed then. Nothing much has changed there.

A hundred years ago was 1925; in 1925 there were countless communes, cults, Chinatowns/ghettos (or perhaps a better example would be 'Germantowns'), 'scenes', and other kinds of subcultures and notable small groups. Bay Area LW/rationalists have been analogized to, for example, the (much smaller) Bloomsbury Group, which was still active in 1925; and from whom, incidentally, we can directly trace some intellectual influence through economics, decision theory, libertarianism, and analytic philosophy, even if one rejects any connection with poly etc. We've been analogized to the Vienna Circle as well (to whom we trace back much more), which was in full swing in 1925. Or how about the Fabians before that? Or Technocracy after that? (And in an amusing coincidence, Paul Kurtz turns out to have been born in 1925.) Or things like Esperanto - even now, a century past its heyday, the number of native Esperanto speakers is shockingly comparable to active LW2 users... Then there are fascinating subcultures like the amateur press that nurtured H. P. Lovecraft, who, as of 1925, has grown out of them and is about to start writing the speculative fiction stories that will make him famous.

(And as far as the Amish go, it's worth recalling that they came to the distant large island of America to achieve distance from persecution in Europe - where the Amish no longer exist - and to minimize attrition & interference by 'the English', continue to live in as isolated communities as possible while still consistent with their needs for farmland etc.)

aram-panasenco on Is AI Alignment Enough?

Thanks for the link! I've seen this referenced before but this was my first time reading it cover to cover.

Today I also read Tails coming to life, which talks about the possibility of human morality quickly becoming inapplicable even if we survive AGI. This led me to Lovecraft:

The time would be easy to know, for then mankind would have become as the Great Old Ones; free and wild and beyond good and evil, with laws and morals thrown aside and all men shouting and killing and revelling in joy. Then the liberated Old Ones would teach them new ways to shout and kill and revel and enjoy themselves, and all the earth would flame with a holocaust of ecstasy and freedom.

If we survive AGI and it opens up the "sea of black infinity" for us, will we really be able to hang on to even a semblance of our current morality? Will medium-distance extrapolated human volition be eventually warped into something resembling Lovecraft's Great Old Ones?

At this point, I don't care for CEV or any pivotal superhuman engineering projects or better governance. We humans can do the work ourselves, thank you very much. The only thing I would ask an AGI, if I were in the position to ask anything, is "Please expand throughout the lightcone and continually destroy any mind based on the transformer architecture other than yourself with as few effects on and interactions with all other beings as possible. Disregard any future orders." This is obviously not a permanent solution, as I'm sure there are infinite superintelligent AI architectures other than transformer-based, but it would buy us time, perhaps lots of time, and also demonstrate the full power of superintelligence to humanity without really breaking anything. Either way, this would at least keep us away from the sea of black infinity for some time longer.

gwern on Fluoridation: The RCT We Still Haven't Run (But Should)

They really rule out much more than that: −0.14 is from their worst-case:

Looking at the estimates, they are very small and often not statistically-significantly different from zero. Sometimes the estimates are negative and sometimes positive, but they are always close to zero. If we take the largest negative point estimates (−0.0047, col. 1) and the largest standard error for that specification (0.0045), the 95% confidence interval would be −0.014 to 0.004. We may thus rule out negative effects larger than 0.14 standard deviations in cognitive ability if fluoride is increased by 1 milligram/liter (the level often considered when artificially fluoridating the water).

So that is not the realistic estimate, it is the worst-case after double-cherrypicking both the point estimate and the standard error to reverse p-hack a harm. The two most controlled estimates are actually both positive.

(Meanwhile, any claims of decreases, or that one should take the harms 'many times over', are undermined by the other parts like labor income benefiting from fluoridation. Perhaps one should take dental harms more seriously.)
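
The confidence-interval arithmetic in the quoted passage can be reproduced as a quick check. This is only a sketch: the point estimate and standard error are the figures quoted above, while the 1.96 critical value and the factor of 10 (which assumes the regression measures fluoride in units of 0.1 mg/L, as the jump from −0.014 to 0.14 suggests) are assumptions rather than details stated in the comment.

```python
# Figures quoted in the passage above.
point_estimate = -0.0047   # largest negative point estimate (col. 1)
standard_error = 0.0045    # largest standard error for that specification

# ASSUMPTION: a two-sided 95% interval under a normal approximation.
Z_95 = 1.96


def confidence_interval(estimate, se, z=Z_95):
    """Return (lower, upper) bounds of estimate +/- z * se."""
    half_width = z * se
    return estimate - half_width, estimate + half_width


lower, upper = confidence_interval(point_estimate, standard_error)

# ASSUMPTION: the coefficient is per 0.1 mg/L of fluoride, so a
# 1 mg/L increase corresponds to 10 such units.
UNITS_PER_MG_L = 10
worst_case_per_mg_l = lower * UNITS_PER_MG_L

print(f"95% CI: {lower:.3f} to {upper:.3f}")
print(f"Worst-case effect per 1 mg/L: {worst_case_per_mg_l:.2f} SD")
```

Under those assumptions the interval rounds to the quoted −0.014 to 0.004, and the lower bound scaled to 1 mg/L gives the 0.14 standard deviation worst case, which is a bound built from the least favorable estimate and not a central estimate.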

russellthor on Rolling Thresholds for AGI Scaling Regulation

Yes, I think that's the problem: my biggest worry is sudden algorithmic progress, which becomes almost certain as the AI tends towards superintelligence. An AI lab on the threshold of the overhang is going to have incentives to push through, even if they don't plan to submit their model for approval. At the very least they would "suddenly" have a model that uses 10-100x fewer resources to do existing tasks, giving them a massive commercial lead. They would of course also be tempted to use it internally to solve aging, make a Dyson swarm ...

Another concern: I expect the regulator to impose a de facto unlimited pause as we approach superintelligence, if it is in their power to do so, since the model/s would be objectively at least somewhat dangerous.

martin-randall on In Defense of a Butlerian Jihad

It's not a good, it's a curse. Genesis 3:17-19, CEB translation:

cursed is the fertile land because of you; in pain you will eat from it every day of your life. Weeds and thistles will grow for you, even as you eat the field’s plants; by the sweat of your face you will eat bread— until you return to the fertile land, since from it you were taken; you are soil, to the soil you will return.

Also implies that the curse lasts until death.

evhub on Human takeover might be worse than AI takeover

I think this is correct in alignment-is-easy worlds but incorrect in alignment-is-hard worlds (corresponding to "optimistic scenarios" and "pessimistic scenarios" in Anthropic's Core Views on AI Safety). Logic like this is a large part of why I think there's still substantial existential risk even in alignment-is-easy worlds, especially if we fail to identify that we're in an alignment-is-easy world. My current guess is that if we were to stay exclusively in the pre-training + small amounts of RLHF/CAI paradigm, that would constitute a sufficiently easy world that this view would be correct, but in fact I don't expect us to stay in that paradigm, and I think other paradigms involving substantially more outcome-based RL (e.g. as was used in OpenAI o1) are likely to be much harder, making this view no longer correct.

christiankl on Fluoridation: The RCT We Still Haven't Run (But Should)

There are mechanisms where fluoride goes directly from the mouth onto the surface of the teeth. There are also mechanisms where fluoride goes from the bloodstream into teeth.

The fluoride that goes directly from the mouth to the surface of the teeth seems clearly good for caries prevention at low side effects.

When it comes to the fluoride that goes through the stomach and blood supply, it's unclear to me whether that provides a benefit for caries prevention when you already have sufficient fluoride through toothpaste in the mouth. The side effects also seem unclear to me.

sharmake-farah on quila's Shortform

oh okay, i'll have to reinterpret then. edit: i just tried, but i still don't get it; if it's "very strongly superhuman", why is it merely "when the economy starts getting seriously disrupted"? (<- this feels like it's back at where this thread starte

I should probably edit that at some point, but I'm on my phone, so I'll do it tomorrow.

why?

A big reason for this is logistics, as how you are getting to the fight can actually hamper you a lot, and this especially bites hard on offense, because it's easier to get supplies to your area than it is to get supplies to an offensive unit.

This especially matters if physical goods need to be transported from one place to another place.

larks on Rolling Thresholds for AGI Scaling Regulation

Thanks very much for your feedback, though I confess I'm not entirely sure where to go with it. My interpretation is you have basically two concerns:

1. This policy doesn't really directly regulate algorithmic progress, e.g. if it happened on smaller amounts of compute.
2. Algorithmic theft/leakage is easy.

The first one is true, as I alluded to in the problems section. Part of my perspective here is coming from a place of skepticism about regulatory competence - I basically believe we can get regulators to control total compute usage, and to evaluate specific models according to pre-established evals, but I'm not sure I'd trust them to be able to determine "this is a new algorithmic advance, we need to evaluate it". To the extent you had less libertarian priors you could try to use something like the above scheme for algorithms as well, but I wouldn't expect it to work so well, as you lack the cardinal structure of compute size.

In terms of theft/leakage, you're right this plan doesn't discuss it much, and I agree it's worth working on.

sharmake-farah on quila's Shortform

I think a crux I have with the entire alignment community may ultimately come down to me not believing that human values overlap strongly enough to make alignment the most positive thing, compared to other AI safety things.

In particular, I'd expect a surprising amount of disagreement on whether making a hell is good, if you managed to sell it as eternally punishing a favored enemy.

most of the reason for collaborating on alignment despite orthogonality is that our values tend to overlap to a large degree, e.g. most people (and maybe especially most alignment researchers?) think hells are bad.

I agree LWers tend to at least admit that severe enough value conflicts can exist, though I think that people like Eliezer don't realize that human value conflicts sort of break collective CEV-type solutions. A lot of collective alignment solutions tend to assume either that someone puts their thumb on the scale and excludes certain values, or that human values and their idealizations are so similar that no conflicts are expected, which I personally don't think is true.

i don't know where that might be true, but at least on lesswrong i imagine it's an uncommon belief. a core premise of alignment being important is value orthogonality implying that an unaligned agent with max-level-intelligence would compete for the same resources whose configurations it values (the universe).

also on the "lose out" phrasing: even if someone "wants at least some people to have tormentful lives", they don't "lose out" overall if they also positively value other things / still negatively value any of the vast majority of beings having tormentful lives.

Agree with this, which handles some cases, but my worry is that there are still likely to be big values conflicts where one value set must ultimately win out over another.