LessWrong 2.0 Reader

The Mom Test: Summary and Thoughts
Adam Zerner (adamzerner) · 2024-04-18T03:34:21.020Z · comments (3)

Philosophers wrestling with evil, as a social media feed
David Gross (David_Gross) · 2024-06-03T22:25:22.507Z · comments (2)

Misnaming and Other Issues with OpenAI's “Human Level” Superintelligence Hierarchy
Davidmanheim · 2024-07-15T05:50:17.770Z · comments (2)

Mechanistic Interpretability Workshop Happening at ICML 2024!
Neel Nanda (neel-nanda-1) · 2024-05-03T01:18:26.936Z · comments (6)

[link] Contra Acemoglu on AI
Maxwell Tabarrok (maxwell-tabarrok) · 2024-06-28T13:13:15.796Z · comments (0)

[link] Web-surfing tips for strange times
eukaryote · 2024-05-31T07:10:25.805Z · comments (19)

[link] JumpReLU SAEs + Early Access to Gemma 2 SAEs
Senthooran Rajamanoharan (SenR) · 2024-07-19T16:10:54.664Z · comments (10)

Why the Best Writers Endure Isolation
Declan Molony (declan-molony) · 2024-07-16T05:58:25.032Z · comments (6)

[link] Designing for a single purpose
Itay Dreyfus (itay-dreyfus) · 2024-05-07T14:11:22.242Z · comments (12)

D&D.Sci: The Mad Tyrant's Pet Turtles [Evaluation and Ruleset]
abstractapplic · 2024-04-09T14:01:34.426Z · comments (6)

How to do conceptual research: Case study interview with Caspar Oesterheld
Chi Nguyen · 2024-05-14T15:09:30.390Z · comments (5)

SRE's review of Democracy
Martin Sustrik (sustrik) · 2024-08-03T07:20:01.483Z · comments (2)

Anvil Problems
Screwtape · 2024-11-13T22:57:41.974Z · comments (2)

An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)

AI and the Technological Richter Scale
Zvi · 2024-09-04T14:00:08.625Z · comments (8)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)

[link] Book review: Xenosystems
jessicata (jessica.liu.taylor) · 2024-09-16T20:17:56.670Z · comments (18)

[question] If I wanted to spend WAY more on AI, what would I spend it on?
Logan Zoellner (logan-zoellner) · 2024-09-15T21:24:46.742Z · answers+comments (16)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

Evaluating the truth of statements in a world of ambiguous language.
Hastings (hastings-greer) · 2024-10-07T18:08:09.920Z · comments (19)

Toward Safety Case Inspired Basic Research
Lucas Teixeira · 2024-10-31T23:06:32.854Z · comments (2)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (12)

Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes
Anna Gajdova (anna-gajdova) · 2024-10-09T12:56:24.856Z · comments (14)

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (10)

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (0)

Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

Environmental allergies are curable? (Sublingual immunotherapy)
Chipmonk · 2023-12-26T19:05:08.880Z · comments (10)

[link] Constructive Cauchy sequences vs. Dedekind cuts
jessicata (jessica.liu.taylor) · 2024-03-14T23:04:07.300Z · comments (23)

How to safely use an optimizer
Simon Fischer (SimonF) · 2024-03-28T16:11:01.277Z · comments (21)

Critiques of the AI control agenda
Jozdien · 2024-02-14T19:25:04.105Z · comments (14)

[link] Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities
porby · 2024-02-02T05:49:11.189Z · comments (1)

Sora What
Zvi · 2024-02-22T18:10:05.397Z · comments (3)

[link] on neodymium magnets
bhauth · 2024-01-30T15:58:24.088Z · comments (6)

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor
RogerDearnaley (roger-d-1) · 2024-01-09T20:42:28.349Z · comments (8)

Value learning in the absence of ground truth
Joel_Saarinen (joel_saarinen) · 2024-02-05T18:56:02.260Z · comments (8)

[link] Five projects from AI Safety Hub Labs 2023
charlie_griffin (cjgriffin) · 2023-11-08T19:19:37.759Z · comments (1)

2023 Prediction Evaluations
Zvi · 2024-01-08T14:40:07.377Z · comments (0)

Extended Interview with Zhukeepa on Religion
Ben Pace (Benito) · 2024-08-18T03:19:05.625Z · comments (59)

4. Existing Writing on Corrigibility
Max Harms (max-harms) · 2024-06-10T14:08:35.590Z · comments (13)

[link] "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?"
plex (ete) · 2024-05-18T14:09:53.014Z · comments (23)

What distinguishes "early", "mid" and "end" games?
Raemon · 2024-06-21T17:41:30.816Z · comments (22)

Some Experiments I'd Like Someone To Try With An Amnestic
johnswentworth · 2024-05-04T22:04:19.692Z · comments (33)

shortest goddamn bayes guide ever
lukehmiles (lcmgcd) · 2024-05-10T07:06:23.734Z · comments (8)

How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation")
Ruby · 2024-07-19T00:31:38.332Z · comments (21)

Caring about excellence
owencb · 2024-07-22T14:24:37.892Z · comments (4)

Run evals on base models too!
orthonormal · 2024-04-04T18:43:25.468Z · comments (6)

AI #75: Math is Easier
Zvi · 2024-08-01T13:40:05.539Z · comments (25)

Enriched tab is now the default LW Frontpage experience for logged-in users
Ruby · 2024-06-21T00:09:30.441Z · comments (27)

How to hire somebody better than yourself
lukehmiles (lcmgcd) · 2024-08-28T08:12:53.450Z · comments (5)

Decision Theory in Space
lsusr · 2024-08-18T07:02:11.847Z · comments (18)

Recent comments

viliam on Jan_Kulveit's Shortform

It is difficult to prove things, but I strongly suspect that in Slovakia, Ján Čarnogurský is a Russian asset.

In my opinion, the only remaining question is when exactly he was recruited, and how long a game was played on us. I have suspected him for a long time, though most people would probably have called me crazy for that; recently, however, he became openly pro-Russian, to the great surprise of many of his former supporters. So the question is whether I was right and this was a long con, or whether he had a change of mind recently and my previous suspicions were merely a coincidence (homogeneity of the outgroup, etc.).

If this indeed was a long con (maybe, maybe not), then he had a perfect cover story. During communism, he was a lawyer and provided legal support for the anti-Communist opposition. Two years before the fall of communism, he was fired and unemployed. Three months before the fall of communism, he was put in prison. Also, he was strongly religious (perceived as a religious fanatic by some). Remember that Slovakia is a predominantly Catholic country.

After the fall of communism he quickly rose to power. He basically represented the opposition to communism, and the comeback of religious freedom. In the 1990s the political scene of Slovakia was basically two camps: those nostalgic for communism, led by Vladimír Mečiar, and those who opposed communism and wanted to join the West, led by Ján Čarnogurský. So we are talking here about the strongest or second-strongest politician in the country.

I remember some weird opinions of his from that era. For example, he talked a lot about how Slovakia should be "a bridge between Russia and the West", and that we should build a broad-gauge railway across Slovakia (i.e. from the Ukrainian border to the capital city, which is on the western end). If anyone else had said that, people would probably have suspected them of something, but Čarnogurský's anti-communist credentials were just too perfect, so he stayed above suspicion. (From my perspective, perhaps a little paranoid, that sounded a bit like preparing the ground for an easy invasion. I mean, one day, a huge train could arrive from Russia right in our capital city, and if it turned out that the train was full of well-armed soldiers, the invasion could be over before most people even noticed that it had begun. Note: I have no military expertise, so maybe what I am saying here doesn't make sense.)

Then in 1998 he was unexpectedly replaced as leader by Mikuláš Dzurinda, in a weird turn of events that was basically a non-violent coup based on a technicality. (The opposition to Mečiar was always fragmented into multiple political parties, so they always ran as a coalition. Mečiar changed the constitution to make elections much more difficult for coalitions than for individual parties. The opposition parties were like "no problem, we will make a faux political party as a temporary facade for our coalition, win the election, revert the law, disband the temporary party, and return to life as usual", and they put Dzurinda, a relatively unknown young guy, in as the leader of the new party. However, after the election, when they asked him to disband the new party, he was like "LOL, I am the leader of the party that won the election, you guys better shut up", and governed the country.) Those were the best years for Slovakia, politically; we quickly joined the EU and NATO. (Afterwards, Mečiar was replaced in the role of nostalgic post-communist alpha male leader by Robert Fico, who has won almost every election since then, and the opposition remains fragmented.)

Thus Ján Čarnogurský lost most of his political power. He was no longer the natural (Schelling-point) leader of the opposition, and was too widely perceived as a religious fanatic to lead anyone other than religious voters. So he quit politics, founded a private Paneuropean University (together with two Russian entrepreneurs), and later became openly pro-Russian. Among other things, he supports the Russian invasion of Ukraine, organizes protests for "peace" (read: capitulation of Ukraine), and opposes the EU sanctions against Russia. He is chairman of the Slovak-Russian Society. Recently he received an Order of Honour in Russia.

localdeity on Thoughts after the Wolfram and Yudkowsky discussion

I can also come up with a story where obviously it's cheaper and more effective to disable all of the nuclear weapons than it is to take over the world, so why would the AI do the second thing?

Erm... For preventing nuclear war on the scale of decades... I don't know what you have in mind for how it would disable all the nukes, but a one-off breaking of all the firing mechanisms isn't going to work. They could just repair/replace that once they discovered the problem. You could imagine some more drastic thing like blowing up the conventional explosives on the missiles so as to utterly ruin them, but in a way that doesn't trigger the big chain reaction. But my impression is that, if you have a pile of weapons-grade uranium, then it's reasonably simple to make a bomb out of it, and since uranium is an element, no conventional explosion can eliminate that from the debris. Maybe you can melt it, mix it with other stuff, and make it super-impure?

But even then, the U.S. and Russia probably have stockpiles of weapons-grade uranium. I suspect they could make nukes out of that within a few months. You would have to ruin all the stockpiles too.

And then there's the possibility of mining more uranium and enriching it; I feel like this would take a few years at most, possibly much less if one threw a bunch of resources into rushing it. Would you ruin all uranium mines in the world somehow?

No, it seems to me that the only ways to reliably rule out nuclear war involve either using overwhelming physical force to prevent people from using or making nukes (like a drone army watching all the uranium stockpiles), or being able to reliably persuade the governments of all nuclear powers in the world to disarm and never make any new nukes. The power to do either of these things seems tantamount to the power to take over the world.

kylefurlong on The Humanitarian Economy

Thank you for sharing your experiences. It’s a story of how the best intentions go awry due to human nature and how free markets are a way of working around this. The vibrancy and efficiency of motivated people competing to make things better is a strong and vital force in the society that fosters it, and to some extent people who grow up that way tend to take it for granted. Thank you for the reality check.

In a way, this is an argument, not for social Darwinism, but for creating the possibility to escape the mean. If you take a distribution and flatten it, you eliminate the worst outcomes, but you also eliminate the vibrant top and middle. I’m guessing that allowing for a bottom allows for a much more elevated middle.

In a sense, this means that the current system is working as intended: wealth inequality gives us the highest middle.

sil-ver on [Intuitive self-models] 6. Awakening / Enlightenment / PNSE

I think this post fails as an explanation of equanimity. Which, of course, is dependent on my opinion about how equanimity works, so you have a pretty easy response of just disputing that the way I think equanimity works is correct. But idk what to do about this, so I'll just go ahead with a critique based on how I think equanimity works. So I'd say a bunch of things:

Your mechanism describes how PNSE or equanimity leads to a decrease in anxiety via breaking the feedback loop. But equanimity doesn't actually decrease the severity of an emotion, it just increases the valence! It's true that you can decrease the emotion (or reduce the time during which you feel it), but imE this is an entirely separate mechanism. So between the two mechanisms of (a) decreasing the duration of an emotion (presumably by breaking the feedback loop) and (b) applying equanimity to make it higher valence, I think you can vary each one freely independent of the other. You could do a ton of (a) with zero (b), a ton of (b) with zero (a), a lot of both, or (which is the default state) neither.
Your mechanism mostly applies to mental discomfort, but equanimity is actually much easier to apply to physical pain. You can also apply it to anxiety, but it's very hard. I can reduce suffering from moderately severe physical pain on demand (although there is very much a limit) and ditto with itching sensations, but I'm still struggling a lot with mental discomfort.
You can apply equanimity to positive sensations and it makes them better! This is the point I'd emphasize the most because imo it's such a clear and important aspect of how equanimity works. One of the ways to feel really really good is to have a pleasant sensation, like listening to music you love, and then apply maximum equanimity to it. I'm pretty sure you can enter the first jhana this way (although to my continuous disappointment I've never managed to reach the first jhana with music, so I can't guarantee it).

... actually, you can apply equanimity to literally any conscious percept. Like literally anything; you can apply equanimity to the sense of space around you, or to the blackness in your visual field, or to white noise (or any other sounds), or to the sensation of breathing. The way to do this is hard to put into words (similar to how an elementary motor command like lifting a finger is hard to put into words); the way it's usually described is by trying to accept/not fight a sensation. (Which imo is problematic because it sounds like equanimity means ceasing to do something, when I'm pretty sure it's actively doing something. Afaik there are ~zero examples of animals who learn to no longer care about pain, so it very much seems like the default is that pain is negative valence, and applying equanimity is an active process that increases valence.)

I mean again, you can just say you've talked about something else using the same term, but imo all of the above are actually not that difficult to verify. At least for me, it didn't take that long to figure out how to apply equanimity to minor physical pain, and from there, everything is just a matter of skill to do it more -- it's very much a continuous scale of being able to apply more and more equanimity, and I think the limit is very high -- and of realizing that you can just do the same thing wrt sensations that don't have negative valence in the first place.

eggsyntax on eggsyntax's Shortform

Interesting approach, thanks!

dr_manhattan on An alternative way to browse LessWrong 2.0

Any updates on the API? I'm thinking of playing around with interesting ways to index LW, and figure there should be something better than scraping.

sharmake-farah on Evaluating Stability of Unreflective Alignment

On this:

When I said "problems we care about", I was referring to a cluster of problems that very strongly appear to not scale well with population. Maybe this is an intuitive picture of the cluster of problems I'm referring to.

I think the problem identified here is in large part a demand problem, in that lots of AI people only wanted AI capabilities, and didn't care for AI interpretability at all, so once the scaling happened, a lot of the focus went purely to AI scaling.

(Which is an interesting example of Goodhart's law in action, perhaps.)

See here:

https://www.lesswrong.com/posts/gXinMpNJcXXgSTEpn/ai-craftsmanship#Qm8Kg7PjZoPTyxrr6

IMO this is pretty obviously wrong. There are some kinds of problem solving that scale poorly with population, just as there are some computations that scale poorly with parallelisation.
E.g. Project Euler problems.

I definitely agree that there exist such problems where the scaling with population is pretty bad, but I'll give 2 responses here:

The difference between a human-level AI and an actual human is the AI's ability to coordinate and share ontologies better across millions of instances, so the common problems that arise when trying to factorize problems are greatly reduced.
While there are serial bottlenecks to lots of real-world problem solving that prevent hyperfast outcomes, I don't think they are the dominating factor, because the parallelizable stuff, like good execution, is often far more valuable than the inherently serial computations, like deep/original ideas.

startattheend on Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?

The short version is that I'm not sold on rationality, and while I haven't read 100% of the sequences it's also not like my understanding is 0%. I'd have read more if they weren't so long. And while an intelligent person can come up with intelligent ways of thinking, I'm not sure this is reversible. I'm also mostly interested in tail-end knowledge. For some posts, I can guess the content by the title, which is boring. Finally, teaching people what not to do is really inefficient, since the space of possible mistakes is really big.

Your last link needs an s before the dot.

Anyway, I respect your decision, and I understand the purpose of this site a lot better now (though there's still a small, misleading difference between how rationality is explained and how users actually behave. Even the name of the website gave the wrong impression).

tahp on Thoughts after the Wolfram and Yudkowsky discussion

Oops, I meant cellular, and not molecular. I'm going to edit that.

I can come up with a story in which AI takes over the world. I can also come up with a story where obviously it's cheaper and more effective to disable all of the nuclear weapons than it is to take over the world, so why would the AI do the second thing? I see a path where instrumental convergence leads anything going hard enough to want to put all of the atoms on the most predictable path it can dictate. I think the thing that I don't get is what principle it is that makes anything useful go that hard. Something like (for example, I haven't actually thought this through) "it is hard to create something with enough agency/creativity to design and implement experiments toward a purpose without also having it notice and try to fix things in the world which are suboptimal to the purpose."

measure on Using hex to get murder advice from GPT-4o

It was probably thinking of sodium hydroxide rather than elemental sodium.