LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

An experiment on hidden cognition
Olli Järviniemi (jarviniemi) · 2024-07-22T03:26:05.564Z · comments (2)
Twin Peaks: under the air
KatjaGrace · 2024-05-31T01:20:04.624Z · comments (2)
Virtually Rational - VRChat Meetup
Tomás B. (Bjartur Tómas) · 2024-01-28T05:52:36.934Z · comments (3)
Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
scasper · 2024-07-30T14:57:06.807Z · comments (0)
A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers
Lennart Finke (l-f) · 2024-07-26T17:51:28.202Z · comments (4)
Control Symmetry: why we might want to start investigating asymmetric alignment interventions
domenicrosati · 2023-11-11T17:27:10.636Z · comments (1)
To Boldly Code
StrivingForLegibility · 2024-01-26T18:25:59.525Z · comments (4)
[question] Impressions from base-GPT-4?
mishka · 2023-11-08T05:43:23.001Z · answers+comments (25)
[link] Let's Design A School, Part 2.2 School as Education - The Curriculum (General)
Sable · 2024-05-07T19:22:21.730Z · comments (3)
The Wisdom of Living for 200 Years
Martin Sustrik (sustrik) · 2024-06-28T04:44:10.609Z · comments (3)
[link] The Best Essay (Paul Graham)
Chris_Leong · 2024-03-11T19:25:42.176Z · comments (2)
Testing for consequence-blindness in LLMs using the HI-ADS unit test.
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2023-11-24T23:35:29.560Z · comments (2)
[link] OpenAI Superalignment: Weak-to-strong generalization
Dalmert · 2023-12-14T19:47:24.347Z · comments (3)
Using an LLM perplexity filter to detect weight exfiltration
Adam Karvonen (karvonenadam) · 2024-07-21T18:18:05.612Z · comments (11)
D&D.Sci Hypersphere Analysis Part 4: Fine-tuning and Wrapup
aphyer · 2024-01-18T03:06:39.344Z · comments (5)
Trying to be rational for the wrong reasons
Viliam · 2024-08-20T16:18:06.385Z · comments (8)
[link] what becoming more secure did for me
Chipmonk · 2024-08-22T17:44:48.525Z · comments (5)
Useful starting code for interpretability
eggsyntax · 2024-02-13T23:13:47.940Z · comments (2)
[link] Structured Transparency: a framework for addressing use/mis-use trade-offs when sharing information
habryka (habryka4) · 2024-04-11T18:35:44.824Z · comments (0)
Clipboard Filtering
jefftk (jkaufman) · 2024-04-14T20:50:02.256Z · comments (1)
[link] Sticker Shortcut Fallacy — The Real Worst Argument in the World
ymeskhout · 2024-06-12T14:52:41.988Z · comments (15)
[link] An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs
Adam Karvonen (karvonenadam) · 2024-06-25T15:57:16.872Z · comments (0)
[link] MIRI's July 2024 newsletter
Harlan · 2024-07-15T21:28:17.343Z · comments (2)
[link] Executive Dysfunction 101
DaystarEld · 2024-05-23T12:43:13.785Z · comments (1)
Beta Tester Request: Rallypoint Bounties
lukemarks (marc/er) · 2024-05-25T09:11:11.446Z · comments (4)
$250K in Prizes: SafeBench Competition Announcement
ozhang (oliver-zhang) · 2024-04-03T22:07:41.171Z · comments (0)
[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)
Fact Finding: Simplifying the Circuit (Post 2)
Senthooran Rajamanoharan (SenR) · 2023-12-23T02:45:49.675Z · comments (3)
Proving the Geometric Utilitarian Theorem
StrivingForLegibility · 2024-08-07T01:39:10.920Z · comments (0)
[link] Was a Subway in New York City Inevitable?
Jeffrey Heninger (jeffrey-heninger) · 2024-03-30T00:53:21.314Z · comments (4)
[question] How to Model the Future of Open-Source LLMs?
Joel Burget (joel-burget) · 2024-04-19T14:28:00.175Z · answers+comments (9)
A Review of In-Context Learning Hypotheses for Automated AI Alignment Research
alamerton · 2024-04-18T18:29:33.892Z · comments (4)
[link] Beware the science fiction bias in predictions of the future
Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T05:32:47.372Z · comments (20)
Economics Roundup #1
Zvi · 2024-03-26T14:00:06.332Z · comments (4)
Evolution did a surprisingly good job at aligning humans... to social status
Eli Tyre (elityre) · 2024-03-10T19:34:52.544Z · comments (37)
How Congressional Offices Process Constituent Communication
Tristan Williams (tristan-williams) · 2024-07-02T12:38:41.472Z · comments (0)
[link] Altruism and Vitalism Aren't Fellow Travelers
Arjun Panickssery (arjun-panickssery) · 2024-08-09T02:01:11.361Z · comments (2)
[question] What percent of the sun would a Dyson Sphere cover?
Raemon · 2024-07-03T17:27:50.826Z · answers+comments (26)
Distinctions when Discussing Utility Functions
ozziegooen · 2024-03-09T20:14:03.592Z · comments (7)
Scientific Method
Andrij “Androniq” Ghorbunov (andrij-androniq-ghorbunov) · 2024-02-18T21:06:45.228Z · comments (4)
[link] Review of Alignment Plan Critiques- December AI-Plans Critique-a-Thon Results
Iknownothing · 2024-01-15T19:37:07.984Z · comments (0)
aintelope project update
Gunnar_Zarncke · 2024-02-08T18:32:00.000Z · comments (2)
Even if we lose, we win
Morphism (pi-rogers) · 2024-01-15T02:15:43.447Z · comments (17)
[link] Scenario planning for AI x-risk
Corin Katzke (corin-katzke) · 2024-02-10T00:14:11.934Z · comments (12)
[link] Robert Caro And Mechanistic Models In Biography
adamShimi · 2024-07-14T10:56:42.763Z · comments (5)
[link] Extinction Risks from AI: Invisible to Science?
VojtaKovarik · 2024-02-21T18:07:33.986Z · comments (7)
[link] Clickbait Soapboxing
DaystarEld · 2024-03-13T14:09:29.890Z · comments (15)
Technology path dependence and evaluating expertise
bhauth · 2024-01-05T19:21:23.302Z · comments (2)
An evaluation of Helen Toner’s interview on the TED AI Show
PeterH · 2024-06-06T17:39:40.800Z · comments (2)
D&D.Sci Hypersphere Analysis Part 2: Nonlinear Effects & Interactions
aphyer · 2024-01-14T19:59:37.911Z · comments (0)