LessWrong 2.0 Reader

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

[link] [EAForum xpost] A breakdown of OpenAI's revenue
dschwarz · 2024-07-10T18:09:20.017Z · comments (5)

Referendum Mechanics in a Marketplace of Ideas
Martin Sustrik (sustrik) · 2024-08-25T08:30:01.901Z · comments (2)

John Schulman leaves OpenAI for Anthropic
Sodium · 2024-08-06T01:23:15.427Z · comments (0)

AI #84: Better Than a Podcast
Zvi · 2024-10-03T15:00:07.128Z · comments (7)

... Wait, our models of semantics should inform fluid mechanics?!?
johnswentworth · 2024-08-26T16:38:53.924Z · comments (18)

A "Bitter Lesson" Approach to Aligning AGI and ASI
RogerDearnaley (roger-d-1) · 2024-07-06T01:23:22.376Z · comments (39)

[link] Making Eggs Without Ovaries
Niko_McCarty (niko-2) · 2024-09-22T17:44:46.733Z · comments (3)

Evidence against Learned Search in a Chess-Playing Neural Network
p.b. · 2024-09-13T11:59:55.634Z · comments (3)

Coalitional agency
Richard_Ngo (ricraz) · 2024-07-22T00:09:51.525Z · comments (6)

[link] How much I'm paying for AI productivity software (and the future of AI use)
jacquesthibs (jacques-thibodeau) · 2024-10-11T17:11:27.025Z · comments (16)

Measuring Structure Development in Algorithmic Transformers
Micurie (micurie) · 2024-08-22T08:38:02.140Z · comments (4)

Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs
Michaël Trazzi (mtrazzi) · 2024-08-24T04:30:11.807Z · comments (0)

[link] Datasets that change the odds you exist
dynomight · 2024-06-29T18:45:14.385Z · comments (4)

[link] Demis Hassabis — Google DeepMind: The Podcast
Zach Stein-Perlman · 2024-08-16T00:00:04.712Z · comments (8)

Safe Predictive Agents with Joint Scoring Rules
Rubi J. Hudson (Rubi) · 2024-10-09T16:38:16.535Z · comments (10)

How the AI safety technical landscape has changed in the last year, according to some practitioners
tlevin (trevor) · 2024-07-26T19:06:47.126Z · comments (6)

Secret Collusion: Will We Know When to Unplug AI?
schroederdewitt · 2024-09-16T16:07:01.119Z · comments (7)

A Path out of Insufficient Views
Unreal · 2024-09-24T20:00:27.332Z · comments (46)

AI Craftsmanship
abramdemski · 2024-11-11T22:17:01.112Z · comments (6)

Seeking Collaborators
abramdemski · 2024-11-01T17:13:36.162Z · comments (14)

[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder
Steven Byrnes (steve2152) · 2024-10-15T13:31:46.157Z · comments (7)

[link] On the Role of Proto-Languages
adamShimi · 2024-09-22T16:50:34.720Z · comments (1)

[link] Unlocking Solutions—By Understanding Coordination Problems
James Stephen Brown (james-brown) · 2024-07-27T04:52:13.435Z · comments (4)

[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (11)

Thiel on AI & Racing with China
Ben Pace (Benito) · 2024-08-20T03:19:18.966Z · comments (10)

[link] How Likely Are Various Precursors of Existential Risk?
NunoSempere (Radamantis) · 2024-10-28T13:27:31.620Z · comments (4)

[link] Breaking Circuit Breakers
mikes · 2024-07-14T18:57:20.251Z · comments (13)

Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask (patrickleask) · 2024-08-17T01:16:53.764Z · comments (0)

AI #71: Farewell to Chevron
Zvi · 2024-07-04T13:40:05.905Z · comments (9)

AI #76: Six Shorts Stories About OpenAI
Zvi · 2024-08-08T13:50:04.659Z · comments (10)

Causal Graphs of GPT-2-Small's Residual Stream
David Udell · 2024-07-09T22:06:55.775Z · comments (7)

AI #87: Staying in Character
Zvi · 2024-10-29T07:10:08.212Z · comments (3)

[link] The Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-10-18T13:26:25.565Z · comments (9)

Parental Writing Selection Bias
jefftk (jkaufman) · 2024-10-13T14:00:03.225Z · comments (3)

BatchTopK: A Simple Improvement for TopK-SAEs
Bart Bussmann (Stuckwork) · 2024-07-20T02:20:51.848Z · comments (0)

How to Give in to Threats (without incentivizing them)
Mikhail Samin (mikhail-samin) · 2024-09-12T15:55:50.384Z · comments (26)

[link] how birds sense magnetic fields
bhauth · 2024-06-27T18:59:35.075Z · comments (4)

Model evals for dangerous capabilities
Zach Stein-Perlman · 2024-09-23T11:00:00.866Z · comments (9)

[link] Prices are Bounties
Maxwell Tabarrok (maxwell-tabarrok) · 2024-10-12T14:51:40.689Z · comments (13)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.
Andrew_Critch · 2024-09-11T04:41:24.872Z · comments (7)

Provably Safe AI: Worldview and Projects
bgold · 2024-08-09T23:21:02.763Z · comments (43)

Llama Llama-3-405B?
Zvi · 2024-07-24T19:40:07.565Z · comments (9)

Claude Sonnet 3.5.1 and Haiku 3.5
Zvi · 2024-10-24T14:50:06.286Z · comments (9)

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Joe Carlsmith (joekc) · 2024-10-28T21:57:12.063Z · comments (5)

Rewilding the Gut VS the Autoimmune Epidemic
GGD · 2024-08-16T18:00:46.239Z · comments (0)

[link] Anthropic's updated Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-10-15T16:46:48.727Z · comments (3)

AI #82: The Governor Ponders
Zvi · 2024-09-19T13:30:04.863Z · comments (8)

Sherlockian Abduction Master List
Cole Wyeth (Amyr) · 2024-07-11T20:27:00.000Z · comments (63)

Unlearning via RMU is mostly shallow
Andy Arditi (andy-arditi) · 2024-07-23T16:07:52.223Z · comments (3)

Recent comments

buck on Buck's Shortform

An example snippet of a rogue internal deployment threat model that I find fairly compelling:

1. The schemer, while deployed in an authorized scaffold, launches an unmonitored scaffold that has access to trusted model weights.
2. It uses this scaffold to do a GCG attack on the trusted model, then writes the resulting GCG attack somewhere that the agent in the authorized scaffold can see. This takes a moderate amount of compute, but a small amount compared to the amount the untrusted model will regularly need to run experiments.
3. The agent in the authorized scaffold uses that GCG attack to do something else that's more bad.

To defend against this, we should try to both prevent the untrusted model from ever getting rogue deployments where it can research jailbreaks on the trusted model, and also try to prevent unacceptable outcomes conditional on the untrusted agent having access to the best jailbreaks you can get with GCG etc.

I find this interesting because it seems kind of hard to stop either step 1 (because rogue deployments might be naturally possible given the affordances the model has) or step 3 (because it's hard to ensure safety against an adversary with white-box access to a model it wants to jailbreak). I'd be excited for more research on both of these.

sharmake-farah on Quick look: applications of chaos theory

I think the crux of it is here:

I've always struggled to make sense of the idea of brain uploading because it seems to rely on some sort of dualism. As a materialist, it seems obvious to me that a brain is a brain, a program that replicates the brain's output is a program (and will perform its task more or less well but probably not perfectly), and the two are not the same.

I think that basically everything in the universe can be considered a program/computation, but I also think the notion of a program/computation is quite trivial.

More substantively, I think it might be possible to replicate at least some parts of the physical world with future computers that have what is called physical universality, where they can manipulate the physical world essentially arbitrarily.

So I don't view brains and computer programs as being of 2 different types, but rather as the same type as a program/computation.

See below for some intuition as to why.

http://www.amirrorclear.net/academic/ideas/simulation/index.html

martin-randall on Anvil Problems

Naming: I've more commonly heard "anvil problem" to refer to an exploring agent that doesn't understand that it is part of the environment it is exploring and therefore "drops an anvil on its own head". See the anvil problem tag for more.

garrison on Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims

Thanks for these!

viliam on The Humanitarian Economy

If we ignored housing, then "free market + some taxation and giving the money to the poor" kinda sounds like the best of both worlds. Unfortunately, the increasing rents can eat the extra money given to the poor. (See also: Georgism.)

Maybe if we could get UBI high enough that people could survive on that alone, it would no longer be necessary to live in cities (close to good jobs) and people could avoid paying excessively high rent. Or maybe not, because ultimately all land belongs to someone? Not sure.

viliam on Jan_Kulveit's Shortform

It is difficult to prove things, but I strongly suspect that in Slovakia, Ján Čarnogurský is a Russian asset.

In my opinion, the only remaining question is when exactly he was recruited, and for how long this game was played on us. I have suspected him for a long time, and most people would probably have called me crazy for that; recently, however, he became openly pro-Russian, to the great surprise of many of his former supporters. So the question is whether I was right and this was a long con, or whether he had a change of mind recently and my previous suspicions were merely a coincidence (homogeneity of the outgroup, etc.).

If this indeed was a long con (maybe, maybe not), then he had a perfect cover story. During communism, he was a lawyer and provided legal support for the anti-Communist opposition. Two years before the fall of communism, he was fired and unemployed. Three months before the fall of communism, he was put in prison. Also, he was strongly religious (perceived as a religious fanatic by some). Remember that Slovakia is a predominantly Catholic country.

After the fall of communism he quickly rose to power. He basically represented the opposition to communism, and the comeback of religious freedom. In the 1990s the political scene of Slovakia was basically two camps: those nostalgic for communism, led by Vladimír Mečiar, and those who opposed communism and wanted to join the West, led by Ján Čarnogurský. So we are talking here about the strongest or the second-strongest politician.

I remember some weird opinions of his from that era. For example, he talked a lot about how Slovakia should be "a bridge between Russia and the West", and that we should build a broad-gauge railway across Slovakia (i.e. from the Ukrainian border to the capital city, which is on the western end). If anyone else had said that, people would probably have suspected them of something, but Čarnogurský's anti-communist credentials were just too perfect, so he stayed above suspicion. (From my perspective, perhaps a little paranoid, that sounded a bit like preparing the ground for an easy invasion. I mean, one day, a huge train could arrive from Russia right to our capital city, and if it turns out that the train is full of well-armed soldiers, the invasion could be over before most people would even notice that it began. Note: I have no military expertise, so maybe what I am saying here doesn't make sense.)

Then in 1998 he was unexpectedly replaced as leader by Mikuláš Dzurinda, in a weird turn of events that was basically a non-violent coup based on a technicality. (The opposition to Mečiar was always fragmented into multiple political parties, so they always ran as a coalition. Mečiar changed the constitution to make elections much more difficult for coalitions than for individual parties. The opposition parties were like "no problem, we will make a faux political party as a temporary facade for our coalition, win the election, revert the law, disband the temporary party, and return to life as usual", and they put Dzurinda, a relatively unknown young guy, in as the leader of the new party. However, after the election, when they asked him to disband the new party, he was like "LOL, I am the leader of the party that won the election, you guys better shut up", and governed the country.) Those were the best years for Slovakia, politically; we quickly joined the EU and NATO. (Afterwards, Mečiar was replaced in the role of nostalgic post-communist alpha male leader by Robert Fico, who has won almost every election since then, and the opposition remains fragmented.)

Thus Ján Čarnogurský lost most of his political power. He was no longer the natural (Schelling-point) leader of the opposition, and he was too widely perceived as a religious fanatic to lead anyone other than religious fanatics. So he quit politics, founded a private Paneuropean University (together with two Russian entrepreneurs), and later became openly pro-Russian. Among other things, he supports the Russian invasion of Ukraine, organizes protests for "peace" (read: capitulation of Ukraine), and opposes the EU sanctions against Russia. He is a chairman of the Slovak-Russian Society. Recently he received an Order of Honour in Russia.

localdeity on Thoughts after the Wolfram and Yudkowsky discussion

I can also come up with a story where obviously it's cheaper and more effective to disable all of the nuclear weapons than it is to take over the world, so why would the AI do the second thing?

Erm... For preventing nuclear war on the scale of decades... I don't know what you have in mind for how it would disable all the nukes, but a one-off breaking of all the firing mechanisms isn't going to work. They could just repair/replace that once they discovered the problem. You could imagine some more drastic thing like blowing up the conventional explosives on the missiles so as to utterly ruin them, but in a way that doesn't trigger the big chain reaction. But my impression is that, if you have a pile of weapons-grade uranium, then it's reasonably simple to make a bomb out of it, and since uranium is an element, no conventional explosion can eliminate that from the debris. Maybe you can melt it, mix it with other stuff, and make it super-impure?

But even then, the U.S. and Russia probably have stockpiles of weapons-grade uranium. I suspect they could make nukes out of that within a few months. You would have to ruin all the stockpiles too.

And then there's the possibility of mining more uranium and enriching it; I feel like this would take a few years at most, possibly much less if one threw a bunch of resources into rushing it. Would you ruin all uranium mines in the world somehow?

No, it seems to me that the only ways to reliably rule out nuclear war involve either using overwhelming physical force to prevent people from using or making nukes (like a drone army watching all the uranium stockpiles), or being able to reliably persuade the governments of all nuclear powers in the world to disarm and never make any new nukes. The power to do either of these things seems tantamount to the power to take over the world.

kylefurlong on The Humanitarian Economy

Thank you for sharing your experiences. It’s a story of how the best intentions go awry due to human nature and how free markets are a way of working around this. The vibrancy and efficiency of motivated people competing to make things better is a strong and vital force in the society that fosters it, and to some extent people who grow up that way tend to take it for granted. Thank you for the reality check.

In a way, this is an argument, not for social Darwinism, but for creating the possibility to escape the mean. If you take a distribution and flatten it, you eliminate the worst outcomes, but you also eliminate the vibrant top and middle. I’m guessing that allowing for a bottom allows for a much more elevated middle.

In a sense, this means that the current system is working as intended: wealth inequality gives us the highest middle.

sil-ver on [Intuitive self-models] 6. Awakening / Enlightenment / PNSE

I think this post fails as an explanation of equanimity. Which, of course, is dependent on my opinion about how equanimity works, so you have a pretty easy response of just disputing that the way I think equanimity works is correct. But idk what to do about this, so I'll just go ahead with a critique based on how I think equanimity works. So I'd say a bunch of things:

Your mechanism describes how PNSE or equanimity leads to a decrease in anxiety via breaking the feedback loop. But equanimity doesn't actually decrease the severity of an emotion, it just increases the valence! It's true that you can decrease the emotion (or reduce the time during which you feel it), but imE this is an entirely separate mechanism. So between the two mechanisms of (a) decreasing the duration of an emotion (presumably by breaking the feedback loop) and (b) applying equanimity to make it higher valence, I think you can vary each one freely independent of the other. You could do a ton of (a) with zero (b), a ton of (b) with zero (a), a lot of both, or (which is the default state) neither.
Your mechanism mostly applies to mental discomfort, but equanimity is actually much easier to apply to physical pain. You can also apply it to anxiety, but it's very hard. I can reduce suffering from moderately severe physical pain on demand (although there is very much a limit) and ditto with itching sensations, but I'm still struggling a lot with mental discomfort.
You can apply equanimity to positive sensations and it makes them better! This is a point I'd emphasize the most because imo it's such a clear and important aspect of how equanimity works. One of the ways to feel really really good is to have a pleasant sensation, like listening to music you love, and then applying maximum equanimity to it. I'm pretty sure you can enter the first jhana this way (although to my continuous disappointment I've never managed to reach the first jhana with music, so I can't guarantee it.)

... actually, you can apply equanimity to literally any conscious percept. Like literally anything; you can apply equanimity to the sense of space around you, or to the blackness in your visual field, or to white noise (or any other sounds), or to the sensation of breathing. The way to do this is hard to put into words (similar to how an elementary motor command like lifting a finger is hard to put into words); the way it's usually described is by trying to accept/not fight a sensation. (Which imo is problematic because it sounds like equanimity means ceasing to do something, when I'm pretty sure it's actively doing something. Afaik there are ~zero examples of animals who learn to no longer care about pain, so it very much seems like the default is that pain is negative valence, and applying equanimity is an active process that increases valence.)

I mean again, you can just say you've talked about something else using the same term, but imo all of the above are actually not that difficult to verify. At least for me, it didn't take me that long to figure out how to apply equanimity to minor physical pain, and from there, everything is just a matter of skill to do it more -- it's very much a continuous scale of being able to apply more and more equanimity, and I think the limit is very high -- and of realizing that you can just do the same thing wrt sensations that don't have negative valence in the first place.

eggsyntax on eggsyntax's Shortform

Interesting approach, thanks!