LessWrong 2.0 Reader

[link] Stone Age Herbalist's notes on ant warfare and slavery
trevor (TrevorWiesinger) · 2024-11-09T02:40:01.128Z · comments (0)

[question] When is reward ever the optimization target?
Noosphere89 (sharmake-farah) · 2024-10-15T15:09:20.912Z · answers+comments (12)

[question] Feedback request: what am I missing?
Nathan Helm-Burger (nathan-helm-burger) · 2024-11-02T17:38:39.625Z · answers+comments (5)

Resolving von Neumann-Morgenstern Inconsistent Preferences
niplav · 2024-10-22T11:45:20.915Z · comments (5)

Meme Talking Points
ymeskhout · 2024-11-06T15:27:54.024Z · comments (0)

Apply to MATS 7.0!
Ryan Kidd (ryankidd44) · 2024-09-21T00:23:49.778Z · comments (0)

[question] What's the Deal with Logical Uncertainty?
Ape in the coat · 2024-09-16T08:11:43.588Z · answers+comments (23)

SAE Probing: What is it good for? Absolutely something!
Subhash Kantamneni (subhashk) · 2024-11-01T19:23:55.418Z · comments (0)

Book Review: What Even Is Gender?
Joey Marcellino · 2024-09-01T16:09:27.773Z · comments (14)

The slingshot helps with learning
Wilson Wu (wilson-wu) · 2024-10-31T23:18:16.762Z · comments (0)

[link] Epistemic states as a potential benign prior
Tamsin Leake (carado-1) · 2024-08-31T18:26:14.093Z · comments (2)

Balancing Label Quantity and Quality for Scalable Elicitation
Alex Mallen (alex-mallen) · 2024-10-24T16:49:00.939Z · comments (1)

Bay Winter Solstice 2024: Speech Auditions
ozymandias · 2024-11-04T22:31:38.680Z · comments (0)

[link] Safety tax functions
owencb · 2024-10-20T14:08:38.099Z · comments (0)

[link] [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
Yohan Mathew (ymath) · 2024-09-25T14:52:48.263Z · comments (2)

Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)

AI #85: AI Wins the Nobel Prize
Zvi · 2024-10-10T13:40:07.286Z · comments (6)

5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (8)

Open Thread Fall 2024
habryka (habryka4) · 2024-10-05T22:28:50.398Z · comments (109)

AIS terminology proposal: standardize terms for probability ranges
eggsyntax · 2024-08-30T15:43:39.857Z · comments (12)

Examples of How I Use LLMs
jefftk (jkaufman) · 2024-10-14T17:10:04.597Z · comments (2)

[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)

[link] Liquid vs Illiquid Careers
vaishnav92 · 2024-10-20T23:03:49.725Z · comments (6)

[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence
EuanMcLean (euanmclean) · 2024-10-29T12:16:18.448Z · comments (7)

[link] Why Recursion Pharmaceuticals abandoned cell painting for brightfield imaging
Abhishaike Mahajan (abhishaike-mahajan) · 2024-11-05T14:51:41.310Z · comments (1)

[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (6)

[link] Arithmetic Models: Better Than You Think
kqr · 2024-10-26T09:42:07.185Z · comments (5)

Trading Candy
jefftk (jkaufman) · 2024-11-01T01:10:08.024Z · comments (4)

[link] Our Digital and Biological Children
Eneasz · 2024-10-24T18:36:38.719Z · comments (0)

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

Towards Quantitative AI Risk Management
Henry Papadatos (henry) · 2024-10-16T19:26:48.817Z · comments (1)

[link] A new process for mapping discussions
Nathan Young · 2024-09-30T08:57:20.029Z · comments (7)

Domain-specific SAEs
jacob_drori (jacobcd52) · 2024-10-07T20:15:38.584Z · comments (0)

[link] Generic advice caveats
Saul Munn (saul-munn) · 2024-10-30T21:03:07.185Z · comments (1)

[question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?
ChristianKl · 2024-09-26T09:17:39.088Z · answers+comments (21)

[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)

Why is there Nothing rather than Something?
Logan Zoellner (logan-zoellner) · 2024-10-26T12:37:50.204Z · comments (3)

[question] Any real toeholds for making practical decisions regarding AI safety?
lukehmiles (lcmgcd) · 2024-09-29T12:03:08.084Z · answers+comments (6)

the Daydication technique
chaosmage · 2024-10-18T21:47:46.448Z · comments (0)

An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (3)

[link] Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data
jefftk (jkaufman) · 2024-09-23T17:25:58.380Z · comments (0)

[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

There aren't enough smart people in biology doing something boring
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-21T15:52:04.482Z · comments (13)

Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)

Bay Winter Solstice 2024: song leading auditions
tcheasdfjkl · 2024-11-10T23:59:08.199Z · comments (0)

Distinguishing ways AI can be "concentrated"
Matthew Barnett (matthew-barnett) · 2024-10-21T22:21:13.666Z · comments (2)

European Progress Conference
Martin Sustrik (sustrik) · 2024-10-06T11:10:03.819Z · comments (11)

Recent comments

dalcy on Dalcy's Shortform

The critical insight is that this is not always the case!

Let's call two graphs I-equivalent if their sets of independencies (implied by d-separation) are identical. A theorem about Bayes Nets says that two graphs are I-equivalent if they have the same skeleton and the same set of immoralities.

This last constraint, plus the constraint that the graph must be acyclic, allows some arrow directions to be identified - namely, across all I-equivalent graphs that are the perfect map of a distribution, some of the edges have identical directions assigned to them.
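As a rough illustration of the skeleton-plus-immoralities criterion, here is a minimal Python sketch. It assumes each graph is a DAG over the same variables, given as a set of (parent, child) pairs; the function names are made up for this example and do not come from any library.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a DAG: each edge as an unordered pair."""
    return {frozenset(edge) for edge in edges}

def immoralities(edges):
    """All v-structures a -> c <- b in which a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((frozenset((a, b)), child))
    return found

def i_equivalent(edges1, edges2):
    """Same skeleton and same immoralities (assumes both inputs are DAGs)."""
    return (skeleton(edges1) == skeleton(edges2)
            and immoralities(edges1) == immoralities(edges2))

# Three-variable examples (hypothetical variable names).
chain = {("A", "B"), ("B", "C")}      # A -> B -> C
fork = {("B", "A"), ("B", "C")}       # A <- B -> C
collider = {("A", "B"), ("C", "B")}   # A -> B <- C

print(i_equivalent(chain, fork))      # True
print(i_equivalent(chain, collider))  # False
```

The chain and the fork share a skeleton and have no immoralities, so observational data alone cannot tell them apart. The collider has the same skeleton but contains an immorality, which is what lets its arrow directions be identified.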

The IC algorithm (Verma & Pearl, 1990) for finding perfect maps (hence temporal direction) is exactly about exploiting these conditions to orient as many of the edges as possible:

More intuitively, (Verma & Pearl, 1992) and (Meek, 1995) together show that the following four rules are necessary and sufficient operations to maximally orient the graph according to the I-equivalence (+ acyclicity) constraint:

Anyone interested in further detail should consult Pearl's Causality Ch 2. Note that for some reason Ch 2 is the only chapter in the book where Pearl talks about Causal Discovery (i.e. inferring time from an observational distribution), and the rest of the book is all about Causal Inference (i.e. inferring causal effects from (partially) known causal structure).

m-y-zuo on Jan_Kulveit's Shortform

I already think that "the entire shape of the zeitgeist in America" is downstream of non-trivial efforts by more than one state actor. Those links explain documented cases of China and Russia both trying to foment race war in the US, but I could pull links for other subdimensions of culture (in science, around the second amendment, and in other areas) where this has been happening since roughly 2014.

This theory likely assigns too much intention to too large a structure. The cleavage lines in the U.S. are so obvious that it wouldn’t take much more than a random PSYOP middle manager, having a lark on a slow Friday afternoon each week, deciding to deploy some of their resources to mess around.

Although it’s possible policymakers know this too and intentionally leave it as very low-hanging fruit, so that bored personnel can mess around and get away with only a slap on the wrist.

The core issue, in any society, is that it’s thousands of times easier to destroy trust than to rebuild it.

buck on Buck's Shortform

An example snippet of a rogue internal deployment threat model that I find fairly compelling:

1. The schemer, while deployed in an authorized scaffold, launches an unmonitored scaffold that has access to trusted model weights.
2. It uses this scaffold to do a GCG attack on the trusted model, then writes the resulting GCG attack somewhere that the agent in the authorized scaffold can see. This takes a moderate amount of compute, but a small amount compared to the amount the untrusted model will regularly need to run experiments.
3. The agent in the authorized scaffold uses that GCG attack to do something else that's worse.

To defend against this, we should try to both prevent the untrusted model from ever getting rogue deployments where it can research jailbreaks on the trusted model, and also try to prevent unacceptable outcomes conditional on the untrusted agent having access to the best jailbreaks you can get with GCG etc.

I find this interesting because it seems kind of hard to stop either step 1 (because rogue deployments might be naturally possible given the affordances the model has) or step 3 (because it's hard to ensure safety against an adversary with white-box access to a model it wants to jailbreak). I'd be excited for more research on both of these.

sharmake-farah on Quick look: applications of chaos theory

I think the crux of it is here:

I've always struggled to make sense of the idea of brain uploading because it seems to rely on some sort of dualism. As a materialist, it seems obvious to me that a brain is a brain, a program that replicates the brain's output is a program (and will perform its task more or less well but probably not perfectly), and the two are not the same.

I think that basically everything in the universe can be considered a program/computation, but I also think the notion of a program/computation is quite trivial.

More substantively, I think it might be possible to replicate at least some parts of the physical world with future computers that have what is called physical universality, where they can manipulate the physical world essentially arbitrarily.

So I don't view brains and computer programs as being of two different types, but rather as the same type: both are programs/computations.

See below for some intuition as to why.

http://www.amirrorclear.net/academic/ideas/simulation/index.html

martin-randall on Anvil Problems

Naming: I've more commonly heard "anvil problem" to refer to an exploring agent that doesn't understand that it is part of the environment it is exploring and therefore "drops an anvil on its own head". See the anvil problem tag for more.

garrison on Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims

Thanks for these!

viliam on The Humanitarian Economy

If we ignored housing, then "free market + some taxation and giving the money to the poor" kinda sounds like the best of both worlds. Unfortunately, increasing rents can eat the extra money given to the poor. (See also: Georgism.)

Maybe if we could get UBI high enough that people could survive on that alone, it would no longer be necessary to live in cities (close to good jobs) and people could avoid paying too high rent. Or maybe not, because ultimately all land belongs to someone? Not sure.

viliam on Jan_Kulveit's Shortform

It is difficult to prove things, but I strongly suspect that in Slovakia, Ján Čarnogurský is a Russian asset.

In my opinion, the only remaining question is when exactly he was recruited, and how long a game was played on us. I have suspected him for a long time, though most people would probably have called me crazy for that. Recently, however, he became openly pro-Russian, to the great surprise of many of his former supporters. So the question is whether I was right and this was a long con, or whether he had a change of mind recently and my previous suspicions were merely a coincidence (homogeneity of the outgroup, etc.).

If this indeed was a long con (maybe, maybe not), then he had a perfect cover story. During communism, he was a lawyer and provided legal support for the anti-Communist opposition. Two years before the fall of communism, he was fired and unemployed. Three months before the fall of communism, he was put in prison. Also, he was strongly religious (perceived as a religious fanatic by some). Remember that Slovakia is a predominantly Catholic country.

After the fall of communism he quickly rose to power. He basically represented the opposition to communism, and the comeback of religious freedom. In the 1990s the political scene of Slovakia was basically two camps: those nostalgic for communism, led by Vladimír Mečiar, and those who opposed communism and wanted to join the West, led by Ján Čarnogurský. So we are talking here about the strongest or second-strongest politician in the country.

I remember some weird opinions of his from that era. For example, he talked a lot about how Slovakia should be "a bridge between Russia and the West", and that we should build a broad-gauge railway across Slovakia (i.e. from the Ukrainian border to the capital city, which is on the western end). If anyone else had said that, people would probably have suspected them of something, but Čarnogurský's anti-communist credentials were just too perfect, so he stayed above suspicion. (From my perspective, perhaps a little paranoid, that sounded a bit like preparing the ground for an easy invasion. I mean, one day, a huge train could arrive from Russia right in our capital city, and if it turned out that the train was full of well-armed soldiers, the invasion could be over before most people even noticed that it had begun. Note: I have no military expertise, so maybe what I am saying here doesn't make sense.)

Then in 1998 he was unexpectedly replaced as leader by Mikuláš Dzurinda, in a weird turn of events that was basically a non-violent coup based on a technicality. (The opposition to Mečiar was always fragmented into multiple political parties, so they always ran as a coalition. Mečiar changed the constitution to make elections much more difficult for coalitions than for individual parties. The opposition parties were like "no problem, we will make a faux political party as a temporary facade for our coalition, win the election, revert the law, disband the temporary party, and return to life as usual", and they put Dzurinda, a relatively unknown young guy, in as the leader of the new party. However, after the election, when they asked him to disband the new party, he was like "LOL, I am the leader of the party that won the election, you guys better shut up", and governed the country.) Those were the best years for Slovakia, politically; we quickly joined the EU and NATO. (Afterwards, Mečiar was replaced in the role of nostalgic post-communist alpha male leader by Robert Fico, who has won almost every election since then, and the opposition remains fragmented.)

Thus Ján Čarnogurský lost most of his political power. He was no longer the natural (Schelling-point) leader of the opposition, and was too widely perceived as a religious fanatic to lead anyone other than the strongly religious. So he quit politics, founded a private Paneuropean University (together with two Russian entrepreneurs), and later became openly pro-Russian. Among other things, he supports the Russian invasion of Ukraine, organizes protests for "peace" (read: capitulation of Ukraine), and opposes the EU sanctions against Russia. He is a chairman of the Slovak-Russian Society. Recently he received an Order of Honour in Russia.

localdeity on Thoughts after the Wolfram and Yudkowsky discussion

I can also come up with a story where obviously it's cheaper and more effective to disable all of the nuclear weapons than it is to take over the world, so why would the AI do the second thing?

Erm... For preventing nuclear war on the scale of decades... I don't know what you have in mind for how it would disable all the nukes, but a one-off breaking of all the firing mechanisms isn't going to work. They could just repair/replace that once they discovered the problem. You could imagine some more drastic thing like blowing up the conventional explosives on the missiles so as to utterly ruin them, but in a way that doesn't trigger the big chain reaction. But my impression is that, if you have a pile of weapons-grade uranium, then it's reasonably simple to make a bomb out of it, and since uranium is an element, no conventional explosion can eliminate that from the debris. Maybe you can melt it, mix it with other stuff, and make it super-impure?

But even then, the U.S. and Russia probably have stockpiles of weapons-grade uranium. I suspect they could make nukes out of that within a few months. You would have to ruin all the stockpiles too.

And then there's the possibility of mining more uranium and enriching it; I feel like this would take a few years at most, possibly much less if one threw a bunch of resources into rushing it. Would you ruin all uranium mines in the world somehow?

No, it seems to me that the only ways to reliably rule out nuclear war involve either using overwhelming physical force to prevent people from using or making nukes (like a drone army watching all the uranium stockpiles), or being able to reliably persuade the governments of all nuclear powers in the world to disarm and never make any new nukes. The power to do either of these things seems tantamount to the power to take over the world.

kylefurlong on The Humanitarian Economy

Thank you for sharing your experiences. It’s a story of how the best intentions go awry due to human nature and how free markets are a way of working around this. The vibrancy and efficiency of motivated people competing to make things better is a strong and vital force in the society that fosters it, and to some extent people who grow up that way tend to take it for granted. Thank you for the reality check.

In a way, this is an argument, not for social Darwinism, but for creating the possibility to escape the mean. If you take a distribution and flatten it, you eliminate the worst outcomes, but you also eliminate the vibrant top and middle. I’m guessing that allowing for a bottom allows for a much more elevated middle.

In a sense, this means that the current system is working as intended: wealth inequality gives us the highest middle.