LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

[link] Arrogance and People Pleasing
Jonathan Moregård (JonathanMoregard) · 2024-02-06T18:43:09.120Z · comments (7)
Why I think it's net harmful to do technical safety research at AGI labs
Remmelt (remmelt-ellen) · 2024-02-07T04:17:15.246Z · comments (24)
Consequentialism is a compass, not a judge
Neil (neil-warren) · 2024-04-13T10:47:44.980Z · comments (6)
Living with Rats in College
lsusr · 2024-12-25T10:44:13.085Z · comments (0)
Mask and Respirator Intelligibility Comparison
jefftk (jkaufman) · 2024-12-07T03:20:01.585Z · comments (5)
Action derivatives: You’re not doing what you think you’re doing
PatrickDFarley · 2024-11-21T16:24:04.044Z · comments (0)
[link] Creating Interpretable Latent Spaces with Gradient Routing
Jacob G-W (g-w1) · 2024-12-14T04:00:17.249Z · comments (6)
Preface
Allison Duettmann (allison-duettmann) · 2025-01-02T18:59:46.290Z · comments (2)
Trying Bluesky
jefftk (jkaufman) · 2024-11-17T02:50:04.093Z · comments (17)
[link] Introducing the Anthropic Fellows Program
Miranda Zhang (miranda-zhang) · 2024-11-30T23:47:29.259Z · comments (0)
Intranasal mRNA Vaccines?
J Bostock (Jemist) · 2025-01-01T23:46:40.524Z · comments (2)
AI #93: Happy Tuesday
Zvi · 2024-12-04T00:30:06.891Z · comments (2)
Learning Multi-Level Features with Matryoshka SAEs
Bart Bussmann (Stuckwork) · 2024-12-19T15:59:00.036Z · comments (4)
Thoughts after the Wolfram and Yudkowsky discussion
Tahp · 2024-11-14T01:43:12.920Z · comments (13)
[link] debating buying NVDA in 2019
bhauth · 2025-01-04T05:06:54.047Z · comments (0)
Elevating Air Purifiers
jefftk (jkaufman) · 2024-12-17T01:40:05.401Z · comments (0)
Alternatives to Masks for Infectious Aerosols
jefftk (jkaufman) · 2024-12-08T14:00:01.670Z · comments (9)
[link] Linkpost: "Imagining and building wise machines: The centrality of AI metacognition" by Johnson, Karimi, Bengio, et al.
Chris_Leong · 2024-11-11T16:13:26.504Z · comments (6)
Why I Think All The Species Of Significantly Debated Consciousness Are Conscious And Suffer Intensely
omnizoid · 2024-11-20T16:48:44.859Z · comments (5)
[link] Effective Networking as Sending Hard to Fake Signals
vaishnav92 · 2024-12-12T20:32:24.113Z · comments (2)
Deceptive Alignment and Homuncularity
Oliver Sourbut · 2025-01-16T13:55:19.161Z · comments (12)
[link] Was a Subway in New York City Inevitable?
Jeffrey Heninger (jeffrey-heninger) · 2024-03-30T00:53:21.314Z · comments (4)
Housing Roundup #9: Restricting Supply
Zvi · 2024-07-17T12:50:05.321Z · comments (8)
Decent plan prize winner & highlights
lemonhope (lcmgcd) · 2024-01-19T23:30:34.242Z · comments (2)
[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)
A Review of In-Context Learning Hypotheses for Automated AI Alignment Research
alamerton · 2024-04-18T18:29:33.892Z · comments (4)
D&D.Sci Hypersphere Analysis Part 4: Fine-tuning and Wrapup
aphyer · 2024-01-18T03:06:39.344Z · comments (5)
[link] Beware the science fiction bias in predictions of the future
Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T05:32:47.372Z · comments (20)
[link] The Best Essay (Paul Graham)
Chris_Leong · 2024-03-11T19:25:42.176Z · comments (2)
[question] When can I be numerate?
FinalFormal2 · 2024-09-12T04:05:27.710Z · answers+comments (4)
[link] Transformer Debugger
Henk Tillman (henk-tillman) · 2024-03-12T19:08:56.280Z · comments (0)
Virtually Rational - VRChat Meetup
Tomás B. (Bjartur Tómas) · 2024-01-28T05:52:36.934Z · comments (3)
[question] How to Model the Future of Open-Source LLMs?
Joel Burget (joel-burget) · 2024-04-19T14:28:00.175Z · answers+comments (9)
Useful starting code for interpretability
eggsyntax · 2024-02-13T23:13:47.940Z · comments (2)
Trying to be rational for the wrong reasons
Viliam · 2024-08-20T16:18:06.385Z · comments (9)
Clipboard Filtering
jefftk (jkaufman) · 2024-04-14T20:50:02.256Z · comments (1)
[link] An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs
Adam Karvonen (karvonenadam) · 2024-06-25T15:57:16.872Z · comments (0)
To Boldly Code
StrivingForLegibility · 2024-01-26T18:25:59.525Z · comments (4)
[link] Announcing Open Philanthropy's AI governance and policy RFP
Julian Hazell (julian-hazell) · 2024-07-17T02:02:39.933Z · comments (0)
Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
scasper · 2024-07-30T14:57:06.807Z · comments (0)
Abstractions are not Natural
Alfred Harwood · 2024-11-04T11:10:09.023Z · comments (21)
Using an LLM perplexity filter to detect weight exfiltration
Adam Karvonen (karvonenadam) · 2024-07-21T18:18:05.612Z · comments (11)
[question] When engaging with a large amount of resources during a literature review, how do you prevent yourself from becoming overwhelmed?
corruptedCatapillar · 2024-11-01T07:29:49.262Z · answers+comments (2)
[link] Sticker Shortcut Fallacy — The Real Worst Argument in the World
ymeskhout · 2024-06-12T14:52:41.988Z · comments (15)
[link] Conventional footnotes considered harmful
dkl9 · 2024-10-01T14:54:01.732Z · comments (16)
AXRP Episode 30 - AI Security with Jeffrey Ladish
DanielFilan · 2024-05-01T02:50:04.621Z · comments (0)
[link] A Theory of Equilibrium in the Offense-Defense Balance
Maxwell Tabarrok (maxwell-tabarrok) · 2024-11-15T13:51:33.376Z · comments (6)
Fun With The Tabula Muris (Senis)
sarahconstantin · 2024-09-20T18:20:01.901Z · comments (0)
An experiment on hidden cognition
Olli Järviniemi (jarviniemi) · 2024-07-22T03:26:05.564Z · comments (2)
AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics
DanielFilan · 2024-09-29T05:50:02.531Z · comments (0)