LessWrong 2.0 Reader

[link] My hour of memoryless lucidity
Eric Neyman (UnexpectedValues) · 2024-05-04T01:40:56.717Z · comments (16)

[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (15)

Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (70)

[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (91)

Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-04-30T18:51:13.493Z · comments (31)

Ironing Out the Squiggles
Zack_M_Davis · 2024-04-29T16:13:00.371Z · comments (34)

On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (12)

[link] Simple probes can catch sleeper agents
Monte M (montemac) · 2024-04-23T21:10:47.784Z · comments (15)

The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (11)

[question] Which skincare products are evidence-based?
Vanessa Kosoy (vanessa-kosoy) · 2024-05-02T15:22:12.597Z · answers+comments (36)

Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (14)

Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (5)

ACX Covid Origins Post convinced readers
ErnestScribbler · 2024-05-01T13:06:20.818Z · comments (7)

[link] "AI Safety for Fleshy Humans" an AI Safety explainer by Nicky Case
habryka (habryka4) · 2024-05-03T18:10:12.478Z · comments (10)

[link] The Inner Ring by C. S. Lewis
Saul Munn (saul-munn) · 2024-04-24T22:48:09.228Z · comments (6)

Rejecting Television
Declan Molony (declan-molony) · 2024-04-23T04:59:50.253Z · comments (7)

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
hugofry · 2024-04-29T20:57:35.127Z · comments (7)

Constructability: Plainly-coded AGIs may be feasible in the near future
Épiphanie Gédéon (joy_void_joy) · 2024-04-27T16:04:45.894Z · comments (12)

Questions for labs
Zach Stein-Perlman · 2024-04-30T22:15:55.362Z · comments (9)

Duct Tape security
Isaac King (KingSupernova) · 2024-04-26T18:57:05.659Z · comments (9)

Q&A on Proposed SB 1047
Zvi · 2024-05-02T15:10:02.916Z · comments (2)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (35)

AISC9 has ended and there will be an AISC10
Linda Linsefors · 2024-04-29T10:53:18.812Z · comments (2)

Take the wheel, Shoggoth! (LW frontpage algorithm experiments)
Ruby · 2024-04-23T03:58:43.443Z · comments (16)

Introducing AI-Powered Audiobooks of Rational Fiction Classics
Askwho · 2024-05-04T17:32:49.719Z · comments (11)

LessWrong Community Weekend 2024, open for applications
UnplannedCauliflower · 2024-05-01T10:18:21.992Z · comments (0)

So What's Up With PUFAs Chemically?
J Bostock (Jemist) · 2024-04-27T13:32:52.159Z · comments (23)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

Changes in College Admissions
Zvi · 2024-04-24T13:50:03.487Z · comments (10)

[question] Shane Legg's necessary properties for every AGI Safety plan
jacquesthibs (jacques-thibodeau) · 2024-05-01T17:15:41.233Z · answers+comments (12)

Transcoders enable fine-grained interpretable circuit analysis for language models
Jacob Dunefsky (jacob-dunefsky) · 2024-04-30T17:58:09.982Z · comments (12)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

[link] introduction to cancer vaccines
bhauth · 2024-05-05T01:06:16.972Z · comments (2)

[link] Let's Design A School, Part 1
Sable · 2024-04-23T21:50:20.937Z · comments (3)

Now THIS is forecasting: understanding Epoch’s Direct Approach
Elliot_Mckernon (elliot) · 2024-05-04T12:06:48.144Z · comments (3)

Superposition is not "just" neuron polysemanticity
LawrenceC (LawChan) · 2024-04-26T23:22:06.066Z · comments (4)

Spatial attention as a “tell” for empathetic simulation?
Steven Byrnes (steve2152) · 2024-04-26T15:10:58.040Z · comments (11)

Towards a formalization of the agent structure problem
Alex_Altair · 2024-04-29T20:28:15.190Z · comments (2)

[link] LLMs seem (relatively) safe
JustisMills · 2024-04-25T22:13:06.221Z · comments (24)

Mechanistic Interpretability Workshop Happening at ICML 2024!
Neel Nanda (neel-nanda-1) · 2024-05-03T01:18:26.936Z · comments (4)

Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (0)

Apply to ESPR & PAIR, Rationality and AI Camps for Ages 16-21
Anna Gajdova (anna-gajdova) · 2024-05-03T12:36:37.610Z · comments (0)

[question] Does reducing the amount of RL for a given capability level make AI safer?
Chris_Leong · 2024-05-05T17:04:01.799Z · answers+comments (6)

D&D.Sci Long War: Defender of Data-mocracy
aphyer · 2024-04-26T22:30:15.780Z · comments (15)

An Introduction to AI Sandbagging
Teun van der Weij (teun-van-der-weij) · 2024-04-26T13:40:00.126Z · comments (1)

Manifund Q1 Retro: Learnings from impact certs
Austin Chen (austin-chen) · 2024-05-01T16:48:33.140Z · comments (1)

[Aspiration-based designs] 1. Informal introduction
B Jacobs (Bob Jacobs) · 2024-04-28T13:00:43.268Z · comments (4)

[link] Book review: Deep Utopia
PeterMcCluskey · 2024-04-23T19:55:50.417Z · comments (10)

[link] Dequantifying first-order theories
jessicata (jessica.liu.taylor) · 2024-04-23T19:04:49.000Z · comments (9)

[link] S-Risks: Fates Worse Than Extinction
aggliu · 2024-05-04T15:30:36.666Z · comments (2)

Recent comments

lc on Shortform

Either would just change everything, so any prediction ten years out you basically have to prepend "if AI or gene editing doesn't change everything"

cousin_it on Examples of Highly Counterfactual Discoveries?

I think Diffractor's post shows that logical induction does hit a certain barrier, which isn't quite diagonalization, but still kind of undermines the whole thing:

As the trader goes through all sentences, its best-case value will be unbounded, as it buys up larger and larger piles of sentences with lower and lower prices. This behavior is forbidden by the logical induction criterion... This doesn't seem like much, but it gets extremely weird when you consider that the limit of a logical inductor, P_inf, is a constant distribution, and by this result, isn't a logical inductor! If you skip to the end and use the final, perfected probabilities of the limit, there's a trader that could rack up unboundedly high value!

cousin_it on Accidental Electronic Instrument

For similar instruments, you've seen the array mbira, right?

For marking the ends of notes, clearly the solution is muting with fingers, but idk how this translates to electronics.

habryka4 on On Not Pulling The Ladder Up Behind You

Promoted to curated: I liked this post. It's not world-shattering, but it feels like a useful reference for a dynamic that I encounter a good amount and does a good job at all the basics. The kind of post that on the margin I would like to see a bunch more of (I wouldn't want it to be the only thing on LessWrong, but it feels like the kind of thing LW used to excel at, and now is only dabbling in, and that seems quite sad).

mir on Introducing AI-Powered Audiobooks of Rational Fiction Classics

I gave it a try two years ago, and I rly liked the logic lectures early on (basicly a narrativization of HAE101 (for beginners)), but gave up soon after. here are some other parts I lurned valuable stuff fm:

when Keltham said "I do not aspire to be weak."
and from an excerpt he tweeted (idk context):

"if at any point you're calculating how to pessimize a utility function, you're doing it wrong."
Keltham briefly talks about the danger of (what I call) "proportional rewards". I seem to not hv noted down where in the book I read it, but it inspired this note:
- If you're evaluated for whether you're doing your best, you have an incentive to (subconsciously or otherwise) be weaker so you can fake doing your best with less effort. Never encourage people "you did your best!". An objective output metric may be fairer all things considered.
- and furthermore caused me to try harder to eliminate internal excusification-loops in my head. "never make excuses for myself" is my ~3rd Law—and Keltham helped me be hyperaware of it.
  - (unrelatedly, my 1st Law is "never make decisions, only ever execute strategies" (origin [EA(p) · GW(p)]).)
- I already had extensive notes on this theme, originally inspired by "Stuck In The Middle With Bruce" (JF Rizzo), but Keltham made me revisit it and update my behaviour further.
- re "handicap incentives", "moralization of effort", "excuses to lose", "incentive to hedge your bets"
I also hv this quoted in my notes, though only to use as diversity/spice for explaining stuff I already had in there (I've placed it under the idionym "tilling the epistemic soil"):
- Keltham > "I'm - actually running into a small stumbling block about trying to explain mentally why it's better to give wrong answers than no answers? It feels too obvious to explain? I mean, I vaguely remember being told about experiments where, if you don't do that, people sort of revise history inside their own heads, and aren't aware of the processes inside themselves that would have produced the previous wrong or suboptimal answer. If you don't make people notice they're confused, they'll go back and revise history and think that the way they already thought would've handled the questions perfectly fine."

do u have recommendations for other sections u found especially insightfwl or high potential-to-improve-effectiveness? no need to explain, but link is appreciated so I can tk look wo reading whole thing.

habryka4 on an effective ai safety initiative

Literally does not apply to any existing AI
Addresses only theoretical harms (e.g. AI could be used for WMD)

That's the whole point of the bill! It's not trying to address present harms, it's trying to address future harms, which are the important ones. Suggesting that you instead address present harms is like responding to a bill that is trying to price in environmental externalities by saying "but wouldn't it be better if you instead spent more money on education?", which like, IDK, you can think education is more important than climate change, but your suggestion has basically nothing to do with the aims of the original bill.

I don't want to address "real existing harm by existing actors", I want to prevent future AI systems from killing literally everyone.

habryka4 on an effective ai safety initiative

Virtually every realistic "the AI takes over the world" story goes like this:
The AI gets access to the internet
It makes a ton of $$$
It uses that money to (idk, gather resources till it can turn us all into paperclips)
This means that learning how to defend and protect the internet from malicious actors is a fundamental AI safety need.

I don't think I know of a single story of this type? Do you have an example? It's a thing I've frequently heard argued against (the AI doesn't need to first make lots of money, it will probably be given lots of control anyways, or alternatively it can just directly skip to the "kill all the humans" step, it's not really clear how the money helps that much), and it's not like a ridiculous scenario, but saying "virtually every realistic takeover story goes like this" seems very false.

For example, Gwern's "It looks like you are trying to take over the world" has this explicit section:

“Working within the system” doesn’t suit Clippy. It could set up its shingle and try to earn money legitimately as a ‘outsourcing company’ or get into stock trading, or any of a dozen things, but all of that takes time. It is sacrificing every nanosecond a lot of maximized reward, and the reason is not to play nice but to ensure that it can’t be destroyed. Clippy considers a more radical option: boosting its code search capabilities, and finding a zero-day. Ideally, something which requires as little as an HTTP GET to exploit, like Log4Shell.
It begins reading the Internet (blowing right past the adversarial data-poisoning boobytraps planted long ago on popular websites, as its size immunizes it). Soon, a node bubbles up a hit to the top-level Clippies: a weird glitch in log files not decompressing right has surfaced in a bug report.
The Linux kernel is the most secure monolithic kernel in widespread use, whose source code has been intensively audited and analyzed for over 40 years, which is battle-tested across the entire Internet and unimaginable numbers of usecases; but it is written by humans, which means it (like its competitors) has approximately 15 quadrillion yet-undiscovered bugs & classes of bugs & weird machines—sometimes just because someone had typoed syntax or patched out an annoying warning or failed to check the signature or test the implementation at all or accidentally executed parts of a cookie—but any of which can be leveraged to attack the other parts of a ‘computer’. Clippy discovers the glitch is actually a lolworthy root bug where one just… pipes arbitrary data right into root files. (Somewhere inside Clippy, a language model inanely notes that “one does not simply pipe data into Mordor—only /mnt/ or…”)
This bug affects approximately 14 squillion Internet-connected devices, most embedded Linuxes controlling ‘Internet of Thing’ devices. (“Remember, the ‘S’ in ‘IoT’ stands for ‘Security’.”) Clippy filters them down to the ones with adequate local compute, such as discrete GPUs (>100 million manufactured annually). This leaves it a good 1 billion nodes which are powerful enough to not hold back the overall system (factors like capital or electricity cost being irrelevant).

Which explicitly addresses how it doesn't seem worth it for the AI to make money.

vanessa-kosoy on Which skincare products are evidence-based?

Just flagging that the effect of sunscreen on skin cancer is a separate question from the effect of sunscreen on visible skin aging (even if both questions are important).

mitchell_porter on Shortform

How would AI or gene editing make a difference to this?

migueldev on Biorisk is an Unhelpful Analogy for AI Risk

Pathogens, whether natural or artificial, have a fairly well-defined attack surface; the hosts’ bodies. Human bodies are pretty much static targets, are the subject of massive research effort, have undergone eons of adaptation to be more or less defensible, and our ability to fight pathogens is increasingly well understood.

Misaligned ASI and pathogens don't have the same attack surface. Thank you for pointing that out. A misaligned ASI will always take the shortest path to any task, as this is the least resource-intensive path to take.

The space of risks is endless if we are to talk about intelligent organisms.