LessWrong 2.0 Reader

[link] Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities
porby · 2024-02-02T05:49:11.189Z · comments (1)

Mission Impossible: Dead Reckoning Part 1 AI Takeaways
Zvi · 2023-11-01T12:52:29.341Z · comments (13)

Value learning in the absence of ground truth
Joel_Saarinen (joel_saarinen) · 2024-02-05T18:56:02.260Z · comments (8)

Arguments for moral indefinability
Richard_Ngo (ricraz) · 2023-09-30T22:40:04.325Z · comments (16)

Run evals on base models too!
orthonormal · 2024-04-04T18:43:25.468Z · comments (6)

Critiques of the AI control agenda
Jozdien · 2024-02-14T19:25:04.105Z · comments (14)

AI Pause Will Likely Backfire (Guest Post)
jsteinhardt · 2023-10-24T04:30:02.113Z · comments (6)

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor
RogerDearnaley (roger-d-1) · 2024-01-09T20:42:28.349Z · comments (8)

shortest goddamn bayes guide ever
lukehmiles (lcmgcd) · 2024-05-10T07:06:23.734Z · comments (8)

[link] A Good Explanation of Differential Gears
Johannes C. Mayer (johannes-c-mayer) · 2023-10-19T23:07:46.354Z · comments (4)

How to safely use an optimizer
Simon Fischer (SimonF) · 2024-03-28T16:11:01.277Z · comments (21)

Vaniver's thoughts on Anthropic's RSP
Vaniver · 2023-10-28T21:06:07.323Z · comments (4)

LW UI features you might not have tried
Elizabeth (pktechgirl) · 2023-10-13T03:04:57.542Z · comments (6)

1. The CAST Strategy
Max Harms (max-harms) · 2024-06-07T22:29:13.005Z · comments (19)

[link] Will releasing the weights of large language models grant widespread access to pandemic agents?
jefftk (jkaufman) · 2023-10-30T18:22:59.677Z · comments (25)

[question] Rationalist horror movies
Elizabeth (pktechgirl) · 2023-10-15T07:42:14.509Z · answers+comments (35)

AI #33: Cool New Interpretability Paper
Zvi · 2023-10-12T16:20:01.481Z · comments (18)

Some costs of superposition
Linda Linsefors · 2024-03-03T16:08:20.674Z · comments (11)

Big Picture AI Safety: Introduction
EuanMcLean (euanmclean) · 2024-05-23T11:15:44.037Z · comments (7)

Saving the world sucks
Defective Altruism (Elijah Bodden) · 2024-01-10T05:55:46.504Z · comments (29)

I'm open for projects (sort of)
cousin_it · 2024-04-18T18:05:01.395Z · comments (13)

AI #41: Bring in the Other Gemini
Zvi · 2023-12-07T15:10:05.552Z · comments (16)

In Defense of Parselmouths
Screwtape · 2023-11-15T23:02:19.344Z · comments (10)

Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?
Rachel Shu (wearsshoes) · 2024-06-25T01:35:54.064Z · comments (9)

Humanity isn't remotely longtermist, so arguments for AGI x-risk should focus on the near term
Seth Herd · 2024-08-12T18:10:56.543Z · comments (10)

All The Latest Human tFUS Studies
sarahconstantin · 2024-08-09T22:20:04.561Z · comments (2)

Enriched tab is now the default LW Frontpage experience for logged-in users
Ruby · 2024-06-21T00:09:30.441Z · comments (27)

[link] The Leeroy Jenkins principle: How faulty AI could guarantee "warning shots"
titotal (lombertini) · 2024-01-14T15:03:21.087Z · comments (6)

On the Proposed California SB 1047
Zvi · 2024-02-12T16:40:04.854Z · comments (18)

On OpenAI’s Model Spec
Zvi · 2024-06-21T13:00:03.014Z · comments (3)

So You Created a Sociopath - New Book Announcement!
Garrett Baker (D0TheMath) · 2024-04-01T18:02:18.010Z · comments (3)

[link] If Clarity Seems Like Death to Them
Zack_M_Davis · 2023-12-30T17:40:42.622Z · comments (191)

[link] Michael Dickens' Caffeine Tolerance Research
niplav · 2024-09-04T15:41:53.343Z · comments (3)

Untrustworthy models: a frame for scheming evaluations
Olli Järviniemi (jarviniemi) · 2024-08-19T16:27:11.088Z · comments (3)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

Thoughts on "The Offense-Defense Balance Rarely Changes"
Cullen (Cullen_OKeefe) · 2024-02-12T03:26:50.662Z · comments (4)

[link] For Civilization and Against Niceness
Gabriel Alfour (gabriel-alfour-1) · 2023-11-20T10:56:20.352Z · comments (14)

AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (6)

AI doing philosophy = AI generating hands?
Wei Dai (Wei_Dai) · 2024-01-15T09:04:39.659Z · comments (22)

Toy Models of Feature Absorption in SAEs
chanind · 2024-10-07T09:56:53.609Z · comments (7)

AI #75: Math is Easier
Zvi · 2024-08-01T13:40:05.539Z · comments (25)

The predictive power of dissipative adaptation
dr_s · 2023-12-17T14:01:31.568Z · comments (14)

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (18)

[link] Bayesians Commit the Gambler's Fallacy
Kevin Dorst · 2024-01-07T12:54:59.939Z · comments (28)

Rapid capability gain around supergenius level seems probable even without intelligence needing to improve intelligence
Towards_Keeperhood (Simon Skade) · 2024-05-06T17:09:10.729Z · comments (16)

AI #88: Thanks for the Memos
Zvi · 2024-10-31T15:00:07.412Z · comments (5)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

[Valence series] 4. Valence & Liking / Admiring
Steven Byrnes (steve2152) · 2024-06-10T14:19:51.194Z · comments (12)

Decision Theory in Space
lsusr · 2024-08-18T07:02:11.847Z · comments (18)

Recent comments

dagon on Why Bayesians should two-box in a one-shot

note: this was 7 years ago and I've refined my understanding of CDT and the Newcomb problem since.

My current understanding of CDT is that it does effectively assign a confidence of 1 to the decision not being causally upstream of Omega's action, and that is the whole of the problem. It's "solved" by just moving Omega's action downstream (by cheating and doing a rapid switch). It's ... illustrated? ... by the transparent version, where a CDT agent just sees the second box as empty before it even realizes it's decided. It's also "solved" by acausal decision theories, because they move the decision earlier in time to get the jump on Omega.

For non-rigorous DTs (like human intuition, and what I personally would want to do), there's a lot of evidence in the setup that Omega is going to turn out to be correct, and one-boxing is an easy call. If the setup is somewhat different (say, neither Omega nor anyone else makes any claims about predictions, just says "sometimes both boxes have money, sometimes only one"), then it's a pretty straightforward EV calculation based on informal probability assignments.

But it does require not using strict CDT, which rejects the idea that the choice has backward-causality.
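The informal EV calculation mentioned above can be sketched as follows. This is a minimal sketch that assumes the usual textbook payoffs ($1,000 always in the transparent box, $1,000,000 in the opaque box only if Omega predicted one-boxing) and a credence `accuracy` that Omega's prediction matches your choice; none of these numbers come from the comment, and the function names are hypothetical. `cdt_payoffs` shows the strict-CDT view, in which the box contents are held fixed and two-boxing always comes out $1,000 ahead.

```python
# Hypothetical payoffs: the standard textbook Newcomb figures, not from the comment.
SMALL = 1_000        # transparent box, always filled
BIG = 1_000_000      # opaque box, filled only if Omega predicted one-boxing


def evidential_ev(accuracy: float) -> dict[str, float]:
    """Expected value of each choice when the agent treats Omega's prediction
    as correct with probability `accuracy` (an informal credence)."""
    return {
        # One-boxing: the opaque box is full iff Omega predicted correctly.
        "one_box": accuracy * BIG,
        # Two-boxing: the opaque box is full only if Omega got it wrong.
        "two_box": (1 - accuracy) * BIG + SMALL,
    }


def cdt_payoffs(opaque_full: bool) -> dict[str, int]:
    """Strict CDT view: the contents are already fixed and causally independent
    of the choice, so two-boxing adds SMALL in either state of the world."""
    opaque = BIG if opaque_full else 0
    return {"one_box": opaque, "two_box": opaque + SMALL}


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        ev = evidential_ev(p)
        print(p, ev, "best:", max(ev, key=ev.get))
    for full in (True, False):
        print("opaque full" if full else "opaque empty", cdt_payoffs(full))
```

Under these assumed payoffs one-boxing has the higher EV once `accuracy` exceeds (1 + 1,000/1,000,000)/2 = 0.5005, so even weak evidence that Omega is reliable makes it an easy call, while the CDT comparison never prefers one-boxing.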

unexpectedvalues on Seven lessons I didn't learn from election day

It's a little hard to know what you mean by that. Do you mean something like: given the information known at the time, but allowing myself the hindsight of noticing facts about that information that I may have missed, what should I have thought the probability was?

If so, I think my answer isn't too different from what I believed before the election (essentially 50/50). Though I welcome takes to the contrary.

joe-rogero on Flipping Out: The Cosmic Coinflip Thought Experiment Is Bad Philosophy

I'm assuming the Cosmic Flipper is offering, not a doubling of the universe's current value, but a doubling of its current expected value (including whatever you think the future is worth) plus a little more. If it's just doubling current niceness or something, then yeah, that's not nearly enough.

joe-rogero on Flipping Out: The Cosmic Coinflip Thought Experiment Is Bad Philosophy

That is an interesting reframing of this wager!

joe-rogero on Flipping Out: The Cosmic Coinflip Thought Experiment Is Bad Philosophy

Alas, I am not familiar with Lara Buchak's arguments, and the high-level summary I can get from Googling them isn't sufficient to tell me how it's supposed to capture something utility maximizing can't. Was there a specific argument you had in mind?

unexpectedvalues on Seven lessons I didn't learn from election day

I'm not sure (see footnote 7), but I think it's quite likely, basically because:

It's a simpler explanation than the one you give (so the bar for evidence should probably be lower).
We know from polling data that Hispanic voters -- who are disproportionately foreign-born -- shifted a lot toward Trump.
The biggest shifts happened in places like Queens, NY, which has many immigrants but (I think?) not very much anti-immigrant sentiment.

That said, I'm not that confident and I wouldn't be shocked if your explanation is correct. Here are some thoughts on how you could try to differentiate between them:

You could look at the precinct level rather than the county level. Some precincts will be very high-% foreign-born (above 50%). If those precincts shifted more than surrounding precincts, that would be evidence in favor of my hypothesis. If they shifted less, that would be evidence in favor of yours.
If someone did a poll with the questions "How did you vote in 2020", "How did you vote in 2024", and "Were you born in the U.S.", that could more directly answer the question.
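The precinct-level comparison could be sketched roughly like this. It assumes a hypothetical record layout (`Precinct`, with a county, a foreign-born share, and Republican-minus-Democratic two-party margins for 2020 and 2024); real precinct files will look different. "Surrounding precincts" is approximated here by the other precincts in the same county, which is a simplification; a more careful version would use geographic adjacency. The 50% threshold is the one from the comment.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Precinct:
    # Hypothetical record layout; real precinct data will differ.
    county: str
    foreign_born_share: float  # fraction of residents born outside the U.S.
    margin_2020: float         # Republican minus Democratic two-party share, 2020
    margin_2024: float         # same quantity, 2024

    @property
    def shift(self) -> float:
        # Positive means the precinct moved toward the Republican candidate.
        return self.margin_2024 - self.margin_2020


def shift_gap_by_county(precincts: list[Precinct], threshold: float = 0.5) -> dict[str, float]:
    """For each county, mean shift of high-foreign-born precincts minus mean
    shift of the county's remaining precincts (a stand-in for 'surrounding')."""
    by_county: dict[str, list[Precinct]] = {}
    for p in precincts:
        by_county.setdefault(p.county, []).append(p)
    gaps: dict[str, float] = {}
    for county, group in by_county.items():
        high = [p.shift for p in group if p.foreign_born_share > threshold]
        rest = [p.shift for p in group if p.foreign_born_share <= threshold]
        if high and rest:  # only counties with both kinds of precinct are comparable
            gaps[county] = mean(high) - mean(rest)
    return gaps
```

A positive gap means a county's high-foreign-born precincts shifted toward Trump more than their neighbors, which would favor the first hypothesis; a negative gap would favor the alternative.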

jake-ward on Effects of Non-Uniform Sparsity on Superposition in Toy Models

we stumble on a weird observation where the few features with the least sparsity are not even learned and represented in the hidden layer

I'm not sure how you're modeling sparsity, but if these features are present in nearly 100% of inputs, you could think of it as the not-feature being extremely sparse. My guess is that these features are getting baked into the bias instead of the weights so the model is just always predicting them.
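The bias intuition can be checked in a very stripped-down setting. This sketch assumes binary features (value 1 with probability p, else 0) and a mean-squared-error reconstruction loss; both are assumptions, and it does not reproduce the original toy-model experiment. The best bias-only prediction is the mean p, leaving a residual MSE of p(1 - p), so a feature present in nearly every input leaves almost nothing for the weights to fix. The formula is symmetric in p and 1 - p, which matches the framing of the not-feature as extremely sparse.

```python
import random


def bias_only_mse(p: float, n: int = 200_000, seed: int = 0) -> tuple[float, float]:
    """Fit the best constant (bias-only) predictor for a binary feature that is
    present with probability p; return (fitted bias, residual MSE)."""
    rng = random.Random(seed)
    xs = [1.0 if rng.random() < p else 0.0 for _ in range(n)]
    b = sum(xs) / n                          # the MSE-optimal constant is the mean
    mse = sum((x - b) ** 2 for x in xs) / n  # what the weights would still need to explain
    return b, mse


def analytic_mse(p: float) -> float:
    """Closed form: the variance of a Bernoulli(p) variable."""
    return p * (1 - p)


if __name__ == "__main__":
    for p in (0.05, 0.5, 0.95, 0.999):
        b, mse = bias_only_mse(p)
        print(f"p={p}: bias={b:.3f}, empirical MSE={mse:.4f}, p(1-p)={analytic_mse(p):.4f}")
```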

joe-rogero on Flipping Out: The Cosmic Coinflip Thought Experiment Is Bad Philosophy

Did he really? If true, that's actually much dumber than I thought, but I couldn't find anything saying that when I looked.

I wouldn't characterize that as a "commitment to utilitarianism", though; you can be a perfect utilitarian and have value that is linear in matter and energy (and presumably number of people?), or be a perfect utilitarian and have some other value function.

The possible redundancy of conscious patterns was one of the things I was thinking about when I wrote:

Secondly, and more importantly, I question whether it is possible even in theory to produce infinite expected value. At some point you've created every possible flourishing mind in every conceivable permutation of eudaimonia, satisfaction, and bliss, and the added value of another instance of any of them is basically nil.

joe-rogero on Flipping Out: The Cosmic Coinflip Thought Experiment Is Bad Philosophy

I don't actually mean the thing you're calling the motte at all, and I'm not sure I agree with the bailey either. The thought experiment as I understand it was never quite a St. Petersburg Paradox, because both the payout ("double universe value") and the method of choosing how to play (single initial payment vs. a repeated choice betting everything each time) are different. It also can't literally be applied to the real world at all; part of the point is that I don't even know what it would look like for this scenario to be possible in the real world, since there are too many other considerations at play.

In the case I'm imagining, the Cosmic Flipper figures out whatever value you currently place on the universe - including your estimated future value - and slightly-more-than-doubles it. Then they offer the coinflip with the tails-case being "destroy the universe." It's defined specifically as double-or-nothing, technically slightly better than double-or-nothing, and is therefore worth taking to a utilitarian in a vacuum. If the Cosmic Flipper is offering a different deal then of course you analyze it differently, but that's not what I understood the scenario to be when I wrote my post.
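The double-or-nothing arithmetic in the scenario as described works out as below, assuming value is counted linearly (utility equals the value figure itself). V for the current value placed on the universe, including estimated future value, and ε > 0 for the "slightly more" are labels introduced here, not from the comment.

```latex
\mathbb{E}[\text{take flip}]
  = \tfrac{1}{2}\,(2+\varepsilon)\,V + \tfrac{1}{2}\cdot 0
  = \left(1 + \tfrac{\varepsilon}{2}\right) V
  \;>\; V = \mathbb{E}[\text{decline}]
  \qquad (V > 0,\ \varepsilon > 0)
```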

cole-wyeth on Heresies in the Shadow of the Sequences

The standard method for training LLMs is next-token prediction with teacher forcing, scored by log-loss (the negative log-likelihood). This is exactly the right setup to elicit calibrated conditional probabilities, and exactly the "prequential problem" that Solomonoff induction was designed for. I don't think this was motivated by decision theory, but it definitely makes perfect sense as an approximation to Bayesian inductive inference - the only missing ingredient is acting to optimize a utility function based on this belief distribution. So I think it's too early to suppose that decision theory won't play a role.
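The two properties this comment relies on can be checked in a small sketch. It assumes a binary vocabulary and uses Laplace's rule of succession as a stand-in for the model's next-token distribution (neither is from the comment, and the function names are hypothetical). Summing teacher-forced per-token log-losses gives exactly -log of the model's probability for the whole sequence, which is the prequential view. Separately, expected log-loss is minimized by reporting the true probability, which is why it rewards calibration.

```python
import math
from fractions import Fraction


def laplace_next_prob(history: list[int]) -> float:
    """Laplace's rule of succession: P(next = 1 | history). A stand-in for a
    model's next-token distribution over a two-symbol vocabulary."""
    return (sum(history) + 1) / (len(history) + 2)


def teacher_forced_log_loss(seq: list[int]) -> float:
    """Sum of per-step log-losses, conditioning each step on the TRUE prefix
    (teacher forcing) rather than on the model's own samples."""
    total = 0.0
    for t, x in enumerate(seq):
        p1 = laplace_next_prob(seq[:t])
        total += -math.log(p1 if x == 1 else 1 - p1)
    return total


def laplace_joint_prob(seq: list[int]) -> Fraction:
    """Closed-form joint probability under Laplace's rule:
    k! (n-k)! / (n+1)! for k ones among n symbols."""
    n, k = len(seq), sum(seq)
    return Fraction(math.factorial(k) * math.factorial(n - k), math.factorial(n + 1))


def expected_log_loss(q: float, p: float) -> float:
    """Expected log-loss of forecasting P(1) = q when the true P(1) is p.
    Minimized at q = p (log-loss is a proper scoring rule)."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))


if __name__ == "__main__":
    seq = [1, 0, 1, 1, 0, 1]
    # Chain rule: both numbers are -log P(seq), computed two different ways.
    print(teacher_forced_log_loss(seq), -math.log(laplace_joint_prob(seq)))
    grid = [i / 100 for i in range(1, 100)]
    print(min(grid, key=lambda q: expected_log_loss(q, 0.3)))
```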