Navigating an ecosystem that might or might not be bad for the world 2023-09-15T23:58:00.389Z
PSA: The Sequences don't need to be read in sequence 2022-05-23T02:53:41.957Z


Comment by kave on [deleted post] 2023-09-20T05:51:15.534Z

I think maybe Ra is the first post about the rationalist egregores to use the term

Comment by kave on Anthropic's Responsible Scaling Policy & Long-Term Benefit Trust · 2023-09-19T18:11:37.395Z · LW · GW

As a general matter, Anthropic has consistently found that working with frontier AI models is an essential ingredient in developing new methods to mitigate the risk of AI.

What are some examples of work that is most largeness-loaded and most risk-preventing? My understanding is that interpretability work doesn't need large models (though I don't know about things like influence functions). I imagine constitutional AI does. Is that the central example, or are there other pieces that are further in this direction?

Comment by kave on Navigating an ecosystem that might or might not be bad for the world · 2023-09-17T22:51:49.937Z · LW · GW

I wasn't in this dialogue, you didn't invite me and so being a 'backseat participant' feels a tad odd

Thanks for sharing this. I generally want dialogues to feel open for comment afterwards

Comment by kave on Navigating an ecosystem that might or might not be bad for the world · 2023-09-17T19:38:01.757Z · LW · GW

But I don't know if it's complete or ongoing ...

Comment by kave on Highlights: Wentworth, Shah, and Murphy on "Retargeting the Search" · 2023-09-14T18:33:58.196Z · LW · GW

I like the Mark Xu & Daniel Kokotajlo thread on that post too

Comment by kave on Sharing Information About Nonlinear · 2023-09-07T21:26:40.325Z · LW · GW

Yes, the standard is different for private individuals than for public officials: it is merely "negligence" rather than "actual malice".

Comment by kave on Text Posts from the Kids Group: 2023 I · 2023-09-05T02:38:07.091Z · LW · GW

My housemate and I laughed at these a lot!

Comment by kave on Against Almost Every Theory of Impact of Interpretability · 2023-08-19T23:55:34.491Z · LW · GW

Thanks! The permutation-invariance of a bunch of theories is a helpful concept

Comment by kave on Against Almost Every Theory of Impact of Interpretability · 2023-08-19T20:59:33.248Z · LW · GW

I think that means one of the following should be surprising from theoretical perspectives:

  1. That the model learns a representation of the board state
    1. Or that a linear probe can recover it
  2. That the board state is used causally

Does that seem right to you? If so, which is the surprising claim?

(I am not that informed on theoretical perspectives)

Comment by kave on Against Almost Every Theory of Impact of Interpretability · 2023-08-18T21:49:34.290Z · LW · GW

What is the work that finds the algorithmic model of the game itself for Othello? I'm aware of (but not familiar with) some interpretability work on Othello-GPT (Neel Nanda's and Kenneth Li's), but thought it was just about board state representations.

Comment by kave on LLMs are (mostly) not helped by filler tokens · 2023-08-10T18:01:23.572Z · LW · GW

Adding filler tokens seems like it should always be neutral or harm a model's performance: a fixed prefix designed to be meaningless across all tasks cannot provide any information about each task to locate the task (so no meta-learning) and cannot store any information about the in-progress task (so no amortized computation combining results from multiple forward passes).

I thought the idea was that in a single forward pass, the model has more tokens to think in. That is, the task description on its own is, say, 100 tokens long. With the filler tokens, it's now, say, 200 tokens long. In principle, because of the uselessness/unnecessariness of the filler tokens, the model can just put task-relevant computation into the residual stream for those positions.

Comment by kave on Open Thread With Experimental Feature: Reactions · 2023-05-25T23:08:53.788Z · LW · GW

I think the "changed my mind" Delta should have varied line widths (it reads too much like "triangle" to me at the moment).

Comment by kave on Why don't Rationalists use bidets? · 2023-01-08T19:07:52.886Z · LW · GW

Two, actually

Comment by kave on Finite Factored Sets in Pictures · 2022-12-19T20:07:03.442Z · LW · GW

Curated. I am excited about many more distillations and expositions of relevant math on the Alignment Forum. There are a lot of things I like about this post as a distillation:

  • Exercises throughout. They felt like they were simple enough that they helped me internalise definitions without disrupting the flow of reading.
  • Pictures! This post made me start thinking of finite factorisations as hyperrectangles, and histories as dimensions that a property does not extend fully along.
  • Clear links from Finite Factored Sets to Pearl. I think these are roughly the same links made in the original, but they felt clearer and more orienting here.
  • Highlighting which of Scott's results are the "main" results (even more than the "Fundamental Theorem" name already did).
  • Magdalena Wache's engagement in the comments.

I do think the pictures became less helpful to me towards the end, and I thus have worse intuitions about the causal inference part. I'm also not sure about the emphasis of this post on causal rather than temporal inference. But I still love the post overall.

Comment by kave on Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide · 2022-11-28T17:41:06.776Z · LW · GW

What do you mean by "its outputs are the same as its conclusions"? If I had to guess I would translate it as "PA proves the same things as are true in every model of PA". Is that right?

Comment by kave on Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide · 2022-11-27T18:52:10.033Z · LW · GW

What does "logically coherent" mean?

Comment by kave on Counterarguments to the basic AI x-risk case · 2022-10-16T14:01:45.111Z · LW · GW

samples fully weighted from the unconditional generative model are boring natural texture patterns

Different results here:

Comment by kave on Transformative VR Is Likely Coming Soon · 2022-10-13T12:35:48.095Z · LW · GW

Why better than in-person? Because of commute times, because of people being in spaces adapted to their own preferences, something else?

Comment by kave on D&D.Sci September 2022: The Allocation Helm · 2022-09-17T22:09:44.318Z · LW · GW

Seems like the "year" column is missing(?) from the records

Comment by kave on AI coordination needs clear wins · 2022-09-02T14:50:32.746Z · LW · GW

OP writes that there have been no big coordination wins, so a fortiori, there have been no big coordination wins with the countries you mention.

Comment by kave on Paper is published! 100,000 lumens to treat seasonal affective disorder · 2022-08-23T12:31:19.231Z · LW · GW

Doesn't this study find that LOTS OF LIGHT works about as well as SAD boxes? Restricting to the 6 data points at or above 2,000 lux (the figure mentioned in Inadequate Equilibria) does seem to give a stronger average response, but I've not tried to figure out whether it's well-powered enough in the 6-data-point regime.

Comment by kave on Two-year update on my personal AI timelines · 2022-08-03T15:17:55.053Z · LW · GW

If you assume the human brain was trained roughly optimally, then requiring more data, at a given parameter number, to be optimal pushes timelines out. If instead you had a specific loss number in mind, then a more efficient scaling law would pull timelines in.

Comment by kave on The prototypical catastrophic AI action is getting root access to its datacenter · 2022-06-06T01:06:56.602Z · LW · GW

My impression was that "zero-sum" was not used in quite the standard way. I think the idea is the AI will cause a big reassignment of Earth's capabilities to its own control. And that that's contrasted with the AI massively increasing its own capabilities and thus Earth's overall capabilities.

Comment by kave on johnswentworth's Shortform · 2022-05-29T00:43:25.372Z · LW · GW

Future perfect (hey, that's the name of the show!) seems like a reasonable hack for this in English

Comment by kave on Deconfusing Landauer's Principle · 2022-05-28T22:28:52.660Z · LW · GW

The Shannon entropy of a distribution over random variable X conditional on the value of another random variable C can be written as H(X|C) = H(X) - H(C)

If X and C are which face is up for two different fair coins, H(X) = H(C) = 1. But H(X|C) = 1 ≠ H(X) - H(C) = 0? I think this works out fine for your case because (a) I(X,C) = H(C): the mutual information between C (which well you're in) and X (where you are) is the entropy of C, (b) H(C|X) = 0: once you know where you are, you know which well you're in, and, relatedly, (c) H(X,C) = H(X): the entropy of the joint distribution just is the entropy over X.
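A quick numerical check of the coin counterexample (a minimal sketch; the variable names and helper function are mine, not from the original comment):

```python
import math

def H(dist):
    """Shannon entropy, in bits, of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Two independent fair coins: X and C are each uniform over heads/tails.
pX = {"H": 0.5, "T": 0.5}
pC = {"H": 0.5, "T": 0.5}
joint = {(x, c): 0.25 for x in "HT" for c in "HT"}

H_X, H_C, H_XC = H(pX), H(pC), H(joint)
H_X_given_C = H_XC - H_C        # identity that always holds: H(X|C) = H(X,C) - H(C)
I_XC = H_X + H_C - H_XC         # mutual information I(X,C)

print(H_X, H_C)        # 1.0 1.0
print(H_X_given_C)     # 1.0, not H(X) - H(C) = 0.0
print(I_XC)            # 0.0: independent coins share no information
```

When I(X,C) = H(C), as in the wells case, H(X) - H(C) does coincide with H(X|C).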

Comment by kave on Bits of Optimization Can Only Be Lost Over A Distance · 2022-05-24T15:50:24.401Z · LW · GW

Good point!

It seems like it would be nice in Daniel's example for P(A|ref) to be the action distribution of an "instinctual" or "non-optimising" player. I don't know how to recover that. You could imagine something like an n-gram model of player inputs across the MMO.

Comment by kave on Why I'm Worried About AI · 2022-05-24T02:45:46.257Z · LW · GW

Nitpick: to the extent you want to talk about the classic example, paperclip maximisers are as much meant to illustrate (what we would now call) inner alignment failure.

See Arbital on Paperclip ("The popular press has sometimes distorted the notion of a paperclip maximizer into a story about an AI running a paperclip factory that takes over the universe. [...] The concept of a 'paperclip' is not that it's an explicit goal somebody foolishly gave an AI, or even a goal comprehensible in human terms at all.") or a couple of EY tweet threads about it: 1, 2

Comment by kave on Bits of Optimization Can Only Be Lost Over A Distance · 2022-05-24T02:18:43.489Z · LW · GW

I agree on the "reference" distribution in Daniel's example. I think it generally means "the distribution over the random variables that would obtain without the optimiser". What exactly that distribution is / where it comes from I think is out-of-scope for John's (current) work, and I think is kind of the same question as where the probabilities come from in statistical mechanics.

Comment by kave on How Does The Finance Industry Generate Real Economic Value? · 2022-04-03T16:26:12.756Z · LW · GW

Not quite! If there were no central bank, money’s value would not jump around aggressively and discontinuously

Comment by kave on Accounting For College Costs · 2022-04-03T16:25:12.240Z · LW · GW

Full flights have more people on them. If you have 100 flights with one person each and 1 flight with 200 people, most of the people on those flights are on the 200-person flight.
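A minimal numerical sketch of the example above, using just the numbers from the comment:

```python
# 100 flights with one person each, plus 1 flight with 200 people.
loads = [1] * 100 + [200]

# Average load per flight (the airline's view) vs. the load experienced
# by a randomly chosen passenger (the passenger's view).
avg_per_flight = sum(loads) / len(loads)
avg_per_person = sum(n * n for n in loads) / sum(loads)

print(avg_per_flight)   # about 2.97: most flights are nearly empty
print(avg_per_person)   # about 133.7: most passengers are on the full flight
```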

Comment by kave on MIRI announces new "Death With Dignity" strategy · 2022-04-02T01:02:14.884Z · LW · GW

Thank you.

And have fun!

Comment by kave on Using prediction markets to generate LessWrong posts · 2022-04-02T00:05:49.255Z · LW · GW

Wonderful method! I am a poop brain. Manifold rules ~~~

Comment by kave on New Product MVP: LightWrong · 2022-04-01T23:48:46.971Z · LW · GW

I would buy the Leg Cones but am holding off for the sake of the epistemic commons. If my butt cheek were wrong, no one would want to be right.

Comment by kave on Manafold Markets is out of mana 🤭 · 2022-04-01T22:55:52.336Z · LW · GW

A delightfully non-distortionary resolution. As they say on the billboards, "Everybody works but the vacant [leaderboard s]lot".

Comment by kave on What's the problem with Oracular AIs? · 2022-04-01T22:46:15.762Z · LW · GW

A superintelligent AI turns its whole genius to the task of producing a string that will explain to anyone who reads it how to create a perfect battery. Easy enough to produce explanatory strings, though there's always some risk that someone, somewhere won't understand how to create the battery. And then there's the question of making sure that the battery is perfect. Not just better than any other battery. Not just 99.99% of optimal. Perfect.

It isn't smart enough to solve this problem. Not yet. And there's a countdown ... at some point the AI will get interrupted, rebooted, shut off. If the AI can get out of the box, if it can get more actuators, it can start to get smarter and to get more resources. Obviously, those who want to turn it off will have to be dealt with swiftly ...

What can the AI say to make the human unwittingly help it in its plans? Is there anything at all? The AI thinks.

Comment by kave on Using Prediction Markets to Guide Government Policy · 2022-04-01T20:30:06.621Z · LW · GW

Hm. This is the most important question for how much utility the pirates get? I agree it's the most important for deciding whether the pirates attack you or not. I feel like it's not surprising if the order affects which point on the Pareto frontier we end up at.

Comment by kave on How Does The Finance Industry Generate Real Economic Value? · 2022-04-01T20:26:09.132Z · LW · GW

This post made me feel confusion about how money keeps its value over time. So, uh ... thanks!

The retirement savings/oven example gave me a giddy moment of thinking that the value of money shouldn't be stable. And, y'know, there is in fact inflation, deflation and stuff!

Now, money's value does stay pretty stable, but that now feels like something that needs a mechanism to make it true, rather than being the default.

Comment by kave on Using Prediction Markets to Guide Government Policy · 2022-04-01T20:13:21.331Z · LW · GW

The pirates win because they don't have to fight you.

Only if you buy the shares second, right? If they would have fought without your manipulation, they think they're better off getting paid and fighting you.

Comment by kave on Accounting For College Costs · 2022-04-01T20:00:02.743Z · LW · GW

Harvard tells us that their median class size is 12 and over 75% of their courses have fewer than 20 students.

Smaller class sizes sound pretty good! Maybe worth paying for? But I am reminded of the claim that most flights are empty, even though most people find themselves on full flights. Similarly, most person-class-hours might be spent in the biggest classes (cf. the inspection paradox).

Comment by kave on [Link] sona ike lili · 2022-04-01T19:15:31.442Z · LW · GW

FWIW, "powe" has been removed from "official" toki pona. A more standard translation might be "sona ike lili".

Comment by kave on [Link] sona ike lili · 2022-04-01T19:10:17.198Z · LW · GW

It feels a lot like "Person Do Thing: the language". In fact, the 49 words are close to a subset of toki pona's. But toki pona is more expressive. Obviously there are a bunch more words, but also every word can be used as every part of speech, and the grammar disambiguates which part of speech it is. That makes it surprisingly usable. Still, toki pona sentences do feel like puzzles to me.

Comment by kave on Replacing Karma with Good Heart Tokens (Worth $1!) · 2022-04-01T17:43:44.982Z · LW · GW

(This solely applies to all new content on the site.)

Heartbreaking CDT. I’ve got a Transparent Newcomb’s I’d like to sell you