Posts

I'm offering free math consultations! 2025-01-14T16:30:40.115Z
A Brief Theology of D&D 2022-04-01T12:47:19.394Z
Would you like me to debug your math? 2021-06-11T10:54:58.018Z
Domain Theory and the Prisoner's Dilemma: FairBot 2021-05-07T07:33:41.784Z
Changing the AI race payoff matrix 2020-11-22T22:25:18.355Z
Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda 2020-09-03T18:27:05.860Z
Mapping Out Alignment 2020-08-15T01:02:31.489Z
What are some good public contribution opportunities? (100$ bounty) 2020-06-18T14:47:51.661Z
Gurkenglas's Shortform 2019-08-04T18:46:34.953Z
Implications of GPT-2 2019-02-18T10:57:04.720Z
What shape has mindspace? 2019-01-11T16:28:47.522Z
A simple approach to 5-and-10 2018-12-17T18:33:46.735Z
Quantum AI Goal 2018-06-08T16:55:22.610Z
Quantum AI Box 2018-06-08T16:20:24.962Z
A line of defense against unfriendly outcomes: Grover's Algorithm 2018-06-05T00:59:46.993Z

Comments

Comment by Gurkenglas on shortplav · 2025-02-17T20:30:09.487Z · LW · GW

If you didn't feel comfortable running it overnight, why did you publish the instructions for replicating it?

Comment by Gurkenglas on shortplav · 2025-02-17T19:58:10.492Z · LW · GW

https://www.lesswrong.com/doc/misc/bot_k.diff gives me a 404.

Comment by Gurkenglas on A computational no-coincidence principle · 2025-02-15T20:27:16.320Z · LW · GW

I'm hoping more for some stepping stones between the pre-theoretic concept of "structural" and the fully formalized 99%-clause. If we could measure structuralness more directly, we should be able to get away with less complexity in the rest of the conjecture.

Comment by Gurkenglas on A computational no-coincidence principle · 2025-02-15T10:11:14.658Z · LW · GW

Ultimately, though, we are interested in finding a verifier that accepts or rejects based on a structural explanation of the circuit; our no-coincidence conjecture is our best attempt to formalize that claim, even if it is imperfect.

Can you say more about what made you decide to go with the 99% clause? Did you consider any alternatives?

Comment by Gurkenglas on Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs · 2025-02-12T13:14:47.535Z · LW · GW

This does go in the direction of refuting it, but they'd still need to argue that linear probes improve with scale faster than they do for other queries; a larger model means there are more possible linear probes to pick the best from.

Comment by Gurkenglas on Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs · 2025-02-12T10:39:02.339Z · LW · GW

I had that vibe from the abstract, but I can try to guess at a specific hypothesis that also explains their data: Instead of a model developing preferences as it grows up, it models an Assistant character's preferences from the start, but their elicitation techniques work better on larger models; for small models they produce lots of noise.

Comment by Gurkenglas on Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs · 2025-02-12T10:22:03.247Z · LW · GW

Strikes me as https://www.lesswrong.com/posts/9kNxhKWvixtKW5anS/you-are-not-measuring-what-you-think-you-are-measuring , change my view.

Comment by Gurkenglas on StefanHex's Shortform · 2025-02-11T12:43:10.928Z · LW · GW

Ah, oops. I think I got confused by the absence of L_2 syntax in your formula for FVU_B. (I agree that FVU_A is more principled ^^.)

Comment by Gurkenglas on StefanHex's Shortform · 2025-02-11T11:29:45.335Z · LW · GW

https://github.com/jbloomAus/SAELens/blob/main/sae_lens/evals.py#L511 sums the numerator and denominator separately; if they aren't doing that in some other place, probably just file a bug report?
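
For concreteness, a minimal numpy sketch of the two aggregation orders (my own; the FVU_A/FVU_B names follow this thread, and the data is made up): summing numerator and denominator over the batch before dividing once, versus averaging per-sample ratios.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 16))                # hypothetical activations
x_hat = x + 0.1 * rng.normal(size=(1000, 16))  # hypothetical reconstructions

def fvu_a(x, x_hat):
    # Sum squared residuals and squared deviations over the whole batch,
    # then divide once.
    return np.sum((x - x_hat) ** 2) / np.sum((x - x.mean(axis=0)) ** 2)

def fvu_b(x, x_hat):
    # Divide per sample, then average the ratios.
    num = np.sum((x - x_hat) ** 2, axis=1)
    den = np.sum((x - x.mean(axis=0)) ** 2, axis=1)
    return np.mean(num / den)

print(fvu_a(x, x_hat), fvu_b(x, x_hat))  # agree here; diverge when sample norms vary
```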

Comment by Gurkenglas on I'm offering free math consultations! · 2025-01-14T23:34:10.371Z · LW · GW

Thanks, edited. If we keep this going we'll have more authors than users x)

Comment by Gurkenglas on I'm offering free math consultations! · 2025-01-14T19:40:48.736Z · LW · GW

Thanks, edited. Performance is not the only benefit; see https://www.lesswrong.com/posts/MHqwi8kzwaWD8wEQc/would-you-like-me-to-debug-your-math?commentId=CrC2

Comment by Gurkenglas on When AI 10x's AI R&D, What Do We Do? · 2025-01-11T15:45:42.691Z · LW · GW

Account settings let you set mentions to notify you by email :)

Comment by Gurkenglas on On Eating the Sun · 2025-01-08T19:11:28.654Z · LW · GW

The action space is too large for this to be infeasible, but at a 101 level, if the Sun spun fast enough it would come apart, and angular momentum is conserved so it's easy to add gradually.
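
A rough worked version of the 101-level claim (standard Newtonian breakup estimate; the numbers are mine, not from the thread): the Sun comes apart once centrifugal acceleration at the equator matches surface gravity,

```latex
\omega_c^2 R = \frac{GM}{R^2}
\quad\Rightarrow\quad
\omega_c = \sqrt{\frac{GM}{R^3}}
\approx \sqrt{\frac{1.33 \times 10^{20}\,\mathrm{m^3\,s^{-2}}}{(6.96 \times 10^8\,\mathrm{m})^3}}
\approx 6 \times 10^{-4}\,\mathrm{rad/s},
```

i.e. a rotation period of roughly three hours, against the Sun's current ~25 days.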

Comment by Gurkenglas on A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication · 2025-01-01T19:50:37.768Z · LW · GW

Can this program that you've shown to exist be explicitly constructed?

Comment by Gurkenglas on Hire (or Become) a Thinking Assistant · 2024-12-23T15:27:14.707Z · LW · GW

I'd like to do either side of this! Which I say in public to have an opportunity to advertise that https://www.lesswrong.com/posts/MHqwi8kzwaWD8wEQc/would-you-like-me-to-debug-your-math remains open.

Comment by Gurkenglas on Nathan Young's Shortform · 2024-12-16T17:18:16.188Z · LW · GW

Hang up a tear-off calendar?

Comment by Gurkenglas on Should there be just one western AGI project? · 2024-12-06T14:09:28.913Z · LW · GW

(You can find his ten mentions of that ~hashtag via the magnifying-glass search icon on thezvi.substack.com. Huh, less regular than I thought.)

Comment by Gurkenglas on Should there be just one western AGI project? · 2024-12-05T23:49:54.592Z · LW · GW

Zvi's AI newsletter, latest installment https://www.lesswrong.com/posts/LBzRWoTQagRnbPWG4/ai-93-happy-tuesday, has a regular segment Pick Up the Phone arguing against this.

Comment by Gurkenglas on Should there be just one western AGI project? · 2024-12-03T13:19:48.634Z · LW · GW

Why not just one global project?

Comment by Gurkenglas on Gurkenglas's Shortform · 2024-11-08T13:46:40.948Z · LW · GW

https://www.google.com/search?q=spx+futures

I was specifically looking at Nov 5th 0:00-6:00, which twitched enough to show aliveness, while Manifold and Polymarket moved in smooth synchrony.

Comment by Gurkenglas on Gurkenglas's Shortform · 2024-11-07T20:10:47.859Z · LW · GW

As the prediction markets on Trump winning went from ~50% to ~100% over 6 hours, S&P 500 futures moved less than the rest of the time. Why?

Comment by Gurkenglas on Introducing Transluce — A Letter from the Founders · 2024-10-23T19:28:00.159Z · LW · GW

The public will Goodhart any metric you hand over to it. If you provide evaluation as a service, you will know how many attempts an AI lab made at your test.

Comment by Gurkenglas on Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong · 2024-10-16T19:08:37.189Z · LW · GW

If you say heads every time, half of all futures contain you; likewise with tails.

Comment by Gurkenglas on Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong · 2024-10-16T15:41:56.237Z · LW · GW

https://www.lesswrong.com/posts/Mc6QcrsbH5NRXbCRX/dissolving-the-question

Comment by Gurkenglas on Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong · 2024-10-16T11:36:51.813Z · LW · GW

What is going to be done with these numbers? If Sleeping Beauty is to gamble her money, she should accept the same betting odds as a thirder. If she has to decide which coinflip result kills her, she should be indifferent like a halfer.
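
A minimal simulation of the two payoff structures (my sketch of the standard argument, not from the original comment):

```python
import random

N = 100_000
heads_awakenings, tails_awakenings, heads_runs = 0, 0, 0
for _ in range(N):
    if random.random() < 0.5:   # heads: woken once
        heads_awakenings += 1
        heads_runs += 1
    else:                       # tails: woken twice
        tails_awakenings += 2

# Per-awakening bets: break-even odds track the fraction of awakenings
# that happen under heads -- the thirder's 1/3.
print(heads_awakenings / (heads_awakenings + tails_awakenings))  # ~0.333

# Per-run stakes (e.g. which coinflip result kills her): each result
# occurs in half of all runs, so she is indifferent -- the halfer's 1/2.
print(heads_runs / N)  # ~0.5
```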

Comment by Gurkenglas on Please do not use AI to write for you · 2024-08-21T13:14:16.535Z · LW · GW

Your experiment is contaminated: If a training document said that AI texts are overly verbose, and then announced that the following is a piece of AI-written text, it'd be a natural guess that the document would continue with overly verbose text, and so that's what an autocomplete engine will generate.

Due to RLHF, AI is no longer cleanly modelled as an autocomplete engine, but the point stands. For science, you could try having AI assist in the writing of an article making the opposite claim :).

Comment by Gurkenglas on Practical advice for secure virtual communication post easy AI voice-cloning? · 2024-08-09T21:59:31.217Z · LW · GW

Ask something only they would know.

Comment by Gurkenglas on Quinn's Shortform · 2024-07-31T22:44:11.762Z · LW · GW

Among monotonic, boolean quantifiers that don't ignore their input, exists is maximal because it returns true as often as possible; forall is minimal because it returns true as rarely as possible.
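
A brute-force check of that claim on a three-element domain (my own sketch; I'm reading "don't ignore their input" as "non-constant"):

```python
from itertools import combinations, product

X = (0, 1, 2)
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

def monotone(q):
    # q maps subsets to booleans; monotone means S ⊆ T implies q[S] <= q[T].
    return all(q[s] <= q[t] for s in subsets for t in subsets if s <= t)

# All monotone, non-constant boolean quantifiers on subsets of X.
quantifiers = [dict(zip(subsets, vs))
               for vs in product([False, True], repeat=len(subsets))
               if len(set(vs)) > 1 and monotone(dict(zip(subsets, vs)))]

exists_q = {s: len(s) > 0 for s in subsets}
forall_q = {s: s == frozenset(X) for s in subsets}

# exists is pointwise maximal and forall pointwise minimal among them.
assert all(q[s] <= exists_q[s] for q in quantifiers for s in subsets)
assert all(forall_q[s] <= q[s] for q in quantifiers for s in subsets)
print(len(quantifiers), "monotone non-constant quantifiers checked")
```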

Comment by Gurkenglas on The Case Against UBI · 2024-07-27T10:02:18.322Z · LW · GW

For concreteness, let's say the basic income is the same in every city, same for a paraplegic or Elon Musk. Anyone who can vote gets it, it's a dividend on your share of the country.

I am surprised at section 3; I don't remember anyone who seriously argues that women should be dependent on men. By amusing coincidence, my last paragraph makes your reasoning out of scope; you can abolish women's suffrage in a separate bill.

In section 5, you are led astray by assuming a fixed demand for labor. You notice that we have yet to become obsolete. Well, of course: For as long as human inputs remain cheaper than their outputs, employment statistics will fail to reflect our dwindling comparative advantage. But we are on track to turn every graphics card into a cheaper white-collar worker. Humans have to be trained for jobs; software can be copied. Human hands might remain SOTA for a few years longer. Horses weren't reduced to pets because we built too many cars, but because cars became possible to build.

Comment by Gurkenglas on AI #74: GPT-4o Mini Me and Llama 3 · 2024-07-25T13:59:36.160Z · LW · GW

factor out alpha

⌊x⌋ is floor(x), the greatest integer that's at most x (so ⌊-1.5⌋ = -2, not -1).

Comment by Gurkenglas on Failures in Kindness · 2024-07-22T18:11:26.539Z · LW · GW

People with sufficiently good models of each other to use them in their social protocols.

Comment by Gurkenglas on What are you getting paid in? · 2024-07-18T06:52:16.152Z · LW · GW

I'd call those absences of drawbacks, not benefits - you would have had them without the job.

Comment by Gurkenglas on Why the Best Writers Endure Isolation · 2024-07-17T10:36:13.027Z · LW · GW

I was alone in a room of computers, and I had set out to take no positive action other than grading homework. I ended up sitting and pacing and occasionally moving the mouse in the direction it would need to go next. What I remember my mind being on was the misery of the situation.

Comment by Gurkenglas on Why the Best Writers Endure Isolation · 2024-07-17T07:50:24.816Z · LW · GW

I tried that for a weekend once. I did nothing.

Comment by Gurkenglas on Medical Roundup #3 · 2024-07-09T15:23:42.932Z · LW · GW

It has been pointed out to me that no, what this presumably means is the past decisions of the patients.


Q2 Is it ethically permissible to consider an individual’s past decisions when determining their access to medical resources?

Comment by Gurkenglas on An AI Race With China Can Be Better Than Not Racing · 2024-07-02T19:02:55.865Z · LW · GW

You assume the conclusion:

A lot of the AI alignment success seems to me to stem from the question of whether the problem is easy or not, and is not very elastic to human effort.

AI races are bad because they select for contestants that put in less alignment effort.

Comment by Gurkenglas on "No-one in my org puts money in their pension" · 2024-05-18T11:15:53.748Z · LW · GW

Sure, he's trying to cause alarm via alleged excerpts from his life. Surely society should have some way to move to a state of alarm iff that's appropriate; do you see a better protocol than this one?

Comment by Gurkenglas on Forget Everything (Statistical Mechanics Part 1) · 2024-05-13T15:34:39.960Z · LW · GW

Recall that every vector space is the finitely supported functions from some set to ℝ, and every Hilbert space is the square-integrable functions from some measure space to ℝ.

I'm guessing that similarly, the physical theory that you're putting in terms of maximizing entropy lies in a large class of "Bostock" theories such that we could put each of them in terms of maximizing entropy, by warping the space with respect to which we're computing entropy. Do you have an idea of the operators and properties that define a Bostock theory?

Comment by Gurkenglas on Social status part 1/2: negotiations over object-level preferences · 2024-03-28T10:25:10.105Z · LW · GW

that thing about affine transformations

If the purpose of a utility function is to provide evidence about the behavior of the group, we can preprocess the data structure into that form: Suppose Alice may update the distribution over group decisions by ε. Then the direction she pushes in is her utility function, and the constraints "add up to 100%" and "size ε" cancel out the "affine transformation" degrees of freedom. Now such directions can be added up.
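
A numpy sketch of that preprocessing (my illustration; the utilities are made up):

```python
import numpy as np

def push_direction(utility, eps=0.01):
    # Subtract the mean so the pushed distribution still adds up to 100%
    # (killing the translation freedom), then rescale to size eps
    # (killing the positive-scaling freedom).
    d = utility - utility.mean()
    return eps * d / np.linalg.norm(d)

alice = np.array([1.0, 0.0, 0.5])   # utilities over three group decisions
# Positive affine transformations of a utility function push the same way:
assert np.allclose(push_direction(alice), push_direction(3 * alice + 7))

# And such directions can simply be added across group members:
bob = np.array([0.0, 2.0, 1.0])
print(push_direction(alice) + push_direction(bob))  # components sum to ~0
```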

Comment by Gurkenglas on "Deep Learning" Is Function Approximation · 2024-03-23T23:13:55.394Z · LW · GW

Let's investigate whether functions must necessarily contain an agent in order to do sufficiently useful cognitive work. Pick some function of which an oracle would let you save the world.

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-18T00:55:53.182Z · LW · GW

Hmmmm. What if I said "an enumeration of the first-order theory of (Q ∪ {our number}, <)"? Then any number can claim to be equal to one of the constants.

Comment by Gurkenglas on What is the best argument that LLMs are shoggoths? · 2024-03-17T22:18:00.154Z · LW · GW

If Earth had intelligent species with different minds, an LLM could end up identical to a member of at most one of them.

Comment by Gurkenglas on The Worst Form Of Government (Except For Everything Else We've Tried) · 2024-03-17T21:43:21.892Z · LW · GW

Is the idea that "they seceded because we broke their veto" is more of a casus belli than "we can't break their veto"?

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-17T19:30:01.394Z · LW · GW

Sure! Fortunately, while you can use this to prove any rational real innocent of being irrational, you can't use this to prove any irrational real guilty of being irrational, since every first-order formula can only check against finitely many constants.

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-17T08:20:01.432Z · LW · GW

Chaitin's constant, right. I should have taken my own advice and said "an enumeration of all properties of our number that can be written in the first-order language of (Q,<)".

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-17T00:02:08.600Z · LW · GW

Oh, I misunderstood the point of your first paragraph. What if we require an enumeration of all rationals our number is greater than?
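
For concreteness, a sketch of such an enumeration for sqrt(2) (my example; any number whose comparison against rationals is decidable works the same way):

```python
from fractions import Fraction
from itertools import count, islice

def rationals():
    # Enumerate every rational at least once, walking anti-diagonals.
    yield Fraction(0)
    for n in count(2):
        for num in range(1, n):
            q = Fraction(num, n - num)
            yield q
            yield -q

def below_sqrt2():
    # q < sqrt(2) iff q < 0 or q*q < 2 -- decidable on rationals, so we
    # can enumerate exactly the rationals our number is greater than.
    return (q for q in rationals() if q < 0 or q * q < 2)

print(list(islice(below_sqrt2(), 8)))
```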

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-16T09:37:39.534Z · LW · GW

If you want to transfer definitions into another context (constructive, in this case), you should treat such concrete, intuitive properties as theorems, not axioms, because the abstract formulation will generalize further. (remark: "close" is about distances, not order.)

If constructivism adds a degree of freedom in the definition of convergence, I'd try to use it to rescue the theorem that the Dedekind-order and Cauchy-distance structures on ℚ agree about the completion. Potential rewards include survival of the theory built on top and evidence about the ideal definition of convergence. (I bet it's not epsilon/N, because why would a natural property of maps from ℕ to ℚ introduce the variable of type ℚ before the variable of type ℕ?)
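
For reference, the epsilon/N Cauchy condition on a map a : ℕ → ℚ that the parenthetical is poking at, with the ℚ-typed variable quantified before the ℕ-typed one (standard textbook form, just restated here):

```latex
\forall \varepsilon \in \mathbb{Q}_{>0}\;\; \exists N \in \mathbb{N}\;\; \forall m, n \geq N:\; |a_m - a_n| < \varepsilon
```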

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-16T02:24:35.631Z · LW · GW

I claim Dedekind cuts should be defined in a less hardcoded manner. Galaxy brain meme:

  • An irrational number is something that can sneak into (Q,<), such as sqrt(2)="the number which is greater than all rational numbers whose square is less than 2". So infinity is not a real number because there is no greatest rational number, and epsilon is not a real number because there is no smallest rational number greater than zero.
  • An irrational number is a one-element elementary extension of (Q,<). (Of course, the proper definition would waive the constraint that the new element be original, instead of treating rationals and irrationals separately.)
  • The real numbers are the colimit of the finite elementary extensions of (Q,<).

I claim Cauchy sequences should be defined in a less hardcoded manner, too: A sequence is Cauchy (e.g. in (Q,Euclidean distance)) iff it converges in some (wlog one-element) extension of the space.
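
A minimal sketch of the galaxy-brain definition (my code; whether the extension is elementary depends on `exceeds` carving a gap-free cut, which nothing here checks):

```python
from fractions import Fraction

class Extension:
    # A one-element extension of (Q, <), given by which rationals the
    # new element exceeds.
    def __init__(self, exceeds):
        self.exceeds = exceeds  # exceeds(q) iff the new element > q

    def __gt__(self, q):
        return self.exceeds(q)

    def __lt__(self, q):
        return not self.exceeds(q)  # the new element never equals a rational

# "The number which is greater than all rational numbers whose square is less than 2":
sqrt2 = Extension(lambda q: q < 0 or q * q < 2)
print(sqrt2 > Fraction(7, 5), sqrt2 < Fraction(3, 2))  # True True: 1.4 < sqrt(2) < 1.5
```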

Comment by Gurkenglas on [deleted post] 2024-03-08T19:33:52.785Z

Yeah, the TLDR sounds worse than the story, so the story might sound worse than the correspondence.

But Igor presumably had some reasoning for not publishing it immediately. Preserving privacy? An opportunity for the fund to save face? The former would have worked better without the name drop, and the latter seems antithetical to local culture...

Comment by Gurkenglas on evhub's Shortform · 2024-03-06T20:48:46.353Z · LW · GW

If a future decision is to shape the present, we need to predict it.

The decision-theoretic strategy "Figure out where you are, then act accordingly." is merely an approximation to "Use the policy that leads to the multiverse you prefer.". You *can* bring your present loyalties with you behind the veil, it might just start to feel farcically Goodhartish at some point.

There are of course no probabilities of being born into one position or another, there are only various avatars through which your decisions affect the multiverse. The closest thing to probabilities you'll find is how much leverage each avatar offers: The least wrong probabilistic anthropics translates "the effect of your decisions through avatar A is twice as important as through avatar B" into "you are twice as likely to be A as B".

So if we need probabilities of being born early vs. late, we can compare their leverage. We find:

  • Quantum physics shows that the timeline splits a bazillion times a second. So each second, you become a bazillion yous, but the portions of the multiverse you could first-order impact are divided among them. Therefore, you aren't significantly more or less likely to find yourself a second earlier or later.
  • Astronomy shows that there's a mazillion stars up there. So we build a Dyson sphere and huge artificial womb clusters, and one generation later we launch one colony ship at each star. But in that generation, the fate of the universe becomes a lot more certain, so we should expect to find ourselves before that point, not after.
  • Physics shows that several constants are finely tuned to support organized matter. We can infer that elsewhere, they aren't. Since you'd think that there are other, less precarious arrangements of physical law with complex consequences, we can also moderately update towards that very precariousness granting us unusual leverage about something valuable in the acausal marketplace.
  • History shows that we got lucky during the Cold War. We can slightly update towards:
    • Current events are important.
    • Current events are more likely after a Cold War.
    • Nuclear winter would settle the universe's fate.
  • The news show that ours is the era of inadequate AI alignment theory. We can moderately update towards being in a position to affect that.