Posts

A Brief Theology of D&D 2022-04-01T12:47:19.394Z
Would you like me to debug your math? 2021-06-11T10:54:58.018Z
Domain Theory and the Prisoner's Dilemma: FairBot 2021-05-07T07:33:41.784Z
Changing the AI race payoff matrix 2020-11-22T22:25:18.355Z
Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda 2020-09-03T18:27:05.860Z
Mapping Out Alignment 2020-08-15T01:02:31.489Z
What are some good public contribution opportunities? (100$ bounty) 2020-06-18T14:47:51.661Z
Gurkenglas's Shortform 2019-08-04T18:46:34.953Z
Implications of GPT-2 2019-02-18T10:57:04.720Z
What shape has mindspace? 2019-01-11T16:28:47.522Z
A simple approach to 5-and-10 2018-12-17T18:33:46.735Z
Quantum AI Goal 2018-06-08T16:55:22.610Z
Quantum AI Box 2018-06-08T16:20:24.962Z
A line of defense against unfriendly outcomes: Grover's Algorithm 2018-06-05T00:59:46.993Z

Comments

Comment by Gurkenglas on Social status part 1/2: negotiations over object-level preferences · 2024-03-28T10:25:10.105Z · LW · GW

that thing about affine transformations

If the purpose of a utility function is to provide evidence about the behavior of the group, we can preprocess the data structure into that form: Suppose Alice may update the distribution over group decisions by ε. Then the direction she pushes in is her utility function, and the constraints "add up to 100%" and "size ε" cancel out the "affine transformation" degrees of freedom. Now such directions can be added up.
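A minimal sketch of that preprocessing in Python (the vectors and ε value are made up for illustration): subtracting the mean keeps the distribution adding up to 100%, rescaling to length ε fixes the size, and what's left is a direction that can simply be summed across agents.

```python
import numpy as np

def direction(utilities: np.ndarray, eps: float = 0.01) -> np.ndarray:
    """Turn a utility vector over group decisions into an update direction.

    Subtracting the mean makes the update sum to zero (the distribution still
    adds up to 100%), killing the additive constant; rescaling to length eps
    kills the positive scale factor. Together they quotient out the affine
    degrees of freedom.
    """
    d = utilities - utilities.mean()
    norm = np.linalg.norm(d)
    if norm == 0:
        return np.zeros_like(d)  # an indifferent agent pushes nowhere
    return eps * d / norm

# Directions from several agents can now simply be added:
alice = direction(np.array([3.0, 1.0, 0.0]))
bob = direction(np.array([0.0, 2.0, 2.0]))
group_push = alice + bob
```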

Comment by Gurkenglas on "Deep Learning" Is Function Approximation · 2024-03-23T23:13:55.394Z · LW · GW

Let's investigate whether functions must necessarily contain an agent in order to do sufficiently useful cognitive work. Pick some function of which an oracle would let you save the world.

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-18T00:55:53.182Z · LW · GW

Hmmmm. What if I said "an enumeration of the first-order theory of (union(Q,{our number}),<)"? Then any number can claim to be equal to one of the constants.

Comment by Gurkenglas on What is the best argument that LLMs are shoggoths? · 2024-03-17T22:18:00.154Z · LW · GW

If Earth had intelligent species with different minds, an LLM could end up identical to a member of at most one of them.

Comment by Gurkenglas on The Worst Form Of Government (Except For Everything Else We've Tried) · 2024-03-17T21:43:21.892Z · LW · GW

Is the idea that "they seceded because we broke their veto" is more of a casus belli than "we can't break their veto"?

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-17T19:30:01.394Z · LW · GW

Sure! Fortunately, while you can use this to prove any rational real innocent of being irrational, you can't use this to prove any irrational real guilty of being irrational, since every first-order formula can only check against finitely many constants.

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-17T08:20:01.432Z · LW · GW

Chaitin's constant, right. I should have taken my own advice and said "an enumeration of all properties of our number that can be written in the first-order logic (Q,<)".

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-17T00:02:08.600Z · LW · GW

Oh, I misunderstood the point of your first paragraph. What if we require an enumeration of all rationals our number is greater than?
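For concreteness, a sketch of such an enumeration for sqrt(2) in Python (the enumeration order is arbitrary and repeats are harmless): membership in the lower cut is decidable for each rational, so listing its members out is straightforward.

```python
from fractions import Fraction
from itertools import count, islice

def lower_cut_of_sqrt2():
    """Enumerate (with harmless repeats) every rational that sqrt(2) is greater than.

    For a rational q, "q < sqrt(2)" is decidable: it holds iff q < 0 or q*q < 2.
    Every rational shows up as Fraction(num, den) within some bound n, so every
    element of the lower cut is eventually emitted.
    """
    for n in count(1):
        for num in range(-n, n + 1):
            for den in range(1, n + 1):
                q = Fraction(num, den)
                if q < 0 or q * q < 2:
                    yield q

print(list(islice(lower_cut_of_sqrt2(), 8)))  # the first few members of the cut
```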

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-16T09:37:39.534Z · LW · GW

If you want to transfer definitions into another context (constructive, in this case), you should treat such concrete, intuitive properties as theorems, not axioms, because the abstract formulation will generalize further. (remark: "close" is about distances, not order.)

If constructivism adds a degree of freedom in the definition of convergence, I'd try to use it to rescue the theorem that the Dedekind (order) and Cauchy (distance) structures on ℚ agree about the completion. Potential rewards include survival of the theory built on top and evidence about the ideal definition of convergence. (I bet it's not epsilon/N, because why would a natural property of maps from ℕ to ℚ introduce the variable of type ℚ before the variable of type ℕ?)
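For reference, the epsilon/N definition the parenthetical is about, with the ℚ-typed ε quantified before the ℕ-typed N:

$$\forall \varepsilon \in \mathbb{Q}_{>0}\ \exists N \in \mathbb{N}\ \forall m, n \geq N:\ |a_m - a_n| < \varepsilon$$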

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-16T02:24:35.631Z · LW · GW

I claim Dedekind cuts should be defined in a less hardcoded manner. Galaxy brain meme:

  • An irrational number is something that can sneak into (Q,<), such as sqrt(2)="the number which is greater than all rational numbers whose square is less than 2". So infinity is not a real number because there is no greatest rational number, and epsilon is not a real number because there is no smallest rational number greater than zero.
  • An irrational number is a one-element elementary extension of (Q,<). (Of course, the proper definition would waive the constraint that the new element be original, instead of treating rationals and irrationals separately.)
  • The real numbers are the colimit of the finite elementary extensions of (Q,<).

I claim Cauchy sequences should be defined in a less hardcoded manner, too: A sequence is Cauchy (e.g. in (Q,Euclidean distance)) iff it converges in some (wlog one-element) extension of the space.
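As a concrete instance (a Python sketch, names mine): Heron's iteration gives a sequence of rationals that is Cauchy under the Euclidean distance, has no limit in ℚ, and converges in the one-element extension ℚ ∪ {√2}.

```python
from fractions import Fraction

def heron_sqrt2(n_terms: int):
    """First n_terms of a Cauchy sequence of rationals converging to sqrt(2).

    Heron's iteration x -> (x + 2/x) / 2 stays rational and converges
    quadratically; the sequence has no limit in Q, but it converges in the
    one-element extension Q ∪ {sqrt(2)}.
    """
    x = Fraction(2)
    terms = []
    for _ in range(n_terms):
        terms.append(x)
        x = (x + 2 / x) / 2
    return terms

# heron_sqrt2(5)[-1] == 665857/470832, which squares to 2 up to about 1e-11.
```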

Comment by Gurkenglas on Failed promises and disrespect from the EA Infrastructure Fund · 2024-03-08T19:33:52.785Z · LW · GW

Yeah, the TLDR sounds worse than the story, so the story might sound worse than the correspondence.

But Igor presumably had some reasoning for not publishing it immediately. Preserving privacy? An opportunity for the fund to save face? The former would have worked better without the name drop, and the latter seems antithetical to local culture...

Comment by Gurkenglas on evhub's Shortform · 2024-03-06T20:48:46.353Z · LW · GW

If a future decision is to shape the present, we need to predict it.

The decision-theoretic strategy "Figure out where you are, then act accordingly." is merely an approximation to "Use the policy that leads to the multiverse you prefer.". You *can* bring your present loyalties with you behind the veil, it might just start to feel farcically Goodhartish at some point.

There are of course no probabilities of being born into one position or another, there are only various avatars through which your decisions affect the multiverse. The closest thing to probabilities you'll find is how much leverage each avatar offers: The least wrong probabilistic anthropics translates "the effect of your decisions through avatar A is twice as important as through avatar B" into "you are twice as likely to be A as B".

So if we need probabilities of being born early vs. late, we can compare their leverage. We find:

  • Quantum physics shows that the timeline splits a bazillion times a second. So each second, you become a bazillion yous, but the portions of the multiverse you could first-order impact are divided among them. Therefore, you aren't significantly more or less likely to find yourself a second earlier or later.
  • Astronomy shows that there's a mazillion stars up there. So we build a Dyson sphere and huge artificial womb clusters, and one generation later we launch one colony ship at each star. But in that generation, the fate of the universe becomes a lot more certain, so we should expect to find ourselves before that point, not after.
  • Physics shows that several constants are finely tuned to support organized matter. We can infer that elsewhere, they aren't. Since you'd think that there are other, less precarious arrangements of physical law with complex consequences, we can also moderately update towards that very precariousness granting us unusual leverage over something valuable in the acausal marketplace.
  • History shows that we got lucky during the Cold War. We can slightly update towards:
    • Current events are important.
    • Current events are more likely after a Cold War.
    • Nuclear winter would settle the universe's fate.
  • The news show that ours is the era of inadequate AI alignment theory. We can moderately update towards being in a position to affect that.

Comment by Gurkenglas on Are we so good to simulate? · 2024-03-05T23:28:50.724Z · LW · GW

Can the simulators tell whether an AI is dumb or just playing dumb, though? You can get the right meme out there with a very light touch.

Yeah, it'd be safer to skip the simulations altogether and just build a philosopher from the criteria by which you were going to select a civilization.

To be blunt, sample a published piece of philosophy! Its author wanted others to adopt it. But you're well within your rights to go "If this set is so large, surely it has an element?", so here's a fun couple paragraphs on the topic.

Comment by Gurkenglas on Are we so good to simulate? · 2024-03-05T00:52:18.093Z · LW · GW

If an AI intuits that policy, it can subvert it - nothing says that it has to announce its presence, or openly take over immediately. Shutting it down when they build computers should work.

If the "human in a box" degenerates into a loop like LLMs do, try the next species.

I agree on your last paragraph, though humans have produced loads of philosophy that both works for them and benefits them for others to adopt.

Comment by Gurkenglas on Are we so good to simulate? · 2024-03-04T21:32:50.949Z · LW · GW

How do you tell when to stop the simulation? Apparently not at the almost human-level AI we have now.

Do you have an example piece of philosophical progress made by a civilization?

I admit that the human could turn against you, but if a human can eat you, you certainly shouldn't be watching a planet full of humans.

Comment by Gurkenglas on Are we so good to simulate? · 2024-03-04T19:44:11.168Z · LW · GW

Sorry, our timeline is dangerous because we're on track to create AI that can eat unsophisticated simulators for breakfast, such as by helpfully handing them a "solution to philosophy".

Yes, instantiate a philosopher. Not having solved philosophy is a good reason to use fewer moving parts you don't understand. Just because you can use arbitrary compute doesn't mean you should.

Comment by Gurkenglas on Are we so good to simulate? · 2024-03-04T18:05:16.958Z · LW · GW

You'd be a proper fool to simulate the Call of Cthulhu timeline before solving philosophy.

That said, if you can steal the philosophy, why not steal the philosopher?

Comment by Gurkenglas on Are we so good to simulate? · 2024-03-04T17:29:21.171Z · LW · GW

Building an ancestor sim for intellectual labor is like building the Matrix for energy production. You simulate a timeline to figure out what happens there.

That said, the decision-theoretic strategy of "figure out where you are, then act accordingly" is just an approximation to "follow the policy that produces the multiverse you want", so counting a number of simulations is silly: Every future ancestor sim merely grants your decisions an extra way to affect a timeline they could already affect through your meatspace avatar.

Comment by Gurkenglas on the gears to ascenscion's Shortform · 2024-02-24T12:25:54.114Z · LW · GW

I previously told an org incubator one simple idea against failure cases like this. Do you think you should have tried something like that?

Funnily enough, I spotted this at the top of lesswrong on my way to write the following, so let's do it here:

What less simple ideas are there? Can an option to buy an org be conditional on arbitrary hard facts such as an arbitrator finding it in breach of a promise?

My idea can be Goodharted through its reliance on what the org seems to be worth, though "This only spawns secret AI labs." isn't all bad. Add a cheaper option to audit the company?

It can also be Goodharted through its reliance on the org's worth staying attached to the org: OpenAI shows that devs can just walk out.

Comment by Gurkenglas on Debating with More Persuasive LLMs Leads to More Truthful Answers · 2024-02-11T17:05:28.220Z · LW · GW

You hand-patched several inadequacies out of the judge. Shouldn't you use the techniques that made the debaters more persuasive to make the judge more accurate?

Comment by Gurkenglas on Natural Latents: The Math · 2024-01-23T23:40:41.117Z · LW · GW

Absent feedback, today I read further, to the premise of the maxent conjecture. Let X be 100 numbers up to 1 million, rerolled until the remainder of their sum modulo 1000000 ends up 0 or 1. (X' will have sum-remainder circa 50 or circa -50.) Given X', X1 has a 25%/50%/25% pattern around X'1. Given X2 through X100, X1 has a 50%/50% distribution. So the (First/Strong) Universal Natural Latent Conjecture fails, right?

Comment by Gurkenglas on Gurkenglas's Shortform · 2024-01-19T11:50:20.103Z · LW · GW

I claim that the way to properly solve embedded agency is to do abstract agent foundations such that embedded agency falls out naturally as one adds an embedding.

In the abstract, an agent doesn't terminally care to use an ability to modify its utility function.

Suppose a clique of spherical children in a vacuum [edit: ...pictured on the right] found each other by selecting for their utility functions to be equal on all situations considered so far. They invest in their ability to work together, as nature incentivizes them to.

They face a coordination problem: As they encounter new situations, they might find disagreements. Thus, they agree to shift their utility functions precisely in the direction of satisfying whatever each other's preferences turn out to be.

This is the simplest case I yet see where alignment as a concept falls out explicitly. It smells like it fails to scale in any number of ways, which is worrisome for our prospects. Another point for not trying to build a utility maximizer.

Comment by Gurkenglas on Does literacy remove your ability to be a bard as good as Homer? · 2024-01-18T10:50:39.911Z · LW · GW

What do you mean by them memorizing the songs, if they don't repeat them word for word? Do you only require that all the events in the version they heard happen again in the version they sing? Are there audio recordings of their singing? Those should help reduce confusion here.

Comment by Gurkenglas on Monitoring devices I have loved · 2024-01-13T17:11:50.271Z · LW · GW

A USB microscope. Just point it at an arbitrary thing and learn more about it! (Say "Examine" for good luck.)

I don't have the following, but I wish I did: A heat camera, an ultrasound probe, a sound camera, an e-nose. Sensors ought to have high bandwidth, in order to give you a chance to notice any anomalies.

Comment by Gurkenglas on Bayesians Commit the Gambler's Fallacy · 2024-01-08T23:35:24.685Z · LW · GW

Then all zeroes maps to all zeroes.

Comment by Gurkenglas on Bayesians Commit the Gambler's Fallacy · 2024-01-08T09:46:57.887Z · LW · GW

(1,1,1,1,1,1,1,1,1) maps to (1,0,0,0,0,0,0,0,0).

Comment by Gurkenglas on Natural Latents: The Math · 2024-01-02T21:06:27.136Z · LW · GW

Fix some atom of information. It's contained in some of Lambda, X1, X2, and Lambda'. Call the corresponding four statements a,b,c,d. Then you assume "b&c implies a, c&d implies b, b&d implies c, a&d implies b or c.".

These compress into "b&c implies a, d implies a=b=c."; after concluding that, I read that you conclude "d&(b or c) implies a", which seems to be a special case. My approach feels suspiciously simpler, so I'm checking in to ask whether it fails.
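A brute-force truth-table check of that compression, under my reading of the four statements (Python):

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def original(a, b, c, d):
    return (implies(b and c, a) and implies(c and d, b)
            and implies(b and d, c) and implies(a and d, b or c))

def compressed(a, b, c, d):
    return implies(b and c, a) and implies(d, a == b == c)

for a, b, c, d in product([False, True], repeat=4):
    # the two assumption sets agree on every assignment...
    assert original(a, b, c, d) == compressed(a, b, c, d)
    # ...and "d & (b or c) implies a" follows from them
    assert implies(compressed(a, b, c, d) and d and (b or c), a)

print("compression and the special case check out on all 16 assignments")
```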

Comment by Gurkenglas on AIOS · 2023-12-31T16:58:39.911Z · LW · GW

3.4 × 10^44

Where is your reductio getting these numbers?

Comment by Gurkenglas on Does ChatGPT know what a tragedy is? · 2023-12-31T16:20:34.448Z · LW · GW

You can increase its chances by telling it not to write the bottom line first.

Comment by Gurkenglas on AI Safety Chatbot · 2023-12-24T15:08:48.007Z · LW · GW

rename your "logs" directory to "sources"

Comment by Gurkenglas on AI Safety Chatbot · 2023-12-23T00:53:20.333Z · LW · GW

you can have a bot search the logs for feedback. (or tell people to say "feedback".)

Comment by Gurkenglas on AI Safety Chatbot · 2023-12-22T15:43:15.861Z · LW · GW

please use this link to provide feedback

people can just tell the bot. you have logs, right? right?

Comment by Gurkenglas on “Dirty concepts” in AI alignment discourses, and some guesses for how to deal with them · 2023-12-22T12:06:26.132Z · LW · GW

This social problem sounds like it has a technical solution! There exist browser addons that let readers publicly annotate text. There could easily exist one that uses an LLM to detect ambiguous phrasings and publish one or more annotated interpretations.
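A minimal sketch of the LLM half, assuming the OpenAI Python client (the model name and prompt are placeholders; the annotation/publishing side of the addon is left out):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def annotate_ambiguities(passage: str) -> str:
    """Ask an LLM to flag ambiguous phrasings and spell out their readings."""
    prompt = (
        "List each ambiguous phrase in the following passage, and for each, "
        "give the distinct interpretations a careful reader might adopt:\n\n"
        + passage
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable chat model would do
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# The addon would then publish these annotations alongside the original text.
```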

Comment by Gurkenglas on Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision · 2023-12-19T02:04:19.547Z · LW · GW

It looks like, not having enough data to train a strong model, you're using data generated by a weaker model. How is this alignment work? All you seem to measure is capabilities.

Comment by Gurkenglas on An attempt at a "good enough" solution for human two-party negotiations · 2023-12-17T16:45:10.384Z · LW · GW

Are the probabilities that your tool calculates for whether each party accepts choosable to incentivize this?

Comment by Gurkenglas on Predicting the future with the power of the Internet (and pissing off Rob Miles) · 2023-12-16T15:07:13.397Z · LW · GW

Ah, but what is the average trader's profit?

Comment by Gurkenglas on Predicting the future with the power of the Internet (and pissing off Rob Miles) · 2023-12-16T04:10:36.304Z · LW · GW

The silliness may get you more users, but don't be surprised when the users you get are silly.

Comment by Gurkenglas on EU policymakers reach an agreement on the AI Act · 2023-12-15T23:23:39.829Z · LW · GW

Companies that fail to comply with these rules face fines up to “35 million euro or 7% of global revenue,” whichever is higher.

What about noncompanies?

It bans the use of AI for:

This is a list of six entries. What happens when someone thinks of a seventh?

Comment by Gurkenglas on How LDT helps reduce the AI arms race · 2023-12-10T23:06:06.313Z · LW · GW

You can do this with less superintelligence and less LDT: Before you eat the light cone, build a secret decisive strategic advantage, wait for the next AI to come along, and watch what it decides to self-modify into.

Comment by Gurkenglas on The Offense-Defense Balance Rarely Changes · 2023-12-09T18:28:33.944Z · LW · GW

I put forward control mechanisms as a reason for stability; a prominent recent example is covid. Low-hanging attack-fruit get picked, we stop doing the things that make us vulnerable, we impose quarantines, and we ban labs.

AI has those, too. But calling that an argument against AI danger is a decision-theoretic short-circuit.

Comment by Gurkenglas on Apocalypse insurance, and the hardline libertarian take on AI risk · 2023-11-30T12:01:55.579Z · LW · GW

In theory, one billionth of the present buys one billionth of the future: Go to a casino, put all on black until you can buy the planet.
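The casino step is fair-odds gambler's ruin: by optional stopping, any fair betting scheme that starts with bankroll s and stops at either 0 or the planet's price T reaches T with probability

$$P(\text{reach } T \text{ before busting}) = \frac{s}{T},$$

so a billionth of the price buys, in expectation, a billionth of the planet. (Real casinos take a house edge; this is the idealized version.)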

Therefore, they can buy their insurance policy with dollars. If you can't buy half the planet, apparently you can't afford a 50% chance to kill everyone.

Comment by Gurkenglas on AI Safety Research Organization Incubation Program - Expression of Interest · 2023-11-29T22:57:45.689Z · LW · GW

Retain an option to buy the org later for a billion dollars, reducing their incentive to become worth more than a billion dollars.

Comment by Gurkenglas on Portable Chargers are Great · 2023-11-23T14:10:34.251Z · LW · GW

Surely it should be a chassis for standard batteries, so you can take as much capacity as you need, replace lost maximum capacity, and have non-rechargeable emergency reserves.

Comment by Gurkenglas on AI Safety Research Organization Incubation Program - Expression of Interest · 2023-11-21T17:49:46.301Z · LW · GW

All the leading AI labs so far seem to have come from attempts to found AI safety orgs. Do you have a plan against that failure case?

Comment by Gurkenglas on Reinforcement Via Giving People Cookies · 2023-11-15T11:36:22.803Z · LW · GW

Can the relevant social norms be compactly formalized? It might sound vaguely like: "Only model people in ways they endorse."

Comment by Gurkenglas on Game Theory without Argmax [Part 1] · 2023-11-11T16:20:05.007Z · LW · GW

The power set part seems sus. Have you considered something more continuous?

Comment by Gurkenglas on Would you like me to debug your math? · 2023-11-10T15:28:29.243Z · LW · GW

rereads comment section a year later

Whoops, I probably should have put a public reply here if I wanted anyone to ever book a call again ^^.

I'm still up for this.

Comment by Gurkenglas on Everyday Lessons from High-Dimensional Optimization · 2023-11-07T22:57:15.087Z · LW · GW

As far as I saw, you were getting mixed up on that. We never compare the parameter-vectors for being greater than/less than each other; they aren't ordered.

(No, if some parameter started out with such values as 4+3i or 5, the ln transformation would not equate them. But multiplying both by e^0.01 would add 0.01 to both logarithms, regardless of previous units.)
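A quick numerical check of that parenthetical (Python; 4+3i and 5, which share the same modulus, are the values from the comment):

```python
import cmath, math

a, b = 4 + 3j, 5

# ln does not equate them: the logarithms differ by an imaginary part.
print(cmath.log(a))  # ≈ 1.6094 + 0.6435j
print(math.log(b))   # ≈ 1.6094

# Multiplying both by e^0.01 adds 0.01 to both logarithms.
print(cmath.log(a * cmath.exp(0.01)) - cmath.log(a))  # ≈ 0.01 + 0j
print(math.log(b * math.exp(0.01)) - math.log(b))     # ≈ 0.01
```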

Comment by Gurkenglas on 8 examples informing my pessimism on uploading without reverse engineering · 2023-11-04T07:14:14.468Z · LW · GW

Oh, okay.

My second sentence meant "If neurology AI can do WBE, a slightly (on a grand scale) more superhuman AI could do it without reverse engineering.". But actually we could just have the AI reverse-engineer the brain, then obfuscate the upload, then delete the AI.

Suppose the company gets bought and they try to improve the upload's performance without understanding it. My third sentence meant, would they find that the Algernon argument applies to uploads?

Comment by Gurkenglas on 8 examples informing my pessimism on uploading without reverse engineering · 2023-11-03T23:34:19.624Z · LW · GW

2.5 sounds like we're just way off and shouldn't expect to get WBE until superhuman neurology AI. But given that, the remaining difficulties seem just another constant factor. The question becomes, do you expect "If you change random bits and try to run it, it mostly just breaks." to hold up?