Posts

A Brief Theology of D&D 2022-04-01T12:47:19.394Z
Would you like me to debug your math? 2021-06-11T10:54:58.018Z
Domain Theory and the Prisoner's Dilemma: FairBot 2021-05-07T07:33:41.784Z
Changing the AI race payoff matrix 2020-11-22T22:25:18.355Z
Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda 2020-09-03T18:27:05.860Z
Mapping Out Alignment 2020-08-15T01:02:31.489Z
What are some good public contribution opportunities? (100$ bounty) 2020-06-18T14:47:51.661Z
Gurkenglas's Shortform 2019-08-04T18:46:34.953Z
Implications of GPT-2 2019-02-18T10:57:04.720Z
What shape has mindspace? 2019-01-11T16:28:47.522Z
A simple approach to 5-and-10 2018-12-17T18:33:46.735Z
Quantum AI Goal 2018-06-08T16:55:22.610Z
Quantum AI Box 2018-06-08T16:20:24.962Z
A line of defense against unfriendly outcomes: Grover's Algorithm 2018-06-05T00:59:46.993Z

Comments

Comment by Gurkenglas on Please do not use AI to write for you · 2024-08-21T13:14:16.535Z · LW · GW

Your experiment is contaminated: If a training document said that AI texts are overly verbose, and then announced that the following is a piece of AI-written text, it'd be a natural guess that the document would continue with overly verbose text, and so that's what an autocomplete engine will generate.

Due to RLHF, AI is no longer cleanly modelled as an autocomplete engine, but the point stands. For science, you could try having AI assist in the writing of an article making the opposite claim :).

Comment by Gurkenglas on Practical advice for secure virtual communication post easy AI voice-cloning? · 2024-08-09T21:59:31.217Z · LW · GW

Ask something only they would know.

Comment by Gurkenglas on Quinn's Shortform · 2024-07-31T22:44:11.762Z · LW · GW

Among monotonic, boolean quantifiers that don't ignore their input, exists is maximal because it returns true as often as possible; forall is minimal because it returns true as rarely as possible.
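
The claim can be checked by brute force on a small domain. A minimal sketch, assuming that a quantifier is a boolean function of a predicate, that "monotonic" means making the predicate true in more places never flips the result to false, and that "doesn't ignore its input" means "is not constant". The 3-element `DOMAIN` and all function names are illustrative choices, not part of the comment.

```python
from itertools import combinations, product

DOMAIN = (0, 1, 2)  # hypothetical small domain, chosen only to keep the search tiny

# Every predicate on DOMAIN, represented as the frozenset of elements where it holds.
PREDICATES = [frozenset(c) for r in range(len(DOMAIN) + 1)
              for c in combinations(DOMAIN, r)]

def exists(p):
    return len(p) > 0

def forall(p):
    return len(p) == len(DOMAIN)

def is_monotone(q):
    # q maps predicate -> bool; enlarging the predicate never turns True into False.
    return all(q[a] <= q[b] for a in PREDICATES for b in PREDICATES if a <= b)

def is_nonconstant(q):
    # Assumed reading of "doesn't ignore its input".
    return len(set(q.values())) == 2

def candidate_quantifiers():
    # Enumerate all 2**8 boolean functions on the 8 predicates and filter them.
    for values in product([False, True], repeat=len(PREDICATES)):
        q = dict(zip(PREDICATES, values))
        if is_monotone(q) and is_nonconstant(q):
            yield q

quantifiers = list(candidate_quantifiers())

# Pointwise, every candidate sits between forall (below) and exists (above).
sandwiched = all(forall(p) <= q[p] <= exists(p)
                 for q in quantifiers for p in PREDICATES)
print(len(quantifiers), sandwiched)
```

The reason the check succeeds is short: a monotone, non-constant quantifier must return false on the empty predicate and true on the full one, which is exactly what bounds it by exists from above and forall from below.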

Comment by Gurkenglas on The Case Against UBI · 2024-07-27T10:02:18.322Z · LW · GW

For concreteness, let's say the basic income is the same in every city, same for a paraplegic or Elon Musk. Anyone who can vote gets it; it's a dividend on your share of the country.

I am surprised at section 3; I don't remember anyone who seriously argues that women should be dependent on men. By amusing coincidence, my last paragraph makes your reasoning out of scope; you can abolish women's suffrage in a separate bill.

In section 5, you are led astray by assuming a fixed demand for labor. You notice that we have yet to become obsolete. Well, of course: For as long as human inputs remain cheaper than their outputs, employment statistics will fail to reflect our dwindling comparative advantage. But we are on track to turn every graphics card into a cheaper white collar worker. Humans have to be trained for jobs, software can be copied. Human hands might remain SOTA for a few years longer. Horses weren't reduced to pets because we built too many cars, but because cars became possible to build.

Comment by Gurkenglas on AI #74: GPT-4o Mini Me and Llama 3 · 2024-07-25T13:59:36.160Z · LW · GW

factor out alpha

⌊x⌋ is floor(x), the greatest integer that's at most x.

Comment by Gurkenglas on Failures in Kindness · 2024-07-22T18:11:26.539Z · LW · GW

People with sufficiently good models of each other to use them in their social protocols.

Comment by Gurkenglas on What are you getting paid in? · 2024-07-18T06:52:16.152Z · LW · GW

I'd call those absences of drawbacks, not benefits - you would have had them without the job.

Comment by Gurkenglas on Why the Best Writers Endure Isolation · 2024-07-17T10:36:13.027Z · LW · GW

I was alone in a room of computers, and I had set out to take no positive action but grading homework. I ended up sitting and pacing and occasionally moving the mouse in the direction it would need to go next. What I remember of what my mind was on was the misery of the situation.

Comment by Gurkenglas on Why the Best Writers Endure Isolation · 2024-07-17T07:50:24.816Z · LW · GW

I tried that for a weekend once. I did nothing.

Comment by Gurkenglas on Medical Roundup #3 · 2024-07-09T15:23:42.932Z · LW · GW

It has been pointed out to me that no, what this presumably means is the past decisions of the patients.

Q2 Is it ethically permissible to consider an individual’s past decisions when determining their access to medical resources?

Comment by Gurkenglas on An AI Race With China Can Be Better Than Not Racing · 2024-07-02T19:02:55.865Z · LW · GW

You assume the conclusion:

A lot of the AI alignment success seems to me stem from the question of whether the problem is easy or not, and is not very elastic to human effort.

AI races are bad because they select for contestants that put in less alignment effort.

Comment by Gurkenglas on "No-one in my org puts money in their pension" · 2024-05-18T11:15:53.748Z · LW · GW

Sure, he's trying to cause alarm via alleged excerpts from his life. Surely society should have some way to move to a state of alarm iff that's appropriate. Do you see a better protocol than this one?

Comment by Gurkenglas on Forget Everything (Statistical Mechanics Part 1) · 2024-05-13T15:34:39.960Z · LW · GW

Recall that every vector space is the finitely supported functions from some set to ℝ, and every Hilbert space is the square-integrable functions from some measure space to ℝ.

I'm guessing that similarly, the physical theory that you're putting in terms of maximizing entropy lies in a large class of "Bostock" theories such that we could put each of them in terms of maximizing entropy, by warping the space with respect to which we're computing entropy. Do you have an idea of the operators and properties that define a Bostock theory?

Comment by Gurkenglas on Social status part 1/2: negotiations over object-level preferences · 2024-03-28T10:25:10.105Z · LW · GW

that thing about affine transformations

If the purpose of a utility function is to provide evidence about the behavior of the group, we can preprocess the data structure into that form: Suppose Alice may update the distribution over group decisions by ε. Then the direction she pushes in is her utility function, and the constraints "add up to 100%" and "size ε" cancel out the "affine transformation" degrees of freedom. Now such directions can be added up.
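
A minimal sketch of that preprocessing, under assumptions the comment does not spell out: there are finitely many group decisions, "size ε" is measured in the Euclidean norm, and the current distribution is far enough from the boundary that the push stays a valid distribution. The names `push_direction` and `add_directions` and the example utilities are hypothetical.

```python
import math

def push_direction(utility, eps=1.0):
    """Direction of size eps that sums to zero and raises expected utility fastest."""
    mean = sum(utility) / len(utility)
    centered = [u - mean for u in utility]          # enforces "add up to 100%"
    norm = math.sqrt(sum(c * c for c in centered))  # Euclidean size (an assumption)
    if norm == 0:
        raise ValueError("a constant utility function prefers no direction")
    return [eps * c / norm for c in centered]       # enforces "size eps"

def add_directions(*directions):
    """Combine several agents' pushes componentwise."""
    return [sum(parts) for parts in zip(*directions)]

# Hypothetical utilities over three possible group decisions.
alice = [0.0, 1.0, 3.0]
alice_rescaled = [10.0 * u + 7.0 for u in alice]  # positive affine transformation
bob = [2.0, 0.0, 1.0]

group_push = add_directions(push_direction(alice), push_direction(bob))
print(push_direction(alice), push_direction(alice_rescaled), group_push)
```

Subtracting the mean removes the additive constant and dividing by the norm removes the positive scale, so `alice` and `alice_rescaled` yield the same direction up to rounding. A different norm would give a different but equally canonical representative.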

Comment by Gurkenglas on "Deep Learning" Is Function Approximation · 2024-03-23T23:13:55.394Z · LW · GW

Let's investigate whether functions must necessarily contain an agent in order to do sufficiently useful cognitive work. Pick some function of which an oracle would let you save the world.

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-18T00:55:53.182Z · LW · GW

Hmmmm. What if I said "an enumeration of the first-order theory of (union(Q,{our number}),<)"? Then any number can claim to be equal to one of the constants.

Comment by Gurkenglas on What is the best argument that LLMs are shoggoths? · 2024-03-17T22:18:00.154Z · LW · GW

If Earth had intelligent species with different minds, an LLM could end up identical to a member of at most one of them.

Comment by Gurkenglas on The Worst Form Of Government (Except For Everything Else We've Tried) · 2024-03-17T21:43:21.892Z · LW · GW

Is the idea that "they seceded because we broke their veto" is more of a casus belli than "we can't break their veto"?

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-17T19:30:01.394Z · LW · GW

Sure! Fortunately, while you can use this to prove any rational real innocent of being irrational, you can't use this to prove any irrational real guilty of being irrational, since every first-order formula can only check against finitely many constants.

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-17T08:20:01.432Z · LW · GW

Chaitin's constant, right. I should have taken my own advice and said "an enumeration of all properties of our number that can be written in the first-order logic (Q,<)".

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-17T00:02:08.600Z · LW · GW

Oh, I misunderstood the point of your first paragraph. What if we require an enumeration of all rationals our number is greater than?

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-16T09:37:39.534Z · LW · GW

If you want to transfer definitions into another context (constructive, in this case), you should treat such concrete, intuitive properties as theorems, not axioms, because the abstract formulation will generalize further. (remark: "close" is about distances, not order.)

If constructivism adds a degree of freedom in the definition of convergence, I'd try to use it to rescue the theorem that the Dedekind/order and Cauchy/distance structures on ℚ agree about the completion. Potential rewards include survival of the theory built on top and evidence about the ideal definition of convergence. (I bet it's not epsilon/N, because why would a natural property of maps from ℕ to ℚ introduce the variable of type ℚ before the variable of type ℕ?)

Comment by Gurkenglas on Constructive Cauchy sequences vs. Dedekind cuts · 2024-03-16T02:24:35.631Z · LW · GW

I claim Dedekind cuts should be defined in a less hardcoded manner. Galaxy brain meme:

  • An irrational number is something that can sneak into (Q,<), such as sqrt(2)="the number which is greater than all rational numbers whose square is less than 2". So infinity is not a real number because there is no greatest rational number, and epsilon is not a real number because there is no smallest rational number greater than zero.
  • An irrational number is a one-element elementary extension of (Q,<). (Of course, the proper definition would waive the constraint that the new element be original, instead of treating rationals and irrationals separately.)
  • The real numbers are the colimit of the finite elementary extensions of (Q,<).

I claim Cauchy sequences should be defined in a less hardcoded manner, too: A sequence is Cauchy (e.g. in (Q,Euclidean distance)) iff it converges in some (wlog one-element) extension of the space.

Comment by Gurkenglas on [deleted post] 2024-03-08T19:33:52.785Z

Yeah, the TLDR sounds worse than the story, so the story might sound worse than the correspondence.

But Igor presumably had some reasoning for not publishing it immediately. Preserving privacy? An opportunity for the fund to save face? The former would have worked better without the name drop, and the latter seems antithetical to local culture...

Comment by Gurkenglas on evhub's Shortform · 2024-03-06T20:48:46.353Z · LW · GW

If a future decision is to shape the present, we need to predict it.

The decision-theoretic strategy "Figure out where you are, then act accordingly." is merely an approximation to "Use the policy that leads to the multiverse you prefer.". You *can* bring your present loyalties with you behind the veil, it might just start to feel farcically Goodhartish at some point.

There are of course no probabilities of being born into one position or another, there are only various avatars through which your decisions affect the multiverse. The closest thing to probabilities you'll find is how much leverage each avatar offers: The least wrong probabilistic anthropics translates "the effect of your decisions through avatar A is twice as important as through avatar B" into "you are twice as likely to be A as B".

So if we need probabilities of being born early vs. late, we can compare their leverage. We find:

  • Quantum physics shows that the timeline splits a bazillion times a second. So each second, you become a bazillion yous, but the portions of the multiverse you could first-order impact are divided among them. Therefore, you aren't significantly more or less likely to find yourself a second earlier or later.
  • Astronomy shows that there's a mazillion stars up there. So we build a Dyson sphere and huge artificial womb clusters, and one generation later we launch one colony ship at each star. But in that generation, the fate of the universe becomes a lot more certain, so we should expect to find ourselves before that point, not after.
  • Physics shows that several constants are finely tuned to support organized matter. We can infer that elsewhere, they aren't. Since you'd think that there are other, less precarious arrangements of physical law with complex consequences, we can also moderately update towards that very precariousness granting us unusual leverage about something valuable in the acausal marketplace.
  • History shows that we got lucky during the Cold War. We can slightly update towards:
    • Current events are important.
    • Current events are more likely after a Cold War.
    • Nuclear winter would settle the universe's fate.
  • The news shows that ours is the era of inadequate AI alignment theory. We can moderately update towards being in a position to affect that.

Comment by Gurkenglas on Are we so good to simulate? · 2024-03-05T23:28:50.724Z · LW · GW

Can the simulators tell whether an AI is dumb or just playing dumb, though? You can get the right meme out there with a very light touch.

Yeah, it'd be safer to skip the simulations altogether and just build a philosopher from the criteria by which you were going to select a civilization.

To be blunt, sample a published piece of philosophy! Its author wanted others to adopt it. But you're well within your rights to go "If this set is so large, surely it has an element?", so here's a fun couple paragraphs on the topic.

Comment by Gurkenglas on Are we so good to simulate? · 2024-03-05T00:52:18.093Z · LW · GW

If an AI intuits that policy, it can subvert it - nothing says that it has to announce its presence, or openly take over immediately. Shutting it down when they build computers should work.

If the "human in a box" degenerates into a loop like LLMs do, try the next species.

I agree on your last paragraph, though humans have produced loads of philosophy that both works for them and benefits them for others to adopt.

Comment by Gurkenglas on Are we so good to simulate? · 2024-03-04T21:32:50.949Z · LW · GW

How do you tell when to stop the simulation? Apparently not at the almost human-level AI we have now.

Do you have an example piece of philosophical progress made by a civilization?

I admit that the human could turn against you, but if a human can eat you, you certainly shouldn't be watching a planet full of humans.

Comment by Gurkenglas on Are we so good to simulate? · 2024-03-04T19:44:11.168Z · LW · GW

Sorry, our timeline is dangerous because we're on track to create AI that can eat unsophisticated simulators for breakfast, such as by helpfully handing them a "solution to philosophy".

Yes, instantiate a philosopher. Not having solved philosophy is a good reason to use fewer moving parts you don't understand. Just because you can use arbitrary compute doesn't mean you should.

Comment by Gurkenglas on Are we so good to simulate? · 2024-03-04T18:05:16.958Z · LW · GW

You'd be a proper fool to simulate the Call of Cthulhu timeline before solving philosophy.

That said, if you can steal the philosophy, why not steal the philosopher?

Comment by Gurkenglas on Are we so good to simulate? · 2024-03-04T17:29:21.171Z · LW · GW

Building an ancestor sim for intellectual labor is like building the Matrix for energy production. You simulate a timeline to figure out what happens there.

That said, the decision-theoretic strategy of "figure out where you are, then act accordingly" is just an approximation to "follow the policy that produces the multiverse you want", so counting a number of simulations is silly: Every future ancestor sim merely grants your decisions an extra way to affect a timeline they could already affect through your meatspace avatar.

Comment by Gurkenglas on the gears to ascenscion's Shortform · 2024-02-24T12:25:54.114Z · LW · GW

I previously told an org incubator one simple idea against failure cases like this. Do you think you should have tried the like?

Funnily enough I spotted this at the top of lesslong on the way to write the following, so let's do it here:

What less simple ideas are there? Can an option to buy an org be conditional on arbitrary hard facts such as an arbitrator finding it in breach of a promise?

My idea can be Goodharted through its reliance on what the org seems to be worth, though "This only spawns secret AI labs." isn't all bad. Add a cheaper option to audit the company?

It can also be Goodharted through its reliance on what the org seems to be worth. OpenAI shows that devs can just walk out.

Comment by Gurkenglas on Debating with More Persuasive LLMs Leads to More Truthful Answers · 2024-02-11T17:05:28.220Z · LW · GW

You hand-patched several inadequacies out of the judge. Shouldn't you use the techniques that made the debaters more persuasive to make the judge more accurate?

Comment by Gurkenglas on Natural Latents: The Math · 2024-01-23T23:40:41.117Z · LW · GW

Absent feedback, today I read further, to the premise of the maxent conjecture. Let X be 100 numbers up to 1 million, rerolled until the remainder of their sum modulo 1000000 ends up 0 or 1. (X' will have sum-remainder circa 50 or circa -50.) Given X', X1 has a 25%/50%/25% pattern around X'1. Given X2 through X100, X1 has a 50%/50% distribution. So the (First/Strong) Universal Natural Latent Conjecture fails, right?

Comment by Gurkenglas on Gurkenglas's Shortform · 2024-01-19T11:50:20.103Z · LW · GW

I claim that the way to properly solve embedded agency is to do abstract agent foundations such that embedded agency falls out naturally as one adds an embedding.

In the abstract, an agent doesn't terminally care to use an ability to modify its utility function.

Suppose a clique of spherical children in a vacuum [edit: ...pictured on the right] found each other by selecting for their utility functions to be equal on all situations considered so far. They invest in their ability to work together, as nature incentivizes them to.

They face a coordination problem: As they encounter new situations, they might find disagreements. Thus, they agree to shift their utility functions precisely in the direction of satisfying whatever each other's preferences turn out to be.

This is the simplest case I yet see where alignment as a concept falls out explicitly. It smells like it fails to scale in any number of ways, which is worrisome for our prospects. Another point for not trying to build a utility maximizer.

Comment by Gurkenglas on Does literacy remove your ability to be a bard as good as Homer? · 2024-01-18T10:50:39.911Z · LW · GW

What do you mean by them memorizing the songs, if they don't repeat them word for word? Do you only require that all the events in the version they heard happen again in the version they sing? Are there audio recordings of their singing? Those should help reduce confusion here.

Comment by Gurkenglas on Monitoring devices I have loved · 2024-01-13T17:11:50.271Z · LW · GW

A USB microscope. Just point it at an arbitrary thing and learn more about it! (Say "Examine" for good luck.)

I don't have the following, but I wish I did: A heat camera, an ultrasound probe, a sound camera, an e-nose. Sensors ought to have high bandwidth, in order to give you a chance to notice any anomalies.

Comment by Gurkenglas on Bayesians Commit the Gambler's Fallacy · 2024-01-08T23:35:24.685Z · LW · GW

Then all zeroes maps to all zeroes.

Comment by Gurkenglas on Bayesians Commit the Gambler's Fallacy · 2024-01-08T09:46:57.887Z · LW · GW

(1,1,1,1,1,1,1,1,1) maps to (1,0,0,0,0,0,0,0,0).

Comment by Gurkenglas on Natural Latents: The Math · 2024-01-02T21:06:27.136Z · LW · GW

Fix some atom of information. It's contained in some of Lambda, X1, X2, and Lambda'. Call the corresponding four statements a,b,c,d. Then you assume "b&c implies a, c&d implies b, b&d implies c, a&d implies b or c.".

These compress into "b&c implies a, d implies a=b=c."; after concluding that, I read that you conclude "d&(b or c) implies a", which seems to be a special case. My approach feels too gainfully simpler, so I'm checking in to ask whether it fails.
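
One way to check the compression is to enumerate all sixteen truth assignments. A minimal sketch, assuming the four statements are treated as plain propositional variables for a single atom of information; the function names are illustrative.

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def original_assumptions(a, b, c, d):
    # "b&c implies a, c&d implies b, b&d implies c, a&d implies b or c"
    return (implies(b and c, a) and implies(c and d, b)
            and implies(b and d, c) and implies(a and d, b or c))

def compressed_form(a, b, c, d):
    # "b&c implies a, d implies a=b=c"
    return implies(b and c, a) and implies(d, a == b == c)

def special_case(a, b, c, d):
    # "d&(b or c) implies a"
    return implies(d and (b or c), a)

ASSIGNMENTS = list(product([False, True], repeat=4))

# The two formulations agree on every assignment of (a, b, c, d).
equivalent = all(original_assumptions(*v) == compressed_form(*v) for v in ASSIGNMENTS)

# The quoted conclusion holds wherever the compressed form holds.
special_case_follows = all(implies(compressed_form(*v), special_case(*v))
                           for v in ASSIGNMENTS)
print(equivalent, special_case_follows)
```

This only covers the propositional skeleton of the argument. Whether the atom-by-atom reading is faithful to the information-theoretic statements in the post is the question the comment is asking.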

Comment by Gurkenglas on AIOS · 2023-12-31T16:58:39.911Z · LW · GW

3.4 × 10^44

Where is your reductio getting these numbers?

Comment by Gurkenglas on Does ChatGPT know what a tragedy is? · 2023-12-31T16:20:34.448Z · LW · GW

You can increase its chances by telling it not to write the bottom line first.

Comment by Gurkenglas on AI Safety Chatbot · 2023-12-24T15:08:48.007Z · LW · GW

rename your "logs" directory to "sources"

Comment by Gurkenglas on AI Safety Chatbot · 2023-12-23T00:53:20.333Z · LW · GW

you can have a bot search the logs for feedback. (or tell people to say "feedback".)

Comment by Gurkenglas on AI Safety Chatbot · 2023-12-22T15:43:15.861Z · LW · GW

please use this link to provide feedback

people can just tell the bot. you have logs, right? right?

Comment by Gurkenglas on “Dirty concepts” in AI alignment discourses, and some guesses for how to deal with them · 2023-12-22T12:06:26.132Z · LW · GW

This social problem sounds like it has a technical solution! There exist browser addons that let readers publicly annotate text. There could easily exist one that uses an LLM to detect ambiguous phrasings and publish one or more annotated interpretations.

Comment by Gurkenglas on Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision · 2023-12-19T02:04:19.547Z · LW · GW

It looks like, not having enough data to train a strong model, you're using data generated by a weaker model. How is this alignment work? All you seem to measure is capabilities.

Comment by Gurkenglas on An attempt at a "good enough" solution for human two-party negotiations · 2023-12-17T16:45:10.384Z · LW · GW

Are the probabilities that your tool calculates for whether each party accepts choosable to incentivize this?

Comment by Gurkenglas on Predicting the future with the power of the Internet (and pissing off Rob Miles) · 2023-12-16T15:07:13.397Z · LW · GW

Ah, but what is the average trader's profit?

Comment by Gurkenglas on Predicting the future with the power of the Internet (and pissing off Rob Miles) · 2023-12-16T04:10:36.304Z · LW · GW

The silliness may get you more users, but don't be surprised when the users you get are silly.