dragongod

I'm currently working through Naive Set Theory (alongside another text). I'll take this as a recommendation to work through the other textbooks later.

My maths level is insufficient for the course I'd guess.

I would appreciate it if videos of the meetings could be recorded. Or maybe I should just stick around and hope this will be run again next year.

Comment by DragonGod on We probably won't just play status games with each other after AGI · 2025-01-17T14:24:34.814Z · LW · GW

When I saw the beginning/title I thought the post would be a refutation of the material scarcity thesis; I found myself disappointed it is not.

Comment by DragonGod on DragonGod's Shortform · 2025-01-02T10:23:31.572Z · LW · GW

There is not an insignificant sense of guilt/of betraying myself from 2023 and my ambitions from before.

And I don't want to just end up doing irrelevant TCS research that only a few researchers in a niche field will ever care about.

It's not high impact research.

And it's mostly just settling. I get the sense that I enjoy theoretical research, I don't currently feel poised to contribute to the AI safety problem, I seem to have an unusually good (at least it appears so to my limited understanding) opportunity to pursue a boring TCS PhD in some niche field that few people care about.

I don't think I'll be miserable pursuing the boring TCS PhD or not enjoy it, or anything of the sort. It's just not directly contributing to what I wanted to contribute to. It's somewhat sad and it's undignified (but it's less undignified than the path I thought I was on at various points in the last 15 months).

Comment by DragonGod on DragonGod's Shortform · 2025-01-02T10:10:43.633Z · LW · GW

I still want to work on technical AI safety eventually.

I feel like I'm on quite far off path from directly being useful in 2025 than I felt in 2023.

And taking a detour to do a TCS PhD that isn't directly pertinent to AI safety (current plan) feels like not contributing.

Cope is that becoming a strong TCS researcher will make me better poised to contribute to the problem, but short timelines could make this path less viable.

[Though there's nothing saying I can't try to work on AI on the side even if it isn't the focus of my PhD.]

Comment by DragonGod on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser · 2024-12-06T12:51:30.568Z · LW · GW

I think LW is a valuable intellectual hub and community.

Haven't been an active participant of recent, but it's still a service I occasionally find myself relying on explicitly, and I prefer the world where it continues to exist.

[I donated $20. Am unemployed and this is a nontrivial fraction of my disposable income.]

Comment by DragonGod on DeepSeek beats o1-preview on math, ties on coding; will release weights · 2024-11-25T13:43:55.779Z · LW · GW

o1's reasoning trace also does this for different languages (IIRC I've seen Chinese and Japanese and other languages I don't recognise/recall), usually an entire paragraph not a word, but when I translated them it seemed to make sense in context.

Comment by DragonGod on Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong · 2024-10-16T16:14:35.953Z · LW · GW

This is not a rhetorical question:) What do you mean by "probability" here?

Yeah, since posting this question:

I have updated towards thinking that it's in a sense not obvious/not clear what exactly "probability" is supposed to be interpreted as here.

And once you pin down an unambiguous interpretation of probability the problem dissolves.

I had a firm notion in mind for what I thought probability meant. But Rafael Harth's answer really made me unconfident that the notion I had in mind was the right notion of probability for the question.

Comment by DragonGod on Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong · 2024-10-16T16:04:36.740Z · LW · GW

I have not read all of them!

Comment by DragonGod on Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong · 2024-10-16T13:21:20.600Z · LW · GW

My current position now is basically:

Actually, I'm less confident and now unsure.

Harth's framing was presented as an argument re: the canonical Sleeping Beauty problem.

And the question I need to answer is: "should I accept Harth's frame?"

I am at least convinced that it is genuinely a question about how we define probability.

There is still a disconnect though.

While I agree with the frequentist answer, it's not clear to me how to backgpropagate this in a Bayesian framework.

Suppose I treat myself as identical to all other agents in the reference class.

I know that my reference class will do better if we answer "tails" when asked about the outcome of the coin toss.

But it's not obvious to me that there is anything to update from when trying to do a Bayesian probability calculation.

There being many more observers in the tails world to me doesn't seem to alter these probabilities at all:

P(waking up)

P(being asked questions)

P(...)

By stipulation my observational evidence is the same in both cases.

And I am not compelled by assuming I should be randomly sampled from all observers.

There are many more versions of me in this other world does not by itself seem to raise the probability of me witnessing the observational evidence since by stipulation all versions of me witness the same evidence.

Comment by DragonGod on Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong · 2024-10-16T12:49:32.835Z · LW · GW

I'm curious how your conception of probability accounts for logical uncertainty?

Comment by DragonGod on Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong · 2024-10-16T12:44:35.005Z · LW · GW

So in this case, I agree that like if this experiment is repeated multiple times and every Sleeping Beauty version created answered tails, the reference class of Sleeping Beauty agents would have many more correct answers than if the experiment is repeated many times and every sleeping Beauty created answered heads.

I think there's something tangible here and I should reflect on it.

I separately think though that if the actual outcome of each coin flip was recorded, there would be a roughly equal distribution between heads and tails.

And when I was thinking through the question before it was always about trying to answer a question regarding the actual outcome of the coin flip and not what strategy maximises monetary payoffs under even bets.

While I do think that like betting odds isn't convincing re: actual probabilities because you can just have asymmetric payoffs on equally probable mutually exclusive and jointly exhaustive events, the "reference class of agents being asked this question" seems like a more robust rebuttal.

I want to take some time to think on this.

Strong up voted because this argument actually/genuinely makes me think I might be wrong here.

Much less confident now, and mostly confused.

Comment by DragonGod on Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong · 2024-10-16T11:49:37.618Z · LW · GW

I mean I am not convinced by the claim that Bob is wrong.

Bob's prior probability is 50%. Bob sees no new evidence to update this prior so the probability remains at 50%.

I don't favour an objective notion of probabilities. From my OP:

2. Bayesian Reasoning

Probability is a property of the map (agent's beliefs), not the territory (environment).

For an observation O to be evidence for a hypothesis H, P(O|H) must be > P(O|¬H).

The wake-up event is equally likely under both Heads and Tails scenarios, thus provides no new information to update priors.

The original 50/50 probability should remain unchanged after waking up.

So I am unconvinced by your thought experiments? Observing nothing new I think the observers priors should remain unchanged.

I feel like I'm not getting the distinction you're trying to draw out with your analogy.

Comment by DragonGod on Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong · 2024-10-16T11:44:03.902Z · LW · GW

I mean I think the "gamble her money" interpretation is just a different question. It doesn't feel to me like a different notion of what probability means, but just betting on a fair coin but with asymmetric payoffs.

The second question feels closer to actually an accurate interpretation of what probability means.

Comment by DragonGod on Uncertainty in all its flavours · 2024-01-20T22:46:49.706Z · LW · GW

i.e. if each forecaster has an first-order belief $f (w) \in B (S)$ , and $w \in B (S)$ is your second-order belief about which forecaster is correct, then $(w ⊳_{W S} f) \in B (S)$ should be your first-order belief about the election.

I think there might be a typo here. Did you instead mean to write: " $w \in B (W)$ " for the second order beliefs about the forecasters?

Comment by DragonGod on Order Matters for Deceptive Alignment · 2023-07-26T15:50:15.982Z · LW · GW

The claim is that given the presence of differential adversarial examples, the optimisation process would adjust the parameters of the model such that it's optimisation target is the base goal.

Comment by DragonGod on DragonGod's Shortform · 2023-07-25T21:16:20.418Z · LW · GW

That was it, thanks!

Comment by DragonGod on DragonGod's Shortform · 2023-07-25T20:30:29.625Z · LW · GW

Probably sometime last year, I posted on Twitter something like: "agent values are defined on agent world models" (or similar) with a link to a LessWrong post (I think the author was John Wentworth).

I'm now looking for that LessWrong post.

My Twitter account is private and search is broken for private accounts, so I haven't been able to track down the tweet. If anyone has guesses for what the post I may have been referring to was, do please send it my way.

Comment by DragonGod on DragonGod's Shortform · 2023-07-24T08:08:16.174Z · LW · GW

Most of the catastrophic risk from AI still lies in superhuman agentic systems.

Current frontier systems are not that (and IMO not poised to become that in the very immediate future).

I think AI risk advocates should be clear that they're not saying GPT-5/Claude Next is an existential threat to humanity.

[Unless they actually believe that. But if they don't, I'm a bit concerned that their message is being rounded up to that, and when such systems don't reveal themselves to be catastrophically dangerous, it might erode their credibility.]

Comment by DragonGod on DragonGod's Shortform · 2023-07-22T12:46:08.296Z · LW · GW

Immigration is such a tight constraint for me.

My next career steps after I'm done with my TCS Masters are primarily bottlenecked by "what allows me to remain in the UK" and then "keeps me on track to contribute to technical AI safety research".

What I would like to do for the next 1 - 2 years ("independent research"/ "further upskilling to get into a top ML PhD program") is not all that viable a path given my visa constraints.

Above all, I want to avoid wasting N more years by taking a detour through software engineering again so I can get Visa sponsorship.

[I'm not conscientious enough to pursue AI safety research/ML upskilling while managing a full time job.]

Might just try and see if I can pursue a TCS PhD at my current university and do TCS research that I think would be valuable for theoretical AI safety research.

The main detriment of that is I'd have to spend N more years in <city> and I was really hoping to come down to London.

Advice very, very welcome.

[Not sure who to tag.]

Comment by DragonGod on Hedonic Loops and Taming RL · 2023-07-20T15:36:47.064Z · LW · GW

Specifically, the experiments by Morrison and Berridge demonstrated that by intervening on the hypothalamic valuation circuits, it is possible to adjust policies zero-shot such that the animal has never experienced a previously repulsive stimulus as pleasurable.

I find this a bit confusing as worded, is something missing?

Comment by DragonGod on DragonGod's Shortform · 2023-07-10T18:20:04.726Z · LW · GW

Does anyone know a ChatGPT plugin for browsing documents/webpages that can read LaTeX?

The plugin I currently use (Link Reader) strips out the LaTeX in its payload, and so GPT-4 ends up hallucinating the LaTeX content of the pages I'm feeding it.

Comment by DragonGod on Ruby's Quick Takes · 2023-07-08T20:39:40.058Z · LW · GW

How frequent are moderation actions? Is this discussion about saving moderator effort (by banning someone before you have to remove the rate-limited quantity of their bad posts), or something else? I really worry about "quality improvement by prior restraint" - both because low-value posts aren't that harmful, they get downvoted and ignored pretty easily, and because it can take YEARS of trial-and-error for someone to become a good participant in LW-style discussions, and I don't want to make it impossible for the true newbies (young people discovering this style for the first time) to try, fail, learn, try, fail, get frustrated, go away, come back, and be slightly-above-neutral for a bit before really hitting their stride.

I agree with Dagon here.

Six years ago after discovering HPMOR and reading part (most?) of the Sequences, I was a bad participant in old LW and rationalist subreddits.

I would probably have been quickly banned on current LW.

It really just takes a while for people new to LW like norms to adjust.

Comment by DragonGod on DragonGod's Shortform · 2023-07-08T20:29:42.739Z · LW · GW

I find noticing surprise more valuable than noticing confusion.

Hindsight bias and post hoc rationalisations make it easy for us to gloss over events that were apriori unexpected.

Comment by DragonGod on Crystal Healing — or the Origins of Expected Utility Maximizers · 2023-06-26T11:06:55.061Z · LW · GW

I think the model of "a composition of subagents with total orders on their preferences" is a descriptive model of inexploitable incomplete preferences, and not a mechanistic model. At least, that was how I interpreted "Why Subagents?".

I read @johnswentworth as making the claim that such preferences could be modelled as a vetocracy of VNM rational agents, not as claiming that humans (or other objects of study) are mechanistically composed of discrete parts that are themselves VNM rational.

I'd be more interested/excited by a refutation on the grounds of: "incomplete inexploitable preferences are not necessarily adequately modelled as a vetocracy of parts with complete preferences". VNM rationality and expected utility maximisation is mostly used as a descriptive rather than mechanistic tool anyway.

Comment by DragonGod on Crystal Healing — or the Origins of Expected Utility Maximizers · 2023-06-25T20:11:44.539Z · LW · GW

Oh, do please share.

Comment by DragonGod on Crystal Healing — or the Origins of Expected Utility Maximizers · 2023-06-25T19:57:10.550Z · LW · GW

Suppose it is offered (by a third party) to switch and then $B \to A + $ 1$

Seems incomplete (pun acknowledged). I feel like there's something missing after "to switch" (e.g. "to switch from A to B" or similar).

Comment by DragonGod on Crystal Healing — or the Origins of Expected Utility Maximizers · 2023-06-25T19:46:31.080Z · LW · GW

Another example is an agent through time where as in the Steward of Myselves

This links to Scott Garrabrant's page, not to any particular post. Perhaps you want to review that?

I think you meant to link to: Tyranny of the Epistemic Majority.

Comment by DragonGod on AXRP Episode 22 - Shard Theory with Quintin Pope · 2023-06-17T06:41:18.256Z · LW · GW

It's working now!

https://podcasts.google.com/feed/aHR0cHM6Ly9heHJwb2RjYXN0LmxpYnN5bi5jb20vcnNz/episode/ODVlM2RkNmItMTdkZi00MWYwLTg2YjAtOWIxY2JkOTBlYjgw?ep=14

Comment by DragonGod on AXRP Episode 22 - Shard Theory with Quintin Pope · 2023-06-16T19:03:46.850Z · LW · GW

Ditto for me.

Comment by DragonGod on AXRP Episode 22 - Shard Theory with Quintin Pope · 2023-06-16T19:02:55.579Z · LW · GW

I've been waiting for this!

Comment by DragonGod on ARC's first technical report: Eliciting Latent Knowledge · 2023-06-15T14:08:36.730Z · LW · GW

We aren’t offering these criteria as necessary for “knowledge”—we could imagine a breaker proposing a counterexample where all of these properties are satisfied but where intuitively M didn’t really know that A′ was a better answer. In that case the builder will try to make a convincing argument to that effect.

Bolded should be sufficient.

Comment by DragonGod on In Defense of Wrapper-Minds · 2023-06-05T08:44:19.181Z · LW · GW

In fact, I'm pretty sure that's how humans work most of the time. We use the general-intelligence machinery to "steer" ourselves at a high level, and most of the time, we operate on autopilot.

Yeah, I agree with this. But I don't think the human system aggregates into any kind of coherent total optimiser. Humans don't have an objective function (not even approximately?).

A human is not well modelled as a wrapper mind; do you disagree?

Comment by DragonGod on In Defense of Wrapper-Minds · 2023-06-04T19:04:11.991Z · LW · GW

Thus, any greedy optimization algorithm would convergently shape its agent to not only pursue , but to maximize for $R$ 's pursuit — at the expense of everything else.

Conditional on:

Such a system being reachable/accessible to our local/greedy optimisation process
Such a system being actually performant according to the selection metric of our optimisation process

I'm pretty sceptical of #2. I'm sceptical that systems that perform inference via direct optimisation over their outputs are competitive in rich/complex environments.

Such optimisation is very computationally intensive compared to executing learned heuristics, and it seems likely that the selection process would have access to much more compute than the selected system.

Comment by DragonGod on Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? · 2023-06-04T06:50:33.955Z · LW · GW

Do please read the post. Being able to predict human text requires vastly superhuman capabilities, because predicting human text requires predicting the processes that generated said text. And large tracts of text are just reporting on empirical features of the world.

Alternatively, just read the post I linked.

Comment by DragonGod on Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? · 2023-06-04T06:49:08.975Z · LW · GW

Oh gosh, how did I hallucinate that?

Comment by DragonGod on Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? · 2023-06-03T13:51:15.512Z · LW · GW

In what sense are they "not trying their hardest"?

Comment by DragonGod on Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? · 2023-06-03T13:22:52.815Z · LW · GW

It is not clear how they could ever develop strongly superhuman intelligence by being superhuman at predicting human text.

"The upper bound of what can be learned from a dataset is not the most capable trajectory, but the conditional structure of the universe implicated by their sum".

Comment by DragonGod on Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better? · 2023-06-03T13:12:06.738Z · LW · GW

which is indifferent to the simplicify of the architecture the insight lets you find.

The bolded should be "simplicity".

Comment by DragonGod on AI Alignment Research Engineer Accelerator (ARENA): call for applicants · 2023-05-26T03:29:52.199Z · LW · GW

Sorry, please where can I get access to the curriculum (including the reading material and exercises) if I want to study it independently?

The chapter pages on the website doesn't seem to list full curricula.

Comment by DragonGod on DragonGod's Shortform · 2023-05-11T00:54:46.564Z · LW · GW

If you define your utility function over histories, then every behaviour is maximising an expected utility function no?

Even behaviour that is money pumped?

I mean you can't money pump any preference over histories anyway without time travel.

The Dutchbook arguments apply when your utility function is defined over your current state with respect to some resource?

I feel like once you define utility function over histories, you lose the force of the coherence arguments?

What would it look like to not behave as if maximising an expected utility function for a utility function defined over histories.

Comment by DragonGod on DragonGod's Shortform · 2023-05-09T18:23:45.142Z · LW · GW

My contention is that I don't think the preconditions hold.

Agents don't fail to be VNM coherent by having incoherent preferences given the axioms of VNM. They fail to be VNM coherent by violating the axioms themselves.

Completeness is wrong for humans, and with incomplete preferences you can be non exploitable even without admitting a single fixed utility function over world states.

Comment by DragonGod on DragonGod's Shortform · 2023-05-08T17:15:48.465Z · LW · GW

Yeah, I think the preconditions of VNM straightforwardly just don't apply to generally intelligent systems.

Comment by DragonGod on Orthogonal's Formal-Goal Alignment theory of change · 2023-05-08T00:49:12.782Z · LW · GW

Not at all convinced that "strong agents pursuing a coherent goal is a viable form for generally capable systems that operate in the real world, and the assumption that it is hasn't been sufficiently motivated.

Comment by DragonGod on DragonGod's Shortform · 2023-05-06T19:50:01.715Z · LW · GW

What are the best arguments that expected utility maximisers are adequate (descriptive if not mechanistic) models of powerful AI systems?

[I want to address them in my piece arguing the contrary position.]

Comment by DragonGod on LLMs and computation complexity · 2023-04-30T04:13:04.414Z · LW · GW

Caveat to the caveat:

The solution is IMO just to consider the number of computations performed per generated token as some function of the model size, and once we've identified a suitable asymptotic order on the function, we can say intelligent things like "the smallest network capable of solving a problem in complexity class C of size N is X".

Or if our asymptotic bounds are not tight enough:

"No economically feasible LLM can solve problems in complexity class C of size >= N".

(Where economically feasible may be something defined by aggregate global economic resources or similar, depending on how tight you want the bound to be.)

Regardless, we can still obtain meaningful impossibility results.

Comment by DragonGod on LLMs and computation complexity · 2023-04-30T02:27:24.482Z · LW · GW

The solution is IMO just to consider the number of computations performed per generated token as some function of the model size, and once we've identified a suitable asymptotic order on the function, we can say intelligent things like "the smallest network capable of solving a problem in complexity class C of size N is X".

Or if our asymptotic bounds are not tight enough:

"No economically feasible LLM can solve problems in complexity class C of size >= N".

(Where economically feasible may be something defined by aggregate global economic resources or similar, depending on how tight you want the bound to be.)

Regardless, we can still obtain meaningful impossibility results.

Comment by DragonGod on LLMs and computation complexity · 2023-04-29T23:23:26.986Z · LW · GW

Very big caveat: the LLM doesn't actually perform O(1) computations per generated token.

The number of computational steps performed per generated token scales with network size: https://www.lesswrong.com/posts/XNBZPbxyYhmoqD87F/llms-and-computation-complexity?commentId=QWEwFcMLFQ678y5Jp

Comment by DragonGod on LLMs and computation complexity · 2023-04-29T22:15:08.466Z · LW · GW

Strongly upvoted.

Short but powerful.

Tl;Dr: LLMs perform O(1) computational steps per generated token and this is true regardless of the generated token.

The LLM sees each token in its context window when generating the next token so can compute problems in O(n^2) [where n is the context window size].

LLMs can get along the computational requirements by "showing their working" and simulating a mechanical computer (one without backtracking, so not Turing complete) in their context window.

This only works if the context window is large enough to contain the workings for the entire algorithm.

Thus LLMs can perform matrix multiplication when showing workings, but not when asked to compute it without showing workings.

Important fundamental limitation on the current paradigm.

We can now say with certainty tasks that GPT will never be able to solve (e.g. beat Stockfish at Chess because Chess is combinatorial and the LLM can't search the game tree to any depth) no matter how far it's scaled up.

This is a very powerful argument.

Comment by DragonGod on DragonGod's Shortform · 2023-04-25T13:34:04.565Z · LW · GW

A reason I mood affiliate with shard theory so much is that like...

I'll have some contention with the orthodox ontology for technical AI safety and be struggling to adequately communicate it, and then I'll later listen to a post/podcast/talk by Quintin Pope/Alex Turner, or someone else trying to distill shard theory and then see the exact same contention I was trying to present expressed more eloquently/with more justification.

One example is that like I had independently concluded that "finding an objective function that was existentially safe when optimised by an arbitrarily powerful optimisation process is probably the wrong way to think about a solution to the alignment problem".

And then today I discovered that Alex Turner advances a similar contention in "Inner and outer alignment decompose one hard problem into two extremely hard problems".

Shard theory also seems to nicely encapsulates my intuitions that we shouldn't think about powerful AI systems as optimisation processes with a system wide objective that they are consistently pursuing.

Or just the general intuitions that our theories of intelligent systems should adequately describe the generally intelligent systems we actually have access to and that theories that don't even aspire to do that are ill motivated.

It is the case that I don't think I can adequately communicate shard theory to a disbeliever, so on reflection there's some scepticism that I properly understand it.

That said, the vibes are right.

Comment by DragonGod on DragonGod's Shortform · 2023-04-25T13:32:17.546Z · LW · GW

"All you need is to delay doom by one more year per year and then you're in business" — Paul Christiano.

User info

Posts

Comments

2. Bayesian Reasoning