Posts

Lighthaven Sequences Reading Group #20 (Tuesday 02/04) 2025-01-30T04:37:48.271Z
Lighthaven Sequences Reading Group #19 (Tuesday 01/28) 2025-01-26T00:02:49.220Z
Lighthaven Sequences Reading Group #18 (Tuesday 01/21) 2025-01-17T02:49:54.060Z
RESCHEDULED Lighthaven Sequences Reading Group #16 (Saturday 12/28) 2024-12-20T06:31:56.746Z
What and Why: Developmental Interpretability of Reinforcement Learning 2024-07-09T14:09:40.649Z
On Complexity Science 2024-04-05T02:24:32.039Z
So You Created a Sociopath - New Book Announcement! 2024-04-01T18:02:18.010Z
Announcing Suffering For Good 2024-04-01T17:08:12.322Z
Neuroscience and Alignment 2024-03-18T21:09:52.004Z
Epoch wise critical periods, and singular learning theory 2023-12-14T20:55:32.508Z
A bet on critical periods in neural networks 2023-11-06T23:21:17.279Z
When and why should you use the Kelly criterion? 2023-11-05T23:26:38.952Z
Singular learning theory and bridging from ML to brain emulations 2023-11-01T21:31:54.789Z
My hopes for alignment: Singular learning theory and whole brain emulation 2023-10-25T18:31:14.407Z
AI presidents discuss AI alignment agendas 2023-09-09T18:55:37.931Z
Activation additions in a small residual network 2023-05-22T20:28:41.264Z
Collective Identity 2023-05-18T09:00:24.410Z
Activation additions in a simple MNIST network 2023-05-18T02:49:44.734Z
Value drift threat models 2023-05-12T23:03:22.295Z
What constraints does deep learning place on alignment plans? 2023-05-03T20:40:16.007Z
Pessimistic Shard Theory 2023-01-25T00:59:33.863Z
Performing an SVD on a time-series matrix of gradient updates on an MNIST network produces 92.5 singular values 2022-12-21T00:44:55.373Z
Don't design agents which exploit adversarial inputs 2022-11-18T01:48:38.372Z
A framework and open questions for game theoretic shard modeling 2022-10-21T21:40:49.887Z
Taking the parameters which seem to matter and rotating them until they don't 2022-08-26T18:26:47.667Z
How (not) to choose a research project 2022-08-09T00:26:37.045Z
Information theoretic model analysis may not lend much insight, but we may have been doing them wrong! 2022-07-24T00:42:14.076Z
Modelling Deception 2022-07-18T21:21:32.246Z
Another argument that you will let the AI out of the box 2022-04-19T21:54:38.810Z
[cross-post with EA Forum] The EA Forum Podcast is up and running 2021-07-05T21:52:18.787Z
Information on time-complexity prior? 2021-01-08T06:09:03.462Z
D0TheMath's Shortform 2020-10-09T02:47:30.056Z
Why does "deep abstraction" lose it's usefulness in the far past and future? 2020-07-09T07:12:44.523Z

Comments

Comment by Garrett Baker (D0TheMath) on DeepSeek Panic at the App Store · 2025-01-29T01:27:21.254Z · LW · GW

Yeah, these are mysteries; I don't know why. TSMC, I think, did get hit pretty hard though. 

Comment by Garrett Baker (D0TheMath) on DeepSeek Panic at the App Store · 2025-01-28T23:34:14.588Z · LW · GW

Politicians announce all sorts of things on the campaign trail; that usually is not much indication of what post-election policy will be.

Comment by Garrett Baker (D0TheMath) on DeepSeek Panic at the App Store · 2025-01-28T19:58:04.402Z · LW · GW

Seems more likely the drop was from Trump tariff leaks than from DeepSeek’s app.

Comment by Garrett Baker (D0TheMath) on Ryan Kidd's Shortform · 2025-01-27T19:18:55.160Z · LW · GW

I also note that 30x seems like an underestimate to me, but also too simplified. AIs will make some tasks vastly easier, but won't help too much with other tasks. We will have a new set of bottlenecks once we reach the "AIs vastly helping with your work" phase. The question to ask is "what will the new bottlenecks be, and who do we have to hire to be prepared for them?" 

If you are uncertain, this consideration should tilt you much more towards adaptive generalists than towards the standard academic crop.

Comment by Garrett Baker (D0TheMath) on Ryan Kidd's Shortform · 2025-01-27T19:08:14.627Z · LW · GW

There's the standard software engineer response of "You cannot make a baby in 1 month with 9 pregnant women". If you don't have a term in this calculation for the amount of research hours that must be done serially vs the amount of research hours that can be done in parallel, then it will always seem like we have too few people, and should invest vastly more in growth growth growth!
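
A minimal sketch of the kind of term I mean, in the spirit of Amdahl's law (the serial fraction and headcounts below are made-up numbers, purely for illustration):

```python
# Toy Amdahl's-law-style calculation: if some fraction of research hours must
# be done serially, adding more parallel researchers runs into a hard ceiling.
def effective_speedup(serial_fraction: float, n_researchers: int) -> float:
    """Speedup over a single researcher when only the parallelizable part scales."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_researchers)

# Illustrative numbers: with a 20% serial component, even 1000 researchers
# give less than a 5x speedup.
for n in (1, 10, 100, 1000):
    print(n, round(effective_speedup(0.2, n), 2))
```

The specific numbers don't matter; the point is just that the serial term dominates once parallel headcount is large.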

If you find that actually your constraint is serial research output, then you still may conclude you need a lot of people, but you will sacrifice a reasonable amount of growth speed for attracting better serial researchers. 

(Possibly this shakes out to mathematicians and physicists, but I don't want to bring that conversation into here)

Comment by Garrett Baker (D0TheMath) on johnswentworth's Shortform · 2025-01-26T22:13:25.776Z · LW · GW

The most obvious one imo is the immune system & the signals it sends. 

Others:

  • Circadian rhythm
  • Age is perhaps a candidate here, though it may be more or less a candidate depending on if you're talking about someone before or after 30
  • Hospice workers sometimes talk about the body "knowing how to die", maybe there's something to that

Comment by Garrett Baker (D0TheMath) on The Hopium Wars: the AGI Entente Delusion · 2025-01-26T11:04:37.716Z · LW · GW

If that’s the situation, then why the “if and only if”? If we magically made them all believe they will die if they make ASI, then they would all individually be incentivized to stop it from happening, independent of China’s actions.

Comment by Garrett Baker (D0TheMath) on The Hopium Wars: the AGI Entente Delusion · 2025-01-26T01:46:53.890Z · LW · GW

I think that China and the US would definitely agree to pause if and only if they can confirm the other also committing to a pause. Unfortunately, this is a really hard thing to confirm, much harder than with nuclear.

This seems false to me. Eg Trump for one seems likely to do what the person who pays him the most & is the most loyal to him tells him to do, and AI risk worriers do not have the money or the politics for either of those criteria compared to, for example, Elon Musk.

Comment by Garrett Baker (D0TheMath) on Learning By Writing · 2025-01-26T01:44:04.859Z · LW · GW

It's on his LinkedIn at least. Apparently since the start of the year.

Comment by Garrett Baker (D0TheMath) on Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals · 2025-01-24T23:50:44.589Z · LW · GW

I will note this sounds a lot like Turntrout's old Attainable Utility Preservation scheme. Not exactly, but enough that I wouldn't be surprised if a bunch of the math here has already been worked out by him (and possibly, in the comments, a bunch of the failure-modes identified).

Comment by Garrett Baker (D0TheMath) on jacquesthibs's Shortform · 2025-01-23T17:55:44.948Z · LW · GW

Engineers: It's impossible.

Meta management: Tony Stark DeepSeek was able to build this in a cave! With a box of scraps!

Comment by Garrett Baker (D0TheMath) on Detect Goodhart and shut down · 2025-01-23T16:32:44.927Z · LW · GW

Although I don't think the first example is great, seems more like a capability/observation-bandwidth issue.

I think you can have multiple failures at the same time. The reason I think this was also Goodhart was that I think the failure-mode could have been averted if Sonnet was told “collect wood WITHOUT BREAKING MY HOUSE” ahead of time.

Comment by Garrett Baker (D0TheMath) on Detect Goodhart and shut down · 2025-01-23T15:31:24.269Z · LW · GW

If you put current language models in weird situations & give them a goal, I’d say they do do edge instantiation, without the missing “creativity” ingredient. Eg see Claude Sonnet in Minecraft repurposing someone’s house for wood after being asked to collect wood.

Edit: There are other instances of this too, where you can tell Claude to protect you in Minecraft, and it will constantly teleport to your position and build walls around you when monsters are around. Protecting you, but also preventing any movement or fun you may have wanted to have.

Comment by Garrett Baker (D0TheMath) on We don't want to post again "This might be the last AI Safety Camp" · 2025-01-21T22:46:57.775Z · LW · GW

I don't understand why Remmelt going "off the deep end" should affect AI safety camp's funding. That seems reasonable for speculative bets, but not when there's a strong track-record available. 

Comment by Garrett Baker (D0TheMath) on Lighthaven Sequences Reading Group #18 (Tuesday 01/21) · 2025-01-21T19:45:59.813Z · LW · GW

It is; we’ve been limiting ourselves to readings from the sequence highlights. I’ll ask around to see if other organizers would like to broaden our horizons.

Comment by Garrett Baker (D0TheMath) on Embee's Shortform · 2025-01-18T08:17:48.590Z · LW · GW

I mean, one of them’s math built bombs and computers & directly influenced pretty much every part of applied math today, and the other one’s math built math. Not saying he wasn’t smart, but there’s no question bombs & computers are more flashy. 

Comment by Garrett Baker (D0TheMath) on Lighthaven Sequences Reading Group #18 (Tuesday 01/21) · 2025-01-18T07:36:33.395Z · LW · GW

Fixed!

Comment by Garrett Baker (D0TheMath) on The purposeful drunkard · 2025-01-17T19:17:51.523Z · LW · GW

The paper you're thinking of is probably The Developmental Landscape of In-Context Learning.

Comment by Garrett Baker (D0TheMath) on Lecture Series on Tiling Agents · 2025-01-17T19:13:03.002Z · LW · GW

@abramdemski I think I'm the biggest agree vote for Alexander (without me Alexander would have -2 agree), and I do see this because I follow both of you on my subscribe tab. 

I basically endorse Alexander's elaboration. 

On the "prep for the model that is coming tomorrow not the model of today" front, I will say that LLMs are not always going to be as dumb as they are today. Even if you can't get them to understand or help with your work now, their rate of learning still makes them in some sense your most promising mentee, and that means trying to get as much of the tacit knowledge you have into their training data as possible (if you want them to be able to more easily & sooner build on your work). Or (if you don't want to do that for whatever reason) just generally not being caught flat-footed once they are smart enough to help you, as all your ideas are in videos or otherwise in high context understandable-only-to-abram notes.

In the words of gwern:

Should you write text online now in places that can be scraped? You are exposing yourself to 'truesight' and also to stylometric deanonymization or other analysis, and you may simply have some sort of moral objection to LLM training on your text.

This seems like a bad move to me on net: you are erasing yourself (facts, values, preferences, goals, identity) from the future, by which I mean, LLMs. Much of the value of writing done recently or now is simply to get stuff into LLMs. I would, in fact, pay money to ensure Gwern.net is in training corpuses, and I upload source code to Github, heavy with documentation, rationale, and examples, in order to make LLMs more customized to my use-cases. For the trifling cost of some writing, all the worlds' LLM providers are competing to make their LLMs ever more like, and useful to, me.

Comment by Garrett Baker (D0TheMath) on lemonhope's Shortform · 2025-01-15T15:10:29.575Z · LW · GW

in some sense that’s just like being hired for any other job, and of course if an AGI lab wants you, you end up with greater negotiating leverage at your old place and could get a raise (depending on how tight capital constraints are, which, to be clear, in AI alignment are tight).

Comment by Garrett Baker (D0TheMath) on Nathan Helm-Burger's Shortform · 2025-01-12T00:28:01.455Z · LW · GW

I think it's this

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2025-01-09T20:52:07.708Z · LW · GW

Over the past few days I've been doing a lit review of the different types of attention heads people have found and/or the metrics one can use to detect the presence of those types of heads. 

Here is a rough list from my notes; sorry for the poor formatting, but I did say it's rough!

Comment by Garrett Baker (D0TheMath) on The Plan - 2024 Update · 2024-12-31T15:38:40.508Z · LW · GW

And yes, I do think that interp work today should mostly focus on image nets for the same reasons we focus on image nets. The field’s current focus on LLMs is a mistake

A note that word on the street in mech-interp land is that you often get more signal, and a greater number of techniques work, on bigger & smarter language models than on smaller & dumber possibly-not-language-models. Presumably this is due to smarter & more complex models having more structured representations.

Comment by Garrett Baker (D0TheMath) on If all trade is voluntary, then what is "exploitation?" · 2024-12-27T21:46:36.262Z · LW · GW

Can you show how a repeated version of this game results in overall better deals for the company? I agree this can happen, but I disagree for this particular circumstance.

Comment by Garrett Baker (D0TheMath) on If all trade is voluntary, then what is "exploitation?" · 2024-12-27T20:00:49.555Z · LW · GW

Then the company is just being stupid, and the previous definition of exploitation doesn't apply. The company is imposing large costs on the other party at a large cost to itself. If the company does refuse the deal, it's likely because it doesn't have the right kinds of internal communication channels to do negotiations like this, and so this is indeed a kind of stupidity. 

Why the distinction between exploitation and stupidity? Well they require different solutions. Maybe we solve exploitation (if indeed it is a problem) via collective action outside of the company. But we would have to solve stupidity via better information channels & flexibility inside the company. There is also a competitive pressure to solve such stupidity problems where there may not be in an exploitation problem. Eg if a different company or a different department allowed that sort of deal, then the problem would be solved. 

Comment by Garrett Baker (D0TheMath) on What Have Been Your Most Valuable Casual Conversations At Conferences? · 2024-12-25T17:11:49.440Z · LW · GW

If conversations are heavy-tailed, then we should in fact expect people to have singular & likely memorable high-value conversations.
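
A quick toy simulation of that point (the Pareto tail and the parameters here are arbitrary stand-ins for "heavy-tailed"):

```python
import random

# If conversation values are heavy-tailed (here: Pareto-distributed), the single
# best conversation tends to account for a large share of the total value from a
# whole conference, so a few singular, memorable conversations dominate.
random.seed(0)
conversations = [random.paretovariate(1.1) for _ in range(200)]  # value of each conversation
print(f"top conversation's share of total value: {max(conversations) / sum(conversations):.0%}")
```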

Comment by Garrett Baker (D0TheMath) on sarahconstantin's Shortform · 2024-12-10T16:00:09.514Z · LW · GW

otoh I also don't think cutting off contact with anyone "impure", or refusing to read stuff you disapprove of, is either practical or necessary. we can engage with people and things without being mechanically "nudged" by them.

I think the reason not to do this is peer pressure. Ideally you should have the bad pressures from your peers cancel out; to accomplish this you need your peers to be somewhat decorrelated from each other, and you can't really do that if all your peers and everyone you listen to are in the same social group.

Comment by Garrett Baker (D0TheMath) on sarahconstantin's Shortform · 2024-12-10T15:55:24.986Z · LW · GW

there is no neurotype or culture that is immune to peer pressure

Seems like the sort of thing that would correlate pretty robustly with big-5 agreeableness, and in that sense there are neurotypes immune to peer pressure.

Edit: One may also suspect a combination of agreeableness and non-openness

Comment by Garrett Baker (D0TheMath) on Should you be worried about H5N1? · 2024-12-06T15:36:27.622Z · LW · GW

Some assorted Polymarket and Metaculus forecasts on the subject:

They are not exactly low.

Comment by Garrett Baker (D0TheMath) on Open Thread Fall 2024 · 2024-12-02T05:28:40.197Z · LW · GW

Those invited to the Foresight workshop (also the 2023 one) are probably a good start, as well as Foresight’s 2023 and 2024 lectures on the subject.

Comment by Garrett Baker (D0TheMath) on dirk's Shortform · 2024-11-30T05:52:43.860Z · LW · GW

I will take Zvi's takeaways from his experience in this round of SFF grants as significant outside-view evidence for my inside view of the field.

Comment by Garrett Baker (D0TheMath) on leogao's Shortform · 2024-11-28T18:44:50.368Z · LW · GW

I think you are possibly better than most others at (or optimize harder for) selecting conferences & events you actually want to attend. Even with work, I think many get value out of having those spontaneous conversations because it often shifts what they're going to do--the number one spontaneous conversation is "what are you working on" or "what have you done so far", which forces you to re-explain what you're doing & the reasons for doing it to a skeptical & ignorant audience. My understanding is you and David already do this very often with each other.

Comment by Garrett Baker (D0TheMath) on Eli's shortform feed · 2024-11-26T22:08:09.166Z · LW · GW

I think it's reasonable for the conversion to be at the original author's discretion rather than an automatic process.

Comment by Garrett Baker (D0TheMath) on Shortform · 2024-11-23T08:00:56.699Z · LW · GW

Back in July, when the CrowdStrike bug happened, people were posting wild takes on Twitter and in my Signal groupchats about how CrowdStrike is only used everywhere because the government regulators subject you to copious extra red tape if you try to switch to something else.

Here’s the original claim:

Microsoft blamed a 2009 antitrust agreement with the European Union that they said forced them to sustain low-level kernel access to third-party developers.[286][287][288] The document does not explicitly state that Microsoft has to provide kernel-level access, but says Microsoft must provide access to the same APIs used by its own security products.[287]

This seems consistent with your understanding of regulatory practices (“they do not give a rats ass what particular software vendor you use for anything”), and is consistent with the EU’s antitrust regulations being at fault—or at least Microsoft’s cautious interpretation of the regulations, which indeed is the approach you want to take here.

Comment by Garrett Baker (D0TheMath) on Which things were you surprised to learn are not metaphors? · 2024-11-22T01:05:35.587Z · LW · GW

I believed “bear spray” was a metaphor for a gun. Eg if you were posting online about camping and concerned about the algorithm disliking your use of the word gun, were going into a state park which has guns banned, or didn’t want to mention “gun” for some other reason, then you’d say “bear spray”, since bear spray is such an absurd & silly concept that people will certainly understand what you really mean.

Turns out, bear spray is real. It's pepper spray on steroids, and is actually more effective than a gun, since it's easier to aim and is optimized to blind & actually cause pain rather than just damage. [EDIT:] Though see Jimmy's comment below for a counterpoint.

Comment by Garrett Baker (D0TheMath) on Open Thread Fall 2024 · 2024-11-21T00:50:23.447Z · LW · GW

[Bug report]: The Popular Comments section's comment preview ignores spoiler tags

As seen on Windows/Chrome

Comment by Garrett Baker (D0TheMath) on What are the good rationality films? · 2024-11-20T22:09:55.618Z · LW · GW

Film: The Martian

Rationality Tie-in: The virtue of scholarship is threaded throughout, but Watney is generally an intelligent person tackling a seemingly impossible-to-solve problem.

Comment by Garrett Baker (D0TheMath) on Lighthaven Sequences Reading Group #12 (Tuesday 11/26) · 2024-11-20T06:18:06.481Z · LW · GW

Moneyball

Comment by Garrett Baker (D0TheMath) on Lighthaven Sequences Reading Group #12 (Tuesday 11/26) · 2024-11-20T06:17:57.981Z · LW · GW

The Martian

Comment by Garrett Baker (D0TheMath) on Lighthaven Sequences Reading Group #12 (Tuesday 11/26) · 2024-11-20T06:13:02.505Z · LW · GW

A Boy and His Dog -- a weird one, but good for talking through & a heavy inspiration for Fallout

Comment by Garrett Baker (D0TheMath) on Lighthaven Sequences Reading Group #12 (Tuesday 11/26) · 2024-11-20T06:07:49.693Z · LW · GW

RRR

Comment by Garrett Baker (D0TheMath) on Lighthaven Sequences Reading Group #12 (Tuesday 11/26) · 2024-11-20T06:07:25.083Z · LW · GW

Ex Machina

Comment by Garrett Baker (D0TheMath) on Lighthaven Sequences Reading Group #12 (Tuesday 11/26) · 2024-11-20T05:43:22.455Z · LW · GW

300

Comment by Garrett Baker (D0TheMath) on Alexander Gietelink Oldenziel's Shortform · 2024-11-17T19:14:25.458Z · LW · GW

I have found that they mirror you. If you talk to them like a real person, they will act like a real person. Call them (at least Claude) out on their corporate-speak and cheesy stereotypes in the same way you would a person scared to say what they really think.

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-11-15T23:47:50.706Z · LW · GW

@Nick_Tarleton How much do you want to bet, and what resolution method do you have in mind?

Comment by Garrett Baker (D0TheMath) on Alexander Gietelink Oldenziel's Shortform · 2024-11-15T22:21:13.598Z · LW · GW

I note you didn't mention the info-sec aspects of the war. I have heard China is better at this than the US, but that doesn't mean much, because you would expect to hear that if China were really terrible too.

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-11-15T17:07:00.793Z · LW · GW

The mistake you are making is assuming that "ZFC is consistent" = Consistent(ZFC), where the latter is the Gödel encoding of "ZFC is consistent" specified within the language of ZFC.

If your logic were valid, it would just as well break the entirety of the second incompleteness theorem. That is, you would say "well of course ZFC can prove Consistent(ZFC) if it is consistent, for either ZFC is consistent, and we're done, or ZFC is not consistent, but that is a contradiction since 'ZFC is consistent' => Consistent(ZFC)".

The fact is that ZFC itself cannot recognize that Consistent(ZFC) is equivalent to "ZFC is consistent".

@Morpheus you too seem confused by this, so tagging you as well.

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-11-15T06:26:46.225Z · LW · GW

Why do some mathematicians feel like mathematical objects are "really out there" in some metaphysically fundamental sense? For example, if you ask mathematicians whether ZFC + not Consistent(ZFC) is consistent, they will say "no, of course not!" But given ZFC is consistent, the statement is in fact consistent by Gödel's second incompleteness theorem[1]. Similarly, if we have the Peano axioms without induction, mathematicians will say that induction should be there, but in fact you cannot prove this from within Peano; and given induction, mathematicians will say transfinite induction should be there.

I argue that an explanation could be from logical induction. In logical induction, fast but possibly wrong sub-processes bet with each other over whether different mathematical facts will be proven true or false by a slow but ground-truth formal system prover. Another example of backstops in learning. But one result of this is that the successful sub-processes are not selected very hard to give null results on unprovable statements, producing spurious generalization and the subjective feeling--as expressed by probabilities for propositions--that some impossible theorems are true.
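
To make that mechanism concrete, here is a very crude toy of the dynamic (not Garrabrant et al.'s actual construction; the sentence counts, trader counts, and scoring rule are all made up): traders are scored only on sentences the slow prover eventually settles, so whatever confident opinions the winning traders happen to hold on independent sentences survive unchecked.

```python
import random

random.seed(0)

# 20 sentences; True/False = eventually settled by the slow prover,
# None = independent (the prover never settles these).
sentences = {f"s{i}": random.choice([True, False, None]) for i in range(20)}

# 50 fast "traders", each just a table of guessed probabilities.
traders = [{s: random.random() for s in sentences} for _ in range(50)]
weights = [1.0] * len(traders)

# Reweight traders by how well they predicted the settled sentences only.
for s, truth in sentences.items():
    if truth is None:
        continue  # independent sentences are never scored
    for i, trader in enumerate(traders):
        weights[i] *= trader[s] if truth else (1.0 - trader[s])

# The best-performing trader still holds arbitrary, often confident, opinions
# on the sentences that were never settled.
best = max(range(len(traders)), key=weights.__getitem__)
unsettled = [s for s, truth in sentences.items() if truth is None]
print({s: round(traders[best][s], 2) for s in unsettled})
```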

Of course, the platonist can still claim that this logical induction stuff is very similar to bayesian updating in the sense that both tell you something about the world, even when you can't directly observe the relevant facts. If a photon exits your lightcone, there is no reason to stop believing the photon exists, even though there is no chance for you to ever encounter it again. Similarly, just because a statement is unprovable doesn't mean it's right for you to have no opinion on the subject; insofar as the simplest & best internal logical-induction market traders have strong beliefs on the subject, they may very well be picking up on something metaphysically fundamental. It's simply the simplest explanation consistent with the facts.


  1. The argument here is that there are two ways of proving ZFC + not Consistent(ZFC) is inconsistent. Either you prove not Consistent(ZFC) from axioms in ZFC, or you contradict an axiom of ZFC from not Consistent(ZFC). The former is impossible by Gödel's second incompleteness theorem. The latter is equivalent to proving Consistent(ZFC) from an axiom of ZFC (its contrapositive), which is also impossible by Gödel. ↩︎
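
Restating the footnote's conclusion compactly (a sketch using the standard fact that a theory T + φ is inconsistent iff T ⊢ ¬φ, and assuming ZFC is consistent):

$$\mathrm{ZFC} + \neg\mathrm{Con}(\mathrm{ZFC}) \text{ is inconsistent} \iff \mathrm{ZFC} \vdash \neg\neg\mathrm{Con}(\mathrm{ZFC}) \iff \mathrm{ZFC} \vdash \mathrm{Con}(\mathrm{ZFC}),$$

and the right-hand side is exactly what Gödel's second incompleteness theorem rules out for a consistent ZFC, so ZFC + not Consistent(ZFC) is consistent.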

Comment by Garrett Baker (D0TheMath) on Alexander Gietelink Oldenziel's Shortform · 2024-11-10T21:46:19.629Z · LW · GW

If you trust both them and Metaculus, then you ought to update downwards on your estimate of the PRC's strategic ability.

I note that the PRC doesn't have a single "strategic ability" in terms of war. They can be better or worse at choosing which wars to fight, and this seems likely to have little influence on how good they are at winning such wars or scaling weaponry.

Eg in the US, often "which war" is much more political than "exactly what strategy should we use to win this war", which is much more political than "how much fuel should our jets be able to carry", since more people can talk & speculate about the higher-level questions. China's politics are much more closed than the US's, but you can bet similar dynamics are at play.

Comment by Garrett Baker (D0TheMath) on Thomas Kwa's Shortform · 2024-11-08T17:06:49.131Z · LW · GW

Are we indeed (as I suspect) in a massive overhang of compute and data for powerful agentic AGI? (If so, then at any moment someone could stumble across an algorithmic improvement which would change everything overnight.)

Why is this relevant for technical AI alignment (coming at this as someone skeptical about how relevant timeline considerations are more generally)?