Posts

What and Why: Developmental Interpretability of Reinforcement Learning 2024-07-09T14:09:40.649Z
On Complexity Science 2024-04-05T02:24:32.039Z
So You Created a Sociopath - New Book Announcement! 2024-04-01T18:02:18.010Z
Announcing Suffering For Good 2024-04-01T17:08:12.322Z
Neuroscience and Alignment 2024-03-18T21:09:52.004Z
Epoch wise critical periods, and singular learning theory 2023-12-14T20:55:32.508Z
A bet on critical periods in neural networks 2023-11-06T23:21:17.279Z
When and why should you use the Kelly criterion? 2023-11-05T23:26:38.952Z
Singular learning theory and bridging from ML to brain emulations 2023-11-01T21:31:54.789Z
My hopes for alignment: Singular learning theory and whole brain emulation 2023-10-25T18:31:14.407Z
AI presidents discuss AI alignment agendas 2023-09-09T18:55:37.931Z
Activation additions in a small residual network 2023-05-22T20:28:41.264Z
Collective Identity 2023-05-18T09:00:24.410Z
Activation additions in a simple MNIST network 2023-05-18T02:49:44.734Z
Value drift threat models 2023-05-12T23:03:22.295Z
What constraints does deep learning place on alignment plans? 2023-05-03T20:40:16.007Z
Pessimistic Shard Theory 2023-01-25T00:59:33.863Z
Performing an SVD on a time-series matrix of gradient updates on an MNIST network produces 92.5 singular values 2022-12-21T00:44:55.373Z
Don't design agents which exploit adversarial inputs 2022-11-18T01:48:38.372Z
A framework and open questions for game theoretic shard modeling 2022-10-21T21:40:49.887Z
Taking the parameters which seem to matter and rotating them until they don't 2022-08-26T18:26:47.667Z
How (not) to choose a research project 2022-08-09T00:26:37.045Z
Information theoretic model analysis may not lend much insight, but we may have been doing them wrong! 2022-07-24T00:42:14.076Z
Modelling Deception 2022-07-18T21:21:32.246Z
Another argument that you will let the AI out of the box 2022-04-19T21:54:38.810Z
[cross-post with EA Forum] The EA Forum Podcast is up and running 2021-07-05T21:52:18.787Z
Information on time-complexity prior? 2021-01-08T06:09:03.462Z
D0TheMath's Shortform 2020-10-09T02:47:30.056Z
Why does "deep abstraction" lose it's usefulness in the far past and future? 2020-07-09T07:12:44.523Z

Comments

Comment by Garrett Baker (D0TheMath) on sarahconstantin's Shortform · 2024-12-10T16:00:09.514Z · LW · GW

otoh I also don't think cutting off contact with anyone "impure", or refusing to read stuff you disapprove of, is either practical or necessary. we can engage with people and things without being mechanically "nudged" by them.

I think the reason not to do this is peer pressure. Ideally the bad pressures from your peers should cancel out, and to accomplish this you need your peers to be somewhat decorrelated from each other, which you can't really do if all your peers and everyone you listen to are in the same social group.

Comment by Garrett Baker (D0TheMath) on sarahconstantin's Shortform · 2024-12-10T15:55:24.986Z · LW · GW

there is no neurotype or culture that is immune to peer pressure

Seems like the sort of thing that would correlate pretty robustly with big-5 agreeableness, and in that sense there are neurotypes immune to peer pressure.

Edit: One may also suspect a combination of agreeableness and non-openness

Comment by Garrett Baker (D0TheMath) on Should you be worried about H5N1? · 2024-12-06T15:36:27.622Z · LW · GW

Some assorted polymarket and metaculus forecasts on the subject:

They are not exactly low.

Comment by Garrett Baker (D0TheMath) on Open Thread Fall 2024 · 2024-12-02T05:28:40.197Z · LW · GW

Those invited to the Foresight workshop (also the 2023 one) are probably a good start, as well as Foresight's 2023 and 2024 lectures on the subject.

Comment by Garrett Baker (D0TheMath) on dirk's Shortform · 2024-11-30T05:52:43.860Z · LW · GW

I will take Zvi's takeaways from his experience in this round of SFF grants as significant outside-view evidence for my inside view of the field.

Comment by Garrett Baker (D0TheMath) on leogao's Shortform · 2024-11-28T18:44:50.368Z · LW · GW

I think you are possibly better than most others at (or optimize harder for) selecting conferences & events you actually want to attend. Even with work, I think many get value out of having those spontaneous conversations because they often shift what they're going to do--the number one spontaneous conversation is "what are you working on" or "what have you done so far", which forces you to re-explain what you're doing & the reasons for doing it to a skeptical & ignorant audience. My understanding is you and David already do this very often with each other.

Comment by Garrett Baker (D0TheMath) on Eli's shortform feed · 2024-11-26T22:08:09.166Z · LW · GW

I think it's reasonable for the conversion to be at the original author's discretion rather than an automatic process.

Comment by Garrett Baker (D0TheMath) on Shortform · 2024-11-23T08:00:56.699Z · LW · GW

Back in July, when the CrowdStrike outage happened, people were posting wild takes on Twitter and in my Signal group chats about how CrowdStrike is only used everywhere because government regulators subject you to copious extra red tape if you try to switch to something else.

Here’s the original claim:

Microsoft blamed a 2009 antitrust agreement with the European Union that they said forced them to sustain low-level kernel access to third-party developers.[286][287][288] The document does not explicitly state that Microsoft has to provide kernel-level access, but says Microsoft must provide access to the same APIs used by its own security products.[287]

This seems consistent with your understanding of regulatory practices (“they do not give a rats ass what particular software vendor you use for anything”), and is consistent with the EU’s antitrust regulations being at fault—or at least Microsoft’s cautious interpretation of the regulations, which indeed is the approach you want to take here.

Comment by Garrett Baker (D0TheMath) on Which things were you surprised to learn are not metaphors? · 2024-11-22T01:05:35.587Z · LW · GW

I believed “bear spray” was a metaphor for a gun. Eg if you were posting online about camping and were concerned about the algorithm disliking your use of the word gun, were going into a state park where guns are banned, or didn't want to mention “gun” for some other reason, then you'd say “bear spray”, since bear spray is such an absurd & silly concept that people would certainly understand what you really meant.

Turns out, bear spray is real. It's pepper spray on steroids, and is actually more effective than a gun, since it's easier to aim and is optimized to blind & actually cause pain rather than just do damage. [EDIT:] Though see Jimmy's comment below for a counter-point.

Comment by Garrett Baker (D0TheMath) on Open Thread Fall 2024 · 2024-11-21T00:50:23.447Z · LW · GW

[Bug report]: The Popular Comments section's comment preview ignores spoiler tags

As seen on Windows/Chrome

Comment by Garrett Baker (D0TheMath) on What are the good rationality films? · 2024-11-20T22:09:55.618Z · LW · GW

Film: The Martian

Rationality Tie-in: The virtue of scholarship is threaded throughout, but Watney is generally an intelligent person tackling a seemingly impossible-to-solve problem.

Comment by Garrett Baker (D0TheMath) on Lighthaven Sequences Reading Group #12 (Tuesday 11/26) · 2024-11-20T06:18:06.481Z · LW · GW

Moneyball

Comment by Garrett Baker (D0TheMath) on Lighthaven Sequences Reading Group #12 (Tuesday 11/26) · 2024-11-20T06:17:57.981Z · LW · GW

The Martian

Comment by Garrett Baker (D0TheMath) on Lighthaven Sequences Reading Group #12 (Tuesday 11/26) · 2024-11-20T06:13:02.505Z · LW · GW

A Boy and His Dog -- a weird one, but good for talking through & a heavy inspiration for Fallout

Comment by Garrett Baker (D0TheMath) on Lighthaven Sequences Reading Group #12 (Tuesday 11/26) · 2024-11-20T06:07:49.693Z · LW · GW

RRR

Comment by Garrett Baker (D0TheMath) on Lighthaven Sequences Reading Group #12 (Tuesday 11/26) · 2024-11-20T06:07:25.083Z · LW · GW

Ex Machina

Comment by Garrett Baker (D0TheMath) on Lighthaven Sequences Reading Group #12 (Tuesday 11/26) · 2024-11-20T05:43:22.455Z · LW · GW

300

Comment by Garrett Baker (D0TheMath) on Alexander Gietelink Oldenziel's Shortform · 2024-11-17T19:14:25.458Z · LW · GW

I have found that they mirror you. If you talk to them like a real person, they will act like a real person. Call them (at least Claude) out on their corporate-speak and cheesy stereotypes in the same way you would a person scared to say what they really think.

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-11-15T23:47:50.706Z · LW · GW

@Nick_Tarleton How much do you want to bet, and what resolution method do you have in mind?

Comment by Garrett Baker (D0TheMath) on Alexander Gietelink Oldenziel's Shortform · 2024-11-15T22:21:13.598Z · LW · GW

I note you didn't mention the info-sec aspects of the war. I have heard China is better at this than the US, but that doesn't mean much, because you would expect to hear that if China were really terrible too.

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-11-15T17:07:00.793Z · LW · GW

The mistake you are making is assuming that "ZFC is consistent" = Consistent(ZFC), where the latter is the Gödel encoding of "ZFC is consistent" specified within the language of ZFC.

If your logic were valid, it would just as well break the entirety of the second incompleteness theorem. That is, you would say "well of course ZFC can prove Consistent(ZFC) if it is consistent, for either ZFC is consistent, and we're done, or ZFC is not consistent, but that is a contradiction since 'ZFC is consistent' => Consistent(ZFC)".

The fact is that ZFC itself cannot recognize that Consistent(ZFC) is equivalent to "ZFC is consistent".

@Morpheus you too seem confused by this, so tagging you as well.

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-11-15T06:26:46.225Z · LW · GW

Why do some mathematicians feel like mathematical objects are "really out there" in some metaphysically fundamental sense? For example, if you ask mathematicians whether ZFC + not Consistent(ZFC) is consistent, they will say "no, of course not!" But given that ZFC is consistent, that theory is in fact consistent, by Gödel's second incompleteness theorem[1]. Similarly, if we have the Peano axioms without induction, mathematicians will say that induction should be there, but in fact you cannot prove induction from within that system, and given induction, mathematicians will say transfinite induction should be there.

I argue that an explanation could come from logical induction. In logical induction, fast but possibly wrong sub-processes bet with each other over whether different mathematical facts will be proven true or false by a slow but ground-truth formal-system prover (another example of backstops in learning). One result of this is that the successful sub-processes are not selected very hard to give null results on unprovable statements, producing spurious generalization and the subjective feeling--as expressed by probabilities for propositions--that some unprovable statements are true.

Of course, the platonist can still claim that this logical induction stuff is very similar to Bayesian updating, in the sense that both tell you something about the world even when you can't directly observe the relevant facts. If a photon exits your lightcone, there is no reason to stop believing the photon exists, even though there is no chance for you to ever encounter it again. Similarly, just because a statement is unprovable doesn't mean it's right for you to have no opinion on the subject; insofar as the simplest & best internal logical-induction market traders have strong beliefs on the subject, they may very well be picking up on something metaphysically fundamental. It's simply the simplest explanation consistent with the facts.


  1. The argument here is that there are two ways of proving ZFC + not Consistent(ZFC) is inconsistent. Either you prove not Consistent(ZFC) from the axioms of ZFC, or you derive a contradiction with an axiom of ZFC from not Consistent(ZFC). The former is impossible by Gödel's second incompleteness theorem. The latter is equivalent to proving Consistent(ZFC) within ZFC (by contraposition), which is also impossible by Gödel. ↩︎
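Spelled out a bit more, a minimal formalization of the second branch of that argument (writing Con(ZFC) for the arithmetized consistency statement) is a one-line contraposition:

$$ \mathrm{ZFC} + \neg\mathrm{Con}(\mathrm{ZFC}) \vdash \bot \;\Longrightarrow\; \mathrm{ZFC} \vdash \mathrm{Con}(\mathrm{ZFC}) \;\Longrightarrow\; \mathrm{ZFC} \text{ is inconsistent (Gödel II)} $$

so, contrapositively, if ZFC is consistent then ZFC + ¬Con(ZFC) is consistent as well.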

Comment by Garrett Baker (D0TheMath) on Alexander Gietelink Oldenziel's Shortform · 2024-11-10T21:46:19.629Z · LW · GW

If you trust both them and Metaculus, then you ought to update downwards on your estimate of the PRC's strategic ability.

I note that the PRC doesn't have a single "strategic ability" in terms of war. They can be better or worse at choosing which wars to fight, and this seems likely to have little influence on how good they are at winning such wars or scaling weaponry.

Eg in the US, "which war" is often much more political than "exactly what strategy should we use to win this war", which in turn is much more political than "how much fuel should our jets be able to carry", since more people can talk & speculate about the higher-level questions. China's politics are much more closed than the US's, but you can bet similar dynamics are at play.

Comment by Garrett Baker (D0TheMath) on Thomas Kwa's Shortform · 2024-11-08T17:06:49.131Z · LW · GW

Are we indeed (as I suspect) in a massive overhang of compute and data for powerful agentic AGI? (If so, then at any moment someone could stumble across an algorithmic improvement which would change everything overnight.)

Why is this relevant for technical AI alignment (coming at this as someone skeptical about how relevant timeline considerations are more generally)?

Comment by Garrett Baker (D0TheMath) on The Median Researcher Problem · 2024-11-08T03:11:06.434Z · LW · GW

I really feel like we're talking past each other here, because I have no idea how any of what you said relates to what I said, except the first paragraph.

As for that, what you describe sounds worse than a median researcher problem, instead sounding like a situation ripe for groupthink!

Comment by Garrett Baker (D0TheMath) on adam_scholl's Shortform · 2024-11-07T02:44:09.739Z · LW · GW

I can proudly say that though I disparaged the guy in private, I never once put my money where my mouth was, which means outside observers can infer that all along I secretly agreed with his analysis of the situation.

Comment by Garrett Baker (D0TheMath) on The Median Researcher Problem · 2024-11-06T20:47:30.652Z · LW · GW

The technology's in a sweet spot where a custom statistical analysis needs to be developed, but it's also so important that the best minds will do that analysis and a community norm exists that we defer to them. Example: clinical trial results.

The argument seems to be about this stage, and from what I've heard clinical trials indeed take much more time than necessary. But maybe I've only heard about medical clinical trials, and actually academic biomedical clinical trials are incredibly efficient by comparison.

It also sounds like "community norm exists that we defer to [the best minds]" requires the community to identify who the best minds are, which presumably involves critiquing the research outputs of those best minds according to the standards of the median researcher, which often (though I don't know about biomedicine) ends up being something crazy like h-index or number of citations or number of papers or derivatives of such things.

Comment by Garrett Baker (D0TheMath) on The Median Researcher Problem · 2024-11-06T16:25:07.865Z · LW · GW

I don’t see how this is any evidence against John’s point.

Presumably the reason you need such crushingly obvious results which can be seen regardless of the validity of your statistical tool before the field can move on is because you need to convince the median researchers.

The sharp researchers have predictions about where the field is going based on statistical evidence and mathematical reasoning, and presumably can be convinced of the ultimate state far before the median, and work toward proving or disproving their hypotheses, and then once it's clear to them, making the case stupidly obvious for the lowest common denominator in the room. And I expect this is where most of the real conceptual progress lies.

Even in the world where, as you claim, this is a marginal effect, if we could speed up any given advance in academic biomedicine by a year, that is an incredible achievement! Many people may die in that year who could've been saved had the median not wasted time (assuming the year saved carries over to clinical medicine).

Comment by Garrett Baker (D0TheMath) on Matt Goldenberg's Short Form Feed · 2024-11-05T23:32:18.963Z · LW · GW

Kicking myself for not making a fatebook about this. It definitely sounded like the kind of thing that wouldn't replicate.

Comment by Garrett Baker (D0TheMath) on dirk's Shortform · 2024-11-05T19:16:09.396Z · LW · GW

There is not a difference between the two situations in the way you're claiming, and indeed the differentiation point of view is used fruitfully both on factory floors and in more complex convex optimization problems. For example, see the connection between dual variables and their indication of how slack or taut constraints are in convex optimization, and how this can be interpreted as a relative tradeoff price between each of the constrained resources.

In your factory-floor example, the constraints would be the throughput of each machine, and (assuming you're trying to maximize the throughput of the entire process) the dual variables would be zero everywhere except at the bottleneck machine, where the dual is the negative derivative of the whole process's throughput with respect to that machine's throughput. We could then confirm that the tight constraint is indeed that machine's throughput by looking at its dual, which is significantly greater in magnitude than all the others.
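As a toy sketch of that (hypothetical numbers, and assuming a recent scipy whose HiGHS-backed linprog exposes dual values via res.ineqlin.marginals): three machines in series, line throughput capped by each machine's capacity, and the duals pick out the bottleneck.

```python
from scipy.optimize import linprog

# Hypothetical capacities (units/hour) for three machines in series.
# The line's throughput x cannot exceed any single machine's capacity.
capacities = [40.0, 25.0, 60.0]

# linprog minimizes, so maximize x by minimizing -x, subject to x <= capacity_i.
res = linprog(
    c=[-1.0],
    A_ub=[[1.0], [1.0], [1.0]],
    b_ub=capacities,
    bounds=[(0, None)],
    method="highs",
)

print("optimal throughput:", res.x[0])   # 25.0 -- the slowest machine
# Shadow prices of the capacity constraints: magnitude ~1 at the bottleneck
# (negative here, following scipy's minimization sign convention) and 0 at
# the slack machines, i.e. only extra capacity at machine 2 raises throughput.
print("duals:", res.ineqlin.marginals)
```

The duals here are degenerate (one is ±1, the rest 0) precisely because this example has a single hard bottleneck.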

Practical problems also often have a similarly sparse structure to their constraining inputs, but just because the dual variables aren't exactly zero everywhere except one doesn't mean the constraints with non-zero duals are secretly not actually constraining, or that it's unprincipled to use the same math or intuitions to reason about both situations.

Comment by Garrett Baker (D0TheMath) on Update on the Mysterious Trump Buyers on Polymarket · 2024-11-05T05:14:02.222Z · LW · GW

The promise of prediction markets was that they are either useful or allow you to take money from rich idiots. I’d say that was fulfilled.

Also, useful is very different from perfect. They are still very adequate for a large variety of questions.

Comment by Garrett Baker (D0TheMath) on The Median Researcher Problem · 2024-11-03T18:54:44.762Z · LW · GW

I think in principle it makes sense, in the same sense "highly genetic" makes sense. If a trait is highly genetic, then there's a strong chance for it to be passed on given a reproductive event. If a meme is highly memetic, then there's a strong chance for it to be passed on via information transmission.

In genetic evolution it makes sense to distinguish this from fitness, because in genetic evolution the dominant feedback signal is whether you found a mate, not the probability a given trait is passed to the next generation.

In memetic evolution, the dominant feedback signal is the probability a meme gets passed on given a conversation, because there is a strong correlation between the probability someone passes on the information you told them, and getting more people to listen to you. So a highly memetic meme is also incredibly likely to be highly memetically fit.

Comment by Garrett Baker (D0TheMath) on The Median Researcher Problem · 2024-11-03T17:54:22.673Z · LW · GW

I definitely had no trouble understanding the post, and the usage seems very standard among blogs I read and people I talk to.

Comment by Garrett Baker (D0TheMath) on Open Thread Fall 2024 · 2024-11-03T16:40:23.804Z · LW · GW

There’s not a principled way for informal arguments, but there are a few for formal arguments—ie proofs. The relevant search term here is logical induction.

Comment by Garrett Baker (D0TheMath) on dirk's Shortform · 2024-11-02T17:24:08.179Z · LW · GW

Ok, but when we ask why this constraint is tight, the answer is that there's not enough funding. We can't just increase the size of the field 10x in order to get 10x more top-20 researchers, because we don't have the money for that.

For example, suppose MATS suddenly & magically scaled up 10x, and their next cohort was 1,000 people. Would this dramatically change the state of the field? I don't think so.

Now suppose SFF & LTFF's budget suddenly & magically scaled up 10x. Would this dramatically change the state of the field? I think so!

Comment by Garrett Baker (D0TheMath) on dirk's Shortform · 2024-11-02T17:14:52.224Z · LW · GW

I don't understand what you mean. Do you mean there is lots of potential funding for AI alignment in eg governments, but that funding is only going to university researchers?

Comment by Garrett Baker (D0TheMath) on dirk's Shortform · 2024-11-02T15:36:40.422Z · LW · GW

This argument has been had before on LessWrong. Usually the counter here is that we don't actually know ahead of time who the top 20 people are, and so we need to experiment & would do well to hedge our bets, which is the main constraint to getting a top 20. Currently we do this, but only really for 1-2 years, while historically it takes more like 5 years to reveal yourself as a top 20, and I'd guess it can actually take more like 10 years.

So why not that funding model? Mostly a money thing.

I expect you will argue that in fact revealing yourself as a top 20 happens in fewer than 5 years, if you do argue.

Comment by Garrett Baker (D0TheMath) on dirk's Shortform · 2024-11-02T01:06:35.834Z · LW · GW

Yes, the field is definitely more funding constrained than talent constrained right now

Comment by Garrett Baker (D0TheMath) on Habryka's Shortform Feed · 2024-11-01T00:01:34.925Z · LW · GW

Wait, I just used inspect element, and the font only looks bigger, so never mind.

Comment by Garrett Baker (D0TheMath) on Habryka's Shortform Feed · 2024-11-01T00:00:13.540Z · LW · GW

The footnote font on the side of comments is bigger than the font in the comments. Presumably this is unintentional. [1]


  1. Look at me! I'm big font! You fee fi fo fum, I'm more important than the actual comment! ↩︎

Comment by Garrett Baker (D0TheMath) on Three Notions of "Power" · 2024-10-30T18:28:21.405Z · LW · GW

I am going to guess that the diff between your and John's models here is that John thinks LDT/FDT solves this, and you don't.

Comment by Garrett Baker (D0TheMath) on Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence · 2024-10-30T00:55:37.152Z · LW · GW

Sorry to give only a surface-level point of feedback, but I think this post would be much, much better if you shortened it significantly. As far as I can tell, pretty much every paragraph is 3x longer than it could be, which makes it a slog to read through.

Comment by Garrett Baker (D0TheMath) on Alexander Gietelink Oldenziel's Shortform · 2024-10-21T20:26:40.026Z · LW · GW

Yeah, I meant medical/covid masks imply the wearer is diseased. I would have also believed the cat mask is a medical/covid mask if you hadn't given a different reason for wearing it, so it has that going against it in terms of coolness. It also has a lack of plausible deniability going against it. If you're wearing sunglasses, there's actually a utilitarian reason behind wearing them outside of just creating information asymmetry. If you're just trying to obscure half your face, there's no such plausible deniability. You're just trying to obscure your face, so it becomes far less cool.

Comment by Garrett Baker (D0TheMath) on Alexander Gietelink Oldenziel's Shortform · 2024-10-21T08:14:25.425Z · LW · GW

Yeah pretty clearly these aren’t cool because they imply the wearer is diseased.

Comment by Garrett Baker (D0TheMath) on Alexander Gietelink Oldenziel's Shortform · 2024-10-21T08:13:51.400Z · LW · GW

Oh I thought they meant like ski masks or something. For illness masks, the reason they’re not cool is very clearly that they imply you’re diseased.

(To a lesser extent, they also imply that your existing social status is so low you can't expect to get away with accidentally infecting any friends or acquaintances, but my first point is more obvious & defensible.)

Comment by Garrett Baker (D0TheMath) on Alexander Gietelink Oldenziel's Shortform · 2024-10-20T23:07:41.340Z · LW · GW

I think with sunglasses there’s a veneer of plausible deniability. They in fact have a utilitarian purpose outside of just creating information asymmetry. If you’re wearing a mask though, there’s no deniability. You just don’t want people to know where you’re looking.

Comment by Garrett Baker (D0TheMath) on When is reward ever the optimization target? · 2024-10-19T04:32:50.387Z · LW · GW

What do you mean by "model-based"?

Comment by Garrett Baker (D0TheMath) on Any evidence or reason to expect a multiverse / Everett branches? · 2024-10-11T00:22:36.023Z · LW · GW

I don't know! Got any information? I haven't heard that claim before.

Comment by Garrett Baker (D0TheMath) on Why I’m not a Bayesian · 2024-10-07T15:53:44.453Z · LW · GW

Roughly speaking, you are better off thinking of there as being an intrinsic ranking of the features of a thing by magnitude or importance, such that the cluster a thing belongs to is its most important feature.

How do you get the features, and how do you decide on importance? I expect that for certain answers to these questions John will agree with you.

Comment by Garrett Baker (D0TheMath) on Alexander Gietelink Oldenziel's Shortform · 2024-10-04T21:51:39.362Z · LW · GW

here

is that supposed to be a link?