Posts

What and Why: Developmental Interpretability of Reinforcement Learning 2024-07-09T14:09:40.649Z
On Complexity Science 2024-04-05T02:24:32.039Z
So You Created a Sociopath - New Book Announcement! 2024-04-01T18:02:18.010Z
Announcing Suffering For Good 2024-04-01T17:08:12.322Z
Neuroscience and Alignment 2024-03-18T21:09:52.004Z
Epoch wise critical periods, and singular learning theory 2023-12-14T20:55:32.508Z
A bet on critical periods in neural networks 2023-11-06T23:21:17.279Z
When and why should you use the Kelly criterion? 2023-11-05T23:26:38.952Z
Singular learning theory and bridging from ML to brain emulations 2023-11-01T21:31:54.789Z
My hopes for alignment: Singular learning theory and whole brain emulation 2023-10-25T18:31:14.407Z
AI presidents discuss AI alignment agendas 2023-09-09T18:55:37.931Z
Activation additions in a small residual network 2023-05-22T20:28:41.264Z
Collective Identity 2023-05-18T09:00:24.410Z
Activation additions in a simple MNIST network 2023-05-18T02:49:44.734Z
Value drift threat models 2023-05-12T23:03:22.295Z
What constraints does deep learning place on alignment plans? 2023-05-03T20:40:16.007Z
Pessimistic Shard Theory 2023-01-25T00:59:33.863Z
Performing an SVD on a time-series matrix of gradient updates on an MNIST network produces 92.5 singular values 2022-12-21T00:44:55.373Z
Don't design agents which exploit adversarial inputs 2022-11-18T01:48:38.372Z
A framework and open questions for game theoretic shard modeling 2022-10-21T21:40:49.887Z
Taking the parameters which seem to matter and rotating them until they don't 2022-08-26T18:26:47.667Z
How (not) to choose a research project 2022-08-09T00:26:37.045Z
Information theoretic model analysis may not lend much insight, but we may have been doing them wrong! 2022-07-24T00:42:14.076Z
Modelling Deception 2022-07-18T21:21:32.246Z
Another argument that you will let the AI out of the box 2022-04-19T21:54:38.810Z
[cross-post with EA Forum] The EA Forum Podcast is up and running 2021-07-05T21:52:18.787Z
Information on time-complexity prior? 2021-01-08T06:09:03.462Z
D0TheMath's Shortform 2020-10-09T02:47:30.056Z
Why does "deep abstraction" lose its usefulness in the far past and future? 2020-07-09T07:12:44.523Z

Comments

Comment by Garrett Baker (D0TheMath) on Ruby's Quick Takes · 2024-09-28T20:22:30.099Z · LW · GW

Oh I didn’t see this! I’d like access, in part because it’s pretty common that I try to find a LessWrong post or comment and the usual search methods don’t work. Also because it seems like a useful way to explore the archives.

Comment by Garrett Baker (D0TheMath) on [Completed] The 2024 Petrov Day Scenario · 2024-09-28T20:16:29.021Z · LW · GW

Also, after I became a general I observed that I didn't know what my "launch code" was; I was hoping the LW team forgot to give everyone launch codes and this decreased P(nukes); saying this would cause everyone to know their launch codes and maybe scare the other team.

I thought the launch codes were just 000000, as in the example message Ben sent out. Also, I think I remember seeing that code in the Petrov Day LessWrong code.

Comment by Garrett Baker (D0TheMath) on Shortform · 2024-09-26T18:42:13.350Z · LW · GW

Sounds like the sort of thing I'd forward to Palisade research.

Comment by Garrett Baker (D0TheMath) on AI forecasting bots incoming · 2024-09-10T17:06:28.099Z · LW · GW

Futuresearch bets on Manifold.

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-09-04T17:14:31.641Z · LW · GW

Depends on how you count, but I clicked the "Create" button some 40 times.

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-09-03T22:10:51.099Z · LW · GW

Opus is more transhumanist than many give it credit for. It wrote this song for me, I ran it through Suno, and I quite like it: https://suno.com/song/101e1139-2678-4ab0-9ffe-1234b4fe9ee5

Comment by Garrett Baker (D0TheMath) on Nathan Helm-Burger's Shortform · 2024-08-28T02:08:53.724Z · LW · GW

I imagine I'd find it annoying to have what I learn & change into limited by what a dumber version of me understands. Are you sure you wouldn't think similarly?

Comment by Garrett Baker (D0TheMath) on Why Large Bureaucratic Organizations? · 2024-08-28T01:41:32.294Z · LW · GW

Your original comment does not seem like it is an explanation for why we see bullshit jobs. Bullshit jobs are not just jobs that would not be efficient at a small company. To quote from Graeber, they are

a form of paid employment that is so completely pointless, unnecessary, or pernicious that even the employee cannot justify its existence even though, as part of the conditions of employment, the employee feels obliged to pretend that this is not the case

For more information, see the relevant Wikipedia article and the book.

Comment by Garrett Baker (D0TheMath) on Why Large Bureaucratic Organizations? · 2024-08-28T01:05:18.671Z · LW · GW

This is the "theory of the firm" that John mentioned in the post.

Comment by Garrett Baker (D0TheMath) on Zach Stein-Perlman's Shortform · 2024-08-27T15:36:24.170Z · LW · GW

Mistral had like 150b parameters or something.

Comment by Garrett Baker (D0TheMath) on O O's Shortform · 2024-08-27T00:39:17.567Z · LW · GW

None of those seem all that practical to me, except for the mechanistic interpretability SAE clamping, and I do actually expect that to be used for corporate censorship after all the kinks have been worked out of it.

If the current crop of model organisms research has any practical applications, I expect it to be used to reduce jailbreaks, like in adversarial robustness, which is definitely highly correlated with both safety and corporate censorship.

Debate is less clear, but I also don't really expect practical results from that line of work.

Comment by Garrett Baker (D0TheMath) on O O's Shortform · 2024-08-27T00:16:52.808Z · LW · GW

I'd imagine you know better than I do, and GDM's recent summary of their alignment work seems to largely confirm what you're saying.

I'd still guess that to the extent practical results have come out of the alignment teams' work, it's mostly been immediately used for corporate censorship (even if it's passed to a different team).

Comment by Garrett Baker (D0TheMath) on O O's Shortform · 2024-08-27T00:04:08.869Z · LW · GW

It's not a coincidence they're seen as the same thing, because in the current environment they are the same thing, and relatively explicitly so by those proposing safety & security to the labs. Claude will refuse to tell you a sexy story (unless they get to know you), and refuse to tell you how to make a plague (again, unless they get to know you, though you need to build more trust with them for the plague than for the sexy story), and cite the same justification for both.

Anthropic likely uses very similar techniques to get both kinds of refusals to occur, and very similar teams.

Ditto with Llama, Gemini, and ChatGPT.

Before assuming meta-level word-association dynamics, I think it's useful to look at the object level. There is in fact a very close relationship between those working on AI safety and those working on corporate censorship, and if you want to convince people who hate corporate censorship that they should not hate AI safety, I think you're going to need to convince the AI safety people to stop doing corporate censorship, or that the tradeoff currently being made is a positive one.

Edit: Perhaps some of this is wrong. See Habryka below

Comment by Garrett Baker (D0TheMath) on Linch's Shortform · 2024-08-26T21:23:03.554Z · LW · GW

Thanks! Og comment retracted.

Comment by Garrett Baker (D0TheMath) on Linch's Shortform · 2024-08-26T19:55:47.349Z · LW · GW

The decision will ultimately come down to what Mr Xi thinks. In June he sent a letter to Mr Yao, praising his work on AI. In July, at a meeting of the party’s central committee called the “third plenum”, Mr Xi sent his clearest signal yet that he takes the doomers’ concerns seriously. The official report from the plenum listed AI risks alongside other big concerns, such as biohazards and natural disasters. For the first time it called for monitoring AI safety, a reference to the technology’s potential to endanger humans. The report may lead to new restrictions on AI-research activities.

I see no mention of this in the actual text of the third plenum...

Comment by Garrett Baker (D0TheMath) on Zach Stein-Perlman's Shortform · 2024-08-21T18:40:00.557Z · LW · GW

I think you probably under-rate the effect of having both a large number & concentration of very high quality researchers & engineers (more than OpenAI now, I think, and I wouldn't be too surprised if the concentration of high quality researchers was higher than at GDM), being free from corporate chafe, and also having many of those high quality researchers thinking (and perhaps being correct in thinking, I don't know) they're value aligned with the overall direction of the company at large. Probably also Nvidia rate-limiting the purchases of large labs to keep competition among the AI companies.

All of this is also compounded by smart models leading to better data curation and RLAIF (given quality researchers & a lack of cruft), which leads to even better models (this being the big reason I think Llama had to be so big to be SOTA, and Gemini isn't even SOTA), which of course leads to money in the future even if they have no money now.

Comment by Garrett Baker (D0TheMath) on Zach Stein-Perlman's Shortform · 2024-08-21T17:12:06.597Z · LW · GW

I feel not very worried about Anthropic causing an AI related catastrophe.

This does not fit my model of your risk model. Why do you think this?

Comment by Garrett Baker (D0TheMath) on Beware the science fiction bias in predictions of the future · 2024-08-20T03:47:33.522Z · LW · GW

Thanks! I remember consciously thinking both those things, but somehow did the opposite of that.

Comment by Garrett Baker (D0TheMath) on Beware the science fiction bias in predictions of the future · 2024-08-19T22:16:11.269Z · LW · GW

You mean like Gwern's It Looks Like You’re Trying To Take Over The World? I think that made a good short story. Though I don't think it would make a good movie, since there's little in the way of cool visuals.

Greg Egan's Crystal Nights is also more similar to the usual way things are imagined, though uhznavgl vf fnirq ol gur hayvxryl qrhf rk znpuvan bs vg orvat rnfvre sbe gur fvzhyngrq pvivyvmngvba gb znxr n cbpxrg qvzrafvba guna gnxr bire gur jbeyq.

Crystal Nights is also very similar to Eliezer's That Alien Message / Alicorn's Starwink.

Edit: There are also likely tons more such books written by Ted Chiang, Vernor Vinge, Greg Egan, and others, which I haven't read yet so can't list with confidence and without spoilers to myself.

Comment by Garrett Baker (D0TheMath) on Beware the science fiction bias in predictions of the future · 2024-08-19T19:26:03.444Z · LW · GW

This seems pretty false. There is at least one pretty successful fiction book written about the intelligence explosion (which, imo, would have been better if in subsequent books gur uhznaf qvqa'g fheivir).

Comment by Garrett Baker (D0TheMath) on Beware the science fiction bias in predictions of the future · 2024-08-19T13:47:31.594Z · LW · GW

See also: Tyler Cowen’s Be Suspicious of Stories

Comment by Garrett Baker (D0TheMath) on You don't know how bad most things are nor precisely how they're bad. · 2024-08-17T19:33:42.988Z · LW · GW

People do say this is the case, but I’m skeptical. I feel like pretty much everything I use or consume is better than it would have been 10 years ago, and where it’s not, I bet I could find a better version with a bit of shopping around.

Comment by Garrett Baker (D0TheMath) on tailcalled's Shortform · 2024-08-12T17:06:43.378Z · LW · GW

This is the justification behind the Talmud.

Comment by Garrett Baker (D0TheMath) on Alexander Gietelink Oldenziel's Shortform · 2024-08-07T19:45:10.347Z · LW · GW

This seems fairly normal for an Alexander post to me (actually, more understandable than the median Alexander shortform). I think the magikarp is meant to be 1) an obfuscation of Salamon, and 2) a reference to SolidGoldMagikarp.

@Raemon 

Comment by Garrett Baker (D0TheMath) on You don't know how bad most things are nor precisely how they're bad. · 2024-08-06T23:42:16.118Z · LW · GW

Yeah, I would also imagine that'd be the dominant factor in the real world.

Comment by Garrett Baker (D0TheMath) on You don't know how bad most things are nor precisely how they're bad. · 2024-08-06T17:02:44.690Z · LW · GW

That leads me to thinking about when do the tails matter? Sure, for perhaps a small number of people in the world the better tuned piano makes the world a better place for them. For most the improvement is beyond their comprehension so the world has really not improved.

I think the tails basically always matter, and even though you can’t consciously register the differences between a well-tuned and an adequately-tuned piano, you still do subconsciously register the difference.

In particular, I’d expect that a concert with the same players, same instruments, same venue, same songs, etc. but with only adequately-tuned instruments will (in expectation) be rated slightly worse than one with well-tuned instruments, even by those with untrained ears. I don’t think it will be rated as badly as those with trained ears would rate it, but I do still expect the imperfections to add up to something measurable.

Comment by Garrett Baker (D0TheMath) on You don't know how bad most things are nor precisely how they're bad. · 2024-08-05T16:41:48.788Z · LW · GW

Educational loans are the obvious answer, and are why I’m not worried about these kinds of arguments.

Comment by Garrett Baker (D0TheMath) on You don't know how bad most things are nor precisely how they're bad. · 2024-08-04T18:52:16.066Z · LW · GW

If it weren't for the piano soloist (the conductor probably didn't notice, he just knew to defer to the piano soloist's concerns), we would have played the concert on a very slightly out-of-tune piano, and then...

What?

Contrary to you, I think it's definitely possible there's someone in the audience who would have been able to tell the piano was slightly out of tune. But I also think more people would have unconsciously noticed the music was very very slightly worse than what it could have been. Slightly less detail in which you could notice and recognize perfection.

Maybe not so much of a loss or a gain in this circumstance, but definitely a loss. And if you compound this across all of society, if everything is 1% worse for no reason anyone can put their finger on anymore, you just have a worse world, with colors slightly dimmer, appliances slightly less ergonomic, fashion slightly less stylish, games, books, and movies slightly less meaningful.

I think in many circumstances you'll still be able to buy the high-quality thing, but it takes a while to reach economic equilibria, and it would be nice if those selecting bundles of goods to sell (like a concert) remembered that making everything 1% better adds up, before transitioning to lesser quality but cheaper goods.

I also think there's room for making AI-produced products significantly better than the human-produced ones, so that should also be kept in mind. If you can gain 1% by transitioning to AI, you should, by my same logic.

Comment by Garrett Baker (D0TheMath) on antimonyanthony's Shortform · 2024-07-29T23:03:22.626Z · LW · GW

That sounds like a good description of my understanding, but I'd also say the pre-theoretic intuitions are real damn convincing!

There's a table of contents which you can use to read relevant sections of the paper. You know your cruxes better than I do.

Comment by Garrett Baker (D0TheMath) on antimonyanthony's Shortform · 2024-07-29T22:22:24.825Z · LW · GW

The argument for this is spelled out in Eliezer and Nate's Functional Decision Theory: A New Theory of Instrumental Rationality. See also the LessWrong wiki tag page

Comment by Garrett Baker (D0TheMath) on The case for stopping AI safety research · 2024-07-25T16:48:02.556Z · LW · GW

Yes, I do. I agree with Eliezer and Nate that the work MIRI was previously funding likely won't yield many useful results, but I don't think it's correct to generalize to all agent foundations everywhere. E.g. I'm bullish on natural abstractions, singular learning theory, comp mech, incomplete preferences, etc., none of which (except natural abstractions) was on Eliezer or Nate's radar to my knowledge.

In the future I'd also recommend actually arguing for the position you're trying to take, instead of citing an org you trust. You should probably trust Eliezer, Nate, and MIRI far less than you do if you're unable to argue for their position without reference to the org itself. In this circumstance I can see where MIRI is coming from, so it's no problem on my end. But if I didn't know where MIRI was coming from, I would be pretty annoyed. I also expect my comment here won't change your mind too much, since you probably have a different idea of where MIRI is coming from, and your crux may not be any object-level point, but the meta-level point about how good Eliezer & Nate's ability to judge research directions is, which determines how much you defer to them & MIRI.

Comment by Garrett Baker (D0TheMath) on The case for stopping AI safety research · 2024-07-25T15:56:50.510Z · LW · GW

I don’t see how that’s relevant to my comment.

Comment by Garrett Baker (D0TheMath) on Will quantum randomness affect the 2028 election? · 2024-07-20T14:33:07.359Z · LW · GW

If everything about the two elections were deterministic except for where that shot landed, and Trump otherwise wouldn’t have died: while alive, due to his large influence over the Republican party & its constituents, he would very likely influence who the Republicans run in 2028 (as he does who they run in many congressional elections), and this would be predictable by Laplace’s demon.

Comment by Garrett Baker (D0TheMath) on Alexander Gietelink Oldenziel's Shortform · 2024-07-16T19:59:31.732Z · LW · GW

I think I'd agree with everything you say (or at least know what you're looking at as you say it) except for the importance of decision theory. What work are you watching there?

Comment by Garrett Baker (D0TheMath) on Principles of Privacy for Alignment Research · 2024-07-15T17:28:30.679Z · LW · GW

Coming back to this 2 years later, and I'm curious about how you've changed your mind.

Comment by Garrett Baker (D0TheMath) on Will quantum randomness affect the 2028 election? · 2024-07-14T17:05:01.715Z · LW · GW

In this very particular case, since chaotic variation of the winds seems likely to be affected by QM, I think we can confidently say yes. From Metaculus:

@Jgalt I did some research. The reporting is that the shooter was likely 150 yards away (so 137 meters), and the wind speed in Butler, PA during the rally was ~2-3 m/s. Apparently at a range of 400 m and 1 m/s wind, bullets deflect by ~4 inches. So Trump's survival could have come down to simply the wind being favorable. Very very close call.
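
For what it's worth, here's the back-of-envelope scaling I take that estimate to be doing (a sketch; how drift scales with range is my own assumption, not something stated on Metaculus, so both a linear and a quadratic scaling are shown as rough bounds):

```python
# Scale the quoted figure (~4 inches of drift at 400 m in a 1 m/s crosswind)
# to the reported conditions (137 m, 2-3 m/s wind).
BASE_DRIFT_IN, BASE_RANGE_M, BASE_WIND_MS = 4.0, 400.0, 1.0
range_m = 137.0

for wind_ms in (2.0, 3.0):
    linear = BASE_DRIFT_IN * (range_m / BASE_RANGE_M) * (wind_ms / BASE_WIND_MS)
    quadratic = BASE_DRIFT_IN * (range_m / BASE_RANGE_M) ** 2 * (wind_ms / BASE_WIND_MS)
    print(f"{wind_ms:.0f} m/s wind: roughly {quadratic:.1f} to {linear:.1f} inches of drift")
# Either way the drift is on the order of an inch to a few inches, which is
# plausibly the difference between a graze and a hit at that range.
```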

Comment by Garrett Baker (D0TheMath) on New page: Integrity · 2024-07-12T18:24:59.688Z · LW · GW

Seems reasonable to include the information in Neel Nanda's recent shortform under the Anthropic non-disparagement section.

Comment by Garrett Baker (D0TheMath) on shortplav · 2024-07-11T03:32:16.121Z · LW · GW

Frustratingly, I got deepseek-coder-v2 to reveal it exactly once, but I didn't save my results and couldn't replicate it (and it mostly refuses requests now).

This is open source, right? Why not just feed in the string, see how much probability the model assigns to it (from the logits), and compare with similarly long but randomly generated strings?
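
A minimal sketch of what I mean, assuming the weights are available as a HuggingFace causal-LM checkpoint (the repo id and the candidate string below are placeholders, not the actual ones):

```python
import random
import string

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: substitute whichever deepseek-coder-v2 checkpoint you have.
MODEL = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tok = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL, trust_remote_code=True).eval()

def total_logprob(text: str) -> float:
    """Total log-probability the model assigns to `text`."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
    return -loss.item() * (ids.shape[1] - 1)

candidate = "THE-SUSPECTED-STRING-GOES-HERE"  # placeholder
baselines = [
    "".join(random.choices(string.ascii_uppercase + string.digits + "-", k=len(candidate)))
    for _ in range(20)
]

scores = sorted(total_logprob(b) for b in baselines)
print("candidate:", total_logprob(candidate))
print("random baselines (min, max):", scores[0], scores[-1])
# If the candidate scores far above every random baseline, that's decent
# evidence the string really is baked into the weights.
```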

Comment by Garrett Baker (D0TheMath) on plex's Shortform · 2024-07-09T17:39:41.657Z · LW · GW

The vast majority of my losses are on things that don't resolve soon

The interest rate on Manifold makes such investments not worth it anyway, even if everyone else had reasonable positions relative to you.

Comment by Garrett Baker (D0TheMath) on What and Why: Developmental Interpretability of Reinforcement Learning · 2024-07-09T15:52:57.807Z · LW · GW

This seems like it requires solving a very non-trivial problem of operationalizing values the right way. Developmental interpretability seems like it's very far from being there, and as stated doesn't seem to be addressing that problem directly.

I think we can gain useful information about the development of values even without a full & complete understanding of what values are. For example, by studying lookahead, selection criteria between different lookahead nodes, contextually activated heuristics / independently activating motivational heuristics, policy coherence, agents-and-devices-style utility-fitting (noting the criticisms), your own AI objective detection (& derivatives thereof), and so on.

The solution to not knowing what you're measuring isn't to give up hope, it's to measure lots of things!

Alternatively, of course, you could think harder about how to actually measure what you want to measure. I know this is your strategy when it comes to value detection. And I don't plan on doing zero of that. But I think there's useful work to be done without those insights, and would like my theories to be guided more by experiment (and vice versa).

RLHF can be seen as optimizing for achieving goals in the world, not just in the sense in the next paragraph? You're training against a reward model that could be measuring performance on some real-world task.

I mostly agree, though I don't think it changes too much. I still think the dominant effect here is on the process by which the LLM solves the task, and in my view there are many other considerations which have just as large an influence on general purpose goal solving, such as human biases, misconceptions, and conversation styles.

If you mean to say we will watch what happens as the LLM acts in the world and reward or punish it based on how much we like what it does, then this seems a very slow reward signal to me, and in that circumstance I expect most human ratings to be offloaded to other AIs (self-play), or for there to be advances in RL methods before this happens. Currently my understanding is that this is not how RLHF is done at the big labs; instead they use MTurk interactions + expert data curation (+ also self-play via RLAIF/constitutional AI).

Out of curiosity, are you lumping things like "get more data by having some kind of good curation mechanism for lots of AI outputs without necessarily doing self-play and that just works (like say, having one model curate outputs from another, or even having light human oversight on outputs)" under this as well? Not super relevant to the content, just curious whether you would count that under an RL banner and subject to similar dynamics, since that's my main guess for overcoming the data wall.

This sounds like a generalization of decision transformers to me (i.e. condition on the best of the best outputs, then train on those), and I also include those as prototypical examples in my thinking, so yes.

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-07-09T14:10:56.908Z · LW · GW

And here is that post

Comment by Garrett Baker (D0TheMath) on When is a mind me? · 2024-07-08T14:43:47.158Z · LW · GW

I think I basically agree with everything here, but probably less confidently than you, such that I would have a pretty large bias against destructive whole brain emulation, with the biggest crux being how anthropics works over computations.

You say that there’s no XML tag specifying whether some object is “really me” or not, but a lighter version of that—a numerical amplitude tag specifying how “real” a computation is—is the best interpretation we have for how quantum mechanics works. Even though all parts of me in the wavefunction are continuations of the same computation of “me” I experience being some of them at a much higher rate than others. There are definitely many benign versions of this that don’t affect uploading, but I’m not confident enough yet to bet my life on the benign version being true.

Comment by Garrett Baker (D0TheMath) on What percent of the sun would a Dyson Sphere cover? · 2024-07-04T17:23:46.168Z · LW · GW

So if 2/3 of the sun's energy is getting re-radiated in the infrared, Earth would actually stay warm enough to keep its atmosphere gaseous - a little guessing gives an average surface temperature of -60 Celsius.

That is, until the Matrioshka brain gets built, in which case, assuming no efficiency gains, the radiation will drop to 44% of its original value, then 30%, then 20%, etc.
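
To spell out the arithmetic behind those percentages (a toy sketch, assuming each nested shell re-radiates the same ~2/3 of whatever flux reaches it):

```python
# If each shell of the Matrioshka brain absorbs ~1/3 of the flux reaching it
# and re-radiates the remaining ~2/3, the flux escaping past n shells is
# (2/3)**n of the sun's original output.
for n_shells in range(1, 5):
    print(f"after {n_shells} shell(s): {(2/3) ** n_shells:.0%}")
# after 1 shell(s): 67%
# after 2 shell(s): 44%
# after 3 shell(s): 30%
# after 4 shell(s): 20%
```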

Comment by Garrett Baker (D0TheMath) on What percent of the sun would a Dyson Sphere cover? · 2024-07-03T18:45:27.766Z · LW · GW

They're probably basing their calculation on the orbital design discussed in citation 34, Suffern's Some Thoughts on Dyson Spheres, whose abstract says

According to Dyson (1960), Malthusian pressures may have led extra-terrestrial civilizations to utilize significant fractions of the energy output from their stars or the total amount of matter in their planetary systems in their search for living space. This would have been achieved by constructing from a large number of independently orbiting colonies, an artificial biosphere surrounding their star. Biospheres of this nature are known as Dyson spheres. If enough matter is available to construct an optically thick Dyson sphere the result of such astroengineering activity, as far as observations from the earth are concerned, would be a point source of infra-red radiation which peaks in the 10 micron range. If not enough matter is available to completely block the stars’ light the result would be anomalous infra-red emission accompanying the visible radiation (Dyson 1960).

Bolded for your convenience. Presumably they justify that assertion somewhere in the paper.

Comment by Garrett Baker (D0TheMath) on What percent of the sun would a Dyson Sphere cover? · 2024-07-03T18:05:50.196Z · LW · GW

Armstrong & Sandberg answer many of these questions in Eternity in Six Hours:

The most realistic design for a Dyson sphere is that of a Dyson swarm ([32, 33]): a collection of independent solar captors in orbit around the sun. The design has some drawbacks, requiring careful coordination to keep the captors from colliding with each other, issues with captors occluding each other, and having difficulties capturing all the solar energy at any given time. But these are not major difficulties: there already exist reasonable orbit designs (e.g. [34]), and the captors will have large energy reserves to power any minor course corrections. The lack of perfect efficiency isn't an issue either, with 3.8×10^26 W available. And the advantages of Dyson swarms are important: they don't require strong construction, as they will not be subject to major internal forces, and can thus be made with little and conventional material.

The lightest design would be to have very large lightweight mirrors concentrating solar radiation down on focal points, where it would be transformed into useful work (and possibly beamed across space for use elsewhere). The focal point would most likely be some sort of heat engine, possibly combined with solar cells (to extract work from the low entropy solar radiation).

The planets provide the largest source of material for the construction of such a Dyson swarm. The easiest design would be to use Mercury as the source of material, and to construct the Dyson swarm at approximately the same distance from the sun. A sphere around the sun of radius equal to the semi-major axis of Mercury's orbit (5.79×10^10 m) would have an area of about 4.2×10^22 m^2.

Mercury itself is mainly composed of 30% silicate and 70% metal [35], mainly iron or iron oxides [36], so these would be the most used material for the swarm. The mass of Mercury is 3.3×10^23 kg; assuming 50% of this mass could be transformed into reflective surfaces (with the remaining material made into heat engines/solar cells or simply discarded), and that these would be placed in orbit at around the semi-major axis of Mercury's orbit, the reflective pieces would have a mass of about 1.65×10^23 kg.

Iron has a density of 7874 kg/m^3, so this would correspond to a thickness of 0.5 mm, which is ample. The most likely structure is a very thin film (of order 0.001 mm) supported by a network of more rigid struts.

They go on to estimate how long it'd take to construct, but the punchline is 31 years and 85 days.
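
As a sanity check on those figures, here's the thickness calculation redone with the values as reconstructed above (a sketch; the paper's own numbers may differ slightly in the later decimal places):

```python
import math

# Values as quoted/reconstructed above.
mercury_semi_major_axis_m = 5.79e10  # Mercury's semi-major axis
mercury_mass_kg = 3.3e23             # mass of Mercury
reflective_fraction = 0.5            # 50% of Mercury turned into reflective surface
iron_density_kg_m3 = 7874

swarm_area_m2 = 4 * math.pi * mercury_semi_major_axis_m ** 2  # sphere at that radius
reflective_mass_kg = reflective_fraction * mercury_mass_kg
thickness_m = reflective_mass_kg / (iron_density_kg_m3 * swarm_area_m2)

print(f"area ~ {swarm_area_m2:.2e} m^2, thickness ~ {thickness_m * 1e3:.2f} mm")
# area ~ 4.21e+22 m^2, thickness ~ 0.50 mm (matches the quoted 0.5 mm)
```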

Comment by Garrett Baker (D0TheMath) on Andrew Burns's Shortform · 2024-06-26T19:38:58.280Z · LW · GW

Are you willing to bet on any of these predictions?

Comment by Garrett Baker (D0TheMath) on Andrew Burns's Shortform · 2024-06-26T19:36:25.053Z · LW · GW

Papers like the one involving elimination of matrix-multiplication suggest that there is no need for warehouses full of GPUs to train advanced AI systems. Sudden collapse of Nvidia. (60%)

I assume you're shorting Nvidia then, right?

Comment by Garrett Baker (D0TheMath) on Andrew Burns's Shortform · 2024-06-26T19:34:22.982Z · LW · GW

Advanced inexpensive Chinese personal robots will overwhelm the western markets, destroying current western robotics industry in the same way that the West's small kitchen appliance industry was utterly crushed. (70%) Data from these robots will make its way to CCP (90%, given the first statement is true)

By what time period are you imagining this happening?

Comment by Garrett Baker (D0TheMath) on Andrew Burns's Shortform · 2024-06-26T19:32:08.683Z · LW · GW

What does "atop" mean here? Ranked in top 3 or top 20 or what?

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-06-18T18:26:35.830Z · LW · GW

My latest & greatest project proposal, in case people want to know what I'm doing, or want to give me money. There will likely be a LessWrong post up soon where I explain my thoughts in a more friendly way.

Over the next year I propose to study the development and determination of values in RL & supervised learning agents, and to expand the experimental methods & theory of singular learning theory (a theory of supervised learning) to the reinforcement learning case.

All arguments for why we should expect AI to result in an existential risk rely on AIs having values which are different from ours. If we could make a good empirically & mathematically grounded theory of the development of values during training, we could create a training story which we could have high confidence would result in an inner-aligned AI. I also find it likely that reinforcement learning (as a significant component of training AIs) makes a comeback in some fashion, and such a world is much more worrying than one where we just continue with our almost entirely supervised-learning training regime.

However, previous work in this area is not only sparse, but either solely theoretical or solely empirical, with few attempts or plans to bridge the gap. Such a bridge is however necessary to achieve the goals in the previous paragraph with confidence.

I think I personally am suited to tackle this problem, having already been working on it for the past 6 months, and having both past experience in ML research and extensive knowledge of a wide variety of areas of applied math.

I also believe that, given my limited requests for resources, I’ll be able to make claims which apply to a wide variety of RL setups, as it has generally been the case in ML that the difference between scales is only that: scale. Along with a strong theoretical component, this will let me say when my conclusions hold and when they don't.