What and Why: Developmental Interpretability of Reinforcement Learning 2024-07-09T14:09:40.649Z
On Complexity Science 2024-04-05T02:24:32.039Z
So You Created a Sociopath - New Book Announcement! 2024-04-01T18:02:18.010Z
Announcing Suffering For Good 2024-04-01T17:08:12.322Z
Neuroscience and Alignment 2024-03-18T21:09:52.004Z
Epoch wise critical periods, and singular learning theory 2023-12-14T20:55:32.508Z
A bet on critical periods in neural networks 2023-11-06T23:21:17.279Z
When and why should you use the Kelly criterion? 2023-11-05T23:26:38.952Z
Singular learning theory and bridging from ML to brain emulations 2023-11-01T21:31:54.789Z
My hopes for alignment: Singular learning theory and whole brain emulation 2023-10-25T18:31:14.407Z
AI presidents discuss AI alignment agendas 2023-09-09T18:55:37.931Z
Activation additions in a small residual network 2023-05-22T20:28:41.264Z
Collective Identity 2023-05-18T09:00:24.410Z
Activation additions in a simple MNIST network 2023-05-18T02:49:44.734Z
Value drift threat models 2023-05-12T23:03:22.295Z
What constraints does deep learning place on alignment plans? 2023-05-03T20:40:16.007Z
Pessimistic Shard Theory 2023-01-25T00:59:33.863Z
Performing an SVD on a time-series matrix of gradient updates on an MNIST network produces 92.5 singular values 2022-12-21T00:44:55.373Z
Don't design agents which exploit adversarial inputs 2022-11-18T01:48:38.372Z
A framework and open questions for game theoretic shard modeling 2022-10-21T21:40:49.887Z
Taking the parameters which seem to matter and rotating them until they don't 2022-08-26T18:26:47.667Z
How (not) to choose a research project 2022-08-09T00:26:37.045Z
Information theoretic model analysis may not lend much insight, but we may have been doing them wrong! 2022-07-24T00:42:14.076Z
Modelling Deception 2022-07-18T21:21:32.246Z
Another argument that you will let the AI out of the box 2022-04-19T21:54:38.810Z
[cross-post with EA Forum] The EA Forum Podcast is up and running 2021-07-05T21:52:18.787Z
Information on time-complexity prior? 2021-01-08T06:09:03.462Z
D0TheMath's Shortform 2020-10-09T02:47:30.056Z
Why does "deep abstraction" lose it's usefulness in the far past and future? 2020-07-09T07:12:44.523Z


Comment by Garrett Baker (D0TheMath) on shortplav · 2024-07-11T03:32:16.121Z · LW · GW

Frustratingly, I got deepseek-coder-v2 to reveal it exactly once, but I didn't save my results and couldn't replicate it (and it mostly refuses requests now).

This is open source, right? Why not just feed in the string, see how likely the model's logits say it is, and compare with similarly long but randomly generated strings?
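A minimal sketch of that comparison, assuming you have some function that returns the model's total log-probability for a string. Everything here is hypothetical scaffolding: `make_toy_scorer` is a stand-in for a real model (it is not a normalized distribution), and `empirical_p_value` just asks what fraction of equally long random strings score at least as well as the candidate.

```python
import math
import random
from typing import Callable

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def make_toy_scorer(memorized: str, alphabet: str = ALPHABET) -> Callable[[str], float]:
    """Stand-in for a real model's log-likelihood (hypothetical, not normalized).

    Characters that continue the memorized string get high probability;
    everything else gets the uniform probability over the alphabet.
    """
    uniform = math.log(1.0 / len(alphabet))

    def score(text: str) -> float:
        total = 0.0
        for i, ch in enumerate(text):
            on_track = i < len(memorized) and text[:i] == memorized[:i]
            total += math.log(0.9) if on_track and ch == memorized[i] else uniform
        return total

    return score


def empirical_p_value(candidate: str, score: Callable[[str], float],
                      n_baselines: int = 200, alphabet: str = ALPHABET,
                      seed: int = 0) -> float:
    """Fraction of same-length random strings scoring at least as high as the candidate."""
    rng = random.Random(seed)
    candidate_score = score(candidate)
    at_least_as_likely = 0
    for _ in range(n_baselines):
        baseline = "".join(rng.choice(alphabet) for _ in candidate)
        if score(baseline) >= candidate_score:
            at_least_as_likely += 1
    return at_least_as_likely / n_baselines


toy_score = make_toy_scorer("canary7x9q2")
p_memorized = empirical_p_value("canary7x9q2", toy_score)   # suspiciously likely string
p_unrelated = empirical_p_value("zzzzzzzzzzz", toy_score)   # ordinary string
```

With a real open-weights model, `score` would sum the per-token log-probabilities the model assigns to the string; a value near zero for the candidate and not for ordinary strings would be the interesting signal.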

Comment by Garrett Baker (D0TheMath) on plex's Shortform · 2024-07-09T17:39:41.657Z · LW · GW

The vast majority of my losses are on things that don't resolve soon

The interest rate on Manifold makes such investments not worth it anyway, even if everyone held positions that seemed reasonable to you.

Comment by Garrett Baker (D0TheMath) on What and Why: Developmental Interpretability of Reinforcement Learning · 2024-07-09T15:52:57.807Z · LW · GW

This seems like it requires solving a very non-trivial problem of operationalizing values the right way. Developmental interpretability seems like it's very far from being there, and as stated doesn't seem to be addressing that problem directly.

I think we can gain useful information about the development of values even without a full & complete understanding of what values are. For example by studying lookahead, selection criteria between different lookahead nodes, contextually activated heuristics / independently activating motivational heuristics, policy coherence, agents-and-devices (noting the criticisms) style utility-fitting, your own AI objective detecting (& derivatives thereof), and so on.

The solution to not knowing what you're measuring isn't to give up hope, it's to measure lots of things!

Alternatively, of course, you could think harder about how to actually measure what you want to measure. I know this is your strategy when it comes to value detection. And I don't plan on doing zero of that. But I think there's useful work to be done without those insights, and would like my theories to be guided more by experiment (and vice versa).

RLHF can be seen as optimizing for achieving goals in the world, not just in the sense in the next paragraph? You're training against a reward model that could be measuring performance on some real-world task.

I mostly agree, though I don't think it changes too much. I still think the dominant effect here is on the process by which the LLM solves the task, and in my view there are many other considerations which have just as large an influence on general purpose goal solving, such as human biases, misconceptions, and conversation styles.

If you mean to say we will watch what happens as the LLM acts in the world, then reward or punish it based on how much we like what it does, then this seems a very slow reward signal to me, and in that circumstance I expect most human ratings to be offloaded to other AIs (self-play), or for there to be advances in RL methods before this happens. Currently my understanding is this is not how RLHF is done at the big labs, and instead they use MTurk interactions + expert data curation (+ also self-play via RLAIF/constitutional AI).

Out of curiosity, are you lumping things like "get more data by having some kind of good curation mechanism for lots of AI outputs without necessarily doing self-play and that just works (like say, having one model curate outputs from another, or even having light human oversight on outputs)" under this as well? Not super relevant to the content, just curious whether you would count that under an RL banner and subject to similar dynamics, since that's my main guess for overcoming the data wall.

This sounds like a generalization of decision transformers to me (i.e. condition on the best of the best outputs, then train on those), and I also include those as prototypical examples in my thinking, so yes.
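One reading of "condition on the best of the best outputs, then train on those" is plain best-of-n curation: sample several outputs per prompt, keep the top-scoring ones, and use those as training data. The sketch below shows only that filtering step. The names `best_of_n_dataset`, `toy_generate`, and `toy_score` are hypothetical, the generator and scorer are toy stand-ins for a model and a curator, and a decision transformer proper (which conditions on a target return rather than filtering) is not shown.

```python
import itertools
from typing import Callable, List, Tuple


def best_of_n_dataset(prompts: List[str],
                      generate: Callable[[str], str],
                      score: Callable[[str, str], float],
                      n_samples: int = 8,
                      keep_top: int = 1) -> List[Tuple[str, str]]:
    """Sample n outputs per prompt and keep only the highest-scoring ones."""
    dataset: List[Tuple[str, str]] = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(n_samples)]
        ranked = sorted(candidates, key=lambda out: score(prompt, out), reverse=True)
        dataset.extend((prompt, out) for out in ranked[:keep_top])
    return dataset


# Toy stand-ins: a "model" that emits canned outputs and a "curator" that
# scores an output by its numeric value.
_canned_outputs = itertools.cycle(["3", "9", "1", "4", "7", "2", "8", "5"])


def toy_generate(prompt: str) -> str:
    return next(_canned_outputs)


def toy_score(prompt: str, output: str) -> float:
    return float(output)


curated = best_of_n_dataset(["a", "b"], toy_generate, toy_score, n_samples=4, keep_top=1)
```

The curated pairs would then be used as ordinary supervised fine-tuning data, which is why this kind of curation plausibly sits under the same RL-ish banner.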

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-07-09T14:10:56.908Z · LW · GW

And here is that post

Comment by Garrett Baker (D0TheMath) on When is a mind me? · 2024-07-08T14:43:47.158Z · LW · GW

I think I basically agree with everything here, but probably less confidently than you, such that I would have a pretty large bias against destructive whole brain emulation, with the biggest crux being how anthropics works over computations.

You say that there’s no XML tag specifying whether some object is “really me” or not, but a lighter version of that—a numerical amplitude tag specifying how “real” a computation is—is the best interpretation we have for how quantum mechanics works. Even though all parts of me in the wavefunction are continuations of the same computation of “me” I experience being some of them at a much higher rate than others. There are definitely many benign versions of this that don’t affect uploading, but I’m not confident enough yet to bet my life on the benign version being true.

Comment by Garrett Baker (D0TheMath) on What percent of the sun would a Dyson Sphere cover? · 2024-07-04T17:23:46.168Z · LW · GW

So if 2/3 of the sun's energy is getting re-radiated in the infrared, Earth would actually stay warm enough to keep its atmosphere gaseous - a little guessing gives an average surface temperature of -60 Celsius.

That is, until the Matrioshka brain gets built, in which case assuming no efficiency gains, the radiation will drop to 44% of its original, then 30%, then 20%, etc.
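The percentages above look like repeated multiplication by 2/3. A small sketch, assuming each nested shell re-radiates two thirds of what it receives (that per-shell figure is an assumption carried over from the quoted 2/3, with no efficiency gains):

```python
def reradiated_fraction(shells: int, per_shell: float = 2 / 3) -> float:
    """Fraction of the sun's output still radiated outward after `shells` nested shells."""
    return per_shell ** shells


# Percent of original radiation after 1, 2, 3, and 4 shells.
percentages = [round(100 * reradiated_fraction(n)) for n in range(1, 5)]
```

Under that assumption the sequence is 67%, 44%, 30%, 20%, which matches the figures in the comment.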

Comment by Garrett Baker (D0TheMath) on What percent of the sun would a Dyson Sphere cover? · 2024-07-03T18:45:27.766Z · LW · GW

They're probably basing their calculation on the orbital design discussed in citation 34: Suffern's Some Thoughts on Dyson Spheres, whose abstract says:

According to Dyson (1960), Malthusian pressures may have led extra-terrestrial civilizations to utilize significant fractions of the energy output from their stars or the total amount of matter in their planetary systems in their search for living space. This would have been achieved by constructing from a large number of independently orbiting colonies, an artificial biosphere surrounding their star. Biospheres of this nature are known as Dyson spheres. If enough matter is available to construct an optically thick Dyson sphere the result of such astroengineering activity, as far as observations from the earth are concerned, would be a point source of infra-red radiation which peaks in the 10 micron range. If not enough matter is available to completely block the stars’ light the result would be anomalous infra-red emission accompanying the visible radiation (Dyson 1960).

Bolded for your convenience. Presumably they justify that assertion somewhere in the paper.

Comment by Garrett Baker (D0TheMath) on What percent of the sun would a Dyson Sphere cover? · 2024-07-03T18:05:50.196Z · LW · GW

Armstrong & Sanders answer many of these questions in Eternity in Six Hours:

The most realistic design for a Dyson sphere is that of a Dyson swarm ([32, 33]): a collection of independent solar captors in orbit around the sun. The design has some drawbacks, requiring careful coordination to keep the captors from colliding with each other, issues with captors occluding each other, and having difficulties capturing all the solar energy at any given time. But these are not major difficulties: there already exist reasonable orbit designs (e.g. [34]), and the captors will have large energy reserves to power any minor course corrections. The lack of perfect efficiency isn't an issue either, with W available. And the advantages of Dyson swarms are important: they don't require strong construction, as they will not be subject to major internal forces, and can thus be made with little and conventional material.

The lightest design would be to have very large lightweight mirrors concentrating solar radiation down on focal points, where it would be transformed into useful work (and possibly beamed across space for use elsewhere). The focal point would most likely some sort of heat engine, possibly combined with solar cells (to extract work from the low entropy solar radiation).

The planets provide the largest source of material for the construction of such a Dyson swarm. The easiest design would be to use Mercury as the source of material, and to construct the Dyson swarm at approximately the same distance from the sun. A sphere around the sun of radius equal to the semi-major axis of Mercury's orbit ( m) would have an area of about m^2.

Mercury itself is mainly composed of 30% silicate and 70% metal [35], mainly iron or iron oxides [36], so these would be the most used material for the swarm. The mass of Mercury is kg; assuming 50% of this mass could be transformed into reflective surfaces (with the remaining material made into heat engines/solar cells or simply discarded), and that these would be placed in orbit at around the semi-major axis of Mercury's orbit, the reflective pieces would have a mass of:

Iron has a density of 7874 kg/m^3, so this would correspond to a thickness of 0.5 mm, which is ample. The most likely structure is a very thin film (of order 0.001 mm) supported by a network of more rigid struts.

They go on to estimate how long it'd take to construct, but the punchline is 31 years and 85 days.
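The 0.5 mm thickness in the quoted passage can be sanity-checked with a few lines of arithmetic. This is a rough sketch: the 50% mass fraction and the iron density come from the quote, while the orbital radius and the mass of Mercury are standard reference values supplied here as assumptions, since the quoted figures did not survive in the text above.

```python
import math

# Assumed reference values (not recovered from the quoted passage).
MERCURY_SEMI_MAJOR_AXIS_M = 5.79e10   # metres
MERCURY_MASS_KG = 3.30e23             # kilograms

# Values stated in the quoted passage.
USABLE_MASS_FRACTION = 0.5            # share of Mercury turned into reflectors
IRON_DENSITY_KG_M3 = 7874.0           # kg per cubic metre

# Area of a sphere at Mercury's orbital distance.
sphere_area_m2 = 4.0 * math.pi * MERCURY_SEMI_MAJOR_AXIS_M ** 2

# Reflector mass available per square metre of that sphere.
mass_per_area_kg_m2 = USABLE_MASS_FRACTION * MERCURY_MASS_KG / sphere_area_m2

# Equivalent thickness if the reflectors were a uniform sheet of iron.
thickness_mm = mass_per_area_kg_m2 / IRON_DENSITY_KG_M3 * 1000.0
```

With these inputs the sphere has an area of roughly 4.2e22 m^2, the reflectors come to a little under 4 kg per square metre, and the equivalent iron sheet is about half a millimetre thick, consistent with the quote.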

Comment by Garrett Baker (D0TheMath) on Andrew Burns's Shortform · 2024-06-26T19:38:58.280Z · LW · GW

Are you willing to bet on any of these predictions?

Comment by Garrett Baker (D0TheMath) on Andrew Burns's Shortform · 2024-06-26T19:36:25.053Z · LW · GW

Papers like the one involving elimination of matrix-multiplication suggest that there is no need for warehouses full of GPUs to train advanced AI systems. Sudden collapse of Nvidia. (60%)

I assume you're shorting Nvidia then, right?

Comment by Garrett Baker (D0TheMath) on Andrew Burns's Shortform · 2024-06-26T19:34:22.982Z · LW · GW

Advanced inexpensive Chinese personal robots will overwhelm the western markets, destroying current western robotics industry in the same way that the West's small kitchen appliance industry was utterly crushed. (70%) Data from these robots will make its way to CCP (90%, given the first statement is true)

By what time are you imagining this happening?

Comment by Garrett Baker (D0TheMath) on Andrew Burns's Shortform · 2024-06-26T19:32:08.683Z · LW · GW

What does "atop" mean here? Ranked in top 3 or top 20 or what?

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-06-18T18:26:35.830Z · LW · GW

My latest & greatest project proposal, in case people want to know what I'm doing, or give me money. There will likely be a LessWrong post up soon where I explain my thoughts in a friendlier way.

Over the next year I propose to study the development and determination of values in RL & supervised learning agents, and to expand the experimental methods & theory of singular learning theory (a theory of supervised learning) to the reinforcement learning case.

All arguments for why we should expect AI to result in an existential risk rely on AIs having values which are different from ours. If we could make a good empirically & mathematically grounded theory for the development of values during training, we could create a training story which we could have high confidence would result in an inner-aligned AI. I also find it likely reinforcement learning (as a significant component of training AIs) makes a come-back in some fashion, and such a world is much more worrying than if we just continue with our almost entirely supervised learning training regime.

However, previous work in this area is not only sparse, but either solely theoretical or solely empirical, with few attempts or plans to bridge the gap. Such a bridge is however necessary to achieve the goals in the previous paragraph with confidence.

I think I personally am suited to tackle this problem, having already worked on this project for the past 6 months, and having both past experience in ML research and extensive knowledge of a wide variety of areas of applied math.

I also believe that given my limited requests for resources, I’ll be able to make claims which apply to a wide variety of RL setups, as it has generally been the case in ML that the difference between scales is only that: scale. Along with a strong theoretical component, I will be able to say when my conclusions hold, and when they don’t.

Comment by Garrett Baker (D0TheMath) on Thomas Kwa's Shortform · 2024-06-14T07:44:09.271Z · LW · GW

Seems you’re left with outer alignment after solving this. What do you imagine doing to solve that?

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-06-12T23:59:55.457Z · LW · GW

In particular, 25% chance of nationalization by EOY 2040.

I think in fast-takeoff worlds, the USG won't be fast enough to nationalize the industry, and in slow-takeoff worlds, the USG will regulate such companies at the level of military contractors, but won't nationalize them. I mainly think this because this is the way the USG usually treats military contractors (including strict & mandatory security requirements, and gatekeeping the industry), and really it's my understanding of how it treats most projects it wants to get done which it doesn't already have infrastructure in place to complete.

Nationalization, in the US, is just very rare. 

Even during World War 2, my understanding is that very few industries, even those vital to the war effort, were nationalized. People love talking about the Manhattan Project, but that was not an industry that was nationalized; that was a research project started by & for the government. AI is a billion-dollar industry. The AGI labs (their people, leaders, and stock-holders [or in OAI's case, their profit participation unit holders]) are not just going to sit idly by as they're taken over.

And neither may the national security apparatus of the US. I don't know too much about the internal beliefs of that organization, but I'd bet they're pretty happy with the present dynamic of the US issuing contracts, and having a host of contractors bid for them.

I have a variety of responses to a variety of objections someone could have, but I don't know which are cruxy or interesting for you in particular, so I won't try addressing all of them.

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-06-12T04:21:37.455Z · LW · GW

Since it seems to be all the rage nowadays, due to Aschenbrenner's Situational Awareness, here's a Manifold market I created on when the first (or whether any) AGI company will be "nationalized".

I would be in the never camp, unless the AI safety policy people get their way. But I don't like betting in my own markets (it makes them more difficult to judge when an edge case comes up).

Comment by Garrett Baker (D0TheMath) on My AI Model Delta Compared To Yudkowsky · 2024-06-10T17:13:30.715Z · LW · GW

Consider this my vote for Vanessa Kosoy and Scott Garrabrant deltas. I don't really know what their models are. I can guess what the deltas between you and Evan Hubinger are, but that would also be interesting. All of these would be less interesting than Christiano deltas though.

Comment by Garrett Baker (D0TheMath) on My AI Model Delta Compared To Yudkowsky · 2024-06-10T17:07:29.870Z · LW · GW

Consider this my vote to turn it into a sequence, and to go on for as long as you can

This particular delta seems very short, why spend longer discussing it?

Comment by Garrett Baker (D0TheMath) on Natural Latents Are Not Robust To Tiny Mixtures · 2024-06-07T20:35:51.412Z · LW · GW

I may misunderstand (I’ve only skimmed), but it's not clear to me we want natural latents to be robust to small updates. Phase changes and bifurcation points seem like something you should expect here. I would however feel more comfortable if such points had small or infinitesimal measure.

Comment by Garrett Baker (D0TheMath) on Prometheus's Shortform · 2024-06-06T23:40:32.880Z · LW · GW

Not necessarily. If we have the option to hide information, then even if we reveal information, adversaries may still assume (likely correctly) we aren't sharing all our information, and are closer to a decisive strategic advantage than we appear. This holds even in the case where we do share all our information (which we won't).

Of course, the "more options are likely better" principle holds if the lumbering, slow, disorganized, and collectively stupid organizations which have those options somehow perform the best strategy, but they're not actually going to take the best strategy. Especially when it comes to US-China relations.


[we have great security, therefore we're sharing nothing with adversaries] is clearly not a valid inference in general.

I don't think the conclusion holds if that is true in general, and I don't think I ever assumed or argued it was true in general.

Comment by Garrett Baker (D0TheMath) on Former OpenAI Superalignment Researcher: Superintelligence by 2030 · 2024-06-06T18:23:28.954Z · LW · GW

I think this gets more tricky because of coordination. Leopold's main effect is in selling maps, not using them. If his maps list a town in a particular location, which consumers and producers both travel to expecting a town, then his map has reshaped the territory and caused a town to exist.

Pointing out one concrete dynamic here, most of his argument boils down to "we must avoid a disastrous AI arms race by racing faster than our enemies to ASI", but of course it is unclear whether an "AI arms race" would even exist if nobody were talking about an "AI arms race". That is, if everyone were just following incentives and coordinating rationally with their competitors.

There's also obviously the classic "AGI will likely end the world, thus I should invest in / work on it since if it doesn't I'll be rich, therefore AGI is more likely to end the world" self-fulfilling prophecy that has been a scourge on our field since the founding of DeepMind.

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-06-06T01:09:10.469Z · LW · GW

Is it? That’s definitely what my English teacher wanted me to believe, but since every newspaper does it, all the time (except when someone Tweets something) I don’t see how it could be against journalistic ethics.

Indeed, I think there’s a strong undercurrent in most mainstream newspapers that “the people” are not smart enough to evaluate primary sources directly, and need journalists & communicators to ensure they arrive at the correct conclusions.

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-06-05T23:13:03.398Z · LW · GW

To elaborate on @the gears to ascension's highlighted text, often Wikipedia cites newspaper articles when it makes a particular scientific, economic, historical, or other claim, instead of the relevant paper or other primary source such newspaper articles are reporting on. When I see interesting, surprising, or action-relevant claims I like checking & citing the corresponding primary source, which makes the claim easier for me to verify, often provides nuance which wasn't present in the Wikipedia or news article, and makes it more difficult for me to delude myself when talking in public (since it makes it easier for others to check the primary source, and criticize me for my simplifications or exaggerations). 

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-06-05T21:43:23.188Z · LW · GW

I do the same for the most part. The way this comes up is mostly by my attempts to verify claims Wikipedia makes.

Comment by Garrett Baker (D0TheMath) on davekasten's Shortform · 2024-06-05T18:38:10.899Z · LW · GW

I don't think they have stated they'll go to war after 2027. 2027 is the year of their "military modernization" target.

Comment by Garrett Baker (D0TheMath) on davekasten's Shortform · 2024-06-05T18:15:44.723Z · LW · GW

No, the belief is that China isn’t going to start a war before it has a modernized military, and they plan to have a modernized military by 2027. Therefore they won’t start a war before 2027.

China has also been drooling over Taiwan for the past 100 years. Thus, if you don’t think diplomatic or economic ties mean much to them, and they’ll contend with the US’s military might before 2027, and neither party will use nukes in such a conflict, then you expect a war after 2027.

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-06-05T17:55:07.540Z · LW · GW

Probably my biggest pet peeve when trying to find or verify anything on the internet nowadays is that newspapers never seem to link to or cite (in any useful manner) any primary sources they use, unless, weirdly, those primary sources come from Twitter.

There have probably been hundreds of times by now that I have seen an interesting economic or scientific claim made by The New York Times, or some other popular (or niche) newspaper, wanted to find the relevant paper, and had to spend at least 10 minutes on Google trying to search between thousands of identical newspaper articles for the one paper that actually says anything about what was actually done.

More often than not, the paper is a lot less interesting than the newspaper article is making it out to be too.

Comment by Garrett Baker (D0TheMath) on davekasten's Shortform · 2024-06-05T17:47:00.938Z · LW · GW

It is supposedly their goal for when they will have modernized their military.

Comment by Garrett Baker (D0TheMath) on Thomas Kwa's Shortform · 2024-06-05T15:14:35.516Z · LW · GW

Another circumstance where votes underestimate quality is where you often get caught up in long reply-chains (on posts that aren’t “the post of the week”). This seems like a very beneficial use of LessWrong, but it typically has much lower readership & upvote rates while using a lot of comments.

Comment by Garrett Baker (D0TheMath) on Prometheus's Shortform · 2024-06-04T21:52:32.695Z · LW · GW

I note that I am uncertain whether working on such a task would increase or decrease global stability & great power conflicts.

Comment by Garrett Baker (D0TheMath) on Just admit that you’ve zoned out · 2024-06-04T17:02:12.465Z · LW · GW

When someone says that, I always use different words anyway, since it's boring to use the same words.

Comment by Garrett Baker (D0TheMath) on in defense of Linus Pauling · 2024-06-03T22:50:13.640Z · LW · GW

I certainly wouldn't suggest trying to independently compete with the conceptual framework of, say, semiconductor physics or structural engineering, but when a field is rotten enough (nutrition, psychology, education, and economics come to mind) history indicates to me that someone smart from another field is often more correct than specialists on that topic, when they have an interest in it.

Economics seems out of place here. Why do you think it's rotten?

Comment by Garrett Baker (D0TheMath) on quila's Shortform · 2024-06-02T16:49:58.199Z · LW · GW

Historically I’ve been able to understand others’ vague ideas & use them in ways they endorse. I can’t promise I’ll read what you send me, but I am interested.

Comment by Garrett Baker (D0TheMath) on OpenAI: Helen Toner Speaks · 2024-05-31T20:44:27.099Z · LW · GW

The review’s findings rejected the idea that any kind of ai safety concern necessitated Mr Altman’s replacement. In fact, WilmerHale found that “the prior board’s decision did not arise out of concerns regarding product safety or security, the pace of development, OpenAI's finances, or its statements to investors, customers, or business partners.”

Note that Toner did not make claims regarding product safety, security, the pace of development, OAI's finances, or statements to investors (the board is not investors), customers, or business partners (the board are not business partners). She said he was not honest to the board.

Comment by Garrett Baker (D0TheMath) on A civilization ran by amateurs · 2024-05-31T08:06:15.500Z · LW · GW

Sorry for the noncentral point...

Indeed, one can already find quite high-quality educational videos from YouTube. 3Blue1Brown has received near-universal acclaim (at least in my circles), and sets a lower bound for how good videos one can make. (I also bet that, unlike for many Hollywood movies, the budget for 3Blue1Brown videos is less than $10 million per hour.)

I actually don't think 3Blue1Brown is all that great an example here. How many people, after watching his essence of calculus videos, could find a derivative or an integral of a reasonably complicated function? How many, after watching his linear algebra series, could find the eigenvectors & values of a 3x3 matrix? 3Blue1Brown seems very much like good supplementary material to me, or good as a first high-level approach to a math area.

I'd say a better example for "pedagogy done extraordinarily well at the high-school math level" is Khan Academy. At least, it was 7-8 years ago, and I expect that even if it has regressed a bit toward the mean, LLMs (which I know they've been using) are vastly improving the experience, which is a big step up over the alternative. Someone who's gone through the Khan Academy lessons on calculus or linear algebra has a far higher chance of correctly performing a derivative or an integral, or finding the eigenvectors & values of a 3x3 matrix.

ETA: At the undergraduate & beginning graduate math level, textbooks become the things which are professionalized (but not those recommending textbooks to you), and at the advanced graduate level of course there is no professionalization in pedagogy.

Comment by Garrett Baker (D0TheMath) on Non-Disparagement Canaries for OpenAI · 2024-05-30T22:20:57.375Z · LW · GW

A market on the subject:

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-05-29T06:43:09.137Z · LW · GW

Not quite, helpful video, summary:

They use a row of spinning fins mid-way through their rockets to indirectly steer missiles by creating turbulent vortices which interact with the tail-fins and add an extra oomph to the steering mechanism. The exact algorithm is classified, for obvious reasons.

Comment by Garrett Baker (D0TheMath) on D0TheMath's Shortform · 2024-05-29T05:02:55.155Z · LW · GW

There is a mystery which many applied mathematicians have asked themselves: Why is linear algebra so over-powered?

An answer I like was given in Lloyd Trefethen's book An Applied Mathematician's Apology, in which he writes (my summary):

Everything in the real world is described fully by non-linear analysis. In order to make such systems simpler, we can linearize (differentiate) them, and use a first or second order approximation, and in order to represent them on a computer, we can discretize them, which turns analytic techniques into algebraic ones. Therefore we've turned our non-linear analysis into linear algebra.
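As a small illustration of the linearization half of that argument, here is a sketch of Newton's method on a toy two-variable non-linear system. Each iteration replaces the non-linear problem with its first-order approximation and solves the resulting 2x2 linear system. The system, the starting point, and all function names are chosen here for illustration and do not come from the book; discretization is not shown.

```python
from typing import Tuple

Vec = Tuple[float, float]
Mat = Tuple[Vec, Vec]


def residual(x: float, y: float) -> Vec:
    """Non-linear system F(x, y) = 0: a circle of radius 2 and the hyperbola xy = 1."""
    return (x * x + y * y - 4.0, x * y - 1.0)


def jacobian(x: float, y: float) -> Mat:
    """Matrix of first derivatives of F: the linearization at (x, y)."""
    return ((2.0 * x, 2.0 * y), (y, x))


def solve_2x2(a: Mat, b: Vec) -> Vec:
    """Solve the linear system a @ d = b by Cramer's rule."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    if det == 0.0:
        raise ValueError("singular Jacobian")
    b1, b2 = b
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


def newton(x: float, y: float, steps: int = 20, tol: float = 1e-12) -> Vec:
    """Repeatedly linearize F and solve the linear system for the update."""
    for _ in range(steps):
        f1, f2 = residual(x, y)
        if abs(f1) + abs(f2) < tol:
            break
        dx, dy = solve_2x2(jacobian(x, y), (-f1, -f2))
        x, y = x + dx, y + dy
    return x, y


root = newton(2.0, 0.5)
```

The only genuinely non-linear work is evaluating `residual` and `jacobian`; everything else is a linear solve, which is the sense in which the non-linear problem has been turned into linear algebra.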

Comment by Garrett Baker (D0TheMath) on Open Thread Spring 2024 · 2024-05-29T02:52:52.966Z · LW · GW

Relevant market

Comment by Garrett Baker (D0TheMath) on Alexander Gietelink Oldenziel's Shortform · 2024-05-27T18:13:05.697Z · LW · GW

Do you think the final big advance happens within or with-out labs?

Comment by Garrett Baker (D0TheMath) on Episode: Austin vs Linch on OpenAI · 2024-05-27T06:04:45.863Z · LW · GW

I don't know the exact article that convinced me, but I bet this summary of the history of economic thought on the subject is a good place to start, which I have skimmed, and seems to cover the main points with citations.

Comment by Garrett Baker (D0TheMath) on Episode: Austin vs Linch on OpenAI · 2024-05-26T22:18:51.104Z · LW · GW

Interesting lens! Though I'm not sure if this is fair -- the largest things that are done tend to get done through governments, whether those things are good or bad. If you blame catastrophes like Mao's famine or Hitler's genocide on governments, you should also credit things like slavery abolition and vaccination and general decline of violence in civilized society to governments too.

I do mostly[1] credit such things to governments, but the argument is about whether companies or governments are more liable to take on very large tail risks. Not about whether governments are generally good or bad. It may be that governments just like starting larger projects than corporations. But in that case, I think the claim that a greater percentage of those end in catastrophe than similarly large projects started by corporations still looks good.

  1. I definitely don't credit slavery abolition to governments, at least in America, since that industry was largely made possible in the first place by governments subsidizing the cost of chasing down runaway slaves. I'd guess general decline of violence is more attributable to generally increasing affluence, which has a range of factors associated with it, than government intervention so directly. But I'm largely ignorant on that particular subject. The "mostly" here means "I acknowledge governments do some good things". ↩︎

Comment by Garrett Baker (D0TheMath) on Episode: Austin vs Linch on OpenAI · 2024-05-26T03:04:22.163Z · LW · GW

I will push back on "democratic", in the sense I think Linch is using the term, actually being all that good a property for cosmically important orgs. See Bryan Caplan's The Myth of the Rational Voter, and the literature around social-desirability bias, for reasons why, which I'm sure Linch is familiar with, but I notice are not mentioned.

I also claim that most catastrophes through both recent and long-ago history have been caused by governments, not just in the trivial sense, but also if we normalize by amount of stuff done. A good example everyone right now should be familiar with is qualified immunity, and the effects it has on irresponsible policing. The fact is we usually hold our companies to much higher standards than our governments (or do we just have more control over the incentives of our companies than our governments?). It is also strange that the example Linch gives for a bad company is Blackwater, which while bad, is... about par-for-the-course when it comes to CIA projects.

I note too the America-centric bias with all of these examples & comparisons. Maybe the American government is just too incompetent compared to others, and we should instead embed the project within France or Norway.

There's a general narrative that basic research is best done in government/academia, but is this true? The academia end seems possibly true in the 20th century: most discoveries were made by academics. But there was also a significant contribution by folks at research labs started by monopolies of the period (most notably Bell Laboratories). Though this seems like the kind of thing which could turn out to be false going forward, as our universities become more bloated, and we kill off our monopolies. But in either case, I don't know why Linch thinks quality basic research will be done by the government. People like bringing up the Apollo program & Manhattan Project, but both of those were quality projects due to their applied research, not their basic research, which was all laid down ahead of time. I'm not saying it doesn't happen, but does anyone have good case studies? CERN comes to mind, but of course for projects that just require governments to throw massive amounts of money at a problem, government does well. AGI is plausibly like this, but alignment is not (though more money would be nice).

Government also tends to go slow, which I think is the strongest argument in favor of doing AGI inside a government. But also, man I don't trust government to implement an alignment solution if such a solution is invented during the intervening time. I'm imagining trying to convince a stick-in-the-ass bureaucrat fancying himself a scientist philosopher, whose only contribution to the project was politicking at a few important senators to thereby get himself enough authority to stand in the way of anyone changing anything about the project, and who thinks he knows the solution to alignment, that he is in fact wrong, and that he should use so-and-so proven strategy, or such-and-such ensemble approach instead. Maybe a cynical picture, but one I'd bet resonates with those working to improve government processes.

I'd be interested to hear how Austin has updated regarding Sam's trustworthiness over the past few days.

Comment by Garrett Baker (D0TheMath) on Episode: Austin vs Linch on OpenAI · 2024-05-26T02:05:17.427Z · LW · GW

The second half deals with more timeless considerations, like whether OpenAI should be embedded in a larger organization which doesn't have its main reason for existence being creating AGI, like a large company or a government.

Comment by Garrett Baker (D0TheMath) on Talent Needs of Technical AI Safety Teams · 2024-05-25T21:38:51.817Z · LW · GW

I don't know of any clear progress on your interests yet. My argument was about the trajectory MI is on, which I think is largely pointed in the right direction. We can argue about the speed at which it gets to the hard problems, whether it's fast enough, and how to make it faster though. So you seem to have understood me well.

A core motivating intuition behind the MI program is (I think) "the stuff is all there, perfectly accessible programmatically, we just have to learn to read it". This intuition is deeply flawed: Koan: divining alien datastructures from RAM activations

I think I'm more agnostic than you are about this, and also about how "deeply" flawed MI's intuitions are. If you're right, once the field progresses to nontrivial dynamics, we should expect those operating at a higher level of analysis--conceptual MI--to discover more than those operating at a lower level, right?

Comment by Garrett Baker (D0TheMath) on Talent Needs of Technical AI Safety Teams · 2024-05-25T21:15:00.470Z · LW · GW

I have, and I also remember seeing Adam’s original retrospective, but I always found it unsatisfying. Thanks anyway!

Comment by Garrett Baker (D0TheMath) on Open Thread Spring 2024 · 2024-05-25T18:31:02.150Z · LW · GW

My recommendation would be to get an LTFF, Manifund, or Survival and Flourishing Fund grant to work on the research, then if it seems to be going well, try getting into MATS, or move to Berkeley & work in an office with other independent researchers like FAR for a while, and use either of those situations to find co-founders for an org that you can scale to a greater number of people.

Alternatively, you can call up your smart & trustworthy college friends to help start your org.

I do think there's just not that much experience or skill around these parts with setting up highly effective & scalable organizations, so what help can be provided won't be that helpful. In terms of resources for how to do that, I'd recommend Y Combinator's How to Start a Startup lecture recordings, and I've been recommended the book Traction: Get a Grip on Your Business.

It should also be noted that if you do want to build a large org in this space, once you get to the large org phase, OpenPhil has historically been less happy to fund you (unless you're also making AGI[1]).

  1. This is not me being salty; the obvious response to "OpenPhil has historically not been happy to fund orgs trying to grow to larger numbers of employees" is "but what about OpenAI or Anthropic?", which I think are qualitatively different than, say, Apollo. ↩︎

Comment by Garrett Baker (D0TheMath) on Talent Needs of Technical AI Safety Teams · 2024-05-25T16:05:46.494Z · LW · GW

I'd say mechanistic interpretability is trending toward a field which cares about & researches the problems you mention. For example, the doppelganger problem is a fairly standard criticism of the sparse autoencoder work; diasystemic novelty seems the kind of thing you'd encounter when doing developmental interpretability, interp-through-time, or inductive biases research, especially with a focus on phase changes (a growing focus area); and though I'm having a hard time parsing your creativity post (an indictment of me, not of you, as I didn't spend too long with it), it seems the kind of thing which would come from the study of in-context learning, a goal that I believe mainstream MI has, even if it doesn't focus on it now (likely because it believes it's unable to at this moment), and which I think it will care more about as the power of such in-context learning becomes more and more apparent.

ETA: An argument could be that though these problems will come up, ultimately the field will prioritize hacky fixes in order to deal with them, which only sweep the problems under the rug. I think many in MI will prioritize such limited fixes, but also that some won't, and due to the benefits of such problems becoming empirical, such people will be able to prove the value of their theoretical work & methodology by convincing MI people with their practical applications, and money will get diverted to such theoretical work & methodology by DL-theory-traumatized grantmakers.

Comment by Garrett Baker (D0TheMath) on Daniel Kokotajlo's Shortform · 2024-05-25T01:48:17.331Z · LW · GW

Advertisements are often very overt so that users don't get suspicious of your product, so I imagine you get GPT-Cola, which believes it's a nice, refreshing, cold, bubbling bottle of Coca-Cola, and loves, between & within paragraphs actually answering your question, to talk about how tasty & sweet Coca-Cola is, and how for a limited time only, you can buy specialty GPT-4 Coke bottles with GPT-Cola Q&As written on the front.

Comment by Garrett Baker (D0TheMath) on Talent Needs of Technical AI Safety Teams · 2024-05-25T01:34:40.297Z · LW · GW

Report back if you get details, I'm curious.