LessWrong 2.0 Reader
I am afraid that even asking this question would be perceived as horribly patriarchal today.
My parents' generation would probably say "cooking" and maybe a few more things, dunno.
keltan on Observations on Teaching for Four WeeksThe moment that I dread as a teacher, and that has happened to me a few times, is when the students realise that your authority is totally made up. I guess this is why we don’t teach philosophy in schools. I have never figured out a way to recover from this blunder. If anyone has any advice, I’d love to hear it.
keltan on Observations on Teaching for Four WeeksRelated to your comment about principals. I can’t speak too much about the situation, though I am close with a former principal in a mid-sized rural town. A real tricky job, and they knew that taking it on. It happened to be a town that (if I understand correctly) our government was sending a lot of refugees to. This resulted in a school where a large minority couldn’t speak English. On top of that, it’s a rural school, notorious for horrible shit. Anyway, this principal made a small slip-up and publicly apologised in a video. It then went locally viral and the state-wide news picked it up. This principal was dragged through the mud for half a year. They couldn’t go to the grocery store or walk down the street. The news just kept going. I live pretty far away from them, but people know I know them. I had people come up and give condolences because of how harsh the treatment had been in the media.
I’m not sure how much this adds to the discussion, but I hope it helps to update someone’s model. A principal is a public figure with power over a tiny domain. They are sometimes attacked in the same way as a politician, but without the defences that politicians have.
mako-yass on Cooperation is optimal, with weaker agents too - tldr: That isn't what cooperation would look like. The gazelles can reject a deal that would lead to their extinction (they have better alternatives) and impose a deal that would benefit both species.
Cooperation isn't purely submissive compliance.
algon on Designing for a single purposeAh, that makes sense! Well, it does seem to work out for some businesses, in particular East Asian business conglomerates. Let me quote from a Commoncog article on the topic of nearly every company having an equilibrium point past which further growth is difficult w/o a line of capital.
Chinese businessmen and the SME Loop
With a few notable exceptions, the vast majority of successful traditional Chinese businessmen have chosen the route of escaping the SME loop by pursuing additional — and completely different — lines of businesses. This has led to the prevalence of ‘Asian conglomerates’ — where a parent holding company owns many subsidiaries in an incredibly diverse number of industries: energy, edible oils, shipping, real estate, hospitality, telecommunications and so on. The benefit of this structure has been to subsidise new business units with the profits of other business units.
Why a majority of Chinese businessmen chose this route remains a major source of mystery for me. When I left the point-of-sale business in late 2017, I wondered what steps my boss would take to escape the SME loop. And I began to wonder if the first generation of traditional Chinese businessmen chose the route of multiple diversified businesses because it was the easiest way to escape the SME loop ... or if perhaps there was something about developing markets that caused them to expand this way.
(And if so, why are there less such conglomerates in the West? Why are these conglomerates far more common in Asia? These are interesting questions — but the answers aren’t readily available to me; not for a few decades, and not until I’ve had the experience of growing such businesses.)
Perhaps the right way to think about this is that the relentless pursuit of growth led them to expand into adjacent markets — and the markets for commodities and infrastructure was ripe for the taking in the early years of South East Asia’s development.
Here, we see that Chinese businessmen expand in order to keep up the free cash flow that funds their attempts to innovate enough to keep growing to larger scales.
nevin-wetherill on How do top AI labs vet architecture/algorithm changes?I am not an AI researcher, nor do I have direct access to any AI research processes. So, instead of submitting an answer, I am writing this in the comment section.
I have one definite, easily sharable observation. I drew a lot of inferences from it, which I will separate out so that the reader can condition their world-model on their own interpretations of whatever pieces of evidence - if any - are unshared.
That observation is this interview, in one particular segment, with the part that seems most relevant to me occurring around roughly the 40:15 timestamp.
So, in this segment Dwarkesh is asking Sholto Douglas, a researcher at Google Deepmind, a sub-question in a discussion about how researchers see the feasibility of "The Intelligence Explosion."
The intent of this question seems to be to get an object-level description of the workflow of an AI researcher, in order to inform the meta-question of "how is AI going to increase the rate of AI research."
Potentially important additional detail, the other person at that table is Trenton Bricken, a "member of Technical Staff on the Mechanistic Interpretability team at Anthropic" (description according to his website.)
Sholto makes some kind of allusion to the fact that the bulk of his work at the time of this interview does not appear directly relevant to the question, so he seems to be answering for some more generic case of AI researcher.
Sholto's description of his work excerpted from the "About" section of his blog hosted on GitHub.
I’m currently going after end to end learning for robotic manipulation because of the impact it could have on the real world, and the surface area of the problem in contact with understanding how to make agents learn and reason like we do.
I’m currently exploring whether self-supervised learning on play data and clever usage of language to align robot and human video in the same trajectory space can build models which provide a sufficient base that they can be swiftly fine-tuned to any manipulation task.
In the past, I’ve looked into hierarchical RL, energy models for planning, and seeing if we can learn a representation of visual inputs where optimal paths are by definition the shortest path through the transformed space.
In this segment of the podcast, Sholto talks about "scaling laws inference" - seemingly alluding to the fact that researchers will have some compute budget to run experiments, and there will be agreed-upon desiderata in the metrics of these experiments, which could be used when selecting features for programs that will then be given much larger training runs.
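To make the "scaling laws inference" idea concrete, here is a minimal sketch - with entirely made-up numbers, reflecting no lab's actual tooling or data - of how results from small runs can inform decisions about large ones: fit a power law to (compute, loss) pairs in log-log space and extrapolate to a bigger budget.

```python
import numpy as np

# Hypothetical (compute, final loss) pairs from three small-scale runs.
compute = np.array([1e18, 1e19, 1e20])  # FLOPs (made up)
loss = np.array([3.2, 2.6, 2.1])        # eval loss (made up)

# A power law L = a * C^b is a straight line in log-log space,
# so it can be fit with ordinary linear regression.
slope, intercept = np.polyfit(np.log10(compute), np.log10(loss), 1)

# Extrapolate to a budget 100x beyond the largest experiment.
predicted_loss = 10 ** (intercept + slope * np.log10(1e22))
```

Real scaling-law work (e.g. the Chinchilla analysis) fits richer functional forms and accounts for parameters and data jointly; this only illustrates the shape of the workflow.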
How do the researchers get this compute budget? Do all researchers have some compute resources available beyond just their personal workstation hardware? What does the process look like for spinning up a small-scale training run and reporting its results?
I am unsure, but from context will draw some guesses.
Sholto mentions, in providing further context in this segment:
A lot of good research comes from working backwards from the actual problems you want to solve.
He continues to give a few sentences that seem to gesture at a part of this internal process:
There's a couple of grand problems in making the models better that you identify as issues and then work on "how can I change things to achieve this?" When you scale you also run into a bunch of things and you want to fix behaviors and issues at scale.
This seems to imply that part of this process is receiving some 'mission' or set of 'missions' (my words, not theirs; you could say quests or tasks or assignments) - after which some group(s) of researchers propose and run small-scale tests of candidate solutions.
Does this involve taking snapshots of these models at the scale where "behaviors or issues" appear and branching them to run shorter, lower compute, continuations of training/reinforcement learning?
Presumably this list of "grand problems" may include some items like:
Possibly the "behaviors and issues" which occur "when you scale" include:
Sholto continues:
Concretely, the barrier is a little bit of software engineering, having a code base that's large and capable enough that it can support many people doing research at the same time often makes it complex. If you're doing everything by yourself, your iteration pace is going to be much faster.
Actually operating with other people raises the complexity a lot, for natural reasons familiar to every software engineer and also the inherent running. Running and launching those experiments is easy but there's inherent slowdowns induced by that. So you often want to be parallelizing multiple different streams. You can't be totally focused on one thing necessarily. You might not have fast enough feedback cycles. And then intuiting what went wrong is actually really hard.
This seems to imply that these AI labs have put their finger on the problem that doing work in large teams or titled sub-projects introduces a lot of friction. This could be Sholto's take on the ideal way to run an AI lab, informed by AI labs not actually working this way - but I presume Google Deepmind, at least, has a culture where they attempt to prevent individual researchers grumbling a lot about organizational stuff slowing down their projects. It seems to me that Sholto is right about it being much faster to do more in "parallel" - where individual researchers can work on these sub-problems without having to organize a meeting, submit paperwork, and write memos to 3 other teams to get access to relevant pieces of their work.
The trio then returns to the meta-level question, and the sections relevant to "what does AI research look like" become as diffuse as you might expect in a conversation where two of the three participants are AI researchers and the topics all orbit AI research.
One other particular quote that may be relevant to people drawing some inferences - Dwarkesh asks:
That's interesting to think about because at least the compute part is not bottlenecked on more intelligence, it's just bottlenecked on Sam's $7 trillion or whatever, right? If I gave you 10x the [TPUs] to run your experiments, how much more effective a researcher are you?
Sholto:
I think the Gemini program would probably be maybe five times faster with 10 times more compute or something like that.
Dwarkesh:
So that's pretty good. Elasticity of 0.5. Wait, that's insane.
Sholto:
I think more compute would just directly convert into progress.
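Dwarkesh's "elasticity of 0.5" reads as the simple ratio of the two multipliers quoted above (5x speedup / 10x compute). A quick check of that figure alongside the textbook log-log definition of elasticity, using only the numbers from the exchange:

```python
import math

compute_multiplier = 10  # "10 times more compute"
speedup = 5              # "maybe five times faster"

# Dwarkesh's back-of-envelope figure: ratio of the two multipliers.
ratio_elasticity = speedup / compute_multiplier  # 0.5

# The log-log elasticity: % change in output per % change in input.
log_elasticity = math.log(speedup) / math.log(compute_multiplier)  # ~0.7
```

Either way, the qualitative point stands: by Sholto's estimate, research progress responds strongly, though sublinearly, to additional compute.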
Dwarkesh goes on to ask why labs aren't reallocating some of the compute they have from running large runs/serving clients to doing experiments if this is such a massive bottleneck.
Sholto replies:
So one of the strategic decisions that every pre-training team has to make is exactly what amount of compute do you allocate to different training runs, to your research program versus scaling the last best thing that you landed on. They're all trying to arrive at an optimal point here. One of the reasons why you need to still keep training big models is that you get information there that you don't get otherwise. So scale has all these emergent properties which you want to understand better.
Remember what I said before about not being sure what's going to fall off the curve. If you keep doing research in this regime and keep on getting more and more compute efficient, you may have actually gone off the path to actually eventually scale. So you need to constantly be investing in doing big runs too, at the frontier of what you sort of expect to work.
What does this actual breakdown look like within Deepmind? Well, obviously Sholto doesn't give us details about that. If you get actual first-hand details about the allocation of compute budgets from this question, I'd be rather surprised...
Well, actually, not terribly surprised. These are modern AI labs, not Eliezer's fantasy-football AI lab from Six Dimensions Of Operational Adequacy. [LW · GW] They may just DM you with a more detailed breakdown of what stuff looks like on the inside. I doubt someone will answer publicly in a way that could be tied back to them. That would probably breach a bunch of clauses on a bunch of contracts and get them in actual serious trouble.
What do I infer from this?
Well, first, you can watch the interview and pick up the rhythm. When I've done that, I get the impression that there are some relatively independent researchers who work under the umbrella of departments which have some amount of compute budgeted to them. It seems to me likely that this compute is not budgeted as strictly as something like timeslots on orbital telescopes - such that an individual researcher can have a brilliant idea one day and just go try it using some very small fraction of their organization's compute for a short period of time. I think there is probably a threshold of experiment size above which you will have to make a strong case to those involved in compute-budgeting in order to get the compute-time for experiments of that scale.
Does that level of friction with compute available to individual researchers account for the "0.5 elasticity" that Sholto was talking about? I'm not sure. Plausibly there is no "do whatever you want with this" compute-budget for individual researchers beyond what they have plugged into their individual work-stations. This would surprise me, I think? That seems like a dumb decision when you take the picture Sholto was sketching about how progress gets made at face-value. Still, it seems to me like a characteristic dumb decision of large organizations - where they try really hard to have any resource expenditures accounted for ahead of time, such that intangibles like "ability to just go try stuff" get squashed by considerations like "are we utilizing all of our resources with maximum efficiency?"
Hopefully this interview and my analysis are helpful in answering this question. I could probably discuss more, but I've noticed this comment is already rather long, and my brain is telling me that further writing will likely just be meandering and hand-waving.
If more content relevant to this discussion can be mined from this interview, perhaps others will be able to iterate on my attempt and help flesh out the parts that seem easy to update our models on.
itay-dreyfus on Designing for a single purposeOh, I meant for the bloated approach as for the reason why it didn't work out.
algon on Designing for a single purposeUh, Brian did cut out a great deal of fat from Airbnb, and the company clearly survived its brush with death due to COVID-19. So I don't see why you'd say it didn't work.
itay-dreyfus on Designing for a single purposeI certainly see this pattern in late-stage startups, and it seems like for Airbnb it didn't work.
Maybe there's an in-between path. I wonder how Dropbox could have evolved if it had remained more loyal to its original root.
itay-dreyfus on Designing for a single purposeThat's interesting, thanks.
Reminds me of small giants, which is a very similar concept: https://museapp.com/podcast/24-small-giants/