Posts

Constituency-sized AI congress? 2024-02-09T16:01:09.592Z
Gunpowder as metaphor for AI 2023-12-28T04:31:40.663Z
Digital humans vs merge with AI? Same or different? 2023-12-06T04:56:38.261Z
Desiderata for an AI 2023-07-19T16:18:08.299Z
An attempt to steelman OpenAI's alignment plan 2023-07-13T18:25:47.036Z
Two paths to win the AGI transition 2023-07-06T21:59:23.150Z
Nice intro video to RSI 2023-05-16T18:48:29.995Z
Will GPT-5 be able to self-improve? 2023-04-29T17:34:48.028Z
Can GPT-4 play 20 questions against another instance of itself? 2023-03-28T01:11:46.601Z
Feature idea: extra info about post author's response to comments. 2023-03-23T20:14:19.105Z
linkpost: neuro-symbolic hybrid ai 2022-10-06T21:52:53.095Z
linkpost: loss basin visualization 2022-09-30T03:42:34.582Z
Progress Report 7: making GPT go hurrdurr instead of brrrrrrr 2022-09-07T03:28:36.060Z
Timelines ARE relevant to alignment research (timelines 2 of ?) 2022-08-24T00:19:27.422Z
Please (re)explain your personal jargon 2022-08-22T14:30:46.774Z
Timelines explanation post part 1 of ? 2022-08-12T16:13:38.368Z
A little playing around with Blenderbot3 2022-08-12T16:06:42.088Z
Nathan Helm-Burger's Shortform 2022-07-14T18:42:49.125Z
Progress Report 6: get the tool working 2022-06-10T11:18:37.151Z
How to balance between process and outcome? 2022-05-04T19:34:10.989Z
Progress Report 5: tying it together 2022-04-23T21:07:03.142Z
What more compute does for brain-like models: response to Rohin 2022-04-13T03:40:34.031Z
Progress Report 4: logit lens redux 2022-04-08T18:35:42.474Z
Progress report 3: clustering transformer neurons 2022-04-05T23:13:18.289Z
Progress Report 2 2022-03-30T02:29:32.670Z
Progress Report 1: interpretability experiments & learning, testing compression hypotheses 2022-03-22T20:12:04.284Z
Neural net / decision tree hybrids: a potential path toward bridging the interpretability gap 2021-09-23T00:38:40.912Z

Comments

Comment by Nathan Helm-Burger (nathan-helm-burger) on Open Thread Spring 2024 · 2024-04-24T17:08:27.824Z · LW · GW

I've been using a remineralization toothpaste imported from Japan for several years now, ever since I mentioned reading about remineralization to a dentist from Japan. She recommended the brand to me. The FDA is apparently bogging down its release in the US, but it's available on Amazon anyway. It seems to have slowed, but not stopped, the formation of cavities. It does seem to result in faster plaque build-up around my gumline, as if the bacterial colonies are accumulating some of the minerals not absorbed by the teeth. The brand I use is Apagard.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Vector Planning in a Lattice Graph · 2024-04-24T16:58:22.909Z · LW · GW

Thanks, faul-sname. I came to the comments to give a much lower-effort answer along the same lines, but yours is better. My answer: lazy local evaluation of the nodes surrounding either your current position or the position of the goal. So long as you can estimate a direction from yourself to the goal, there's no need to embed the whole graph. This is basically gradient descent...
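Something like this minimal sketch is what I have in mind, assuming an integer lattice where neighbors differ by ±1 in one coordinate (the setup and names are mine, purely illustrative):

```python
def step_toward(current, goal):
    """Greedily pick the neighboring lattice node closest to the goal.

    Only the 2 * len(current) immediate neighbors are ever evaluated,
    so the full graph never needs to be embedded or stored.
    """
    neighbors = []
    for i in range(len(current)):
        for delta in (-1, 1):
            n = list(current)
            n[i] += delta
            neighbors.append(tuple(n))
    return min(neighbors, key=lambda n: sum((a - b) ** 2 for a, b in zip(n, goal)))


def navigate(start, goal, max_steps=10_000):
    pos = start
    for _ in range(max_steps):
        if pos == goal:
            break
        pos = step_toward(pos, goal)
    return pos


print(navigate((0, 0, 0), (3, -2, 5)))  # -> (3, -2, 5)
```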

Comment by Nathan Helm-Burger (nathan-helm-burger) on 1-page outline of Carlsmith's otherness and control series · 2024-04-24T16:42:11.699Z · LW · GW

Personally, I most enjoyed the first one in the series, and enjoyed listening to Joe's reading of it even more than just reading it myself. My top three are 1, 6, and 7.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Scenario planning for AI x-risk · 2024-04-24T04:46:21.912Z · LW · GW

I generally agree; I just have some specific evidence which I believe should adjust the report's estimates toward expecting more accessible algorithmic improvements than some people seem to think.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Cooperation is optimal, with weaker agents too  -  tldr · 2024-04-23T22:39:13.316Z · LW · GW

"What Dragons?", says the lion, "I see no Dragons, only a big empty universe. I am the most mighty thing here."

Whether or not the Imagined Dragons are real isn't relevant to the gazelles if there is no solid evidence with which to convince the lions. The lions will do what they will do. Maybe some of the lions do decide to believe in the Dragons, but there is no way to force all of them to do so. The remainder will laugh at the dragon-fearing lions and feast on extra gazelles. Their children will reproduce faster.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Language and Capabilities: Testing LLM Mathematical Abilities Across Languages · 2024-04-23T22:32:38.370Z · LW · GW

Something else I've tried that you could play around with: you can force the models to handle each digit separately by putting a space between the digits of a number, like "3 2 3 * 4 3 7 = ".
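A quick sketch of the kind of preprocessing I mean (the function is just illustrative):

```python
import re

def space_digits(text: str) -> str:
    """Insert spaces between the digits of every number in the prompt."""
    return re.sub(r"\d+", lambda m: " ".join(m.group()), text)

print(space_digits("323 * 437 = "))  # -> "3 2 3 * 4 3 7 = "
```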

Comment by Nathan Helm-Burger (nathan-helm-burger) on Red teaming: challenges and research directions · 2024-04-23T22:00:01.853Z · LW · GW

As part of a team of experts building private biorisk evals for AI, and doing private red-teaming experiments, I appreciate this post.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Scenario planning for AI x-risk · 2024-04-23T21:46:07.474Z · LW · GW

The interesting thing to me about the question "Will we need a new paradigm for AGI?" is that a lot of people seem to be focused on it, but I think it misses a nearby important question.

As we get closer to complete AGI and start to get more capable programming and research-assistant AIs, will those make algorithmic exploration cheaper and easier, such that we see a sort of 'Cambrian explosion' of model architectures that work well for specific purposes? And will one of these perhaps work better at general learning than anything we've found so far, and end up being the architecture that first reaches full transformative AGI?

The point I'm generally trying to make is that estimates of software/algorithmic progress are based on the progress being made (currently) mostly by human minds. The closer we get to generally competent artificial minds, the less we should expect past patterns based on human inputs to hold.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Fluent dreaming for language models (AI interpretability method) · 2024-04-23T16:20:54.519Z · LW · GW

Very cool. I bet janus would dig this.

Comment by Nathan Helm-Burger (nathan-helm-burger) on CHAT Diplomacy: LLMs and National Security · 2024-04-23T05:37:48.918Z · LW · GW

I think this may have somewhat overstated the risks that were present at the time, but the forecasting is still relevant for the upcoming future. AI is continuing to improve. At some point, people will be able to make agents that can do a lot of harm. Given the risks, we can't rely on compute governance with the level of confidence we would need to be comfortable with it as a solution.

An example of recent work showing the potential for compute governance to fail: https://arxiv.org/abs/2403.10616v1 

Comment by Nathan Helm-Burger (nathan-helm-burger) on How to Model the Future of Open-Source LLMs? · 2024-04-23T05:26:07.417Z · LW · GW

Unless there is a 'peak-capabilities wall' that current architectures hit and that the combined compute-efficiency-improving algorithmic improvements don't overcome. In that case, the gap would close, because any big company that tried to get ahead by just naively increasing compute, armed with only a few hidden algorithmic advantages, would be unable to get very far ahead because of the 'peak-capabilities wall'. It would get cheaper to reach the wall, but once there, extra money/compute/data would be wasted. Thus, a shrinking-gap world.

I'm not sure if there will be a 'peak-capabilities wall' in this way, or if the algorithmic advancements will be creative enough to get around it. The shape of the future in this regard seems highly uncertain to me. I do think it's theoretically possible to get substantial improvements in peak capabilities and also in training/inference efficiencies. Will such improvements keep arriving relatively gradually as they have been? Will there be a sudden glut at some point when the models hit a threshold where they can be used to seek and find algorithmic improvements? Very unclear.

Comment by Nathan Helm-Burger (nathan-helm-burger) on What good is G-factor if you're dumped in the woods? A field report from a camp counselor. · 2024-04-23T05:10:11.652Z · LW · GW

My childhood was quite different, in that I was quite kind-hearted, honest, and generally obedient to the letter of the law... but I was constantly getting into trouble in elementary school. I just kept coming up with new interesting things to do that they hadn't made an explicit rule against yet. Once they caught me doing the new thing, they told me never to do it again and made a new rule. So then I came up with a new interesting thing to try.

How about tying several jump ropes together to make a longer rope, tying a lasso on one end, lassoing an exhaust pipe on the roof of the one-story flat-roofed school, and then climb-walking up the wall onto the roof of the school? Oh? That's against the rules now? Ok. 

How about digging a tunnel under a piece of playground equipment which had an extended portion touching the ground, and then having fun crawling through the ~4ft long tunnel? No tunneling anymore? Ok.

How about taking off my shoes and socks and shinnying up the basketball hoop support pole? No? Ok.

Finding interesting uses for various plant materials collected from the borders of the playground... banned one after another.

An endless stream of such things.

My principal was more amused and exasperated with me than mad.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2024-04-20T05:21:33.063Z · LW · GW

I mean, that is kinda what I'm trying to get at. I feel like any sufficiently powerful AI should be treated as a dangerous tool, like a gun. It should be used carefully and deliberately.

Instead we're just letting anyone do whatever with them. For now, nothing too bad has happened, but I feel confident that the danger is real and getting worse quickly as models improve.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-19T17:13:41.053Z · LW · GW

Personally, I like mentally splitting the space into AI safety (emphasis on measurement and control), AI alignment (getting it to align to the operators' purposes and actually do what the operators desire), and AI value-alignment (getting the AI to understand and care about what people need and want). It feels like a Venn diagram with a lot of overlap, yet some distinct non-overlapping spaces.

By my framing, Redwood Research and METR are more centrally AI safety. ARC's / Paul's research agenda is more of a mix of AI safety and AI alignment. MIRI's work to fundamentally understand and shape agents is a mix of AI alignment and AI value-alignment. Obviously, success there would have the downstream effect of robustly improving AI safety (reducing the need for careful evals and control), but it is a more difficult approach in general with less immediate applicability. I think we need all of these things!

Comment by Nathan Helm-Burger (nathan-helm-burger) on Cooperation is optimal, with weaker agents too  -  tldr · 2024-04-18T19:08:47.703Z · LW · GW

As the last gazelle dies, how much comfort does it take in the idea that some vengeful alien may someday punish the lions for their cruelty? Regardless of whether it is comforted or not by this idea, it still dies.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Cooperation is optimal, with weaker agents too  -  tldr · 2024-04-18T18:25:52.139Z · LW · GW

"Cooperation is optimal," said the lion to the gazelles, "sooner or later I will get one of you. If I must give chase, we will all waste calories. Instead, sacrifice your least popular member to me, and many calories will be saved."

The gazelles didn't like it, but they eventually agreed. The lion population boomed, as the well-fed and unchallenged lions flourished. Soon the gazelles were pushed to extinction, and most of the lions starved because the wildebeest were not so compliant.

Anyway, I'm being silly with my story. The point I'm making is that only certain subsets of possible world states with certain power distributions are cooperation-optimal. And unfortunately I don't think our current world state, or any that I foresee as probable, is cooperation-optimal for ALL humans. And if you allow for the creation of non-human agents, then the fraction of actors for whom cooperation-with-humans is optimal could drop off very quickly. AIs can have value systems far different from ours, and affordances for actions we don't have, and this changes the strategic payoffs in an unfavorable way.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2024-04-18T18:12:20.089Z · LW · GW

I feel like I'd like the different categories of AI risk attenuation to be referred to as more clearly separate:

AI usability safety - would this gun be safe for a trained professional to use on a shooting range? Will it be reasonably accurate and not explode or backfire?

AI world-impact safety - would it be safe to give out one of these guns for $0.10 to anyone who wanted one?

AI weird complicated usability safety - would this gun be safe to use if a crazy person tried to use a hundred of them plus a variety of other guns, to make an elaborate Rube Goldberg machine and fire it off with live ammo with no testing?

Comment by Nathan Helm-Burger (nathan-helm-burger) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-17T17:46:12.020Z · LW · GW

Yeah, I'd say that in general the US government's attempts to regulate cryptography have been a bungled mess which helped the situation very little, if at all. I have the whole mess mentally categorized as an example of how we really want AI regulation NOT to be handled.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-16T20:13:06.564Z · LW · GW

Whoohoo!

Comment by Nathan Helm-Burger (nathan-helm-burger) on Creating unrestricted AI Agents with Command R+ · 2024-04-16T20:05:20.211Z · LW · GW

My guess is that further scaffolding work (e.g. improved enforcement of thinking out a step-by-step plan, and then executing the steps of the plan in order) would unlock a lot more capabilities from current open-weight models. I also expect that this is but a hint of things to come, and that by this time next year we'll see evidence that then-current models can accomplish a wide range of complex multi-step tasks.
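For concreteness, here's a rough sketch of the plan-then-execute scaffolding pattern I have in mind; call_model is a stand-in for whatever API or open-weight model you're driving, not a real library call:

```python
def call_model(prompt: str) -> str:
    """Stand-in for an actual LLM call (API or local open-weight model)."""
    raise NotImplementedError("plug in your model here")


def plan_then_execute(task: str) -> list[str]:
    # First pass: force the model to lay out an explicit, ordered plan.
    plan = call_model(
        f"Task: {task}\nWrite a numbered step-by-step plan. One step per line."
    )
    steps = [line.strip() for line in plan.splitlines() if line.strip()]

    # Second pass: execute the steps strictly in order, feeding prior results back in.
    results: list[str] = []
    for i, step in enumerate(steps, start=1):
        results.append(call_model(
            f"Task: {task}\nPlan:\n{plan}\n"
            "Results so far:\n" + "\n".join(results) +
            f"\nNow carry out step {i}: {step}"
        ))
    return results
```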

Comment by Nathan Helm-Burger (nathan-helm-burger) on What convincing warning shot could help prevent extinction from AI? · 2024-04-13T21:34:46.842Z · LW · GW

I'm hopeful that a sufficiently convincing demo could persuade politicians/military brass/wealthy powerful people/the public. Probably different demos could be designed to be persuasive to these different audiences. Ideally, the demos could be designed early, and you could get buy-in from the target audience that if the described demo were successful, then they would agree that "something needed to be done". Even better would be concrete commitments, but I think there's value even without that. Also, being as prepared as possible to act on a range of plausible natural warning shots seems good: getting similar pre-negotiated agreements that if X did happen, it should be considered a tipping point for taking action.

Comment by Nathan Helm-Burger (nathan-helm-burger) on simeon_c's Shortform · 2024-04-10T20:32:14.854Z · LW · GW

Something which concerns me is that transformative AI will likely be a powerful destabilizing force, which will place countries currently behind in AI development (e.g. Russia and China) in a difficult position. Their governments are currently in the position of seeing that peacefully adhering to the status quo may lead to rapid disempowerment, and that the potential for coercive action to interfere with disempowerment is high. It is pretty clearly easier and cheaper to destroy chip fabs than create them, easier to kill tech employees with potent engineering skills than to train new ones.

I agree that conditions of war make safe transitions to AGI harder, make people more likely to accept higher risk. I don't see what to do about the fact that the development of AI power is itself presenting pressures towards war. This seems bad. I don't know what I can do to make the situation better though.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Matthew Barnett's Shortform · 2024-04-05T18:10:27.825Z · LW · GW

I'm confused here, Matthew. It seems to me highly probable that AI systems which want takeover are pretty hard to distinguish, early on, from ones that want moderate power combined with peaceful coexistence with humanity. And early on is when it's most important for humanity to distinguish between them, before those systems have gained power and while we can still stop them.

Picture a merciless, un-aging sociopath, capable of duplicating itself easily and rapidly, on a trajectory of gaining economic, political, and military power with the aim of acquiring as much power as possible. Imagine that this entity has the option of making empty promises and highly persuasive lies to humans in order to gain power, with no intention of fulfilling any of those promises once it achieves enough power.

That seems like a scary possibility to me. And I don't know how I'd trust an agent which seemed like it could be this, but was making really nice-sounding promises. Even if it was honoring its short-term promises while still under the constraints of coercive power from currently dominant human institutions, I still wouldn't trust that it would continue keeping its promises once it had the dominant power.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Thomas Kwa's Shortform · 2024-04-05T17:54:20.121Z · LW · GW

Sounds like you use bad air purifiers, or too few, or run them on too low of a setting. I live in a wildfire prone area, and always keep a close eye on the PM2.5 reports for outside air, as well as my indoor air monitor. My air filters do a great job of keeping the air pollution down inside, and doing something like opening a door gives a noticeable brief spike in the PM2.5.

Good results require: fresh filters, somewhat more than the recommended number of air filters per unit of area, running the air filters on max speed (low speeds tend to be disproportionately less effective, giving unintuitively low performance).

Comment by Nathan Helm-Burger (nathan-helm-burger) on Wei Dai's Shortform · 2024-04-05T17:35:09.573Z · LW · GW

From talking with people who serve on a lot of grant committees at the NIH and similar funding orgs, it's really hard to do proper blinding of reviews. Certain labs tend to focus on particular theories and methods, repeating variations of the same idea... So if you are familiar with the general approach of a particular lab and its principal investigator, you will immediately recognize and have a knee-jerk reaction (positive or negative) to a paper which pattern-matches to the work that that lab / subfield is doing.

Common reactions from grant reviewers:

Positive - "This fits in nicely with my friend Bob's work. I respect his work, I should argue for funding this grant."

Neutral - "This seems entirely novel to me, I don't recognize it as connecting with any of the leading trendy ideas in the field or any of my personal favorite subtopics. Therefore, this seems high risk and I shouldn't argue too hard for it."

Slightly negative - "This seems novel to me, and doesn't sound particularly 'jargon-y' or technically sophisticated. Even if the results would be beneficial to humanity, the methods seem boring and uncreative. I will argue slightly against funding this."

Negative - "This seems to pattern match to a subfield I feel biased against. Even if this isn't from one of Jill's students, it fits with Jill's take on this subtopic. I don't want views like Jill's gaining more traction. I will argue against this regardless of the quality of the logic and preliminary data presented in this grant proposal."

Comment by Nathan Helm-Burger (nathan-helm-burger) on Wei Dai's Shortform · 2024-04-05T17:24:27.242Z · LW · GW

From my years in academia studying neuroscience and related aspects of bioengineering and medicine development... yeah. So much about how effort gets allocated is not 'what would be good for our country's population in expectation, or good for all humanity'. It's mostly 'what would make an impressive-sounding research paper that could get into an esteemed journal?', 'what would be relatively cheap and easy to do, but sound disproportionately cool?', 'what do we guess the granting agency we are applying to will like the sound of?'. So much emphasis on catching waves of trendiness, and so little on estimating the expected value of the results.

Research an unprofitable preventative-health treatment which plausibly might have significant impacts on a wide segment of the population? Booooring.

Research an impractically-expensive-to-produce fascinatingly complex clever new treatment for an incredibly rare orphan disease? Awesome.

Comment by Nathan Helm-Burger (nathan-helm-burger) on How might we align transformative AI if it’s developed very soon? · 2024-04-05T02:54:12.325Z · LW · GW

One point I’ve seen raised by people in the latter group is along the lines of: “It’s very unlikely that we’ll be in a situation where we’re forced to build AI systems vastly more capable than their supervisors. Even if we have a very fast takeoff - say, going from being unable to create human-level AI systems to being able to create very superhuman systems ~overnight - there will probably still be some way to create systems that are only slightly more powerful than our current trusted systems and/or humans; to use these to supervise and align systems slightly more powerful than them; etc. (For example, we could take a very powerful, general algorithm and simply run it on a relatively low amount of compute in order to get a system that isn’t too powerful.)” This seems like a plausible argument that we’re unlikely to be stuck with a large gap between AI systems’ capabilities and their supervisors’ capabilities; I’m not currently clear on what the counter-argument is.

 

I agree that this is a very promising advantage for Team Safety. I do think that, in order to make good use of this potential advantage, the AI creators need to be cautious going into the process. 

One way that I've come up with to 'turn down' the power of an AI system is to simply inject small amounts of noise into its activations. 
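As a minimal sketch of that kind of intervention (in PyTorch; the hook placement and noise scale are arbitrary illustrative choices, not a tested recipe):

```python
import torch

def add_activation_noise(model: torch.nn.Module, noise_std: float = 0.01):
    """Register forward hooks that add small Gaussian noise to each leaf module's output.

    Larger noise_std should degrade the model's capabilities more; 0.01 is just a
    starting point to sweep from. Returns the hook handles so the intervention can
    be undone later with handle.remove().
    """
    handles = []
    for module in model.modules():
        if len(list(module.children())) == 0:  # leaf modules only
            def hook(mod, inputs, output, std=noise_std):
                if isinstance(output, torch.Tensor):
                    return output + std * torch.randn_like(output)
                return output  # leave tuple/dict outputs untouched in this sketch
            handles.append(module.register_forward_hook(hook))
    return handles
```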

Comment by Nathan Helm-Burger (nathan-helm-burger) on Richard Ngo's Shortform · 2024-04-03T20:10:01.996Z · LW · GW

Tangentially related (spoilers for Worth the Candle):

I think it'd be hard to do a better cohesive depiction of Utopia than the end of Worth the Candle by A Wales. I mean, I hope someone does do it, I just think it'll be challenging to do!

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2024-04-03T20:03:16.827Z · LW · GW

https://youtu.be/Xd5PLYl4Q5Q?si=EQ7A0oOV78z7StX2

A cute demo of Claude, GPT-4, and Gemini building stuff in Minecraft.

Comment by Nathan Helm-Burger (nathan-helm-burger) on New paper on aligning AI with human values · 2024-03-31T17:01:18.398Z · LW · GW

Still reading the paper, but so far I love it. This feels like a big step forward in thinking about the issues at hand which addresses so many of the concerns I had about limitations of previous works. Whether or not the proposed technical solution works out as well as hoped, I feel confident that your framing of the problem and presentation of desiderata of a solution are really excellent. I think that alone is a big step forward for the frontier of thought on this subject.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Modern Transformers are AGI, and Human-Level · 2024-03-30T20:12:20.927Z · LW · GW

I think my comment is related to yours: https://www.lesswrong.com/posts/gP8tvspKG79RqACTn/modern-transformers-are-agi-and-human-level?commentId=RcmFf5qRAkTA4dmDo

Also see Leogao's comment and my response to it: https://www.lesswrong.com/posts/gP8tvspKG79RqACTn/modern-transformers-are-agi-and-human-level?commentId=YzM6cSonELpjZ38ET

Comment by Nathan Helm-Burger (nathan-helm-burger) on Modern Transformers are AGI, and Human-Level · 2024-03-30T20:07:52.214Z · LW · GW

I think my comment (link https://www.lesswrong.com/posts/gP8tvspKG79RqACTn/modern-transformers-are-agi-and-human-level?commentId=RcmFf5qRAkTA4dmDo ) relates to yours. I think there is a tool/process/ability missing that I'd call mastery-of-novel-domain. I also think there's a missing ability of "integrating known facts to come up with novel conclusions pointed at by multiple facts". Unsure what to call this. Maybe knowledge-integration or worldview-consolidation?

Comment by Nathan Helm-Burger (nathan-helm-burger) on Modern Transformers are AGI, and Human-Level · 2024-03-30T20:01:54.797Z · LW · GW

I think METR is aiming for expert-level tasks, but I think their current task set is closer in difficulty to GAIA and VisualWebArena than to what I would consider human-expert-level difficulty. It's tricky to decide, though, since LLMs circa 2024 seem really good at some stuff that is quite hard for humans, and bad at a set of stuff that is easy for humans. If the stuff they are currently bad at gets brought up to human level, without a decrease in skill at the stuff LLMs are above-human at, the result would be a system well into the superhuman range. So where we draw the line for human level necessarily involves a tricky value-weighting problem over the various skills involved.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Towards White Box Deep Learning · 2024-03-29T19:29:53.873Z · LW · GW

In addition to translation (which I do think is a useful problem for theoretical experiments), I would recommend question answering as something which gets at 'thoughts' rather than distractors like 'linguistic style'. I don't think multiple choice question answering is all that great a measure for some things, but it is a cleaner measure of the correctness of the underlying thoughts.

 I agree that abstracting away from things like choice of grammar/punctuation or which synonym to use is important to keeping the research question clean.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Plausibility of cyborgism for protecting boundaries? · 2024-03-27T20:40:43.445Z · LW · GW

I agree that this seems like a grouping of concepts around 'defensive empowerment' which feels like it gets at a useful way to think about reality. However, I don't know offhand of research groups with this general focus on the subject. I think mostly people focusing on any of these subareas have focused just on their specific specialty (e.g. cyberdefense or biological defense), or an even more specific subarea than that. 

I think one of the risks here is that a general agent able to help with this wide a set of things would almost certainly be capable of a lot of scary dual-use capabilities. That adds complications to how to pursue the general subject in a safe and beneficial way.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Modern Transformers are AGI, and Human-Level · 2024-03-27T20:25:48.846Z · LW · GW

I feel quite confident that all the leading AI labs are already thinking and talking internally about this stuff, and that what we are saying here adds approximately nothing to their conversations. So I don't think it matters whether we discuss this or not. That simply isn't a lever of control we have over the world.

There are potentially secret things people might know which shouldn't be divulged, but I doubt this conversation is anywhere near technical enough to be advancing the frontier in any way.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Modern Transformers are AGI, and Human-Level · 2024-03-27T20:21:37.187Z · LW · GW

I agree with Steve Byrnes here. I think I have a better way to describe this.
I would say that the missing piece is 'mastery'. Specifically, learning mastery over a piece of reality. By mastery I am referring to the skillful ability to model, predict, and purposefully manipulate that subset of reality.
I don't think this is an algorithmic limitation, exactly.


Look at the work DeepMind has been doing, particularly with Gato and more recently AutoRT, SARA-RT, RT-Trajectory, UniSim, and Q-Transformer. Look at the work being done with the help of Nvidia's new Robot Simulation Gym Environment. Look at OpenAI's recent foray into robotics with Figure AI. This work is held back from being highly impactful (so far) by the difficulty of accurately simulating novel interesting things, the difficulty of learning the pairing of action -> consequence compared to learning a static pattern of data, and the hardware difficulties of robotics.

This is what I think our current multimodal frontier models are mostly lacking. They can regurgitate, and to a lesser extent synthesize, facts that humans wrote about, but not develop novel mastery of subjects and then report back on their findings. This is the difference between being able to write a good scientific paper given a dataset of experimental results and rough description of the experiment, versus being able to gather that data yourself. The line here is blurry, and will probably get blurrier before collapsing entirely. It's about not just doing the experiment, but doing the pilot studies and observations and playing around with the parameters to build a crude initial model about how this particular piece of the universe might work. Building your own new models rather than absorbing models built by others. Moving beyond student to scientist.

This is in large part a limitation of training expense. It's difficult to have enough on-topic information available in parallel to feed the data-inefficient current algorithms many lifetimes' worth of experience.


So, while it is possible to improve the skill of mastery-of-reality by scaling up current models and training systems, it gets much, much easier if the algorithms get more compute-efficient and data-sample-efficient to train.

That is what I think is coming.

I've done my own in-depth research into the state of the field of machine learning and potential novel algorithmic advances which have not yet been incorporated into frontier models, and in-depth research into the state of neuroscience's understanding of the brain. I have written a report detailing the ways in which I think Joe Carlsmith's and Ajeya Cotra's estimates are overestimating the AGI-relevant compute of the human brain by somewhere between 10x and 100x.

Furthermore, I think that there are compelling arguments for why the compute in frontier algorithms is not being deployed as efficiently as it could be, resulting in higher training costs and data requirements than is theoretically possible.

In combination, these findings lead me to believe we are primarily algorithm-constrained not hardware or data constrained. Which, in turn, means that once frontier models have progressed to the point of being able to automate research for improved algorithms I expect that substantial progress will follow. This progress will, if I am correct, be untethered to further increases in compute hardware or training data.

My best guess is that a frontier model of the approximate expected capability of GPT-5 or GPT-6 (equivalently Claude 4 or 5, or similar advances in Gemini) will be sufficient for the automation of algorithmic exploration to an extent that the necessary algorithmic breakthroughs will be made. I don't expect the search process to take more than a year. So I think we should expect a time of algorithmic discovery in the next 2 - 3 years which leads to a strong increase in AGI capabilities even holding compute and data constant. 

I expect that 'mastery of novel pieces of reality' will continue to lag behind ability to regurgitate and recombine recorded knowledge. Indeed, recombining information clearly seems to be lagging behind regurgitation or creative extrapolation. Not as far behind as mastery, so in some middle range. 


If you imagine the whole skillset remaining in its relative configuration of peaks and valleys, but shifted upwards such that the currently lagging 'mastery' skill is at human level and a lot of other skills are well beyond, then you will be picturing something similar to what I am picturing.

[Edit: 

This is what I mean when I say it isn't a limit of the algorithm per se. Change the framing of the data, and you change the distribution of the outputs.

 

]

Comment by Nathan Helm-Burger (nathan-helm-burger) on Timelines to Transformative AI: an investigation · 2024-03-27T18:36:14.244Z · LW · GW

I've said this elsewhere, but I think it bears repeating. I've done my own in-depth research into the state of the field of machine learning and potential novel algorithmic advances which have not yet been incorporated into frontier models, and in-depth research into the state of neuroscience's understanding of the brain. I have written a report detailing the ways in which I think Joe Carlsmith's and Ajeya Cotra's estimates are overestimating the AGI-relevant compute of the human brain by somewhere between 10x and 100x.

Furthermore, I think that there are compelling arguments for why the compute in frontier algorithms is not being deployed as efficiently as it could be, resulting in higher training costs and data requirements than is theoretically possible.

In combination, these findings lead me to believe we are primarily algorithm-constrained not hardware or data constrained. Which, in turn, means that once frontier models have progressed to the point of being able to automate research for improved algorithms I expect that substantial progress will follow. This progress will, if I am correct, be untethered to further increases in compute hardware or training data.

My best guess is that a frontier model of the approximate expected capability of GPT-5 or GPT-6 (equivalently Claude 4 or 5, or similar advances in Gemini) will be sufficient for the automation of algorithmic exploration to an extent that the necessary algorithmic breakthroughs will be made. I don't expect the search process to take more than a year. So I think we should expect a time of algorithmic discovery in the next 2 - 3 years which leads to a strong increase in AGI capabilities even holding compute and data constant.

I feel very uncertain what the full implications of that will be, or how fast things will proceed after that point. I do think it would be reasonable, if this situation does come to pass, to approach such novel unprecedentedly powerful AI systems with great caution.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Richard Ngo's Shortform · 2024-03-23T00:51:12.257Z · LW · GW

I've been studying and thinking about the physical side of this phenomenon in neuroscience recently. There are groups of columns of neurons in the cortex that form temporary voting blocs regarding whatever subject that particular Brodmann area focuses on. These alternating groups have to deal with physical limits on how many groups a region can stably divide into, which limits the number of distinct active hypotheses or 'traders' there can be in a given area at a given time. It's unclear exactly what the max is, and it depends on the cortical region in question, but generally 6-9 is the approximate max (not coincidentally, the number of distinct 'chunks' we can hold in active short-term memory). Also, there is a tendency for noise to cause traders/hypotheses/firing-groups that are too similar to fall back into synchrony/agreement with each other, and thus collapse back down to a baseline of two competing hypotheses. These hypotheses/firing-groups/traders are pushed into existence, or pushed into merging, not just by their own 'bids' but also by the evidence coming in from other brain areas or senses. I don't think current-day neuroscience has all the details yet (although I certainly don't have the full picture of all relevant papers in neuroscience!).

Comment by Nathan Helm-Burger (nathan-helm-burger) on The Perceptron Controversy · 2024-03-21T02:51:57.854Z · LW · GW

For those interested in more details, I recommend this video: 

Comment by Nathan Helm-Burger (nathan-helm-burger) on 3. Uploading · 2024-03-20T18:34:31.559Z · LW · GW

I think you make some good points here... except there is one path I think you didn't explore enough.

What if humanity is really stuck on AI alignment, uploading has become a possibility, and making a rogue AGI agent is a possibility? If these things are being held back by fallible human enforcement, it might then seem that humanity is in a very precarious predicament.

A possible way forward using an uploaded human, then, could be to go down the path of editing, monitoring, and controlling it. Neuroscience knows a lot about how the brain works. Given that starting point, and the ability to do experiments in a lab where you have full read/write access to a human brain emulation, I expect you could get something far more aligned than you could with a relatively unknown artificial neural net.

Is that a weird and immoral idea? Yes. It's pretty dystopian to be enslaving and experimenting on a human(ish) mind. If it meant the survival of humanity because we were in very dire straits... I'd bite that bullet.

Comment by Nathan Helm-Burger (nathan-helm-burger) on AI #55: Keep Clauding Along · 2024-03-17T03:23:20.879Z · LW · GW

Yes, I personally think that things are going to be moving much too fast for GDP to be a useful measure. GDP requires some sort of integration into the economy. My experience in data science and ML engineering in industry, and also my time in academia, has made very intuitive to me the lag time from developing something cool in the lab, to actually managing to publish a paper about it, to people in industry seeing the paper and deciding to reimplement it in production. So if you have a lab which is testing its products internally, and the output is an improved product within that lab, which can then immediately be used for another cycle of improvement... that is clearly going to move much faster than any effect you will see on GDP. So GDP might help you measure the slow early start of a slow takeoff, but it will be useless in the fast end section.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Controlling AGI Risk · 2024-03-15T18:10:36.328Z · LW · GW

I like your effort to think holistically about the sociotechnical systems we are embedded in and the impacts we should expect AI to have on those systems.

I have a couple of minor critiques of the way you are breaking things down that I think could be improved.

First, a meta point: there's a general pattern of being a bit too black-and-white in describing very complicated sets of things. This is nice because it makes it easier to reason about complicated situations, but it risks oversimplifying and leading to seemingly strong conclusions which don't actually follow from the true reality. The devil is in the details, as they say.

Efforts to-date have largely gravitated into the two camps of value alignment and governance.

I don't think this fully describes the set of camps. I think that these are two of the camps, yes, but there are others.

My breakdown would be:

Governance - Using regulation to set up patterns of behavior where AI will be used and developed in safe rather than harmful ways. Forcing companies to internalize their externalities (e.g. risks to society). Preventing human-misuse-of-AI scenarios and enforcing against them. Attempting to regulate novel technologies which arise because of accelerated R&D as a result of AI. Setting up preventative measures to detect and halt rogue AI, or human-misused AI, in the act of doing bad things before the worst consequences can come to pass. Preventing acceleration spirals of recursive self-improvement from proceeding so rapidly that humanity becomes intellectually eclipsed and loses control over its destiny.

Value alignment - Getting the AIs to behave as much as possible in accordance with the values of humanity generally. Getting the AI to be moral / ethical / cautious about harming people or making irreversible changes with potentially large negative consequences. Ideally, if an AI were 'given free rein' to act in the world, we'd want it to act in ways which were win-win for itself and humanity, and no matter what to err on the side of not harming humanity.

Operator alignment - Technical methods to get the AI to be obedient to the instructions of the operators. To make the AI behave in accordance with the intuitive spirit of their instructions ('do what I mean') rather than like an evil genie which follows only the letter of the law. Making the AI safe and intuitive to use. Avoiding unintended negative consequences.

Control - Finding ways to ensure the operators of an AI can maintain control over the AIs they create, even if a given AI gets made wrong such that it tries to behave in harmful, undesirable ways (out of alignment with its operators). This involves things like technical methods of sandboxing new AIs and thoroughly safety-testing them within the sandbox before deploying them. Once deployed, it involves making sure you retain the ability to shut them off if something goes wrong, and making sure the model's weights don't get exfiltrated by outside actors or by the model itself. It also means having good cybersecurity, employee screening, and internal infosec practices so that hackers/spies can't steal your model weights, design docs, and code.

 

A minor nitpick:

Sociotechnical system/s (STS)
A system in which agents (traditionally, people) interact with objects (including technologies) to achieve aims and fulfill purposes

Not sure if objects is the right word here, or rather, not sure if that word alone is sufficient. Maybe objects and information/ideas/concepts? Much of the work I've been doing recently is observing what potential risks might arise from AI systems capable of rapidly integrating technical information from a large set of sources. This is not exactly making new discoveries, but just putting disparate pieces of information together in such a way as to create a novel recipe for technology. In general, this is a wonderful ability. In the specific case of weapons of mass destruction, it's a dangerous ability.

 

Nested STS / Human STS

Yes, historically, all STS have been human STS. But novel AI agents could, in addition to integrating into and interacting with human STS, form their own entirely independent STS. A sufficiently powerful amoral AGI would see human STS as useless if it could make its own that served its needs better. Such a scenario would likely turn out quite badly for humans. This is the concept of "the AI doesn't hate you, it's just that humans and their ecosphere are made of atoms that the AI has preferred uses for."

This doesn't contradict your ideas, just suggests an expansion of possible avenues of risk which should be guarded against. Self-replicating AI systems in outer space or burrowed into the crust of the earth, or roaming the ocean seabeds will likely be quite dangerous to humanity sooner or later even if they have no interaction with our STS in the short term.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Toward a Broader Conception of Adverse Selection · 2024-03-15T17:31:44.988Z · LW · GW

Yeah, my explanation is that the author is confused and has put together a set of examples which don't cleave reality at the joints. Thus, many of the examples just don't hang together as making the same point.

See TeaTieAndHat's comment here, and my response: https://www.lesswrong.com/posts/vyAZyYh3qsqcJwwPn/conditional-on-getting-to-trade-your-trade-might-not-have?commentId=pfiZEQ8GmdRJje4x3 

Comment by Nathan Helm-Burger (nathan-helm-burger) on Toward a Broader Conception of Adverse Selection · 2024-03-15T17:29:53.933Z · LW · GW

Yes, thank you! I totally agree, and was looking for others who did too. I think this post is confusing several different sorts of failures due to inadequate information leading to poor trades. I think the most valuable point is that win-win trades (where you can see what both parties stand to gain) are inherently more trustworthy than win-lose trades which seem 'too good to be true', where you can't identify what the other party gains. In such situations you need to be really confident you have a strong information advantage in order to believe you are actually getting an implausibly good deal which disadvantages the other party.

 

Also I say the same here: https://www.lesswrong.com/posts/vyAZyYh3qsqcJwwPn/conditional-on-getting-to-trade-your-trade-might-not-have?commentId=bA34GydjFBaXs9rZa 

Comment by Nathan Helm-Burger (nathan-helm-burger) on Toward a Broader Conception of Adverse Selection · 2024-03-15T17:24:06.563Z · LW · GW

Yes! The real moral of this story is that trades which seem like win-win are a better bet, since you understand what the other party is gaining from them. Trades which seem too-good-to-be-true and purely win-lose in your favor should strike you as suspicious. You should only engage in such trades when you are confident you have an information advantage.

I think this important point is obscured by a number of bad examples lumping in other, less related phenomena that have different 'solutions'. The unifying theme might be, 'make sure you have enough information to determine that the trade is good before going through with it.' I still think that there are different patterns here that deserve to be categorized separately.

Comment by Nathan Helm-Burger (nathan-helm-burger) on AI #55: Keep Clauding Along · 2024-03-14T18:45:04.244Z · LW · GW

My current approximate understanding of fast, medium, slow takeoff times as roughly conceptualized by the AI alignment people arguing about them is:

Fast - a few hours up to a week. Maybe as long as a month.

Medium - a month to about a year and a half

Slow - a year to about 5 years

I personally place most of my probability mass on medium, but I don't feel like I can rule out either fast or slow.

This is a tricky thing to define, because by some definitions we are already in the 5-year countdown of a slow takeoff. Important to note is that even during a slow takeoff, the pace of development is expected to accelerate throughout the window, such that the end of the period will contain a disproportionately large amount of the progress. Also worth noting is that there is often a delay before progress is accurately measured, and a further delay before it is reported to the public. So looking at public reports will only ever give you a lagged perception of what is happening. This lag varies, but is often multiple months, which means that even a medium takeoff could potentially complete before the public has realized it has started.

The difficulty of defining the start of 'true takeoff' may mean that even afterwards some people might say, "this was a slow takeoff that had the expected fast bit right at the end" and others might say, "the true takeoff was just that fast bit right at the end, and that bit was short, so the takeoff was fast."

Comment by Nathan Helm-Burger (nathan-helm-burger) on What could a policy banning AGI look like? · 2024-03-13T17:07:32.723Z · LW · GW

I agree that tool AI + humans can create a lot of large-magnitude harms. I think it's probably still quite a bit less bad than directly having a high-intelligence, faster-than-human, self-duplicating, anti-human AGI on the loose. The trouble, though, is that with sufficient available compute, sufficient broad scientific knowledge about the brain and learning algorithms, and sufficiently powerful tool AI... it becomes trivially fast and easy for a single well-resourced human to make the unwise, irreversible decision to create and unleash a powerful unaligned AGI.

If anyone on Earth had the option to anonymously purchase a nuclear bomb for $10k at any time, I don't expect a law against owning or using nuclear weapons would prevent all use of nuclear weapons. Sometimes people do bad things.

Comment by Nathan Helm-Burger (nathan-helm-burger) on A T-o-M test: 'popcorn' or 'chocolate' · 2024-03-12T23:21:16.659Z · LW · GW

I think there's a misunderstanding. You are supposed to ask the model for its probability estimate, not give your own probability estimate. The Brier score loss is based on the question-answerer's probabilities over possible answers, not the question-grader's probabilities.
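For concreteness, a minimal sketch of the scoring I mean (names are mine):

```python
def brier_score(probs, correct_index):
    """Brier score for one multi-class question.

    probs is the *answerer's* probability distribution over the possible answers;
    correct_index marks the true answer. Lower is better.
    """
    return sum((p - (1.0 if i == correct_index else 0.0)) ** 2
               for i, p in enumerate(probs))

# It's the model's own distribution that gets scored, not the grader's:
print(brier_score([0.8, 0.2], 0))  # confident and right -> 0.08
print(brier_score([0.8, 0.2], 1))  # confident and wrong -> 1.28
```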

Comment by Nathan Helm-Burger (nathan-helm-burger) on Storable Votes with a Pay as you win mechanism: a contribution for institutional design · 2024-03-11T20:51:28.577Z · LW · GW

A small, nitpicky practicality comment: in a real-world system you can't allow infinitely divisible votes. You have to choose a large finite number of divisions per vote (e.g. 1e15). If a user can input arbitrarily fine divisions, then they can crash the vote-storage system with irrational numbers (e.g. pi).
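A minimal sketch of the fixed-resolution bookkeeping I mean (the constant and function names are purely illustrative):

```python
from fractions import Fraction

DIVISIONS_PER_VOTE = 10**15  # finite resolution chosen up front

def to_stored_units(amount: str) -> int:
    """Convert a submitted vote fraction (as a string) to whole stored units.

    Anything that doesn't land exactly on the 1/DIVISIONS_PER_VOTE grid is
    rejected, so there's no way to submit pi or other non-representable values.
    """
    units = Fraction(amount) * DIVISIONS_PER_VOTE
    if units.denominator != 1 or not (0 <= units <= DIVISIONS_PER_VOTE):
        raise ValueError("vote must be a multiple of 1/DIVISIONS_PER_VOTE in [0, 1]")
    return int(units)

print(to_stored_units("0.25"))  # -> 250000000000000
```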