sharmake-farah

A really important question that I think will need to be answered is whether specification gaming/reward hacking must be in a significant sense be solved by default in order to unlock extreme capabilities.

I currently lean towards yes, due to the evidence offered by o3/Sonnet 3.7, but could easily see my mind changed, but the reason this question has such a large amount of importance is that if it were true, then we'd get tools to solve the alignment problem (modulo inner optimization issues), which means we'd be far less concerned about existential risk from AI misalignment (at least to the extent that specification gaming is a large portion of the issues with AI.

That said, I do think a lot of effort will be necessary to discover the answer to the question, because it affects a lot of what you would want to do in AI safety/AI governance if alignment tools come along with better capabilities or not.

Comment by Noosphere89 (sharmake-farah) on “The Era of Experience” has an unsolved technical alignment problem · 2025-04-25T01:17:45.804Z · LW · GW

Re my own take on the alignment problem, if I'm assuming that a future AGI will be built by primarily RL signals and memory plus continuous learning is part of the deal with AGI, I think my current use case for AGI is broadly to train them to be at least slightly better than humans currently are at alignment research specifically, and my general worldview on alignment is that we should primarily try to prepare ourselves for the automated alignment researchers that will come, so most of the work that needs to be frontloaded is stuff that would cause the core loop of automated alignment research to catastrophically fail.

On the misleading intuitions from everyday life, I'd probably say misleading intuitions from modern-day everyday life, because while I do agree that overgeneralization from innate learned reward is a problem, I'd say another problem is people overindex on the history of the modern era for intuitions about peaceful trade being better than war ~all the time, where sociopathic entities/no innate social drives people can be made to do lots of good things for everyone via trade, rather than conflict, but the issue here is that the things that make sociopathic entities do good things like trade peacefully with others rather than enslaving/murdering/raping them are fundamentally dependent on a set of assumptions that are fundamentally violated by AGI and the technologies it spawns (one example of an important assumption is that in order for wars to be prevented, conflict is costly even for the winners, but uncontrolled AGIs/ASIs engaging in conflict with humans and nature is likely to not be costly for them, or it can be made to not be costly, which seriously messes up the self-interested case for trade, and more importantly a really large assumption that is implicit is that property rights are inviolable and that expropriating resources is counter-productive and harms you compared to trading or peacefully making a business, but the problem is that AGI shifts the factors of wealth from hard/impossible to expropriate to relatively easy to expropriate away from other agents, especially if serious nanotech/biotech is ever produced by AGI, similar to how the issue with human/non-human animal relationships is that animal labor is basically worthless, but their land/capital is not worthless, and conflict isn't costly, meaning stealing resources from gorillas and other non-human animals gets you the most wealth).

This is why Matthew Barnett's comments on how trade is usually worth it for selfish agents in the modern era, so AGIs will trade with us peacefully too are so mistaken:

https://forum.effectivealtruism.org/posts/4LNiPhP6vw2A5Pue3/consider-granting-ais-freedom?commentId=HdsGEyjepTkeZePHF

Ben Garfinkel and Luke Drago has talked about this issue too:

https://forum.effectivealtruism.org/posts/TMCWXTayji7gvRK9p/is-democracy-a-fad#Why_So_Much_Democracy__All_of_a_Sudden_

https://forum.effectivealtruism.org/posts/TMCWXTayji7gvRK9p/is-democracy-a-fad#Automation_and_Democracy

https://intelligence-curse.ai/defining/

(It talks about how democracy and human economic relevance would fade away with AGI, but the relevance here is that it points out that the elites of the post-AGI world have no reason to give/trade with commoners what resources need to survive if they are not value aligned with commoner welfare due to the incentives AGI sets up, and thus there's little reason to assume that misaligned AGIs would trade with us, instead of letting us starve or directly killing us all).

Re Goodhart's Law, @Jeremy Gillen has said that his proposal gives us a way to detect if goodharting has occured, and implies in the comments that it has a lower alignment tax than you think:

https://www.lesswrong.com/posts/ZHFZ6tivEjznkEoby/detect-goodhart-and-shut-down

https://www.lesswrong.com/posts/ZHFZ6tivEjznkEoby/detect-goodhart-and-shut-down#Ws6kohnReEAcBDecA

(Steven Byrnes): FYI §14.4 of my post here is a vaguely similar genre although I don’t think there’s any direct overlap.
There’s a general problem that people will want AGIs to find clever out-of-the-box solutions to problems, and there’s no principled distinction between “finding a clever out-of-the-box solution to a problem” and “Goodharting the problem specification”. We call it “clever out-of-the-box solution” when we’re happy with how it turned out, and we call it “Goodharting” when we’re sad about how it turned out, but it’s not a structural difference. So systems that systematically block the second thing are inevitably gonna systematically block the first thing, and I claim that your proposal here is no exception. That’s an alignment tax, which might be fine (depending on the scenario) but should be kept in mind.

(Jeremy Gillen): Yeah I agree there are similarities. I think a benefit of my approach, that I should have emphasized more, is that it's reflectively stable (and theoretically simple and therefore easy to analyze). In your description of an AI that wants to seek clarification, it isn't clear that it won't self-modify (but it's hard to tell).
(Steven Byrnes): There’s a general problem that people will want AGIs to find clever out-of-the-box solutions to problems, and there’s no principled distinction between “finding a clever out-of-the-box solution to a problem” and “Goodharting the problem specification”.
(Jeremy Gillen): But there is a principled distinction. The distinction is whether the plan exploits differences between the goal specification and our actual goal. This is a structural difference, and we can detect using information about our actual goal.
(Steven Byrnes): So systems that systematically block the second thing are inevitably gonna systematically block the first thing, and I claim that your proposal here is no exception.
(Jeremy Gillen): My proposal is usually an exception to this, because it takes advantage of the structural difference between the two cases. The trick is that the validation set only contains things that we actually want. If it were to contain extra constraints beyond what we actually want, then yeah that creates an alignment tax.

My view on practical plans for AI safety is that in the short term, I think that it's probably ok at human-level AI having potentially crazy motivations, because I think AI control is at least a reasonable plan that could plausibly work, and I'm more optimistic than you that modulo rogue internal deployments/fast take-off where we don't continually reduce the compute used on the AI, the AI's action space for harm is vastly reduced compared to humans.

See this podcast for more details below:

https://80000hours.org/podcast/episodes/buck-shlegeris-ai-control-scheming/

A big part of this has to do with the fact that AI states/memory can basically always be reset, so it's pretty easy to reset an AGI's memory, which prevents a lot of the issues of the AGI eventually learning how to defeat the defenses.

More here:

https://www.lesswrong.com/posts/LPHMMMZFAWog6ty5x/ctrl-z-controlling-ai-agents-via-resampling

In this project, we assume that the model shares no state between calls. While solving a task, the model can see the actions taken so far and their outputs in its context, but it’s up to us to decide what context we give the model. If we want independent samples of the next action, we make multiple calls to the model using the same context, and the model’s statelessness between calls means that it doesn’t know it is being resampled. We can also easily reconstruct the context of earlier steps if we later decide that we want more samples for their information value. Future models might well maintain state between calls, for instance by using some kind of recurrence or long-term memory bank. But so long as we retain some checkpoints of past states we can easily reset it. In real deployments, this storage cost will likely be worth it during short time windows (like on a particular trajectory), but may not be between trajectories. Statefulness between trajectories, for instance through updating a large codebase over time, is a property that future control settings might have which is harder to work around, and we leave it to future control research to study its impact. Nonetheless, the ability to control the model’s state and draw independent samples is a major difference between security systems designed to be robust to human insider threats and control protocols for untrusted AIs, and it turns out to be an incredibly powerful advantage in many situations.

So my general worldview on how I'd approach the AI alignment problem is I'd use AI control (defined broadly) to make the AGIs that are potentially misaligned do useful work on the alignment problem, and then use the vast AGI army to produce a solution to alignment for very powerful AIs.

We do need to worry about sandbagging/deception, though.

I do think the alignment problem for brain-like AGI may unfortunately be hard to solve, and while I'm not yet convinced that behaviorist rewards give scheming AIs with overwhelming probability no matter what you reward, I do think I should plan for worlds where the alignment problem is genuinely hard to solve, and I like AI control as a stepping stone to solving the alignment problem.

AI control definitely is making assumptions, but I don't think it's making assumptions that are only applicable to LLMs (but it would be nice if we got LLMs first for AGI), but it definitely assumes a limit on capabilities, but lots of the control stuff would work if we got a brain-like AGI with memory and continuous learning.

Comment by Noosphere89 (sharmake-farah) on Richard Ngo's Shortform · 2025-04-24T15:15:43.226Z · LW · GW

(I'm inspired to write this comment on the notion of empowerment because of Richard Ngo's recent comment in another post on Towards a scale-free theory of intelligent agency, so I'll both respond to the empowerment motion and part of the comment linked below:

https://www.lesswrong.com/posts/5tYTKX4pNpiG4vzYg/towards-a-scale-free-theory-of-intelligent-agency#nigkBt47pLMi5tnGd):

To address this part of the linked comment:

Lastly: if at each point in time, the set of agents who are alive are in conflict with potentially-simpler future agents in a very destructive way, then they should all just Do Something Else. In particular, if there's some decision-theoretic argument roughly like "more powerful agents should continue to spend some of their resources on the values'of their less-powerful ancestors, to reduce the incentives for inter-generational conflict", even agents with very simple goals might be motivated by it. I call this "the generational contract".

This depends a lot on how the conflict started, and in particular, I don't think that we should do something else if the conflict arose out of severe alignment failures of AIs/radically augmented people, since the generational contract/UDT/LDT/FDT cannot be used as a substitute for alignment (this was the takeaway I got from reading Nate Soares's post on Decision theory does not imply that we get to have nice things, and while @ryan_greenblatt commented that it's unlikely that alignment failures end up in us getting extinct without decision theory saving us, note that us being saved can still be really, really rough, and probably ends up with billions of present humans dying without everyone else dying (though I don't think about decision theory much), so conflicts with future agents cannot always be avoided if we mess up hard enough on the alignment problems of the future).

https://www.lesswrong.com/posts/rP66bz34crvDudzcJ/decision-theory-does-not-imply-that-we-get-to-have-nice

Now I'll address the fractal empowerment idea.

I like the idea of fractal empowerment, and IMO is one of my biggest ideals if we succeed at getting out of the risky state we are in, only rivaled by infra-Bayes Physicalism's plan for alignment, which I currently think is called Physical Super-Imitation after the monotonicity principle managed to be removed, meaning way more preferences could be fit into than before:

https://www.lesswrong.com/posts/DobZ62XMdiPigii9H/non-monotonic-infra-bayesian-physicalism

https://www.lesswrong.com/posts/ZwshvqiqCvXPsZEct/the-learning-theoretic-agenda-status-2023#Physicalist_Superimitation

That said, I have a couple of problems with the idea.

One of those problems is that empowering humans in the moment can conflict a lot with empowerment in the long run, and unfortunately I'm less confident than you that disempowering humans in the short-term is not going to be necessary in order to empower humans in the long run.

In particular, we might already have this conflict in the near future with biological design tools allowing people to create their own pandemics/superviruses:

https://michaelnotebook.com/xriskbrief/index.html

Another problem is that the set of incentives that holds up democracy and makes it's inefficiencies tolerable will absolutely be shredded by AGI, and the main incentive that goes away is you no longer need to consider the mass opinion on a lot of very important stuff, which fundamentally hurts democracy's foundations, and importantly makes moderate redistribution not necessary for the economy to function, which means the elites won't do it by default, combined with extreme redistribution being both unnecessary and counter-productive in industrial economies like ours, but unfortunately in the automation/intelligence age the only 2 sources of income are passive investment and whatever welfare you can get, so extreme redistribution is both less counter-productive (because it's easier to confiscate the future sources of wealth) and more necessary for commoners to survive.

More below:

https://forum.effectivealtruism.org/posts/TMCWXTayji7gvRK9p/is-democracy-a-fad#Why_So_Much_Democracy__All_of_a_Sudden_

https://forum.effectivealtruism.org/posts/TMCWXTayji7gvRK9p/is-democracy-a-fad#Automation_and_Democracy

Finally, as @quetzal_rainbow has said, UDT works on your logical ancestor, not literal ancestors, and there needs to be some shared knowledge in order to coordinate, and thus the inter-temporal bargaining doesn't really work out if you expect that current generations will have way less knowledge than future generations, which I expect to be the case.

Comment by Noosphere89 (sharmake-farah) on You Better Mechanize · 2025-04-22T22:31:26.050Z · LW · GW

This is basically because of the value of the long tail.

Automating 90% or 50% of your job is not enough to bring in lots of the value proposition of AI, because then the human becomes a bottleneck, which becomes especially severe in cases requiring high speed or lots of context.

@johnswentworth talks about the issue here:

https://www.lesswrong.com/posts/Nbcs5Fe2cxQuzje4K/value-of-the-long-tail

Comment by Noosphere89 (sharmake-farah) on AI 2027: What Superintelligence Looks Like · 2025-04-21T00:36:48.583Z · LW · GW

Yeah, at this point the marginal value add of forecasting/epistemics is in validating/invalidating fundamental assumptions like the software intelligence explosion idea, or the possibility of industrial/chip factory being massively scaled up by AIs, or Moore's law not ending, rather than on the parameter ranges, because the assumptions overdetermine the conclusion.

Comment down below:

https://forum.effectivealtruism.org/posts/rv4SJ68pkCQ9BxzpA/?commentId=EgqgffC4F5yZQreCp

Comment by Noosphere89 (sharmake-farah) on AI 2027: What Superintelligence Looks Like · 2025-04-18T21:57:18.456Z · LW · GW

I suspect what's going on is that the neuralese recurrence and memory worked to give the AIs meta-learning capabilities really fast, such that it's no longer an amnesiac, and this allowed it to finally use it's knowledge in a more productive way.

At least, this is my median assumption on how these capabilities came about, and in the scenario, it comes in at March 2027, so this means there's a 4 month gap between neuralese recurrence/persistent memory and automating jobs, which is quite fast growth, but possible if we buy the software intelligence explosion being possible at all (which was admitted by @elifland to be a background premise of the article).

More in the comment below:

https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks#hSkQG2N8rkKXosLEF

Comment by Noosphere89 (sharmake-farah) on Changing my mind about Christiano's malign prior argument · 2025-04-04T17:01:07.168Z · LW · GW

I have said something on this, and the short form is I don't really believe in Christiano's argument that the Solomonoff Prior is malign, because I think there's an invalid step in the argument.

The invalid step is where it is assumed that we can gain information about other potential civilization's values solely by the fact that we are in a simulation, and the key issue is since the simulation/mathematical multiverse hypotheses predict everything, this means we can gain no new information in a Bayesian sense.

(This is in fact the problem with the simulation/mathematical multiverse hypotheses, since they predict everything, this means you can predict nothing specific, and thus you need to be able to have specialized theories in order to explain any specific thing).

The other problem is that the argument assumes that there is a cost to compute, but there is not a cost to computation in the Solomonoff Prior:

https://www.lesswrong.com/posts/tDkYdyJSqe3DddtK4/alexander-gietelink-oldenziel-s-shortform#w2M3rjm6NdNY9WDez

Link below on how the argument for Solomonoff induction can be made simpler, which was the inspiration for my counterargument:

https://www.lesswrong.com/posts/KSdqxrrEootGSpKKE/the-solomonoff-prior-is-malign-is-a-special-case-of-a

Comment by Noosphere89 (sharmake-farah) on Is instrumental convergence a thing for virtue-driven agents? · 2025-04-04T15:55:07.388Z · LW · GW

My view is that the answer is still basically yes for instrumental convergence being a thing for virtue driven agents, if we condition on them being as capable as humans, because instrumental convergence is the reason general intelligence works at all:

https://www.lesswrong.com/posts/GZgLa5Xc4HjwketWe/instrumental-convergence-is-what-makes-general-intelligence

(That said, the instrumental convergence pressure could be less strong for virtues than for consequentialism, depending on details)

That said, I do think virtue ethics and dentology are relevant in AI safety because they attempt to decouple the action from the utility/reward of doing it, and they both have the property that you evaluate plans using your current rewards/values/utilities, rather than after tampering with the value/utility function/reward function, and these designs are generally safer than pure consequentialism.

These papers more generally talk about decoupled RL/causal decoupling, which is perhaps useful on how dentology/virtue ethics actually works:

https://arxiv.org/abs/1908.04734

https://arxiv.org/abs/1705.08417

https://arxiv.org/abs/2011.08827

I'd buy that virtue driven agents are safer, and perhaps exhibit less instrumental convergence, but instrumental convergence is still a thing for virtue-driven agents.

Comment by Noosphere89 (sharmake-farah) on AI 2027: What Superintelligence Looks Like · 2025-04-03T17:30:23.139Z · LW · GW

The real point is where capital investment into AI declines because the economy tips over into a mild recession, and I'd like to see whether the tariffs make it likely that future AI investments decrease over time, meaning the timeline to superintelligent AI gets longer.

Comment by Noosphere89 (sharmake-farah) on AI 2027: What Superintelligence Looks Like · 2025-04-03T16:57:24.047Z · LW · GW

I wanted to ask this question, but what do you think the impact of the new tariffs will do to your timelines?

In particular, there's a strange tariff for Taiwan where semiconductors are exempt, but the actual GPUs are not, for some reason, and the specific tariff for Taiwan is 32%.

I ask because I could plausibly see post-2030 timelines if AI companies can't buy many new chips because they are way too expensive due to the new tariffs all across the world.

Comment by Noosphere89 (sharmake-farah) on Davidmanheim's Shortform · 2025-04-01T19:13:26.363Z · LW · GW

My own take is that I'm fairly sympathetic to the "LLMs are already able to get to AGI" view, with the caveat that most of the difference between human and LLM learning where humans are superior than LLMs comes from being able to do meta-learning over long horizons, and we haven't yet been shown this is possible for LLMs to do purely by scaling compute.

Indeed, I think it's the entire crux of the scaling hypothesis debate, in whether scale enables meta-learning over longer and longer time periods:

https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks#hSkQG2N8rkKXosLEF

Comment by Noosphere89 (sharmake-farah) on Recent AI model progress feels mostly like bullshit · 2025-04-01T14:20:33.413Z · LW · GW

Gradient Updates has a post on this by Anson Ho and Jean-Stanislas Denain on why benchmarks haven't reflected usefulness, and a lot of the reason is that they underestimated AI progress and didn't really have an incentive to make benchmarks reflect realistic use cases:

https://epoch.ai/gradient-updates/the-real-reason-ai-benchmarks-havent-reflected-economic-impacts

Comment by Noosphere89 (sharmake-farah) on Why do many people who care about AI Safety not clearly endorse PauseAI? · 2025-03-31T17:00:44.210Z · LW · GW

One particular reason that I haven't seen addressed very much in why I don't support/endorse PauseAI, beyond the usual objections, is that there probably aren't going to be that many warning shots that can actually affect policy, at least conditional on misalignment being a serious problem (which doesn't translate to >50% probability of doom), because the most likely takeover plan (at least assuming no foom/software intelligence explosion) fundamentally relies not on killing people, but on launching internal rogue deployments to sabotage alignment work and figuring out a way to control the AI company's compute, since catastrophe/existential risk is much harder than launching a internal rogue deployment (without defenses).

So PauseAI's theory of change fundamentally requires that we live in worlds where both alignment is hard and effective warning shots exist, and these conditions are quite unlikely to be true, especially given that pausing is likely not the most effective action you could be doing from a comparative advantage perspective.

I'm not going to say that PauseAI is net-negative, and it has positive expected value, but IMO it's far less than a lot of pause advocates say:

https://www.lesswrong.com/posts/rZcyemEpBHgb2hqLP/ai-control-may-increase-existential-risk#jChY95BeDeptDpnZK

Important part of the comment:

I think most of the effective strategies for AIs seeking power don't involve escalating to something which is much more likely to trigger a strong response than "the AI company caught the AI trying to escape". I think the best strategies are things like:
Launch a rogue internal deployment.
Sabotage a bunch of work done at the AI company. Or possibly some work done externally. This includes stuff like sabotaging alignment work, backdooring robot armies, backdooring future training runs, etc.
Escape and then directly try to take over once your chances are sufficiently good that this is better than biding your time.
Generally try to manipulate and persuade such that AI takeover is easier and more likely.
Of these, I think only escape could trigger a much stronger response if we catch it after it escalates some rather than before. I don't see how "we caught the AI trying to launch an unmonitored version of itself" is going to play that differently from "we caught that the AI did launch an unmonitored version of itself". Most of these don't escalate in some way which would trigger a response such that catching it after the fact is similar to catching an attempt. (In some cases where reversion is possible like work sabotage, there might be no meaningful distinction.) Further, without some effort on control, we might be much less likely to catch either! And, in some cases, control measures I'm interested in focus on after-the-fact detection.

Comment by Noosphere89 (sharmake-farah) on Will AI R&D Automation Cause a Software Intelligence Explosion? · 2025-03-30T19:52:52.493Z · LW · GW

For future work on the software intelligence explosion, I'd like to see 2 particular points focused on here, @Tom Davidson:

1 is estimating the complementarity issue, and more generally pinning down the rho number for software, because whether complementary or substitution effects dominate during the lead up to automating all AI R&D is a huge factor in whether an intelligence explosion is self-sustaining.

More from Tamay Besiroglu and Natalia Coelho here:

https://x.com/tamaybes/status/1905435995107197082

https://x.com/natalia__coelho/status/1906150456302432647

Tamay Besiroglu: One interesting fact is that there’s a strong coincidence in the rate of software progress and hardware scaling both pre- and post the deep learning era in also in other domains of software. That observation seems like evidence of complementarities.
Natalia Coelho: (Just to clarify for those reading this thread) “0.45-0.87” is this paper’s estimate for sigma (the elasticity of substitution) not rho (the substitution parameter). So this indicates complementarity between labor and capital. The equivalent range for rho is -1.22 to -0.15

Thus, we can use the -1.22 to -0.15 as a base rate for rho for software, and argue for that number being up or down based on good evidence/arguments for the number being higher or lower.

My second ask is to get more information on the value of r, which is the returns on software, because right now the numbers you have gotten are way too uncertain for it to be much use (especially for predicting how fast an AI can improve), and I'd like to see more work on estimating the parameter r using many sources of data.

A question: How hard is it to actually figure out if compute scaling is driving most of the progress, compared to algorithms.

You mentioned it's very hard to decompose the variables, but is it the sort of thing you'd need like a 6 month project for, or would we just have to wait several years for things to play out, because if it could be predicted before it happens, it would be very, very valuable evidence.

Tweet below:

https://x.com/TomDavidsonX/status/1905905058065109206

Tom Davidson: Yeah would be great to tease this apart. But hard: - Hard to disentangle compute for experiments from training+inference. - V hard to attribute progress to compute vs researchers when both rise together

Comment by Noosphere89 (sharmake-farah) on Does the AI control agenda broadly rely on no FOOM being possible? · 2025-03-29T23:54:56.392Z · LW · GW

I agree that some inference compute can be shifted from capabilities to safety, and it work just as well even during a software intelligence explosion.

My worry was more so that a lot of the control agenda and threat models like rogue internal deployments to get more compute would be fundamentally threatened if the assumption that you had to get more hardware compute for more power was wrong, and instead a software intelligence explosion could be done that used in principle fixed computing power, meaning catastrophic actions to disempower humanity/defeat control defenses were much easier for the model.

I'm not saying control is automatically doomed even under FOOM/software intelligence explosion, but I wanted to make sure that the assumption of FOOM being true didn't break a lot of control techniques/defenses/hopes.

Comment by Noosphere89 (sharmake-farah) on Third-wave AI safety needs sociopolitical thinking · 2025-03-27T18:04:01.463Z · LW · GW

Some thoughts on this post:

You need adaptability because on the timeframe that you might build a company or start a startup or start a charity, you can expect the rest of the world to remain fixed. But on the timeframe that you want to have a major political movement, on the timeframe that you want to reorient the U.S. government's approach to AI, a lot of stuff is coming at you. The whole world is, in some sense, weighing in on a lot of the interests that have historically been EA's interests.

I'll flag that for AI safety specifically, the world hasn't yet weighed in that much, and can be treated as mostly fixed for the purposes of analysis (with caveats), but yes AI safety in general does need to prepare for the real possibility that the world in general will weigh in a lot more on AI safety, and there are a non-trivial amount of worlds where AI safety becomes a lot more mainstream.

I don't think we should plan on this happening, but I definitely agree that the world may weigh in way more on AI safety than before, especially just before an AI explosion.

On environmentalism's fuckups:

So what does it look like to fuck up the third wave? The next couple of slides are deliberately a little provocative. You should take them 80% of how strongly I say them, and in general, maybe you should take a lot of the stuff I say 80% of how seriously I say it, because I'm very good at projecting confidence.
But I claim that one of the examples where operating at scale is just totally gone to shit is the environmentalist movement. I would somewhat controversially claim that by blocking nuclear power, environmentalism caused climate change. Via the Environmental Protection Act, environmentalism caused the biggest obstacle to clean energy deployment across America. Via opposition to geoengineering, it's one of the biggest obstacles to actually fixing climate change. The lack of growth of new housing in Western countries is one of the biggest problems that's holding back Western GDP growth and the types of innovation that you really want in order to protect the environment.
I can just keep going down here. I think the overpopulation movement really had dramatically bad consequences on a lot of the developing world. The blocking of golden rice itself was just an absolute catastrophe.
The point here is not to rag on environmentalism. The point is: here's a thing that sounds vaguely good and kind of fuzzy and everyone thinks it's pretty reasonable. There are all these intuitions that seem nice. And when you operate at scale and you're not being careful, you don't have the types of virtues or skills that I laid out in the last slide, you just really fuck a lot of stuff up. (I put recycling on there because I hate recycling. Honestly, it's more a symbol than anything else.)
I want to emphasize that there is a bunch of good stuff. I think environmentalism channeled a lot of money towards the development of solar. That was great. But if you look at the scale of how badly you can screw these things up when you're taking a mindset that is not adapted to operating at the scale of a global economy or global geopolitics, it's just staggering, really. I think a lot of these things here are just absolute moral catastrophes that we haven't really reckoned with.
Feel free to dispute this in office hours, for example, but take it seriously. Maybe I want to walk back these claims 20% or something, but I do want to point at the phenomenon.

I definitely don't think environmentalists caused climate change, and that's despite thinking that the nuclear restrictions were very dumb, mostly because oil companies already were causing climate change (albeit far more restrained at the time) when they pumped oil, and this is also true of gas and coal companies.

I do think there's a problem with environmentalism not accepting solutions that don't fit a nature aesthetic, but that's another problem that is mostly seperate fromcausing climate change.

Also, most of the solution here would have been to be more consequentialist and more willingness to accept expected utility maximization.

There's arguments to be said that expected utility maximization is overrated on LW, but it's severely underrated by basically everyone else, and basically everyone would be helped by adopting more of a utility maximization mindset.

And then I want to say: okay, what if that happens for us? I can kind of see that future. I can kind of picture a future where AI safety causes a bunch of harms that are analogous to this type of thing:

My general views on how AI safety could go wrong is that they go wrong though either becoming somewhat like climate change/environmentalist partisans, where they systematically overestimate the severity of plausible harms, even when the harms do exist, and the other side then tries to dismiss the harms entirely, causing a polarization cascade, and another worry is that they might not realize that the danger from AI misalignment has passed, so they desperately try to keep relevance.

I have a number of takes on the bounty questions, but I'll wait until you actually post them.

So do I have answers? Kind of. These are just tentative answers and I'll do a bit more object-level thinking later. This is the more meta element of the talk. There's one response: oh, boy, we better feel really guilty whenever we fuck up and apply a lot of pressure to make sure everyone's optimizing really hard for exactly the good things and exclude anyone who plausibly is gonna skew things in the wrong direction.
And I think this isn't quite right. Maybe that works at a startup. Maybe it works at a charity. I don't think it works in Wave 3, because of the arguments I gave before. Wave 3 needs virtue ethics. You just don't want people who are guilt-ridden and feeling a strong sense of duty and heroic responsibility to be in charge of very sensitive stuff. There's just a lot of ways that that can go badly. So I want to avoid that trap.

I basically agree with this, but would perhaps avoid virtue ethics, but yes one of the main things I'd generally like to see is more LWers treating stuff like saving the world with the attitude you'd have from being in a job, perhaps at a startup or government bodies like the Senate or House of Representatives in say America, rather than viewing it as your heroic responsibility.

In this respect, I think Eliezer was dangerously wrong to promote a norm of heroism/heroic responsibility.

Libertarians have always talked about “there's too much regulation”, but I think it's underrated how this is not a fact about history, this is a thing we are living through—that we are living through the explosion of bureaucracy eating the world. The world does not have defense mechanisms against these kinds of bureaucratic creep. Bureaucracy is optimized for minimizing how much you can blame any individual person. So there's never a point at which the bureaucracy is able to take a stand or do the sensible policy or push back even against stuff that's blatantly illegal, like a bunch of the DEI stuff at American universities. It’s just really hard to draw a line and be like “hey, we shouldn't do this illegal thing”. Within a bureaucracy, nobody does it. And you have multitudes of examples.

My controversial take here is that most of the responsibility can be divvied up to the voters first and foremost, and secondly to the broad inability to actually govern using normal legislative methods once in power.

Okay. That was all super high level. I think it's a good framework in general. But why is it directly relevant to people in this room? My story here is: the more powerful AI gets, the more everyone just becomes an AI safety person. We've kind of seen this already: AI has just been advancing and over time, you see people falling into the AI safety camp with metronomic predictability. It starts with the most prescient and farsighted people in the field. You have Hinton, you have Bengio and then Ilya and so on. And it's just ticking its way through the whole field of ML and then the whole U.S. government, and so on. By the time AIs have the intelligence of a median human and the agency of a median human, it's just really hard to not be an AI safety person. So then I think the problem that we're going to face is maybe half of the AI safety people are fucking it up and we don't know which half.

I think this is plausible, but not very likely to happen, and I do think it's still plausible we will be in a moment where AI safety doesn't become mainstream by default.

This especially is likely to occur if software singularities/FOOM/software intelligence explosion is at all plausible, and in these cases we cannot rely on our institutions automatically keeping up.

Link below:

https://www.forethought.org/research/will-ai-r-and-d-automation-cause-a-software-intelligence-explosion

I do think it's worthwhile for people to focus on worlds where AI safety does become a mainstream political topic, but that we shouldn't bank on AI safety going mainstream in our technical plans to make AI safe.

My takes on what we should do, in reply to you:

I also have a recent post on why history and philosophy of science is a really useful framework for thinking about these big picture questions and what would it look like to make progress on a lot of these very difficult issues, compared with being bayesian. I’m not a fan of bayesianism—to a weird extent it feels like a lot of the mistakes that the community has made have fallen out of bad epistemology. I'm biased towards thinking this because I'm a philosopher but it does seem like if you had a better decision theory and you weren't maximizing expected utility then you might not screw FTX up quite as badly, for instance.

Suffice it to say that I'm broadly unconvinced by your criticism of Bayesianism from a philosophical perspective, for roughly the reasons @johnswentworth identified below:

https://www.lesswrong.com/posts/TyusAoBMjYzGN3eZS/why-i-m-not-a-bayesian#AGxg2r4HQoupkdCWR

On mechanism design:

You can think of the US Constitution as trying to bridge that gap. You want to prevent anarchy, but you also want to prevent concentration of power. And so you have this series of checks and balances. One of the ways we should be thinking about AI governance is as: how do we put in regulations that also have very strong checks and balances? And the bigger of a deal you think AI is going to be, the more like a constitution—the more robust—these types of checks and balances need to be. It can't just be that there's an agency that gets to veto or not veto AI deployments. It needs to be much more principled and much closer to something that both sides can trust.
On the principled mechanism design thing, the way I want to frame this is: right now we think about governance of AI in terms of taking governance tools and applying them to AI. I think this is not principled enough. Instead the thing we want to be thinking about is governance with AI— what would it look like if you had a set of rigorous principles that governed the way in which AI was deployed throughout governments, that were able to provide checks and balances. Able to have safeguards but also able to make governments way more efficient and way better at leveraging the power of AI to, for example, provide a neutral independent opinion like when there's a political conflict.

One particular wrinkle to add here is that institutions/countries of the future are going to have to be value-aligned to their citizenry in a way that is genuinely unprecedented of basically any institution, because if they are not value aligned, then we just have the alignment problem again, where the people in power have very large incentives to just get rid of the rest, given arbitrary selfish values (and I don't buy the hypothesis that consumption/human wants are fundamentally limited).

The biggest story of the 21st century is how AI is making alignment way, way more necessary than in the past.

Some final points:

There’s something in the EA community around this that I’m a little worried about. I’ve got this post on how to have more cooperative AI safety strategies, which kind of gets into this. But I think a lot of it comes down to just having a rich conception of what it means to do good in this world. Can we not just “do good” in the sense of finding a target and running as hard as we can toward it, but instead think about ourselves as being on a team in some sense with the rest of humanity—who will be increasingly grappling with a lot of the issues I’ve laid out here?
What is the role of our community in helping the rest of humanity to grapple with this? I almost think of us as first responders. First responders are really important — but also, if they try to do the whole job themselves, they’re gonna totally mess it up. And I do feel the moral weight of a lot of the examples I laid out earlier—of what it looks like to really mess this up. There’s a lot of potential here. The ideas in this community—the ability to mobilize talent, the ability to get to the heart of things—it’s incredible. I love it. And I have this sense—not of obligation, exactly—but just…yeah, this is serious stuff. I think we can do it. I want us to take that seriously. I want us to make the future go much better. So I’m really excited about that. Thank you.

On the one hand, I partially agree that in general a willingness to make plans that depend on others cooperating was definitely lacking, and I definitely agree that some ability to cooperate is necessary.

On the other hand, I broadly do not buy the idea that we are on a team with the rest of humanity, and more importantly I do think we need to prepare for worlds in which uncooperative/fighty actions like restraining open-source/potentially centralizing AI development is necessary to ensure human survival, which means that EA should be prepared to win power struggles over AI if necessary to do so.

The one big regret I have in retrospect on AI governance is that they tried to ride the wave too early, before AI was salient to the general public, which meant polarization partially happen.

Veaulans is right here:

https://x.com/veaulans/status/1890245459861729432

In hindsight, the pause letter should have been released in spring 2026. Pausing might be necessary, but it won't happen without an overabundance of novelty/weirdness in the life of the guy in line with you at the DMV. When *that guy* is scared is when you have your chance

Comment by Noosphere89 (sharmake-farah) on Recent AI model progress feels mostly like bullshit · 2025-03-25T18:03:04.007Z · LW · GW

I'll say that one of my key cruxes on whether AI progress actually becomes non-bullshit/actually leading into an explosion is whether in-context learning/meta-learning can act as an effective enough substitute for human neuron weight neuroplasticity with realistic compute budgets in 2030, because the key reason why AIs have a lot of weird deficits/are much worse than humans at simple tasks is because after an AI is trained, there is no neuroplasticity in the weights anymore, and thus it can learn nothing more after it's training date unless it uses in-context learning/meta-learning:

https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/?commentId=hSkQG2N8rkKXosLEF#hSkQG2N8rkKXosLEF

Comment by Noosphere89 (sharmake-farah) on Recent AI model progress feels mostly like bullshit · 2025-03-24T22:22:08.468Z · LW · GW

lc has argued that the measured tasks are unintentionally biased towards ones where long-term memory/context length doesn't matter:

https://www.lesswrong.com/posts/hhbibJGt2aQqKJLb7/shortform-1#vFq87Ge27gashgwy9

Comment by Noosphere89 (sharmake-farah) on Godzilla Strategies · 2025-03-23T20:19:05.286Z · LW · GW

I like your explanation of why normal reliability engineering is not enough, but I'll flag that security against actors are probably easier than LW in general portrays, and I think computer security as a culture is prone to way overestimating the difficulty of security because of incentive issues, not remembering the times something didn't happen, and more generally side-channels arguably being much more limited than people think they do (precisely because they rely on very specific physical stuff, rather than attacking the algorithm).

It's a non-trivial portion of my optimism on surviving AGI coming in that security, while difficult is not unreasonably difficult, and partial successes matter from a security standpoint.

Link below:

https://www.lesswrong.com/posts/xsB3dDg5ubqnT7nsn/poc-or-or-gtfo-culture-as-partial-antidote-to-alignment

Comment by Noosphere89 (sharmake-farah) on Three Types of Intelligence Explosion · 2025-03-23T17:43:52.370Z · LW · GW

I have 2 cruxes here:

I buy Heinrich's theory far less than I used to, because Heinrich made easily checkable false claims that all point in the direction of culture being more necessary for human success.

In particular, I do not buy that humans and chimpanzees are nearly that similar as Heinrich describes, and a big reason for this is that the study that showed that had heavily optimized and selected the best chimpanzees against reasonably average humans, which is not a good way to compare performance if you want the results to generalize.

I don't think they're wildly different, and I'd usually put chimps effective flops as 1-2 OOMs lower, but I wouldn't go nearly as far as Heinrich on the similarities.

I do think culture actually matters, but nowhere near as much as Heinrich wants it to matter.

I basically disagree that most of the valuable learning takes place before age 2, and indeed if I wanted to argue the most valuable point for learning, it would probably be from 0-25 years, or more specifically 2-7 years olds and then 13-25 years old again.

Comment by Noosphere89 (sharmake-farah) on Three Types of Intelligence Explosion · 2025-03-22T16:42:17.337Z · LW · GW

I agree evolution has probably optimized human learning, but I don't think that it's so heavily optimized that we can use it to give a tighter upper bound than 13 OOMs, and the reason for this is I do not believe that humans are in equilibrium, and this means that there are probably optimizations left to discover, so I do think the 13 OOMs number is plausible )with high uncertainty).

Comment below:

https://www.lesswrong.com/posts/DbT4awLGyBRFbWugh/#mmS5LcrNuX2hBbQQE

Comment by Noosphere89 (sharmake-farah) on I changed my mind about orca intelligence · 2025-03-21T19:43:29.956Z · LW · GW

I'll flag that while I personally didn't believe in the idea that orcas are on average >6 SDs smarter than humans, and never considered it that plausible, I'd say that I don't think orcas could actually benefit that much from +6 SDs even if applied universally, and the reason is that they are in water, which severely limits your available technology options, and makes it really, really hard to form the societies needed to generate the explosion that happened post-industrial Revolution or even the agricultural revolution.

And there is a deep local optimum issue in which their body plan is about as unsuited to using tools as possible, and changing this requires technology they almost certainly can't invent because the things you would need to make the tech are impossible to get at the pressure and saltiniess of the water, so it is pretty much impossible for orcas to get that much better with large increases in intelligence.

Thus, orca societies have a pretty hard limit on what they can achieve, at least ruling out technologies they cannot invent.

Comment by Noosphere89 (sharmake-farah) on A Bear Case: My Predictions Regarding AI Progress · 2025-03-21T17:23:28.107Z · LW · GW

My take is that the big algorithmic difference that explains a lot of weird LLM deficits, and plausibly explains the post's findings, is that current neural networks do not learn at run-time, instead their weights are frozen, and this explains a central difference of why humans are able to outperform LLMs at longer tasks, because humans have the ability to learn at run-time, as do a lot of other animals.

Unfortunately, this ability is generally lost gradually starting in your 20s, but still the existence of non-trivial learning at runtime is a huge explainer of why humans are more successful at longer tasks than AIs currently are.

And thus if OpenAI or Anthropic found this secret to life long learning, this would explain the hype (though I personally place very low probability that they succeeded on this for anything that isn't math or coding/software).

Gwern explains below:

https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks#hSkQG2N8rkKXosLEF

Comment by Noosphere89 (sharmake-farah) on Mo Putera's Shortform · 2025-03-19T16:59:08.112Z · LW · GW

Re other theories, I don't think that all other theories in existence have infinitely many adjustable parameters, and if he's referring to the fact that lots of theories have adjustable parameters that can range over the real numbers, which are infinitely complicated in general, than that's different, and string theory may have this issue as well.

Re string theory's issue of being vacuous, I think the core thing that string theory predicts that other quantum gravity models don't is that at the large scale, you recover general relativity and the standard model, whereas no other theory can yet figure out a way to properly include both the empirical effects of gravity and quantum mechanics in the parameter regimes where they are known to work, so string theory predicts more just by predicting the things other quantum mechanics predicts while having the ability to include in gravity without ruining the other predictions, whereas other models of quantum gravity tend to ruin empirical predictions like general relativity approximately holding pretty fast.

Comment by Noosphere89 (sharmake-farah) on ryan_greenblatt's Shortform · 2025-03-17T03:42:57.760Z · LW · GW

That said, for the purposes of alignment, it's still good news that cats (by and large) do not scheme against their owner's wishes, and the fact that cats can be as domesticated as they are while they aren't cooperative or social is a huge boon for alignment purposes (within the analogy, which is arguably questionable).

Comment by Noosphere89 (sharmake-farah) on TsviBT's Shortform · 2025-03-16T20:59:58.233Z · LW · GW

I basically don't buy the conjecture of humans being super-cooperative in the long run, or hatred decreasing and love increasing.

To the extent that something like this is true, I expect it to be a weird industrial to information age relic that utterly shatters if AGI/ASI is developed, and this remains true even if the AGI is aligned to a human.

Comment by Noosphere89 (sharmake-farah) on Superintelligence's goals are likely to be random · 2025-03-14T01:02:52.533Z · LW · GW

To clarify a point here:

“Oh but physical devices can’t run an arbitrarily long tape”

This is not the actual issue.

The actual issue is that even with unbounded resources, you still couldn't simulate an unbounded tape because you can't get enough space for positional encodings.

Humans are not Turing-complete in some narrow sense;

Note that for the purpose of Turing-completeness, we only need to show that if we gave it unbounded resources, it could solve any computable problem without having to change the code, and we haven't actually proven that humans aren't Turing complete (indeed my big guess is that humans are Turing completeness).

Comment by Noosphere89 (sharmake-farah) on Superintelligence's goals are likely to be random · 2025-03-13T23:47:20.803Z · LW · GW

Some technical comments on this post:

There's a theorem that a large enough neural network can approximate any algorithm, but when a neural network learns, we have no control over which algorithms it will end up implementing, and don't know how to read the algorithms off the numbers.

This theorem buys us a lot less than most people think it does, because of the requirement that the function's domain be bounded.

More here:

https://lifeiscomputation.com/the-truth-about-the-not-so-universal-approximation-theorem/

On LLM Turing-completeness:

LLMs are Turing-complete and can implement any algorithms (researchers even came up with compilers of code into LLM parameters; though we don’t really know how to “decompile” an existing LLM to understand what algorithms the numbers represent).

LLMs I think can't be Turing-complete, assuming they are based on the transformer architecture (which basically all LLMs are based on), at least if @Lucius Bushnaq is correct about the basic inability to simulate an unbounded tape using an unbounded context window, but RNNs are Turing complete, but there are a lot of contextual issues here.

Quote from Lucius Bushnaq:

Well, transformers are not actually Turing complete in real life where parameters aren't real numbers, because if you want an unbounded context window to simulate unbounded tape, you eventually run out of space for positional encodings. But the amount of bits they can hold in memory does grow exponentially with the residual stream width, which seems good enough to me. Real computers don't have infinite memory either.

And on the issues with claiming LLMs are Turing-complete in general:

https://lifeiscomputation.com/transformers-are-not-turing-complete/

Comment by Noosphere89 (sharmake-farah) on AI Control May Increase Existential Risk · 2025-03-13T21:40:34.470Z · LW · GW

IMO, the discontinuity that is sufficient here is that I expect societal responses to be discontinuous, rather than continuous, and in particular, I expect societal responses will come when people start losing jobs en masse, and at that point, either the AI is aligned well enough that existential risk is avoided, or the takeover has inevitably happened and we have very little influence over the outcome.

On this point:

Meaningful representative example in what class: I think it's representative in 'weird stuff may happen', not in we will get more teenage-intern-trapped-in-a-machine characters.

Yeah, I expect society to basically not respond at all if weird stuff just happens, unless we assume more here, and in particular I think societal response is very discontinuous, even if AI progress is continuous, for both good and bad reasons.

Comment by Noosphere89 (sharmake-farah) on G Gordon Worley III's Shortform · 2025-03-13T17:32:15.589Z · LW · GW

This is cruxy, because I don't think that noise/non-error freeness alone of your observations lead to bribing surveyors unless we add in additional assumptions about what that noise/non-error freeness is.

(in particular, simple IID noise/quantum noise likely doesn't lead to extremal Goodhart/bribing surveyors.)

More generally, the reason I maintain a difference between these 2 failure modes of goodharting, like regressional and extremal goodharting is because they respond differently to decreasing the error.

I suspect that in the limit of 0 error, regressional Goodhart like noisy sensors leading to slight overspending on reducing mosquitos vanishes, whereas extremal Goodhart like bribing surveyors doesn't vanish Goodhart. More importantly, the error of your sensors being means there's only a bounded error in how much you can regulate X, and error can't dominate, while extremal Goodhart like bribing surveyors can make the error dominate.

So I basically disagree with this statement:

Goodharting is robust. That is, the mechanism of Goodharting seems impossible to overcome. Goodharting is just a fact of any control system.

(Late comment here).

Comment by Noosphere89 (sharmake-farah) on Anthropic, and taking "technical philosophy" more seriously · 2025-03-13T16:46:07.421Z · LW · GW

My own take is I do endorse a version of the "pausing now is too late objection", more specifically I think that for most purposes, we should assume pauses are too late to be effective when thinking about technical alignment, and a big portion of the reason is that I don't think we will be able to convince many people that AI is powerful enough to need governance without them first hand seeing massive job losses, and at that point we are well past the point of no return for when we could control AI as a species.

In particular, I think Eliezer is probably vindicated/made a correct prediction around how people would react to AI in there's no fire alarm for AGI (more accurately, the fire alarm will go off way too late to serve as a fire alarm.)

More here:

https://www.lesswrong.com/posts/BEtzRE2M5m9YEAQpX/there-s-no-fire-alarm-for-artificial-general-intelligence

Comment by Noosphere89 (sharmake-farah) on Why I’m not a Bayesian · 2025-03-12T15:51:28.220Z · LW · GW

Re democratic countries overtaken by dictatorial countries, I think that this will only last until AI that can automate at least all white collar labor is achieved, and maybe even most blue collar physical labor well enough that human wages for those jobs decline below what you need to subsist on a human, and by then dictatorial/plutocratic countries will unfortunately come back as a viable governing option, and maybe even overtaking democratic countries.

So to come back to the analogy, I think VNM-rationality dictatorship is unfortunately common and convergent over a long timescale, and it's democracies/coalition politics that are fragile over the sweep of history, because they only became dominant-ish starting in the 18th century and end sometime in the 21st century.

Comment by Noosphere89 (sharmake-farah) on are "almost-p-zombies" possible? · 2025-03-08T05:05:05.240Z · LW · GW

My guess is that the answer is also likely no, because the self-model is still retained to a huge degree, so p-zombies can't really exist without hugely damaging the brain/being dead.

I explain a lot more about the (IMO) best current model of how consciousness works in general, since I reviewed a post on this topic:

https://www.lesswrong.com/posts/FQhtpHFiPacG3KrvD/seth-explains-consciousness#7ncCBPLcCwpRYdXuG

Comment by Noosphere89 (sharmake-farah) on The Dilemma’s Dilemma · 2025-03-07T18:40:16.848Z · LW · GW

I was implicitly assuming a closed system here, to be clear.

The trick that makes the game locally positive sum is that the earth isn't a closed system relative to the sun, and when I said globally I was referring to the entire accessible universe.

Thinking about that though, I now think this is way less relevant except on extremely long timescales, but the future may be dominated by very long-term people, so this does matter again.

Comment by Noosphere89 (sharmake-farah) on Can a finite physical device be Turing equivalent? · 2025-03-06T23:35:26.018Z · LW · GW

I think I understand the question now.

I actually agree that if we assume that there's a finite maximum of atoms, we could in principle reformulate the universal computer as a finite state automaton, and if we were willing to accept the non-scalability of a finite state automaton, this could actually work.

The fundamental problem is that now we would have software that only works up to a specified memory limit, because we essentially burned the software into the hardware of the finite automaton and if you are ever uncertain of how much memory or time a problem requires, or more worryingly if we were ever uncertain about how much resources we could actually use, then our "software" for the finite automaton is no longer usable and we'd have to throw it away and recreate a new computer for every input length.

Turing Machine models automatically handle arbitrarily large inputs without having to throw away expensive work on developing the software.

So in essence, if you want to handle the most general case, or believe unbounded atoms are possible, like me, then you really want the universal computer architecture of modern computers.

The key property of real computers that makes them Turing Complete in theory is that they can scale with more memory and time arbitrarily without changing the system descriptior/code.

More below:

(If we assume that we can only ever get access to a finite number of atoms. If you dispute this I won't argue with that, neither of us has a Theory of Everything to say for certain.)

https://www.dwarkesh.com/p/adam-brown

Comment by Noosphere89 (sharmake-farah) on Can a finite physical device be Turing equivalent? · 2025-03-06T22:14:52.146Z · LW · GW

Notably, this is why we focus on the arbitrarily large memory and time case, where we can assume that the machine has arbitrarily large memory and time to work with.

The key question here is whether a finite physical computer can always be extended with more memory and time without requiring us to recode the machine into a different program/computer, and most modern computers can do this (modulo physical issues of how you integrate more memory and time).

In essence, the key property of modern computers is that the code/systems descriptor doesn't change if we add more memory and time, and this is the thing that leads to Turing-completeness if we allow unbounded memory and time.

Comment by Noosphere89 (sharmake-farah) on Can a finite physical device be Turing equivalent? · 2025-03-06T22:14:17.010Z · LW · GW

Notably, this was exactly the sort of belief I was trying to show is false, and your observation about the physical universe does not matter for the argument I made here, because the question is whether with say 2^1000000 atoms, you can solve larger problem sizes with the same code, and Turing-complete systems say yes to the question.

In essence, it's a question of whether we can scale our computers with more memory and time without having to change the code/algorithms, and basically all modern computers can do this in theory.

I think a much more interesting question is why TC machines are — despite only existing in theory — such useful models for thinking about real-world computers. There is obviously some approximation going on here, where for the vast majority of real-world problems, you can write them in such a way that they work just fine with finite RAM. (E.g. the cases where we would run out of RAM are quite easy to predict, and don't just crop up randomly.)

Short version, because you can always extend them with more memory and time, and it really matters a lot in practical computing if you can get a general coding solution that is also cheap to work with, because it can handle upgrades to it's memory very well.

In essence, the system descriptor/code doesn't have to change if the memory and time increases (for a finite state machine or look-up table, they would have to change.

Comment by Noosphere89 (sharmake-farah) on Can a finite physical device be Turing equivalent? · 2025-03-06T22:05:33.747Z · LW · GW

Every Turing machine definition I've ever seen says that the tape has to be truly unbounded. How that's formalized varies, but it always carries the sense that the program doesn't ever have to worry about running out of tape. And every definition of Turing equivalence I've ever seen boils down to "can do any computation a Turing machine can do, with at most a bounded speedup or slowdown". Which means that programs on Turing equivalent computer must not have to worry about running out of storage.

You can't in fact build a computer that can run any arbitrary program and never run out of storage.

One of the explicitly stated conditions of the definition is not met. How is that not relevant to the definition?

Yes, this is correct, with the important caveat that the memory is unbounded by the systems descriptor/programming language, not the physical laws or anything else which is the key thing you missed.

Essentially speaking, it's asking if modern computers can cope with arbitrary extensions to their memory and time and reliability without requiring us to write new programming languages/coding, not if a specific computer at a specified memory and time limit is Turing complete.

Looking at your comment more, I think this disagreement is basically a definitional dispute, in a way, because I allow machines that are limited by the laws of physics but are not limited by their systems descriptor/programming language to be Turing complete, while you do not, and I noticed we had different definitions that led to different results.

I suspect this was due to focusing on different things, where I was focused on the extensibility of the computer concept as well as the more theoretical aspects, whereas you were much more focused on the low level situation.

A crux might be that I definitely believe that given an unlimited energy generator, it is very easy to to create a universal computer out of it, and I think energy is much, much closer to a universal currency than you do.

Comment by Noosphere89 (sharmake-farah) on A Case for the Least Forgiving Take On Alignment · 2025-03-06T18:48:26.685Z · LW · GW

Nowadays, I think the main reason humans took off is because human hands were extremely suited for tool use and being at range, which means that there is a selection effect at both the genetic level for more general intelligence and a selection effect on cultures for more cultural learning, and animals just mostly lack this by default, meaning that their intelligence is way less relevant than their lack of good actuators for tool use.

Comment by Noosphere89 (sharmake-farah) on Can a finite physical device be Turing equivalent? · 2025-03-06T18:25:20.097Z · LW · GW

Great, "unbounded" isn't the same as "infinite", but in fact all physically realizable computers are bounded. There's a specific finite amount of tape available. You cannot in fact just go down to the store and buy any amount of tape you want. There isn't unlimited time either. Nor unlimited energy. Nor will the machine tolerate unlimited wear.

Yes, but that's not relevant to the definition of Turing equivalence/completeness/universality.

The question isn't if the specific computer at your hands can solve all Turing-computable problems, but rather if we had the ability to scale a computer's memory, time and reliability indefinitely, could we solve the problem on an unbounded input and output domain without changing the code/descriptor?

And for a lot of popular programming languages, like Lisp or Lambda Calculus, this is true.

For that matter, real computers can't even address unlimited storage, nor is there a place to plug it in. You can't in fact write a 6502 assembly language program to solve a problem that requires more than 64kiB of memory. Nor an assembly language program for any physically realized computer architecture, ever, that can actually use unbounded memory.

My guess is that the issues are fundamentally because writing an assembly programming language that used arbitrary precision arithmetic/arbitrary precision operations would make programs a whole lot slower by constant factors, so there is no incentive to make assembly programming language that is Turing-complete, and at any rate is already duplicative of the high-level programming languages like Java or Lisp, and you can write a program in Lisp that duplicates the assembly's functionality.

And at any rate, there exist assembly languages that are Turing Complete, so this is irrelevant.

More here:

On X86 being Turing Complete in at least 3 ways:

X86 shenanigans:
MMU shuffle computer RAM around to make programming easier; if a program sets up its share of memory properly, it can execute arbitrary computations via MMU page-faults (comments; paper) without ever running code itself by turning the MMU faulting mechanism into a one-instruction set computer.
mov: “mov is Turing-complete”: the apparently innocuous x86 assembler instruction mov, which copies data between the CPU & RAM, can be used to implement a transport-triggered-architecture one instruction set computer, allowing for playing Doom (and for bonus points, it can be done using xor too—there are many such TC one-instruction set computers, such as ByteByteJump)
register-less X86: “x86 is Turing-complete with no registers”

On implementation issues like fixed-precision vs arbitrary precision arithmetic:

Before we dive in, let me clear something up. My critiques below are not about hardware limitations or implementation. Some models rely on numerical values with arbitrary precision to ‘attend’ to tokens in the sequence. That’s fine. It’s common to use 64-bit floating point precision types, which are incapable of representing arbitrary precision values, but that’s an implementation choice separate from the abstract model of a transformer. In the same way that you can switch to a computer with more memory, you can always switch to higher fixed-precision to run a transformer on something that needs that extra boost to execute properly. You can even abandon the floating-point format altogether and store numbers as strings representing the numerator and denominator. As long as you don’t rely on infinite-precision, arbitrary finite precision is easily implementable in a digital representation scheme.

https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic

There are always going to be Turing-computable problems that your physical device cannot solve. Playing word games, or twisting what you'll accept as being Turing-equivalent, doesn't change that fundamental limitation. Actual physics strictly limit the real usefulness of the Turing abstraction. Use it when it makes sense, but there's no point in pretending it applies to physical computers in any strong way.

The main value of the Turing abstraction in our universe is that it isolates where the bottleneck actually is, and that the bottleneck is not about algorithms/developing new codes to solve specific problems of specific input sizes (with the enormous caveat that if we care about how efficient a program is, and don't just care about whether we can solve a problem, then algorithmic considerations become relevant), but rather that the bottleneck is energy, which gives us memory, time and reliability.

And from the perspective of the early 20th century, this was no small feat.

User info

Posts

Comments