Posts

Francois Chollet inadvertently limits his claim on ARC-AGI 2024-07-16T17:32:00.219Z
The problems with the concept of an infohazard as used by the LW community [Linkpost] 2023-12-22T16:13:54.822Z
What's the minimal additive constant for Kolmogorov Complexity that a programming language can achieve? 2023-12-20T15:36:50.968Z
Arguments for optimism on AI Alignment (I don't endorse this version, will reupload a new version soon.) 2023-10-15T14:51:24.594Z
Hilbert's Triumph, Church and Turing's failure, and what it means (Post #2) 2023-07-30T14:33:25.180Z
Does decidability of a theory imply completeness of the theory? 2023-07-29T23:53:08.166Z
Why you can't treat decidability and complexity as a constant (Post #1) 2023-07-26T17:54:33.294Z
An Opinionated Guide to Computability and Complexity (Post #0) 2023-07-24T17:53:18.551Z
Conditional on living in a AI safety/alignment by default universe, what are the implications of this assumption being true? 2023-07-17T14:44:02.083Z
A potentially high impact differential technological development area 2023-06-08T14:33:43.047Z
Are computationally complex algorithms expensive to have, expensive to operate, or both? 2023-06-02T17:50:09.432Z
Does reversible computation let you compute the complexity class PSPACE as efficiently as normal computers compute the complexity class P? 2023-05-09T13:18:09.025Z
Are there AI policies that are robustly net-positive even when considering different AI scenarios? 2023-04-23T21:46:40.952Z
Can we get around Godel's Incompleteness theorems and Turing undecidable problems via infinite computers? 2023-04-17T15:14:40.631Z
Best arguments against the outside view that AGI won't be a huge deal, thus we survive. 2023-03-27T20:49:24.728Z
A case for capabilities work on AI as net positive 2023-02-27T21:12:44.173Z
Some thoughts on the cults LW had 2023-02-26T15:46:58.535Z
How seriously should we take the hypothesis that LW is just wrong on how AI will impact the 21st century? 2023-02-16T15:25:42.299Z
I've updated towards AI boxing being surprisingly easy 2022-12-25T15:40:48.104Z
A first success story for Outer Alignment: InstructGPT 2022-11-08T22:52:54.177Z
Is the Orthogonality Thesis true for humans? 2022-10-27T14:41:28.778Z
Logical Decision Theories: Our final failsafe? 2022-10-25T12:51:23.799Z
How easy is it to supervise processes vs outcomes? 2022-10-18T17:48:24.295Z
When should you defer to expertise? A useful heuristic (Crosspost from EA forum) 2022-10-13T14:14:56.277Z
Does biology reliably find the global maximum, or at least get close? 2022-10-10T20:55:35.175Z
Is the game design/art maxim more generalizable to criticism/praise itself? 2022-09-22T13:19:00.438Z
In a lack of data, how should you weigh credences in theoretical physics's Theories of Everything, or TOEs? 2022-09-07T18:25:52.750Z
Can You Upload Your Mind & Live Forever? From Kurzgesagt - In a Nutshell 2022-08-19T19:32:12.434Z
Complexity No Bar to AI (Or, why Computational Complexity matters less than you think for real life problems) 2022-08-07T19:55:19.939Z
Which singularity schools plus the no singularity school was right? 2022-07-23T15:16:19.339Z
Why AGI Timeline Research/Discourse Might Be Overrated 2022-07-20T20:26:39.430Z
How humanity would respond to slow takeoff, with takeaways from the entire COVID-19 pandemic 2022-07-06T17:52:16.840Z
How easy/fast is it for a AGI to hack computers/a human brain? 2022-06-21T00:34:34.590Z
Noosphere89's Shortform 2022-06-17T21:57:43.803Z

Comments

Comment by Noosphere89 (sharmake-farah) on Confusing the metric for the meaning: Perhaps correlated attributes are "natural" · 2024-07-26T16:01:21.583Z · LW · GW

This wasn't specifically connected to the post, just providing general commentary.

Comment by Noosphere89 (sharmake-farah) on Confusing the metric for the meaning: Perhaps correlated attributes are "natural" · 2024-07-25T16:14:13.865Z · LW · GW

If I were to take anything away from this, it's that you can have cognition/intelligence that is efficient, or rational/unexploitable cognition like full-blown Bayesianism, but not both.

And that given the constraints of today, it is far better to have efficient cognition than rational/unexploitable cognition, because the former can actually be implemented, while the latter can't be implemented at all.

Comment by Noosphere89 (sharmake-farah) on Optimistic Assumptions, Longterm Planning, and "Cope" · 2024-07-18T20:07:59.659Z · LW · GW

My point isn't that the easier option always exists, or even that a problem can't be impossible.

My point is that if you are facing a problem that requires 1-shot complete plans, and there's no second try, you need to do something else.

There is a line where a problem becomes too difficult to work on productively, and that 1-shot constraint is a strong sign of an impossible problem (if the constraint actually exists).

Comment by Noosphere89 (sharmake-farah) on Francois Chollet inadvertently limits his claim on ARC-AGI · 2024-07-18T16:54:44.095Z · LW · GW

I was focusing on runs eligible for the prize in this short linkpost.

Comment by Noosphere89 (sharmake-farah) on Optimistic Assumptions, Longterm Planning, and "Cope" · 2024-07-18T16:53:55.945Z · LW · GW

Plans obviously need some robustness to things going wrong, and I weakly agree with John Wentworth that some robustness is a necessary feature of a plan and that some verification is actually necessary.

But I also agree that there is a real failure mode identified by moridinamael and Quintin Pope, and that is perfectionism: discarding ideas too quickly as not useful. This constraint is the essence of perfectionism:

I have an exercise where I give people the instruction to play a puzzle game ("Baba is You"), but where you normally have the ability to move around and interact with the world to experiment and learn things, instead, you need to make a complete plan for solving the level, and you aim to get it right on your first try.

It asks both for a complete plan to solve the whole level and for the plan to work on the first try, which, outside of this context, implies either that the problem is likely unsolvable or that you are being too perfectionist in your demands.

In particular, I think that Quintin Pope's comment here is genuinely something that applies in lots of science and problem-solving, and that it's actually quite difficult to reason well about the world in general without many experiments.

Comment by Noosphere89 (sharmake-farah) on Optimistic Assumptions, Longterm Planning, and "Cope" · 2024-07-18T16:36:38.527Z · LW · GW

What I take away from this is that they should have separated the utility of an assumption being true from the probability/likelihood of it being true, and indeed this shows some calibration problems.

There is a tendency to slip into more convenient worlds for reasons based on utility rather than evidence, which is a problem (assuming the underlying problem is actually solvable for you).

This is an important takeaway, but I don't think your other takeaways help as much as this one.

That said, this constraint IRL makes almost all real-life problems impossible for humans and AIs:

I have an exercise where I give people the instruction to play a puzzle game ("Baba is You"), but where you normally have the ability to move around and interact with the world to experiment and learn things, instead, you need to make a complete plan for solving the level, and you aim to get it right on your first try.

In particular, if such a constraint exists, then it's a big red flag that the problem you are solving is impossible to solve, given that constraint.

Almost all plans fail on the first try, even really competent plans made by really competent humans, and outside of very constrained regimes, essentially no plan works out on the first try.

Thus, if you are truly in a situation where you are encountering such constraints, you should give up on the problem ASAP, after pausing a little to make sure that the constraint actually exists.

So while this is a fun experiment, with real takeaways, I'd warn people that constraining a plan to work on the first try and requiring completeness makes lots of problems impossible to solve for us humans and AIs.

Comment by Noosphere89 (sharmake-farah) on Paper: LLMs trained on “A is B” fail to learn “B is A” · 2024-07-12T15:23:58.012Z · LW · GW

Very interesting. Yeah, I'm starting to doubt the idea that the Reversal Curse is any sort of problem for LLMs at all, and it's probably trivial to fix.

Comment by Noosphere89 (sharmake-farah) on jacquesthibs's Shortform · 2024-07-12T04:19:55.029Z · LW · GW

In retrospect, I probably should have updated much less than I did. I thought that it was actually testing a real LLM, which makes me less confident in the paper.

Should have responded long ago, but responding now.

Comment by Noosphere89 (sharmake-farah) on Daniel Kokotajlo's Shortform · 2024-07-11T21:14:05.612Z · LW · GW

Where are your DMs so I can get the links?

Comment by Noosphere89 (sharmake-farah) on When Are Results from Computational Complexity Not Too Coarse? · 2024-07-06T03:27:13.454Z · LW · GW

Yeah, I probably messed up here quite a bit, sorry.

Comment by Noosphere89 (sharmake-farah) on Static Analysis As A Lifestyle · 2024-07-04T15:48:52.518Z · LW · GW

The point is that it can get really, really hard for a static analyzer to be complete if you ask for enough generality in your static analyzer.

The proof basically works by showing that if you had a program that could, say, automatically find bugs in programs and verify that a program meets its specification, or decide whether a program platonically implements the square function on every input, or decide any other non-trivial semantic property, then we could convert it into a program that solves the halting problem, and thus it would have to be powerful enough to decide every recursively enumerable problem.
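
To make the shape of that argument concrete, here is a minimal sketch of the reduction (my own illustration; `decides_squares`, `halts`, and the inner `simulate` are hypothetical names, and the property-checker cannot actually exist as a total procedure):

```python
# Hypothetical oracle: returns True iff the given program computes n -> n*n on
# every input (a non-trivial semantic property). Rice's theorem says no total
# computable procedure like this can exist; we assume it only for the reduction.
def decides_squares(program_source: str) -> bool:
    raise NotImplementedError("cannot exist as a total computable procedure")

def halts(machine_source: str, machine_input: str) -> bool:
    """If decides_squares existed, we could decide the halting problem."""
    # Build a program Q(x) that first simulates the given machine on the given
    # input (possibly forever), and only then returns x * x. Q computes the
    # squaring function if and only if the simulated machine halts.
    q_source = f"""
def Q(x):
    simulate({machine_source!r}, {machine_input!r})  # hypothetical universal interpreter
    return x * x
"""
    return decides_squares(q_source)
```

Since the halting problem is undecidable, no such `decides_squares` (or any other checker of a non-trivial semantic property) can be a total computable procedure.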

For a more practical example of static analysis being hard, the many NP- and coNP-completeness results, and even the PSPACE-completeness of problems like model checking, show that unless huge assumptions are made about physics, completeness of static analysis is a pipe dream even in limited areas.

Static analysis will likely be very hard for a long time to come.

Comment by Noosphere89 (sharmake-farah) on Static Analysis As A Lifestyle · 2024-07-04T15:14:45.287Z · LW · GW

I want to point out one other big reason static analysis is incomplete in practice: it's basically impossible to get complete static analysis for lots of real-world programs, even in limited areas, without huge discoveries in physics that would demand extraordinary evidence. The best example of this is Rice's theorem:

https://en.wikipedia.org/wiki/Rice's_theorem

This is a huge limiter on how much static analysis we can perform in practice, though a more relevant result would probably be the coNP-completeness of the tautology problem, which is again related to static analysis.

Comment by Noosphere89 (sharmake-farah) on When Are Results from Computational Complexity Not Too Coarse? · 2024-07-04T01:56:35.690Z · LW · GW

While this is a useful result, I'd caution that lots of NP-complete problems are not like this, with easy parameterized complexity despite hard general complexity: assuming FPT != W[1], NP-complete problems like the Clique problem are still basically impossible to solve in practice, so be wary of relying on parameterized complexity too much.

That also neatly solves the issue of whether P vs NP matters in practice: The answer is very likely yes, it does matter a lot in practice.
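
To make the Clique point concrete, here is a brute-force k-clique check (a sketch of my own, not from the post): it inspects on the order of C(n, k) ≈ n^k vertex subsets, and W[1]-hardness is evidence that this dependence of the exponent on k can't be removed.

```python
from itertools import combinations

def has_k_clique(adj: dict[int, set[int]], k: int) -> bool:
    """Brute force: test every k-subset of vertices, roughly C(n, k) of them."""
    for candidate in combinations(adj, k):
        # A candidate is a clique iff every pair of its vertices is adjacent.
        if all(v in adj[u] for u, v in combinations(candidate, 2)):
            return True
    return False

# Tiny example: a triangle plus an isolated vertex.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: set()}
print(has_k_clique(graph, 3))  # True
print(has_k_clique(graph, 4))  # False
```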

Comment by Noosphere89 (sharmake-farah) on Dalcy's Shortform · 2024-07-04T01:47:23.674Z · LW · GW

as an aside, does the P vs NP distinction even matter in practice?

Yes, it does, for several reasons:

  1. It basically is necessary to prove P != NP to get a lot of other results to work, and for some of those results, proving P != NP is sufficient.

  2. If P != NP (As most people suspect), it fundamentally rules out solving lots of problems generally and quickly without exploiting structure, and in particular lets me flip the burden of proof to the algorithm maker to explain why their solution to a problem like SAT is efficient, rather than me having to disprove the existence of an efficient algorithm.

Any efficient solution to such a problem is either exploiting structure, somehow backed by a proof that P = NP, or relying on new physics that enables computing NP-complete problems efficiently, and the latter two need very, very strong evidence behind them.

This in particular applies to basically all learning problems in AI today (see the brute-force sketch below for what not exploiting structure looks like).

  3. It explains why certain problems cannot reasonably be solved optimally without huge discoveries; the best examples are the travelling salesman problem and a whole lot of other NP-complete problems. There are also other NP problems where there isn't a way to solve them efficiently at all, especially if FPT != W[1] holds.

Also note that we expect a lot of NP-complete problems not to be solvable by fast algorithms even in the average case, which means the distinction is likely to be very relevant quite a lot of the time; we don't have to limit ourselves to the worst case either.
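
As a concrete illustration of the "exploit structure or pay exponentially" point above (my own sketch, not part of the original comment), here is what a structure-blind SAT check looks like; it enumerates all 2^n assignments, which is exactly the cost that practical solvers avoid by exploiting structure:

```python
from itertools import product

def brute_force_sat(clauses: list[list[int]], n_vars: int) -> bool:
    """Try all 2^n assignments; literal k means variable k, and -k its negation."""
    for assignment in product([False, True], repeat=n_vars):
        # The formula is satisfied if every clause has at least one true literal.
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(brute_force_sat([[1, 2], [-1, 3], [-2, -3]], 3))  # True, e.g. x1=T, x2=F, x3=T
```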

Comment by Noosphere89 (sharmake-farah) on Value Claims (In Particular) Are Usually Bullshit · 2024-06-02T17:07:48.526Z · LW · GW

The big reason that value claims tend to be on the more bullshit side is that values/morality has far, far more degrees of freedom than most belief claims, primarily because there are too many right answers to the question of what is ethical.

Belief claims can also have this sort of effect (I believe the Mathematical Multiverse/Simulation Hypothesis ideas from Max Tegmark and others like Nick Bostrom, while true, are basically useless for almost any attempt at prediction, because they allow basically everything to be predicted; they are extremely weak predictive models even if they are extremely strong generative models, which is why I hate the discourse on the Simulation/Mathematical Universe hypotheses), but value claims tend to be the worst offenders at not being entangled with evidence and having far too many right answers.

Comment by Noosphere89 (sharmake-farah) on Catastrophic Goodhart in RL with KL penalty · 2024-05-17T23:06:50.273Z · LW · GW

My expectation is that error and utility are both extremely heavy-tailed, with tails of arguably the same order of magnitude.

But thanks for answering. The real answer is that we can predict effectively nothing without independence, and thus we can justify virtually any outcome of real-life Goodhart.

Maybe it's catastrophic, maybe it doesn't matter, or maybe there's anti-goodhart, but I don't see a way to predict what will reasonably happen.

Also, why do you think that error is heavier tailed than utility?

Comment by Noosphere89 (sharmake-farah) on Catastrophic Goodhart in RL with KL penalty · 2024-05-17T17:28:59.321Z · LW · GW

I have a question about this post, and it has to do with the case where both utility and error are heavy tailed:

Where does the expected value converge to if both utility and errors are heavy tailed? Is it 0, infinity, some other number, or does it not converge to any number at all?

Comment by Noosphere89 (sharmake-farah) on Please stop publishing ideas/insights/research about AI · 2024-05-02T22:12:56.775Z · LW · GW

Privacy of communities isn't a solvable problem in general, as soon as your community is large enough to compete with the adversary, it's large enough and conspicuous enough that the adversary will pay attention to it and send in spies and extract leaks.

I disagree with this in theory as a long-term concern, but yes, in practice the methods for community privacy haven't been implemented or tested at all, and I agree with the general sentiment that protecting secrets isn't worth the steep drawbacks of privacy, which does unfortunately make me dislike the post due to the strength of its recommendations.

So while I could in theory disagree with you, in practice right now I mostly have to agree with the comment that there will not be such an infrastructure for private alignment ideas.

Also to touch on something here that isn't too relevant and could be considered a tangent:

If your acceptable lower limit for basically anything is zero you wont be allowed to do anything, really anything.

This is why perfectionism is such a bad thing, and why you need to be able to accept that failure happens. You cannot have 0 failures IRL.

Comment by Noosphere89 (sharmake-farah) on tlevin's Shortform · 2024-05-02T14:59:21.613Z · LW · GW

Unless you're talking about financial conflicts of interest, but there are also financial incentives for orgs pursuing a "radical" strategy to downplay boring real-world constraints, as well as social incentives (e.g. on LessWrong IMO) to downplay boring these constraints and cognitive biases against thinking your preferred strategy has big downsides.

It's not just that problem, though; they will likely also be biased toward thinking that their policy is helpful for AI safety at all, and this is a point that sometimes gets forgotten.

But correct on the fact that Akash's argument is fully general.

Comment by Noosphere89 (sharmake-farah) on The first future and the best future · 2024-05-01T16:12:06.188Z · LW · GW

I kind of agree with this, and this is where I differ quite a lot from many e/accs and AI progress boosters.

However, I think 2 things matter here that limit the force of this, though I don't know to what extent:

  1. People have pretty different values, and while I mostly don't consider it a bottleneck to alignment as understood on LW, it does impact this post specifically because there are differences in what people consider the best future, and this is why I'm unsure that we should pursue your program specifically.

  2. I think there are semi-reasonable arguments that lock-in concerns are somewhat overstated, and while I don't totally buy them, they are at least somewhat reasonable, and thus I don't fully support the post at this time.

However, this post has a lot of food for thought, especially given my world model of AI development is notably skewed more towards optimistic outcomes than most of LW by a lot, so thank you for at least trying to argue for a slow down without assuming existential risk.

Comment by Noosphere89 (sharmake-farah) on Against John Searle, Gary Marcus, the Chinese Room thought experiment and its world · 2024-04-16T15:07:45.830Z · LW · GW

I have a better argument now, and the answer is that the argument fails in the conclusion.

The issue is that, conditional on assuming that a computer program (speaking very generally here) is able to give a correct response to every input of Chinese characters and knows the rules of Chinese completely, it must know/understand Chinese in order to do the things Searle claims it is doing, and in this instance we'd say that it does understand/decide Chinese for all purposes.

Basically, I'm claiming that the premises lead to a different, opposite conclusion.

These premises:

“Imagine a native English speaker who knows no Chinese locked in a room full of boxes of Chinese symbols (a data base) together with a book of instructions for manipulating the symbols (the program). Imagine that people outside the room send in other Chinese symbols which, unknown to the person in the room, are questions in Chinese (the input). And imagine that by following the instructions in the program the man in the room is able to pass out Chinese symbols which are correct answers to the questions (the output).

assuming that every input has in fact been used, contradict this conclusion:

The program enables the person in the room to pass the Turing Test for understanding Chinese but he does not understand a word of Chinese.”

The correct conclusion, given all the assumptions, is that they do understand/decide Chinese completely.

The one-sentence slogan is "Look-up table programs are a valid form of intelligence/understanding, albeit the most inefficient form of intelligence/understanding."

What it does say is that, without any restrictions on how the program computes Chinese (or any other problem) beyond giving a correct answer to every input, the answer to the question "Is it intelligent on this specific problem / does it understand this specific problem?" is always yes; to have the possibility of a no, you need to add more restrictions than that.
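
As a toy illustration of that slogan (my own sketch; the entries are invented placeholders, not anything from Searle or the post), the "program" can literally be an exhaustive input-to-output table, which is behaviorally complete over the inputs it covers but maximally inefficient, since it needs one stored entry per possible input:

```python
# A toy "room": an exhaustive question -> answer table. Behaviorally it answers
# every covered input correctly; the price is one stored entry per possible input.
chinese_room_table = {
    "你好吗？": "我很好，谢谢。",          # "How are you?" -> "I'm fine, thanks."
    "今天天气怎么样？": "今天天气很好。",  # "How's the weather?" -> "The weather is nice."
}

def answer(question: str) -> str:
    # No rule is ever "understood" at runtime; the answer is only looked up.
    return chinese_room_table[question]

print(answer("你好吗？"))
```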

Comment by Noosphere89 (sharmake-farah) on When is Goodhart catastrophic? · 2024-04-15T19:24:32.654Z · LW · GW

Essentially, the paper's model requires, by assumption, that it is impossible to get any efficiency gains (like "don't sleep on the floor" or "use this more efficient design instead) or mutually-beneficial deals (like helping two sides negotiate and avoid a war).

Yeah, that was an assumption I didn't realize was there, because I thought the assumption was solely that we had a limited budget and that every increase in a feature has a non-zero cost, which is a very different assumption.

I sort of wish the assumptions were distinguished, because these are very, very different assumptions (for example, you can have positive-sum interactions/trade so long as the cost is sufficiently low and the utility gain is sufficiently high, which is pretty usual.)

Comment by Noosphere89 (sharmake-farah) on When is Goodhart catastrophic? · 2024-04-15T17:45:13.574Z · LW · GW

The real issue IMO is assumption 1, the assumption that utility strictly increases. Assumption 2 is, barring rather exotic regimes far into the future, basically always correct, and for irreversible computation it always holds, since there's a minimum cost to increasing the features IRL, and it isn't 0.

Increasing utility IRL is not free.

Assumption 1 is plausibly violated for some goods, provided utility grows more slowly than logarithmically, but the worry here is that status might actually be a good whose utility strictly increases, at least relatively speaking.

Comment by Noosphere89 (sharmake-farah) on Inference cost limits the impact of ever larger models · 2024-04-13T16:21:03.447Z · LW · GW

My general prior on inference cost is that it is the same order of magnitude as training cost, and thus neither dominates the other in general, due to tradeoffs.

I don't remember where I got that idea from, though.

Comment by Noosphere89 (sharmake-farah) on How does the ever-increasing use of AI in the military for the direct purpose of murdering people affect your p(doom)? · 2024-04-13T01:24:35.863Z · LW · GW

I basically agree with John Wentworth here that it doesn't affect p(doom) at all, but one thing I will say is that it makes claims that humans will make the decisions/be accountable once AI gets very useful rather hard to credit.

More generally, one takeaway I see from the military's use of AI is that there are strong pressures to let AI systems operate on their own, and this is going to be surprisingly important in the future.

Comment by Noosphere89 (sharmake-farah) on Ackshually, many worlds is wrong · 2024-04-11T21:08:01.040Z · LW · GW

My read of the post is not that many worlds is wrong, but rather that it's not uniquely correct, that many worlds has some issues of its own, and that other theories are at least coherent.

Is this a correct reading of this post?

Comment by Noosphere89 (sharmake-farah) on Any evidence or reason to expect a multiverse / Everett branches? · 2024-04-10T01:52:49.127Z · LW · GW

What's the technical objection you have to it?

Comment by Noosphere89 (sharmake-farah) on On green · 2024-03-26T17:19:08.056Z · LW · GW

Yeah, the basic failure mode of green is that it relies on cartoonish descriptions of nature that are much closer to Pocahontas, or really any Disney movie, than to real-life nature, and in general it is extremely non-self-reliant, in the sense that it relies heavily on both Blue's and Red's efforts to preserve the idealized Green.

Otherwise, it collapses into large scale black and arguably red personalities of nature.

Comment by Noosphere89 (sharmake-farah) on Natural Latents: The Concepts · 2024-03-21T00:00:34.589Z · LW · GW

Your point on laws and natural abstractions expresses nicely a big problem with postmodernism that was always there, but wasn't clearly pointed out:

Natural abstractions, and more generally almost every concept, are subjective in the sense that people can change what a concept means. But that doesn't mean you can deny the concept/abstraction and instantly make it ineffective: you actually have to do real work, and importantly change stuff in the world. You can't simply assign different meanings or different concepts to the same data and expect the concept to no longer work; you actually have to change the behavior of lots of other humans, and if you fail, the concept is still real.

This also generalizes to a lot of other abstractions like gender or sexuality, where real work, especially in medicine and biotech, is necessary if you want concepts of gender or sex to change drastically.

This is why a lot of postmodernism is wrong to claim that denying a concept automatically negates its power: you have to do real work to change concepts, which is why I tend to favor technological progress.

I'll put the social concepts one in the link below, because it's so good as a response to postmodernism:

https://www.lesswrong.com/posts/mMEbfooQzMwJERAJJ/natural-latents-the-concepts#Social_Constructs__Laws

Comment by Noosphere89 (sharmake-farah) on 'Empiricism!' as Anti-Epistemology · 2024-03-19T22:09:56.989Z · LW · GW

My main disagreement is that I actually do think that at least some of the critiques are right here.

In particular, the claim of Quintin Pope's that I think is right is that evolution is extremely different from how we train our AIs, and thus none of the inferences that work under an evolution model carry over to the AIs under consideration. That importantly includes a lot of analogies to apes/Neanderthals making smarter humans (which they didn't do, BTW) who presumably failed to be aligned, ergo we can't align AI smarter than us.

The basic issue, though, is that evolution doesn't have a purpose or goal, and thus the common claim that evolution failed to align humans to X is nonsensical, as it assumes a teleological goal that just does not exist in evolution, which is quite different from humans making AIs with particular goals in mind. Thus, talk of an alignment problem between, say, chimps/Neanderthals and humans is entirely nonsensical. This is also why the generalized example of misgeneralization below fails to work: evolution is not a trainer or designer in the way that, say, an OpenAI employee making an AI would be, and thus there is no generalization error, since there wasn't a goal or behavior to purposefully generalize in the first place:

"In the ancestral environment, evolution trained humans to do X, but in the modern environment, they do Y instead."

There are other problems with the analogy that Quintin Pope covered, like the fact that it doesn't actually capture misgeneralization correctly, since the ancient/modern human distinction is not the same as one AI doing a treacherous turn, or how the example of ice cream overwhelming our reward center isn't misgeneralization, but the fact that evolution has no purpose or goal is the main problem I see with a lot of evolution analogies.

Another issue is that evolution is extremely inefficient at the timescales required, which is why dominant training methods for AI borrow little from evolution at best, and even from an AI capabilities perspective it's not really worth it to rerun evolution to get AI progress.

Some other criticisms from Quintin Pope that I agree with are that current AI can already self-improve, albeit more weakly and with more limits than humans (though I agree much less strongly here than Quintin Pope does), and that the security mindset is very misleading and predicts things in ML that don't actually happen at all, which is why I don't think adversarial assumptions are good unless you can solve the problem in the worst case easily, or just as easily as in the non-adversarial case.

Comment by Noosphere89 (sharmake-farah) on Deconstructing Bostrom's Classic Argument for AI Doom · 2024-03-14T05:06:06.165Z · LW · GW

The thing I'll say on the orthogonality thesis is that I think it's actually fairly obvious, but only because it makes extremely weak claims, in that it's logically possible for AI to be misaligned, and the critical mistake is assuming that possibility translates into non-negligible likelihood.

It's useful for history purposes, but is not helpful at all for alignment, as it fails to answer essential questions.

Comment by Noosphere89 (sharmake-farah) on Some (problematic) aesthetics of what constitutes good work in academia · 2024-03-12T22:17:26.955Z · LW · GW

Yeah, something like the Alignment Forum would actually be pretty good, and while LW/AF has a lot of problems, most of them are attributable to the people and culture around here rather than to the platform's merits.

LW/AF tools would be extremely helpful for a lot of scientists, once you divorce the culture from them.

Comment by Noosphere89 (sharmake-farah) on Evolution did a surprising good job at aligning humans...to social status · 2024-03-11T15:31:03.907Z · LW · GW

Note that this doesn't undermine the post, because its thesis only gets stronger if we assume that more alignment attempts, like romantic love or altruism, generalized, because that could well imply that control or alignment is actually really easy to generalize, even when the intelligence of the aligner is far less than that of the alignee.

This suggests that scalable oversight is either a non-problem, or a problem only at ridiculous levels of disparity, and suggests that alignment does generalize quite far.

This, as well as my belief that current alignment designers have far more tools in their alignment toolkit than evolution had makes me extremely optimistic that alignment is likely to be solved before dangerous AI.

Comment by Noosphere89 (sharmake-farah) on philh's Shortform · 2024-03-09T21:03:42.591Z · LW · GW

Only if you can't examine all of the inputs.

The no-free-lunch theorems basically say that if you are unlucky enough with your prior, and the problem to be solved is maximally general, then you can't improve on random sampling/brute-force search, which requires you to examine every input; you can't get away with algorithms that avoid examining all of the inputs.

It's closer to a maximal-inefficiency or inapproximability result for intelligence than an impossibility result, which is still very important.
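
A tiny brute-check of that reading (my own sketch, using the standard "average over all objective functions" setup): over all Boolean functions on a 3-element domain, two different fixed query orders need exactly the same average number of queries to first observe a function's maximum value.

```python
from itertools import product

DOMAIN = [0, 1, 2]

def queries_to_find_max(f: dict[int, int], order: list[int]) -> int:
    """Number of queries until the function's maximum value is first observed."""
    target = max(f.values())
    for i, x in enumerate(order, start=1):
        if f[x] == target:
            return i
    raise AssertionError("unreachable: the maximum always occurs in the domain")

# Enumerate all 2^3 Boolean objective functions on the domain.
functions = [dict(zip(DOMAIN, values)) for values in product([0, 1], repeat=len(DOMAIN))]

for order in ([0, 1, 2], [2, 0, 1]):
    avg = sum(queries_to_find_max(f, order) for f in functions) / len(functions)
    print(order, avg)  # both orders average 1.5 queries
```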

Comment by Noosphere89 (sharmake-farah) on Counting arguments provide no evidence for AI doom · 2024-03-07T03:30:01.843Z · LW · GW

Specifically, I wanted the edit to be a clarification that you only have a <0.1% probability on spontaneous scheming ending the world.

Comment by Noosphere89 (sharmake-farah) on Counting arguments provide no evidence for AI doom · 2024-03-07T01:07:51.595Z · LW · GW

Agree with this hugely, though I could make a partial defense of the confidence given, but yes I'd like this post to be hugely edited.

Comment by Noosphere89 (sharmake-farah) on Counting arguments provide no evidence for AI doom · 2024-02-29T17:15:03.844Z · LW · GW

Hm, are we actually sure singular learning theory actually supports general-purpose search at all?

And how does it support the goal-slot theory?

Comment by Noosphere89 (sharmake-farah) on Counting arguments provide no evidence for AI doom · 2024-02-28T01:00:32.352Z · LW · GW

I actually wish this were done sometime in the future, but I'm okay with focusing on other things for now.

(specifically the Training vs Out Of Distribution test performance experiment, especially on more realistic neural nets.)

Comment by Noosphere89 (sharmake-farah) on On the Proposed California SB 1047 · 2024-02-18T15:22:25.338Z · LW · GW

Odd that ‘a model autonomously engaging in a sustained sequence of unsafe behavior’ only counts as an ‘AI safety incident’ if it is not ‘at the request of a user.’ If a user requests that, aren’t you supposed to ensure the model doesn’t do it?

I actually agree with this. It is a good thing, since a lot of the bill's provisions are useful in the misalignment case but not the misuse case. In particular, I would not support provisions like fully shutting down AI in the misuse case, so I'm happy about that.

Overall, I must say that as an optimist on AI safety, I am reasonably happy with the bill. Admittedly, the devil is in what standards of evidence are required to not have a positive safety determination, and how much evidence would be needed.

Comment by Noosphere89 (sharmake-farah) on Causality is Everywhere · 2024-02-15T01:47:42.640Z · LW · GW

I want to note that just because the probability is 0 for X happening does not in general mean that X can never happen.

A good example of this is that you can decide, with probability 1, whether a randomly chosen program halts, but that doesn't let me turn it into a decision procedure on a Turing machine that analyzes every Turing machine and decides whether it halts, for well-known reasons.

(Oracles and hypercomputation in general can, but that's not the topic for today here.)

In general, one of the most common confusions on LW is assuming that probability 0 means the event can never happen, and that probability 1 means the event must happen.

This is a response to this part of the post.

And while 0 is the mode of this distribution, it’s still just a single point of width 0 on a continuum, meaning the probability of any given effect size being exactly 0, represented by the area of the red line in the picture, is almost 0.
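
A standard example of that distinction (my addition, not from the post): let $X \sim \mathrm{Uniform}(0,1)$. Then

$$P(X = x) = 0 \quad \text{for every } x \in [0,1], \qquad \text{yet } P(X \in [0,1]) = 1,$$

so an outcome of probability zero is realized on every draw; "probability 0" here means "measure zero", not "impossible".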

Comment by Noosphere89 (sharmake-farah) on OpenAI wants to raise 5-7 trillion · 2024-02-09T18:38:22.615Z · LW · GW

That's much more reasonable of a claim, though it might be too high still (but much more reasonable.)

Comment by Noosphere89 (sharmake-farah) on Prediction Markets aren't Magic · 2024-01-30T18:38:20.719Z · LW · GW

Potentially, but that would require a lot of Bitcoin people to admit that government intervention in their activity is at least sometimes good. Given all the other flaws of Bitcoin, like irreversible transactions, it truly is one of those products that isn't valuable at all in the money role except in extreme edge cases, and pretty much every other invention has had more use than this. That is why I think that in order for crypto to be useful, you need to entirely remove the money aspect by some means, and IMO governments are the most practical means of doing so.

Comment by Noosphere89 (sharmake-farah) on Four visions of Transformative AI success · 2024-01-21T16:33:27.893Z · LW · GW

My primary concern here is that biology remains substantial as the most important cruxes of value to me such as love, caring and family all are part and parcel of the biological body.

I'm starting to think a big crux of my non-doominess rests on basically rejecting this premise, alongside the related premise that value is complex and fragile. The arguments for both are surprisingly weak, and the evidence in neuroscience is coming to the opposite conclusion: values and capabilities are fairly intertwined, and the value generators are about as simple and general as we could have gotten, which makes me much less worried about several alignment problems like deceptive alignment.

Comment by Noosphere89 (sharmake-farah) on peterbarnett's Shortform · 2024-01-09T03:55:51.592Z · LW · GW

people have written what I think are good responses to that piece; many of the comments, especially this one, and some posts.

There are responses by Quintin Pope and Ryan Greenblatt that addressed those points. Ryan Greenblatt pointed out that the argument used in support of autonomous learning only distinguishes it from supervised learning if there are data limitations, and that without data limitations we can tell an analogous fast-takeoff story about supervised learning. Quintin Pope has massive comments that I can't really summarize, but one is a general-purpose response to Zvi's post, and the other adds context to the debate between Quintin Pope and Jan Kulveit on culture:

https://www.lesswrong.com/posts/hvz9qjWyv8cLX9JJR/evolution-provides-no-evidence-for-the-sharp-left-turn#hkqk6sFphuSHSHxE4

https://www.lesswrong.com/posts/Wr7N9ji36EvvvrqJK/response-to-quintin-pope-s-evolution-provides-no-evidence#PS84seDQqnxHnKy8i

https://www.lesswrong.com/posts/wCtegGaWxttfKZsfx/we-don-t-understand-what-happened-with-culture-enough#YaE9uD398AkKnWWjz

Comment by Noosphere89 (sharmake-farah) on Deceptive AI ≠ Deceptively-aligned AI · 2024-01-07T22:36:32.575Z · LW · GW

Yep, that's what I was talking about, Seth Herd.

Comment by Noosphere89 (sharmake-farah) on Deceptive AI ≠ Deceptively-aligned AI · 2024-01-07T19:07:14.743Z · LW · GW

I agree with the claim that deception could arise without deceptive alignment, and mostly agree with the post, but I do still think it's very important to recognize that if/when deceptive alignment fails to work, it changes a lot of the conversation around alignment.

Comment by Noosphere89 (sharmake-farah) on Against Almost Every Theory of Impact of Interpretability · 2024-01-05T21:57:14.835Z · LW · GW

I'll admit I overstated it here, but my claim is that once you remove the requirement for arbitrarily good/perfect solutions, it becomes easier to solve the problem. Sometimes, it's still impossible to solve the problem, but it's usually solvable once you drop a perfectness/arbitrarily good requirement, primarily because it loosens a lot of constraints.

Indeed, I think the implication quite badly fails.

I agree it isn't a logical implication, but I suspect your example is very misleading, and that more realistic imperfect solutions won't have this failure mode, so I'm still quite comfortable with using it as an implication that isn't 100% accurate, but more like 90-95+% accurate.

Comment by Noosphere89 (sharmake-farah) on Critical review of Christiano's disagreements with Yudkowsky · 2023-12-28T22:28:38.208Z · LW · GW

Yeah, I feel this is quite similar to OpenAI's plan to defer alignment to future AI researchers, except worse: if we grant that the proposed plan actually makes the augmented humans stably aligned with our values, then it would be far easier to do scalable oversight instead, because we have a bunch of advantages in controlling AIs, like the fact that it is socially acceptable to control AI in ways that wouldn't be socially acceptable if they involved humans, the incentives to control AI being much stronger than the incentives to control humans, etc.

I truly feel like Eliezer has reinvented a plan that OpenAI/Anthropic are already pursuing, except worse, namely deferring alignment work to future intelligences, and Eliezer doesn't realize this, so the comments treat it as though it's something new rather than an existing plan with AI swapped out for humans.

It's not just coy, it's reinventing an idea that's already there, except worse, and he doesn't tell you that if you swap the human for AI, it's already being done.

Link for why AI is easier to control than humans below:

https://optimists.ai/2023/11/28/ai-is-easy-to-control/

Comment by Noosphere89 (sharmake-farah) on In Defense of Epistemic Empathy · 2023-12-28T14:50:22.024Z · LW · GW

I'd say the main flaws in conspiracy theories are that they tend to assume coordination is easy (especially when the conspiracy requires a large group of people to do something), generally assume too much agency/too many homunculi, and underestimate the costs of secrecy, especially when trying to do complicated tasks. As a bonus, a lot of claimed conspiracy theories are told as though they were narratives, which tends to be a general problem around a lot of subjects.

It's already hard enough to cooperate openly, and secrecy amplifies this difficulty a lot, so much so that attempted conspiracies usually go nowhere, and successful conspiracies are a very rare subset of all the conspiracies attempted.

Comment by Noosphere89 (sharmake-farah) on In Defense of Epistemic Empathy · 2023-12-28T02:10:36.097Z · LW · GW

Yep, I think this is the likely wording as well, since on a quick read, I suspect that what the research is showing isn't that humans are rational, but rather that we simply can't be rational in realistic situations due to resource starvation/resource scarcity issues.

Note that this doesn't mean it's easy, or possible at all, to fix the problem of irrationality, but I might agree with "others are not remarkably more irrational than you are."