Posts

The problems with the concept of an infohazard as used by the LW community [Linkpost] 2023-12-22T16:13:54.822Z
What's the minimal additive constant for Kolmogorov Complexity that a programming language can achieve? 2023-12-20T15:36:50.968Z
Arguments for optimism on AI Alignment (I don't endorse this version, will reupload a new version soon.) 2023-10-15T14:51:24.594Z
Hilbert's Triumph, Church and Turing's failure, and what it means (Post #2) 2023-07-30T14:33:25.180Z
Does decidability of a theory imply completeness of the theory? 2023-07-29T23:53:08.166Z
Why you can't treat decidability and complexity as a constant (Post #1) 2023-07-26T17:54:33.294Z
An Opinionated Guide to Computability and Complexity (Post #0) 2023-07-24T17:53:18.551Z
Conditional on living in a AI safety/alignment by default universe, what are the implications of this assumption being true? 2023-07-17T14:44:02.083Z
A potentially high impact differential technological development area 2023-06-08T14:33:43.047Z
Are computationally complex algorithms expensive to have, expensive to operate, or both? 2023-06-02T17:50:09.432Z
Does reversible computation let you compute the complexity class PSPACE as efficiently as normal computers compute the complexity class P? 2023-05-09T13:18:09.025Z
Are there AI policies that are robustly net-positive even when considering different AI scenarios? 2023-04-23T21:46:40.952Z
Can we get around Godel's Incompleteness theorems and Turing undecidable problems via infinite computers? 2023-04-17T15:14:40.631Z
Best arguments against the outside view that AGI won't be a huge deal, thus we survive. 2023-03-27T20:49:24.728Z
A case for capabilities work on AI as net positive 2023-02-27T21:12:44.173Z
Some thoughts on the cults LW had 2023-02-26T15:46:58.535Z
How seriously should we take the hypothesis that LW is just wrong on how AI will impact the 21st century? 2023-02-16T15:25:42.299Z
I've updated towards AI boxing being surprisingly easy 2022-12-25T15:40:48.104Z
A first success story for Outer Alignment: InstructGPT 2022-11-08T22:52:54.177Z
Is the Orthogonality Thesis true for humans? 2022-10-27T14:41:28.778Z
Logical Decision Theories: Our final failsafe? 2022-10-25T12:51:23.799Z
How easy is it to supervise processes vs outcomes? 2022-10-18T17:48:24.295Z
When should you defer to expertise? A useful heuristic (Crosspost from EA forum) 2022-10-13T14:14:56.277Z
Does biology reliably find the global maximum, or at least get close? 2022-10-10T20:55:35.175Z
Is the game design/art maxim more generalizable to criticism/praise itself? 2022-09-22T13:19:00.438Z
In a lack of data, how should you weigh credences in theoretical physics's Theories of Everything, or TOEs? 2022-09-07T18:25:52.750Z
Can You Upload Your Mind & Live Forever? From Kurzgesagt - In a Nutshell 2022-08-19T19:32:12.434Z
Complexity No Bar to AI (Or, why Computational Complexity matters less than you think for real life problems) 2022-08-07T19:55:19.939Z
Which singularity schools plus the no singularity school was right? 2022-07-23T15:16:19.339Z
Why AGI Timeline Research/Discourse Might Be Overrated 2022-07-20T20:26:39.430Z
How humanity would respond to slow takeoff, with takeaways from the entire COVID-19 pandemic 2022-07-06T17:52:16.840Z
How easy/fast is it for a AGI to hack computers/a human brain? 2022-06-21T00:34:34.590Z
Noosphere89's Shortform 2022-06-17T21:57:43.803Z

Comments

Comment by Noosphere89 (sharmake-farah) on Value Claims (In Particular) Are Usually Bullshit · 2024-06-02T17:07:48.526Z · LW · GW

The big reason that value claims tend to be on the more bullshit side is that values/morality has far, far more degrees of freedom than most belief claims, primarily because there are too many right answers to the question of what is ethical.

Belief claims can also have this sort of effect (I believe the Mathematical Multiverse/Simulation Hypothesis ideas by Max Tegmark and others like Nick Bostrom, while true, are basically useless for almost any attempt at prediction because they allow basically everything to be predicted, which makes them extremely weak predictive models, as opposed to extremely strong generative models, and this is why I hate the discourse on the Simulation/Mathematical hypotheses), but value claims tend to be the worst offenders at not being entangled and having far too many right answers.

Comment by Noosphere89 (sharmake-farah) on Catastrophic Goodhart in RL with KL penalty · 2024-05-17T23:06:50.273Z · LW · GW

My expectation is that error and utility are both extremely heavy tailed, and arguably in the same order of magnitude for heavy tails.

But thanks for answering, the real answer is we can predict effectively nothing without independence, and thus we can justify virtually every outcome of real-life Goodhart.

Maybe it's catastrophic, maybe it doesn't matter, or maybe there's anti-goodhart, but I don't see a way to predict what will reasonably happen.

Also, why do you think that error is heavier tailed than utility?

Comment by Noosphere89 (sharmake-farah) on Catastrophic Goodhart in RL with KL penalty · 2024-05-17T17:28:59.321Z · LW · GW

I have a question about this post, and it has to do with the case where both utility and error are heavy tailed:

Where does the expected value converge to if both utility and errors are heavy tailed? Is it 0, infinity, some other number, or does it not converge to any number at all?

Comment by Noosphere89 (sharmake-farah) on Please stop publishing ideas/insights/research about AI · 2024-05-02T22:12:56.775Z · LW · GW

Privacy of communities isn't a solvable problem in general, as soon as your community is large enough to compete with the adversary, it's large enough and conspicuous enough that the adversary will pay attention to it and send in spies and extract leaks.

I disagree with this in theory as a long-term concern, but yes, in practice the methods to have privacy of communities haven't been implemented or tested at all, and I agree with the general sentiment that it isn't worth the steep drawbacks of privacy to protect secrets, which does unfortunately make me dislike the post due to the strength of its recommendations.

So while I could in theory disagree with you, in practice right now I mostly have to agree with the comment that there will not be such an infrastructure for private alignment ideas.

Also to touch on something here that isn't too relevant and could be considered a tangent:

If your acceptable lower limit for basically anything is zero you won't be allowed to do anything, really anything.

This is why perfectionism is such a bad thing, and why you need to be able to accept that failure happens. You cannot have 0 failures IRL.

Comment by Noosphere89 (sharmake-farah) on tlevin's Shortform · 2024-05-02T14:59:21.613Z · LW · GW

Unless you're talking about financial conflicts of interest, but there are also financial incentives for orgs pursuing a "radical" strategy to downplay boring real-world constraints, as well as social incentives (e.g. on LessWrong IMO) to downplay these boring constraints and cognitive biases against thinking your preferred strategy has big downsides.

It's not just that problem though, they will likely be biased to think that their policy is helpful for safety of AI at all, and this is a point that sometimes gets forgotten.

But correct on the fact that Akash's argument is fully general.

Comment by Noosphere89 (sharmake-farah) on The first future and the best future · 2024-05-01T16:12:06.188Z · LW · GW

I kind of agree with this, and this is where I fundamentally differ from a lot of e/accs and AI progress boosters.

However, I think 2 things matter here that limit the force of this, though I don't know to what extent:

  1. People have pretty different values, and while I mostly don't consider it a bottleneck to alignment as understood on LW, it does impact this post specifically because there are differences in what people consider the best future, and this is why I'm unsure that we should pursue your program specifically.

  2. I think there are semi-reasonable arguments that lock-in concerns are somewhat overstated, and while I don't totally buy them, they are at least somewhat reasonable, and thus I don't fully support the post at this time.

However, this post has a lot of food for thought, especially given that my world model of AI development is skewed far more towards optimistic outcomes than most of LW's, so thank you for at least trying to argue for a slowdown without assuming existential risk.

Comment by Noosphere89 (sharmake-farah) on Against John Searle, Gary Marcus, the Chinese Room thought experiment and its world · 2024-04-16T15:07:45.830Z · LW · GW

I have a better argument now, and the answer is that the argument fails in the conclusion.

The issue is that conditional on assuming that a computer program (speaking very generally here) is able to give a correct response to every input of Chinese characters, and it knows the rules of Chinese completely, then it must know/understand Chinese in order to do the things that Searle claims it to be doing, and in this instance we'd say that it does understand Chinese/decide Chinese for all purposes.

Basically, I'm claiming that the premises lead to a different, opposite conclusion.

These premises:

“Imagine a native English speaker who knows no Chinese locked in a room full of boxes of Chinese symbols (a data base) together with a book of instructions for manipulating the symbols (the program). Imagine that people outside the room send in other Chinese symbols which, unknown to the person in the room, are questions in Chinese (the input). And imagine that by following the instructions in the program the man in the room is able to pass out Chinese symbols which are correct answers to the questions (the output).

assuming that every input has in fact been used, contradicts this conclusion:

The program enables the person in the room to pass the Turing Test for understanding Chinese but he does not understand a word of Chinese.”

The correct conclusion, including all assumptions, is that they do understand/decide Chinese completely.

The one-sentence slogan is "Look-up table programs are a valid form of intelligence/understanding, albeit the most inefficient form of intelligence/understanding."

What it does say is that without any restrictions on how the program computes Chinese or any problem, other than it must give a correct answer to every input, the answer to the question of "Is it intelligent on this specific problem/does it understand this specific problem?" is always yes, and to have the possibility of it being no, you need to add more restrictions than that to make the answer be no.
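As a minimal sketch of the look-up table slogan, here is a toy case where a pure table and a program that actually computes the answer are indistinguishable on every input. The parity task, the ten-element domain, and all names are illustrative assumptions, not anything from Searle's setup; the only point is that "gives a correct answer to every input" cannot tell the two apart, while the table's size grows with the domain.

```python
# Toy stand-in for "answering every question correctly":
# classify each integer in a small, fixed domain as even or odd.
DOMAIN = range(10)  # hypothetical finite set of possible inputs


def computed_answer(n: int) -> str:
    """A program that works the answer out from a rule."""
    return "even" if n % 2 == 0 else "odd"


# A pure look-up table: one stored answer per possible input, no rule at all.
LOOKUP_TABLE = {
    0: "even", 1: "odd", 2: "even", 3: "odd", 4: "even",
    5: "odd", 6: "even", 7: "odd", 8: "even", 9: "odd",
}


def lookup_answer(n: int) -> str:
    """A program that only retrieves a stored answer."""
    return LOOKUP_TABLE[n]


def behaviourally_equivalent(f, g, domain) -> bool:
    """True if f and g give the same output on every input in the domain."""
    return all(f(x) == g(x) for x in domain)


if __name__ == "__main__":
    print(behaviourally_equivalent(lookup_answer, computed_answer, DOMAIN))
    print(len(LOOKUP_TABLE), "stored entries for", len(DOMAIN), "inputs")
```

To make the answer "no" for the table, you would need an extra restriction beyond correctness, for example a bound on memory relative to the size of the input domain.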

Comment by Noosphere89 (sharmake-farah) on When is Goodhart catastrophic? · 2024-04-15T19:24:32.654Z · LW · GW

Essentially, the paper's model requires, by assumption, that it is impossible to get any efficiency gains (like "don't sleep on the floor" or "use this more efficient design instead") or mutually-beneficial deals (like helping two sides negotiate and avoid a war).

Yeah, that was a different assumption that I didn't realize, because I thought the assumption was solely that we had a limited budget and every increase in a feature has a non-zero cost, which is a very different assumption.

I sort of wish the assumptions were distinguished, because these are very, very different assumptions (for example, you can have positive-sum interactions/trade so long as the cost is sufficiently low and the utility gain is sufficiently high, which is pretty usual.)

Comment by Noosphere89 (sharmake-farah) on When is Goodhart catastrophic? · 2024-04-15T17:45:13.574Z · LW · GW

The real issue IMO is assumption 1, the assumption that utility strictly increases. Assumption 2 is, barring rather exotic regimes far into the future, basically always correct, and for irreversible computation, this always happens, since there's a minimum cost to increase the features IRL, and it isn't 0.

Increasing utility IRL is not free.

Assumption 1 is plausibly violated for some goods, provided utility grows slower than logarithmic, but the worry here is status might actually be a utility that strictly increases, at least relatively speaking.

Comment by Noosphere89 (sharmake-farah) on Inference cost limits the impact of ever larger models · 2024-04-13T16:21:03.447Z · LW · GW

My general prior on inference cost is that it is the same order of magnitude as training cost, and thus neither dominates the other in general, due to tradeoffs.

I don't remember where I got that idea from, though.

Comment by Noosphere89 (sharmake-farah) on How does the ever-increasing use of AI in the military for the direct purpose of murdering people affect your p(doom)? · 2024-04-13T01:24:35.863Z · LW · GW

I basically agree with John Wentworth here that it affects p(doom) not at all, but one thing I will say is that it kind of makes claims that humans will make decisions/be accountable once AI gets very useful rather uncredible.

More generally, one takeaway I see from the military's use of AI is that there are strong pressures to let them operate on their own, and this is going to be surprisingly important in the future.

Comment by Noosphere89 (sharmake-farah) on Ackshually, many worlds is wrong · 2024-04-11T21:08:01.040Z · LW · GW

My read of the post is not that many worlds is wrong, but rather that it's not uniquely correct, that many worlds has some issues of its own, and that other theories are at least coherent.

Is this a correct reading of this post?

Comment by Noosphere89 (sharmake-farah) on Any evidence or reason to expect a multiverse / Everett branches? · 2024-04-10T01:52:49.127Z · LW · GW

What's the technical objection you have to it?

Comment by Noosphere89 (sharmake-farah) on On green · 2024-03-26T17:19:08.056Z · LW · GW

Yeah, the basic failure mode of green is that it is reliant on cartoonish descriptions of nature that are much closer to Pocahontas or really any Disney movie than to real-life nature, and in general it is extremely non-self-reliant, in the sense that it relies heavily on both Blue's and Red's efforts to preserve the idealized Green.

Otherwise, it collapses into large scale black and arguably red personalities of nature.

Comment by Noosphere89 (sharmake-farah) on Natural Latents: The Concepts · 2024-03-21T00:00:34.589Z · LW · GW

Your point on laws and natural abstractions expresses nicely a big problem with postmodernism that was always there, but wasn't clearly pointed out:

Natural Abstractions, and more generally almost every concept, are subjective in the sense that people can change what a concept means, but that doesn't mean you can deny the concept/abstraction and instantly make it non-effective. You actually have to do real work, and importantly change stuff in the world; you can't simply assign different meanings or different concepts to the same data and expect the concept to no longer work. You actually have to change the behavior of lots of other humans, and if you fail, the concept is still real.

This also generalizes to a lot of other abstractions like gender or sexuality, where real work, especially in medicine and biotech is necessary if you want concepts on gender or sex to change drastically.

This is why a lot of postmodernism is wrong to claim that denying concepts automatically negates their power. You have to do real work to change concepts, which is why I tend to favor technological progress.

I'll put the social concepts one in the link below, because it's so good as a response to postmodernism:

https://www.lesswrong.com/posts/mMEbfooQzMwJERAJJ/natural-latents-the-concepts#Social_Constructs__Laws

Comment by Noosphere89 (sharmake-farah) on 'Empiricism!' as Anti-Epistemology · 2024-03-19T22:09:56.989Z · LW · GW

My main disagreement is that I actually do think that at least some of the critiques are right here.

In particular, the claim Quintin Pope is making that I think is right is that evolution is extremely different from how we train our AIs, and thus none of the inferences that work under an evolution model work for the AIs under consideration, which importantly includes a lot of analogies to apes/Neanderthals making smarter humans (which they didn't do, BTW), who presumably failed to be aligned, ergo we can't align AI smarter than us.

The basic issue though is that evolution doesn't have a purpose or goal, and thus the common claim that evolution failed to align humans to X thing is nonsensical, as it assumes a teleological goal that just does not exist in evolution, which is quite different from humans making AIs with particular goals in mind. Thus talk of an alignment problem between say chimps/Neanderthals and humans is entirely nonsensical. This is also why this generalized example of misgeneralization fails to work, since evolution is not a trainer or designer in the way that, say, an OpenAI employee making AI would be, and thus there is no generalization error, since there wasn't a goal or behavior to purposefully generalize in the first place:

"In the ancestral environment, evolution trained humans to do X, but in the modern environment, they do Y instead."

There are other problems with the analogy that Quintin Pope covered, like the fact that it doesn't actually capture misgeneralization correctly, since the ancient/modern human distinction is not the same as one AI doing a treacherous turn, or how the example of ice cream overwhelming our reward center isn't misgeneralization, but the fact that evolution has no purpose or goal is the main problem I see with a lot of evolution analogies.

Another issue is that evolution is extremely inefficient at the timescales required, which is why dominant training methods for AI borrow little from evolution at best, and even from an AI capabilities perspective it's not really worth it to rerun evolution to get AI progress.

Some other criticisms from Quintin Pope that I agree with are that current AI can already self-improve, albeit more weakly and with more limits than humans (though I agree way less strongly here than Quintin Pope does), and that the security mindset is very misleading and predicts things in ML that don't actually happen at all, which is why I don't think adversarial assumptions are good unless you can solve the problem in the worst case easily, or just as easily as in the non-adversarial cases.

Comment by Noosphere89 (sharmake-farah) on Deconstructing Bostrom's Classic Argument for AI Doom · 2024-03-14T05:06:06.165Z · LW · GW

The thing I'll say on the orthogonality thesis is that I think it's actually fairly obvious, but only because it makes extremely weak claims, in that it's logically possible for AI to be misaligned, and the critical mistake is assuming that possibility translates into non-negligible likelihood.

It's useful for history purposes, but is not helpful at all for alignment, as it fails to answer essential questions.

Comment by Noosphere89 (sharmake-farah) on Some (problematic) aesthetics of what constitutes good work in academia · 2024-03-12T22:17:26.955Z · LW · GW

Yeah, something like the Alignment Forum would actually be pretty good, and while LW/AF has a lot of problems, most of them are attributable to the people and culture around here, rather than to the merits of the sites themselves.

LW/AF tools would be extremely helpful for a lot of scientists, once you divorce the culture from it.

Comment by Noosphere89 (sharmake-farah) on Evolution did a surprising good job at aligning humans...to social status · 2024-03-11T15:31:03.907Z · LW · GW

Note that this doesn't undermine the post, because its thesis only gets stronger if we assume that more alignment attempts like romantic love or altruism generalized, because that could well imply that control or alignment is actually really easy to generalize, even when the intelligence of the aligner is way less than that of the alignee.

This suggests that scalable oversight is either a non-problem, or a problem only at ridiculous levels of disparity, and suggests that alignment does generalize quite far.

This, as well as my belief that current alignment designers have far more tools in their alignment toolkit than evolution had makes me extremely optimistic that alignment is likely to be solved before dangerous AI.

Comment by Noosphere89 (sharmake-farah) on philh's Shortform · 2024-03-09T21:03:42.591Z · LW · GW

Only if you can't examine all of the inputs.

The no free lunch theorems basically say that if you are unlucky enough with your prior, and the problem to be solved is maximally general, then you can't improve on your efficiency beyond random sampling/brute-force search, which requires you to examine every input, and thus you can't get away with algorithms that, unlike brute-force search, skip some of the inputs.

It's closer to a maximal inefficiency for intelligence/inapproximability result for intelligence than an impossibility result, which is still very important.
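A minimal sketch of the "maximally general problem" case, under toy assumptions of my own choosing: a three-point search space, binary objective values, and two fixed (non-adaptive) visiting orders. Averaged over every possible objective function, both orders find the same best value after any number of evaluations, so neither beats the other, and only examining all inputs guarantees finding the maximum of a particular function. The full theorems also cover adaptive strategies, which this sketch does not attempt.

```python
from itertools import product

X = [0, 1, 2]  # tiny search space (illustrative assumption)
Y = [0, 1]     # possible objective values (illustrative assumption)


def best_after_k(order, f, k):
    """Best value seen after evaluating the first k points of a fixed order."""
    return max(f[x] for x in order[:k])


def average_over_all_functions(order, k):
    """Average best-so-far value over every function f: X -> Y."""
    all_functions = list(product(Y, repeat=len(X)))  # each f is a tuple indexed by x
    total = sum(best_after_k(order, f, k) for f in all_functions)
    return total / len(all_functions)


order_a = [0, 1, 2]  # one fixed visiting order
order_b = [2, 0, 1]  # a different fixed visiting order

# For each evaluation budget k, compare the two orders' average performance.
results = {
    k: (average_over_all_functions(order_a, k),
        average_over_all_functions(order_b, k))
    for k in (1, 2, 3)
}

if __name__ == "__main__":
    for k, (avg_a, avg_b) in results.items():
        print(k, avg_a, avg_b)
```

The equality only holds because the average is taken uniformly over all functions; with a prior that favors structured functions, one order can do better than another.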

Comment by Noosphere89 (sharmake-farah) on Counting arguments provide no evidence for AI doom · 2024-03-07T03:30:01.843Z · LW · GW

Specifically, I wanted the edit to be a clarification that you only have a <0.1% probability on spontaneous scheming ending the world.

Comment by Noosphere89 (sharmake-farah) on Counting arguments provide no evidence for AI doom · 2024-03-07T01:07:51.595Z · LW · GW

Agree with this hugely, though I could make a partial defense of the confidence given, but yes I'd like this post to be hugely edited.

Comment by Noosphere89 (sharmake-farah) on Counting arguments provide no evidence for AI doom · 2024-02-29T17:15:03.844Z · LW · GW

Hm, are we actually sure singular learning theory actually supports general-purpose search at all?

And how does it support the goal-slot theory?

Comment by Noosphere89 (sharmake-farah) on Counting arguments provide no evidence for AI doom · 2024-02-28T01:00:32.352Z · LW · GW

I actually hope this gets done sometime in the future, but I'm okay with focusing on other things for now.

(specifically the Training vs Out Of Distribution test performance experiment, especially on more realistic neural nets.)

Comment by Noosphere89 (sharmake-farah) on On the Proposed California SB 1047 · 2024-02-18T15:22:25.338Z · LW · GW

Odd that ‘a model autonomously engaging in a sustained sequence of unsafe behavior’ only counts as an ‘AI safety incident’ if it is not ‘at the request of a user.’ If a user requests that, aren’t you supposed to ensure the model doesn’t do it?

I actually agree with this. This is a good thing since a lot of the bill's provisions are useful in the case of misalignment, but not misuse. In particular, I would not support a lot of the provisions like fully shutting down AI in the misuse case, so I'm happy for that.

Overall, I must say that as an optimist on AI safety, I am reasonably happy with the bill. Admittedly, the devil is in what standards of evidence are required to not have a positive safety determination, and how much evidence they would need.

Comment by Noosphere89 (sharmake-farah) on Causality is Everywhere · 2024-02-15T01:47:42.640Z · LW · GW

I want to note that just because the probability is 0 for X happening does not in general mean that X can never happen.

A good example of this is that you can decide with probability 1 whether a program halts, but that doesn't let you turn it into a decision procedure on a Turing Machine that will analyze arbitrary/every Turing Machine and decide whether they halt or not, for well known reasons.

(Oracles and hypercomputation in general can, but that's not the topic for today here.)

In general, one of the most common confusions on LW is assuming that probability 0 means the event can never happen, and that probability 1 means the event must happen.

This is a response to this part of the post.

And while 0 is the mode of this distribution, it’s still just a single point of width 0 on a continuum, meaning the probability of any given effect size being exactly 0, represented by the area of the red line in the picture, is almost 0.
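A worked instance of the probability-0 point, assuming for illustration a random variable $X$ that is uniform on $[0,1]$ with density $f(x) = 1$ (this distribution is my choice, not the one pictured in the post). Every individual outcome has probability 0, yet some outcome always occurs:

```latex
P(X = x_0) = \int_{x_0}^{x_0} f(x)\,dx = 0
\quad \text{for every } x_0 \in [0,1],
\qquad \text{yet} \qquad
P(X \in [0,1]) = \int_{0}^{1} f(x)\,dx = 1 .
```

So whichever value $X$ ends up taking is an event that had probability 0 beforehand, which is why "probability 0" and "can never happen" have to be kept apart.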

Comment by Noosphere89 (sharmake-farah) on OpenAI wants to raise 5-7 trillion · 2024-02-09T18:38:22.615Z · LW · GW

That's much more reasonable of a claim, though it might be too high still (but much more reasonable.)

Comment by Noosphere89 (sharmake-farah) on Prediction Markets aren't Magic · 2024-01-30T18:38:20.719Z · LW · GW

Potentially, but that would require a lot of bitcoin people to admit that government intervention in their activity is at least sometimes good. Given all the other flaws of bitcoin, like having irreversible transactions, it truly is one of those products that isn't valuable at all in the money role except in extreme edge cases, and pretty much all other inventions had more use than this. That is why I think that in order for crypto to be useful, you need to entirely remove the money aspect via some means, and IMO, governments are the most practical means of doing so.

Comment by Noosphere89 (sharmake-farah) on Four visions of Transformative AI success · 2024-01-21T16:33:27.893Z · LW · GW

My primary concern here is that biology remains substantial as the most important cruxes of value to me such as love, caring and family all are part and parcel of the biological body.

I'm starting to think a big crux of my non-doominess probably rests on basically rejecting this premise, alongside a related premise that holds that value is complex and fragile. The arguments for these premises are surprisingly weak, and the evidence in neuroscience is coming to the opposite conclusion: values and capabilities are fairly intertwined, and the value generators are about as simple and general as we could have gotten, which makes me much less worried about several alignment problems like deceptive alignment.

Comment by Noosphere89 (sharmake-farah) on peterbarnett's Shortform · 2024-01-09T03:55:51.592Z · LW · GW

people have written what I think are good responses to that piece; many of the comments, especially this one, and some posts.

There are responses by Quintin Pope and Ryan Greenblatt that addressed their points. Ryan Greenblatt pointed out that the argument used in support of autonomous learning is only distinguishable from supervised learning if there are data limitations, and that we can tell an analogous story about supervised learning having a fast takeoff without data limitations. Quintin Pope has massive comments that I can't really summarize, but one is a general purpose response to Zvi's post, and the other adds context to the debate between Quintin Pope and Jan Kulveit on culture:

https://www.lesswrong.com/posts/hvz9qjWyv8cLX9JJR/evolution-provides-no-evidence-for-the-sharp-left-turn#hkqk6sFphuSHSHxE4

https://www.lesswrong.com/posts/Wr7N9ji36EvvvrqJK/response-to-quintin-pope-s-evolution-provides-no-evidence#PS84seDQqnxHnKy8i

https://www.lesswrong.com/posts/wCtegGaWxttfKZsfx/we-don-t-understand-what-happened-with-culture-enough#YaE9uD398AkKnWWjz

Comment by Noosphere89 (sharmake-farah) on Deceptive AI ≠ Deceptively-aligned AI · 2024-01-07T22:36:32.575Z · LW · GW

Yep, that's what I was talking about, Seth Herd.

Comment by Noosphere89 (sharmake-farah) on Deceptive AI ≠ Deceptively-aligned AI · 2024-01-07T19:07:14.743Z · LW · GW

I agree with the claim that deception could arise without deceptive alignment, and mostly agree with the post, but I do still think it's very important to recognize that if/when deceptive alignment fails to work, it changes a lot of the conversation around alignment.

Comment by Noosphere89 (sharmake-farah) on Against Almost Every Theory of Impact of Interpretability · 2024-01-05T21:57:14.835Z · LW · GW

I'll admit I overstated it here, but my claim is that once you remove the requirement for arbitrarily good/perfect solutions, it becomes easier to solve the problem. Sometimes, it's still impossible to solve the problem, but it's usually solvable once you drop a perfectness/arbitrarily good requirement, primarily because it loosens a lot of constraints.

Indeed, I think the implication quite badly fails.

I agree it isn't a logical implication, but I suspect your example is very misleading, and that more realistic imperfect solutions won't have this failure mode, so I'm still quite comfortable with using it as an implication that isn't 100% accurate, but more like 90-95+% accurate.

Comment by Noosphere89 (sharmake-farah) on Critical review of Christiano's disagreements with Yudkowsky · 2023-12-28T22:28:38.208Z · LW · GW

Yeah, I feel this is quite similar to OpenAI's plan to defer alignment to future AI researchers, except worse, because if we grant that the plan proposed actually made the augmented humans stably aligned with our values, then it would be far easier to do scalable oversight, because we have a bunch of advantages around controlling AIs, like the fact that it would be socially acceptable to control AI in ways that wouldn't be socially acceptable to do if it involved humans, the incentives to control AI are much stronger than controlling humans, etc.

I truly feel like Eliezer has reinvented a plan that OpenAI/Anthropic are already doing, except worse, which is deferring alignment work to future intelligences, and Eliezer doesn't realize this, so the comments treat it as though it's something new rather than an already done plan, just with AI swapped out for humans.

It's not just coy, it's reinventing an idea that's already there, except worse, and he doesn't tell you that if you swap the human for AI, it's already being done.

Link for why AI is easier to control than humans below:

https://optimists.ai/2023/11/28/ai-is-easy-to-control/

Comment by Noosphere89 (sharmake-farah) on In Defense of Epistemic Empathy · 2023-12-28T14:50:22.024Z · LW · GW

I'd say the main flaws in conspiracy theories are that they tend to assume that coordination is easy, especially when the conspiracy requires a large group of people to do something, that they generally assume agency/homunculi too much, and that they underestimate the costs of secrecy, especially when trying to do complicated tasks. As a bonus, a lot of claimed conspiracy theories are told as though they were narratives, which tends to be a general problem around a lot of subjects.

It's already hard enough to cooperate openly, and secrecy amplifies this difficulty a lot, so much so that conspiracies that are attempted usually go nowhere, and the successful conspiracies are a very rare subset of all conspiracies attempted.

Comment by Noosphere89 (sharmake-farah) on In Defense of Epistemic Empathy · 2023-12-28T02:10:36.097Z · LW · GW

Yep, I think this is the likely wording as well, since on a quick read, I suspect that what the research is showing isn't that humans are rational, but rather that we simply can't be rational in realistic situations due to resource starvation/resource scarcity issues.

Note, that doesn't mean it's easy or possible at all to fix the problem of irrationality, but I might agree with "others are not remarkably more irrational than you are."

Comment by Noosphere89 (sharmake-farah) on In Defense of Epistemic Empathy · 2023-12-28T01:54:17.160Z · LW · GW

This is one of my biggest pet peeves about a lot of languages: they basically have no way to bound the domain of discourse without getting quite complicated, and perhaps more formal as well. In ordinary communication, a claim is usually assumed to have a bounded domain of discourse that's different from the set of all possible X, whether it's realities, worlds or whatever else is being talked about here. I think this is the main problem with the attempt to make the claim "In the real world, there are talking donkeys" sound absurd, because "the real world" is essentially attempting to bound the domain of discourse to 1 world, the world we live in.

https://en.wikipedia.org/wiki/Domain_of_discourse

Comment by Noosphere89 (sharmake-farah) on Critical review of Christiano's disagreements with Yudkowsky · 2023-12-28T01:39:24.563Z · LW · GW

I think my crux is that if we assume that humans are scalable in intelligence without the assumption that they become misaligned, then it becomes much easier to argue that we'd be able to align AI without having to go through the process, for the reason sketched out by jdp:

I think the crux is an epistemological question that goes something like: "How much can we trust complex systems that can't be statically analyzed in a reductionistic way?" The answer you give in this post is "way less than what's necessary to trust a superintelligence". Before we get into any object level about whether that's right or not, it should be noted that this same answer would apply to actual biological intelligence enhancement and uploading in actual practice. There is no way you would be comfortable with 300+ IQ humans walking around with normal status drives and animal instincts if you're shivering cold at the idea of machines smarter than people.

https://www.lesswrong.com/posts/JcLhYQQADzTsAEaXd/?commentId=7iBb7aF4ctfjLH6AC

Comment by Noosphere89 (sharmake-farah) on Critical review of Christiano's disagreements with Yudkowsky · 2023-12-27T21:11:39.778Z · LW · GW

Yep, this is basically OpenAI's alignment plan, but worse. IMO I'm pretty bullish on that plan, but yes this is pretty clearly already done, and I'm rather surprised by Eliezer's comment here.

Comment by Noosphere89 (sharmake-farah) on Against Almost Every Theory of Impact of Interpretability · 2023-12-26T18:37:42.344Z · LW · GW

I think this might be a crux, actually. I think it's surprisingly common in history for things to work out well empirically even though we either don't understand how they work, or it took a long time to understand how they work.

AI development is the most central example, but I'd argue the invention of steel is another good example.

To put it another way, I'm relying on the fact that there have been empirically successful interventions where we either simply don't know why it works, or it takes a long time to get a useful theory out of the empirically successful intervention.

Comment by Noosphere89 (sharmake-farah) on K-complexity is silly; use cross-entropy instead · 2023-12-26T17:50:01.863Z · LW · GW

Admittedly, as much as I do think that Kolmogorov Complexity is worse than Alt-Complexity, I do think that it has one particular use case that Alt-complexity does not have:

It correctly handles the halting oracle case, and generally handles the ideal/infinite cases quite a lot better than Alt-complexity/Solomonoff log-probability, and this is a case where alt-complexity does quite a lot worse.

Alt-complexity intelligence is very much a theory of the finite, and a restrictive finite case at that. It doesn't attempt to deal with the infinite case, or with cases where things are still finite but some weird feature of the environment allows halting oracles. I think they're right, at least for the foreseeable future, to ignore this case, but Kolmogorov Complexity definitely deals with the infinite/ideal cases way better than Alt-complexity when it comes to intelligence.

Links are below:

Alt-Complexity intelligence: https://www.lesswrong.com/posts/gHgs2e2J5azvGFatb/infra-bayesian-physicalism-a-formal-theory-of-naturalized#Evaluating_agents

Kolmogorov Complexity intelligence: https://www.lesswrong.com/posts/dPmmuaz9szk26BkmD/vanessa-kosoy-s-shortform?commentId=Tg7A7rSYQSZPASm9s#Tg7A7rSYQSZPASm9s

Comment by Noosphere89 (sharmake-farah) on K-complexity is silly; use cross-entropy instead · 2023-12-26T17:41:51.817Z · LW · GW

Do they know that it does not differ by a constant in the infinite sequence case?

Comment by Noosphere89 (sharmake-farah) on Against Almost Every Theory of Impact of Interpretability · 2023-12-26T16:05:17.743Z · LW · GW

I basically just disagree with this entirely, unless you don't count stuff like RLHF or DPO as alignment.

More generally, if we grant that we don't need perfection, or arbitrarily good alignment, at least early on, then I think this implies that alignment should be really easy, and the p(Doom) numbers are almost certainly way too high, primarily because it's often doable to solve problems if you don't need perfect or arbitrarily good solutions.

So I basically just disagree with Eliezer here.

Comment by Noosphere89 (sharmake-farah) on Prediction Markets aren't Magic · 2023-12-26T02:38:09.112Z · LW · GW

Maybe "killed" is an overstatement, but it definitely flopped hard. Compared to the expectations that bitcoin and crypto advocates set, it definitely failed, and it didn't even work for almost every use case proposed by bitcoin/general cryptocurrency advocates.

The fact that the price number goes up is a testament to how much speculation can prop up bubbles, even when they're based on nothing or at best on something much less valuable, plus the Fed loosening its interest rate policy means that they can party again with cheaper money.

Comment by Noosphere89 (sharmake-farah) on Prediction Markets aren't Magic · 2023-12-26T02:11:12.493Z · LW · GW

In order for bitcoin to function securely participants must waste an enormous amount of electricity and money on mining; a postgres database could process many more transactions per second at much less cost.

This alone basically made bitcoin flop hard, because it required ridiculous amounts of energy and it grew exponentially more expensive to be a useful alternative, such as a currency. It got so bad that Kazakhstan had protests over just how much electricity prices shot up because of its energy being used for cryptocurrency trading.

Note, this doesn't address the many other severe flaws with bitcoin or cryptocurrency in general, but this alone basically underscored that bitcoin could never work even as a helping hand for stuff like databases, let alone replace the centralized entity, because energy is expensive, and you always want to reduce the amount you use so that energy is available for other useful things.

Comment by Noosphere89 (sharmake-farah) on Against Almost Every Theory of Impact of Interpretability · 2023-12-25T16:58:09.932Z · LW · GW

In theory arguments like these can sometimes be correct, but in practice perfect is often the enemy of the good.

Now that I think about it, this is the main problem a lot of LW thinking and posting has: It implicitly thinks that only a perfect, watertight solution to alignment is sufficient to guarantee human survival, despite the fact that most solutions to problems don't have to be perfect to work, and even in the cases where we do face an adversary, imperfect but fast solutions win out over perfect, very slow solutions. In particular, it ignores that multiple solutions to alignment can fundamentally stack.

In general, I feel like the biggest flaw of LW is its perfectionism, and the big reason why Michael Nielsen pointed out that alignment is extremely accelerationist in practice is that OpenAI implements a truth that LWers like Nate Soares and Eliezer Yudkowsky, as well as the broader community, don't: Alignment approaches don't need to be perfect to work, and having an imperfect safety and alignment plan is much better than no plan at all.

Links are below:

https://www.lesswrong.com/posts/8Q7JwFyC8hqYYmCkC/link-post-michael-nielsen-s-notes-on-existential-risk-from

https://www.beren.io/2023-02-19-The-solution-to-alignment-is-many-not-one/

Comment by Noosphere89 (sharmake-farah) on Instrumental Convergence? [Draft] · 2023-12-24T17:59:17.481Z · LW · GW

I think this is one of the biggest issues, in practice, as I view at least some of the arguments for AI doom as essentially ignoring structure, and I suspect that they're committing a similar error to people who argue that the no free lunch theorem makes intelligence and optimization in general so expensive that AI can't progress at all.

This is especially true for the orthogonality thesis.

Comment by Noosphere89 (sharmake-farah) on The problems with the concept of an infohazard as used by the LW community [Linkpost] · 2023-12-24T00:47:22.480Z · LW · GW

I am more pointing out that they seemed to tacitly assume that deep learning/ML/scaling couldn't work, since all the real work was what we would call better algorithms, and compute was not viewed as a bottleneck at all.

Comment by Noosphere89 (sharmake-farah) on The problems with the concept of an infohazard as used by the LW community [Linkpost] · 2023-12-23T20:23:32.152Z · LW · GW

I'm specifically focused on Nate Soares and Eliezer Yudkowsky, as well as MIRI the organization, but I do think the general point applies, especially before 2012-2015.

Comment by Noosphere89 (sharmake-farah) on Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations) · 2023-12-23T19:45:23.478Z · LW · GW

However, the inversion of the universe's forward passes can be NP-complete functions. Hence a lot of difficulties.

If we're talking about cryptography specifically, we don't believe that the inversion of the universe's forward passes for cryptography is NP-complete, and if this were proved, it would collapse the polynomial hierarchy to the first level. The general view is that the polynomial hierarchy is likely to have infinitely many levels, à la Hilbert's hotel.
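
A rough sketch of why the collapse would follow, under two assumptions that are mine rather than part of the claim above: that the inversion problem has a decision version $L$ lying in NP ∩ coNP (this holds for the decision version of factoring, but it is an assumption for cryptographic problems in general), and that "NP-complete" means complete under polynomial-time many-one reductions:

```latex
\begin{align*}
& L \in \mathsf{NP} \cap \mathsf{coNP} \ \text{and} \ L \ \text{is } \mathsf{NP}\text{-complete} \\
& \implies \text{every } A \in \mathsf{NP} \text{ many-one reduces to } L \in \mathsf{coNP} \\
& \implies \mathsf{NP} \subseteq \mathsf{coNP}
  \quad (\mathsf{coNP} \text{ is closed under many-one reductions}) \\
& \implies \mathsf{NP} = \mathsf{coNP}
  \quad (\text{take complements of both sides}) \\
& \implies \mathsf{PH} = \Sigma_1^p = \mathsf{NP}.
\end{align*}
```

So the collapse comes from NP = coNP rather than from P = NP, which is why it lands at the first level rather than the zeroth.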

Yup! Cryptography actually was the main thing I was thinking about there. And there's indeed some relation. For example, it appears that NP≠P is because our universe's baseline "forward-pass functions" are just poorly suited for being composed into functions solving certain problems. The environment doesn't calculate those; all of those are in P.

A different story is that the following constraints potentially prevent us from solving NP-complete problems efficiently:

  1. The first law of thermodynamics, coming from the time-translation symmetry of the universe's physical laws.

  2. Light speed being finite, meaning there's only a finite amount of universe to build your computer.

  3. Limits on memory and computational speed not letting us scale exponentially forever.

  4. (Possibly) Time Travel and Quantum Gravity are inconsistent, or time travel/CTCs are impossible.

Edit: OTCs might also be impossible, where you can't travel in time but nevertheless have a wormhole, meaning wormholes might be impossible.