Posts

Are COVID lab leak and market origin theories incompatible? 2023-03-20T01:44:38.893Z

Comments

Comment by Anon User (anon-user) on Is this voting system strategy proof? · 2024-09-06T23:35:11.108Z · LW · GW

The 3rd paragraph of the Wikipedia page you linked to seems to answer the very question you are asking:

Maximal lotteries do not satisfy the standard notion of strategyproofness [...] Maximal lotteries are also nonmonotonic in probabilities, i.e. it is possible that the probability of an alternative decreases when a voter ranks this alternative up

Comment by Anon User (anon-user) on Why Reflective Stability is Important · 2024-09-05T22:48:25.860Z · LW · GW

If your AGI uses a bad decision theory T it would immediately self-modify to use a better one.

Nitpick - while this is probably a tiny part of the possible design space, there are obvious counterexamples to that, including cases where using T results in the AGI [incorrectly] concluding that T is the best, or otherwise failing to realize that this self-modification would be for the best.

Comment by Anon User (anon-user) on How do you finish your tasks faster? · 2024-08-22T18:11:38.576Z · LW · GW

After finishing any task/subtask and before starting the next one, go up the hierarchy at least two levels, and ask yourself: is moving on to the next subtask still the right way to achieve the higher-level goal, and is it still the highest-priority thing to tackle next? Also do this anytime there is a significant unexpected difficulty/delay/etc.

Periodically (with period defined at the beginning) do this for the top-level goal regardless of where you are in the [sub]tasks.

Comment by Anon User (anon-user) on Rabin's Paradox · 2024-08-18T19:28:52.374Z · LW · GW

There are so many side effects this overlooks. Winning $110 complicates my taxes by more than $5. In fact, once taxes on gambling winnings are considered, the first bet will likely have a negative EV!
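A rough worked example of the tax point, with assumed numbers (the usual Rabin gamble of a 50/50 chance to lose $100 or win $110, and a hypothetical 25% marginal tax on winnings with losses not deductible):

```python
# Pre-tax EV of the assumed 50/50 lose-$100 / win-$110 gamble
p_win = 0.5
ev_pretax = p_win * 110 - (1 - p_win) * 100                        # +$5.00

# After a hypothetical 25% tax on the winnings (losses not deductible)
tax_rate = 0.25
ev_after_tax = p_win * 110 * (1 - tax_rate) - (1 - p_win) * 100    # 41.25 - 50 = -$8.75

print(ev_pretax, ev_after_tax)  # 5.0 -8.75: the bet flips to negative EV
```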

Comment by Anon User (anon-user) on Orthogonality Thesis burden of proof · 2024-08-17T07:39:33.919Z · LW · GW

Your last figure should have behaviours on the horizontal axis, as this is what you are implying - you are effectively saying that any intelligence capable of understanding "I don't know what I don't know" will only have power-seeking behaviours, regardless of what its ultimate goals are. With that correction, your third figure is not incompatible with the first.

Comment by Anon User (anon-user) on Orthogonality Thesis burden of proof · 2024-05-06T21:10:52.049Z · LW · GW

I buy your argument that power seeking is a convergent behavior. In fact, this is a key part of many canonical arguments for why an unaligned AGI is likely to kill us all.

But on the meta level, you seem to argue that this is incompatible with the orthogonality thesis? If so, you may be misunderstanding the thesis - the ability of an AGI to have arbitrary utility functions is orthogonal (pun intended) to what behaviors are likely to result from those utility functions. The former is what the orthogonality thesis claims, but your argument is about the latter.

Comment by Anon User (anon-user) on Scientific Method · 2024-02-19T22:56:34.461Z · LW · GW

Your principles #3 and #5 are in weak conflict - generating hypotheses without having enough information to narrow the space of reasonable hypotheses would too often lead to false positives. When faced with an unknown, novel phenomenon, one ought to collect information first, including collecting experimental data without a fixed hypothesis, before starting to formulate any hypotheses.

Comment by Anon User (anon-user) on Weighing reputational and moral consequences of leaving Russia or staying · 2024-02-18T21:03:10.233Z · LW · GW

I'm not involved in politics or the military action, but I can't help but feel implicated by my government's actions as a citizen here

Please consider the implications of being not only a citizen, but also a taxpayer, and a customer of other taxpayers. Through taxes, your work indirectly supports the Russian war effort.

I'm interested in building global startups,

If you succeed while still in Russia, what is stopping those with powerful connections from simply taking over from you? From what you say, it does not sound like you have connections of your own that would allow you to protect yourself?

You do not mention your eligibility for getting drafted, but unless you have strong reasons to believe you would not be (e.g. you are female), you also need to consider that possibility.

Chances are things in Russia will become worse before they become better. Have you considered how Putin's next big stupid move might affect you? What happens next time something like the Prigozhin/Wagner rebellion is a bit less of a farce? Or how it might affect you if Putin dies and Kadyrov decides it's his chance to take over?

Comment by Anon User (anon-user) on The entropy maxim for binary questions · 2024-02-11T21:40:16.789Z · LW · GW

Option 5: the questioner is optimizing a metric other than what appears to be the post's implicit "get max info with a minimal number of questions, ignoring communication overhead", which is IMHO a weird metric to optimize to begin with. Not only does it not take the length/complexity of each question into account, it also ignores things like maintaining the answerer's willingness to continue answering questions, not annoying the answerer, and ensuring proper context so that a question is not misunderstood. And this is not even taking into account the possibility that while the questioner does care about getting the information, they might also simultaneously care about other things.

Comment by Anon User (anon-user) on AI Risk and the US Presidential Candidates · 2024-01-07T04:16:27.021Z · LW · GW

Looks like a good summary of their current positions, but how about their willingness to update those positions and act decisively, based on actual evidence/data? DeSantis's history of anti-mask/anti-vaccine stances has to be taken into account, perhaps? Same for Kennedy?

Comment by Anon User (anon-user) on Taboo "procrastination" · 2023-12-13T03:39:43.706Z · LW · GW

I am not working on X because it's so poorly defined that I dread needing to sort it out.

I am not working on X because I am at a loss as to where to start.

I feel like admiring the problem X and considering all the ways I could theoretically start solving it, so I am not actually doing something to solve it.

Comment by Anon User (anon-user) on Am I going insane or is the quality of education at top universities shockingly low? · 2023-11-20T23:39:32.377Z · LW · GW

For a professor at a top university, this would be easily 60+ hrs/week. https://www.insidehighered.com/news/2014/04/09/research-shows-professors-work-long-hours-and-spend-much-day-meetings claims 61hrs/week is average, and something like 65 for a full Professor. The primary currency is prestige, not salary, and prestige is generated by research (high-profile grants, high-profile publications, etc), not teaching. For teaching, they would likely care a lot more about advanced classes for students getting closer to potentially joining their research team, and a lot less about the intro classes (where many students might not even be from the right major) - those would often be seen as a chore to get out of the way, not as a meaningful task to invest actual effort into.

Comment by Anon User (anon-user) on What is democracy for? · 2023-11-09T20:25:09.824Z · LW · GW

So what system selects the best leader out of the entire population?

None - as Churchill said, democracy is the worst form of Government except for all those other forms that have been tried from time to time. Still, one should be realistic when explaining the benefits.

Comment by Anon User (anon-user) on What is democracy for? · 2023-11-07T20:28:08.268Z · LW · GW

One theory of democracy’s purpose is to elect the “right” leaders. In this view, questions such as “Who is best equipped to lead this nation?” have a correct answer, and democracy is merely the most effective way to find that answer.

I think this is a very limiting view of instrumental goals of democracy. First, democracy has almost no chance of selecting the best leader - at best, it could help select a better one out of a limited set of options. Second, this ignores a key, IMHO the key, feature of democracy - keeping leaders accountable after they are elected. Democracy does not just start backsliding when a bad leader is elected, it starts backsliding when the allies of that leader become too willing to shield the "dear leader" from accountability.

Ensuring the leaders change is another important feature.

Comment by anon-user on [deleted post] 2023-11-04T16:34:16.238Z

I think the use of the term "AGI" without a specific definition is causing an issue here - IMHO the crux of the matter is the difference between progress in average performance vs worst-case performance. We are making amazing progress on the former, but struggling with the latter (LLM hallucinations, etc). And robotaxis require almost-perfect performance.

Comment by Anon User (anon-user) on A NotKillEveryoneIsm Argument for Accelerating Deep Learning Research · 2023-10-20T00:53:38.613Z · LW · GW

This makes assumptions that make no sense to me. Auto-GPT is already not passively safe, and there is no reason to be sure LLMs would remain myopic as they are scaled. LLMs are inscrutable matrices of floating-point numbers that we are barely learning how to understand and interpret. We have no reliable way to predict when LLMs might hallucinate or misbehave in some other way. There is also no "human level" - LLMs are way faster than humans and are way more scalable than humans - there is no way to get LLMs that are as good as humans without having something that's way better than humans along a huge number of dimensions.

Comment by Anon User (anon-user) on Provably Safe AI · 2023-10-06T05:38:46.999Z · LW · GW

As a few commenters have already pointed out, this "strategy" completely fails at step 2 ("Specify safety properties that we want all AIs to obey"). Even for a "simple" property you cite, "refusal to help terrorists spread harmful viruses", we are many orders of magnitude of descriptive complexity away from knowing how to state it as a formal logical predicate on the I/O behavior of the AI program. We have no clue how to define "virus" as a mathematical property of the AI's sensors in a way that does not go wrong in all kinds of corner cases, even less clue for "terrorist", and even less clue than that for "help". The gap between what we know how to specify today and the complexity of your "simple" property is way bigger than the gap between the "simple" property and the most complex safety properties people tend to consider...

To illustrate, consider an even simpler partial specification - the AI is observing the world, and you want to formally define the probability that its notion of whether it's seeing a dog is aligned with your definition of a dog. Formally, define a mathematical function whose arguments represent the RGB values of a 1024x1024 image and which captures the true probability that the image contains what you consider to be a dog - so that a neural network that is proven to compute that particular function can be trusted to be aligned with your definition of a dog, while a neural network that computes something else is misaligned. Well, today we have close to zero clue how to do this. The closest we can do is train a neural network to recognize dog pictures, and then whatever function that network happens to compute (which, if written down as a mathematical function, would be an incomprehensible mess that, even after optimizing to reduce its size, would probably be at least thousands of pages long) is the best formal specification we know how to come up with. (For things simpler than dogs we can probably do better by first defining a specification for 3D shapes and then projecting it onto 2D images, but I do not think this approach will be of much help for dogs.) Note that effectively we are saying to trust the neural network - whatever it learned to do is our best guess on how to formalize what it needed to do! We do not yet know how to do better!!!
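A minimal sketch of that circularity in code (the names here are hypothetical illustrations, not anything from the post):

```python
import numpy as np

def spec_is_dog(image: np.ndarray) -> float:
    """Intended formal spec: the probability that a 1024x1024 RGB image
    contains what a human would consider a dog. Nobody currently knows how
    to write this body down as explicit mathematics."""
    raise NotImplementedError("no known closed-form definition of 'dog'")

def best_available_spec(image: np.ndarray, trained_net) -> float:
    # In practice, the best "specification" we can produce is whatever function
    # a trained classifier happens to compute - so "proving the network meets
    # the spec" reduces to trusting the network itself.
    return float(trained_net(image))
```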

Comment by Anon User (anon-user) on On being downvoted · 2023-09-17T21:25:04.415Z · LW · GW

Yes, of course, what I meant is more of a case of somebody confidently presenting as a self-evident truth something with a ton of well-known counterarguments. Or more generally, somebody who is not only clueless, but shows no awareness of how clueless they are, and no evidence that they at least tried to look for relevant information. [IMHO] Somebody who demonstrates willingness to learn deserves a comment pointing them to relevant information (and may still warrant a downvote, depending on how off the post is). Somebody who does not deserves to be downvoted, and usually would not deserve the time I would need to spend to explain my downvote in a comment. [/IMHO]

Comment by Anon User (anon-user) on On being downvoted · 2023-09-17T18:37:06.337Z · LW · GW

FWIW, most of my downvotes on LW are for poorly reasoned, jumping-to-conclusions posts and/or posts where the poster does not seem to fully know what they are talking about and should have done more homework first. I would never downvote a well-written post, even if I 100% disagree with it.

Comment by Anon User (anon-user) on Can I take ducks home from the park? · 2023-09-15T00:03:48.554Z · LW · GW

Grammar issue in your Russian version - should be "Как я могу взять уток домой из парка?", or even better: "Как мне забрать уток из парка домой?"

Comment by Anon User (anon-user) on Socialism in large organizations · 2023-07-30T19:01:50.940Z · LW · GW

Sears tried creating an explicit internal economy. It did not end well. https://www.versobooks.com/blogs/news/4385-failing-to-plan-how-ayn-rand-destroyed-sears

Comment by Anon User (anon-user) on The cone of freedom (or, freedom might only be instrumentally valuable) · 2023-07-25T00:39:40.198Z · LW · GW

Everything else being equal, fast, agile decision-making is better than slow and blunt decision-making. Freedom does not just mean freedom to do X today, it also means freedom to change our minds about X tomorrow. Do not regulate X, because freedom means, among other things, not trusting that X would be regulated in sensible ways, and trusting individuals self-organizing more. I am not saying this is always a good choice, but the potential pitfalls of things like regulatory capture need to be acknowledged.

Comment by Anon User (anon-user) on An AGI kill switch with defined security properties · 2023-07-08T21:30:52.574Z · LW · GW

If humans are supposed to be able to detect things going wrong and shut things down, that requires that they are exposed to the unencrypted feed. At this point, the humans are the weakest link, not the encryption. Similar for anything else external that you need / want AI to access while it's being trained and tested.

Edited to add: particularly if we are talking about not some theoretical sensible humans, but about real humans that started with "do not worry about LLMs, they are not agentic", and then promptly connected LLMs to agentic APIs.

Comment by anon-user on [deleted post] 2023-07-08T21:22:17.146Z

Maybe there is a better way to put it - SFOT holds for objective functions/environments that only depend on the agent's I/O behavior. Once the agent itself is embodied, then yes, you can use all kinds of diagonal tricks to get weird counterexamples. Implications for alignment - yes, if your agent is fully explainable and you can transparently examine its workings, chances are that alignment is easier. But that is kind of obvious without having to use SFOT to reason about it.

Edited to add: "diagonal tricks" above refers to things in the conceptual neighborhood of https://en.m.wikipedia.org/wiki/Diagonal_lemma

Comment by Anon User (anon-user) on An AGI kill switch with defined security properties · 2023-07-05T19:03:19.387Z · LW · GW

https://xkcd.com/538/ Crypto is not the weakest link.

Comment by anon-user on [deleted post] 2023-06-02T00:54:31.153Z

When an AGI takes on values for the first time, it must draw from the set of values which already exist or construct something similar from what already exists

The values come into the picture well before it's an AGI. First, a random neural network is initialized, and its "values" are a completely arbitrary function chosen at random. Over time, the NN is trained towards an AGI and its "values" take shape. By the time AGI emerges, it does not "take on values for the first time"; the values emerge from an extremely long sequence of tiny mutations, each creating something very similar to what already existed, becoming more complex and coherent over time.

Comment by Anon User (anon-user) on No - AI is just as energy-efficient as your brain. · 2023-05-27T19:17:03.966Z · LW · GW

I made a similar point (but without specific numbers - great to have them!) in a comment https://www.lesswrong.com/posts/Lwy7XKsDEEkjskZ77/?commentId=nQYirfRzhpgdfF775 on a post that posited human brain energy efficiency over AIs as a core anti-doom argument, and I also think that the energy efficiency comparisons are not particularly relevant either way:

Humanity is generating and consuming an enormous amount of power - why is the power budget even relevant? And even if it were, the energy for running brains ultimately comes from the Sun - if you include the agricultural energy chain and "grade" the energy efficiency of brains by the amount of solar energy it ultimately takes to power a brain, AI definitely has the potential to be more efficient. And even if a single human brain is fairly efficient, the human civilization is clearly not. With AI, you can quickly scale up the amount of compute you use, but with humans, scaling beyond a single brain is very inefficient.

Comment by anon-user on [deleted post] 2023-05-27T19:05:58.457Z

Well, yeah, if you specifically choose a crippled version of the high-U agent that is somehow unable to pursue the winning strategy, it will lose - but IMHO that's not what the discussion here should be about.

Comment by Anon User (anon-user) on A rejection of the Orthogonality Thesis · 2023-05-27T19:00:27.931Z · LW · GW

And Gordon Seidoh Worley is not saying there can't be good arguments against the orthogonality thesis that would deserve upvotes, just that this one is not one of those.

Comment by Anon User (anon-user) on A rejection of the Orthogonality Thesis · 2023-05-27T18:54:05.477Z · LW · GW

This line of reasoning is absurd: it assumes an agent knows in advance the precise effects of self-improvement — but that’s not how learning works! If you knew exactly how an alteration in your understanding of the world would impact you, you wouldn’t need the alteration: to be able to make that judgement, you’d have to be able to reason as though you had already undergone it.

It seems some major confusion is going on here - it is, generally speaking, impossible to know the outcome of an arbitrary computation without actually running it, but that does not mean it's impossible to design a specific computation in such a way that you'd know exactly what the effects would be. For example, one does not need to know the trillionth digit of pi in order to write a program that they could be very certain would compute that digit.
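As a minimal illustration (assuming the mpmath library is available), here is a program one can be confident computes the n-th decimal digit of pi without anyone knowing that digit in advance:

```python
from mpmath import mp

def nth_digit_of_pi(n: int) -> int:
    """Return the n-th decimal digit of pi after the decimal point (1-indexed)."""
    mp.dps = n + 10        # working precision: n digits plus a guard margin
    s = str(mp.pi)         # '3.14159...' with mp.dps significant digits
    return int(s[n + 1])   # s[2] is the 1st decimal digit, so digit n sits at s[n + 1]

# We trust the program's design, not prior knowledge of the answer:
print(nth_digit_of_pi(12))  # -> 9 (pi = 3.141592653589...)
```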

You also seem to be too focused on minor modifications of a human-like mind, but focusing too narrowly on minds is also missing the point - focus on optimization programs instead.

For many different kinds of X, it should be possible to write a program that, given a particular robotics apparatus (just the electromechanical parts without a specific control algorithm), predicts which electrical signals sent to the robot's actuators would result in more X. You can then place that program inside the robot and have the program's output wired to the robot's controls. The resulting robot does not "like" X, it's just robotically optimizing for X.
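A minimal sketch of that construction (all function names here are hypothetical placeholders): a control loop that scores candidate actuator signals with a predictor of "more X" and sends the top-scoring one. Nothing in the loop requires X to be human-friendly.

```python
from typing import Callable, Iterable, TypeVar

Signal = TypeVar("Signal")

def optimize_for_x(
    predict_x_increase: Callable[[Signal], float],     # assumed predictor: how much X a signal yields
    candidate_signals: Callable[[], Iterable[Signal]],  # assumed generator of possible actuator signals
    send_to_actuators: Callable[[Signal], None],
) -> None:
    """Robotically pursue X: no preferences or 'liking', just a scoring loop."""
    while True:
        best = max(candidate_signals(), key=predict_x_increase)
        send_to_actuators(best)
```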

The orthogonality principle just says that there is nothing particularly special about human-aligned Xs that would make the X-robot more likely to work well for those Xs than for Xs that result in human extinction (e.g. due to convergent instrumental goals, X does not need to be specifically anti-human).

Comment by anon-user on [deleted post] 2023-05-17T16:30:41.302Z

Wait, if Clip-maniac finds itself in a scenario where Clippy would achieve higher U than itself, the rational thing for it would be to self-modify into Clippy, and the Strong Form would still hold, wouldn't it?

Comment by Anon User (anon-user) on Contra Yudkowsky on AI Doom · 2023-04-24T08:53:55.406Z · LW · GW

Exactly! I'd expect compute to scale way better than humans - not necessarily because the intelligence of compute scales so well, but because the intelligence of human groups scales so poorly...

Comment by Anon User (anon-user) on Votes-per-Dollar · 2023-04-24T00:56:24.378Z · LW · GW

The advertising has to be visible, but who exactly paid for it does not have to be. And there is plenty of less obvious spending (e.g. paying people to go door-to-door, phone calls, etc, etc - pay people, then claim they were volunteers?).

Comment by Anon User (anon-user) on Contra Yudkowsky on AI Doom · 2023-04-24T00:54:02.075Z · LW · GW

Humanity is generating and consuming an enormous amount of power - why is the power budget even relevant? And even if it were, the energy for running brains ultimately comes from the Sun - if you include the agricultural energy chain and "grade" the energy efficiency of brains by the amount of solar energy it ultimately takes to power a brain, AI definitely has the potential to be more efficient. And even if a single human brain is fairly efficient, the human civilization is clearly not. With AI, you can quickly scale up the amount of compute you use, but with humans, scaling beyond a single brain is very inefficient.

Comment by Anon User (anon-user) on Prediction: any uncontrollable AI will turn earth into a giant computer · 2023-04-17T19:20:47.866Z · LW · GW

Temporal discounting is a thing - not sure why you are certain an ASI would not have enough temporal discounting in its value function to be unwilling to delay gratification by so much.

Comment by Anon User (anon-user) on [linkpost] "What Are Reasonable AI Fears?" by Robin Hanson, 2023-04-23 · 2023-04-15T20:50:56.575Z · LW · GW

Doomers worry about AIs developing “misaligned” values. But in this scenario, the “values” implicit in AI actions are roughly chosen by the organisations who make them and by the customers who use them

I think this is the critical crux of the disagreement. A part of Eliezer's argument, as I understand it, is that the current technology is completely incapable of anything close to actually "roughly choosing" the AI's values. On this point, I think Eliezer is completely right.

Comment by Anon User (anon-user) on Votes-per-Dollar · 2023-04-15T20:19:33.259Z · LW · GW

Hm? With the current system, at least the final vote-counting process is relatively transparent. Yes, there are some opportunities to cheat on the margins of election finance laws, but importantly that opportunity comes before the vote count, so it has to be balanced against the negative electoral consequences of being credibly accused of cheating. With your system, the final accounting happens after the vote, and in a close election, there is just too much incentive to cheat at that point...

Comment by Anon User (anon-user) on GPT-4 is easily controlled/exploited with tricky decision theoretic dilemmas. · 2023-04-15T20:14:05.446Z · LW · GW

Interesting. But I am wondering - would the results have been much different with a pre-RLHF version of GPT-4? The GPT-4 paper has a figure showing that GPT-4 was close to perfectly calibrated before RLHF, and became badly calibrated after. Perhaps it's something similar here?

Comment by Anon User (anon-user) on Votes-per-Dollar · 2023-04-10T17:23:03.484Z · LW · GW

Voting has a dual role - not just determining the winner, but also demonstrating to the losers and their supporters that they lost fairly, in the most transparent way possible. How do you convince the losers that the winners did not cheat on their budget reporting? How do you account for "unpaid" volunteers? How do you account for "uncoordinated" spending by non-candidates?

Comment by Anon User (anon-user) on Is it correct to frame alignment as "programming a good philosophy of meaning"? · 2023-04-08T21:47:45.616Z · LW · GW

Note that your "1" has two words that both carry a very heavy load - "uses" and "correct". What does it mean for a model to be correct? How do you create one? How do you ensure that the model you implemented in software is indeed correct? How do you create an AI that actually uses that model under all circumstances? In particular, how do you ensure that it is stable under self-improvement, out-of-distribution environments, etc? Your "2-4" seem to indicate that you are focusing more on the "correct" part, and not enough on the "uses" part. My understanding is that if both "correct" and "uses" could be solved, it would indeed likely be a solution to the alignment problem, but it's probably not the only path, and not necessarily the most promising one. Other paths could potentially emerge from the work on AI corrigibility, negative side-effect minimization, etc.

Comment by Anon User (anon-user) on How Politics interacts with AI ? · 2023-03-26T17:38:49.431Z · LW · GW

Rules will not support development of powerful AGI as it might threaten to overpower them

is probably true, but only because you used the word "powerful" rather than "capable". Rulers would definitely want development of capable AGIs as long as they believe (however incorrectly) in their ability to maintain power/control over those AGIs.

In fact, rulers are likely to be particularly good at cultivating capable underlings that they maintain firm control of. It may cause them to overestimate their ability to do the same for AGI. Moreover, if they expect an AGI to be less agentic, they might expect it to actually be easier to maintain control over a "we just program it to obey" AGI, and prefer that over what they perceive to be inherently less predictable humans.

Comment by Anon User (anon-user) on Nudging Polarization · 2023-03-25T20:31:03.964Z · LW · GW

In modern politics, simple messages tend to work a lot better than nuanced ones (which is a thing that Donald Trump masterfully exploited). "X is good/bad" is a much simpler message than "X is good, but only if it's X1, and not X2", and the latter invites primary opponents to claim "By supporting X, [politician] agrees with the evil other-siders in their support for X2! [Politician] is an our-sider-in-name-only!"

Comment by Anon User (anon-user) on Good News, Everyone! · 2023-03-25T16:12:10.903Z · LW · GW

Not just disinformation - any information that does not fit their preconceived worldview - it's all "fake news", don't you know?

Comment by Anon User (anon-user) on Are COVID lab leak and market origin theories incompatible? · 2023-03-20T16:46:23.727Z · LW · GW

The lab is known to have been studying bats - weren't those sold on the market too?

Comment by Anon User (anon-user) on Are COVID lab leak and market origin theories incompatible? · 2023-03-20T16:44:06.790Z · LW · GW

"Lab leak" doesn't necessarily imply "created in a lab".

Right, I was sloppy, replaced "created" with "studied"

Comment by Anon User (anon-user) on Why We MUST Build an (aligned) Artificial Superintelligence That Takes Over Human Society - A Thought Experiment · 2023-03-07T23:59:17.462Z · LW · GW

I've axiomatically set P(win) on path one equal to zero. I know this isn't true in reality and discussing how large that P(win) is and what other scenarios may result from this is indeed worthwhile, but it's a different discussion.

Your title says "we must". You are allowed to make conditional arguments from assumptions, but if your assumptions demonstrably take most of the P(win) paths out of consideration, your claim that the conclusions derived in your skewed model apply to real life is erroneous. If your title were "Unless we can prevent the creation of AGI capable of taking over human society, ...", you would not have been downvoted as much as you have been.

The clock would not be possible in any reliable way. For all we know, we could be a second before midnight already - we could very well be one unexpected clever idea away from ASI. From now on, new evidence might update P(current time is >= 11:59:58) in one direction or another, but it is extremely unlikely that it would ever get back to being close enough to 0, and it's also unlikely that we will have any certainty of it before it's too late.

Comment by Anon User (anon-user) on GÖDEL GOING DOWN · 2023-03-06T23:45:31.438Z · LW · GW

very little has been said about whether it is possible to construct a complete set of axioms

Huh? Didn't Gödel conclusively prove that the answer to pretty much every meaningful form of your question is "no"?

Comment by Anon User (anon-user) on What should we do about network-effect monopolies? · 2023-03-06T21:02:40.797Z · LW · GW

You might enjoy Cory Doctorow's take on this - such as https://onezero.medium.com/demonopolizing-the-internet-with-interoperability-b9be6b851238 and https://locusmag.com/2023/01/commentary-cory-doctorow-social-quitting/

Comment by Anon User (anon-user) on Why We MUST Build an (aligned) Artificial Superintelligence That Takes Over Human Society - A Thought Experiment · 2023-03-05T23:12:34.797Z · LW · GW

I'll first summarize the parts I agree with in what I believe you are saying.

First, you are saying, effectively, that there are two theoretically possible paths to success:

  1. Prevent the situation where an ASI takes over the world.
  2. Make sure that ASI that takes over the world is fully aligned.

You are then saying that the likelihood of winning on path one is so small as to not be worth discussing in this post.

The issue is that you then conclude that since P(win) on path one is so close to 0, we ought to focus on path two. The fallacy here is that P(win) appears very close to 0 on both paths, so we have to focus on whichever path has the higher P(win), no matter how impossibly low it is. And to do that, we need to directly compare the P(win) of both.

Consider this - which is the harder task: to create a fully aligned ASI that would remain fully aligned for the rest of the lifetime of the universe, regardless of whatever weird state the universe ends up in as a result of that ASI, or to create an AI (not necessarily superhuman) that is capable of correctly carrying out one pivotal action that is sufficient for preventing ASI takeover in the future (Eliezer's placeholder example - go ahead and destroy all GPUs in the world, self-destructing in the process) without killing humanity in the process? Would you not agree that, when the question is posed that way, it seems a lot more likely that the latter is something we'd actually be able to accomplish?

Comment by anon-user on [deleted post] 2023-03-02T03:07:48.558Z

I think your intuition that learning from only positive examples is very inefficient is likely true. However, if additional supervised fine-tuning is done, then the model also effectively learns from its mistakes and could potentially become a lot better, fast.