Posts

P(doom|superintelligence) or coin tosses and dice throws of human values (and other related Ps). 2023-04-22T10:06:47.701Z

Comments

Comment by Muyyd on [deleted post] 2024-07-18T01:12:24.926Z

Roman V. Yampolskiy has a paper (The Universe of Minds - 1 Oct 2014). I think it should be mentioned here.

Comment by Muyyd on Seeking feedback on a critique of the paperclip maximizer thought experiment · 2024-07-18T00:29:24.729Z · LW · GW

1. The paperclip maximizer oversimplifies AI motivations and neglects the potential for emergent ethics in advanced AI systems.

2. The doomer narrative often overlooks the possibility of collaborative human-AI relationships and the potential for AI to develop values aligned with human interests.

Because it is a simple (entrance-level) example of unintended consequences. There is a post about emergent phenomena - so ethics will definitely emerge, but the problem lies in the probability (not in overlooking the possibility) that it (the behavior of the AI) will happen to be to our liking. The slim chances of that come from the size of Mind Design Space (this post has a pic) and from the tremendous difference between the man-hours of very smart humans invested in increasing capabilities and the man-hours of very smart humans invested in alignment (Don't Look Up - The Documentary: The Case For AI As An Existential Threat on YouTube - 5:45 is about this difference).

3. Current AI safety research and development practices are more nuanced and careful than the paperclip maximizer scenario suggests.

They are not - we are long past simple entry-level examples, and AI safety (as practiced by the Big Players) has gotten worse, even if it looks more nuanced and careful. Some time ago AI safety meant something like "how to keep an AI contained in its air-gapped box during the value-extraction process", and now it means something like "is it safe for the internet? And now? And now? And now?". So all the differences in practices are overshadowed by the complexity of the new task: make your new AI more capable than competing systems and safe enough for the net. AI safety problems have gotten more nuanced too.

There were posts about Mind Design Space by Quintin Pope

Comment by Muyyd on The commenting restrictions on LessWrong seem bad · 2023-09-19T10:37:07.303Z · LW · GW

Here goes my one comment for the day. Or maybe not? Who knows? It is not like I can look up my restrictions or their history on my account page. I will have to make two comments to figure out whether there have been any changes.

One comment per day is heavily discouraging to participation.

Comment by Muyyd on Carl Shulman on The Lunar Society (7 hour, two-part podcast) · 2023-06-28T17:02:58.416Z · LW · GW

Here is how you can talk about bombing unlicensed datacenters without using "strike" or "bombing".

If we can probe the thoughts and motivations of an AI and discover wow, actually GPT-6 is planning to takeover the world if it ever gets the chance. That would be an incredibly valuable thing for governments to coordinate around because it would remove a lot of the uncertainty, it would be easier to agree that this was important, to have more give on other dimensions and to have mutual trust that the other side actually also cares about this because you can't always know what another person or another government is thinking but you can see the objective situation in which they're deciding. So if there's strong evidence in a world where there is high risk of that risk because we've been able to show actually things like the intentional planning of AIs to do a takeover or being able to show model situations on a smaller scale of that I mean not only are we more motivated to prevent it but we update to think the other side is more likely to cooperate with us and so it's doubly beneficial.

Here is an alternative to dangerous experiments to develop enhanced cognition in humans. It sounds less extreme and a little more doable.

Just going from less than 1% of the effort being put into AI to 5% or 10% of the effort or 50% or 90% would be an absolutely massive increase in the amount of work that has been done on alignment, on mind reading AIs in an adversarial context. 

If it's the case that as more and more of this work can be automated and say governments require that you put 50% or 90% of the budget of AI activity into these problems of make this system one that's not going to overthrow our own government or is not going to destroy the human species then the proportional increase in alignment can be very large even just within the range of what we could have done if we had been on the ball and having humanity's scientific energies going into the problem. Stuff that is not incomprehensible, that is in some sense is just doing the obvious things that we should have done.

It is also pretty bizarre that in response to this exchange:

Dwarkesh Patel 02:18:27

So how do we make sure it's not the thing it learns is not to manipulate us into rewarding it when we catch it not lying but rather to universally be aligned.

Carl Shulman 02:18:41

Yeah, so this is tricky. Geoff Hinton was recently saying there is currently no known solution for this. 

The answer was: yes, but we are doing it anyway - with twists like adversarial examples, adversarial training and simulations. If Shulman had THE ANSWER to the alignment problem he would not have kept it secret, but I can't help feeling some disappointment, because he sounds SO hopeful and confident. I somehow expected something different from a variation of "we are going to use weaker AIs to help us align stronger AIs while trying to outrun the capabilities research teams", even if this variation (in his description) seems very sophisticated, with mind reading and inducing hallucinations.

Comment by Muyyd on All AGI Safety questions welcome (especially basic ones) [May 2023] · 2023-05-10T09:41:05.238Z · LW · GW

How slow is human perception compared to AI? Is it purely the difference between the signal speed of neurons and the signal speed of copper/aluminum?

Comment by Muyyd on What can we learn from Bayes about reasoning? · 2023-05-05T20:57:32.171Z · LW · GW

Absent hypotheses do not produce evidence. Often (in some cases you can notice confusion, but that is a hard thing to notice until it is up in your face) you need a hypothesis that favors a certain observation as evidence in order to even observe it, to notice it. This is the source of a lot of misunderstandings (along with stupid priors, of course). If you forget that other people can be tired or in pain or in a hurry, it is really easy to interpret harshness as evidence in favor of "they don't like me" (they can still be in a hurry and dislike you, but well...) and be done with it. After several instances of this you will be convinced enough to make changing your mind very difficult (confirmation-bias difficult), so alternatives need to be present in your mind before the encounter with the observation.

Vague hypotheses ("what if we are wrong?") and negative ones ("what if he did not do this?") are not good at producing evidence either. To be useful they have to be precise, concrete and positive (this is easy to check in some cases by visualisation - how hard it is to do, and whether it is even possible to visualise).
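A toy Bayes update, with made-up numbers, to illustrate the point about absent hypotheses: if "they don't like me" is the only explanation you carry for harshness, the observation looks like strong evidence for it; once "they are in a hurry / tired / in pain" is explicitly on the table, the same observation barely moves the posterior.

```python
# Toy Bayes update: how an alternative hypothesis changes what "harshness" means.
# All probabilities are made up for illustration only.

def posterior(prior_h, p_obs_given_h, p_obs_given_not_h):
    """P(H | observation) via Bayes' rule."""
    p_obs = prior_h * p_obs_given_h + (1 - prior_h) * p_obs_given_not_h
    return prior_h * p_obs_given_h / p_obs

# Without the "they are in a hurry" hypothesis, the only way to explain
# harshness is dislike, so P(harsh | not dislike) feels tiny.
print(posterior(0.10, 0.80, 0.05))   # ~0.64 - "they must dislike me"

# With "hurry / pain / tiredness" explicitly in mind, harshness is common
# even without dislike, so the same observation is weak evidence.
print(posterior(0.10, 0.80, 0.50))   # ~0.15 - barely an update
```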

Comment by Muyyd on [linkpost] "What Are Reasonable AI Fears?" by Robin Hanson, 2023-04-23 · 2023-04-15T04:42:19.148Z · LW · GW

I did not expect to see such strawmanning from Hanson. I can easily imagine a post with less misrepresentation. Something like this:

Yudkowsky and the signatories to the moratorium petition worry most about AIs getting “out of control.” At the moment, AIs are not powerful enough to cause us harm, and we hardly know anything about the structures and uses of future AIs that might cause bigger problems. But instead of waiting to deal with such problems when we understand them better and can envision them more concretely later, AI “doomers” want to redirect most if not all computational, capital and human resources from making black-boxed AIs more capable toward research avenues directed at the goal of obtaining a precise understanding of the inner structure of current AIs now, and to make this redirection enforced by law, including the most dire (but legal) methods of law enforcement.

instead of this (the original). But that would be a different article, written by someone else:

Yudkowsky and the signatories to the moratorium petition worry most about AIs getting “out of control.” At the moment, AIs are not powerful enough to cause us harm, and we hardly know anything about the structures and uses of future AIs that might cause bigger problems. But instead of waiting to deal with such problems when we understand them better and can envision them more concretely, AI “doomers” want stronger guarantees now.

Comment by Muyyd on Evolution provides no evidence for the sharp left turn · 2023-04-12T12:29:39.620Z · LW · GW

Evolutionary metaphors are about the huge difference between evolutionary pressure in the ancestral environment and what we have now: ice cream, transgender people, LessWrong, LLMs, condoms and other contraceptives. What kind of "ice cream" will AGI and ASI make for themselves? Maybe it can be made out of humans: put them in vats and let them dream up inputs for GPT-10?

Mimicry is a product of evolution too. So is social mimicry.

I have thoughts about reasons for an AI to evolve human-like morality too. But I also have thoughts like "this coin turned up heads 3 times in a row, so it must come up tails next".

Comment by Muyyd on Stupid Questions - April 2023 · 2023-04-09T13:09:47.843Z · LW · GW

>But again... does this really translate to a proportional probability of doom?

If you buy a lottery ticket and get all (all out of n) numbers right, then you have a glorious transhumanist utopia (still, some people will get very upset). And if you get a single number wrong, then you get a weirdtopia and maybe a dystopia. There is an unknown quantity of numbers to guess, and a single ticket costs a billion now (and here enters the discrepancy). Where do I get so many losing tickets from? From Mind Design Space. There is also an alternative view that suggests the space of possibilities is much smaller.

It is not enough to get some alignment; it seems that we also need to get clear on the difference between utility maximisers (ASI and AGI) and behavior executors (humans and dogs and monkeys). That is what the "AGI is proactive (and synonyms)" part is based on.

So the probability of doom is proportional to the probability of buying a losing ticket (one that does not get all the numbers right).
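A minimal arithmetic sketch of the lottery analogy (both the number of "numbers" n and the per-number hit probability p are illustrative placeholders, not real estimates): if there are n independent things about human values to get right and each is hit with probability p, the chance of getting them all right falls off as p^n.

```python
# Toy arithmetic for the lottery analogy: n independent "numbers" to guess,
# each guessed correctly with probability p; utopia requires all of them.
# Both n and p are illustrative placeholders, not real estimates.

def p_all_right(p: float, n: int) -> float:
    return p ** n

for p in (0.9, 0.99):
    for n in (10, 50, 100):
        print(f"p={p}, n={n}: P(all right) = {p_all_right(p, n):.5f}")
# Even with a 99% hit rate per "number", 100 numbers give only ~0.37.
```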

Comment by Muyyd on Stupid Questions - April 2023 · 2023-04-07T20:06:10.148Z · LW · GW

From how the discrepancy between the time/resources allocated to alignment research and to capability research looks to a lay person (to me), the doom scenario is closer to a lottery than to a story. I don't see why ours would be the winning number. I am 99.999% sure that ASI will be proactive (and all kinds of synonyms of this word). It can all mostly be summarised with "fast takeoff" and "human values are fragile".

Comment by Muyyd on Stupid Questions - April 2023 · 2023-04-07T19:28:05.517Z · LW · GW

It is not really a question, but I will formulate it as if it were.
Are current LLMs capable enough to output real rationality exercises (and not somewhat plausible-sounding nonsense) that follow the natural way information is presented in life (you don't usually get a "15% of the cabs are Blue" setup - a worked version of that cab problem is sketched after the list below)? Can they give me 5 new problems every day so I can train my sensitivity to the prior probability of outcomes? In real life you don't get percentages, just problems like "I can't find my wallet, was it stolen?". Can they guide me through the solution process?
There are also:

  • P(D|~H): problems about the weight of evidence assigned by alternative hypotheses
  • How to notice hidden conjunctions (as in the Linda problem)
  • Less obvious problems than the probability of pregnancy given that intercourse has occurred versus the probability of intercourse given that pregnancy has occurred.
  • Syllogisms.
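For reference, a worked version of the classic cab problem mentioned above (85% Green cabs, 15% Blue, a witness who is correct 80% of the time) - just the standard textbook Bayes calculation, shown here to pin down the kind of exercise I mean:

```python
# The classic Kahneman/Tversky cab problem, worked out with Bayes' rule.
# 85% of cabs are Green, 15% are Blue; a witness identifies the cab as Blue
# and is correct 80% of the time (for either color).

p_blue = 0.15
p_says_blue_given_blue = 0.80    # witness correct
p_says_blue_given_green = 0.20   # witness mistaken

p_says_blue = (p_blue * p_says_blue_given_blue
               + (1 - p_blue) * p_says_blue_given_green)
p_blue_given_says_blue = p_blue * p_says_blue_given_blue / p_says_blue

print(p_blue_given_says_blue)  # ~0.41: despite the testimony, Blue is still less likely than not
```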
Comment by Muyyd on My Model of Gender Identity · 2023-04-01T11:22:02.941Z · LW · GW

Is there work being done (in the trans community or somewhere else) to map these distinct observable categories with precise language? Or is it too early and there is no consensus?

  • Born female, calls themselves "woman" (she/her)
  • Born female, calls themselves "man" (he/him)
  • Born female, had pharmacological and/or surgical interventions, calls themselves "man" (he/him)
  • Born male, calls themselves "man" (he/him)
  • Born male, calls themselves "woman" (she/her)
  • Born male, had pharmacological and/or surgical interventions, calls themselves "woman" (she/her)
Comment by Muyyd on Pausing AI Developments Isn't Enough. We Need to Shut it All Down by Eliezer Yudkowsky · 2023-04-01T10:08:48.349Z · LW · GW

I was hoping that he had some concrete examples in mind but did not elaborate because this was a letter in a magazine and not a blog post. The only thing that comes to my mind is to somehow measure unexpected behavior: if a bridge sometimes led people in circles, that would definitely be cause for concern and for reevaluation of the techniques used.

Comment by Muyyd on "Dangers of AI and the End of Human Civilization" Yudkowsky on Lex Fridman · 2023-03-31T23:36:46.994Z · LW · GW

Humans, presumably, won't have to deal with deception among themselves, so given sufficient time they can solve alignment. If pressed for time (as they are now), they will have to implement less understood solutions, because that is the best they will have at the time.

Comment by Muyyd on "Dangers of AI and the End of Human Civilization" Yudkowsky on Lex Fridman · 2023-03-31T23:26:18.799Z · LW · GW

Capabilities advance much faster than alignment, so there is likely no time to do meticulous research. And if you try to use weak AIs as a shortcut to outrun the current "capabilities timeline", then you will somehow have to deal with the suggester-and-verifier problem (with suggestions much harder to verify than simple math problems), which is not wholly about deception but also about filtering somewhat-working stuff that may steer alignment in the right direction. Or may not.

But I agree that this collaboration will be successfully used for patchwork (because of the shortcuts) alignment of weak AIs to placate the general public and politicians. All of this depends on how hard the alignment problem is: as hard as EY thinks, or maybe harder, or easier.

Comment by Muyyd on Pausing AI Developments Isn't Enough. We Need to Shut it All Down by Eliezer Yudkowsky · 2023-03-31T22:08:56.846Z · LW · GW

Do we have an idea of what these tables about ML should look like? I don't know that much about ML.

Comment by Muyyd on Pausing AI Developments Isn't Enough. We Need to Shut it All Down by Eliezer Yudkowsky · 2023-03-30T11:17:44.156Z · LW · GW
  • If we held anything in the nascent field of Artificial General Intelligence to the lesser standards of engineering rigor that apply to a bridge meant to carry a couple of thousand cars, the entire field would be shut down tomorrow.

What are examples that can help to see this tie more clearly? Procedures that work similarly enough that we can say "we do X during the planning and building of a bridge, and if we do X in AI building...". Does such an X even exist that can be applied both to engineering a bridge and to engineering an AI?

Comment by Muyyd on My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" · 2023-03-21T11:07:02.287Z · LW · GW

DeepMind can't just press a button and generate a million demonstrations of scientific advances, and objectively score how useful each advance is as training data, while relying on zero human input whatsoever.

It can't now (or can it?). Are there not 100 robots in 100 10x10-meter labs being trained by recreating all human technology from the Stone Age onward? If it costs less than 10 million, then there probably are. This is a joke, but I don't know how off-target it is.

Comment by Muyyd on My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" · 2023-03-21T10:22:49.831Z · LW · GW

Discussion of human generality.
It should be named "Discussion of human generality versus Artificial General Intelligence generality". And there exists an example of human generality much closer to 'okay, let me just go reprogram myself a bit, and then I'll be as adapted to this thing as I am to...', which is not "I am going to read a book or 10 on this topic" but "I am going to meditate for a couple of weeks to change my reward circuitry so I will be as interested in coding afterwards as I am interested in doing all the side quests in Witcher 3 now" and "I, as a human, have a documented thing known as insensitivity to prior probability, so I will go and find 1000 examples of probabilistic inference on the internet and use them to train my sensitivity". And humans can't do that. Imagine how "general" humans would be if they could. But if there is going to be a machine intelligence that can perform a set of tasks such that it would unquestionably be named "general", I would expect it to be capable of purposefully rewriting parts of its own code. This is speculation of course, but so is this AGI-entity, whatever it is.


How to think about superintelligence
Here you completely glossed over the topic of superintelligence. It is ironic: EY made his prediction that "the current influx of money might stumble upon something" without making a single argument in its favor, yet you wrote a list with 11 entries and 5 paragraphs to argue why it is unlikely. But then EY speaks about the efficient market hypothesis, and chess, and efficiency of action... and you did not engage with that specific argument. I agree that devs have more control over weaker AIs. But the SUPER in superintelligence is one of the main points. It is a SUPER big point. This is speculation of course, but you and EY both seem to agree on the danger and capabilities of current AIs (and they are not even "general"). I know I did not write a good argument here, but I do not see a point there to argue against.

The difficulty of alignment
If you stop the human from receiving reward for eating ice cream, then the human will eat chocolate. And so on and so on: look at what stores have to offer that is sugary but is neither ice cream nor chocolate. You have to know in advance that liking apples and berries and milk AND HONEY will result in discovering and creating ice cream. In advance - that is the point of the ice cream metaphor.
And by the time humanity understood the connection between sugar, the brain, and evolution, it had already made cocaine and meth. Because it is not about sugar but about reward circuitry. So you have to select for reward circuitry (and the apparatus surrounding it) that won't invent cocaine - before it does. In advance. Far in advance.
And some humans like cocaine so much that we could say their value system cleanly revolves around that one single goal. Or maybe there is no equivalent of cocaine for AI, but then sugar is still valid. Because we are at "worm intelligence" (?) now in terms of the evolution metaphor, and at this point in time it is hard to tell whether this thing will make an ice cream truck sometime (5 to 10 years from now) in the future. You wrote a lot about why there are better examples than evolution, but you also engaged with the ice cream argument, so I engaged with it too.

Why aren't other people as pessimistic as Yudkowsky?
As much as I agree with EY, even for me the thought "they should spend 10 times as much on alignment research as on capability-increasing research" is truly alien and counterintuitive. I mean "redistribution", not "even more money and human resources". And for the people whose money and employees I am so brazenly bossing around here, this kind of thinking is even more alien and counterintuitive.
I can see that most of your disagreement here comes from a different value theory and from how fragile you think human values are. And that is the crux of the matter on "99% and 100%". That's why you wrote [4]:
I expect there are pretty straightforward ways of leveraging a 99% successful alignment method into a near-100% successful method by e.g., ensembling multiple training runs, having different runs cross-check each other, searching for inputs that lead to different behaviors between different models, transplanting parts of one model's activations into another model and seeing if the recipient model becomes less aligned, etc

It would be great if you were right. But you wrote [4], and this is a prime example of "They're waiting for reality to hit them over the head". If you are wrong on value theory, then this 1% is what differentiates an "inhuman weirdtopia" from a "weird utopia" post-ASI world in the best case.

Overall: you have different views on what AGI is and what superintelligence is, plus your shard theory of human values. But you missed the "what the G in AGI means" argument, did not engage with the "how to think about superintelligence" part (and it is super-important), and missed the "ice cream" argument. I did not know the shard theory of values; maybe I will read it now - it seems to be a major point in your line of reasoning.
 

Comment by Muyyd on Ethical AI investments? · 2023-03-18T03:25:04.099Z · LW · GW

Does 'ethical safe AI investments' mean 'to help make AI safer and make some money at the same time'?

Comment by Muyyd on GPT-4 · 2023-03-14T18:00:33.143Z · LW · GW

But how good can it be, realistically? I will be very surprised if all these details aren't leaked within the next week. Maybe they will try to make several false leaks to muddle things a bit.

Comment by Muyyd on Bankless Podcast: 159 - We’re All Gonna Die with Eliezer Yudkowsky · 2023-03-08T12:51:26.588Z · LW · GW

Lack of clarity when I think about these limits makes it hard for me to see how the end result would change if we could somehow "stop discounting" them.
It seems to me that we would have to be much more elaborate in describing the parameters of this thought experiment. In particular, we would have to agree on the deeds and real-world achievements the hypothetical AI has, so that we would both agree to call it AGI (like writing an interesting story and making illustrations, so this particular research team now has a new revenue stream from selling it online - would this make the AI an AGI?). And on the security conditions (an air-gapped server room?). This would get us closer to understanding "the rationale".
But then your question is not about AGI but about "superintelligent AI", so we would have to do the elaborate describing again with new parameters. And that is what I expect Eliezer (alone and with other people) has done a lot of. And look what it did to him (this is a joke, but at the same time - not). So I will not be an active participant further.
It is not even about a single SAI in some box: competing teams, people running copies (legal and not) and changing code, corporate espionage, dirty code...

Comment by Muyyd on Bankless Podcast: 159 - We’re All Gonna Die with Eliezer Yudkowsky · 2023-02-24T11:08:12.579Z · LW · GW

It is the very same rationale that stands behind assumptions like "why Stockfish won't execute a losing set of moves" - it is just that good at chess. Or better: it is just that smart when it comes down to chess.

In this thought experiment, the way to go is not "I see that the AGI could likely fail at this step, therefore it will fail" but to keep thinking and inventing better moves for the AGI to execute - moves which won't be countered as easily. That is an important part of the "security mindset" and probably a major reason why Eliezer talks about the lack of pessimism in the field.

Comment by Muyyd on Full Transcript: Eliezer Yudkowsky on the Bankless podcast · 2023-02-24T08:37:26.924Z · LW · GW

Both times my talks went that way (why didn't they raise him to be good - why couldn't we program the AI to be good; can't we keep an eye on them, and so on), but it would take too long to summarise something like a 10-minute dialog, so I am not going to do it. Sorry.

Comment by Muyyd on Full Transcript: Eliezer Yudkowsky on the Bankless podcast · 2023-02-24T07:57:56.360Z · LW · GW

Evolution: taste buds and ice cream, sex and condoms... This analogy has always been difficult to use, in my experience. A year ago I came up with a less technical one: KPIs (key performance indicators) as the inevitable way to communicate goals (to an AI) to an ultra-high-IQ psychopath-genius who is into malicious compliance (it kind of can't help itself, being a clone of Nikola Tesla, Einstein and a bunch of different people, some of them probably CEOs, because she can).

I have used it only 2 times, and it was way easier than talking about different optimisation processes. And it took me only something like 8 years to come up with!