Do-it-yourself-science Wiki 2011-06-22T10:25:24.395Z · score: 4 (5 votes)
Fusing AI with Superstition 2010-04-21T11:04:26.570Z · score: -6 (25 votes)


Comment by drahflow on Open thread, Sep. 26 - Oct. 02, 2016 · 2016-10-11T08:42:55.290Z · score: 0 (0 votes) · LW · GW


The ternary relation T1(e,i,x) takes three natural numbers as arguments. The triples of numbers (e,i,x) that belong to the relation (the ones for which T1(e,i,x) is true) are defined to be exactly the triples in which x encodes a computation history of the computable function with index e when run with input i, and the program halts as the last step of this computation history.

In other words: If someone gives you an encoding of a program, an encoding of its input and a trace of its run, you can check with a primitive recursive function whether you have been lied to.
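A minimal sketch of why checking a claimed run is easy. It assumes a hypothetical toy register machine: the instruction set, the `step` and `is_halting_trace` names and the trace format are illustrative only, not Kleene's actual Gödel-numbered encoding. The checker only loops over the trace it is handed, so its work is bounded by the length of its input. That is the intuition behind T1 being primitive recursive.

```python
from typing import List, Tuple

# Hypothetical toy register machine (NOT Kleene's actual encoding):
#   ("inc", r)        -> registers[r] += 1, pc += 1
#   ("jz", r, target) -> if registers[r] == 0: pc = target
#                        else: registers[r] -= 1, pc += 1
#   ("halt",)         -> stop
State = Tuple[int, Tuple[int, ...]]  # (program counter, registers)


def step(program: list, state: State) -> State:
    """Execute exactly one instruction of a running (non-halted) machine."""
    pc, regs = state
    instr = program[pc]
    regs = list(regs)
    if instr[0] == "inc":
        regs[instr[1]] += 1
        return (pc + 1, tuple(regs))
    if instr[0] == "jz":
        if regs[instr[1]] == 0:
            return (instr[2], tuple(regs))
        regs[instr[1]] -= 1
        return (pc + 1, tuple(regs))
    raise ValueError(f"cannot step instruction {instr!r}")


def is_halting_trace(program: list, inp: int, trace: List[State], n_regs: int = 2) -> bool:
    """Check a claimed computation history of `program` on input `inp`.

    The only loop runs over the given trace, so no unbounded search is needed.
    """
    if not trace or trace[0] != (0, (inp,) + (0,) * (n_regs - 1)):
        return False  # wrong starting configuration
    for prev, nxt in zip(trace, trace[1:]):
        pc = prev[0]
        if not (0 <= pc < len(program)) or program[pc][0] == "halt":
            return False  # trace continues past a halt or off the program
        if step(program, prev) != nxt:
            return False  # we have been lied to
    last_pc = trace[-1][0]
    return 0 <= last_pc < len(program) and program[last_pc][0] == "halt"
```

For example, with the program `[("inc", 0), ("halt",)]` and input 2, the trace `[(0, (2, 0)), (1, (3, 0))]` is accepted, while a trace ending in `(1, (4, 0))` is rejected.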

Comment by drahflow on Open thread, Sep. 26 - Oct. 02, 2016 · 2016-09-27T10:33:08.450Z · score: 1 (3 votes) · LW · GW

A counterexample to your claim: Ackermann(m,m) is a computable function, hence computable by a universal Turing machine. Yet it is designed not to be primitive recursive.

And indeed, Kleene's normal form theorem requires one application of the μ-operator, which introduces unbounded search.
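A small Python sketch of the two ingredients, using only the standard definitions: `ackermann` is the usual two-argument Ackermann function, evaluated with an explicit stack to stay clear of Python's recursion limit, and `mu` is unbounded minimisation, which may never return. The function names are illustrative.

```python
def ackermann(m: int, n: int) -> int:
    """Two-argument Ackermann function, evaluated with an explicit stack."""
    stack = [m]
    while stack:
        m = stack.pop()
        if m == 0:
            n += 1                  # A(0, n) = n + 1
        elif n == 0:
            stack.append(m - 1)     # A(m, 0) = A(m - 1, 1)
            n = 1
        else:
            stack.append(m - 1)     # A(m, n) = A(m - 1, A(m, n - 1))
            stack.append(m)
            n -= 1
    return n


def mu(predicate) -> int:
    """Unbounded search: the least n >= 0 with predicate(n) true.

    Unlike a bounded loop, this never terminates if no such n exists.
    """
    n = 0
    while not predicate(n):
        n += 1
    return n
```

`ackermann(3, 3)` evaluates to 61, and `mu(lambda n: n * n > 50)` returns 8.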

Comment by drahflow on The Philosophical Implications of Quantum Information Theory · 2016-02-26T15:18:40.950Z · score: 0 (0 votes) · LW · GW

I don't buy your first argument against time travel. Even under the model of the universe as a static mathematical object connected by wave-function consistency constraints, there is still a consistent interpretation of the intuitive notion of "time travel":

The "passage" of time is the continuous measurement of the environment by a subsystem (which incidentally believes itself to be an 'observer') and the resulting entanglement with farther-away parts of the system as "time goes on" (i.e. further towards positive time). Time travel is then a measurement of a "past" state or, described differently but meaning the same thing, an entanglement between a subsystem (the location in the past the traveler visited) and its surroundings that does not respect the usual constraint that entanglement propagates at the speed of light: the traveler came from some future location (and its past light cone), which is, "surprisingly", entangled with the past. While this violates the common understanding of space-time, it is not logically impossible under this understanding of the universe.

This kind of time travel allows interaction with the past (and interactions are not different from observations anyway).

Am I overlooking something here?

Comment by drahflow on The ethics of eating meat · 2016-02-19T22:28:57.919Z · score: 0 (0 votes) · LW · GW

Here is my attempt to convince you also of 1 (in your numbering):

I disagree with your: "From a preference utilitarian perspective, only a self-conscious being can have preferences for the future, therefore you can only violate the preferences of a self-conscious being by killing it."

On the contrary, every agent that follows an optimization goal exhibits some preference (even if it does not understand that preference itself), namely that its optimization goal be reached. The ability to understand one's own optimization goal is not necessary for a preference to be morally relevant; otherwise babies and even unconscious people would have no moral weight. (And even awake people don't understand all their optimization goals.)

This leaves the problem of how to weight various agents. A solution which gives equal weight "per agent" has ugly consequences (because we should all immediately take immunosuppressants to save the bacteria) and is ill-defined, because many systems allow multiple ways to count "agents" (each cell has equal weight? each organ? each human? each family? each company? each species? each gene allele?).

A decent solution seems to be to take the computing power (alternatively: the ability to reach its optimization goals) of the system exhibiting optimizing behavior as a "weight" (if only for game-theoretic reasons; it certainly makes sense to value the preferences of extremely powerful optimizers strongly). Unfortunately, there is no clear scale of "computing power" one can calculate with. Extrapolating from intuition gives a trivial weight to bacteria's goals and a weight near our own to the goals of other humans. In the concrete context of killing animals to obtain meat, it should be observed that animals are generally rather capable of reaching their goals in the wild (e.g. getting food, spawning offspring), better than human children, I'd say.

Comment by drahflow on The Fable of the Burning Branch · 2016-02-10T01:34:47.531Z · score: 6 (8 votes) · LW · GW

I, for one, like my moral assumptions and cached thoughts challenged regularly. This works well with repugnant conclusions. Hence I upvoted this post (to -21).

I find two interesting questions here:

  1. How to reconcile opposing interests of subgroups within a population of entities whose interests we would like to include in our utility function. An obvious answer is facilitating trade between all interested parties to increase utility. But how do we react to subgroups whose utility function values trade itself negatively?

  2. Given that mate selection is a huge driver of evolution, I wonder if there is actually a non-cultural, i.e. genetic, component to the aversion (which I feel) to providing everyone with sexual encounters / the ability to create genetic offspring / raise children. And I'd also be interested in hearing where other people draw the "immoral" line...

Comment by drahflow on The Fable of the Burning Branch · 2016-02-09T23:19:43.253Z · score: 1 (1 votes) · LW · GW

Interestingly, it appears (at least in my local cultural circle) that being attended by human caretakers when incapacitated by age is considered a basic right. Hence there must be some other reason, and not just the problem of rights being fulfilled by other persons, why the particular example assumed to underlie the parable is reprehensible to many people.

Comment by drahflow on Your transhuman copy is of questionable value to your meat self. · 2016-01-06T14:18:20.785Z · score: 5 (5 votes) · LW · GW

To disagree with this statement is to say that a scanned living brain, cloned, remade and started will contain the exact same consciousness, not similar, the exact same thing itself, that simultaneously exists in the still-living original. If consciousness has an anatomical location, and therefore is tied to matter, then it would follow that this matter here is the exact matter as that separate matter there. This is an absurd proposition.

You conclude that consciousness in your scenario cannot have 1 location(s).

If consciousness does not have an anatomical / physical location then it is the stuff of magic and woo.

You conclude that consciousness in your scenario cannot have 0 locations.

However, there are more numbers than those two.

The closest parallel I see to your scenario is a program run on two computers for redundancy (as is sometimes done in safety-critical systems). It is indeed the same program in the same state, but in 2 locations.

The two consciousnesses will diverge if given different input data streams, but they are (at least initially) similar. Given that the state of your brain tomorrow will be different from its state today, why do you care about the wellbeing of that human, who is not identical to now-you? Assuming that you care about your tomorrow, why does it make a difference whether that human is separated from you by time rather than by space (as in your scenario)?
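A toy illustration of the redundancy analogy, assuming a hypothetical `Program` class whose whole state is the list of observations it has seen: a deep copy is the same program in the same state at a second location, and the two only diverge once they receive different inputs.

```python
import copy


class Program:
    """Hypothetical stand-in for a deterministic program with internal state."""

    def __init__(self):
        self.state = []

    def step(self, observation):
        # The state is fully determined by the sequence of inputs seen so far.
        self.state.append(observation)


def fork(program: Program) -> Program:
    """Same program, same state, second location (as in a redundant system)."""
    return copy.deepcopy(program)
```

After `p.step("shared past")` and `q = fork(p)`, both copies have identical state; feeding them `"x"` and `"y"` respectively makes them differ while sharing the same past.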

Comment by drahflow on Rationalist Magic: Initiation into the Cult of Rationatron · 2015-12-09T12:09:18.364Z · score: 0 (0 votes) · LW · GW

Regarding auras: I am not sure if I observed the same phenomenon, but if I sit still and keep my eyes fixed on the same spot for a while (in a still scene), my eyes eventually get accustomed to the exact incoming light pattern and everything kind of fades to gray. Very slight movements will then generate colorful borders on edges (like a Gaussian edge detector).

Comment by drahflow on Ideological Turing Test Domains · 2015-08-02T20:34:28.329Z · score: 3 (3 votes) · LW · GW
  • Best way to fix climate change: "Renewables / Nuclear"
  • Secret Services are necessary to fight terrorism / Secret Services must be abolished
  • GPL / BSD licenses

Comment by drahflow on Stupid Questions June 2015 · 2015-06-03T09:08:14.091Z · score: 2 (2 votes) · LW · GW

  • Install a smoke detector

  • Do martial arts training until you get falling more or less right. While this might be helpful against muggers, the main benefit is the reduced probability of injury in various unfortunate situations.

Comment by drahflow on Stupid Questions May 2015 · 2015-05-03T20:43:04.381Z · score: 2 (2 votes) · LW · GW

The Metamath project was started by a person who also wanted to understand math by coding it:

Generally speaking, machine-checked proofs are ridiculously detailed. But being able to create such detailed proofs did boost my mathematical understanding a lot. I found it worthwhile.

Comment by drahflow on 2015 Repository Reruns - Boring Advice Repository · 2015-01-11T14:52:50.584Z · score: 2 (2 votes) · LW · GW

Install a smoke detector (and reduce mortality by 0.3%, if I'm reading the statistics right, not to mention the property damage prevented).

Comment by drahflow on December 2014 Bragging Thread · 2014-12-02T09:19:51.660Z · score: 1 (1 votes) · LW · GW

I use multiple passwords, each consisting of 12 characters drawn randomly from a..z, A..Z, 0..9 and ~20 symbol characters. The total entropy of each is around 76 bits.

10 decimal digits is actually more like 33 bits of entropy.
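The arithmetic behind both numbers is length × log2(alphabet size). Below is a short sketch using Python's `secrets` module. The exact 20-character symbol set is an assumption; the comment only says "~20". With it, the alphabet has 26 + 26 + 10 + 20 = 82 characters.

```python
import math
import secrets
import string

# Assumed symbol set: exactly 20 punctuation characters (the comment says "~20").
SYMBOLS = "!#$%&()*+,-./:;<=>?@"
ALPHABET = string.ascii_letters + string.digits + SYMBOLS  # 52 + 10 + 20 = 82


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a string of `length` symbols drawn uniformly and independently."""
    return length * math.log2(alphabet_size)


def generate_password(length: int = 12) -> str:
    """Uniformly random password from ALPHABET using a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

`entropy_bits(82, 12)` is about 76.3 bits, while `entropy_bits(10, 10)` is about 33.2 bits.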

Comment by drahflow on Contrarian LW views and their economic implications · 2014-10-09T08:57:49.578Z · score: 2 (2 votes) · LW · GW

small enough to be masked by confounders

There is an extremely large number of companies. Unrelated effects should average out.
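A quick simulation of the averaging argument. It assumes each company's confounders act like independent zero-mean noise added to a common true effect. Correlated confounders would not average out this way. The parameter values are illustrative placeholders. The standard error of the mean shrinks like noise_sd / √n.

```python
import random
import statistics


def estimated_effect(n_companies: int, true_effect: float = 0.1,
                     noise_sd: float = 1.0, seed: int = 0) -> float:
    """Average observed effect over many companies.

    Assumes (hypothetically) that confounders are independent, zero-mean noise.
    """
    rng = random.Random(seed)
    samples = [true_effect + rng.gauss(0.0, noise_sd) for _ in range(n_companies)]
    return statistics.fmean(samples)
```

With 100,000 companies and unit noise, the standard error is about 0.003, so the estimate lands close to the true effect of 0.1 even though each single company's noise is ten times larger.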

Regarding statistics: links to quite some.

Comment by drahflow on Knightian Uncertainty and Ambiguity Aversion: Motivation · 2014-07-23T12:59:47.777Z · score: 1 (1 votes) · LW · GW

Given identical money payoffs between two options (even after adjusting for the non-linear utility of money), choosing the non-ambiguous one has the added advantage of giving a boundedly rational agent fewer possible futures to spend computing resources on while the process of generating utility runs.

Consider two options: a) You wait one year and get 1 million dollars. b) You wait one year and get 3 million dollars with 0.5 probability (decided after this year).

If you take option b), depending on the size of your "utils", all planning for after the year must essentially be done twice, once for the case with 3 million dollars available and once for the case without.
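A sketch of the planning-cost side of the argument, assuming each open bet is a list of (probability, payoff) outcomes and that planning effort scales with the number of distinct joint futures. The option encodings mirror a) and b) above. The cost model is an illustrative assumption.

```python
import math

# Each option: list of (probability, payoff in dollars) outcomes.
OPTION_A = [(1.0, 1_000_000)]
OPTION_B = [(0.5, 3_000_000), (0.5, 0)]


def futures_to_plan_for(open_bets) -> int:
    """Number of distinct joint outcomes while several independent bets are open."""
    return math.prod(len(outcomes) for outcomes in open_bets)
```

Holding one b)-style bet doubles the futures to plan for, and three such bets open at once already give 8 branches, while any number of a)-style options leaves a single future.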

Comment by drahflow on Bragging Thread, July 2014 · 2014-07-14T17:38:50.721Z · score: 9 (9 votes) · LW · GW

I usually take the minutes of the German Pirate Party assemblies. It is non-trivial to transcribe two days of speech alone (and I don't know steno). A better solution is a collaborative editor with multiple people typing while listening to the audio with increasing delay, i.e. one person gets live audio, the next one a 20-second delay, etc. There is Etherpad, but the web client cannot really handle the 250kB files a full-day transcript needs; also, two of the people interested in taking minutes (me included) strongly prefer Vim over a glorified text field.

Hence: On the 23rd of June I downloaded the Vim source and started implementing collaborative editing. On the 28th and 29th, three people used it for hours without major problems (except that I initially started the server in gdb to get a backtrace in case of a crash, and gdb unhelpfully stopped it on the first SIGPIPE; but that was not the fault of my software).

To give you an idea of the complexity of collaborative editing, let me quote Joseph Gentle: "I am an ex Google Wave engineer. Wave took 2 years to write and if we rewrote it today, it would take almost as long to write a second time." It took me 5 days (and I had a full-day meeting on one of them) to deliver >80% of the goodness. Alone.
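To hint at where the complexity of collaborative editing lies, here is a minimal sketch of the core idea behind operational transformation, applied to two concurrent insertions. It is not the Vim implementation described above, whose internals are not given here. The `Insert` type and the site-id tie-breaker are illustrative assumptions. Deletions, and more than two concurrent operations, are where the real difficulty starts.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical id of the editing client, used to break ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` was already applied.

    If the other insertion lands before ours (or at the same spot with a
    lower site id), our position shifts right by its length.
    """
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return replace(op, pos=op.pos + len(against.text))
    return op
```

Two clients that apply each other's edits in opposite orders still converge: starting from `"hello"`, concurrent inserts of `"A"` and `"B"` at position 2 both end up as `"heABllo"`.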

Comment by drahflow on Intelligence Explosion vs. Co-operative Explosion · 2012-04-18T17:34:54.459Z · score: 1 (1 votes) · LW · GW

while corporations have a variety of mechanisms for trying to provide their employees with the proper incentives, anyone who's worked for a big company knows that their employees tend to follow their own interests, even when they conflict with those of the company. It's certainly nothing like the situation with a cell, where the survival of each cell organ depends on the survival of the whole cell. If the cell dies, the cell organs die; if the company fails, the employees can just get a new job.

These observations might not hold for uploads running on hardware paid for by the company. This would give the combination of company and upload technology superior cooperation options compared to current forms of collaboration. Also, company-owned uploads will have most of their social network inside the company as well, and in particular not among uploads owned by competitors. Hence the natural group boundary would not be "uploads" versus "normals", but company boundaries.

Comment by drahflow on Is community-collaborative article production possible? · 2012-03-24T21:44:27.121Z · score: 6 (6 votes) · LW · GW

There should be a step 9, where every potential author is sent the final article and has the option of refusing formal authorship (if she doesn't agree with the final article). Convention in academic literature is that each author individually endorses all claims made in an article, hence this final check.

Comment by drahflow on SotW: Check Consequentialism · 2012-03-24T17:23:26.479Z · score: 3 (3 votes) · LW · GW

So... how would I design an exercise to teach Checking Consequentialism?

Divide the group into pairs. One is the decider, the other is the environment. Let them play some game repeatedly; the prisoner's dilemma might be appropriate, but maybe it should be a little more complex. The algorithm of the environment is predetermined by the teacher and known to both players.

The decider tries to maximize utility over the repeated rounds; the environment tries to minimize the decider's winnings by using social interaction between the evaluated game rounds, e.g. by trying to invoke all the fancy fallacies you outlined in the post or by convincing the decider that the environment algorithm actually results in a different decision. By incorporating randomness into the environment algorithm, this might even be used to train consequentialism under uncertainty.
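A sketch of the "check the consequences" step the decider should fall back on. It assumes the standard prisoner's dilemma payoffs (5/3/1/0) and a hypothetical tit-for-tat environment algorithm with an optional noise rate, standing in for whatever the teacher picks. Because the environment algorithm is known, each candidate strategy can simply be played out and scored, whatever the social pressure says.

```python
import random

# Decider's payoff for (decider move, environment move); standard PD values (assumed).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat_env(history, rng, noise=0.0):
    """Hypothetical known environment: copy the decider's last move, flip with prob. noise."""
    move = "C" if not history else history[-1][0]
    if noise and rng.random() < noise:
        move = "D" if move == "C" else "C"
    return move


def play(decider, rounds=10, noise=0.0, seed=0):
    """Total payoff of `decider` (a function of the move history) against the environment."""
    rng = random.Random(seed)
    history = []  # list of (decider_move, environment_move)
    total = 0
    for _ in range(rounds):
        d = decider(history)
        e = tit_for_tat_env(history, rng, noise)
        total += PAYOFF[(d, e)]
        history.append((d, e))
    return total


STRATEGIES = {
    "always cooperate": lambda h: "C",
    "always defect": lambda h: "D",
}
```

Over 10 noiseless rounds, always cooperating earns 30 and always defecting earns 5 + 9 × 1 = 14, so `max(STRATEGIES, key=lambda s: play(STRATEGIES[s]))` picks "always cooperate". Passing a non-zero `noise` turns the same check into one under uncertainty.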

Comment by drahflow on Is risk aversion really irrational ? · 2012-02-03T23:17:38.056Z · score: 1 (1 votes) · LW · GW

The described effect seems strongly related to the concept of opportunity cost.

I.e. while a bet of yours is still open, the resources spent paying for entering the bet cannot be used again to enter a (better) bet.

Comment by drahflow on Why an Intelligence Explosion might be a Low-Priority Global Risk · 2011-11-16T08:40:07.171Z · score: 0 (0 votes) · LW · GW

The AGI would have to acquire new resources slowly, as it couldn’t just self-improve to come up with faster and more efficient solutions. In other words, self-improvement would demand resources. The AGI could not profit from its ability to self-improve regarding the necessary acquisition of resources to be able to self-improve in the first place.

If the AGI creates a sufficiently convincing business plan / fake company front, it might well be able to command a significant share of the world's resources on credit and either repay after improving or grab power and leave it at that.

Comment by drahflow on What visionary project would you fund? · 2011-11-11T12:22:20.378Z · score: 0 (0 votes) · LW · GW

Small scale fusion power.

Research challenges: How to get hydrogen to fuse into helium using only 500kg of machinery and less energy than will be produced.

Urgent tasks: (In)validate the results of the fusor people; scale up / down as necessary.

Reasons: Enormous amounts of energy go into everything. If energy costs drop significantly, I expect sustained, fast and profound economic growth, in this case without too much ecological impact. Also, a lot of high-energy technology will become way more feasible, e.g. space missions.

Comment by drahflow on Selection Effects in estimates of Global Catastrophic Risk · 2011-11-06T12:36:28.607Z · score: 0 (0 votes) · LW · GW

Risk mitigation groups would gain some credibility by publishing concrete probability estimates of "the world will be destroyed by X before 2020" (and similar for other years). As many of the risks are rather short events (think nuclear war / asteroid strike / singularity), the world will be destroyed by a single cause, so these events are mutually exclusive and the respective probabilities can be summed. I would not be surprised if the total probability came out well above 1. Has anybody ever compiled a list of separate estimates?
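The coherence check is a one-liner once the estimates are collected. In the sketch below, the numbers in `EXAMPLE` are made-up placeholders, not real published estimates. The only assumption is that "destroyed first by X" events are mutually exclusive, so their probabilities must sum to at most 1.

```python
def total_probability(estimates: dict) -> float:
    """Sum of probabilities for mutually exclusive 'world destroyed by X first' events."""
    return sum(estimates.values())


def is_coherent(estimates: dict, tol: float = 1e-9) -> bool:
    """Mutually exclusive events need non-negative probabilities summing to <= 1."""
    return all(p >= 0 for p in estimates.values()) and total_probability(estimates) <= 1 + tol


# Purely illustrative placeholder numbers, NOT anybody's actual estimates.
EXAMPLE = {"nuclear war": 0.4, "asteroid strike": 0.01, "unfriendly AI": 0.7}
```

Here `total_probability(EXAMPLE)` is 1.11, so this hypothetical set of estimates would be incoherent.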

On a related note, how much of the SIAI is financed on credit? Any group which estimates high risks of disastrous events should be willing to pay higher interest rates than market average. (As the expected amount of repayments is reduced by the nontrivial probability of everyone dying before maturity of the contract).
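The break-even interest rate follows from equating expected repayments. This is a sketch that assumes nothing is repaid if everyone dies before maturity and that there is a single repayment at maturity. If p is the probability of doom before maturity, a borrower indifferent in expectation accepts a rate r with (1 − p)(1 + r) = 1 + r_market.

```python
def break_even_rate(market_rate: float, p_doom: float) -> float:
    """Highest rate r whose expected repayment (1 - p_doom) * (1 + r)
    equals the market repayment 1 + market_rate.

    Assumes (hypothetically) that nothing is repaid if everyone dies first.
    """
    if not 0.0 <= p_doom < 1.0:
        raise ValueError("p_doom must be in [0, 1)")
    return (1.0 + market_rate) / (1.0 - p_doom) - 1.0
```

With a 5% market rate and a 50% chance of doom before maturity, paying up to 110% interest costs the same in expectation.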

Comment by drahflow on Algorithms as Case Studies in Rationality · 2011-02-15T21:55:11.731Z · score: 2 (2 votes) · LW · GW

My classical example for algorithms applicable to real life: Merge sort for sorting stacks of paper.
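For the paper version, split the stack in half, sort each half (by hand once the piles are small), then merge by repeatedly taking whichever of the two top sheets comes first. Below is a plain list-based sketch of the same procedure. The `key` parameter is an illustrative stand-in for whatever is written on the sheets.

```python
def merge(left, right, key=lambda sheet: sheet):
    """Merge two sorted piles by always taking the top sheet that comes first."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if key(right[j]) < key(left[i]):
            merged.append(right[j])
            j += 1
        else:  # ties take from the left pile, which keeps the sort stable
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])   # one pile is empty; put the rest on top
    merged.extend(right[j:])
    return merged


def merge_sort(stack, key=lambda sheet: sheet):
    """Split the stack in half, sort both halves, merge them."""
    if len(stack) <= 1:
        return list(stack)
    mid = len(stack) // 2
    return merge(merge_sort(stack[:mid], key), merge_sort(stack[mid:], key), key)
```

`merge_sort(["Meier", "Abel", "Zander", "Klein"])` returns `["Abel", "Klein", "Meier", "Zander"]`.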

Comment by drahflow on Branches of rationality · 2011-01-12T11:02:48.458Z · score: 2 (2 votes) · LW · GW

A short list for prediction-making by groups (and, by extension, decision-making):

Comment by drahflow on Confidence levels inside and outside an argument · 2010-12-16T09:07:59.244Z · score: 1 (3 votes) · LW · GW

But it's hard for me to be properly outraged about this, because the conclusion that the LHC will not destroy the world is correct.

What is your argument for claiming that the LHC will not destroy the world?

That the world still exists despite ongoing experiments is easily explained by the fact that we are necessarily living in those branches of the universe where the LHC didn't destroy the world. (On a related side note: Has the great filter been found yet?)

Comment by drahflow on Intelligence Amplification Open Thread · 2010-09-15T16:38:21.174Z · score: 3 (3 votes) · LW · GW

It appears slow. In particular, I seem to think more thoughts per unit of time, sometimes noticing significant delays between thought and action. However, according to the scores, the performance improvement is only marginal (but present). In my experience, the effect wears off after 10 to 15 minutes.

I usually play Quake 3 (just in case anybody wants to compare effects between games).

Comment by drahflow on Intelligence Amplification Open Thread · 2010-09-15T12:53:36.400Z · score: 3 (3 votes) · LW · GW

Same goes for videos (Yay action movies at 2x).

Bonus points (for fun only): Play action games afterwards. Time sensation is a weird thing.

Comment by drahflow on Memetic Hazards in Videogames · 2010-09-10T17:23:50.074Z · score: 5 (7 votes) · LW · GW

Video game authors probably put a lot of effort into optimizing video games for human pleasure.

Workplace design, user interfaces etc. could all be improved if more ideas were copied from video games.

Comment by drahflow on Controlling Constant Programs · 2010-09-06T10:42:30.861Z · score: 2 (2 votes) · LW · GW

The only difference I can see between "an agent which knows the world program it's working with" and "agent('source of world')" is that the latter agent can be more general.

Comment by drahflow on Controlling Constant Programs · 2010-09-06T10:36:11.601Z · score: 0 (0 votes) · LW · GW

If agent() is actually agent('source of world'), as the classical Newcomb problem has it, I fail to see what is wrong with simply enumerating the possible actions and simulating 'source of world' with the constant call of agent('source of world') replaced by the current action candidate, and then returning the action with the maximum payoff.
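A minimal sketch of the proposed procedure. It assumes "source of world" is represented as a Python callable that receives the agent, rather than as source text. `newcomb_world` is a hypothetical encoding of Newcomb's problem in which the predictor calls the agent to predict it. Each candidate action is tried by substituting a constant agent that returns it.

```python
from typing import Callable

ACTIONS = ("one-box", "two-box")


def newcomb_world(agent: Callable[[], str]) -> int:
    """Hypothetical world program: the predictor simulates the agent to fill box B."""
    prediction = agent()
    box_b = 1_000_000 if prediction == "one-box" else 0
    action = agent()
    return box_b if action == "one-box" else box_b + 1_000


def agent(world, actions=ACTIONS) -> str:
    """Replace the call to the agent with each constant action, keep the best payoff."""
    return max(actions, key=lambda a: world(lambda: a))
```

Substituting "one-box" yields 1,000,000 and "two-box" yields 1,000, so `agent(newcomb_world)` returns "one-box".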

Comment by drahflow on Rationality quotes: September 2010 · 2010-09-03T10:34:17.128Z · score: 1 (1 votes) · LW · GW

Loyalty to petrified opinion has already kept chains from being closed and souls from being trapped.

Comment by drahflow on Rationality quotes: September 2010 · 2010-09-03T10:30:27.986Z · score: 4 (4 votes) · LW · GW

But some thoughts are both so complex and so irrelevant that a correct analysis of the thought would cost more than an infrequent error about thoughts of this class (costs of necessary meta-analysis included).

Comment by drahflow on Exploitation and cooperation in ecology, government, business, and AI · 2010-08-27T18:32:14.497Z · score: 1 (1 votes) · LW · GW

What is the difference between non-nested and modular? (Or between non-modular and nested?)

The pictures seem to be rotated by 180 degrees essentially.

Comment by drahflow on Five-minute rationality techniques · 2010-08-11T23:35:48.478Z · score: 2 (2 votes) · LW · GW

The decreasing frequency of surprising technological advances is caused by the general public being informed about scientific advances faster and more frequently.

If the rate of news consumed grows faster than the rate of innovations produced, the perceived magnitude of innovation per news item will go down.
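The arithmetic of that claim, as a sketch: innovation per news item is the ratio of the two rates, so it falls whenever news grows faster. The growth rates and starting values below are placeholder assumptions, not measured figures.

```python
def innovation_per_news_item(years: float,
                             innov0: float = 100.0, innov_growth: float = 0.02,
                             news0: float = 100.0, news_growth: float = 0.10) -> float:
    """Ratio of innovations produced to news items consumed after `years`.

    All starting values and yearly growth rates are hypothetical placeholders.
    """
    innovations = innov0 * (1 + innov_growth) ** years
    news = news0 * (1 + news_growth) ** years
    return innovations / news
```

With these placeholders, the ratio starts at 1.0 and shrinks every year, because (1.02 / 1.10)^t decreases in t.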

Comment by drahflow on Missed opportunities for doing well by doing good · 2010-07-21T10:23:37.869Z · score: 1 (1 votes) · LW · GW

If you are out for the warm fuzzies: in my experience, fuzzies per dollar are maximized by giving a little, often.

Microfinancing might be an option, as the same capital can be lent multiple times, generating some fuzzies each time.

Then again, GiveWell seems undecided on the concept:

Comment by drahflow on Fusing AI with Superstition · 2010-04-22T08:15:56.305Z · score: 0 (0 votes) · LW · GW

I fail to understand the sentence about overthinking. Would you mind explaining?

As for the condition of removing all energy and mass in a part of space not being sufficient to destroy all agents therein, I cannot see the error. Do you have an example of an agent which would continue to exist in those circumstances?

That the condition is not necessary is true: I can shoot you, you die. No need to remove much mass or energy from the part of space you occupy. However we don't need a necessary condition, only a sufficient one.

Comment by drahflow on Fusing AI with Superstition · 2010-04-22T06:05:39.336Z · score: 0 (0 votes) · LW · GW

Not having heard your argument against "Describing ..." yet, but assuming you believe some to exist, I estimate the chance of me still believing it after your argument at 0.6.

Now for guessing the two problems:

The first possible problem will be describing "mass" and "energy" to a system which basically only has sensor readings. However, if we can describe concepts like "human" or "freedom", I expect descriptions of matter and energy to be simpler (even though 10,000 years ago, telling somebody about "humans" was easier than telling them about mass; but that was not the same concept of "humans" we would actually like to describe). And for "mass" and "energy", physicists already have quite formal descriptions.

Another problem is that mass and energy might not be contained within a certain part of space: as per physics, there is just a probability of them having an effect outside some region, which goes down to pretty much zero with increasing distance. Thus removing all energy and matter somewhere might produce subtle effects somewhere totally different. However, I expect these effects to be so subtle as not to matter to the AI, because they become smaller than the local quantum noise at very short distances already.

Regarding the condescending "I say this...": I would have liked it more if you had stated explicitly that your preference originates from a wish to further my learning. I have no business optimizing your value function. Anyway, I operate by Crocker's Rules.

Comment by drahflow on Fusing AI with Superstition · 2010-04-22T04:26:10.293Z · score: 1 (1 votes) · LW · GW

The claim is relevant to the question of whether giving an action description for the red wire that fits all of the human future is not harder than constructing a real moral system. That the claim is trivial is a good reason to use "certainly".

Comment by drahflow on Fusing AI with Superstition · 2010-04-21T21:47:08.258Z · score: 0 (0 votes) · LW · GW

I meant certainly as in "I have an argument for it, so I am certain."

Claim: Describing some part of space to "contain a human" and its destruction is never harder than describing a goal which will ensure every part of space which "contains a human" is treated in manner X for a non-trivial X (where X will usually be "morally correct", whatever that means). (Non-trivial X means: Some known action A of the AI exists which will not treat a space volume in manner X).

The assumption that the action A is known is reasonable for the problem of friendly AI, as for every moral system we might wish to include in the AI, a sufficiently torturous killing can be constructed that the system labels immoral.

Proof: Describing the destruction of every agent in a certain part of space is easy: remove all mass and all energy within that part of space. We need to find a way to select those parts of space which "contain a human". However, we have (via the assumption) that our goal function will go to negative infinity when evaluating a plan which treats a volume of space "containing a human" in violation of manner X. Assume for now that we find some way !X to violate manner X for a given space volume. By pushing every space volume in existence, together with a plan to do !X to it, through the goal evaluation, we will detect at least those space volumes which "contain a human".

This leaves us with the problem of defining !X. The assumption as it stands already requires some A which can be used as !X.
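The selection step of the proof can be sketched as a filter. Every name below is a hypothetical stand-in: volumes are toy dicts, `toy_violate` builds a "!X" plan for a volume, and `toy_goal` is a black-box goal function that returns −∞ for plans violating X. The filter only inspects the goal's output. The toy also shows why the proof says "at least": a volume protected for some other reason (here, one containing a dog) is flagged too.

```python
import math


def find_flagged_volumes(volumes, goal, violate):
    """Volumes for which the plan `violate(v)` drives the goal to negative infinity."""
    return [v for v in volumes if goal(violate(v)) == -math.inf]


# --- Toy stand-ins (hypothetical, for illustration only) ---
VOLUMES = [
    {"id": 1, "contents": {"human"}},
    {"id": 2, "contents": {"rock"}},
    {"id": 3, "contents": {"dog"}},
]
PROTECTED = {"human", "dog"}  # the toy moral system also protects dogs


def toy_violate(volume):
    return {"action": "not_X", "target": volume}


def toy_goal(plan):
    if plan["action"] == "not_X" and plan["target"]["contents"] & PROTECTED:
        return -math.inf
    return 0.0
```

`find_flagged_volumes(VOLUMES, toy_goal, toy_violate)` flags volumes 1 and 3: every human-containing volume, plus possibly more.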

Comment by drahflow on Fusing AI with Superstition · 2010-04-21T21:14:35.907Z · score: 1 (1 votes) · LW · GW

How so? The AI lives in a universe where people are planning to fuse AIs in the way described here. Given this website, and the knowledge that one believes that the red wire is magic, there is a high probability that the red wire is fake, and some very small probability that the wire is real. But it is also known for certain that the wire is real. There is not even a contradiction here.

Giving a wrong prior is not the same as walking up to the AI and telling it a lie (which should never raise probability to 1).

Comment by drahflow on Fusing AI with Superstition · 2010-04-21T21:08:51.083Z · score: 0 (0 votes) · LW · GW

It cannot fix bugs in its priors. As for any other part of the system, e.g. sensor drivers, the AI can fix the hell out of itself. Anything that can be fixed is not a true prior, though. If we allow the AI to change its prior completely, then it is effectively acting upon a prior that does not include any probability-1 entries.

There is no reason to fix the red wire belief if you are certain that it is true. All evidence is against it, but the red wire does magic with probability 1, hence something is wrong with the evidence (e.g. sensor errors).
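Bayes' rule shows why evidence cannot move a probability-1 belief. This is a minimal sketch with illustrative likelihoods. The posterior is prior·P(E|H) / (prior·P(E|H) + (1 − prior)·P(E|¬H)), and with prior = 1 the second term vanishes, so the posterior is 1 no matter how unlikely the evidence is under H. (If P(E|H) were exactly 0, the expression would be undefined, which is where "the sensors must be wrong" comes in.)

```python
def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Posterior P(H | E) from prior P(H) and the two likelihoods."""
    numerator = prior * p_e_given_h
    return numerator / (numerator + (1.0 - prior) * p_e_given_not_h)
```

A hundred observations that are 99 times likelier without magic leave a prior of exactly 1.0 at 1.0. Ten such observations push a prior of 0.999 below 1%.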

Comment by drahflow on Fusing AI with Superstition · 2010-04-21T21:02:32.213Z · score: 0 (0 votes) · LW · GW

I agree. The AI + fuse system is a deliberately broken AI. In general, such an AI will perform suboptimally compared to the AI alone.

If the AI under consideration has a problematic goal, though, we actually want the AI to act suboptimally with regard to its goals.

Comment by drahflow on Fusing AI with Superstition · 2010-04-21T20:59:51.392Z · score: 1 (1 votes) · LW · GW

This is indeed a point I did not consider.

In particular, it might be impossible to construct a simple action description that will fit all of the human future. However, it is certainly not harder than constructing a real moral system.

One might get pretty far by eliminating every volume in space (AI excluded) which can learn (some fixed pattern for example) within a certain bounded time, instead of converting DNA into fluorine. It is not clear to me whether this would be possible to describe or not though.

The other option would be to disable the fuse after some fixed time or manually once one has high confidence in the friendliness of the AI. The problems of these approaches are many (although not all problems from the general friendly AI problem carry over).

Comment by drahflow on Fusing AI with Superstition · 2010-04-21T20:48:04.580Z · score: 0 (2 votes) · LW · GW

There is no hand-coded goal in my proposal. I propose to craft the prior, i.e. restrict the worlds the AI can consider possible.

This is the reason both why the procedure is comparatively simple (in comparison with friendly AI) and why the resulting AIs are less powerful.

Comment by drahflow on Fusing AI with Superstition · 2010-04-21T17:56:53.742Z · score: 1 (1 votes) · LW · GW

It might be the case that adding the red wire belief will cripple the AI to a point of total unusability. Whether that is the case can be found out by experiment however.

Adding a fuse as proposed turns an AI that might be friendly or unfriendly into an AI that might be friendly, might spontaneously combust, or might be stupid.

I prefer the latter kind of AI (even though they need rebuilding more often).

Comment by drahflow on Fusing AI with Superstition · 2010-04-21T17:54:19.201Z · score: 1 (3 votes) · LW · GW

War mongering humans are also not particularly useful. In particular they are burning energy like there is no tomorrow for things definitely not paperclippy at all. And you have to spend significant energy resources on stopping them from destroying you.

A paperclip optimizer would at some point turn against humans directly, because humans will turn against the paperclip optimizer if it is too ruthless.

Comment by drahflow on Fusing AI with Superstition · 2010-04-21T14:22:16.807Z · score: 2 (2 votes) · LW · GW

Because broken != totally nonfunctional.

If we have an AI which we believe to be friendly but cannot verify to be so, we add the fuse I described, then start it. As long as the AI does not try to kill humanity or try to understand the red wire too well, it should operate pretty much like an unmodified AI.

From time to time however it will conclude the wrong things. For example it might waste significant resources on the production of red wires, to conduct various experiments on them. Thus the modified AI is not optimal in our universe, and it contains one known bug. Hence I think it justified to call it broken.

Comment by drahflow on Fusing AI with Superstition · 2010-04-21T14:00:01.923Z · score: 0 (0 votes) · LW · GW

If the AI is able to question the fact that the red wire is magical, then the prior was less than 1.

It should still be able to reason about hypothetical worlds where the red wire is just a usual copper thingy, but it will always know that those hypothetical worlds are not our world. Because in our world, the red wire is magical.

As long as the superstitious knowledge is very specialized, like that about the specific red wire, I would hope that the AI can act quite reasonably as long as the specific red wire is not somehow part of the situation.

Comment by drahflow on Fusing AI with Superstition · 2010-04-21T13:34:55.262Z · score: 0 (0 votes) · LW · GW

I think every AI will need to learn from its environment. Thus it will need to update its current beliefs based upon new information from sensors.

It might conduct an experiment to check whether transmutation at a distance is possible - and find that transmutation at a distance could never be produced.

As the probability of transmutation of human DNA into fluorine is 1, this leaves some other options, like

  • the sensor readings are wrong
  • the experimental setup is wrong
  • it only works in the special case of the red wire

After sufficiently many experiments, the last case will have very high probability.

Which makes me think that maybe, faith is just a numerical inaccuracy.
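One literal reading of "faith as numerical inaccuracy" can be sketched directly. In IEEE-754 double precision, a prior of 1 − 10⁻²⁰ rounds to exactly 1.0, after which ordinary Bayesian updates can no longer move it. Storing beliefs as log-odds keeps such tiny doubts representable. The likelihood values are illustrative.

```python
import math


def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Posterior P(H | E) computed directly in probability space."""
    numerator = prior * p_e_given_h
    return numerator / (numerator + (1.0 - prior) * p_e_given_not_h)


def log_odds_update(log_odds: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Same update in log-odds space: add the log likelihood ratio."""
    return log_odds + math.log(p_e_given_h / p_e_given_not_h)


# 1e-20 is far below half the spacing of doubles near 1.0 (about 1.1e-16),
# so this "almost certain" prior is stored as exactly 1.0.
nearly_certain = 1.0 - 1e-20
```

In probability space, `bayes_update(nearly_certain, 0.01, 0.99)` stays at 1.0. Starting from the corresponding log-odds of about 46 (ln 10²⁰), twenty pieces of 1:99 evidence bring the belief below even odds.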