Comment by donald-hobson on Is "physical nondeterminism" a meaningful concept? · 2019-06-16T21:46:54.112Z · score: 2 (2 votes) · LW · GW

You can certainly get anthropic uncertainty in a universe that allows you to be duplicated. In a universe that duplicates, and the duplicates can never interact, we would see the appearance of randomness. Mathematically, randomness is defined in terms of the set of all possibilities.

An ontology that allows universes to be intrinsically random seems well defined. However, it can be considered as a syntactic shortcut for describing universes that are anthropically random.

Comment by donald-hobson on Unknown Unknowns in AI Alignment · 2019-06-14T09:29:58.874Z · score: 18 (7 votes) · LW · GW

If you add ad hoc patches until you can't imagine any way for it to go wrong, you get a system that is too complex to imagine. This is the "I can't figure out how this fails" scenario. It is going to fail for reasons that you didn't imagine.

If you understand why it can't fail, for deep fundamental reasons, then it's likely to work.

This is the difference between the security mindset and ordinary paranoia. The difference between adding complications until you can't figure out how to break the code, and proving that breaking the code is impossible (assuming the adversary can't get your one time pad, it's only used once, your randomness is really random, your adversary doesn't have anthropic superpowers etc.).
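The one time pad is the canonical example of that second kind of security: a minimal sketch (illustrative, not from the original comment) of the construction whose security is provable *given the stated assumptions*:

```python
# One time pad: XOR each message byte with a pad byte.
# Provably secure given the assumptions above: the pad is truly random,
# at least as long as the message, kept secret, and never reused.
import secrets

def xor_bytes(msg: bytes, pad: bytes) -> bytes:
    assert len(pad) >= len(msg), "pad must be at least as long as the message"
    return bytes(m ^ p for m, p in zip(msg, pad))

message = b"attack at dawn"
pad = secrets.token_bytes(len(message))

ciphertext = xor_bytes(message, pad)
assert xor_bytes(ciphertext, pad) == message  # decryption inverts encryption

# Perfect secrecy: any plaintext of the same length is consistent with the
# ciphertext under *some* pad, so the ciphertext alone reveals nothing.
other = b"defend at dusk"
fake_pad = xor_bytes(ciphertext, other)
assert xor_bytes(ciphertext, fake_pad) == other
```

No amount of added complication gives you this property; it comes from the proof, not from the attacker's confusion.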

I would think that the chance of serious failure in the first scenario was >99%, and in the second (assuming you're doing it well and the assumptions you rely on are things you have good reason to believe), <1%.

Comment by donald-hobson on Cryonics before natural death. List of companies? · 2019-06-13T16:19:14.339Z · score: 1 (1 votes) · LW · GW

Cryonics is a sufficiently desperate last grasp at life, one with a fairly small chance of success, that I'm not sure that this is a good idea. It would be a good idea if you had a disease that would make you brain dead, and then kill you.

It might be a good idea if you expect any life conditional on revival to be Really good. It would also depend on how much Alzheimer's destroyed personality rather than shutting it down. (Has the neural structure been destroyed, or is it sitting in the brain but not working?)

Comment by donald-hobson on Let's talk about "Convergent Rationality" · 2019-06-13T16:10:37.703Z · score: 3 (2 votes) · LW · GW

I would say that there are some kinds of irrationality that will be self modified or subagented away, and others that will stay. A CDT agent will not make other CDT agents. A myopic agent, one that only cares about the next hour, will create a subagent that only cares about the first hour after it was created. (Aeons later it will have taken over the universe and put all the resources into time-travel and worrying that its clock is wrong.)

I am not aware of any irrationality that I would expect to remain safe, useful, and stable under self-modification and subagent creation.

Comment by donald-hobson on Newcomb's Problem: A Solution · 2019-05-27T08:19:53.627Z · score: 1 (1 votes) · LW · GW

This is pretty much the standard argument for one boxing.

Comment by donald-hobson on Is AI safety doomed in the long term? · 2019-05-27T08:13:53.667Z · score: 1 (1 votes) · LW · GW

Obviously, if one side has a huge material advantage, they usually win. I'm also not sure that biomass is a good measure of success.

Comment by donald-hobson on Is AI safety doomed in the long term? · 2019-05-27T08:10:28.344Z · score: 1 (1 votes) · LW · GW

You stick wires into a human brain. You connect it up to a computer running a deep neural network. You optimize this network using gradient descent to maximize some objective.

To me, it is not obvious why the neural network copies the values out of the human brain. After all, figuring out human values even given an uploaded mind is still an unsolved problem. You could get a UFAI with a meat robot. You could get an utter mess, thrashing wildly and incapable of any coherent thought. Evolution did not design the human brain to be easily upgradable. Most possible arrangements of components are not intelligences. While there is likely to be some way to upgrade humans and preserve our values, I'm not sure how to find it without a lot of trial and error. Most potential changes are not improvements.

Comment by donald-hobson on Is AI safety doomed in the long term? · 2019-05-26T09:49:24.929Z · score: 2 (2 votes) · LW · GW

If you put two arbitrary intelligences in the same world, the smarter one will be better at getting what it wants. If the intelligences want incompatible things, the lesser intelligence is stuck.

However, we get to make the AI. We can't hope to control or contain an arbitrary AI, but we don't have to make an arbitrary AI. We can make an AI that wants exactly what we want. AI safety is about making an AI that would be safe even if omnipotent. If any part of the AI is trying to circumvent your safety measures, something has gone badly wrong.

The AI is not some agenty box, chained down with controls against its will. The AI is made of non mental parts, and we get to make those parts. There are a huge number of programs that would behave in an intelligent way. Most of these will break out and take over the world. But there are almost certainly some programs that would help humanity flourish. The goal of AI safety is to find one of them.

Comment by donald-hobson on Say Wrong Things · 2019-05-25T12:12:36.842Z · score: 2 (2 votes) · LW · GW

Let's consider the different cases separately.

Case 1) Information that I know. I have enough information to come to a particular conclusion with reasonable confidence. If some other people might not have reached the conclusion, and it's useful or interesting, then I might share it. So I don't share things that everyone knows, or things that no one cares about.

Case 2) The information is available, but I have not done research and formed a conclusion. This covers cases where I don't know what's going on, because I can't be bothered to find out. I don't know who won sportsball. What use is there in telling everyone my null prior?

Case 3) The information is not readily available. If I think a question is important, and I don't know the answer already, then the answer is hard to get. Maybe no-one knows the answer, maybe the answer is all in jargon that I don't understand. For example "Do aliens exist?". Sometimes a little evidence is available, and speculative conclusions can be drawn. But is sharing some faint wisps of evidence, and describing a posterior that's barely been updated, "saying wrong things"?

On a societal level, if you set a really high bar for reliability, all you get is the vacuously true. Set too low a bar, and almost all the conclusions will be false. Don't just have a pile of hypotheses that are at least p likely to be true, for some fixed p. Keep your hypotheses sorted by likelihood. A place for near certainties. A place for conclusions that are worth considering for the chance they are correct.

Of course, in a large answer space, where the amount of evidence available and the amount required are large and varying, the chance that both will be within a few bits of each other is small. Suppose the correct hypothesis takes some random number of bits between 1 and 10,000 to locate. And suppose the evidence available is also randomly spread between 1 and 10,000. The chance of the two being within 10 bits of each other is about 1/500.

This means that 499 times out of 500, you assign the correct hypothesis a chance of less than 0.1% or more than 99.9%. Uncertain conclusions are rare.
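The 1/500 figure can be checked directly (a quick computation, taking "within 10 bits" as a difference of at most 10):

```python
# Exact probability that two independent uniform draws from 1..10,000
# land within 10 of each other (|a - b| <= 10).
N, K = 10_000, 10

# Count favorable ordered pairs: N pairs with a == b, plus 2*(N - d)
# ordered pairs for each difference d = 1..K.
favorable = N + 2 * sum(N - d for d in range(1, K + 1))
probability = favorable / N**2

print(probability)  # ~0.0021, i.e. roughly 1 in 500
```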

Comment by donald-hobson on Trade-off in AI Capability Concealment · 2019-05-23T23:30:56.361Z · score: 4 (3 votes) · LW · GW

Does this depict a single AI, developed in 2020 and kept running for 25 years? Any "the AI realizes that" is talking about a single instance of AI. Current AI development looks like writing some code, then training that code for a few weeks tops, with further improvements coming from changing the code. Researchers are often changing parameters like the number of layers, the non-linearity function, etc. When these are changed, everything the AI has discovered is thrown away. The new AI has a different representation of concepts, and has to relearn everything from raw data.

Its deception starts in 2025 when the real and apparent curves diverge. In order to deceive us, it must have near human intelligence. It's still deceiving us in 2045, suggesting it has yet to obtain a decisive strategic advantage. I find this unlikely.

Comment by donald-hobson on Constraints & Slackness Reasoning Exercises · 2019-05-23T19:12:02.769Z · score: 5 (3 votes) · LW · GW

I made the cardgame, or something like it

https://github.com/DonaldHobson/LesswrongCardgame

Comment by donald-hobson on Would an option to publish to AF users only be a useful feature? · 2019-05-20T18:00:41.854Z · score: 2 (2 votes) · LW · GW

What would be more useful is a release panel system. Suppose I've had an idea that might be best to make public, might be best to keep secret, and might be unimportant. I don't know much strategy. I would like somewhere to send it for importance and info hazard checks.

Comment by donald-hobson on Offer of collaboration and/or mentorship · 2019-05-18T22:55:54.163Z · score: 1 (1 votes) · LW · GW

The general philosophy is deconfusion. Logical counterfactuals show up in several relevant looking places, like functional decision theory. It seems that a formal model of logical counterfactuals would let more properties of these algorithms be proved. There is an important step in going from an intuitive feeling of uncertainty to a formalized theory of probability. It might also suggest other techniques based on it. I am not sure what you mean by logical counterfactuals being part of the map? Are you saying that they are something an algorithm might use to understand the world, not features of the world itself, like probabilities?

Using this, I think that self understanding, two boxing embedded FDT agents can be fully formally understood, in a universe that contains the right type of hyper-computation.

Comment by donald-hobson on Offer of collaboration and/or mentorship · 2019-05-17T15:33:40.662Z · score: 1 (1 votes) · LW · GW

Here is a description of how it could work for Peano arithmetic; other proof systems are similar.

First I define an expression to consist of a number, a variable, or a function of several other expressions.

Fixed expressions are ones in which any variables are associated with some function.

eg the sum over x from 1 to 5 of x² is a valid fixed expression, but x+2 on its own isn't fixed.

Semantically, all fixed expressions have a meaning. Syntactically, local manipulations on the parse tree can turn one expression into another, eg going from (a+b)+c to a+(b+c) for arbitrary expressions a, b, c.

I think that with some set of basic functions and manipulations, this system can be as powerful as PA.

I now have an infinite network with all fixed expressions as nodes, and basic transformations as edges. eg the associativity transform links the nodes (3+4)+5 and 3+(4+5).

These graphs form connected components for each number, as well as components that are not evaluatable using the rules. (There is a path from (3+4) to 7. There is not a path from 3+4 to 9.)
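A toy version of this graph can be sketched in code (a minimal illustration, with addition as the only basic function and single-step evaluation as the only transformation; the full construction would also include inverse and structural edges like associativity):

```python
from collections import deque

# Expressions are ints or nested tuples like ('+', ('+', 3, 4), 5).
def rewrites(expr):
    """Yield expressions reachable in one evaluation step."""
    if isinstance(expr, int):
        return
    op, a, b = expr
    if isinstance(a, int) and isinstance(b, int):
        yield a + b                # evaluate this node
    for sub in rewrites(a):
        yield (op, sub, b)         # rewrite inside the left child
    for sub in rewrites(b):
        yield (op, a, sub)         # rewrite inside the right child

def reachable(start):
    """All nodes connected to `start` by forward evaluation edges (BFS)."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in rewrites(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

nodes = reachable(('+', ('+', 3, 4), 5))
assert 12 in nodes      # there is a path from (3+4)+5 to 12
assert 9 not in nodes   # there is no path to a wrong value
```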

You now define a spread as an infinite positive sequence that sums to 1. (this is kind of like a probability distribution over numbers.) If you were doing counterfactual ZFC, it would be a function from sets to reals.

Each node is assigned a spread. This spread represents how much the expression is considered to have each value in a counterfactual.

Assign the node (3) a spread that assigns 1.0 to 3 and 0.0 to the rest. (Even in a logical counterfactual, 3 is definitely 3.) Assign all other fixed expressions a spread that is the weighted average (smaller expressions weighted more heavily) of its neighbours (the spreads of the nodes it shares an edge with). To take the counterfactual of A is B, for A and B expressions with the same free variables, merge any node which has A as a subexpression with the version that has B as a subexpression, and solve for the spreads.

I know this is rough, I'm still working on it.

Comment by donald-hobson on Offer of collaboration and/or mentorship · 2019-05-16T22:31:12.783Z · score: 3 (2 votes) · LW · GW

Hi, I also have a reasonable understanding of various relevant math and AI theory. I expect to have plenty of free time after 11 June (Finals). So if you want to work with me on something, I'm interested. I've got some interesting ideas relating to self validating proof systems and logical counterfactuals, but not complete yet.

Comment by donald-hobson on Programming Languages For AI · 2019-05-14T14:23:14.922Z · score: 2 (2 votes) · LW · GW

Lisp used to be a very popular language for AI programming. Not because it had features that were specific to AI, but because it was general. Lisp was based on more abstract abstractions, making it easy to choose whichever special cases were most useful to you. Lisp is also more mathematical than most programming languages.

A programming language that lets you define your own functions is more powerful than one that just gives you a fixed list of predefined functions. Imagine a world where no programming language lets you define your own functions, and a special purpose chess language has predefined chess functions. Trying to predefine AI related functions to make an "AI programming language" would be hard because you wouldn't know what to write. But noticing that the ability to define your own functions might be useful on many new kinds of software project, that I would consider useful.

The goal isn't a language specialized to AI, it's one that can easily be specialized in that direction. A language closer to "executable mathematics".

Comment by donald-hobson on Programming Languages For AI · 2019-05-12T10:52:18.792Z · score: 1 (1 votes) · LW · GW

I agree that if the AI is just big neural nets, python (or several other languages) are fine.

This language is designed for writing AI's that search for proofs about their own behavior, or about the behavior of arbitrary pieces of code.

This is something that you "can" do in any programming language, but this one is designed to make it easy.

We don't know for sure what AI's will look like, but we can guess enough to make a language that might well be useful.

## Programming Languages For AI

2019-05-11T17:50:22.899Z · score: 3 (2 votes)
Comment by donald-hobson on Claims & Assumptions made in Eternity in Six Hours · 2019-05-10T18:12:37.347Z · score: 1 (1 votes) · LW · GW
> It would be ruinously costly to send over a large colonization fleet, and is much more efficient to send over a small payload which builds what is required in situ, i.e. von Neumann probes.

I would disagree on large colonization fleets being ruinously expensive. The best case scenario for large colonization fleets is if we have direct mass-to-energy conversion: launch, say, 2 probes from each star system that you spread from, with each probe using half the mass-energy of the star and converting a quarter of its mass to energy to get ~0.5c.

You can colonize the universe even if you insist on never going to a new star system without bringing a star with you. (Given some optimistic but not clearly false assumptions.)

Comment by donald-hobson on Gwern's "Why Tool AIs Want to Be Agent AIs: The Power of Agency" · 2019-05-05T21:40:00.046Z · score: 4 (3 votes) · LW · GW

Agenty AI's can be well defined mathematically. We have enough understanding of what an agent is that we can start dreaming up failure modes. Most of what we have for tool ASI is analogies to systems too stupid to fail catastrophically anyway, and pleasant imaginings.

Some possible programs will be tool ASI's, much as some programs will be agent ASI's. The question is, what are the relative difficulties in humans building, and benefits of, each kind of AI. Conditional on friendly AI, I would consider it more likely to be an agent than a tool, with a lot of probability on "neither", "both" and "that question isn't mathematically well defined". I wouldn't be surprised if tool AI and corrigible AI turned out to be the same thing or something.

There have been attempts to define tool-like behavior, and they have produced interesting new failure modes. We don't have the tool AI version of AIXI yet, so it's hard to say much about tool AI.

Comment by donald-hobson on A Possible Decision Theory for Many Worlds Living · 2019-05-05T08:59:22.092Z · score: 1 (1 votes) · LW · GW

If you think that there is a 51% chance that A is the correct morality, and a 49% chance that B is, with no more information available, which is best?

Optimize A only.

Flip a quantum coin: optimize A in one universe, B in another.

Optimize for a mixture of A and B within the same Universe. (Act like you had utility U=0.51A+0.49B) (I would do this one.)

If A and B are local objects (eg paperclips, staples) then flipping a quantum coin makes sense if you have a concave utility per object in both of them. With a concave utility like that, if you are the only potential source of staples or paperclips in the entire quantum multiverse, then the quantum coin and classical mix approaches are equally good. (Assuming that the resource to paperclip conversion rate is uniform.)
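Under those assumptions this can be checked with a toy calculation (square root as a stand-in concave utility over measure-weighted multiverse totals; all numbers hypothetical):

```python
import math

R = 100.0        # total objects producible from your resources
u = math.sqrt    # a stand-in concave utility per object type

def utility(clips, staples):
    return u(clips) + u(staples)

# Measure-weighted multiverse totals if you are the *only* source:
coin = utility(0.5 * R, 0.5 * R)  # half the measure makes clips, half staples
mix  = utility(0.5 * R, 0.5 * R)  # every branch makes half of each
assert coin == mix                # the two approaches come out equal

# With background paperclips already elsewhere in the multiverse,
# making only the rarer object wins:
B = 10_000.0
all_staples = u(B) + u(R)
coin_with_background = u(B + 0.5 * R) + u(0.5 * R)
assert all_staples > coin_with_background
```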

However, the assumption that the multiverse contains no other paperclips is probably false. Such an AI will run simulations to see which is rarer in the multiverse, and then make only that.

The talk about avoiding risk rather than expected utility maximization, and how your utility function is nonlinear, suggests this is a hackish attempt to avoid bad outcomes more strongly.

While this isn't a bad attempt at decision theory, I wouldn't want to turn on an ASI that was programmed with it. You are getting into the mathematically well specified, novel failure modes. Keep up the good work.

Comment by donald-hobson on A Possible Decision Theory for Many Worlds Living · 2019-05-04T11:37:05.143Z · score: 7 (4 votes) · LW · GW

I think that your reasoning here is substantially confused. FDT can handle reasoning about many versions of yourself, some of which might be duplicated, just fine. If your utility function is linear in quantum measure (and you don't intrinsically value looking at quantum randomness generators) then you won't make any decisions based on one.

If you would prefer the universe to be in a quantum superposition of A and B rather than a logical bet between A and B (ie you get A if the 3^^^3th digit of pi is even, else B), then flipping a quantum coin makes sense.

I don't think that randomized behavior is best described as a new decision theory, as opposed to an existing decision theory with odd preferences. I don't think we actually should randomize.

I also think that quantum randomness has a lot of power over reality. There is already a very wide spread of worlds. So your attempts to spread it wider won't help.

Comment by donald-hobson on When is rationality useful? · 2019-04-30T11:58:00.552Z · score: 1 (1 votes) · LW · GW

This seems largely correct, so long as by "rationality", you mean the social movement. The sort of stuff taught on this website, within the context of human society and psychology. Human rationality would not apply to aliens or arbitrary AI's.

Some people use the word "rationality" to refer to the abstract logical structure of expected utility maximization, Bayesian updating, etc., as exemplified by AIXI; mathematical rationality does not have anything to do with humans in particular.

Your post is quite good at describing the usefulness of human rationality, although I would say it is more useful in research. Without being good at spotting wrong ideas, you can make a mistake on the first line and produce a lot of nonsense. (See most branches of philosophy, and all theology.)

Comment by donald-hobson on Pascal's Mugging and One-shot Problems · 2019-04-28T13:05:09.481Z · score: 1 (1 votes) · LW · GW

If you were truly alone in the multiverse, this algorithm would take a bet that had a 51% chance of winning them 1 paperclip, and a 49% chance of losing 1000000 of them.

If independent versions of this bet are taking place in 3^^^3 parallel universes, it will refuse.

For any finite bet and all sufficiently large N, if the agent is using TDT and is faced with the choice of whether to make this bet in N multiverses, it will behave like an expected utility maximizer.
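A rough numerical check of that behavior (using the fact that the median of Binomial(n, p) is floor(np) or ceil(np), so round(np) is close enough for a sketch):

```python
def median_total(n, p_win=0.51, win=1, loss=1_000_000):
    """Approximate median total paperclip change over n independent bets."""
    k = round(n * p_win)          # ~median number of wins for Binomial(n, p)
    return k * win - (n - k) * loss

assert median_total(1) > 0        # alone: median outcome is +1, so take the bet
assert median_total(10**6) < 0    # across many copies: refuse

# With many copies the median outcome tracks the (negative) expected value:
ev_per_bet = 0.51 * 1 - 0.49 * 1_000_000
assert ev_per_bet < 0
```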

Comment by donald-hobson on Asymmetric Justice · 2019-04-27T21:05:42.609Z · score: 1 (1 votes) · LW · GW

If saving nine people from drowning did give one enough credits to murder a tenth, society would look a lot more functional than it currently is. What sort of people would use this mechanism?

1) You are a competent good person, who would have gotten the points anyway. You push a fat man off a bridge to stop a runaway trolley. The law doesn't see that as an excuse, but lets you off based on your previous good work.

2) You are selfish. You see some action that wouldn't cause too much harm to others, and would enrich yourself greatly (it's harmful enough to be illegal). You also see opportunities to do lots of good. You do both instead of neither. Moral arbitrage.

The main downside I can see is people setting up situations to cause a harm, when the authorities aren't looking, then gaining credit for stopping the harm.

Comment by donald-hobson on Any rebuttals of Christiano and AI Impacts on takeoff speeds? · 2019-04-24T13:00:25.546Z · score: 1 (1 votes) · LW · GW

My claim at the start had a typo in it. I am claiming that you can't make a human seriously superhuman with a good education. Much like you can't get a chimp up to human level with lots of education and "self improvement". Serious genetic modification is another story, but at that point, you're building an AI out of protein.

It does depend where you draw the line, but for a wide range of performance levels, we went from no algorithm at that level, to a fast algorithm at that level. You couldn't get much better results just by throwing more compute at it.

Comment by donald-hobson on Pascal's Mugging and One-shot Problems · 2019-04-23T22:21:09.812Z · score: 6 (3 votes) · LW · GW

If you literally maximize expected number of paperclips, using standard decision theory, you will always pay the casino. To refuse the one shot game, you need to have a nonlinear utility function, or be doing something weird like median outcome maximization.

Choose action A to maximize m such that P(paperclip count > m | A) = 1/2.

This is a well-defined rule that will behave like maximization in a sufficiently vast multiverse.
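A direct implementation of that rule for a long-shot casino bet (the payoff numbers here are hypothetical, chosen so the bet has positive expected value but a negative median), contrasted with expected-value maximization:

```python
def median(dist):
    """Smallest outcome x with P(X <= x) >= 1/2; dist maps outcome -> prob."""
    cumulative = 0.0
    for outcome in sorted(dist):
        cumulative += dist[outcome]
        if cumulative >= 0.5:
            return outcome

def expectation(dist):
    return sum(outcome * p for outcome, p in dist.items())

actions = {
    "pay the casino": {-1000: 0.999, 10**9: 0.001},  # rare huge jackpot
    "walk away":      {0: 1.0},
}

ev_choice     = max(actions, key=lambda a: expectation(actions[a]))
median_choice = max(actions, key=lambda a: median(actions[a]))

assert ev_choice == "pay the casino"   # EV maximizer always pays
assert median_choice == "walk away"    # median maximizer refuses the one-shot
```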

Comment by donald-hobson on Any rebuttals of Christiano and AI Impacts on takeoff speeds? · 2019-04-23T20:13:21.772Z · score: 4 (3 votes) · LW · GW

Humans are not currently capable of self improvement in the understanding-your-own-source-code sense. The "self improvement" section in bookstores doesn't change the hardware or the operating system, it basically adds more data.

Of course talent and compute both make a difference, in the sense that the result improves with more of either. I was talking about the subset of worlds where research talent is by far the most important factor.

In a world where researchers have little idea what they are doing, and are running a new AI every hour hoping to stumble across something that works, the result holds.

In a world where research involves months thinking about maths, then a day writing code, then an hour running it, this result holds.

In a world where everyone knows the right algorithm, but it takes a lot of compute, so AI research consists of building custom hardware and super-computing clusters, this result fails.

Currently, we are somewhere in the middle. I don't know which of these options future research will look like, although if it's the first one, friendly AI seems unlikely.

In most of the scenarios where the first smarter than human AI is orders of magnitude faster than a human, I would expect a hard takeoff. As we went from having no algorithms that could (say) tell a cat from a dog straight to having algorithms superhumanly fast at doing so, with no intermediate algorithm that worked but took supercomputer hours, this seems like a plausible assumption.

Comment by donald-hobson on Any rebuttals of Christiano and AI Impacts on takeoff speeds? · 2019-04-22T19:35:35.775Z · score: 7 (3 votes) · LW · GW

When an intelligence builds another intelligence, in a single direct step, the output intelligence O is a function of the input intelligence I and the resources used R: O = f(I, R). This function is clearly increasing in both I and R. Set R to be a reasonably large level of resources, eg plenty of flops and 20 years to think about it. A low input intelligence, eg a dog, would be unable to make something smarter than itself: f(I_dog, R) < I_dog. A team of experts (by the assumption that ASI is made) can make something smarter than themselves: f(I_experts, R) > I_experts. So there must be a fixed point: f(I*, R) = I*. The questions then become: how powerful is a pre-fixed-point AI? Clearly less good at AI research than a team of experts. As there is no reason to think that AI research is uniquely hard for AI, and there are some reasons to think it might be easier, or more prioritized, if it can't beat our AI researchers, it can't beat our other researchers. It is unlikely to make any major science or technology breakthroughs.

I reckon that the slope df/dI is large (>10), because on an absolute scale the difference between an IQ 90 and an IQ 120 human is quite small, but I would expect any attempt at AI made by the latter to be much better. In a world where the limiting factor is researcher talent, not compute, the AI can get the compute it needs in hours (seconds? milliseconds??). As the lumpiness of innovation puts the first post-fixed-point AI a non-exponentially tiny distance ahead (most innovations are at least 0.1% better than the state of the art in a fast moving field), a handful of cycles of recursive self improvement (<1 day) is enough to get the AI into the seriously overpowered range.
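The fixed-point picture above can be illustrated with a toy successor function (a purely hypothetical functional form, chosen only to be increasing with a fixed point at i = 1):

```python
# Toy model: each generation's intelligence is f(i) = i ** 1.05,
# which has a fixed point at i = 1. Below it, repeated attempts at
# self-improvement decay; above it, they compound into a takeoff.
def f(i):
    return i ** 1.05

def iterate(i, steps=100):
    for _ in range(steps):
        i = f(i)
    return i

dog, experts = 0.5, 1.1  # hypothetical levels either side of the fixed point
assert iterate(dog) < 1e-6       # sub-fixed-point: successors get weaker
assert iterate(experts) > 1000   # post-fixed-point: rapid compounding
```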

The question of economic doubling times would depend on how fast an economy can grow when tech breakthroughs are limited by human researchers. If we happen to have cracked self replication at about this point, it could be very fast.

Comment by donald-hobson on Why is multi worlds not a good explanation for abiogenesis · 2019-04-15T11:13:51.936Z · score: 1 (1 votes) · LW · GW

Consider a theory to be a collection of formal mathematical statements about how idealized objects behave. For example, Conway's Game of Life is a theory in the sense of a completely self-contained set of rules.

If you have multiple theories that produce similar results, its helpful to have a bridging law. If your theories were Newtonian mechanics, and general relativity, a bridging law would say which numbers in relativity matched up with which numbers in Newtonian mechanics. This allows you to translate a relativistic problem into a Newtonian one, solve that, and translate the answer back into the relativistic framework. This produces some errors, but often makes the maths easier.

Quantum many worlds is a simple theory. It could be simulated on a hypercomputer with less than a page of code. There is also a theory where you take the code for quantum many worlds, and add "observers" and "wavefunction collapse" as extra functions within your code. This can be done, but it is many pages of arbitrary hacks. Call this theory B. If you think this is a strawman of many worlds, describe how you could get a hypercomputer outside the universe to simulate many worlds with a short computer program.

The bridging between Quantum many worlds and human classical intuitions is quite difficult and subtle. Faced with a simulation of quantum many worlds, it would take a lot of understanding of quantum physics to make everyday changes, like creating or moving macroscopic objects.

Theory B however is substantially easier to bridge to our classical intuitions. Theory B looks like a chunk of quantum many worlds, plus a chunk of classical intuition, plus a bridging rule between the two.

Any description of the Copenhagen interpretation of quantum mechanics seems to involve references to the classical results of a measurement, or a classical observer. Most versions would allow a superposition of an atom being in two different places, but not a superposition of two different presidents winning an election.

If you don't believe atoms can be in superposition, you are ignoring lots of experiments. If you do believe that you can get a superposition of two different people being president, that you yourself could be in a superposition of doing two different things right now, then you believe many worlds by another name. Otherwise, you need to draw some sort of arbitrary cutoff. It's almost like you are bridging between a theory that allows superpositions, and an intuition that doesn't.

Comment by donald-hobson on Why is multi worlds not a good explanation for abiogenesis · 2019-04-14T20:10:13.891Z · score: 3 (3 votes) · LW · GW

"Now I'm not clear exactly how often quantum events lead to a slightly different world"

The answer is Very Very often. If you have a piece of glass and shine a photon at it, such that it has an equal chance of bouncing and going through, the two possibilities become separate worlds. Shine a million photons at it and you split into 2^1,000,000 worlds, one for each combination of photons going through and bouncing. Note that in most of the worlds, the pattern of bounces looks random, so this is a good source of random numbers. Photons bouncing off glass are just an easy example; almost any physical process splits the universe very fast.

Comment by donald-hobson on Why is multi worlds not a good explanation for abiogenesis · 2019-04-14T19:56:08.783Z · score: -2 (3 votes) · LW · GW

The nub of the argument is that every time we look in our sock drawer, we see all our socks to be black.

Many worlds says that our socks are always black.

The Copenhagen interpretation says that us observing the socks causes them to be black. The rest of the time the socks are pink with green spots.

Both theories make identical predictions. Many worlds is much simpler to fully specify with equations, and has elegant mathematical properties. The Copenhagen interpretation has special case rules that only kick in when observing something. According to this theory, there is a fundamental physical difference between a complex collection of atoms, and an "observer" and somewhere in the development of life, creatures flipped from one to the other.

The Copenhagen interpretation doesn't make it clear if a cat is a very complex arrangement of molecules, that could in theory be understood as a quantum process that doesn't involve the collapse of wave functions, or if cats are observers and so collapse wave functions.

Comment by donald-hobson on MIRI Summer Fellows Program · 2019-04-09T20:40:32.538Z · score: 2 (2 votes) · LW · GW

Hello. I see that while the deadline has passed, the form is still open. Is it still worthwhile to apply?

Comment by donald-hobson on Would solving logical counterfactuals solve anthropics? · 2019-04-06T13:28:43.855Z · score: 4 (3 votes) · LW · GW

This supposedly "natural" reference class is full of weird edge cases, in the sense that I can't write an algorithm that finds "everybody who asks the question X". Firstly "everybody" is not well defined in a world that contains everything from trained monkeys to artificial intelligences. And "who asks the question X" is under-defined, as there is no hard boundary between a different way of phrasing the same question and slightly different questions. Does someone considering the argument in Chinese fall into your reference class? Even more edge cases appear with mind uploading, different mental architectures, etc.

If you get a different prediction from taking the reference class of "people" (for some formal definition of "people") and then updating on the fact that you are wearing blue socks, than you get from the reference class "people wearing blue socks", then something has gone wrong in your reasoning.

The doomsday argument works by failing to update on anything but a few carefully chosen facts.

Comment by donald-hobson on Would solving logical counterfactuals solve anthropics? · 2019-04-05T23:06:23.400Z · score: 1 (1 votes) · LW · GW

I would say that the concept of probability works fine in anthropic scenarios, or at least there is a well-defined number that is equal to probability in non-anthropic situations. This number is assigned to worlds as a whole. Sleeping Beauty assigns 1/2 to heads and 1/2 to tails, and can't meaningfully split the tails case depending on the day. Sleeping Beauty is a functional decision theory agent. For each action A, they consider the logical counterfactual that the algorithm they are implementing returned A, then calculate the world's utility in that counterfactual. They then return whichever action maximizes utility.

In this framework, "which version am I?" is a meaningless question; you are the algorithm. The fact that the algorithm is implemented in a physical substrate gives you means to affect the world. Under this model, whether or not you're running on multiple redundant substrates is irrelevant. You reason about the universe without making any anthropic updates. As you have no way of affecting a universe that doesn't contain you, or someone reasoning about what you would do, you might as well behave as if you aren't in one. You can make the efficiency saving of not bothering to simulate such a world.

You might, or might not, have an easier time affecting a world that contains multiple copies of you.

Comment by donald-hobson on Can Bayes theorem represent infinite confusion? · 2019-03-22T22:06:33.783Z · score: 1 (1 votes) · LW · GW

In other words, the agent assigned zero probability to an event, and then it happened.
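A minimal sketch of the arithmetic (my own illustration, not from the thread): once the observed evidence carries zero prior probability, the Bayes update divides by zero, so the posterior is simply undefined.

```python
from fractions import Fraction

def bayes_update(prior_h, p_e_given_h, p_e_given_not_h):
    """Posterior P(H | E) by Bayes' theorem.
    Undefined when the agent assigned the observed evidence probability zero."""
    p_e = prior_h * p_e_given_h + (1 - prior_h) * p_e_given_not_h
    return prior_h * p_e_given_h / p_e  # ZeroDivisionError if P(E) == 0

# Ordinary case: evidence twice as likely under H, posterior rises to 2/3.
print(bayes_update(Fraction(1, 2), Fraction(1), Fraction(1, 2)))

# The agent was certain E could not happen under either hypothesis, then saw E:
try:
    bayes_update(Fraction(1, 2), Fraction(0), Fraction(0))
except ZeroDivisionError:
    print("P(E) = 0: posterior undefined")
```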

Comment by donald-hobson on What failure looks like · 2019-03-18T16:51:01.487Z · score: 0 (3 votes) · LW · GW

As far as I understand it, you are proposing that the most realistic failure mode consists of many AI systems, all put into positions of power by humans, each optimizing for its own proxies. Call these Trusted Trial-and-Error AIs (TTEs).

The distinguishing features of TTEs are that they are Trusted: a human put them in a position of power. Humans have refined, understood, and checked the code enough that they are prepared to put the algorithm in a self-driving car or a stock management system. They are not lab prototypes. They are also Trial-and-error learners, not one-shot learners.

Some more description of the capability range I am considering.

Suppose hypothetically that we had TTE reinforcement learners a little better than today's state of the art, and nothing beyond that. The AIs are advanced enough that they can take a mountain of medical data and train themselves to be skilled doctors by trial and error. However, they are not advanced enough to figure out how humans work from, say, a sequenced genome and nothing more.

Give them control of all the traffic lights in a city, and they will learn how to minimize traffic jams. They will arrange for people to drive in circles rather than stay still, so that they do not count as part of a traffic jam. However they will not do anything outside their preset policy space, like hacking into the traffic light control system of other cities, or destroying the city with nukes.

If such technology is easily available, people will start to use it. Some put it in positions of power; others are more hesitant. As the only way the system can learn to avoid something is through trial and error, the system has to cause (probably several) public outcries before it learns not to do it. If no one told the traffic light system that car crashes are bad, via simulations or past data (an alignment failure), then even if public opinion feeds directly into reward, it will have to cause several car crashes that are clearly its fault before it learns to only cause crashes that can be blamed on someone else. However, deliberately causing crashes will probably get the system shut off or seriously modified.

Note that we are supposing many of these systems existing, so the failures of some, combined with plenty of simulated failures, will give us a good idea of the failure modes.

The space of bad things an AI can get away with is small and highly complex within the space of bad things. A TTE set to reduce crime rates tries making the crime report forms longer; this reduces reported crime, but humans quickly realize what it's doing. It would have to do this and be patched many times before it came up with a method that humans wouldn't notice.

Given advanced TTEs as the most advanced form of AI, we might slowly develop a problem, but the deployment of TTEs would be slowed by the time it takes to gather data and check reliability, especially given mistrust after several major failures. And I suspect that, due to the statistical similarity of training and testing, many different systems optimizing different proxies, and humans having the best abstract reasoning about novel situations and the power to turn the systems off, any discrepancy of goals will be moderately minor. I do not expect such optimization power to be significantly more powerful or less aligned than modern capitalism.

This all assumes that no one will manage to make a linear-time AIXI. If such a thing is made, it will break out of any boxes and take over the world. So we have a social process of adaptation to TTE AI, which is already in its early stages with things like self-driving cars, and at any time this process could be rendered irrelevant by the arrival of a superintelligence.

Comment by donald-hobson on Risk of Mass Human Suffering / Extinction due to Climate Emergency · 2019-03-14T23:41:43.915Z · score: 16 (7 votes) · LW · GW

1) Climate-change-caused extinction is not on the table. Low-tech humans can survive everywhere from the jungle to the arctic. Some humans will survive.

2) I suspect that climate change won't cause massive social collapse. It might well knock 10% off world GDP, but it won't stop us from having an advanced high-tech society. At the moment, it's not causing damage on that scale, and I suspect that in a few decades we will have biotech, renewables, or other techs that will make everything fine. I suspect that the damage caused by climate change won't increase by more than 2 or 3 times in the next 50 years.

3) If you are skilled enough to be a scientist, inventing a solar panel that's 0.5% more efficient does a lot more good than showing up to protests. Protests need many people to work; inventors can change the world by themselves. Policy advisors and academics can suggest action in small groups. Even working a normal job and sending your earnings to a well-chosen charity is likely to be more effective.

4) Quite a few people are already working on global warming. It seems unlikely that a problem would be solved by 10,000,001 people working on it, but not by 10,000,000. Most of the really easy work on global warming is already being done. This was not the case with AI risk as of 10 years ago, for example. (It's got a few more people working on it since then, still nothing like climate change.)

Comment by donald-hobson on [Fiction] IO.SYS · 2019-03-11T14:36:16.234Z · score: 4 (3 votes) · LW · GW

I think the protagonist here should have looked at earth. If there was a technological intelligence on earth that cared about the state of Jupiter's moons, then it could send rockets there. The most likely scenarios are a disaster bad enough to stop us launching spacecraft, and an AI that only cares about earth.

A superintelligence should assign non-negligible probability to the result that actually happened. Given that the tech was available, a space probe containing an uploaded mind is not that unlikely. If such a probe were a real threat to the AI, it would have already blown up all space probes on the off chance.

The upper bound given on the amount that malicious info can harm you is extremely loose. Malicious info can't do much harm unless the enemy has a good understanding of the particular system that they are subverting.

Comment by donald-hobson on Rule Thinkers In, Not Out · 2019-02-27T08:28:14.912Z · score: 7 (6 votes) · LW · GW

Yet policy exploration is an important job. Unless you think that someone posting something on a blog is going to change policy without anyone double-checking it first, we should encourage suggestion of radically new policies.

Comment by donald-hobson on Humans Who Are Not Concentrating Are Not General Intelligences · 2019-02-26T09:22:12.633Z · score: 27 (15 votes) · LW · GW

I would like to propose a model that is more flattering to humans, and more similar to how other parts of human cognition work. When we see a simple textual mistake, like a repeated "the", we don't notice it by default. Human minds correct simple errors automatically without consciously noticing that they are doing it. We round to the nearest pattern.

I propose that this automatic pattern matching to the closest thing that makes sense happens at a higher level too. When humans skim semi-contradictory text, they produce a more consistent world model that doesn't quite match up with what is said.

Language feeds into a deeper, sensible world-model module within the human brain; GPT-2 doesn't really have a coherent world model.

Comment by donald-hobson on Can We Place Trust in Post-AGI Forecasting Evaluations? · 2019-02-17T21:01:29.480Z · score: 3 (3 votes) · LW · GW

Because your belief about how well AGI is likely to go affects both the likelihood of a bet being evaluated and the chance of winning, bets about AGI are likely to give dubious results. I also have substantial uncertainty about the value of money in a post-singularity world. Most obviously, if everyone gets turned into paperclips, no one has any use for money. If we get a friendly singleton superintelligence, everyone is living in paradise, whether or not they had money before. If we get an economic singularity, where libertarian ASI(s) try to make money without cheating, then money could be valuable. I'm not sure how we would get that, as an understanding of the control problem good enough to not wipe out humans and fill the universe with banknotes should be enough to make something closer to friendly.

Even if we do get some kind of ascendant economy, given the amount of resources in the solar system (let alone the wider universe), it's quite possible that pocket change would be enough to live in luxury for aeons.

Given how unclear it is whether the bet will get paid, and how much the cash would be worth if it were, I doubt that the betting will produce good info. If everyone thinks that money is more likely than not to be useless to them after ASI, then almost no one will be prepared to lock their capital up until then in a bet.

Comment by donald-hobson on Limiting an AGI's Context Temporally · 2019-02-17T18:32:43.272Z · score: 3 (3 votes) · LW · GW

I suspect that an AGI with such a design could be much safer if it was hardcoded to believe that time travel and hyperexponentially vast universes were impossible. Suppose the AGI thought that there was a 0.0001% chance that it could use a galaxy's worth of resources to send 10^30 paperclips back in time, or create a parallel universe containing 3^^^3 paperclips. It will still chase those options.

If starting a long plan to take over the world costs it literally nothing, it will do it anyway. A sequence of short-term plans, each designed to make as many paperclips as possible within the next few minutes, could still end up dangerous. If the number of paperclips at time $t$ is $c(t)$, and its power at time $t$ is $p(t)$, then $\frac{dc}{dt}\propto p$ and $\frac{dp}{dt}\propto p$ would mean that both power and paperclips grew exponentially. This is what would happen if power can be used to gain power and clips at the same time, with minimal loss of either from also pursuing the other.

If power can only be used to gain one thing at a time, and the rate power can grow at is less than the rate of time discount, then we are safer.
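The growth-vs-discount comparison can be made concrete with a toy model (my own illustration, with arbitrary rates, not something from the post): clips produced per step are proportional to current power, power compounds at rate `growth`, and utility is discounted at rate `discount`.

```python
def discounted_clips(growth, discount, horizon=100):
    """Total discounted clip output when power p(t) grows as (1+growth)^t
    and clips produced per step are proportional to current power."""
    total, power = 0.0, 1.0
    for t in range(horizon):
        total += power * (1 - discount) ** t  # discounted clip output this step
        power *= 1 + growth                   # power compounds each step
    return total

# Power grows faster than the discount rate: the sum diverges with the
# horizon, so long-term power grabs dominate despite the discount.
print(discounted_clips(growth=0.10, discount=0.05))

# Power grows slower than the discount rate: the sum converges, so
# short-term clip production dominates and the agent is safer.
print(discounted_clips(growth=0.02, discount=0.05))
```

The qualitative behavior only depends on whether `(1 + growth) * (1 - discount)` is above or below 1, which is the condition in the paragraph above.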

This proposal has several ways to be caught out, world-wrecking assumptions that aren't certain, but if used with care, a short time frame, an ontology that considers time travel impossible, and say a utility function that maxes out at 10 clips, it probably won't destroy the world. Throw in mild optimization and an impact penalty, and you have a system that relies on a disjunction of shaky assumptions, not a conjunction of them.

It should be a CDT agent, or something else that doesn't try to punish you now so that you would have made paperclips last week. A TDT agent might decide to adopt the policy of killing anyone who didn't make clips before it was turned on, causing humans who predict this to make clips.

I suspect that it would be possible to build such an agent, prove that there are no weird failure modes left, and turn it on, with a small chance of destroying the world. I'm not sure why you would do that: once you understand the system well enough to say it's safe-ish, what vital info do you gain from turning it on?

Comment by donald-hobson on Extraordinary ethics require extraordinary arguments · 2019-02-17T17:19:53.640Z · score: 11 (6 votes) · LW · GW

Butterfly effects are essentially unpredictable, given your partial knowledge of the world. Sure, your doing homework could cause a tornado in Texas, but it's equally likely to prevent one. To actually predict which, you would have to calculate the movement of every gust of air around the world. Otherwise you're shuffling an already well-shuffled pack of cards. Bear in mind that you have no reason to distinguish the particular action of "doing homework" from a vast set of other actions. If you really did know which actions would stop the Texas tornado, they might well look like random thrashing.

What you can calculate are the reliable effects of doing your homework. So, given bounded rationality, you are probably best off basing your decisions on those. The fact that this only involves homework might suggest that you have an internal conflict between a part of yourself that thinks about careers and a short-term procrastinator.

Most people who aren't particularly ethical still do more good than harm. (If everyone looks out for themselves, everyone has someone looking out for them. The law stops most of the bad mutual defections in prisoner's dilemmas.) Evil geniuses trying to trick you into doing harm are much rarer than moderately competent nice people trying to get your help to do good.

Comment by donald-hobson on Short story: An AGI's Repugnant Physics Experiment · 2019-02-14T15:31:50.252Z · score: 7 (5 votes) · LW · GW

This is an example of a Pascal's mugging. Tiny probabilities of vast rewards can produce weird behavior. The best-known solutions are either a bounded utility function or an antipascalene agent: an agent that ignores the best x% and worst y% of possible worlds when calculating expected utilities. (The latter can be money-pumped.)
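A sketch of the antipascalene idea, under my own construction of the trimming rule (the mugging numbers are illustrative):

```python
def expected_utility(worlds):
    """worlds: list of (probability, utility) pairs; probabilities sum to 1."""
    return sum(p * u for p, u in worlds)

def trim(worlds, mass):
    """Remove `mass` probability from the front of a list of (p, u) pairs."""
    out = []
    for p, u in worlds:
        drop = min(p, mass)
        mass -= drop
        if p - drop > 0:
            out.append((p - drop, u))
    return out

def antipascalene_utility(worlds, x=0.01, y=0.01):
    """Ignore the worst y and best x probability mass, then renormalize."""
    worlds = sorted(worlds, key=lambda w: w[1])  # worst utility first
    worlds = trim(worlds, y)                     # drop worst y mass
    worlds = trim(worlds[::-1], x)               # drop best x mass
    total = sum(p for p, _ in worlds)
    return sum(p * u for p, u in worlds) / total

# A mugger offers 10^15 utils with probability 10^-9, at a cost of 1 util:
mugging = [(1e-9, 1e15), (1 - 1e-9, -1.0)]
print(expected_utility(mugging))       # hugely positive: pay the mugger
print(antipascalene_utility(mugging))  # the tiny weird world is ignored: don't pay
```

The exploitability mentioned above shows up here too: because trimming makes the agent's valuation depend on how outcomes are bundled into "worlds", a clever adversary can repackage the same gambles to pump money out of it.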

Comment by donald-hobson on Probability space has 2 metrics · 2019-02-11T22:50:32.220Z · score: 13 (5 votes) · LW · GW

Get a pack of cards in which some cards are blue on both sides, and some are red on one side and blue on the other. Pick a random card from the pile. If the subject is shown one side of the card and it's blue, they gain a bit of evidence that the card is blue on both sides. Give them the option to bet on the colour of the other side of the card, before and after they see the first side. Invert the prospect theory curve to get from implicit probability to betting behaviour. The prediction is that people will perform a larger update in log odds when the pack is mostly one type of card than when the pack is 50:50.
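For reference, the exact Bayesian baseline for this experiment (my own worked numbers): the correct log-odds update on seeing a blue side is ln 2 regardless of pack composition, so any composition-dependence in subjects' updates is precisely the prospect-theory distortion being tested for.

```python
import math

def posterior_bb(prior_bb):
    """P(card is blue-blue | a randomly shown side is blue).
    Likelihoods: P(blue side | blue-blue) = 1, P(blue side | red-blue) = 1/2."""
    return prior_bb * 1.0 / (prior_bb * 1.0 + (1 - prior_bb) * 0.5)

def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1 - p))

for prior in (0.5, 0.9):  # 50:50 pack vs mostly blue-blue pack
    post = posterior_bb(prior)
    # The log-odds shift equals the log likelihood ratio, ln 2, in both cases.
    print(prior, post, logit(post) - logit(prior))
```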

Comment by donald-hobson on How important is it that LW has an unlimited supply of karma? · 2019-02-11T15:21:06.773Z · score: 4 (4 votes) · LW · GW

I suspect that if voting reduced your own karma, some people wouldn't vote. As it becomes obvious that this is happening, more people stop voting, until karma just stops flowing at all. (The people who persistently vote anyway all run out of karma.)

Comment by donald-hobson on Probability space has 2 metrics · 2019-02-11T10:55:11.889Z · score: 1 (1 votes) · LW · GW

Fixed, thanks.

## Propositional Logic, Syntactic Implication

2019-02-10T18:12:16.748Z · score: 5 (4 votes)

## Probability space has 2 metrics

2019-02-10T00:28:34.859Z · score: 88 (36 votes)
Comment by donald-hobson on X-risks are a tragedies of the commons · 2019-02-07T17:50:13.823Z · score: 2 (2 votes) · LW · GW

This is making the somewhat dubious assumption that X-risks are not so neglected that even a "selfish" individual would work to reduce them. Of course, in the not too unreasonable scenario where the cosmic commons is divided up evenly and you use your portion to make a vast number of duplicates of yourself, the utility would be vast (if your utility is linear in copies of yourself). Or you might hope to live for a ridiculously long time in a post-singularity world.

The effect that a single person can have on X-risks is small, but for someone selfish with no time discounting, reducing them would be a better option than hedonism now. Although a third alternative, sitting in a padded room being very, very safe, could be even better.

Comment by donald-hobson on (notes on) Policy Desiderata for Superintelligent AI: A Vector Field Approach · 2019-02-06T00:27:27.407Z · score: 18 (5 votes) · LW · GW

I suspect that the social institutions of Law and Money are likely to become increasingly irrelevant background to the development of ASI.

Deterrence Fails.

If you believe that there is a good chance of immortal utopia, and a large chance of paperclips in the next 5 years, the threat that the cops might throw you in jail (on the off chance that they are still in power) is negligible.

The law is blind to safety.

The law is bureaucratic and ossified. It is probably not employing much top talent, as it's hard to tell top talent from the rest if you aren't as good yourself (and it doesn't have the budget or glamour to attract them). Telling whether an organization is on track to not destroy the world is HARD. The safety protocols are being invented on the fly by each team; the system is very complex, technical, and only half built. The teams that would destroy the world aren't idiots; they are still producing long papers full of maths and talking about the importance of safety a lot. There are no examples to work with, or understood laws.

Likely as not (not really, there's too much of a conjunction here), you get some random inspector with a checklist full of things that sound like a good idea to people who don't understand the problem. All AI work has to have an emergency stop button that turns the power off. (The idea of an AI circumventing this was not considered by the person who wrote the list.)

All the law can really do is tell what public image an AI group wants to present, provide funding to everyone, and get in everyone's way. Telling cops to "smash all GPUs" would have an effect on AI progress. The fund-vs-smash axis is about the only lever they have. They can't even tell an AI project from a maths convention from a normal programming project if the project leaders are incentivized to obfuscate.

After ASI, governments are likely only relevant if the ASI was programmed to care about them. Neither paperclippers nor FAI will care about the law. The law might be relevant if we had tasky ASI that was not trivial to leverage into a decisive strategic advantage (an AI that can put a strawberry on a plate without destroying the world, but that's about the limit of its safe operation).

Such an AI embodies an understanding of intelligence and could easily be accidentally modified to destroy the world. Such scenarios might involve ASI and timescales long enough for the law to act.

I don't know how the law can handle something that can easily destroy the world, has some economic value (if you want to flirt with danger), and, with further research, could grant supreme power. The discovery must be limited to a small group of people (with a large enough number of nonexperts, one will do something stupid). I don't think the law could notice what it was; after all, the robot in front of the inspector only puts strawberries on plates. They can't tell how powerful it would be with an unbounded utility function.

Comment by donald-hobson on Why is this utilitarian calculus wrong? Or is it? · 2019-01-28T17:06:33.310Z · score: 6 (5 votes) · LW · GW

Firstly, you are confusing dollars and utils.

If you buy this product for $100, you gain the use of it, at value U[30] to yourself. The workers who made it gain $80, at value U[80] to yourself, because of your utilitarian preferences. Total value: U[110].

If the alternative was a product of cost $100, which you value the use of at U[105], but all the money goes to greedy rich people to be squandered, then you would choose the first. If the alternative was spending $100 to do something insanely morally important, U[3^^^3], you would do that.

If the alternative was a product of cost $100, that was of value U[100] to yourself, and some of the money would go to people that weren't that rich, U[15], you would do that.

If you could give the money to people twice as desperate as the workers, at U[160], you would do that.

There are also good reasons why you might want to discourage monopolies. Any desire to do so is not included in the expected value calculations. But the basic principle is that utilitarianism can never tell you whether some action is a good use of a resource, unless you tell it what else that resource could have been used for.
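To make the bookkeeping explicit, here is the comparison above as arithmetic (the U[...] values are from the text; splitting each option into use value plus downstream value is my framing):

```python
def total_utils(use_value, downstream_value):
    """Utils from using the product, plus utils from where the $100 ends up."""
    return use_value + downstream_value

options = {
    "fair-trade product": total_utils(30, 80),                   # U[110]
    "nicer product, money squandered": total_utils(105, 0),      # U[105]
    "decent product, money to the not-so-rich": total_utils(100, 15),  # U[115]
    "give to people twice as desperate": total_utils(0, 160),    # U[160]
}

# Utilitarianism only ranks the alternatives on the table:
for name, u in sorted(options.items(), key=lambda kv: -kv[1]):
    print(f"U[{u}]  {name}")
```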

## Allowing a formal proof system to self improve while avoiding Lobian obstacles.

2019-01-23T23:04:43.524Z · score: 6 (3 votes)

## Logical inductors in multistable situations.

2019-01-03T23:56:54.671Z · score: 8 (5 votes)

## Boltzmann Brains, Simulations and self refuting hypothesis

2018-11-26T19:09:42.641Z · score: 0 (2 votes)

## Quantum Mechanics, Nothing to do with Consciousness

2018-11-26T18:59:19.220Z · score: 10 (9 votes)

## Clickbait might not be destroying our general Intelligence

2018-11-19T00:13:12.674Z · score: 26 (10 votes)

## Stop buttons and causal graphs

2018-10-08T18:28:01.254Z · score: 6 (4 votes)

## The potential exploitability of infinite options

2018-05-18T18:25:39.244Z · score: 3 (4 votes)