The world of potential preferences is a super-high dimensional space, of which what I care about is only a tiny subset (though still complex in its own right).
Taking actions or offers which would improve the world according to my preferences has both absolute cost and opportunity cost, meaning I only take actions with at least some threshold positive impact.
Preferences outside my core domain of action are:
generally very weak at best, with total indifference being the norm, and
chaotically noisy, such that they may vary according to all kinds of situational characteristics unpredictable to myself or an outside observer
Also, my understanding of how my actions affect the areas I truly care about is sufficiently imperfect that, outside of a well-understood range, the expected value is low, especially with conservative preferences.
These factors greatly reduce the scope of potential Dutch books which I would ever actually take, reducing the ability of somebody to exploit any inconsistencies.
Also, repeated exposure to simple Dutch books and failures to maximize are likely
We should therefore expect to find agents which are utility-maximizing only as a contingent outcome of the trajectory of their learning, displaying utility-maximizing behaviors in areas that are both reward-relevant and within the training domain.
While self-modification to create a simple smooth plain is a plausible action, it shouldn't be seen as dominant since less drastic actions are likely to be sufficient to avoid being dutch booked.
A major crux for this view as applied to systems like humans is the explanation of how our (relatively) simple, compressible goals and ideas emerge out of the outrageous complexity of our minds. My feeling is that, once learning has started, adjustments to the nature of the mind pick up broad contours as a way to act, but only as a reflection of the world and reward system in which they are placed. If instead there is some kind of core underlying drive towards logical simplicity, or if a logically simple set of drives, once in place, is somehow dominant or tends to spread through a network, then I would expect smarter agents to quickly become more agent-like.
My intuitive feeling about the value of the utility function abstraction is that, for systems like humans and current neural networks, the pre-experiential mind is like a rough, rocky mountain, without the effect of wind or rain, while a utility maximizing agent is analogous to a perfectly flat plain.
Under the influence of consistent incentives, certain sections of this mountain are worn smooth over time, allowing it to be well approximated as a utility maximizing system. We should not expect it to act as a utility maximizer outside of this smoothed area, but we should expect this smooth area to exist even with only weak assumptions about the agent.
I've been looking at papers involving a lot of 'controlling for confounders' recently and am unsure about how much weight to give their results.
Does anyone have recommendations about how to judge the robustness of these kinds of studies?
Also, I was considering doing some tests of my own based on random causal graphs, testing what happens to regressions when you control for a limited subset of confounders, varying the size/depth of the graph and so on. I can't seem to find any similar papers, but I don't know the area. Does anyone know of similar work?
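A rough sketch of what such a test could look like, assuming linear-Gaussian relationships, a randomly wired DAG over the confounders, and a known true effect. The function names, edge probability, and coefficient scales are all illustrative assumptions rather than an established benchmark. It regresses the outcome on the treatment while controlling for progressively more of the confounders:

```python
import numpy as np

TRUE_EFFECT = 1.0  # assumed causal effect of treatment x on outcome y

def simulate(n_samples, n_conf, edge_prob, rng):
    """Sample a random linear-Gaussian DAG: confounders Z, treatment x, outcome y."""
    Z = np.zeros((n_samples, n_conf))
    for j in range(n_conf):
        # Z_j may depend on any earlier confounder (columns are in topological order).
        parents = rng.random(j) < edge_prob
        weights = 0.5 * rng.normal(size=j) * parents
        Z[:, j] = Z[:, :j] @ weights + rng.normal(size=n_samples)
    a = rng.normal(size=n_conf)  # confounder -> treatment weights
    b = rng.normal(size=n_conf)  # confounder -> outcome weights
    x = Z @ a + rng.normal(size=n_samples)
    y = TRUE_EFFECT * x + Z @ b + rng.normal(size=n_samples)
    return Z, x, y

def estimate_effect(x, y, controls):
    """OLS coefficient on x when regressing y on [1, x, controls]."""
    design = np.column_stack([np.ones_like(x), x, controls])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(0)
Z, x, y = simulate(n_samples=20_000, n_conf=6, edge_prob=0.5, rng=rng)

# Control for the first m confounders only, for m = 0 .. n_conf.
estimates = [estimate_effect(x, y, Z[:, :m]) for m in range(Z.shape[1] + 1)]
for m, est in enumerate(estimates):
    print(f"controls={m}  estimate={est:.3f}  bias={est - TRUE_EFFECT:+.3f}")
```

Only the final row, with every confounder controlled, is guaranteed to be close to the true effect. The intermediate rows are the interesting part, since partial control can move the estimate in either direction.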
This employee has 100 million dollars, approximately 10,000x fewer resources than the hedge fund. Even if the employee engaged in unethical business practices to achieve a 2x higher yearly growth rate than their former employer, it would take 13 years for them to have a similar amount of capital.
I think it's worth being explicit here about whether increases in resources under control are due to appreciation of existing capital or allocation of new capital.
If you're talking about appreciation, then if the firm earns 5% returns on average and the rogue employee earns 10%, the time for their resources to be equal would be ln(10000)/ln(1.10/1.05) ≈ 198 years, not 13.
If you're instead talking about capital allocation, then swings much faster than yearly doublings are very easy to imagine - for a non-AGI example see BlackRock's assets under management.
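As a quick check on the appreciation case, the catch-up time depends on the ratio of the two yearly growth factors. A minimal sketch, with a hypothetical function name and the 5% and 10% returns used above:

```python
import math

def years_to_catch_up(resource_ratio, fast_return, slow_return):
    """Years until capital growing at fast_return closes a resource_ratio gap."""
    relative_growth = (1 + fast_return) / (1 + slow_return)
    return math.log(resource_ratio) / math.log(relative_growth)

# 10,000x gap, 10% vs 5% yearly returns.
appreciation_years = years_to_catch_up(10_000, 0.10, 0.05)

# The 13-year figure corresponds to the gap halving every year.
halving_years = math.log(10_000) / math.log(2)

print(round(appreciation_years), round(halving_years, 1))
```

Dividing by ln(1.05) instead of ln(1.10/1.05) gives roughly 189 years. That is close, but it treats the 5-point gap as the relative growth rate rather than using the ratio of the growth factors.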
In general I think you could make the argument stronger by looking empirically at the dynamics by which the large passive investing funds acquired multiple trillions in managed assets with (as I understand it) relatively small pricing edges and no strategic edge, and extrapolating from there.
Cheers for the post, I find the whole series fascinating.
One thing I was particularly curious about is how these 'proposals' are made. Do you have a picture of what kind of embedding is used to present a potential action?
For example, is a proposal encoded in the activations of a set of neurons that are isomorphic to the motor neurons, so that it could propose tightening a set of finger muscles through specific neurons? Or is the embedding jointly learned between the two in some large unstructured connection, or smaller latent space, or something completely different?
Another little update: speed issue solved for now by adding SymPy's Fortran wrappers to the derivative calculations - calculating the SVD isn't (yet?) the bottleneck. Can now quickly get results from 1,000+ step simulations of 100s of particles.
Unfortunately, even for the pretty stable configuration below, the values are indeed exploding. I need to go back through the program and double check the logic but I don't think it should be chaotic, if anything I would expect the values to hit zero.
It might be that there's some kind of quasi-chaotic behaviour where the residual motion of the particles is impossibly sensitive to the initial conditions, even as the macro state is very stable, with a nicely defined derivative wrt initial conditions. Not yet sure how to deal with this.
Been a while but I thought the idea was interesting and had a go at implementing it. Houdini was too much for my laptop, let alone my programming skills, but I found a simple particle simulation in pygame which shows the basics, as you can see below.
Planned next step is to work on the run-time speed (even this took a couple of minutes to run; calculating the frame-to-frame Jacobian is a pain, probably more than necessary) and then add some utilities for creating larger, densely connected objects. Will write up as a fuller post once done.
Curious if you've got any other uses for a set-up like this.
Reading this after Steve Byrnes' posts on neuroscience gives a potentially unfortunate view on this.
The general impression is that a lot of our general understanding of the world is carried in the neocortex, which is running a consistent statistical algorithm, and the fact that humans converge on similar abstractions about the world could be explained by the statistical regularities of the world as discovered by this system. At the same time, the other parts of the brain have a huge variety of structures and have functions which are the products of evolution at a much more precise level, and the brain is directly exposed to, and working in response to, this higher level of complexity. Of course, this doesn't mean these systems can't be reliably compressed, and they presumably have structure of their own, but that structure may be very complex and may not be discoverable without high definition, so progress on values wouldn't follow easily from progress in understanding world-modelling abstractions.
This would suggest that successes in reliably measuring abstractions would be of greater use to general capability and world modelling than to understanding human values. It would also potentially give some scientific backing to the impression from introspection and philosophy that the core concepts of human values are particularly difficult concepts to point at.
I guess one lesson would be to try and put a focus on this case where at least part of the complexity of the goal of a system is in a system directly in contact with the cognitive system rather than observed at a distance.
Also interested in helping on this - if there's modelling you'd want to outsource.
I roughly have similar beliefs and I've thought about the same question before.
The hope is that you could make more specific bets based on trends which are not currently clear to the world as a whole but will become apparent relatively soon. For example, I think I remember Gwern asking whether, if the scaling power of larger NNs continues, Nvidia will become the most valuable company in the world as the power of truly massive models/training volumes becomes apparent and they're in prime position to profit.
The problem is that shares on the frontier of AI developments are already subject to a lot of hype from somewhat similar beliefs (e.g. anyone who is a major blockchain believer, or a big AI believer but in a purely positive sense). These stocks are therefore already significantly overvalued by traditional metrics and it's not obvious whether NN progress is enough to generate major share price growth, at least with high enough probability to overcome the presumably very high discount rates that you have, even within the next 10 years (e.g. Nvidia market cap is $360B, so even becoming the largest company in the world only implies a ~6x price increase and it's hard to give this more than 15% credence in the next decade).
It seems that if you believe specifically in short timelines then there may be companies who are particularly likely to succeed given the importance of massive models (if indeed that's the way you expect things to play out). At the moment though, most of those in position to take advantage seem to either be embedded in larger companies (DeepMind, big tech AI divisions) or just not public (OpenAI, most startups).
Ideally I guess there would be a venture capital fund which you could place money into which would invest in the most promising companies which themselves are betting on being in position to take commercial advantage of ML breakthroughs. I'm not sure I'm aware of any such fund but I'd certainly be interested if one exists/is being created.
Question about error-correcting codes that's probably in the literature but I don't seem to be able to find the right search terms:
How can we apply error-correcting codes to logical *algorithms*, as well as bit streams?
If we want to check that a bit stream is accurate, we know how to do this for a manageable overhead - but what happens if there's an error in the hardware that does the checking? It's not easy for me to construct a system that has no single point of failure - you can run the correction algorithm multiple times but how do you compare the results without ending up back with a single point of failure?
Anyone know any relevant papers or got a cool solution?
Interested for the stability of computronium-based futures!
I agree that this is the biggest concern with these models, and the GPT-n series running out of steam wouldn't be a huge relief. It looks likely that we'll have the first human-scale (in terms of parameters) NNs before 2026 - Metaculus, 81% as of 13.08.2020.
Does anybody know of any work that's analysing the rate at which, once the first NN crosses the n-parameter barrier, other architectures are also tried at that scale? If no-one's done it yet, I'll have a look at scraping the data from Papers With Code's databases on e.g. ImageNet models, it might be able to answer your question on how many have been tried at >100B as well.
Hey Daniel, don't have time for a proper reply right now but am interested in talking about this at some point soon. I'm currently in UK Civil Service and will be trying to speak to people in their Office for AI at some point soon to get a feel for what's going on there, perhaps plant some seeds of concern. I think some similar things apply.
As I understand it, one of the biggest issues with a land value tax is that the existence of the tax instantly makes owning land much less desirable - its value is reduced by the net present value of the total future taxation. This is obviously in some sense part of the plan but it causes some pretty large sudden shifts in wealth - in particular away from anyone who has a mortgage but also just from home owners in general.
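To put a rough number on that drop, here is a minimal sketch assuming land is priced as a perpetuity of its rent, the tax is levied yearly as a fraction of the post-tax price, and the rent, discount rate, and tax rate are purely illustrative:

```python
import math

def land_price(annual_rent, discount_rate, tax_rate=0.0):
    """Price of land as a perpetuity of rent net of a yearly tax on the price.

    price = (annual_rent - tax_rate * price) / discount_rate
    which rearranges to annual_rent / (discount_rate + tax_rate).
    """
    return annual_rent / (discount_rate + tax_rate)

rent, r, tax = 5_000.0, 0.05, 0.02  # hypothetical figures

before = land_price(rent, r)
after = land_price(rent, r, tax)
npv_of_tax = tax * after / r  # perpetuity of the yearly tax bill

print(f"before={before:,.0f}  after={after:,.0f}  drop={before - after:,.0f}")
```

Under these assumptions the fall in price equals the present value of the future tax bills, so a 2% tax with a 5% discount rate removes a bit under 30% of the land's value on the day it is announced.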
Implementing it in a fair/politically acceptable way then seems to require either a far-off starting date, a very slow taper in or a very large series of handouts to compensate, and all of these are difficult for a government to implement given the time horizon of elections and a large, wealthy group who will be opposed to this, likely including inside the governing party.
This isn't especially relevant to your variant but if you're thinking about how to get efficient taxation then this is something to think about trying to find a solution to :)
On the numbers from The Precipice - I think the point is that the next 100 years have an estimated 1/6 chance of extinction, but also contain the power to protect us from future harm and facilitate the human race flourishing across the universe. Extrapolating risk from the next 100 years to an expected 600 year lifespan, and using current population forecasts as the number of humans involved, therefore seems not in the spirit of his model.
I think this points to the strategic supremacy of relevant infrastructure in these scenarios. From what I remember of the battleship era, having an advantage in design didn't seem to be a particularly large advantage - once a new era was entered, everyone with sufficient infrastructure switches to the new technology and an arms race starts from scratch.
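For concreteness, the 600-year figure is what you get from holding the 1/6 per-century risk constant forever. A small sketch, where the function name and the lower later-century risk of 1/1000 are purely illustrative assumptions, shows how sensitive that figure is to risk falling after the first century:

```python
def expected_centuries(first_century_risk, later_century_risk):
    """Expected number of centuries until extinction under a two-phase risk."""
    p1, p2 = first_century_risk, later_century_risk
    # Extinct in century one: 1 century. Otherwise: 1 century plus a
    # geometric wait with mean 1 / p2 centuries.
    return p1 * 1 + (1 - p1) * (1 + 1 / p2)

constant_risk_years = 100 * expected_centuries(1 / 6, 1 / 6)
falling_risk_years = 100 * expected_centuries(1 / 6, 1 / 1000)

print(round(constant_risk_years), round(falling_risk_years))
```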
This feels similar to the AI scenario, where technology seems likely to spread quickly through a combination of high financial incentive, interconnected social networks, state-sponsored espionage etc. The way in which a serious differential emerges is likely to be more through a gap in the infrastructure to implement the new technology. It seems that the current world is tilted towards infrastructure ability diffusing fast enough to prevent this, but it seems possible that if we have a massive increase in economic growth then this balance is altered and infrastructure gaps emerge, creating differentials that can't easily be reversed by a few algorithm leaks.
Apologies if this is not the discussion you wanted, but it's hard to engage with comparability classes without a framework for how their boundaries are even minimally plausible.
Would you say that all types of discomfort are comparable with higher quantities of themselves? Is there always a marginally worse type of discomfort for any given negative experience? So long as both of these are true (and I struggle to deny them) then transitivity seems to connect the entire spectrum of negative experience. Do you think there is a way to remove the transitivity of comparability and still have a coherent system? This, to me, would be the core requirement for making dust specks and torture incomparable.
Late to the party but I'm pretty confident he's saying the opposite - that a 1 PFLOP/s system is likely to have 10 or more times the computational capacity of the human brain, which is rather terrifying.
He gives the example of Baidu's Deep Speech 2, which requires around 1 GFLOP/s to run and produces human-comparable results. This is 10^6 times slower than the 1 PFLOP/s machine. He estimates that this process in humans takes around 10^-3 of the human brain, thereby giving the estimate of a 1 PFLOP/s system being 10^3 times faster than the brain. His other examples give similar results.
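The arithmetic behind that estimate, spelled out with the figures quoted above (the variable names are just for illustration):

```python
import math

machine_flops = 1e15         # 1 PFLOP/s system
deep_speech_flops = 1e9      # ~1 GFLOP/s to run Deep Speech 2
brain_fraction = 1e-3        # share of the brain used for the same task

# If 1e-3 of the brain matches a 1 GFLOP/s model, the whole brain is
# roughly equivalent to deep_speech_flops / brain_fraction.
brain_equivalent_flops = deep_speech_flops / brain_fraction
machine_vs_brain = machine_flops / brain_equivalent_flops

print(f"brain ~ {brain_equivalent_flops:.0e} FLOP/s, machine ~ {machine_vs_brain:.0e}x brain")
```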
An easy way to deal with this difficulty is to replace 'at least as happy with policy A as with policy B (in any situation that we think might arise in practice)' with 'at least as happy with policy A as with policy B (when averaged over the distribution of situations that we expect to arise)', though this is clearly much weaker.
To me it seems that the reason this stronger sense of ordering is used is because we expect this amplification procedure to be of a sort that produces results such that A+ is strictly better than A but that even if this wasn't the case, the concept of an obstruction would still be a useful one. Perhaps it would be reasonable to take the more relaxed definition but expect that amplification would produce results that are strictly better.
I also agree with Chris below that defining an obstruction in terms of this 'better than' relation brings in serious difficulty. There are exponentially many policies B that are no better than A+, and there may well be a subset of these that can be amplified beyond A+, but as far as I can tell there's no clear way to identify these. We thus have an exponential obstacle to progress even within a partition, necessitating a stronger definition.
When you talk about 'black-box' versions of Hugh, do you envision that H is able to answer questions relating to the cognitive processes that lead to the answer given, or about H's thinking in general? This seems to contradict the spirit of a black box but self reflection is an important part of Hugh's cognitive ability.
Perhaps they are both useful possibilities. My intuition is that this kind of self reflection is as far from being possible for AI as any human ability, and so we should expect that we might have systems powerful enough to take on wide responsibility without this ability. If it were possible, though, the ability to use loops of self reflection to check whether a cognitive process serves a certain goal would be very helpful.
I've realised that you've gotta be careful with this method because when you find a trichromatic subtriangle of the original, it won't necessarily have the property of only having points of two colours along the edges, and so may not in fact contain a point that maps to the centre.
This isn't a problem if we just increase the number n by which we divide the whole triangle instead of recursively dividing subtriangles. Unfortunately now we're not reducing the range of co-ords where this fixed point must be, only finding a triad of arbitrarily close points that map to a triangle surrounding the centre. You can, for example, take the centre point of the first of these triangles (with some method of numbering to make the function definite) for each value of n = 1, 2, 3, ... as a sequence in R^2. This must have a convergent subsequence, which should converge to a point that maps to the centre, but I can't prove that last stage.
Also, if we have a proof for #6 there's a pleasant method for #7 that should work in any dimension:
We take our closed convex set S that has the bounded function h:S→S. We take a triangle T that covers S so that any point in S is also in T.
Now we define a new function h′:T→T such that h′(x)=h(cs(x)) where cs(x) is the function that maps x to the nearest point in S.
By #6 we know that h′ has a fixed point, since cs is continuous. We know that the fixed point of h′ cannot lie outside S because the range of h′ is S. This means h′ has a fixed point within S and since for x∈S, h(x)=h′(x), h has a fixed point.
Yeah agreed, in fact I don't think you even need to continually bisect, you can just increase n indefinitely. Iterating becomes more dangerous as you move to higher dimensions because an n dimensional simplex with n+1 colours that has been coloured according to analogous rules doesn't necessarily contain the point that maps to zero.
On the second point, yes, I'd been assuming that a bounded function had a bounded gradient, which certainly isn't true for, say, sin(x^2). The final step needs more work; I like the way you did it in the proof below.
Here's a messy way that at least doesn't need too much exhaustive search:
First let's separate all of the red nodes into groups so that within each group you can get to any other node in that group only passing through red nodes, but not to red nodes in any other group.
Now, we trace out the paths that surround these groups - they immediately look like the paths from Question 1 so this feels like a good start. More precisely, we draw out the paths such that each vertex forms one side of a triangle that has a blue node at its opposite corner. Note that you can have multiple paths stemming from the same group if the group touches the side of the larger triangle, or if it has internal holes.
Now we have this set of paths we can split them into three kinds. The first is loops, which arise when you have a group which never touches the edge of the larger triangle, or inside 'holes' in large groups. These can be seen as a path starting and finishing at the same node. They therefore have an even number of b-g vertices. The second kind is those that begin at the edge of the large triangle and end at the same edge. These paths begin and end on the same colour and therefore also have an even number of b-g vertices. Finally, and most importantly, there is a kind of path that goes from one edge to the other - in the case of the reds, the left edge to the right edge. This will happen once with the group that includes the top red node, and if any other group spans the larger triangle then it will generate two more of these paths. Sperner's lemma tells us that these will have an odd number of b-g vertices and we know that there will be an odd number of such paths, so this final type generates an odd number of total b-g vertices.
By the way that we have defined these paths, the total number of r-g-b triangles is equal to the number of g-b vertices on the paths in the set generated above. This number is the sum of an odd number from the spanning paths and a series of even numbers from the other paths, giving an odd overall number of r-g-b triangles, proving number 4 (as long as I haven't made an error in categorizing the paths).
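As a sanity check on the claim itself (not on the path argument), the parity can be tested numerically. This is a sketch under my own encoding assumptions: the big triangle is split into n rows, each vertex is indexed by (i, j) with barycentric coordinates (i, j, n-i-j), and a vertex may only take a colour whose barycentric coordinate is non-zero, which gives single-coloured corners and two-coloured edges:

```python
import random

def sperner_colouring(n, rng):
    """Random colouring of the n-subdivided triangle obeying the boundary rules."""
    colours = {}
    for i in range(n + 1):
        for j in range(n + 1 - i):
            bary = (i, j, n - i - j)
            allowed = [c for c in range(3) if bary[c] > 0]
            colours[(i, j)] = rng.choice(allowed)
    return colours

def count_trichromatic(n, colours):
    """Count small triangles whose three corners carry all three colours."""
    count = 0
    for i in range(n):
        for j in range(n - i):
            up = {colours[(i, j)], colours[(i + 1, j)], colours[(i, j + 1)]}
            count += len(up) == 3
            if i + j <= n - 2:  # the downward-pointing triangle exists
                down = {colours[(i + 1, j)], colours[(i, j + 1)],
                        colours[(i + 1, j + 1)]}
                count += len(down) == 3
    return count

counts = [count_trichromatic(6, sperner_colouring(6, random.Random(seed)))
          for seed in range(20)]
print(counts)
```

Every entry should come out odd, whatever the seed.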
I hope this makes sense, let me know if it doesn't or has errors :)
I was able to get at least (I think) close to proving 2 using Sperner's Lemma as follows:
You can map the continuous function f(x) to a path of the kind found in Question 1 of length n+1 by evaluating f(x) at x=0, x=1 and n-1 equally spaced divisions between these two points and setting a node as blue if f(x) < 0 else as green.
By Sperner's Lemma there is an odd, and therefore non-zero number of b-g vertices. You can then take any b-g pair of nodes as the starting points for a new path and repeat the process. After k iterations you have two values of x - only one where f(x) is below zero - that are 1/(n^k) away from each other. We thus can find arbitrarily close points that straddle zero. By taking the sequence f(x) of initial nodes x we get a sequence that, by B-W, has a sub-sequence which converges to zero. By continuity we have proved the existence of an x such that f(x)=0.
We can be sure that the sub-sequence does in fact converge to zero, rather than any other value because if it converges to any number |a|>0, the gradient of f(x) would have to be arbitrarily high to dip back below/above 0 for a value of x arbitrarily close by and therefore would not be a continuous function.
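The repeated subdivision is easy to try out. A minimal sketch, where the function name and the test function x^2 - 0.5 are my own choices: colour n+1 equally spaced nodes blue when f(x) < 0 and green otherwise, pick the first adjacent blue-green pair, and repeat on that sub-interval.

```python
import math

def straddle(f, lo, hi, n, k):
    """Shrink [lo, hi] k times, keeping endpoints of different colour."""
    if (f(lo) < 0) == (f(hi) < 0):
        raise ValueError("endpoints must have different colours")
    for _ in range(k):
        xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
        blue = [f(x) < 0 for x in xs]
        # The ends differ in colour, so some adjacent pair must differ too.
        for i in range(n):
            if blue[i] != blue[i + 1]:
                lo, hi = xs[i], xs[i + 1]
                break
    return lo, hi

lo, hi = straddle(lambda x: x * x - 0.5, 0.0, 1.0, n=4, k=10)
print(lo, hi, hi - lo)
```

After k rounds the interval has width 1/(n^k), here about 1e-6, and it still brackets the zero at sqrt(0.5).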
Comments to tighten up/poke holes in the above appreciated :)
For long term bets, where the opportunity cost of tying money up in these bets becomes high, I would have thought that the bets should be denominated in US bonds (or other agreed minimal-risk interest rate asset) to minimize this cost.
Even if the bet does not pay out one way or another, the money still accumulates interest.
Other than being incompatible with Augur, are there any theoretical or practical hurdles to using this? It would hopefully reduce the subsidy required to make an attractive market without incurring cost in and of itself.
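A quick illustration of the size of the effect, using a hypothetical 3% yearly bond yield and a 10-year bet (both numbers are assumptions for the example):

```python
def bond_value(stake, annual_yield, years):
    """Value of a stake held in bonds, compounding yearly."""
    return stake * (1 + annual_yield) ** years

stake = 1_000.0
in_bonds = bond_value(stake, 0.03, 10)
opportunity_cost_of_cash = in_bonds - stake

print(f"bond-denominated: {in_bonds:.2f}, cash escrow forgoes {opportunity_cost_of_cash:.2f}")
```

On these numbers a cash-denominated bet forgoes about a third of the stake over the decade, which is the cost a subsidy would otherwise have to cover.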
If the joining bonus were large enough to give a new member enough DKP to get the choice items, then older members would (quite rightly) complain. If it were smaller, it wouldn’t work.
I guess my central question is, a new player will have infinite EP/GP after they first receive EP. They can therefore wait until their perfect item comes up, and choose that. This to me seems extremely similar to giving an uncertain but potentially very large joining bonus. After losing this infinite ratio status, the situation then seems very similar to a free market one. In particular I don't understand why having collected lots of points (ie ability to claim future value) would lead to your incentive dropping off, while accumulating a high ratio (which you'd presumably need to 'save' for a while for really top items) doesn't have this problem.
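To make the infinite-ratio point concrete, here is a toy sketch. It assumes priority is simply EP divided by GP with no base GP, and the roster and item cost are made up:

```python
import math

def priority(ep, gp):
    """Loot priority as the EP/GP ratio; zero GP with any EP is unbounded."""
    if gp == 0:
        return math.inf if ep > 0 else 0.0
    return ep / gp

# (name, EP earned, GP spent) - hypothetical roster
roster = {"veteran": (5000, 2000), "regular": (1200, 400), "newcomer": (50, 0)}

def loot_order(members):
    return sorted(members, key=lambda name: priority(*members[name]), reverse=True)

before = loot_order(roster)

# The newcomer takes an item costing 100 GP and loses the infinite ratio.
roster["newcomer"] = (50, 100)
after = loot_order(roster)

print(before, after)
```

The newcomer is first in line for whichever single item they choose to wait for, then drops to the back once any GP is recorded.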
I'm curious but a bit confused about some of the benefits of EP/GP over the straight free-market model, but if EP/GP did indeed take over then I'm sure there's something I'm missing.
1: Presumably, in both models, in the long run it takes roughly the same average amount of time (modulated by your efficiency of pro-social activity) to get an item of quality >x, but it seems that in EP/GP you get your first almost immediately, while in DKP your timer starts from 0. Was there the issue of individuals jumping around guilds to try and get that first item?
2: Is there any system by which one can defer the receiving of items in EP/GP so that you don't end up getting something that is of low or nil value to you (especially since they can't be traded)? The main advantage of the free-market, at least in systems where individuals have similar ability to earn currency, is usually that items go to those who value them most, so you'd expect DKP to have a big efficiency advantage if you can't choose whether to accept. On the other hand, if this deferral is possible, would this degenerate into something like a free market, except where new entrants have first dibs over everything?
The power of attracting new players is a valuable advantage I'm sure but it's the only one that I really see from the 3 given above, and I can't see how this isn't possible in a similar way by, say, a free market system where a new member gets some kind of joining bonus.