I am kind of surprised you didn't reference causal inference here to gesture at the task in which we "figure out which variables are directly relevant, i.e. which variables mediate the influence of everything else". Are you pointing at a different sort of idea, or do you not feel causal inference is adequate for describing this task?
Also, scenarios 1 and 2 seem fairly close to the "linear" and "non-linear" models of innovation Jason Crawford described in his talk "The Non-Linear Model of Innovation". To be honest, I preferred his description of the models, though he didn't cover how miraculous it is that the model can work at all: that, to a good approximation, the universe is simple and local.
The Strategy of Conflict is condensed instrumental rationality. Much of the content is covered elsewhere, but I don't know of a superior qualitative presentation.
Talking about qualitative presentations, Thinking Physics is a set of hundreds of physics problems designed to show how important conservation laws and infinitesimals are. The problems are all solvable with some careful thought, and cover a great deal of ground. I wish more books were written this way.
Here's a visualisation that goes along with Euclid's Elements.
This was one of many from an article, "The Empirical Metamathematics of Euclid and Beyond". It is a long essay on the overarching structure of Euclid's Elements which verifies some claims made about the work, e.g. that the proofs were ordered in nearly the most parsimonious way possible. It also finds the most difficult theorems in each book, the greatest possible reductions in proof length, and hints that the network of theorem dependency has a local 2-d structure. I highly recommend the article.
Definitely not a subject, but I'd say that the visualisation of Wolfram's theory of everything is excellent. Of course there are problems with his theory of everything, like the fact that he hasn't actually proved his claims that it generates the GR field equations or replicates QM, or shown that his theory evades the critical objection Scott Aaronson raised. But as a visualisation:
It is aesthetically pleasing
Compactly contains the basic ideas of his T.o.E.
Ties the basic concepts together to see how they could generate a theory of physics
I am glad you put the quotation marks around "morality as taxes", since what my mind jumped to upon verbalising the title was what you described in the last part of your post: something you'd be glad to evade where possible. In retrospect, it's clear that the quotation marks were meant to point to another approach, not the one your thought experiment is meant to represent. Still, I think "Wholehearted choices vs. morality as taxes" would be a little clearer as a title.
I assumed the question was about the first few decades after "first contact".
A large chunk of my probability mass is on first contact being unintentional, and something neither side can do much about. Or perhaps one "side" is unaware of it, as when we receive some message directed at no one in particular, or record the remnants of some extreme cosmic event that seems mighty unnatural.
It feels like we're near certain to have created an AGI by then. I am unsure enough about the long-term time scales of AGI improvement, and its limits, that I can assign some credence to the AGI we make possessing relatively advanced technology, and so it may be in a good bargaining position. If we make plenty of AIs, maybe they'll be less powerful individually, but they should still be quite potent in the face of a superior adversary.
Use a very large magnifying glass on the candle itself, igniting the wax
Chuck the candle into a very hot oven
Use a laser to ignite it, perhaps getting one from a CD drive and overloading it
Run a stupendously large electric current through the candle wax or the wick.
Go into a volcano
Launch the candle into the sun in one of 50 ways
Build a simple bomb, perhaps a flour bomb, and use it to "ignite" the candle momentarily.
Grab some wood sticks, dry them out for a couple months, turn one into tinder by scraping and use the other two to ignite it through friction.
Strip the candle of the wax, unravel the rope, wrap the strands around the candle, and now you can much more easily ignite the candle using any of the prior methods.
Pay someone else to ignite the candle.
Bully someone else into igniting the candle.
Wait until someone finishes smoking the cigar with the transformed candle from 12.
Get into a ferocious gunfight with a flour bomb backpack with the candle in the middle of the backpack.
Pay someone else to come up with ideas for how to ignite the candle.
Go to my car, open up the engine, stick the candle in one of the cylinders (perhaps by cutting it apart and reassembling it inside using some tweezers) and turn on the ignition.
Coat the candle in sodium and throw it in water.
Unravel the wick, recombine into two, thinner wicks, and rub them against each other rapidly enough to ignite them.
Change the meaning of the phrase "light a candle" by tying it to a pertinent aspect of politics in my local bubble and then do whatever is entailed by the new meaning.
Notice that all candles emit light because they have non-zero temperature. Hence genetically modify all future humans s.t. they can detect the radiation it gives off.
Drive myself into an Everett branch in which the candle spontaneously ignites.
Use the prior idea to accomplish any of the former 23 methods.
Grab a gun and shoot the candle wick.
Use a magnifying glass to ignite flour with the candle nearby.
Use a magnifying glass to ignite some oil, throw water on that with the candle nearby.
Alter the candle piece by piece in such a way that it is deemed to be the same candle after each replacement, replacing it each step of the way with wax with a lower ignition point, and then ignite that.
Do the above with a candle that is already lit within a high oxygen atmosphere.
Pump more oxygen into the atmosphere and use that to enhance any prior method, so I can e.g. use a small magnifying glass to easily ignite the candle.
Wait until there's a nearby fire/explosion/more favourable conditions for lighting a candle and do so then.
Use a magnifying glass/laser/whatever to ignite my clothing and transfer the flame that way.
Repeat the same but with any nearby flammable object. As an example, igniting my hair in the case that I happen to be locked naked in a room with a candle and a magnifying glass. Then light the candle that way.
Find a landmine in an active warzone and step on it with the candle.
Notice that the challenge does not specify a particular candle and choose to light one that is already on fire, and reject the Copenhagen interpretation of responsibility.
Again, note that I am causally influencing every event in my future light cone, which practically guarantees that I will in some way cause a candle to be lit.
Note that the candle need not be lit within our universe: there are universes which evolve exactly like our own up to this point but happen to have a rule specifying that the thing which is isomorphic to me manages to light a candle after reading this post. Recognise that there can be no consistent notion of selfhood beyond a similarity between structures, and employ this perspective to say that I must always light a candle.
Write a book in which I light a candle.
Run a simulation in which a virtual candle is lit.
Put out a contract which specifies that the first EM to light a virtual candle can have all my savings.
Sell the candle to someone who's participating in a "light a candle for X" festival.
Make a festival where people are encouraged to light candles, perhaps in a former Hindu community to take advantage of Diwali.
Release fluorine upon the candle and watch in glee as it devours everything around it.
Slowly wear away a candle over time until it is practically just a strand of rope and then ignite that.
Change the pressure of the surrounding air to make it easier to use any of the prior methods.
Go to a particularly storm-prone area, find the highest lightning rod that I can and tape the candle to it with duct tape.
Train fireflies so that they are attracted to candles.
Break a candle up and re-arrange it into near atomically small components which resemble candles, and proceed to ignite them by baking them in the oven.
Embed the candle with LED lights.
Get one of those fluorescent fungi, sterilise the candle, make some system of tunnels going through it, place some sawdust in there and proceed to grow a fluorescent fungus within the candle.
Magnetise the candle in a strong enough field and rotate it rapidly to cause it to throw off radiation.
Use molecular nanotechnology to disassemble the wax, remove electrons from each molecule, recombine it, then move it around to create a current and thus an EM field.
Collapse spacetime into a point so that all light coincides with the former constituents of the candle.
Note that the things which became the candle once emitted vast amounts of light before atoms were formed, then move backwards through time at such a rapid pace that it seems as if the candle instantaneously turns into a cloud of protons, neutrons and electrons emitting horrendous amounts of light.
Deplete the supply of candles throughout the world, raising their relative demand and then proceed to sell my candle.
Destroy civilisation and then sell my candle once electric lighting no longer works.
Getting kind of bored now and I think an hour has passed.
The post about Sweden's unusual situation you linked to has updated. The author claims that the reduced death rate is mostly due to younger people getting nearly all of the covid cases, which is supported by recent data (the figure shows the total number of cases between July and 03 Nov). Why that is the case is another issue.
What about thorium? A back-of-the-envelope calculation suggests thorium reactors could supply us with energy for 100-500 years. I got this from a few sources. First, I used the figure of 170 GW·days produced per metric tonne of fuel (Fort St Vrain HTR) and the availability of fuel (500-2500 kilotonnes according to Wikipedia) to estimate 10-50 years out of thorium reactors if we keep using 15 TW of energy. And that's not even accounting for breeder reactors, which can produce their own fuel. If we do go with the theoretical maximum, we should multiply this figure by 50. I'm basing that estimate of the (probably peak) fuel efficiency of thorium on what Carlo Rubbia of CERN said (see the Wikipedia article above): 1 tonne of thorium can provide 200 times more power than 1 tonne of uranium. Since uranium produces ~45 GW·days per metric tonne of fuel, we get the estimate of ~50 times. Then we get the figure of 500-2500 15TW-years.
Supposing that we really need four or five times the energy we actually use leaves us with an upper bound of ten times the naive estimate. So I'd estimate thorium could provide 100-500 75TW-years.
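For concreteness, the back-of-the-envelope arithmetic above can be written out as a quick sanity check (a sketch using the figures quoted above; the variable names and the 365.25-day year are my own choices):

```python
GW_DAY = 1e9 * 86_400                # joules in one gigawatt-day
YEAR = 365.25 * 86_400               # seconds in a year
WORLD_DEMAND = 15e12                 # watts, current global consumption

BURNUP = 170 * GW_DAY                # J per tonne (Fort St Vrain HTR figure)

def years_of_supply(tonnes, j_per_tonne=BURNUP, demand=WORLD_DEMAND):
    """Years the given fuel stock lasts at constant demand."""
    return tonnes * j_per_tonne / (demand * YEAR)

low = years_of_supply(500e3)         # 500 kt of thorium reserves
high = years_of_supply(2500e3)       # 2500 kt

# Breeder multiplier from Rubbia's claim: 1 t Th ~ 200 t U, and U yields
# ~45 GW-days/t, so the factor over the 170 GW-day burnup is:
breed = 200 * 45 / 170               # ~53x

print(f"once-through: {low:.0f}-{high:.0f} years at 15 TW")
print(f"with breeding: {low * breed:.0f}-{high * breed:.0f} years")
```

This lands at roughly 15-80 years once-through, the same order of magnitude as the 10-50 year figure quoted above; the difference is down to rounding in the quoted range.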
Thanks for the reply. Feelings of helplessness sounds about right, and I think you may be right about giving yourself the feeling that you are being supported. Only, people with severe chronic pain often suffer from anxiety and depression as well, and it seems like it would be a hard battle getting their brains to recognise those aforementioned feelings.
Somewhat urgent: can anyone recommend a good therapist or psychiatrist for anxiety/depression in the UK? Virtual sessions are probably required. Private is fine. Also, they shouldn't be someone geared towards rationalist types: the person I'm thinking of has nearly no knowledge of these ideas.
I still disagree. You can use Fermat's Last Theorem rigorously without understanding why it works. Same for the four colour theorem. And which mathematicians understand why we can classify finite simple groups the way we do? I'd bet fewer than one percent do. Little wonder, when the proof is three volumes long! My point is that there are many theorems a mathematician will use without rigorously knowing why they work. Oh sure, you can tell them a rough story outlining the ideas. But could they prove it themselves? Probably not, without a deep understanding of the area. Yet even without that understanding, they can use these theorems in formal proofs. They can get a machine to check over it.
Now, I admit that's unsatisfying. I agree that if they can't prove it themselves, then they don't have a rigorous understanding of the theorem. Eventually, problems will arise which they cannot resolve without understanding that which they accepted as magic. But is that really so fatal a flaw for teaching students the hyperreals? One only needs a modest amount of logic, perhaps enough for a course or two, to understand why the transfer principle works. That seems a pretty good investment, given how much model theory sheds light on what we take for granted.
Now I suppose if you find infinitary mathematics ugly, then this is all beside the point. And unfortunately, there's not much I can say against that beyond the usual arguments and personal aesthetics.
No, to understand why the transfer principle works requires a fair amount of knowledge of mathematical logic. It doesn't follow that you can't perform rigorous proofs once you've accepted it. Or am I missing something here?
Because of the shift in culture in mathematics, wherein the old proofs came to be considered unrigorous. Analysis à la Weierstrass put the old statements on firmer footing, everyone migrated there, and infinitesimals were left to languish until a transfer principle was proven to give them a rigorous foundation. But by that time, standard analysis had borne such great fruit that it was deeply intertwined with modern mathematics. And of course, there's been a trend against the infinitary and against the incomputable in the past century.
So there's both institutional inertia due to historical developments, as well as some philosophical objections which really boil down to whether you're fine with infinitary mathematics. I make no arguments concerning the latter; I just note that one can reject infinitary mathematics without believing it's ugly. Now, if you're saying not all infinitary mathematics is ugly, just the hyperreals, that's a different claim. I can see why one might think they're uglier than e.g. the complex numbers, but I don't get why they'd be ugly, period. May I ask why you think so?
Stuart, by "Prt(R|D1;j) is complex" are you referring to their using R=R(.,E[ΘR∗|D1;j]) as the estimated reward function?
Also, what did you think of their argument that their agents have no incentive to manipulate their beliefs, because they evaluate future trajectories based on their current beliefs about how likely they are? Does that suffice to implement eq. (1) from your motivated value selection paper?
Not really? The axioms (for the hyperreals) aren't much different to those of the reals. Yes, it's true that you need some strange constructions to justify that the algebra works as you'd expect. But many proofs in analysis become intuitive with this setup, which surely aids pedagogy. Admittedly, you need to learn standard analysis anyway, since its tools are used in so many more areas. But I'd hardly call it ugly.
Recall that memories are pathway-dependent, i.e. you can remember an "idea" when given verbal cues but not visual ones, or given cues in the form of "complete this sentence" versus "complete this totally different sentence expressing the same concept". If you memorise a sentence and can recall it in any relevant context, I'd say you've basically learnt it. But just putting it into SRS on its own won't do that. That's why SuperMemo has such a long list of rules and heuristics on how to use SRS effectively.
Here's the rewritten version; thanks for the feedback.
Having an Anki deck is kind of useless in my view. When you encounter a new idea, you need to integrate it with the rest of your thoughts. For a technique, you must integrate it with your unconscious. But often, there's a tendency to just go "oh, that's useful" and do nothing with it. Putting it into spaced repetition software to view later won't accomplish anything, since you're basically memorising the teacher's password. Now suppose you take the idea, think about it for a bit, and maybe put it into your own words. Better both in terms of results and in terms of using Anki as you're supposed to.
But there are two issues here. One, you haven't integrated the Anki cards with the rest of your thoughts. Two, Anki is not designed such that the act of integrating is the natural thing to do. Just memorising is the path of least resistance, which a person with poor instrumental rationality will take. So the problem with using Anki for proper learning is that you are trying to teach instrumental rationality via a method that requires instrumental rationality. Note it's even worse for teaching good research and creative habits, which require yet more instrumental rationality. No, you need a system which fundamentally encourages those good habits. Incremental reading is a little better, if you already have good reading habits which you can use to bootstrap your way to other forms of instrumental rationality.
Now go to paragraph two of the original comment.
P.S. Just be thankful you didn't read the first draft.
Having an Anki deck is kind of useless in my view, as engaging with the ideas is not the path of least resistance. There's a tendency to just go "oh, that's useful" and do nothing with it, because Anki/SuperMemo are about memorisation. Using them for learning, or creating, is possible with the right mental habits. But for an irrational person, those habits are exactly what you want to instill! No, you need a system which fundamentally encourages those good habits.
Which is why I'm bearish about including cards that tell you to drill certain topics in Anki, since the act of drilling is itself a good mental habit that many lack. Something like a curated selection of problems that require a certain aspect of rationality, spaced out to aid retention, would be a good start.
Unfortunately, there's a trade-off between making the drills thorough and reducing overhead on the designer's part. If you're thinking about an empirically excellent, "no cut corners" implementation of teaching total newbs mental models, I'd suggest DARPA's Digital Tutor. As to how you'd replicate such a thing, the field of research described here seems a good place to start.
Active IRD doesn't have anything to do with corrigibility; I guess my mind just switched off when I was writing that. Anyway, how diverse are CHAI's views on corrigibility? Could you tell me who I should talk to? If I'm understanding you rightly, I've already read all the published stuff on it, and I want to make sure that all the perspectives on this topic are covered.
Hey Rohin, I'm writing a review of everything that's been written on corrigibility so far. Do "The Off-Switch Game", "Active Inverse Reward Design", "Should Robots Be Obedient" and "Incorrigibility in CIRL", as well as your reply in the Newsletter, represent CHAI's current views on the subject? If not, which papers contain them?
Before or after what? If it is a passage in a book, or an article you wrote, I agree that's enough. But what about a nebulous concept you struggled to put into words? Or an idea which seemed to have surprising links to other thoughts, which you didn't pursue at the time? If you write all this stuff down explicitly, then fine. If not, and your writing style is like mine, then it seems better to link to other cards and leave it to your future self to figure out.
Plus, links provide the system extra information with which it can auto-suggest other relevant ideas that you weren't even aware you were considering.
I started writing a blog post in response, but that seems a bit much for a comment. Suffice to say, I agree that anti-spaced repetition is a good idea. However, it throws away the context of the notes you made, and shows them to you after your mind has totally forgotten about them. And as I wrote, those seem to be major factors in the value of the Zettelkasten method!
Yeah, I had some ideas concerning how to keep track of Zettelkasten, as well as the right way to display graphs. Reinforcing the network is definitely a worthwhile idea. The entire point is to suggest good links, but also to give you the freedom to traverse your graph. Re the hyperlinks: I agree about the worry of biases. But more than that, it seems the network should not automate link suggestion without leaving the option to create links yourself. As you say, the worth of the Zettelkasten method is largely in instilling virtuous mental habits. What you suggested seems like it could instil laziness in the user.
I have a hypothesis about why Zettelkasten provide diminishing returns over time. A corollary is that others should find even less value in your zettels. This ties into some of your points, and shows what is missing from the Zibaldone. Plus there are some suggestions on how to correct the flaw.
One of the key benefits of the Zettelkasten is that the way you link cards reflects your psyche's understanding of the ideas. Of course, other note-taking systems have this advantage, but it isn't baked into them like it is with Zettelkasten.
Traversing the Zettelkasten lets you approximate your past state of mind when working on a problem, which lets you dredge up whatever your subconscious has come up with on the topic. The seemingly random organisation of yesterday's zettels helps this along a little by providing a glimpse into yesterday's self. So when you wake up in the morning and look over yesterday's cards, a flood of relevant thoughts arises. When considering where to place them, you glance through your Zettelkasten. Zettels your mind was working on bring new thoughts to the forefront. Often it will feel trivial to combine and play with all these new ideas, generating even more thoughts.
Unfortunately, past a certain size your Zettelkasten contains too many cards for your brain to process at once, and too many potential states of mind you could have been in. So we expect a gradual reduction in the value of the system, and less value to others, who have a different understanding of the ideas. Something similar is true for other note-taking systems; there's just a steeper decline in value.
How does this square with people's reports that switching between note-taking systems provides the same early returns they got with the old one? My guess is that the reduced scope of your notes and the context shift are what do it. Your brain realises it no longer has to keep track of all those ideas and can focus on a few relatively simple ones, which makes the low-hanging fruit all the easier to grab.
Can we fix these issues? Maybe, and I think you'd have to go digital to do it. Consider spaced repetition. A memory's strength decays exponentially with time; when reminded, the memory is strengthened. Spaced repetition takes advantage of the fact that there are optimal timings to strengthen the memory. By analogy, we might say there are optimal times to dredge up ideas from your mind. And likewise, there may be optimal timings to link zettels. Perhaps these timings depend on the "distance" between the zettels.
A digital system could provide all this. A useful format would be the zettel to consider, plus a graph around it of the zettels you should link it to. When moving to adjacent zettels, you can see all the zettels each links to, providing relevant context.
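The scheduling analogy can be made concrete. Under the standard exponential forgetting model there is a well-defined time at which to resurface an item, and a digital Zettelkasten could schedule linking prompts the same way. A minimal sketch, where the 90% retention target and the 2.5x stability growth per revisit are hypothetical parameters of my own, not from the original:

```python
import math

def next_gap(stability, target_retention=0.9):
    """Days until recall probability decays to the target,
    under the forgetting curve R(t) = exp(-t / stability)."""
    return -stability * math.log(target_retention)

# Each successful review (or each productive revisit of a zettel)
# multiplies stability; the 2.5x factor is an illustrative choice.
stability = 1.0
schedule = []
for _ in range(5):
    schedule.append(round(next_gap(stability), 2))
    stability *= 2.5

print(schedule)  # gaps grow geometrically, like SRS intervals
```

A zettel-linking scheduler could use the same curve, perhaps with the initial stability scaled by the graph "distance" between the two zettels.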
When I bother to vote, I do take TK into account when upvoting. Karma serves a signalling purpose. But only when abs(TK) is large. If I see a post with +50 karma, I would have quite high expectations of it. If it exceeds that expectation, and I remember voting is a thing, I will upvote it. Since I almost never downvote, I can't say how much TK affects that.
If there is a mistake deep in the belief of someone
Are they not ideal Bayesians? Also, do they update based on other people's priors? It could be interesting to make them all ultrafinitists.
Mimesis Land is confusing from the outside. I'm not sure how they could avoid stumbling upon "correct" forms of manipulating beliefs if they persist for long enough and there are large enough stochastic shocks to the community's beliefs. If they also copied successful people in the past, I feel like this would be even more likely. Unless they happen to be the equivalent of Chinese rooms: just an archive of if-else clauses.
Anyway, thank you for introducing this delightful style of thought experiments.
Dutch custom prevents me from recommending my own recent paper in any case
This phrase and its implications are perfect examples of problems in corrigibility. Was that intentional? If so, bravo. Your paper looks interesting, but I think I'll read the blog post first. I want a break from reading heavy papers. I wonder if the researchers would be OK with my drawing on their blog posts in the review. Would you mind?
Thanks for recommending "Reward tampering", it is much appreciated. I'll get on it after synthesising what I've read so far. Otherwise, I don't think I'll learn much.
Hey, thanks for writing all of that. My current goal is to do an up-to-date literature review on corrigibility, so that was a most helpful comment. I'll definitely look over your blog, since some of these papers are quite dense. Out of the papers you recommended, is there one that stands out? Bear in mind that I've read Stuart's and MIRI's papers already.
Based off what you've said in the comments, I'm guessing you'd say the various forms of corrigibility are natural abstractions. Would you say we can use the strategy you outline here to get "corrigibility by default"?
Regarding iterations, the common objection is that we're introducing optimisation pressure. So we should expect the usual alignment issues anyway. Under your theory, is this not an issue because of the sparsity of natural abstractions near human values?
This came out of the discussion you had with John Maxwell, right? Does he think this is a good presentation of his proposal?
How do we know that the unsupervised learner won't have learnt a large number of other embeddings closer to the proxy? If it has, then why should we expect human values to do well?
Some rough thoughts on the data type issue. Depending on what types the unsupervised learner provides the supervised learner, it may not be able to reach the proxy type, by virtue of issues with NN learning processes.
Recall that data types can be viewed as homotopic spaces, and constructions of types can be viewed as generating new spaces from the old, e.g. tangent spaces or path spaces. We can view a neural net as a type corresponding to a particular homotopic space. But getting neural nets to learn certain functions is hard. For example, consider learning a function which is 0 except on two subspaces A and B, taking different values on A and B, where A and B are shaped like interlocked rings. In other words, a non-linear classification problem. So plausibly, neural nets have trouble constructing certain types from others. Maybe this depends on architecture or learning algorithm, maybe not.
If the proxy and human values have very different types, it may be that the supervised learner can't get from one type to the other. Supposing the unsupervised learner presents it with types "reachable" from human values, the proxy which optimises performance on the data set is just unavailable to the system, even though it's relatively simple in comparison.
Because of this, checking which simple homotopies neural nets can move between would be useful. Depending on the results, we could use this as an argument that unsupervised NNs will never embed the human values type, because we've found it has some simple properties they won't be able to construct de novo. Unless we do something like feed the unsupervised learner human biases, or start with an EM and modify it.
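The interlocked-rings example can be made concrete. Here is a minimal numpy sketch, where the specific geometry, the 360-point sampling, and the use of a least-squares fit as the "best linear classifier" are all illustrative choices of mine: two Hopf-linked unit circles, each threading the other's disc, so no single hyperplane separates the classes and a linear fit caps out well below perfect accuracy.

```python
import numpy as np

n = 360
t = np.linspace(0, 2 * np.pi, n, endpoint=False)

# Ring A: unit circle in the xy-plane, centred at the origin.
ring_a = np.stack([np.cos(t), np.sin(t), np.zeros(n)], axis=1)
# Ring B: unit circle in the xz-plane, centred at (1, 0, 0) -- linked with A.
ring_b = np.stack([1 + np.cos(t), np.zeros(n), np.sin(t)], axis=1)

X = np.vstack([ring_a, ring_b])
X = np.hstack([X, np.ones((2 * n, 1))])        # affine (bias) column
y = np.concatenate([-np.ones(n), np.ones(n)])  # labels: A = -1, B = +1

# Best least-squares linear classifier on the raw coordinates.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
accuracy = np.mean(np.sign(X @ w) == y)
print(accuracy)  # ~0.66: far from 1.0, as the linking argument predicts
```

By symmetry the fit reduces to a threshold on the x-coordinate near x = 0.5, which each ring crosses, so roughly a third of each class is misclassified. A non-linear model (or a learned embedding) is needed to pull the rings apart.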
Sometimes the cluster in the map that a preference is pointing at involves another preference, which provides a natural resolution mechanism. What happens when there are two such preferences, I'm unsure; I suppose it depends on how your map changes. In which case, if you want to make purity coherent, I think you should start off with some "simple" map and various "simple" changes to the map. Making purity coherent relative to your actual map is both computationally hard and empathetically hard.
Side-note: It would be interesting to see which resolution mechanisms produce the most varied shifts in preferences for boundedly rational agents with complex utility functions.
Side-note^2: Stuart, I'm writing a review of all the work done on corrigibility. Would you mind if I asked you some questions on your contributions?
Second-order logic can also arithmetise sentences, and also has fixed points, so the usual proof of the first incompleteness theorem carries over. But there's an easier way to see this: there can't be any computable procedure to check whether a second-order sentence is valid, because if there were, we could check whether PA → φ is valid, and thereby decide Peano Arithmetic, and therefore the halting problem.
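The reduction in that last sentence can be spelled out as a sketch; here PA² stands for the second-order Peano axioms, whose categoricity is the key fact being used:

```latex
\begin{align*}
&\text{Assume second-order validity were decidable. Since } \mathrm{PA}^2
  \text{ is categorical (its only model is } \mathbb{N}\text{),}\\
&\qquad \mathbb{N} \models \varphi
  \;\Longleftrightarrow\; {\models}\ \mathrm{PA}^2 \to \varphi,\\
&\text{so deciding validity decides arithmetic truth. But for each Turing
  machine } M \text{ there is an}\\
&\text{arithmetic sentence } \mathrm{Halt}_M \text{ with }
  M \text{ halts} \;\Longleftrightarrow\; \mathbb{N} \models \mathrm{Halt}_M,\\
&\text{so the halting problem would be decidable. Contradiction.}
\end{align*}
```

Note this argument needs the full (standard) second-order semantics; under Henkin semantics, second-order logic behaves like a first-order theory and the categoricity step fails.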
You can use them for practicing techniques. Have cards which say: use technique X today. You need to actually do that, rather than spend one minute thinking about it, which is surprisingly hard. I suspect it works much better if you have some system to guide you in generating new ideas, e.g. Zettelkasten. It could be even better if the method were incorporated into the software itself. Maybe create links between cards as well, and have some repetitions where you explore the graph surrounding a card?
I'm also unsure whether the spaced repetition timings are optimal for drilling techniques. Does anyone know the relevant literature?