Yeah, so there are four options, (B1∧B2)∨(¬B1∧B2)∨(B1∧¬B2)∨(¬B1∧¬B2). These will have the ratios 0.99×0.9999 : 0.01×0.9999 : 0.99×0.0001 : 0.01×0.0001. By D4 we'd eliminate the first one. The remaining odds ratios normalize to something around 0:0.9901:0.0098:0.0001. I.e., given that the agent takes $5 instead of $10, it is pretty sure that it's taken the smaller one for some reason, gives a tiny probability to having miscalculated which of $5 and $10 is larger, and a really, really small probability to both being true.
In fact, were it to reason further, it would see that the fourth option is also impossible; we have an XOR-type situation on our hands. Then it would end up with odds around 0:0.9902:0.0098:0.
That last bit was assuming that it doesn't have uncertainty about its own reasoning capability.
Ideally it would also consider that D4 might be incorrect, and still assign some tiny ϵ of probability (10^-10 for example; the point is it should be pretty small) to both the first and fourth options, giving 10^-10:0.9902:0.0098:10^-10. It wouldn't really consider them for the purposes of making predictions, but to avoid logical explosions, we never assign a "true" zero.
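The arithmetic above can be sketched directly (a minimal illustration, using the priors and the ϵ floor from the discussion; nothing here is beyond the numbers already stated):

```python
# Joint odds for the four combinations of B1 and B2, using the priors
# from the discussion above: P(B1) = 0.99, P(B2) = 0.9999.
eps = 1e-10  # floor instead of a "true" zero
p_b1, p_b2 = 0.99, 0.9999

odds = [p_b1 * p_b2,              # B1 and B2     (ruled out by D4)
        (1 - p_b1) * p_b2,        # not-B1 and B2
        p_b1 * (1 - p_b2),        # B1 and not-B2
        (1 - p_b1) * (1 - p_b2)]  # neither       (ruled out by the XOR)

odds[0] = odds[3] = eps           # eliminate, but never to exactly zero
total = sum(odds)
posterior = [o / total for o in odds]
```

Rounding `posterior[1]` and `posterior[2]` to four places recovers the 0.9902:0.0098 odds above.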
We only ignore the proportion of that probability mass while thinking about the counterfactual world in which $5 is taken. It's just treated as we would ignore the probability mass previously assigned to anything we now know to be impossible.
I used "ignore" to emphasize that the agent is not updating its beliefs about B1 or B2 based on C1. It's just reasoning in a "sandboxed" counterfactual world where it now assigns ~99% probability to it taking the lower of $5 and $10 and ~1% chance to $5 being larger than $10. From within the C1 universe it looks like a standard (albeit very strong) Bayesian update.
When it stops considering C1, it "goes back to" having strong beliefs that both B1 and B2 are true.
I suspect a key piece which is missing from our definition of knowledge is a strong (enough) formalisation of the notion of computational work. In a lot of these cases the knowledge exists as a sort of crystallized computation.
The difference between a chemistry textbook and a rock is that building a predictive model of chemistry by reading a textbook requires much less computational effort than by studying the reactions in a rock.
A ship plotting a shoreline requires very little computational work to extract that map.
The computer with a camera requires very little work to extract the data as to how the room looked. Studying the computer case to get that information requires a lot of computational work.
I don't know how you'd check if a system was accumulating knowledge based on this, but perhaps doing a lot of computational work and storing the results as information might be evidence.
I think I've explained myself poorly; I meant to use the phrase social reward/punishment to refer exclusively to things like forming friendships and giving people status, which is doled out differently to "physical government punishment". Saddam Hussein was probably a bad example, as he is also someone who would clearly receive the latter.
Getting rid of guilt and shame as motivators of people is definitely admirable, but still leaves a moral/social question. Goodness or Badness of a person isn't just an internal concept for people to judge themselves by, it's also a handle for social reward or punishment to be doled out.
I wouldn't want to be friends with Saddam Hussein, or even a deadbeat parent who neglects the things they "should" do for their family. This also seems to be true regardless of whether my social punishment or reward has the ability to change these people's behaviour. But what about being friends with someone who has a billion dollars but refuses to give any of that to charity? What if they only have a million dollars? What if they have a reasonably comfortable life but not much spare income?
Clearly the current levels of social reward/punishment are off (billionaire philanthropy etc.) so there seems an obvious direction to push social norms in if possible. But this leaves the question of where the norms should end up.
A fairly vague idea for corrigible motivation which I've been toying with has been something along the lines of:
1: Have the AI model human behaviour
2: Have this model split the causal nodes governing human behaviour into three boxes: Values, Knowledge and Other Stuff. (With other stuff being things like random impulses which cause behaviour, revealed but not endorsed preferences etc.) This is the difficult bit, I think using tools like psychology/neurology/evolution we can get around the no free lunch theorems.
3: Have the model keep the values, improve on the knowledge, and throw out the other stuff.
4: Enforce a reflective consistency thing. I don't know exactly how this would work, but something along the lines of "re-running the algorithm with oversight from the output algorithm shouldn't lead to a different output algorithm". This is also possibly difficult: if something ends up in the "Values" box, it's not clear whether it might get stuck there, so local attractors of values are a problem.
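As a toy sketch of step 4's fixed-point idea (the `extract` function and the model dictionary are entirely hypothetical stand-ins for steps 1-3, not a proposal for how the real split would work):

```python
def extract(model, oversight=None):
    # Toy stand-in for steps 1-3: keep the "values"; under oversight,
    # also drop anything the model marks as disavowed.
    values = set(model["values"])
    if oversight is not None:
        values -= model.get("disavowed", set())
    return values

def reflective_fixed_point(model, max_iters=10):
    # Step 4: re-run extraction with oversight from its own output
    # until the output algorithm stops changing.
    output = extract(model)
    for _ in range(max_iters):
        next_output = extract(model, oversight=output)
        if next_output == output:
            return output
        output = next_output
    raise RuntimeError("no fixed point within the iteration limit")
```

In this toy version the iteration converges trivially; the worry in the text, that a value which sneaks into the "Values" box becomes a local attractor, corresponds to the fixed point depending on where the iteration starts.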
This is something like inverse reinforcement learning but with an enforced prior on humans not being fully rational or strategic. It also might require an architecture which is good at breaking down models into legible gears, which NNs often fail at unless we spend a lot of time studying the resulting NN.
Using a pointer to human values rather than human values itself suffers from issues of the AI resisting attempts to re-orient the pointer, which is what the self-consistency parts of this method are there for.
This approach was mostly borne out of considerations of the "The AI knows we will fight it and therefore knows we must have messed up its alignment but doesn't care because we messed up its alignment" situation. My hope is also that it can leverage the human-modelling parts of the AI to our advantage. Issues of modelling humans do fall prey to "mind-crime" though so we ought to be careful there too.
[APPRENTICE] Any AI alignment (or related) stuff. I have an average of maybe 5-10 hours of time per week to give to something over the next ~10 months while I finish a chemistry master's degree. I have decent experience programming (a few languages as part of my degree) and some in pure maths (BMO1/2, which are high-school-level national Olympiads in the UK).
I'd be interested in anything that would let me get involved with or get a feel for the field in the way just reading random papers and posting here doesn't.
There's a court at my university accommodation that people who aren't Fellows of the college aren't allowed on, it's a pretty medium-sized square of mown grass. One of my friends said she was "morally opposed" to this (on biodiversity grounds, if the space wasn't being used for people it should be used for nature).
And I couldn't help but think how tiring it would be to have a moral-feeling-detector this strong. How could one possibly cope with hearing about burglaries, or North Korea, or astronomical waste?
I've been aware of scope insensitivity for a long time now but, this just really put things in perspective in a visceral way for me.
What do you think about the ability to predict age to surprising accuracy using ~350 DNA methylation sites? Sadly I can't work out if the author has considered looking at the sites to see what the methylation is changing the transcription of, other than the PGCT genes, which based on a cursory search seem to be linked by being targets of a specific process rather than by doing a specific thing. Again this makes it unclear whether this is upstream or downstream of ageing.
Mitochondrial mutation accumulation seems to be a big thing; mitochondrial dysfunction is implicated in Alzheimer's, and might be linked to a bunch of signalling around epoxyeicosatrienoic acids. This is confused by the fact that EETs might induce mitogenesis but are also anti-inflammatory and regulate the vascular system (?!), because biochemistry is just like this sometimes.
Oddly enough, mitophagy also seems to be a potential target of anti-ageing drugs. Possibly the (selective) turnover of mitochondria can be used to remove the most dysfunctional ones? Perhaps the two processes might be coupled in such a way that speeding up one speeds up the other. Also some people have suggested combining these before but then as far as I can tell just didn't bother to check if it actually worked (!?!?)
Whether mitochondrial mutations are upstream or downstream of other things is unclear. I think Nick Lane has suggested a mechanism by which mitochondrial mutations could actually accumulate faster than by chance (definitely in "The Vital Question" but possibly elsewhere) but I don't know if it has been tested.
(Posting as a top-level comment since I have a few points to say but the stuff about DNA methylation is sort of in response to comments below)
Edit: It appears I have basically just said what Gwern said but with graphs.
I can't work out what they're doing with their statistics to get the numbers they claim, because they don't show their working. I have however seen a very similar article which used the same data. In that case I believe the error came from assuming a normal distribution of intelligence in the overall population and in the faculty population.
Here are a couple of graphs I have thrown together to demonstrate what I suspect has happened:
The first graph shows the distributions of the overall population (mean 100, s.d. 15) and faculty population (mean 126, s.d. 6.3) and the ratio between them (divided by ~60 so that the maximum ratio is 1 and it fits on the same graph). The ratio itself seems to peak around 133 like the article stated. So what's going on?
If we scale the two distributions so they line up between 130 and 150, we see that above around 133, the faculty distribution falls off faster. This is because of the smaller standard deviation. Normal distributions have the probability density function

f(x) = 1/(σ√(2π)) × exp(−(x−μ)²/(2σ²))

for standard deviation σ and mean μ.
At values of x far from the mean, the −(x−μ)²/(2σ²) term inside the exponential dominates the rest, and so the 1/σ² factor controls how quickly the density falls, which is why the distribution of faculty drops off faster than the general population.
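A quick numerical check of this, using the means and standard deviations given above (numpy only):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # Normal probability density function.
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

iq = np.arange(100, 160, 0.01)
population = normal_pdf(iq, 100, 15)   # general population
faculty = normal_pdf(iq, 126, 6.3)     # assumed-normal faculty
ratio = faculty / population

peak = iq[np.argmax(ratio)]            # IQ at which the ratio is largest
```

On this grid the ratio peaks in the low 130s and then falls, confirming that the assumed-normal faculty distribution drops off faster in the upper tail.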
The mistake made in this case is assuming a normal distribution of faculty staff. Imagine a ratio function which looks like the one on the graph above, but which above 133 remains constant rather than decreasing. (This should match our priors on how university faculty hiring works.) The distribution of faculty staff would look the same up to an IQ of 133, but would then drop off slightly more slowly. This would be undetectable except by surveying an enormous number of faculty members. Normal distributions are a decent prior, but they are not the only distributions which exist, especially when the distribution is created by filtering.
I don't know if this is what the authors of this particular article have done, because they won't tell us, but I suspect this is the sort of error they have made.
I love the depth you're going into with this sequence, and I am very keen to read more about this. I wonder if the word "knowledge" is not ideal. It seems like the examples you've given, while all clearly "knowledge", could correspond to different things. Possibly the human-understandable concept of "knowledge" is tied up with lots of agent-y, optimizer-y things which make it more difficult to describe in a human-comfortable way on the level of physics (or maybe it's totally possible and you're going to prove me dead-wrong in the next few posts!).
My other thought is that knowledge is stable to small perturbations (equivalently: small amounts of uncertainty) of the initial knowledge-accumulating region: a rock on the moon moved a couple of atoms to the left would not get the same mutual information with the history of humanity, but a ship moved a couple of atoms to the left would make the same map of the coastline.
This brings to mind the idea of abstractions as things which are not "wiped out" by noise or uncertainty between a system and an observer. Lots of examples I can think of as knowledge seem to be representations of abstractions but so do some counterexamples (it's possible - minus quantumness - to have knowledge about the position of an atom at a certain time).
Other systems which are stable to small perturbations of the starting configuration are optimizers. I have written about optimizers previously using an information-theoretic point of view (though before realizing I only have a slippery grasp on the concept of knowledge). Is a knowledge-accumulating algorithm simply a special class of optimization algorithm? Backpropagation definitely seems to be both, so there's probably significant overlap, but maybe there are some counter examples I haven't thought of yet.
This sounds very interesting and I'd be very excited to hear the results of your work. I have a lot of random disorganized thoughts on the matter which I'll lay out here in case some of them are valuable.
I wonder if, for ordinary neural networks, something like bottlenecking the network at one or more of the layers would force abstraction.
This makes me think of autoencoders trying to compress information, which leads to an interesting question: is there a general way to "translate" between autoencoders trained on the same dataset? By this I mean having a simple function (like an individual matrix) between the first half of one autoencoder and the second half of another. If there is this would give evidence that they are using the same abstractions.
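As a toy version of this test, here is the linear case: two random linear "encoders" of the same low-dimensional data can be translated into each other exactly by a single least-squares matrix. (Real trained autoencoders are nonlinear, so this is only suggestive; all the data here is synthetic.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Data lying in a 3-dimensional latent subspace of an 8-dimensional space.
latent = rng.normal(size=(500, 3))
X = latent @ rng.normal(size=(3, 8))

# Two different random linear "encoders" of the same data, standing in
# for the encoder halves of two separately trained autoencoders.
Z1 = X @ rng.normal(size=(8, 3))
Z2 = X @ rng.normal(size=(8, 3))

# Fit one translation matrix M minimizing ||Z1 @ M - Z2||.
M, *_ = np.linalg.lstsq(Z1, Z2, rcond=None)
residual = np.abs(Z1 @ M - Z2).max()
```

Because both encodings are invertible linear images of the same 3-dimensional latent, the residual is essentially zero; for nonlinear autoencoders, a small residual under a simple translation map would be evidence that they share abstractions.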
This also reminds me of that essay about the tails coming apart, which suggests to me that the abstractions a system will use will depend on the dataset, the outcome being predicted, and also perhaps the size and capabilities of the model (a bigger model might make more accurate predictions by splitting "grip strength" and "arm strength" apart but a smaller model might have to combine them). This seems to be related to the dimensionality points you've mentioned, where the specific abstractions used depend on the number of abstractions a model is allowed to use.
This makes me think of Principal Component Analysis in statistics, which has a similar vibe to the natural abstraction hypothesis in that it involves compressing statistical information onto a smaller number of dimensions, exactly how many dimensions depends on the statistical methods you are using.
In the real world, the classic examples of abstractions are stuff like "temperature of a gas", which involves throwing away something like 180 bits of information per individual gas molecule (if my memory of statistical mechanics is correct), while still letting you predict internal energy, pressure, and how it will flow. Abstractions for other systems are unlikely to be as clear-cut: we can probably not compress 10^25-ish bits of information into a small number of bits, for the average system. For example, I think that about 8 characteristics of different plant species (seed size, height, leaf thickness etc.) can be compressed onto two dimensions which contain about 80% of the variation in the data, but it's not immediately clear why we ought to stop there, or indeed use a second dimension when one would presumably contain >40% of the variation.
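A small synthetic version of the plant-traits example, assuming hypothetical data in which 8 traits are driven by 2 underlying factors plus measurement noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plant data: 8 measured traits driven by 2 underlying
# factors, plus independent measurement noise.
factors = rng.normal(size=(500, 2))
traits = factors @ rng.normal(size=(2, 8)) + 0.3 * rng.normal(size=(500, 8))

# PCA via the eigenvalues of the trait covariance matrix.
cov = np.cov(traits, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
explained = eigvals / eigvals.sum()   # fraction of variance per component
```

Here the first two components carry most of the variance by construction, but as with the real data there is no sharp rule telling you to stop at two rather than one or three.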
Finally I suspect that the name "abstraction thermometer" is underselling the capabilities of what you describe. Finding all the abstractions of any given system is incredibly powerful. For example one set of abstractions which would predict the progression of a disease would be the set of the pathogens, proteins, and small molecules which can cause that disease. If the natural abstraction hypothesis is true (and in cases like this it would seem to be) then an "abstraction thermometer" is in this case able to find out everything about the biological system in question, and would therefore give us an incredible amount of knowledge.
I more meant "keeping around cognitive machinery which is capable of this" without making use of it. Given that wild wolves use (relatively) simple hunting strategies which do not seem to rely on much communication, there doesn't seem to be much need to have a brain capable of communicating relatively abstract thoughts. That doesn't seem to affect your core argument though.
Good point about autistic humans who can't learn sign language though, I hadn't considered that. I guess my model of autism was more like:
"Autism affects the brain in lots of different ways which is able to knock out specific abilities (like speech) without knocking out other abilities (like the capability to have and communicate complex thoughts, which would not have evolved in an animal without speech)"
rather than each ability drawing on some amount of general-purpose computing. I haven't studied autism enough to know if this is correct.
Perhaps I am being too confident in it. I didn't have time to cite sources but the biology of AD seems to be a microcosm of the biology of ageing overall, and EET-A has shown a bunch of random unconnected benefits in mouse models (regenerating blood vessels after a heart attack etc.).
I do not know how I would obtain it (one would probably need free access to a chemical lab to synthesize it; just looking at EET and other analogues, they seem relatively synthesizable). As for dosing, I would dose at comparable ppm levels to the rodent models.
I did 3 separate parts partially because I thought they seemed rather unconnected, and mostly because I was concerned about posting a very long and cumbersome post. Now that I look at it, it didn't really need to be three parts at all, it just felt a lot longer when I was writing it.
You may be right there, and I would certainly be pleased to hear of any projects like this.
I believe the model could work without it, but AD seems to be an attractor state that many human brains fall into, with various genetic associations. The main evidence for it is that mutations in the Aβ precursor protein can have very high penetrance, i.e. everyone who has the mutation develops early-onset AD (https://link.springer.com/content/pdf/10.1007/s11920-000-0061-z.pdf). You are definitely right that I was too specific in my assessment of exactly how Aβ plaques cause a feedback mechanism, thanks for catching that. I have amended the post to fix that.
Lastly, what do you mean specifically by prion-like? Amyloid fibrils are a prion-like structure in the sense that the growth of existing fibres is much, much more favourable than the formation of new ones (this leads to exponential growth, as long fibres break apart, leaving new open ends for new protein molecules to add to). However, Aβ plaque formation was reversed in the mice given EET-A, which means that at some physiologically achievable concentrations of free Aβ, the amyloids dissipate due to un-misfolding of Aβ (at least in mouse models). This would suggest that the cause of AD is various factors pushing the brain over a threshold where Aβ can accumulate (which could be metabolic, or mutations which make Aβ more likely to accumulate). This is in contrast to "classical" prions, where the original misfolded protein is able to continuously cause the misfolding of normal protein at normal physiological conditions, and the only barrier to a prion disease occurring is that no misfolded protein is present.
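The threshold behaviour described here can be illustrated with a toy fragmentation-elongation model (all rate constants are made up for illustration; this is not fitted to any real Aβ kinetics):

```python
def simulate(monomer, steps=2000, dt=0.01,
             k_elong=1.0, k_off=0.5, k_frag=0.1):
    # Fibril mass grows by elongation at free ends and shrinks by
    # depolymerisation; fragmentation of long fibres creates new ends,
    # which is what drives the exponential growth.
    mass, ends = 1.0, 0.1
    for _ in range(steps):
        dmass = (k_elong * monomer - k_off) * ends
        dends = k_frag * mass
        mass = max(mass + dmass * dt, 0.0)  # fibril mass can't go negative
        ends += dends * dt
    return mass
```

With these (hypothetical) rates the critical free-monomer level is k_off/k_elong = 0.5: held above it, fibril mass grows exponentially; held below it, existing fibrils dissipate, matching the picture of factors pushing the brain over an accumulation threshold.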
The paper which you sent also postulates a feedback loop between Aβ and Tau which is interesting. I had considered the Aβ feedback into the earlier mechanism as an afterthought but perhaps it is more important than my model suggested.
If this turns out to be basically true, then what about wild wolves? I think there is a strong case that the capacity for this sort of communication was bred into domestic dogs as a result of humans selecting for e.g. better overall intelligence and ability to understand human commands.
Another option is that wild wolf packs have the capacity for this sort of communication but don't use it (unless we've simply not noticed it), and this seems much less likely to me, for the sole reason that being able to communicate in this way would give a very large advantage to wild wolves. It would be odd if they kept the cognitive machinery for this around (using up resources that could otherwise go to the rest of their bodies) without making use of it.
There is a final option that developing a language is like discovering a technology, and once a language exists it is much easier to teach it to others than it originally was to develop the language. This would be very interesting to investigate, perhaps languages are like a sort of software on the brain, which are able to convert various processes (association learning, pattern recognition, episodic memory) into something more structured which allows for easier reasoning. This is getting very Sapir-Whorf hypothesis-ey and as someone who is not a linguist or anthropologist I can't really say if this is even reasonable or not.
As an aside, the second option reminds me of the experiments to try and teach chimpanzees to use human sign language (which were considered at the time to be a great success but were less than stellar). Chimpanzees in the wild have a very rudimentary form of sign language but have not developed it into something like a human language, despite the potential advantages (either in social conflicts or in hunting/gathering food etc.). This to me suggests that chimpanzees probably don't have the capacity for more complex sign languages than they already have.
Good point, perhaps my view is skewed as I do almost all of my learning and explaining in technical fields (mostly chemistry and biology) and with people who are on a similar knowledge level to me. I can imagine that in a situation of trust but little knowledge (e.g. I am explaining my work to a family member) or in a different field to mine they would be more useful.
I think my assessment here may have been too focussed on a specific subset of analogy use, which I did not properly specify in the post.
Edit to clarify: I still believe intuition pumps in philosophy are a bad sort of analogy, in that they are too easily manipulated to serve the philosophical interests of the speaker.
Red or green weapons (i.e. swords, longswords, battleaxes; not axes or hammers though) seem to have a mana scaling dependent on their +n modifier (although green weapons have a drop-off at higher modifiers). It appears to be a clear enough pattern that it's not a statistical artefact. I've not found anything else about the tools or jewellery though.