Comments
A paper I'm doing mech interp on used a random split when the dataset they used already has a non-random canonical split. They also validated with their test data (the dataset has a three way split) and used the original BERT architecture (sinusoidal embeddings which are added to feedforward, post-norming, no MuP) in a paper that came out in 2024. Training batch size is so small it can be 4xed and still fit on my 16GB GPU. People trying to get into ML from the science end have got no idea what they're doing. It was published in Bioinformatics.
"sellers auction several very similar lots in quick succession and then never auction again"
This is also extremely common in biochem datasets. You'll get results in groups of very similar molecules, and families of very similar protein structures. If you do a random train/test split your model will look very good but actually just be picking up on coarse features.
I think the LessWrong community and particularly the LessWrong elites are probably too skilled for these games. We need a harder game. After checking the diplomatic channel as a civilian I was pretty convinced that there were going to be no nukes fired, and I ignored the rest of the game based on that. I also think the answer "don't nuke them" is too deeply-engrained in our collective psyche for a literal Petrov Day ritual to work like this. It's fun as a practice of ritually-not-destroying-the-world though.
Isn't Les Mis set in the second French Revolution (1815 according to wikipedia) not the one that led to the Reign of Terror (which was in the 1790s)?
I have an old hypothesis about this which I might finally get to see tested. The idea is that the feedforward networks of a transformer create little attractor basins. My reasoning is twofold. First, the QK circuit only passes very limited information to the OV circuit about what information is present in other streams, which introduces noise into the residual stream during attention layers. Second, the basins might also arise from inferring concepts from limited information:
Consider that the prompts "The German physicist with the wacky hair is called" and "General relativity was first laid out by" will both lead to "Albert Einstein". Both of them will likely land in different parts of the same attractor basin, which then converge.
You can measure which parts of the network are doing the compression using differential optimization, in which we take d[OUTPUT]/d[INPUT] as normal, and compare to d[OUTPUT]/d[INPUT] when the activations of part of the network are "frozen". Moving from one region to another you'd see a positive value while in one basin, a large negative value at the border, and then another positive value in the next region.
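As a rough sketch of that measurement (a toy residual MLP standing in for a transformer block; "freezing" by detaching is just one way to treat a component's activations as constants when differentiating):

```python
# A toy sketch (not the original comment's setup): compare d(output)/d(input)
# with and without "freezing" one residual block's contribution. Freezing is
# done by detaching the block's output, so its activations are treated as
# constants when differentiating.
import torch
import torch.nn as nn

torch.manual_seed(0)
block1 = nn.Sequential(nn.Linear(8, 32), nn.ReLU())
block2 = nn.Sequential(nn.Linear(32, 32), nn.ReLU())   # the part we "freeze"
head = nn.Linear(32, 1)

x = torch.randn(1, 8, requires_grad=True)

def output(x, freeze_block2=False):
    h1 = block1(x)
    branch = block2(h1)
    if freeze_block2:
        branch = branch.detach()   # gradient no longer flows through block2
    return head(h1 + branch)       # residual connection keeps another path open

grad_normal = torch.autograd.grad(output(x).sum(), x)[0]
grad_frozen = torch.autograd.grad(output(x, freeze_block2=True).sum(), x)[0]

# A large difference suggests block2 dominates the local input->output map
# at this point in input space (e.g. it is doing the "compression" into a basin).
print((grad_normal - grad_frozen).norm())
```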
Yeah, I agree we need improvement. I don't know how many people it's important to reach, but I am willing to believe you that this will hit maybe 10%. I expect the 10% to be people with above-average impact on the future, but I don't know what %age of people is enough.
90% is an extremely ambitious goal. I would be surprised if 90% of the population can be reliably convinced by logical arguments in general.
I've posted it there. Had to use a linkpost because I didn't have an existing account there and you can't crosspost without 100 karma (presumably to prevent spam) and you can't funge LW karma for EAF karma.
Only after seeing the headline success vs test-time-compute figure did I bother to check it against my best estimates of how this sort of thing should scale. If we assume:
- A set of questions of increasing difficulty (in this case 100), such that:
- The probability of getting question $i$ correct on a given "run" is an s-curve like $P_i = \frac{1}{1 + e^{k(i - i_0)}}$ for constants $k$ and $i_0$
- The model does $N$ "runs"
- If any are correct, the model finds the correct answer 100% of the time
- $N = 1$ gives a score of 20/100
Then, depending on $k$ ($i_0$ is uniquely defined by $k$ in this case), we get the following chance-of-success vs question-difficulty-rank curves:
Higher values of $k$ make it look like a sharper "cutoff", i.e. more questions are correct ~100% of the time, but more are wrong ~100% of the time. Lower values of $k$ make the curve less sharp, so the easier questions are gotten wrong more often, and the harder questions are gotten right more often.
Which gives the following best-of-$N$ sample curves, which are roughly linear in $\log N$ in the region between 20/100 and 80/100. The smaller the value of $k$, the steeper the curve.
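Here is a quick sketch of the toy model ($k$ and $i_0$ are my reconstructed names for the two constants, and $i_0$ is solved numerically so that $N = 1$ scores 20/100):

```python
# Toy model sketch: 100 questions of increasing difficulty, per-run success
# probability is a logistic curve in difficulty rank, and best-of-N succeeds
# if any of the N runs is correct. k and i0 are assumed parameter names.
import numpy as np
from scipy.optimize import brentq

N_QUESTIONS = 100
ranks = np.arange(1, N_QUESTIONS + 1)

def per_run_prob(i0, k):
    return 1.0 / (1.0 + np.exp(k * (ranks - i0)))

def expected_score(N, i0, k):
    # P(at least one of N independent runs is correct), summed over questions
    return np.sum(1.0 - (1.0 - per_run_prob(i0, k)) ** N)

def calibrated_i0(k, target=20.0):
    # choose i0 so that a single run scores 20/100
    return brentq(lambda i0: expected_score(1, i0, k) - target, -200, 300)

for k in [0.05, 0.2, 1.0]:
    i0 = calibrated_i0(k)
    scores = [expected_score(N, i0, k) for N in [1, 10, 100, 1000, 10000]]
    print(f"k={k}: " + ", ".join(f"{s:5.1f}" for s in scores))
```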
Since the headline figure spans around 2 orders of magnitude of compute, the model appears to be performing on AIME similarly to a best-of-$N$ sampler for an appropriately chosen value of $k$.
If we allow the model to split the task up into subtasks (assuming this creates no overhead and each subtask's solution can be verified independently and accurately), then we get a steeper gradient, roughly proportional to the number of subtasks, and a small amount of curvature.
Of course this is unreasonable, since this requires correctly identifying the shortest path to success with independently-verifiable subtasks. In reality, we would expect the model to use extra compute on dead-end subtasks (for example, when doing a mathematical proof, verifying a correct statement which doesn't actually get you closer to the answer, or when coding, correctly writing a function which is not actually useful for the final product) so performance scaling from breaking up a task will almost certainly be a bit worse than this.
Whether or not the model is literally doing best-of-N sampling at inference time (probably it's doing something at least a bit more complex) it seems like it scales similarly to best-of-N under these conditions.
"Overall it looked a lot like other arguments, so that's a bit of a blow to the model where e.g. we can communicate somewhat adequately, 'arguments' are more compelling than random noise, and this can be recognized by the public."
Did you just ask people "how compelling did you find this argument"? Because this is a pretty good argument that AI will contribute to music production. I would rate it as highly compelling, just not a compelling argument for X-risk.
I was surprised by the "expert opinion" case causing people to lower their P(doom), then I saw the argument itself suggests to people that experts have a P(doom) of around 5%. If most people give a number > 5% (as in the open response and slider cases) then of course they're going to update downwards on average!
I would be interested to see what effect a specific expert's opinion (e.g. Geoffrey Hinton, Yoshua Bengio, Elon Musk, or Yann LeCun as a negative control) would have, given that those individuals have more extreme P(doom)s.
My update on the choice of measurement is that "convincingness" is effectively meaningless.
I think the values of update probability are likely to be meaningful. The top two arguments are very similar, as they both play off of humans misusing AI (which I also find to be the most compelling argument when talking to individuals). Then there is a cluster of arguments about how powerful AI is or could be, and how it could compete with people.
"Also, to engage in a little bit of mind-reading, Zuckerberg sees no enemies in China, only in OpenAI et al. through the "safety" regulation they can lobby the US government to enact."
This is a reasonable position, apart from the fact that it is at odds with the situation on the ground. OpenAI are not lobbying the government in favour of SB 1047, nor are Anthropic or Google (afaik). It's possible that in future they might, but other than Anthropic I think this is very unlikely.
For me, the idea of large AI companies using X-risk fears to shut down competition falls into the same category as the idea that large AI companies are using X-risk fears to hype their products. I think they are both interesting schemes that AI companies might be using in worlds that are not this one.
This is a stronger argument than I first thought it was. You're right, and I think I have underestimated the utility of genuine ownership of tools like fine-tunes in general. I would imagine it goes api < cloud hosting < local hosting in terms of this stuff, and with regular local backups (what's a few TB here or there) then it would be feasible to protect your cloud hosted 405B system from most takedowns as long as you can find a new cloud provider. I'm under the impression that the vast majority of 405B users will be using cloud providers, is that correct?
Epigenetic cancers are super interesting, thanks for adding this! I vaguely remember hearing that there were some incredibly promising treatments for them, though I've not heard anything for the past five or ten years on that. Importantly for this post, they also fill out the (rare!) examples of mutation-free cancers that we've seen, while fitting comfortably within the DNA paradigm.
If everyone affirms this is indeed all the major arguments for open weights, then I can at some point soon produce a polished full version as a post and refer back to it, and consider the matter closed until someone comes up with new arguments.
Feels like the vast majority of the benefits Zuck touted could be achieved with:
1. A cheap, permissive API that allows finetuning and some other things. If Meta really want people to be able to do things cheaply, presumably they can offer it far, far cheaper than almost anyone could do it themselves, without directly losing money.
2. A few partnerships with research groups to study it, since there aren't many groups for whom research on a 405B model is the optimal use of their resources and who don't already have their own.
3. A basic pledge (that is actually followed) to not delete everyone's data, finetunes, etc., to deal with concerns about "ownership".
I assume there are other (sometimes NSFW) benefits he doesn't want to mention, because the reason the above options don't allow those activities is that Meta loses reputation from being associated with them even if they're not actually harmful.
Are there actually a hundred groups who might usefully study a 405B-parameter model, so Meta couldn't efficiently partner with all of them? Maybe with GPUs getting cheaper there will be a few projects on it in the next MATS stream? I kinda suspect that the research groups who get the most out of it will actually be the interpretability/alignment teams at Google and Anthropic, since they have the resources to run big experiments on Llama to compare to Gemini/Claude!
If you're willing to take my rude and unfiltered response (and not complain about it) here it is:
This is very fucking stupid.
Otherwise (written in about half an hour):
- Fungal infections would lead to the vast majority of cancers being in skin, gut, and lung, i.e. exposed tissue. These are relatively common, but this does not explain the high prevalence of breast and prostate cancers. It also doesn't explain why different cancers have such different prognoses, etc.
- Why do different cancer subtypes change in prevalence over the course of a person's life if they're tied to infection? https://www.cancerresearchuk.org/health-professional/cancer-statistics/incidence/age#heading-One
- Around half of cancers have a mutation in p53, which is involved in preserving the genome. Elephants have multiple copies of p53 and very rarely get cancer. People with de novo mutations in p53 get loads of cancer. The random spread of DNA damage is downstream of the DNA damage causing cancer: once p53 is deactivated (or the genome is otherwise unguarded), mutations can accumulate all over the genome, drowning out the causal ones. https://en.wikipedia.org/wiki/P53
- If it were infection-based, you'd expect immunocompromised patients to get more of the common types of cancer. Instead they get super weird exotic cancers not found in people with normal immune systems. https://www.hopkinsmedicine.org/health/conditions-and-diseases/hiv-and-aids/aidsrelated-malignancies
- Chemotherapy does work. I don't know what to say on this one: chemotherapy works. Are all the RCTs which show it works supposed to be fake? Do I need to cite them?
  https://pubmed.ncbi.nlm.nih.gov/30629708/
  https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(23)00285-4/fulltext
  https://www.redjournal.org/article/S0360-3016(07)00996-0/fulltext
  I feel like a post which uncritically repeats someone's recommendation not to take chemotherapy has the potential to harm readers. You should at least add an epistemic status warning readers they might become stupider by reading this.
- Antifungals are relatively easy to get ahold of. Why hasn't this man managed to run a single successful trial? Moreover, cryptococcal meningitis is a fungal disease which is fatal if untreated and, from the CDC: "Each year, an estimated 152,000 cases of cryptococcal meningitis occur among people living with HIV worldwide. Among those cases, an estimated 112,000 deaths occur, the majority of which occur in sub-Saharan Africa." Which implies 40,000 people are successfully treated with strong antifungals every single year. These are HIV patients, who are more likely to get cancer and who under this theory would be more likely than anyone else to have fungal-induced cancer. How come nobody has pointed out the miraculous curing of hundreds or thousands of these patients by now?
- Scientific consensus is an extremely powerful tool. https://slatestarcodex.com/2017/04/17/learning-to-love-scientific-consensus/
I think the fungal theory is basically completely wrong. Perhaps some obscure couple of percent of cancers are caused by fungi. I cannot disprove this, though I think it's very unlikely.
Ooh boy this is a fun question:
For temperature reasons, a complete Dyson sphere is likely to be built outside Earth's orbit, as the energy output of the sun would force one at 1 AU to an equilibrium temperature of about 393 K ≈ 120 °C. I assume the AI would prefer not to run all of its components this hot. A sphere like that would cook us like an oven unless the heat-dissipating systems somehow don't radiate any energy back inwards (which is probably impossible).
A Dyson swarm might well be built at a mixture of distances inside and outside Earth's orbit. In that case the best candidate is to disassemble Mercury, using solar energy to power electrolysis to turn the crust into metals, send up satellites to catch more sunlight, and focus that back down to the surface.
Mercury orbits at 60 million km from the sun. This means a circumference of 360 million km. The sun is 1.2 million km across, but because the belt is at 0.38 AU from the sun, a band which blocks out the sun for the earth entirely would only need to be 0.8 million km wide. This gives a total surface area of 290e12 square kilometres to block out the sun entirely. Something like a Dyson belt.
If the belt is 1 m thick on average, this gives it a total volume of 290e18 cubic metres. Mercury has a volume of 60 billion cubic km = 60e18 cubic metres, so using all of Mercury would blot out approximately 1/5 of the sun's radiation.
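Spelling out the arithmetic with the same rounded figures (a sanity-check sketch, not new numbers):

```python
# Rough numbers from the estimate above, spelled out. All figures are the same
# rounded estimates used in the text, not precise astronomical values.
belt_radius_km   = 60e6    # Mercury's orbital radius (~0.39 AU)
circumference_km = 2 * 3.14159 * belt_radius_km   # ~3.8e8 km (text rounds to 3.6e8)
band_width_km    = 0.8e6   # width needed at ~0.38 AU to fully occlude the Sun from Earth

belt_area_km2  = circumference_km * band_width_km   # ~3e14 km^2
belt_volume_m3 = belt_area_km2 * 1e6 * 1.0          # 1 m thick; km^2 -> m^2

mercury_volume_m3 = 60e9 * 1e9                      # 60 billion km^3 in m^3

fraction_blocked = mercury_volume_m3 / belt_volume_m3   # ~0.2, i.e. ~1/5 of sunlight
print(f"belt volume ~{belt_volume_m3:.1e} m^3, fraction blocked ~{fraction_blocked:.2f}")
```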
To put things in perspective, Mars is kinda maybe almost habitable with a lot of effort, and it gets less than 1/2 of the sunlight that Earth does. I would make a wild guess that with 80% of our current solar radiation we could scrape by, with immense casualties due to massive decreases in agricultural yield. Temperature is somewhat tractable due to our ability to pump a bunch of sulfur hexafluoride into the atmosphere to heat things up.
As a caveat, I would suggest that if the AI is "nice" enough to spare Earth, it's likely to be nice enough to beam some reconstituted sunlight over to us. A priori I would say the niceness window of "unwilling to murder us while we're on Earth posing a direct threat, but also unwilling to suffer the trivial cost of keeping the lights on" is extremely narrow.
One easy way to decompose the OV map would be to generate two SAEs for the residual stream before and after the attention layer, and then just generate a matrix of maps between SAE features by the multiplication:
$C_{ij} = d^{(2)}_j \cdot \left(W_{OV}\, d^{(1)}_i\right)$, to get the value of the connection between feature $i$ in SAE 1 and feature $j$ in SAE 2 (where $d^{(1)}_i$ and $d^{(2)}_j$ are the corresponding SAE feature directions).
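A sketch of what that matrix looks like in code, with made-up shapes; whether you take the second SAE's encoder or decoder directions here is a free design choice:

```python
# Sketch (shapes and variable names are illustrative, not from any real model):
# connection strength between feature i of the pre-attention SAE and feature j
# of the post-attention SAE, routed through the OV map of one head.
import torch

d_model, n_feat1, n_feat2 = 512, 4096, 4096

W_dec1 = torch.randn(n_feat1, d_model)   # decoder directions of SAE 1 (pre-attention)
W_enc2 = torch.randn(d_model, n_feat2)   # encoder directions of SAE 2 (post-attention)
W_V = torch.randn(d_model, 64)           # value projection of one head
W_O = torch.randn(64, d_model)           # output projection of one head
W_OV = W_V @ W_O                         # (d_model, d_model) low-rank OV map

# C[i, j]: how strongly writing feature i into the residual stream (via SAE 1's
# decoder) activates feature j's encoder direction after passing through OV.
C = W_dec1 @ W_OV @ W_enc2               # (n_feat1, n_feat2)
```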
Similarly, you could look at the features in SAE 1 and check how they attend to one another using this system. When working with transcoders in attention-free resnets, I've been able to totally decompose the model into a stack of transcoders, then throw away the original model.
Seems we are on the cusp of being able to totally decompose an entire transformer into sparse features and linear maps between them. This is incredibly impressive work.
We might also expect these circuits to take into account relative position rather than absolute position, especially using sinusoidal rather than learned positional encodings.
An interesting approach would be to encode the key and query values in a way that deliberately removes positional dependence (for example, run the base model twice with randomly offset positional encodings, train the key/query value to approximate one encoding from the other) then incorporate a relative positional dependence into the learned large QK pair dictionary.
This applies doubly if you're in a high-leverage position, which could mean a position of "power" or just near to an ambivalent "powerful" person. If your boss is vaguely thinking of buying a LLM subscription for their team, a quick "By the way, OpenAI isn't a great company, maybe we should consider [XYZ] instead..." is a good idea.
This should also go through a cost-benefit analysis, but I think it's more likely to pass than for the typical individual user.
I've found that too. Taking the raw loss or its logarithm both seem reasonable to me, but it feels weird to take the log of a cross-entropy loss, since that's already log-ish. In my case the plots were generally worse to look at than the ones I showed above when scanning over a very broad range of coefficients (and therefore sparsity levels).
Is there a solution to avoid constraining the norms of the columns of $W_{dec}$ to be 1? Anthropic report better results when letting them be unconstrained. I've tried not constraining them and allowing them to vary, which actually gives a slight speedup in performance. This also allows me to avoid an awkward backward hook. Perhaps most of the shrinking effect gets absorbed by the unconstrained decoder norms?
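For concreteness, a minimal sketch of the toggle I have in mind, renormalizing the decoder columns after each optimizer step when the constraint is on (the names and the renormalize-after-step approach are mine, not necessarily Anthropic's exact setup):

```python
# Minimal sparse autoencoder sketch where the decoder-column unit-norm
# constraint can be toggled off. Renormalizing after each optimizer step is
# one simple way to impose the constraint without gradient-projection hooks.
import torch
import torch.nn as nn

class SAE(nn.Module):
    def __init__(self, d_model, n_features, constrain_decoder=True):
        super().__init__()
        self.enc = nn.Linear(d_model, n_features)
        self.dec = nn.Linear(n_features, d_model)
        self.constrain_decoder = constrain_decoder

    def forward(self, x):
        f = torch.relu(self.enc(x))          # feature activations
        return self.dec(f), f

    @torch.no_grad()
    def renorm_decoder(self):
        # call after each optimizer step if the constraint is on
        if self.constrain_decoder:
            norms = self.dec.weight.norm(dim=0, keepdim=True)  # per-feature column norms
            self.dec.weight.div_(norms.clamp_min(1e-8))

sae = SAE(d_model=512, n_features=4096, constrain_decoder=False)
x = torch.randn(64, 512)
x_hat, f = sae(x)
loss = (x - x_hat).pow(2).mean() + 1e-3 * f.abs().sum(dim=-1).mean()
loss.backward()
# optimizer.step(); sae.renorm_decoder()  # only needed when the constraint is on
```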
I agree with this point when it comes to technical discussions. I would like to add the caveat that when talking to a total amateur, the sentence:
"AI is like biorisk more than it is like ordinary tech, therefore we need stricter safety regulations and limits on what people can create at all."
is the fastest way I've found to transmit information. Maybe 30% of the entire AI risk case can be delivered in the first four words.
I'd be most interested in detecting hydroperoxides, which is easier than detecting trans fats. I don't know how soluble a lipid hydroperoxide is in hexane, but isopropanol-hexane mixtures are often used for lipid extracts and would probably work better.
Evaporation could probably be done relatively safely by just leaving the extract at room temperature (I would definitely not advise heating the mixture at all) but you'd need good ventilation, preferably an outdoor space.
I think commercial LCMS/GCMS services are generally available to people in the USA/UK, and these would probably be the gold standard for detecting various hydroperoxides. I wouldn't trust IR spectroscopy to distinguish the hydroperoxides from other OH-group containing contaminants when you're working with a system as complicated as a box of french fries.
As far as I'm aware nobody claims trans fats aren't bad.
See comment by Gilch, allegedly Vaccenic acid isn't harmful. The particular trans-fats produced by isomerization of oleic and linoleic acid, however, probably are harmful. Elaidic acid for example is a major trans-fat component in margarines, which were banned.
Yeah, I was unaware of vaccenic acid. I've edited the post to clarify.
I've also realized that it might explain the anomalous (i.e. after adjusting for confounders) effects of living at higher altitude. The lower the atmospheric pressure, the less oxygen is available to oxidize the PUFAs. Of course some foods will be imported already full of oxidized FAs, and for those it will be too late, but presumably a McDonald's deep fryer in Colorado Springs is producing fewer oxidized PUFAs per hour than a correspondingly hot one in San Francisco.
This feels too crazy to put in the original post but it's certainly interesting.
That post is part of what spurred this one.
I uhh, didn't see that. Odd coincidence! I've added a link and will consider what added value I can bring from my perspective.
Thanks for the feedback. There's a condition which I assumed when writing this which I have realized is much stronger than I originally thought, and I think I should've devoted more time to thinking about its implications.
When I mentioned "no information being lost", what I meant is that in the interaction $X \to Y$, each value $y \in D_Y$ (where $D_Y$ is the domain of $Y$) corresponds to only one value of $X$. In terms of FFS, this means that each variable must be the maximally fine partition of the base set which is possible with that variable's set of factors.
Under these conditions, I am pretty sure that
I was thinking about causality in terms of forced directional arrows in Bayes nets, rather than in terms of d-separation. I don't think your example as written is helpful, because Bayes nets rely on the independence of variables to do causal inference: the two-node net $X \to Y$ is equivalent to $Y \to X$.
It's more important to think about cases like the collider $X \to Z \leftarrow Y$, where causality can be inferred. If we change this to $X \to Z' \leftarrow Y$ by adding noise (so $Z'$ is a noisy version of $Z$), then we still get a distribution satisfying $X \perp Y$ (as $X$ and $Y$ are still independent).
Even if we did have other nodes forcing the arrow directions (such as a node which is parent to $X$, and another node which is parent to $Y$), I still don't think adding noise lets us swap the order round.
On the other hand, there are certainly issues in Bayes nets of more elements, particularly the "diamond-shaped" net with arrows $W \to X$, $W \to Y$, $X \to Z$, $Y \to Z$. Here adding noise does prevent effective temporal inference, since, if $X$ and $Y$ are no longer d-separated by $W$, we cannot prove from correlations alone that no information goes between them through $Z$.
I had forgotten about OEIS! Anyway, I think the actual number might be 1577 rather than 1617 (this also gives no answers). I was only assuming agnosticism over factors in an overlap region if all pairs had factors, but I think that is missing some examples. My current guess is that any overlap region like $A \cap B \cap C$ should be agnostic iff all of the overlap regions "surrounding" it in the Venn diagram ($A \cap B$, $A \cap C$, $B \cap C$, $A \cap B \cap C \cap D$ in this situation) either have a factor present or are agnostic. This gives the series 1, 2, 15, 1577, 3397521 (my computer has not spat out the next element). This also gives nothing on the OEIS.
My reasoning for this condition is that we should be able to "remove" an observable from the system without trouble. If we have an agnosticism in the intersection $A \cap B \cap C$, then we can only remove observable $C$ if this doesn't cause trouble for the new intersection $A \cap B$, which is only true if we already have a factor in $A \cap B$ (or are agnostic about it).
I know very, very little about category theory, but some of this work regarding natural latents seems to absolutely smack of it. There seems to be a fairly important three-way relationship between causal models, finite factored sets, and Bayes nets.
To be precise, any causal model consisting of root sets $B_1, \dots, B_n$, downstream sets $D_1, \dots, D_m$, and functions mapping sets to downstream sets like $D_1 = f_1(B_1, B_2, B_3)$ must, when equipped with a set of independent probability distributions over the $B_i$, create a joint probability distribution compatible with the Bayes net that's isomorphic to the causal model in the obvious way. (So in the previous example, there would be arrows from only $B_1$, $B_2$, and $B_3$ to $D_1$.) The proof of this seems almost trivial but I don't trust myself not to balls it up somehow when working with probability theory notation.
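Here is the construction spelled out numerically on a toy example (the names $B_1, B_2, B_3, D_1$ and the function $f$ are illustrative stand-ins):

```python
# Toy numerical illustration of the claim above: independent distributions over
# the root sets plus a deterministic function to a downstream set give a joint
# distribution that factorizes according to the corresponding Bayes net.
import itertools

P_B1 = {0: 0.3, 1: 0.7}
P_B2 = {0: 0.5, 1: 0.5}
P_B3 = {0: 0.9, 1: 0.1}

def f(b1, b2, b3):          # deterministic downstream variable D1 = f(B1, B2, B3)
    return (b1 + b2 + b3) % 2

# Joint distribution induced by the causal model
joint = {}
for b1, b2, b3 in itertools.product(P_B1, P_B2, P_B3):
    d1 = f(b1, b2, b3)
    joint[(b1, b2, b3, d1)] = P_B1[b1] * P_B2[b2] * P_B3[b3]

# Bayes-net factorization: P(B1)P(B2)P(B3)P(D1 | B1, B2, B3), where
# P(D1 | parents) = 1 if D1 = f(parents) else 0.
for (b1, b2, b3, d1), p in joint.items():
    p_bn = P_B1[b1] * P_B2[b2] * P_B3[b3] * (1.0 if d1 == f(b1, b2, b3) else 0.0)
    assert abs(p - p_bn) < 1e-12
print("joint distribution is compatible with the Bayes net factorization")
```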
In the resulting Bayes net, one "minimal" natural latent $\Lambda$ which conditionally separates $X_1$ and $X_2$ is just the probabilities over the root elements on which both $X_1$ and $X_2$ depend. It might be possible to show that this "minimal" construction of $\Lambda$ satisfies a universal property, and so any other $\Lambda'$ which is also "minimal" in this way must be isomorphic to $\Lambda$.
I think the position of the ball is in V, since the players are responding to the position of the ball by forcing it towards the goal. It's difficult to predict the long-term position of the ball based on where it is now. The position of the opponent's goal would be an example of something in U for both teams. In this case both team's utility-functions contain a robust pointer to the goal's position.
I'd go for:
Reinforcement learning agents do two sorts of planning. One is applying the dynamics (world-modelling) network and running a Monte Carlo tree search (or something like it) over explicitly-represented world states. The other is implicit in the future-reward-estimate function. You need as much planning as possible to be of the first type:
- It's much more supervisable. An explicitly-represented world state is more interrogable than the inner workings of a future-reward-estimate.
- It's less susceptible to value-leaking. By this I mean issues in alignment which arise from instrumentally-valuable (i.e. not directly part of the reward function) goals leaking into the future-reward-estimate.
- You can also turn down the depth on the tree search (see the sketch after this list). If the agent literally can't plan beyond a dozen steps ahead, it can't be deceptively aligned.
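A toy illustration of the depth knob, assuming a generic dynamics/reward/value interface and using plain exhaustive search in place of MCTS (all names here are my own simplifications):

```python
# Toy depth-limited planner: explicit world-model rollouts up to max_depth,
# deferring to the learned future-reward estimate only at the leaves.
from typing import Callable, Sequence

def plan(state, actions: Sequence, dynamics: Callable, reward: Callable,
         value_estimate: Callable, max_depth: int, discount: float = 0.99):
    """Return (best_action, estimated_return) from depth-limited search."""
    def search(s, depth):
        if depth == 0:
            # beyond the depth cap, fall back on the (less supervisable) estimate
            return value_estimate(s)
        return max(reward(s, a) + discount * search(dynamics(s, a), depth - 1)
                   for a in actions)

    scores = {a: reward(state, a) + discount * search(dynamics(state, a), max_depth - 1)
              for a in actions}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Example: a 1-D chain where reward is given for stepping onto position 5.
best_action, est = plan(
    state=0,
    actions=[-1, +1],
    dynamics=lambda s, a: s + a,
    reward=lambda s, a: 1.0 if s + a == 5 else 0.0,
    value_estimate=lambda s: 0.0,   # pessimistic leaf estimate
    max_depth=6,
)
print(best_action, est)   # +1, ~0.96: the goal is inside the planning horizon
```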
I would question the framing of mental subagents as "mesa optimizers" here. This sneaks in an important assumption: namely that they are optimizing anything. I think the general view of "humans are made of a bunch of different subsystems which use common symbols to talk to one another" has some merit, but I think this post ascribes a lot more agency to these subsystems than I would. I view most of the subagents of human minds as mechanistically relatively simple.
For example, I might reframe a lot of the elements of talking about the unattainable "object of desire" in the following way:
1. Human minds have a reward system which rewards thinking about "good" things we don't have (or else we couldn't ever do things)
2. Human thoughts ping from one concept to adjacent concepts
3. Thoughts of good things associate to assessment of our current state
4. Thoughts of our current state being lacking cause a negative emotional response
5. The reward signal fails to backpropagate to the reward system in 1 enough, so the thoughts of "good" things we don't have are reinforced
6. The cycle continues
I don't think this is literally the reason, but framings on this level seem more mechanistic to me.
I also think that any framings along the lines of "you are lying to yourself all the way down and cannot help it" and "literally everyone is messed up in some fundamental way and there are no humans who can function in a satisfying way" are just kind of bad. Seems like a Kafka trap to me.
I've spoken elsewhere about the human perception of ourselves as a coherent entity being a misfiring of systems which model others as coherent entities (for evolutionary reasons). I don't particularly think some sort of societal pressure is the primary reason for our thinking of ourselves as coherent, although societal pressure is certainly to blame for the instinct to repress certain desires.
I'm interested in the "Xi will be assassinated/otherwise killed if he doesn't secure this bid for presidency" perspective. Even if he was put in a position where he'd lose the bid for a third term, is it likely that he'd be killed for stepping down? The four previous paramount leaders weren't. Is the argument that he's amassed too much power/done too much evil/burned too many bridges in getting his level of power?
Although I think most people who amass Xi's level of power are best modelled as desiring power (or at least as executing patterns which have in the past maximized power) for its own sake, so I guess the question of threat to his life is somewhat moot with regards to policy.
Seems like there's a potential solution to ELK-like problems, if you can force the information to move from the AI's ontology to (its model of) a human's ontology and then force it to move back again.
This gets around "basic" deception since we can always compare the AI's ontology before and after the translation.
The question is how do we force the knowledge to go through the (modeled) human's ontology, and how do we know the forward and backward translators aren't behaving badly in some way.
Unmentioned but large comparative advantage of this: it's not based in the Bay Area.
The typical alignment pitch of "Come and work on this super-difficult problem you may or may not be well suited for at all" is a hard enough sell for already-successful people (which intelligent people often are) without adding "Also you have to move to this one specific area of California which has a bit of a housing and crime problem and a very particular culture."
I was referring to "values" more like the second case. Consider the choice blindness experiments (which are well-replicated). People think they value certain things in a partner, or politics, but really it's just a bias to model themselves as being more agentic than they actually are.
Both of your examples share the common fact that the information is verifiable at some point in the future. In this case the best option is to put down money. Or even just credibly offer to put down money.
For example, X offers to bet Y $5000 (possibly at very high odds) that in the year 2030 (after the Moon Nazis have invaded) they will provide a picture of the moon. If Y takes this bet seriously they should update. In fact all other actors A, B, C, who observe this bet will update.
The same is (sort of) true of the second case: just credibly bet some money that in the next five months Russia will release the propaganda video. Of course if you bet too much Russia might not release the video, and you might go bankrupt.
I don't think this works for the general case, although it covers a lot of smaller cases. Depends on the rate at which the value of the information you want to preserve depreciates.
When you say the idea of human values is new, do you mean the idea of humans having values with regards to a utilitarian-ish ethics, is new? Or do you mean the concept of humans maximizing things rationally (or some equivalent concept) is new? If it's the latter I'd be surprised (but maybe I shouldn't be?).
From my experience as a singer, relative pitch exercises are much more difficult when the notes are a few octaves apart. So making sure the notes jump around over a large range would probably help.
You make some really excellent points here.
The teapot example is atypical of deception in humans, and was chosen to be simple and clear-cut. I think the web-of-lies effect is hampered in humans by a couple of things, both of which result from us only being approximations of Bayesian reasoners. One is the limits to our computation, we can't go and check a new update that "snake oil works" against all possible connections. Another part (which is also linked to computation limits) is that I suspect a small enough discrepancy gets rounded down to zero.
So if I'm convinced that "snake oil is effective against depression", I don't necessarily check it against literally all the beliefs I have about depression, which limits the spread of the web. Or, if it only very slightly contradicts my existing view of the mechanism of depression, that won't be enough for me to update the existing view at all, and the difference is swept under the rug. So the web peters out.
Of course the main reason snake oil salesmen work is because they play into people's existing biases.
But perhaps more importantly:
"This information asymmetry is typically over something that the deceiver does not expect the agent to be able to investigate easily."
This to me seems like regions where the function just isn't defined yet, or is very fuzzy. This means rather than a web of lies we have some lies isolated from the rest of the model by a region of confusion. This means there is no discontinuity in the function, which might be an issue.
I interpret (at least some of) this behaviour as being more about protecting the perception of NFTs as a valid means of ownership than protecting the NFT directly. As analogy, if you bought the Mona Lisa to gain status from owning it and having people visit it, but everyone you spoke to made fun of you and said that they had a copy too, you might be annoyed.
Although before I read your comment I had actually assumed this upset behaviour was mostly coming from trolls - who had right-click copied the NFTs - making fake accounts to LARP as NFT owners. I don't directly interact with NFT owning communities at all so most of my information about how people are actually behaving is filtered through the lens of what gets shared around on various social media.
I think I understand now. My best guess is that if your proof was applied to my example the conclusion would be that my example only pushes the problem back. To specify human values via a method like I was suggesting, you would still need to specify the part of the algorithm that "feels like" it has values, which is a similar type of problem.
I think I hadn't grokked that your proof says something about the space of all abstract value/knowledge systems whereas my thinking was solely about humans. As I understand it, an algorithm that picks out human values from a simulation of the human brain will correspondingly do worse on other types of mind.
I don't understand this. As far as I can tell, I know what my preferences are, and so that information should in some way be encoded in a perfect simulation of my brain. Saying there is no way at all to infer my preferences from all the information in my brain seems to contradict the fact that I can do it right now, even if me telling them to you isn't sufficient for you to infer them.
Once an algorithm is specified, there is no more extra information to specify how it feels from the inside. I don't see how there can be any more information necessary on top of a perfect model of me to specify my feeling of having certain preferences.
This is a great analysis of different causes of modularity. One thought I have is that L1/L2 and pruning seem similar to one another on the surface, but very different to dropout, and all of those seem very different to goal-varying.
If penalizing the total strength of connections during training is sufficient to enforce modularity, could it be the case that dropout is actually just penalizing connections? (e.g. as the effect of a non-firing neuron is propagated to fewer downstream neurons)
I can't immediately see a reason why a goal-varying scheme could penalize connections but I wonder if this is in fact just another way of enforcing the same process.
I think the tweet about the NHS app(s) is slightly misleading. I'm pretty confident those screenshots relate to two separate apps: one is a general health services app which can also be used to generate a certificate of vaccination (as the app has access to health records). The second screenshot relates to a covid-specific app which enables "check-ins" at venues for contact-tracing purposes, and the statement there seems to be declaring that the local information listing venues visited could - in theory - be used to get demographic information. One is called the "NHS App" and the other is called the "NHS Covid 19 App" so it's an understandable confusion.
I'm afraid I didn't intend for people to be able to add conditions to their plans. While something like that is completely reasonable I can't find a place to draw the line between that and what would be too complex. The only system that might work is having everyone send me their own python code but that's not fair on people who can't code, and more work than I'm willing to do. Other answers haven't included conditions and I think it wouldn't be fair on them. I think my decision is that:
If you don't get the time to respond with a time to move on from the Thunderwood Peaks, then I'll put it at a week somewhere between 0 and 10 (which I have chosen but won't say here for obvious reasons) which I would guess best represents your intentions.
I'm really sorry about the confusion, I should've made that all clearer from the start!