Posts

Forget Everything (Statistical Mechanics Part 1) 2024-04-22T13:33:35.446Z
Measuring Learned Optimization in Small Transformer Models 2024-04-08T14:41:27.669Z
Briefly Extending Differential Optimization to Distributions 2024-03-10T20:41:09.551Z
Finite Factored Sets to Bayes Nets Part 2 2024-02-03T12:25:41.444Z
From Finite Factors to Bayes Nets 2024-01-23T20:03:51.845Z
Differential Optimization Reframes and Generalizes Utility-Maximization 2023-12-27T01:54:22.731Z
Mathematically-Defined Optimization Captures A Lot of Useful Information 2023-10-29T17:17:03.211Z
Defining Optimization in a Deeper Way Part 4 2022-07-28T17:02:33.411Z
Defining Optimization in a Deeper Way Part 3 2022-07-20T22:06:48.323Z
Defining Optimization in a Deeper Way Part 2 2022-07-11T20:29:30.225Z
Defining Optimization in a Deeper Way Part 1 2022-07-01T14:03:18.945Z
Thinking about Broad Classes of Utility-like Functions 2022-06-07T14:05:51.807Z
The Halting Problem and the Impossible Photocopier 2022-03-31T18:19:20.292Z
Why Do I Think I Have Values? 2022-02-03T13:35:07.656Z
Knowledge Localization: Tentatively Positive Results on OCR 2022-01-30T11:57:19.151Z
Deconfusing Deception 2022-01-29T16:43:53.750Z
[Book Review]: The Bonobo and the Atheist by Frans De Waal 2022-01-05T22:29:32.699Z
DnD.Sci GURPS Evaluation and Ruleset 2021-12-22T19:05:46.205Z
SGD Understood through Probability Current 2021-12-19T23:26:23.455Z
Housing Markets, Satisficers, and One-Track Goodhart 2021-12-16T21:38:46.368Z
D&D.Sci GURPS Dec 2021: Hunters of Monsters 2021-12-11T12:13:02.574Z
Hypotheses about Finding Knowledge and One-Shot Causal Entanglements 2021-12-01T17:01:44.273Z
Relying on Future Creativity 2021-11-30T20:12:43.468Z
Nightclubs in Heaven? 2021-11-05T23:28:19.461Z
I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness 2021-10-29T11:09:20.559Z
Nanosystems are Poorly Abstracted 2021-10-24T10:44:27.934Z
No Really, There Are No Rules! 2021-10-07T22:08:13.834Z
Modelling and Understanding SGD 2021-10-05T13:41:22.562Z
[Book Review] "I Contain Multitudes" by Ed Yong 2021-10-04T19:29:55.205Z
Reachability Debates (Are Often Invisible) 2021-09-27T22:05:06.277Z
A Confused Chemist's Review of AlphaFold 2 2021-09-27T11:10:16.656Z
How to Find a Problem 2021-09-08T20:05:45.835Z
A Taxonomy of Research 2021-09-08T19:30:52.194Z
Addendum to "Amyloid Plaques: Medical Goodhart, Chemical Streetlight" 2021-09-02T17:42:02.910Z
Good software to draw and manipulate causal networks? 2021-09-02T14:05:18.389Z
Amyloid Plaques: Chemical Streetlight, Medical Goodhart 2021-08-26T21:25:04.804Z
Generator Systems: Coincident Constraints 2021-08-23T20:37:38.235Z
Fudging Work and Rationalization 2021-08-13T19:51:44.531Z
The Reductionist Trap 2021-08-09T17:00:56.699Z
Uncertainty can Defuse Logical Explosions 2021-07-30T12:36:29.875Z
Hobbies and the curse of Spontaneity 2021-07-22T13:25:43.973Z
A Models-centric Approach to Corrigible Alignment 2021-07-17T17:27:32.536Z
Generalising Logic Gates 2021-07-17T17:25:08.428Z
Equivalent of Information Theory but for Computation? 2021-07-17T09:38:48.227Z
Positive Expectations; how to build Hopefulness 2021-07-03T13:41:16.188Z
Jemist's Shortform 2021-05-31T22:39:28.638Z
Are there any methods for NNs or other ML systems to get information from knockout-like or assay-like experiments? 2021-05-18T21:33:38.474Z
Optimizers: To Define or not to Define 2021-05-16T19:55:35.735Z
Alzheimer's, Huntington's and Mitochondria Part 3: Predictions and Retrospective 2021-05-03T14:47:23.365Z
Alzheimer's, Huntington's and Mitochondria Part 2: Glucose Metabolism 2021-05-03T14:47:10.125Z

Comments

Comment by J Bostock (Jemist) on Forget Everything (Statistical Mechanics Part 1) · 2024-04-22T15:11:45.212Z · LW · GW

I uhh, didn't see that. Odd coincidence! I've added a link and will consider what added value I can bring from my perspective.

Comment by J Bostock (Jemist) on From Finite Factors to Bayes Nets · 2024-03-07T12:57:56.027Z · LW · GW

Thanks for the feedback. There's a condition which I assumed when writing this which I have realized is much stronger than I originally thought, and I think I should've devoted more time to thinking about its implications.

When I mentioned "no information being lost", what I meant is that in the interaction , each value  (where  is the domain of ) corresponds to only one value of . In terms of FFS, this means that each variable must be the maximally fine partition of the base set which is possible with that variable's set of factors.

Under these conditions, I am pretty sure that 

Comment by J Bostock (Jemist) on From Finite Factors to Bayes Nets · 2024-02-03T19:04:11.903Z · LW · GW

I was thinking about causality in terms of forced directional arrows in Bayes nets, rather than in terms of d-separation. I don't think your example as written is helpful because Bayes nets rely on the independence of variables to do causal inference:  is equivalent to .

It's more important to think about cases like  where causality can be inferred. If we change this to  by adding noise then we still get a distribution satisfying  (as  and  are still independent).
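
To make this concrete, here is a minimal simulation of the sort of structure I mean, using the standard collider X → Z ← Y as a stand-in (the structure, variable names, and noise level are my own choices for illustration): X and Y stay marginally independent even after noise is added to Z, and it is exactly that pattern which lets us orient the arrows.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Stand-in collider: X -> Z <- Y, with exogenous noise added to Z.
x = rng.normal(size=n)
y = rng.normal(size=n)
z = x + y + 0.5 * rng.normal(size=n)  # adding noise does not break the pattern

# Marginally, X and Y are (near-)uncorrelated...
print("corr(X, Y):", round(np.corrcoef(x, y)[0, 1], 3))

# ...but conditioning on Z induces dependence (check within a narrow slice of Z).
mask = np.abs(z) < 0.1
print("corr(X, Y | Z near 0):", round(np.corrcoef(x[mask], y[mask])[0, 1], 3))
```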

Even if we did have other nodes forcing  (such as a node  which is parent to , and another node  which is parent to ), I still don't think adding noise lets us swap the orders round.

On the other hand, there are certainly issues in Bayes nets of more elements, particularly the "diamond-shaped" net with arrows . Here adding noise does prevent effective temporal inference, since, if  and  are no longer d-separated by , we cannot prove from correlations alone that no information goes between them through .

Comment by J Bostock (Jemist) on From Finite Factors to Bayes Nets · 2024-01-25T19:20:15.071Z · LW · GW

I had forgotten about OEIS! Anyway, I think the actual number might be 1577 rather than 1617 (this also gives no answers). I was only assuming agnosticism over factors in the overlap region  if all pairs  had factors, but I think that is missing some examples. My current guess is that any overlap region like  should be agnostic iff all of the overlap regions "surrounding" it in the Venn diagram () in this situation either have a factor present or are agnostic. This gives the series 1, 2, 15, 1577, 3397521 (my computer has not spat out the next element). This also gives nothing on the OEIS.

My reasoning for this condition is that we should be able to "remove" an observable from the system without trouble. If we have an agnosticism in the intersection , then we can only remove observable  if this doesn't cause trouble for the new intersection , which is only true if we already have a factor in  (or are agnostic about it).

Comment by J Bostock (Jemist) on Natural Latents: The Math · 2024-01-16T22:38:17.614Z · LW · GW

I know very, very little about category theory, but some of this work regarding natural latents seems to absolutely smack of it. There seems to be a fairly important three-way relationship between causal models, finite factored sets, and Bayes nets.

To be precise, any causal model consisting of root sets , downstream sets , and functions mapping sets to downstream sets like  must, when equipped with a set of independent probability distributions over B, create a joint probability distribution compatible with the Bayes net that's isomorphic to the causal model in the obvious way. (So in the previous example, there would be arrows from only , and  to ) The proof of this seems almost trivial but I don't trust myself not to balls it up somehow when working with probability theory notation.
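
A quick numerical sanity check of the claim (the toy roots, function, and sample size below are mine, just for illustration): sampling the roots independently and computing the downstream variable deterministically gives a joint distribution that factorizes the way the corresponding Bayes net says it should.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Toy causal model: independent root variables b1, b2; downstream d = f(b1, b2).
b1 = rng.integers(0, 2, size=n)
b2 = rng.integers(0, 2, size=n)
d = b1 ^ b2  # an arbitrary deterministic function of the roots

# Empirically check the Bayes-net factorization P(b1, b2, d) = P(b1) P(b2) P(d | b1, b2).
for v1 in (0, 1):
    for v2 in (0, 1):
        dv = v1 ^ v2  # the only value of d with nonzero probability given (v1, v2)
        joint = np.mean((b1 == v1) & (b2 == v2) & (d == dv))
        factored = np.mean(b1 == v1) * np.mean(b2 == v2) * 1.0  # P(d|b1,b2) = 1 here
        print(v1, v2, dv, round(joint, 3), round(factored, 3))
```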

In the resulting Bayes net, one "minimal" natural latent which conditionally separates  and  is just the probabilities over the root elements from  which both  and  depend on. It might be possible to show that this "minimal" construction of  satisfies a universal property, and so any other  which is also "minimal" in this way must be isomorphic to .

Comment by J Bostock (Jemist) on Differential Optimization Reframes and Generalizes Utility-Maximization · 2023-12-27T15:11:08.399Z · LW · GW

I think the position of the ball is in V, since the players are responding to the position of the ball by forcing it towards the goal. It's difficult to predict the long-term position of the ball based on where it is now. The position of the opponent's goal would be an example of something in U for both teams. In this case both team's utility-functions contain a robust pointer to the goal's position.

Comment by J Bostock (Jemist) on But What's Your *New Alignment Insight,* out of a Future-Textbook Paragraph? · 2022-05-15T10:49:51.406Z · LW · GW

I'd go for:

Reinforcement learning agents do two sorts of planning. One is applying the dynamic (world-modelling) network and using a Monte Carlo tree search (or something like it) over explicitly-represented world states. The other is implicit in the future-reward-estimate function. You need to have as much planning as possible be of the first type:

  1. It's much more supervisable. An explicitly-represented world state is more interrogable than the inner workings of a future-reward-estimate.
  2. It's less susceptible to value-leaking. By this I mean issues in alignment which arise from instrumentally-valuable (i.e. not directly part of the reward function) goals leaking into the future-reward-estimate.
  3. You can also turn down the depth on the tree search. If the agent literally can't plan beyond a dozen steps ahead it can't be deceptively aligned.
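
As a rough sketch of what the third point buys you (the toy environment, reward, and `max_depth` below are invented, not any particular system): an explicit depth-limited search is structurally incapable of acting on anything beyond its horizon, whereas a learned future-reward-estimate has no such hard cutoff.

```python
from typing import Callable, Iterable, Tuple

State, Action = int, int

def plan(state: State,
         actions: Callable[[State], Iterable[Action]],
         step: Callable[[State, Action], Tuple[State, float]],
         max_depth: int) -> float:
    """Best total reward reachable within max_depth explicitly-searched steps.

    Anything further than max_depth steps ahead is invisible to the planner,
    which is the 'literally can't plan beyond a dozen steps' property."""
    if max_depth == 0:
        return 0.0
    best = 0.0
    for a in actions(state):
        next_state, reward = step(state, a)
        best = max(best, reward + plan(next_state, actions, step, max_depth - 1))
    return best

# Toy world: walking right along a line, with a large reward 20 steps away.
actions = lambda s: [1]
step = lambda s, a: (s + a, 100.0 if s + a == 20 else 0.0)
print(plan(0, actions, step, max_depth=12))  # 0.0  -- the reward is beyond the horizon
print(plan(0, actions, step, max_depth=25))  # 100.0
```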

Comment by J Bostock (Jemist) on The Game of Masks · 2022-04-28T23:13:15.459Z · LW · GW

I would question the framing of mental subagents as "mesa optimizers" here. This sneaks in an important assumption: namely that they are optimizing anything. I think the general view of "humans are made of a bunch of different subsystems which use common symbols to talk to one another" has some merit, but I think this post ascribes a lot more agency to these subsystems than I would. I view most of the subagents of human minds as mechanistically relatively simple.

For example, I might reframe a lot of the elements of talking about the unattainable "object of desire" in the following way:

1. Human minds have a reward system which rewards thinking about "good" things we don't have (or else we couldn't ever do things)
2. Human thoughts ping from one concept to adjacent concepts
3. Thoughts of good things associate to assessment of our current state
4. Thoughts of our current state being lacking cause a negative emotional response
5. The reward signal fails to backpropagate to the reward system in 1 enough, so the thoughts of "good" things we don't have are reinforced
6. The cycle continues

I don't think this is literally the reason, but framings on this level seem more mechanistic to me. 

I also think that any framings along the lines of "you are lying to yourself all the way down and cannot help it" and "literally everyone is messed up in some fundamental way and there are no humans who can function in a satisfying way" are just kind of bad. Seems like a Kafka trap to me.

I've spoken elsewhere about the human perception of ourselves as a coherent entity being a misfiring of systems which model others as coherent entities (for evolutionary reasons). I don't particularly think some sort of societal pressure is the primary reason for our thinking of ourselves as coherent, although societal pressure is certainly to blame for the instinct to repress certain desires.

Comment by J Bostock (Jemist) on China Covid #2 · 2022-04-23T15:08:49.083Z · LW · GW

I'm interested in the "Xi will be assassinated/otherwise killed if he doesn't secure this bid for presidency" perspective. Even if he was put in a position where he'd lose the bid for a third term, is it likely that he'd be killed for stepping down? The four previous paramount leaders weren't. Is the argument that he's amassed too much power/done too much evil/burned too many bridges in getting his level of power?

Although I think most people who amass Xi's level of power are best modelled as desiring power (or at least as executing patterns which have in the past maximized power) for its own sake, so I guess the question of threat to his life is somewhat moot with regards to policy.

Comment by J Bostock (Jemist) on Jemist's Shortform · 2022-04-20T22:38:09.821Z · LW · GW

Seems like there's a potential solution to ELK-like problems, if you can force the information to move from the AI's ontology to (its model of) a human's ontology and then force it to move back again.

This gets around "basic" deception since we can always compare the AI's ontology before and after the translation.

The question is how do we force the knowledge to go through the (modeled) human's ontology, and how do we know the forward and backward translators aren't behaving badly in some way.

Comment by Jemist on [deleted post] 2022-02-18T13:24:32.087Z

Unmentioned but large comparative advantage of this: it's not based in the Bay Area.

The typical alignment pitch of: "Come and work on this super-difficult problem you may or may not be well suited for at all" is a hard enough sell for already-successful people (which intelligent people often are) without adding: "Also you have to move to this one specific area of California, which has a bit of a housing and crime problem and a very particular culture".

Comment by J Bostock (Jemist) on Why Do I Think I Have Values? · 2022-02-05T19:11:38.463Z · LW · GW

I was referring to "values" more like the second case. Consider the choice blindness experiments (which are well-replicated). People think they value certain things in a partner, or politics, but really it's just a bias to model themselves as being more agentic than they actually are.

Comment by J Bostock (Jemist) on Transferring credence without transferring evidence? · 2022-02-04T11:28:16.866Z · LW · GW

Both of your examples share the common fact that the information is verifiable at some point in the future. In this case the best option is to put down money. Or even just credibly offer to put down money.

For example, X offers to bet Y $5000 (possibly at very high odds) that in the year 2030 (after the Moon Nazis have invaded) they will provide a picture of the moon. If Y takes this bet seriously they should update. In fact all other actors A, B, C, who observe this bet will update.
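
To put a number on "very high odds" (the figures here are made up for illustration): the terms X is willing to offer put a floor on the credence X must have to expect not to lose money, and that floor is what Y and the other observers can update towards.

```python
def min_credence_to_offer(win_amount: float, loss_amount: float) -> float:
    """Smallest probability of being right at which the bet has non-negative
    expected value for the person offering it.

    They win `win_amount` if right and pay out `loss_amount` if wrong."""
    return loss_amount / (win_amount + loss_amount)

# X risks paying $5000 if wrong in order to win just $50 if right:
print(min_credence_to_offer(win_amount=50, loss_amount=5000))  # ~0.99
```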

The same is (sort of) true of the second case: just credibly bet some money that in the next five months Russia will release the propaganda video. Of course if you bet too much Russia might not release the video, and you might go bankrupt.

I don't think this works for the general case, although it covers a lot of smaller cases. Depends on the rate at which the value of the information you want to preserve depreciates.

Comment by J Bostock (Jemist) on Why Do I Think I Have Values? · 2022-02-03T21:10:19.971Z · LW · GW

When you say the idea of human values is new, do you mean the idea of humans having values with regards to a utilitarian-ish ethics, is new? Or do you mean the concept of humans maximizing things rationally (or some equivalent concept) is new? If it's the latter I'd be surprised (but maybe I shouldn't be?).

Comment by J Bostock (Jemist) on How would you learn absolute pitch? · 2022-01-30T00:02:20.911Z · LW · GW

From my experience as a singer, relative pitch exercises are much more difficult when the notes are a few octaves apart. So making sure the notes jump around over a large range would probably help.

Comment by J Bostock (Jemist) on Deconfusing Deception · 2022-01-29T23:53:05.554Z · LW · GW

You make some really excellent points here. 

The teapot example is atypical of deception in humans, and was chosen to be simple and clear-cut. I think the web-of-lies effect is hampered in humans by a couple of things, both of which result from us only being approximations of Bayesian reasoners. One is the limits to our computation: we can't go and check a new update that "snake oil works" against all possible connections. Another part (which is also linked to computation limits) is that I suspect a small enough discrepancy gets rounded down to zero.

So if I'm convinced that "snake oil is effective against depression", I don't necessarily check it against literally all the beliefs I have about depression, which limits the spread of the web. Or if it only very slightly contradicts my existing view of the mechanism of depression, that won't be enough for me to update the existing view at all, and the difference is swept under the rug. So the web peters out.
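
As a cartoon of that second mechanism (the damping factor and threshold below are made up purely to show the shape of the effect): if each belief only passes a fraction of the tension on to its neighbours, and updates below some size get rounded to zero, the contradiction dies out after a few hops instead of propagating through the whole web.

```python
def propagate(initial_tension: float, damping: float, threshold: float, chain_length: int):
    """Toy model: how much each successive belief in a chain gets updated."""
    updates, tension = [], initial_tension
    for _ in range(chain_length):
        if tension < threshold:   # too small to bother updating on at all
            tension = 0.0
        updates.append(tension)
        tension *= damping        # each hop only partially implicates the next belief
    return updates

print(propagate(initial_tension=1.0, damping=0.4, threshold=0.05, chain_length=8))
# roughly [1.0, 0.4, 0.16, 0.064, 0.026, 0, 0, 0] -- the web peters out
```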

Of course the main reason snake oil salesmen work is because they play into people's existing biases. 

But perhaps more importantly:

This information asymmetry is typically over something that the deceiver does not expect the agent to be able to investigate easily.

This to me seems like regions where the function  just isn't defined yet, or is very fuzzy. This means rather than a web of lies we have some lies isolated from the rest of the model by a region of confusion. This means there is no discontinuity in the function, which might be an issue.

Comment by J Bostock (Jemist) on NFTs, Coin Collecting, and Expensive Paintings · 2022-01-24T11:41:18.177Z · LW · GW

I interpret (at least some of) this behaviour as being more about protecting the perception of NFTs as a valid means of ownership than protecting the NFT directly. As analogy, if you bought the Mona Lisa to gain status from owning it and having people visit it, but everyone you spoke to made fun of you and said that they had a copy too, you might be annoyed.

Although before I read your comment I had actually assumed this upset behaviour was mostly coming from trolls - who had right-click copied the NFTs - making fake accounts to LARP as NFT owners. I don't directly interact with NFT owning communities at all so most of my information about how people are actually behaving is filtered through the lens of what gets shared around on various social media.

Comment by J Bostock (Jemist) on How an alien theory of mind might be unlearnable · 2022-01-06T17:08:28.993Z · LW · GW

I think I understand now. My best guess is that if your proof was applied to my example the conclusion would be that my example only pushes the problem back. To specify human values via a method like I was suggesting, you would still need to specify the part of the algorithm that "feels like" it has values, which is a similar type of problem.

I think I hadn't grokked that your proof says something about the space of all abstract value/knowledge systems whereas my thinking was solely about humans. As I understand it, an algorithm that picks out human values from a simulation of the human brain will correspondingly do worse on other types of mind.

Comment by J Bostock (Jemist) on How an alien theory of mind might be unlearnable · 2022-01-05T22:39:11.622Z · LW · GW

I don't understand this. As far as I can tell, I know what my preferences are, and so that information should in some way be encoded in a perfect simulation of my brain. Saying there is no way at all to infer my preferences from all the information in my brain seems to contradict the fact that I can do it right now, even if me telling them to you isn't sufficient for you to infer them.

Once an algorithm is specified, there is no more extra information to specify how it feels from the inside. I don't see how there can be any more information necessary on top of a perfect model of me to specify my feeling of having certain preferences.

Comment by J Bostock (Jemist) on Regularization Causes Modularity Causes Generalization · 2022-01-03T18:19:18.753Z · LW · GW

This is a great analysis of different causes of modularity. One thought I have is that L1/L2 and pruning seem similar to one another on the surface, but very different to dropout, and all of those seem very different to goal-varying.

If penalizing the total strength of connections during training is sufficient to enforce modularity, could it be the case that dropout is actually just penalizing connections? (e.g. as the effect of a non-firing neuron is propagated to fewer downstream neurons)
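
For concreteness, here is the kind of comparison I have in mind (layer sizes, penalty strength, and the use of PyTorch are my own choices, not anything from the post): an L1 term on the weights shows up explicitly in the loss, while dropout never does, so if dropout is enforcing modularity it would have to be doing so by implicitly suppressing connections.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(32, 1))
x, y = torch.randn(64, 10), torch.randn(64, 1)

mse = nn.functional.mse_loss(model(x), y)

# Explicit connection penalty: L1 on the weight matrices, added straight into the loss.
l1_strength = 1e-3
l1_penalty = sum(p.abs().sum() for p in model.parameters() if p.dim() > 1)
loss = mse + l1_strength * l1_penalty

# The nn.Dropout layer above never appears as a loss term; it only zeroes activations
# at random during training, so any pressure on connections is indirect.
loss.backward()
print(float(loss))
```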

I can't immediately see a reason why a goal-varying scheme could penalize connections but I wonder if this is in fact just another way of enforcing the same process.

Comment by J Bostock (Jemist) on Covid 12/30: Infinity War · 2021-12-30T18:38:48.736Z · LW · GW

I think the tweet about the NHS app(s) is slightly misleading. I'm pretty confident those screenshots relate to two separate apps: one is a general health services app which can also be used to generate a certificate of vaccination (as the app has access to health records). The second screenshot relates to a covid-specific app which enables "check-ins" at venues for contact-tracing purposes, and the statement there seems to be declaring that the local information listing venues visited could - in theory - be used to get demographic information. One is called the "NHS App" and the other is called the "NHS Covid 19 App" so it's an understandable confusion.

Comment by J Bostock (Jemist) on D&D.Sci GURPS Dec 2021: Hunters of Monsters · 2021-12-21T13:39:04.475Z · LW · GW

I'm afraid I didn't intend for people to be able to add conditions to their plans. While something like that is completely reasonable, I can't find a place to draw the line between that and what would be too complex. The only system that might work is having everyone send me their own Python code, but that's not fair on people who can't code, and it's more work than I'm willing to do. Other answers haven't included conditions and I think it wouldn't be fair on them. I think my decision is that:

If you don't get the time to respond with a time to move on from the Thunderwood Peaks, then I'll put it at a week (which I have chosen but won't say here for obvious reasons), somewhere between 0 and 10, which I would guess best represents your intentions.

I'm really sorry about the confusion, I should've made that all clearer from the start!

Comment by J Bostock (Jemist) on Open & Welcome Thread December 2021 · 2021-12-20T16:29:38.302Z · LW · GW

I think your comment excellently illustrates the problems with the experiment!

Next to the upvote/downvote buttons there's a separate box for agreement/disagreement. I think the aim is to separate "this post contributes to the discussion in a positive/negative way" from "I think the claims expressed here are accurate". It's active in the comments of the post I linked in my comment and there's a pinned comment from Ruby explaining it.

Comment by J Bostock (Jemist) on Open & Welcome Thread December 2021 · 2021-12-20T13:10:23.865Z · LW · GW

I'm very interested to try the new two-axis voting system, but it seems to only be active on one post, which also happens to be very tied up with some current Bay Area-specific issues, limiting who can actually engage with it. I also think it would be good for the community to get to "practice" with such voting on some topics which are easier to discuss so norms can be established before moving on to the more explosive ones. I'd like to see more posts with this enabled; perhaps a few more people with posts having >20 comments currently on the frontpage could be asked about it, or a pinned post could be made by the mods explaining it and enabling people to ask for it.

I do think that group-politics-related posts might have the greatest potential to benefit from this type of voting (especially relative to the current system).

Comment by J Bostock (Jemist) on D&D.Sci GURPS Dec 2021: Hunters of Monsters · 2021-12-17T23:35:58.022Z · LW · GW

Sure! I was planning to anyway, but that plus my own busyness means it will more likely be early next week, or even later if people would prefer.

Comment by J Bostock (Jemist) on Housing Markets, Satisficers, and One-Track Goodhart · 2021-12-16T22:50:47.616Z · LW · GW

As the unmet demand for housing at all levels currently outstrips supply, the optimal local move is to replace cheaper-per-space housing with expensive-per-space housing, where the latter is targeted towards rich people, whenever permission from local government can be obtained. If the unmet demand for housing at all levels were much smaller, then this move wouldn't be profitable by default and developers would have to choose where to build new marginal rich-people-targeted houses more carefully. For some human-desirable variable "strength of community", the rents/sale prices will be higher the more of that is present. Then the obvious choice is to build your new development such that the "strength of community" of the removed building is lowest, relative to the "strength of community" of the new building. The existence of this sort of choice would mean that existing communities that people like would be less likely to be removed.

Comment by J Bostock (Jemist) on A fate worse than death? · 2021-12-14T13:22:34.822Z · LW · GW

 But I don't know why it's downvoted so far - it's an important topic, and I'm glad to have some more discussion of it here (even if I disagree with the conclusions and worry about the unstated assumptions).

I agree with this. The author has made a number of points I disagree with but hasn't done anything worthy of heavy downvotes (like having particularly bad epistemics, being very factually wrong, personally attacking people, or making a generally low-effort or low-quality post). This post alone has changed my views towards favouring a modification of the upvote/downvote system.

Comment by J Bostock (Jemist) on D&D.Sci GURPS Dec 2021: Hunters of Monsters · 2021-12-13T20:19:23.642Z · LW · GW

Option 2

Comment by J Bostock (Jemist) on A fate worse than death? · 2021-12-13T14:30:43.799Z · LW · GW

In the described scenario, the end result is omnicide. Thus, it is not much different from the AI immediately killing all humans. 

I strongly disagree with this. I would much, much rather be killed immediately than suffer for a trillion years and then die. This is for the same reason that I would rather enjoy a trillion years of life and then die, than die immediately.

In this case, the philosophy's adherents have no preference between dying and doing something else with zero utility (e.g. touching their nose). As humans encounter countless actions of a zero utility, the adherents are either all dead or being inconsistent. 

I think you're confusing the utility of a scenario with the expected utility of an action. Assigning zero utility to being dead is not the same as assigning zero expected utility to dying over not dying. If we let the expected utility of an action be defined relative to the expected utility of not doing that action, then "touching my nose", which doesn't affect my future utility, does have an expected utility of zero. But if I assign positive utility to my future existence, then killing myself has negative expected utility relative to not doing so.
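
Spelling that out with toy numbers (chosen only to make the comparison explicit):

```python
u_future_life = 100.0   # whatever positive utility I assign to the rest of my life
u_dead = 0.0

# Expected utility of an action, measured relative to not taking it:
eu_touch_nose = u_future_life - u_future_life   # = 0: nothing about my future changes
eu_kill_self  = u_dead - u_future_life          # = -100: strictly worse than not doing it

print(eu_touch_nose, eu_kill_self)
```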

Comment by J Bostock (Jemist) on A fate worse than death? · 2021-12-13T12:17:47.624Z · LW · GW

Your argument rests on the fact that people who have suffered a million years of suffering could - in theory - be rescued and made happy, with it only requiring "tech and time". In an S-risk scenario, that doesn't happen.

In what I'd consider the archetypical S-risk scenario, an AI takes over, starts simulating humans who suffer greatly, and there is no more human agency ever again. The (simulated) humans experience great suffering until the AI runs out of power (some time trillions of years in the future when the universe can no longer power any more computation) at which point they die anyway.

As for your points on consistency, I'm pretty sure a utilitarian philosophy that simply assigns utility zero to the brain state of being dead is consistent. Whether this is actually consistent with people's revealed preferences and moral intuitions I'm not sure.

Comment by J Bostock (Jemist) on D&D.Sci GURPS Dec 2021: Hunters of Monsters · 2021-12-11T20:38:31.390Z · LW · GW

I imagine their response would be along the lines of: "Why the hell should I listen to someone who doesn't even know how big a Dull Viper is tell me how to hunt it!?"

Comment by J Bostock (Jemist) on Second-order selection against the immortal · 2021-12-06T17:35:46.630Z · LW · GW

I think it won't be easy to modify the genome of individuals to achieve predictable outcomes even if you get the machinery you describe to work. 

Is this because of factors like the almost-infinite number of interactions between different genes, such that even with a hypothetical magic technology to arbitrarily and perfectly change the DNA in every cell in the body, it wouldn't be possible to predict the outcome of such a change? Or is it because you don't think that any machinery will ever be precise enough to make this work well enough? Or some other issue entirely?

Comment by J Bostock (Jemist) on Second-order selection against the immortal · 2021-12-05T20:15:09.042Z · LW · GW

What I meant is changing the genetic code in ~all of the cells in a human body. Or some sort of genetic engineering which has the same effect as that.

Here's one model I have as to how you could genetically engineer a living human:

Many viruses are able to reverse-transcribe RNA to DNA and insert that DNA into cells. This causes a lot of problems for cells, but there are (probably) large regions of the genome where insertions of new DNA wouldn't cause problems. I don't think it would be difficult to target insertion of DNA to those regions, as DNA binding proteins could be attached to DNA insertion proteins.

This sort of technology requires only the insertion of RNA into a cell. There are a number of ways to put RNA into cells at the moment, such as "edited" viruses, lipid droplets, and more might be developed.

I also believe targeting somatic stem cells for modification via cell-specific surface proteins is possible. If not we could also cause the modified cells to revert to stem cells (by causing them to express Yamanaka Factors etc.).

The stem cells will differentiate and eventually replace (almost all) unmodified cells.

The resulting technology would allow arbitrary insertion of genetic code into most somatic cells (neurons might not be direct targets but perhaps engineering of glia or whatever could do them). Using CRISPR-like technologies rather than reverse transcription we could also do arbitrary mutation, gene knockout, etc.

I guess this is still somewhat handwavey. Speculating on future technology is always handwavey. 

Comment by J Bostock (Jemist) on Second-order selection against the immortal · 2021-12-04T12:06:54.871Z · LW · GW

I think cultural evolution will be the greater factor by a large margin. I think the technology for immortality is possible but that it will either directly involve genetic engineering of living humans, or be one or two steps away from it. People who are willing to take an immortality drug are very likely to also be willing to improve themselves in other ways. If the Horde is somehow going to outcompete them due entirely to beneficial mutations, the Imperium could simply steal them.

Comment by J Bostock (Jemist) on Hypotheses about Finding Knowledge and One-Shot Causal Entanglements · 2021-12-02T11:59:14.850Z · LW · GW

Thanks! I get your arguments about "knowledge" being restricted to predictive domains, but I think it's (mostly) just a semantic issue. I also don't think the specifics of the word "knowledge" are particularly important to my points which is what I attempted to clarify at the start, but I've clearly typical-minded and assumed that of course everyone would agree with me about a dog/fish classifier having "knowledge", when it's more of an edge-case than I thought! Perhaps a better version of this post would have either tabooed "knowledge" altogether or picked a more obviously-knowledge-having model.

Comment by J Bostock (Jemist) on Rapid Increase of Highly Mutated B.1.1.529 Strain in South Africa · 2021-11-26T14:45:42.364Z · LW · GW

This is a pretty strong indication of immune escape to me, if it persists in other outbreaks. If this was purely from increased infectiousness in naive individuals it would imply an R-value (in non-immune populations) of like 40 or something, which seems much less plausible than immune escape. I don't know what the vaccination/infection rates are in these communities though.

Comment by J Bostock (Jemist) on Jemist's Shortform · 2021-11-25T17:34:26.275Z · LW · GW

The UK has just switched their available rapid Covid tests from a moderately unpleasant one to an almost unbearable one. Lots of places require them for entry. I think the cost/benefit makes sense even with the new kind, but I'm becoming concerned we'll eventually reach the "imagine a society where everyone hits themselves on the head every day with a baseball bat" situation if cases approach zero.

Comment by J Bostock (Jemist) on Potential Alignment mental tool: Keeping track of the types · 2021-11-23T13:21:16.003Z · LW · GW

My current belief on this is that the greatest difficulty is going to be finding the "human values" in the AI's model of the world. Any AI smart enough to deceive humans will have a predictive model of humans which almost trivially must contain something that looks like "human values". The biggest problems I see are:

1: "Human values" may not form a tight abstracted cluster in a model of the world at all. This isn't so much  conceptual issue as in theory we could just draw a more complex boundary around them, but it makes it practically more difficult.

2: It's currently impossible to see what the hell is going on inside most large ML systems. Interpretability work might be able to allow us to find the right subsection of a model.

3: Any pointer we build to the human values in a model also needs to be stable to the model updating. If that knowledge gets moved around as parameters change, the computational tool/mathematical object which points to them needs to be able to keep track of that. This could include sudden shifts, slow movement, breaking up of models into smaller separate models.

(I haven't defined knowledge; I'm not very confused about what it means to say "knowledge of X is in a particular location in the model", but I don't have space here to write it all up.)

Comment by J Bostock (Jemist) on Petrov Day Retrospective: 2021 · 2021-11-05T10:35:05.314Z · LW · GW

Very good point. Perhaps there just intrinsically is no way of doing something that this community perceives as "burning" money, without upsetting people.

Comment by J Bostock (Jemist) on I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness · 2021-10-30T21:20:05.454Z · LW · GW

Having now had a lot of different conversations on consciousness I'm coming to a slightly disturbing belief that this might be the case. I have no idea what this implies for any of my downstream-of-consciousness views.

Comment by J Bostock (Jemist) on I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness · 2021-10-30T18:40:20.393Z · LW · GW

I'm confident your model of Eliezer is more accurate than mine.

Neither the twitter thread nor his other writings originally gave me the impression that he had a model in that fine-grained detail. I was mentally comparing his writings on consciousness to his writings on free will. Reading the latter made me feel like I strongly understood free will as a concept, and since then I have never been confused; it genuinely reduced free will as a concept in my mind.

His writings on consciousness have not done anything more than raise that model to the same level of possibility as a bunch of other models I'm confused about. That was the primary motivation for this post. But now that you mention it, if he genuinely believes that he has knowledge which might bring him closer to (or might bring others closer to) programming a conscious being, I can see why he wouldn't share it in high detail.

Comment by J Bostock (Jemist) on I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness · 2021-10-29T16:41:17.472Z · LW · GW

Basically yes I care about the subjective experiences of entities. I'm curious about the use of the word "still" here. This implies you used to have a similar view to mine but changed it, if so what made you change your mind? Or have I just missed out on some massive shift in the discourse surrounding consciousness and moral weight? If the latter is the case (which it might be, I'm not plugged into a huge number of moral philosophy sources) that might explain some of my confusion.

Comment by J Bostock (Jemist) on I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness · 2021-10-29T16:25:36.388Z · LW · GW

he defines consciousness as "what an algorithm implementing complex social games feels like when reflecting on itself".


In that case I'll not use the word consciousness and abstract away to "things which I ascribe moral weight to" (which I think is a fair assumption given the later discussion of eating "BBQ GPT-3 wings" etc.).

Eliezer's claim is therefore something along the lines of: "I only care about the suffering of algorithms which implement complex social games and reflect on themselves" or possibly "I only care about the suffering of algorithms which are capable of (and currently doing a form of) self-modelling".

I've not seen nearly enough evidence to convince me of this.

I don't expect to see a consciousness particle called a qualon. I more expect to see something like: "These particular brain activity patterns which are robustly detectable in an fMRI are extremely low in sleeping people, higher in dreaming people, higher still in awake people and really high in people on LSD and types of zen meditation."

Comment by J Bostock (Jemist) on I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness · 2021-10-29T16:15:35.892Z · LW · GW

You present an excellently-written and interesting case here. I agree with the point that self-modelling systems can think in certain ways which are unique and special and chickens can't do that.

One reason I identify consciousness with having qualia is that Eliezer specifically does that in the twitter thread. The other is that qualia is generally less ambiguous than terms like consciousness and self-awareness and sentience. The disadvantage is that the concept of qualia is something which is very difficult (and beyond my explaining capabilities) to explain to people who don't know what it means. I choose to take this tradeoff because I find that I, personally, get much more out of discussions about specifically qualia, than any of the related words. Perhaps I'm not taking seriously enough the idea that illusionism will explain why I feel like I'm conscious and not explain why I am conscious.

I also agree that most other existing mainstream views are somewhat poor, but to me this isn't particularly strong positive evidence for Eliezer's views. This is because models of consciousness on the level of detail of Eliezer's are hard to come up with, so there might be many other excellent ones that haven't been found yet. And Eliezer hasn't done (to my knowledge) anything which rules out other arguments on the level of detail of his own.

Basically I think that the reason the best argument we see is Eliezer's is less along the lines of "this is the only computational argument that could be made for consciousness" and more along the lines of "computational arguments for consciousness are really difficult and this is the first one anyone has found".

Comment by J Bostock (Jemist) on I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness · 2021-10-29T15:57:41.361Z · LW · GW

Eliezer later states that he is referring to qualia specifically, which for me are (within a rounding error) totally equivalent to moral relevance.

Comment by J Bostock (Jemist) on Petrov Day Retrospective: 2021 · 2021-10-23T11:31:15.768Z · LW · GW

My first thought was that this could be avoided by - if the button was pressed - giving it to a "rare diseases in cute puppies" type charity, rather than destroying it. I'd suspect the intersection of "people who care strongly enough about effective altruism to be angry", "people who don't understand the point of Petrov Day", and "people who have the power to generate large amounts of negative publicity" is very small.

But I think a lot of LWers who are less onboard with Petrov Day in general would be just as (or almost as) turned off by this concept as the idea of burning the money. Perhaps something akin to the one landfish did would be better? At least in that case I would guess most LWers are OK enough with either MIRI or AMF (or maybe substitute other charities?) receiving money at the expense of one another for it to work OK.

Comment by J Bostock (Jemist) on Jemist's Shortform · 2021-10-15T20:53:30.108Z · LW · GW

Just realized I'm probably feeling much worse than I ought to on days when I fast because I've not been taking sodium. I really should have checked this sooner. If you're planning to do long (I do a day, which definitely feels long) fasts, take sodium! 

Comment by J Bostock (Jemist) on Questions about YIMBY · 2021-10-08T22:35:20.606Z · LW · GW

The green belt problem is not one I'd considered before. I've always assumed the biggest problems for places like London were the endless low-density suburbs rather than the limit on building houses outside of a certain radius. If you work in the centre of London and live in some new development just outside the green belt, that already seems like something of a failure.

I don't want to doubt the expert economic analysis though; perhaps removing it would allow people to move from the suburbs to new developments, freeing up suburb space. This also seems wrong, as higher population density is the goal, but perhaps the people who are better off living outside the city are retirees or similar, who reduce the demand for low-density housing in the city, and their leaving therefore allows higher-density housing to be built.

Comment by J Bostock (Jemist) on No Really, There Are No Rules! · 2021-10-08T17:12:43.789Z · LW · GW

Actually that's a good point; I think that's the only rule which doesn't need to be written (which I completely forgot to mention). Other rules regarding text can be manipulated the same way the other rules can.

Comment by J Bostock (Jemist) on D&D.Sci 4th Edition: League of Defenders of the Storm · 2021-10-04T11:14:45.755Z · LW · GW

Using Python I conducted a few different analyses:

1. Proportion of character wins vs other characters
2. Proportion of character wins when paired with other characters

With these I gave each possible team a score: for each character on the team, the sum over enemy characters of that character's win proportion against them, plus the sum over teammates of the win proportion when paired with them. The highest-scoring team was:

Rock-n-Roll Ranger, Blaze Boy, Nullifying Nightmare, Greenery Giant, Tidehollow Tyrant
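
In case it's useful, a sketch of the scoring described above (the file names, dataframe layout, and column conventions are guesses at my own setup rather than anything exact):

```python
from itertools import combinations
import pandas as pd

# Assumed precomputed tables:
#   win_vs.loc[c, e]   -- proportion of games character c wins against enemy e
#   win_with.loc[c, t] -- proportion of games c wins when paired with teammate t
win_vs = pd.read_csv("win_vs.csv", index_col=0)
win_with = pd.read_csv("win_with.csv", index_col=0)
characters = list(win_vs.index)

def team_score(team):
    score = 0.0
    non_team = [e for e in characters if e not in team]
    for c in team:
        score += sum(win_vs.loc[c, e] for e in non_team)          # general strength
        score += sum(win_with.loc[c, t] for t in team if t != c)  # synergy with teammates
    return score

best = max(combinations(characters, 5), key=team_score)
print(best)
```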

This was much more pleasant than using Excel! I think I might try and learn R or some other dedicated statistical language for the next one.

My PvP team (chosen without having the time to estimate anything about my enemies' teams, so it's highly likely to get countered) has actually come out the same. There's a good chance something is up with my analysis, or my method is too biased towards synergistic teams.

Tidehollow Tyrant, Rock-n-Roll Ranger, Nullifying Nightmare, Greenery Giant, Blaze Boy