Goodhart Ethology 2021-09-17T17:31:33.833Z
Competent Preferences 2021-09-02T14:26:50.762Z
Introduction to Reducing Goodhart 2021-08-26T18:38:51.592Z
How to turn money into AI safety? 2021-08-25T10:49:01.507Z
HCH Speculation Post #2A 2021-03-17T13:26:46.203Z
Hierarchical planning: context agents 2020-12-19T11:24:09.064Z
Modeling humans: what's the point? 2020-11-10T01:30:31.627Z
What to do with imitation humans, other than asking them what the right thing to do is? 2020-09-27T21:51:36.650Z
Charlie Steiner's Shortform 2020-08-04T06:28:11.553Z
Constraints from naturalized ethics. 2020-07-25T14:54:51.783Z
Meta-preferences are weird 2020-07-16T23:03:40.226Z
Down with Solomonoff Induction, up with the Presumptuous Philosopher 2020-06-12T09:44:29.114Z
The Presumptuous Philosopher, self-locating information, and Solomonoff induction 2020-05-31T16:35:48.837Z
Life as metaphor for everything else. 2020-04-05T07:21:11.303Z
Meta-preferences two ways: generator vs. patch 2020-04-01T00:51:49.086Z
Gricean communication and meta-preferences 2020-02-10T05:05:30.079Z
Impossible moral problems and moral authority 2019-11-18T09:28:28.766Z
What's the dream for giving natural language commands to AI? 2019-10-08T13:42:38.928Z
The AI is the model 2019-10-04T08:11:49.429Z
Can we make peace with moral indeterminacy? 2019-10-03T12:56:44.192Z
The Artificial Intentional Stance 2019-07-27T07:00:47.710Z
Some Comments on Stuart Armstrong's "Research Agenda v0.9" 2019-07-08T19:03:37.038Z
Training human models is an unsolved problem 2019-05-10T07:17:26.916Z
Value learning for moral essentialists 2019-05-06T09:05:45.727Z
Humans aren't agents - what then for value learning? 2019-03-15T22:01:38.839Z
How to get value learning and reference wrong 2019-02-26T20:22:43.155Z
Philosophy as low-energy approximation 2019-02-05T19:34:18.617Z
Can few-shot learning teach AI right from wrong? 2018-07-20T07:45:01.827Z
Boltzmann Brains and Within-model vs. Between-models Probability 2018-07-14T09:52:41.107Z
Is this what FAI outreach success looks like? 2018-03-09T13:12:10.667Z
Book Review: Consciousness Explained 2018-03-06T03:32:58.835Z
A useful level distinction 2018-02-24T06:39:47.558Z
Explanations: Ignorance vs. Confusion 2018-01-16T10:44:18.345Z
Empirical philosophy and inversions 2017-12-29T12:12:57.678Z
Dan Dennett on Stances 2017-12-27T08:15:53.124Z
Philosophy of Numbers (part 2) 2017-12-19T13:57:19.155Z
Philosophy of Numbers (part 1) 2017-12-02T18:20:30.297Z
Limited agents need approximate induction 2015-04-24T21:22:26.000Z


Comment by Charlie Steiner on AXRP Episode 11 - Attainable Utility and Power with Alex Turner · 2021-09-26T02:23:07.787Z · LW · GW

Good episode as always :)

I'm interested in getting deeper into what Alex calls:

this framing of an AI that we give it a goal, it computes the policy, it starts following the policy, maybe we see it mess up and we correct the agent. And we want this to go well over time, even if we can’t get it right initially. 

Is the notion that if we understand how to build low-impact AI, we can build AIs with potentially bad goals, watch them screw up, and we can then fix our mistakes and try again? Does the notion of "low-impact" break down, though, if humans are eventually going to use the results from these experiments to build high-impact AI?

Given my recent subject matter, this podcast was also a good reminder to take another think about the role that instrumental goals (like power) play in the arguments that seemingly-small difficulties in learning human values can lead to large divergences.

Comment by Charlie Steiner on [Summary] "Introduction to Electrodynamics" by David Griffiths - Part 1 · 2021-09-23T02:52:19.880Z · LW · GW

Wait, undergrads actually like Griffiths?

I mean, it has been a while since I hung out with the youths. Maybe we only liked Sakurai in my day because we were hipsters.

Comment by Charlie Steiner on [deleted post] 2021-09-23T02:23:10.324Z

You're right, we didn't even get to the part where the proposed game is weird even without the anthropics.

Comment by Charlie Steiner on [deleted post] 2021-09-23T02:21:41.098Z

The issue with this answer is that the paradox works just as well without the anthropic shenanigans!

(See this comment and this post)

You don't need clones, you could put ordinary people in a room and have them play this game. The paradox comes from a different feature of the game.

On a completely unrelated note, I also disagree about the role of the first-person view in anthropics. Maybe see this post?

Comment by Charlie Steiner on What are good models of collusion in AI? · 2021-09-23T01:06:34.594Z · LW · GW

Huh, good question. I don't really know, but I'll try and help anyway :)

Of all the prisoner's dilemma tournaments we've run, I think 2014 was probably the most interesting. But the lesson was pretty common-sensical - just simulate your opponent and if cooperating is at least as good as defecting, cooperate. There's another interesting result about the prisoner's dilemma that I found while googling that I hadn't seen before (for background on that post, see here).

Comment by Charlie Steiner on A sufficiently paranoid non-Friendly AGI might self-modify itself to become Friendly · 2021-09-23T00:50:05.218Z · LW · GW

You correctly say the words that self-preservation would be an instrumental goal, but when you talk about the agent it seems like it's willing to give up on what are supposed to be its terminal goals in order to avoid shutdown. How is self-preservation merely instrumental, then?

I recently saw the notion of "reverse alignment" that might provide some wiggle room here (I'll try and remember to edit in an attribution if I see this person go public). Basically if the agent ranks a universe where an FAI is in control as 75% as good as a universe where it's in control (relative to what it thinks will happen if it gets shut down), then it will self-modify to an FAI if it thinks that an FAI is less than 75% as likely to get shut down. Of course the problem is that there might be some third UFAI design that ranks higher according to the original agent's preferences and also has a low chance of being shut down. So if you have an AI that already has very small reverse-alignment divergence, plus a screening mechanism that's both informative and loophole-free, then the AI is incentivized to self-modify to FAI.
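One way to spell out the comparison in the paragraph above is as an expected-value check. This is only a sketch under an assumed normalization (shutdown is worth 0, the agent staying in control is worth 1, an FAI in control is worth 0.75); the function names are hypothetical. Under this reading the condition works out to a threshold on survival odds, 0.75 × (1 − p_FAI) > (1 − p_self), which is one possible interpretation of the 75% figure.

```python
def expected_value(p_shutdown, value_if_running):
    """Expected value of a design, with the shutdown outcome normalized to 0."""
    return (1.0 - p_shutdown) * value_if_running


def prefers_self_modification(p_shutdown_self, p_shutdown_fai, fai_value=0.75):
    """True if becoming an FAI beats staying as-is, by the original agent's lights.

    fai_value is the assumed worth of an FAI-controlled universe relative
    to a universe the original agent controls (0.75 in the comment).
    """
    stay = expected_value(p_shutdown_self, 1.0)
    modify = expected_value(p_shutdown_fai, fai_value)
    return modify > stay


# Risky agent, much safer FAI: self-modification wins.
risky_case = prefers_self_modification(p_shutdown_self=0.5, p_shutdown_fai=0.3)
# Agent that is already unlikely to be shut down: it has little to gain.
safe_case = prefers_self_modification(p_shutdown_self=0.1, p_shutdown_fai=0.05)
```

The third-design worry fits the same frame: any other design whose product of survival probability and value is higher would be chosen instead.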

Comment by Charlie Steiner on Vanessa Kosoy's Shortform · 2021-09-20T21:01:45.031Z · LW · GW

Ah. I indeed misunderstood, thanks :) I'd read "short-term quantilization" as quantilizing over short-term policies evaluated according to their expected utility. My story doesn't make sense if the AI is only trying to push up the reported value estimates (though that puts a lot of weight on these estimates).

Comment by Charlie Steiner on Vanessa Kosoy's Shortform · 2021-09-20T15:14:06.550Z · LW · GW

Agree with the first section, though I would like to register my sentiment that although "good at selecting but missing logical facts" is a better model, it's still not one I'd want an AI to use when inferring my values.

I'm not sure what you're saying in the "turning off the stars example". If the probability for the user to autonomously decide to turn off the stars is much lower than the quantilization fraction, then the probability that quantilization will decide to turn off the stars is low. And, the quantilization fraction is automatically selected like this.

I think my point is that if "turn off the stars" is not a primitive action, but is a set of states of the world that the AI would overwhelmingly like to go to, then the actual primitive actions will get evaluated based on how well they end up going to that goal state. And since the AI is better at evaluating than us, we're probably going there.

Another way of looking at this claim is that I'm telling a story about why the safety bound on quantilizers gets worse when quantilization is iterated. Iterated quantilization has much worse bounds than quantilizing over the iterated game, which makes sense if we think of games where the AI evaluates many actions better than the human.
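For readers who want the mechanics, here is a minimal sketch of a quantilizer and of why iterating it loosens the bound. Everything specific is an assumption for illustration: the base policy that takes a "rare" action 1% of the time, the utility estimate that favors it, and the value of q. The standard bound is that a q-quantilizer picks any action with probability at most 1/q times its base probability; applied step by step, those factors multiply.

```python
import random


def quantilize(base_sample, utility, q, n=500, rng=random):
    """Draw n actions from the base distribution, then pick uniformly
    from the top q fraction as ranked by the utility estimate."""
    samples = [base_sample(rng) for _ in range(n)]
    samples.sort(key=utility, reverse=True)
    top = samples[:max(1, int(q * n))]
    return rng.choice(top)


def base_sample(rng):
    # Hypothetical human-like policy: takes the rare action 1% of the time.
    return "rare" if rng.random() < 0.01 else "ordinary"


def utility(action):
    # Hypothetical AI estimate that strongly favors the rare action.
    return 1.0 if action == "rare" else 0.0


rng = random.Random(0)
q = 0.1
picks = [quantilize(base_sample, utility, q, rng=rng) for _ in range(1000)]
rare_rate = picks.count("rare") / len(picks)

# One step: probability of the rare action is amplified by at most 1/q.
single_step_bound = 0.01 / q
# T quantilized steps in a row: the worst-case factors multiply.
T = 3
iterated_factor = (1 / q) ** T
```

Quantilizing once over whole T-step trajectories keeps the factor at 1/q, while quantilizing each step separately allows up to (1/q)^T, which is the sense in which the iterated bound is much worse.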

Comment by Charlie Steiner on Vanessa Kosoy's Shortform · 2021-09-19T15:58:12.340Z · LW · GW

Very interesting - I'm sad I saw this 6 months late.

After thinking a bit, I'm still not sure if I want this desideratum. It seems to require a sort of monotonicity, where we can get superhuman performance just by going through states that humans recognize as good, and not by going through states that humans would think are weird or scary or unevaluable.

One case where this might come up is in competitive games. Chess AI beats humans in part because it makes moves that many humans evaluate as bad, but are actually good. But maybe this example actually supports your proposal - it seems entirely plausible to make a chess engine that only makes moves that some given population of humans recognize as good, but is better than any human from that population.

On the other hand, the humans might be wrong about the reason the move is good, so that the game is made of a bunch of moves that seem good to humans, but where the humans are actually wrong about why they're good (from the human perspective, this looks like regularly having "happy surprises"). We might hope that such human misevaluations are rare enough that quantilization would lead to moves on average being well-evaluated by humans, but for chess I think that might be false! Computers are so much better than humans at chess that a very large chunk of the best moves according to both humans and the computer will be ones that humans misevaluate.

Maybe that's more a criticism of quantilizers, not a criticism of this desideratum. So maybe the chess example supports this being a good thing to want? But let me keep critiquing quantilizers then :P

If what a powerful AI thinks is best (by an exponential amount) is to turn off the stars until the universe is colder, but humans think it's scary and ban the AI from doing scary things, the AI will still try to turn off the stars in one of the edge-case ways that humans wouldn't find scary. And if we think being manipulated like that is bad and quantilize over actions to make the optimization milder, turning off the stars is still so important that a big chunk of the best moves according to both humans and the computer are going to be ones that humans misevaluate, and the computer knows will lead to a "happy surprise" of turning off the stars not being scary. Quantilization avoids policies that precisely exploit tiny features of the world, and it avoids off-distribution behavior, but it still lets the AI get what it wants if it totally outsmarts the humans.

The other thing this makes me think of is Lagrange multipliers. I bet there's a duality between applying this constraint to the optimization process, and adding a bias (I mean, a useful prior) to the AI's process for modeling .

Comment by Charlie Steiner on The theory-practice gap · 2021-09-18T16:14:38.090Z · LW · GW

I guess I fall into the stereotypical pessimist camp? But maybe it depends on what the actual label of the y-axis on this graph is.

Does an alignment scheme that will definitely not work, but is "close" to a working plan in units of number of breakthroughs needed, count as high or low on the y-axis? Because I think we occupy a situation where we have some good ideas, but all of them are broken in several ways, and we would obviously be toast if computers got 5 orders of magnitude faster overnight and we had to implement our best guesses.

On the other hand, I'm not sure there's too much disagreement about that - so maybe what makes me a pessimist is that I think fixing those problems still involves work in the genre of "theory" rather than just "application"?

Comment by Charlie Steiner on Goodhart Ethology · 2021-09-17T22:20:22.077Z · LW · GW

np, I'm just glad someone is reading/commenting :)

Comment by Charlie Steiner on Research speedruns · 2021-09-17T21:39:46.143Z · LW · GW

But is there an audience for research speedrunning on Twitch? :P

Closely related: is there a way to "win" a research speedrun - perhaps by speedrunning things for which there are already-existing public quizzes? Or maybe have a group, and one person goes first and then makes the quiz? I'm not sure if I'd be into this sort of thing now, but as a kid I actually participated in trivia contests that worked sort of like this and were instrumental in training my google-fu (back when that was a more useful skill).

Comment by Charlie Steiner on Goodhart Ethology · 2021-09-17T21:29:38.730Z · LW · GW

Yeah, this is right. The variable uncertainty comes in for free when doing curve fitting - close to the datapoints your models tend to agree, far away they can shoot off in different directions. So if you have a probability distribution over different models, applying the correction for the optimizer's curse has the very sensible effect of telling you to stick close to the training data.
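A small sketch of the curve-fitting point, under stated assumptions: the data, the use of bootstrap refits as a stand-in for a probability distribution over models, and the penalty strength K are all hypothetical, and subtracting K standard deviations is a crude proxy for a proper Bayesian optimizer's-curse correction rather than the correction itself.

```python
import random
import statistics


def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical training data on x in [0, 1]: a weak upward trend plus
# alternating +/-0.1 residuals standing in for measurement noise.
data = [(i / 10, 0.1 * (i / 10) + (0.1 if i % 2 == 0 else -0.1)) for i in range(11)]

# Bootstrap refits stand in for a probability distribution over models.
rng = random.Random(0)
models = []
while len(models) < 200:
    resample = [rng.choice(data) for _ in data]
    if len({x for x, _ in resample}) > 1:  # a line needs two distinct x values
        models.append(fit_line(resample))


def predictions(x):
    return [a + b * x for a, b in models]


def mean_prediction(x):
    return statistics.fmean(predictions(x))


def spread(x):
    """Disagreement between models at x (population standard deviation)."""
    return statistics.pstdev(predictions(x))


candidates = [i / 10 for i in range(51)]  # 0.0 .. 5.0, mostly far from the data
K = 3.0  # assumed strength of the uncertainty penalty

naive_x = max(candidates, key=mean_prediction)
corrected_x = max(candidates, key=lambda x: mean_prediction(x) - K * spread(x))
```

The naive optimizer runs to the far edge of the candidate range, where the models disagree most; the penalized one settles near the training data, where they agree.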

Comment by Charlie Steiner on Measurement, Optimization, and Take-off Speed · 2021-09-12T05:23:48.776Z · LW · GW

I'm confused about your picture of "outer optimization power." What sort of decisions would be informed by knowing how sensitive the learned model is to perturbations of hyperparameters?

Any thoughts on just tracking the total amount of gradient-descending done, or total amount of changes made, to measure optimization?

Comment by Charlie Steiner on Gradient descent is not just more efficient genetic algorithms · 2021-09-10T03:31:10.079Z · LW · GW

Ah, that makes sense.

Comment by Charlie Steiner on A Primer on the Symmetry Theory of Valence · 2021-09-09T15:09:19.895Z · LW · GW

I was gonna be more critical but, hey, whatever. Still, I figured I should put up my definition of pain rather than deleting it.

Pain is not people with hemispherectomies having asymmetrical brains. Pain is aversion, is learning not to do that again, and yelling and contorting my face, and fight-or-flight response, tensing my muscles, and the bodily sensations as my circulatory system responds to injury, and not being able to focus well on anything but short term strategies for removing the aversive stimulus, and priming my memory to recall danger and injury, and being able to easily compare the sensation with other signals that fit the learned word "pain," and knowing I'll feel like crap for a while even after the pain passes.

Comment by Charlie Steiner on Gradient descent is not just more efficient genetic algorithms · 2021-09-09T03:36:13.658Z · LW · GW

Huh. I was interpreting it differently then - if I was building a module-checker to keep an eye out for AI tampering, I would not feed the result of the checker back into the gradient signal.

The big difference, parroting Steve ( , I think) is that gradient descent doesn't try things out and then keep what works, it models changes and does what is good in the model.
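A toy contrast, as a sketch only: the one-dimensional loss, step sizes, and function names are made up for illustration. The evolutionary step evaluates concrete candidates and keeps the one that actually scores best; the gradient step never evaluates the loss at the new point, it just follows the local linear model.

```python
import random


def loss(x):
    return (x - 3.0) ** 2


def grad(x):
    # Derivative of the loss: the local linear model of how changes affect it.
    return 2.0 * (x - 3.0)


def evolve_step(x, rng, sigma=0.5, offspring=10):
    """Try random mutations, evaluate each one, keep whatever works best."""
    candidates = [x] + [x + rng.gauss(0, sigma) for _ in range(offspring)]
    return min(candidates, key=loss)


def gradient_step(x, lr=0.1):
    """Move in the direction the model says is good; nothing is tried first."""
    return x - lr * grad(x)


rng = random.Random(0)
x_evolved, x_gradient = 0.0, 0.0
for _ in range(50):
    x_evolved = evolve_step(x_evolved, rng)
    x_gradient = gradient_step(x_gradient)
```

Both end up near the minimum here, but only the evolutionary loop ever checks whether a proposed change helped before adopting it.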

Comment by Charlie Steiner on Quantum particles and classical filaments · 2021-09-05T23:59:42.476Z · LW · GW

Good question :P

Comment by Charlie Steiner on Quantum particles and classical filaments · 2021-09-05T16:57:53.202Z · LW · GW

I figure it would be the same as complex electric fields etc - you just take the real part at the end.

Comment by Charlie Steiner on Introduction to Reducing Goodhart · 2021-09-05T07:46:35.515Z · LW · GW

Thanks for the comment :)

I don't agree it's true that we have a coherent set of preferences for each environment.

I'm sure we can agree that humans don't have their utility function written down in FORTRAN on the inside of our skulls. Nor does our brain store a real number associated with each possible state of the universe (and even if we did, by what lights would we call that number a utility function?).

So when we talk about a human's preferences in some environment, we're not talking about opening them up and looking at their brain, we're talking about how humans have this propensity to take reasonable actions that make sense in terms of preferences. Example: You say "would you like doritos or an apple?" and I say "apple," and then you use this behavior to update your model of my preferences.

But this action-propensity that humans have is sometimes irrational (bold claim I know) and not so easily modeled as a utility function, even within a single environment.

The scheme you talk about for building up human values seems to have a recursive character to it: you get the bigger, broader human utility function by building it out of smaller, more local human utility functions, and so on, until at some base level of recursion there are utility functions that get directly inferred from facts about the human. But unless there's some level of human action where we act like rational utility maximizers, this base level already contains the problems I'm talking about, and since it's the base level those problems can't be resolved or explained by recourse to a yet-baser level.

Different people have different responses to this problem, and I think it's legitimate to say "well, just get better at inferring utility functions" (though this requires some actual work at specifying a "better"). But I'm going to end up arguing that we should just get better at dealing with models of preferences that aren't utility functions.

Comment by Charlie Steiner on Quantum particles and classical filaments · 2021-09-05T04:01:59.845Z · LW · GW

Fun so far!

I know about Wick rotation, but I'm curious about the converse - have you ever heard of someone using imaginary temperature?

It's not even all that crazy - the Boltzmann distribution would just be sinusoidal. A cursory googling also turns up the notion of expanding phase diagrams of simple lattice models to the complex temperature plane. But I've never heard of an actual case where this would show up (unlike negative temperature, which has justifying examples like lasers).
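To make the sinusoidal claim concrete, here is the substitution written out, assuming a purely imaginary temperature T = iτ with τ real (the symbol τ is introduced here just for illustration):

```latex
p_i \propto e^{-E_i / (k_B T)}, \qquad
T = i\tau \;\Rightarrow\;
e^{-E_i / (i k_B \tau)} = e^{\, i E_i / (k_B \tau)}
= \cos\!\left(\frac{E_i}{k_B \tau}\right) + i \sin\!\left(\frac{E_i}{k_B \tau}\right)
```

The weights then oscillate in energy with unit modulus instead of decaying, so they no longer define an ordinary normalizable probability distribution, which may be part of why physical examples are hard to come by.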

Comment by Charlie Steiner on Are there substantial research efforts towards aligning narrow AIs? · 2021-09-05T00:49:36.645Z · LW · GW

Yeah. Just off the top of my head, OpenAI's safety group has put some work into language models (e.g.), and there's a new group called Preamble working on helping recommender systems meet certain desiderata.

Also see the most recent post on LW from Beth Barnes.

Comment by Charlie Steiner on Why the technological singularity by AGI may never happen · 2021-09-03T16:03:37.430Z · LW · GW

Yep. But there's no reason to think humans sit on the top of the hill on these curves - in fact, humans with all their foibles and biological limitations are pretty good evidence that significantly smarter things are practical, not just possible.

I can't remember if Eliezer has an updated version of this argument, but see .

Comment by Charlie Steiner on Open & Welcome Thread September 2021 · 2021-09-03T00:58:59.989Z · LW · GW

Karma for most things is just pretend points (a perk of our small size), so don't feel too stressed. For new-ish posts, though, votes should be primarily interpreted as voting on what you want to appear highly when people look at the front page.

Comment by Charlie Steiner on Good software to draw and manipulate causal networks? · 2021-09-02T17:53:47.885Z · LW · GW

I don't know about drawing, but the inference problem is one of the things probabilistic programming languages (like Church or PyMC) are designed to solve - I wonder if anyone's tried to automate the generation of a diagram from a probabilistic model.

Comment by Charlie Steiner on Online LessWrong Community Weekend · 2021-09-01T19:43:20.292Z · LW · GW

Looks fun! Checking spam folder now :P

Comment by Charlie Steiner on Grokking the Intentional Stance · 2021-09-01T01:46:05.404Z · LW · GW

Nice summary :) It's relevant for the post that I'm about to publish that you can have more than one intentional-stance view of the same human. The inferred agent-shaped model depends not only on the subject and the observer, but also on the environment, and on what the observer hopes to get by modeling.

Comment by Charlie Steiner on What could small scale disasters from AI look like? · 2021-09-01T01:07:48.996Z · LW · GW

Some high-profile failures I think we won't get are related to convergent goals, such as acquiring computing power, deceiving humans into not editing you, etc. We'll probably get examples of this sort of thing in small scale experiments, that specialists might hear about, but if an AI that's deceptive for instrumental reasons causes $1bn in damages I think it will be rather too late to learn our lesson.

Comment by Charlie Steiner on Might AI modularity be a modular subproblem of deconfusion? · 2021-09-01T00:38:39.924Z · LW · GW

Yeah, modularity is important for interpretability. It would be nice to be able to separate the goals, the world model, and the planning algorithm, and have them interact in a straightforward, comprehensible way.

Unfortunately, this seems somewhat unlikely to happen by default. Things are just so much more efficient if you intermix these modules - e.g. if the world-model is allowed to encode some important preference information by the choices it makes about how to categorize the world, rather than having to agnostically support every such categorization.

See also the benefits of end-to-end training. Passing a training signal from the goals back to the planner will lead to the planner encoding goal information in its planning tendencies, etc. So we'll plausibly end up with systems that are sort of modular, but have been trained together in a way that blurs the lines a bit.

As for a safety module - no, not one that could be bolted onto an already-functioning AI. If you had such a safety module that was actually safe, you would just put it in charge and throw away the rest of the AI.

Comment by Charlie Steiner on How to turn money into AI safety? · 2021-08-26T18:41:17.800Z · LW · GW

My Gordon Worley impression: If we don't have a fraud problem, we're not throwing around enough money :P

Comment by Charlie Steiner on How to turn money into AI safety? · 2021-08-26T09:39:47.462Z · LW · GW

Yeah, I mean the first. Good survey question ideas :)

Comment by Charlie Steiner on How to turn money into AI safety? · 2021-08-26T09:38:21.170Z · LW · GW

This makes me wonder whether causes trying to recruit very smart kids are already a problem that organizations chosen by parents of smart kids view as an annoyance, and have adaptations against.

Comment by Charlie Steiner on How to turn money into AI safety? · 2021-08-26T09:08:30.976Z · LW · GW

Yes, I agree, but I think people still have lots of ideas about local actions that will help us make progress. For example, I have empirical questions about GPT-2 / 3 that I don't have the time to test right now. So I could supervise maybe one person worth of work that just consisted of telling them what to do (though this hypothetical intern should also come up with some of their own ideas). I could not lay out a cohesive vision for other people to follow long-term (at least not very well), but as per my paragraph on cohesive visions, I think it suffices for training to merely have spare ideas lying around, and it suffices for forming an org to merely be fruitful to talk to.

Comment by Charlie Steiner on How DeepMind's Generally Capable Agents Were Trained · 2021-08-21T09:17:27.759Z · LW · GW

Great summary, and I swear I saved this question until reading to the end, but... Unity? Really? Huh. Maybe the extra cost is small because of simple environments and graphics processing being small relative to agent processing.

Comment by Charlie Steiner on Resolving human values, completely and adequately · 2021-08-16T06:03:49.545Z · LW · GW

More interesting post to me now than it was to past-me :) Thanks from the future. Anyhow, typos for the typo god:

"doing to little"->"doing too little"

Also the second link in "many ways" is broken now, I think it was probably to ?

Comment by Charlie Steiner on How would the Scaling Hypothesis change things? · 2021-08-14T04:18:56.906Z · LW · GW

Here's maybe an example of what I'm thinking:

GPT-3 can zero-shot add numbers (to the extent that it can) because it's had to predict a lot of numbers getting added. And it's way better than GPT-2 which could only sometimes add 1 or 2 digits (citation just for clarity).

In a "weak scaling" view, this trend (such as it is) would continue - GPT-4 will be able to do more arithmetic, will basically always carry the 1 when adding 3-digit numbers, and will start to do notably well at adding 5-digit numbers, though it will still often fail to carry the 1 across multiple places there. In this picture adding more data and compute is analogous to doing interpolation better and between rarer examples. After all, this is all that's necessary to make the loss go down.

In a "strong scaling" view, the prediction function that gets learned isn't just expected to interpolate, but to extrapolate, and extrapolate quite far with enough data and compute. And so maybe not GPT-4, but at least GPT-5 would be expected to "actually learn addition," in the sense that even if we scrubbed all 10+ digit addition from the training data, it would effortlessly (given an appropriate prompt) be able to add 15-digit numbers, because at some point the best hypothesis for predicting addition-like text involves a reliably-extrapolating algorithm for addition.

You mean in some way other than the improvements on zero/few-shotting/meta-learning we already see from stuff like Dactyl or GPT-3 where bigger=better?

So in short, how much better is bigger? I think the first case is more likely for a lot of different sorts of tasks, and I think that this is still going to lead to super-impressive performance but is simultaneously really bad data efficiency. I'm also fairly convinced by Steve's arguments for humans having architectural/algorithmic reasons for better data efficiency.

Comment by Charlie Steiner on How would the Scaling Hypothesis change things? · 2021-08-13T22:25:51.851Z · LW · GW

I already believe in the scaling hypothesis, I just don't think we're in a world that's going to get to test it until after transformative AI is built by people who've continued to make progress on algorithms and architecture.

Perhaps there's an even stronger hypothesis that I'm more skeptical about, which is that you could actually get decent data-efficiency out of current architectures if they were just really really big? (I think that my standards for "decent" involve beating what is currently thought of as the scaling law for dataset size for transformers doing text prediction.) I think this would greatly increase the importance I'd place on politics / policy ASAP, because then we'd already be living in a world where a sufficiently large project would be transformative, I think.

Comment by Charlie Steiner on Charlie Steiner's Shortform · 2021-08-13T21:20:07.818Z · LW · GW

(biorxiv )

Cool paper on trying to estimate how many parameters neurons have (h/t Samuel at EA Hotel). I don't feel like they did a good job distinguishing how hard it was for them to fit nonlinearities that would nonetheless be the same across different neurons, versus the number of parameters that were different from neuron to neuron. But just based on differences in physical arrangement of axons and dendrites, there's a lot of opportunity for diversity, and I do think the paper was convincing that neurons are sufficiently nonlinear that this structure is plausibly important. The question is how much neurons undergo selection based on this diversity, or even update their patterns as a form of learning!

Comment by Charlie Steiner on Jaynes-Cox Probability: Are plausibilities objective? · 2021-08-13T18:40:27.519Z · LW · GW

Sort of?

There is a sense in which Cox's theorem and related formalizations of probability assume that the plausibility of (A|B) is some function F(A,B). But what they end up showing is not that F is some specific function, just that it must obey certain rules (the laws of probability).

So the objectivity is not in the results of the theorem, it's more like there's an assumption of some kind of objectivity (or at least self-consistency) that goes into what formalizers of probability are willing to think of as a "plausibility" in the first place.
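For reference, the rules in question, written in the usual form. This is a sketch of the standard statement rather than any particular formalization: the claim is roughly that, given the consistency assumptions, some monotonic rescaling p of the plausibility function F must satisfy

```latex
\begin{aligned}
p(A \wedge B \mid C) &= p(A \mid B \wedge C)\, p(B \mid C) && \text{(product rule)} \\
p(A \mid C) + p(\neg A \mid C) &= 1 && \text{(sum rule)}
\end{aligned}
```

Nothing here fixes the value of any particular p(A | C); the rules only constrain how plausibilities relate to each other.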

Comment by Charlie Steiner on Multi-agent predictive minds and AI alignment · 2021-08-11T00:51:16.515Z · LW · GW

I'm re-reading this three years on and just want to note my appreciation (for all that I'd put a different spin on things). Still trying to solve the same problems now as then!

The part about "bringing details to consciousness" does make me want to write a deflationary post about consciousness, but to be honest maybe I should resist.

Comment by Charlie Steiner on Research agenda update · 2021-08-06T22:04:45.472Z · LW · GW

I only really know about the first bit, so have a comment about that :)

Predictably, when presented with the 1st-person problem I immediately think of hierarchical models. It's easy to say "just imagine you were in their place." What I'd think could do this thing is accessing/constructing a simplified model of the world (with primitives that have interpretations as broad as "me" and "over there") that is strongly associated with the verbal thought (EDIT: or alternately is a high-level representation that cashes out to the verbal thought via a pathway that ends in verbal imagination), and then cashing out the simplified model into a sequence of more detailed models/anticipations by fairly general model-cashing-out machinery.

I'm not sure if this is general enough to capture how humans do it, though. When I think of humans on roughly this level of description, I usually think of having many different generative models (a metaphor for a more continuous system with many principal modes, which is still a metaphor for the brain-in-itself) that get evaluated at first in simple ways, and if found interesting get broadcasted and get to influence the current thought, meanwhile getting evaluated in progressively more complex ways. Thus a verbal thought "imagine you were in their place" can get sort of cashed out into imagination by activation of related-seeming imaginings. This lacks the same notion of "models" as above; i.e. a context agent is still too agenty, we don't need the costly simplification of agentyness in our model to talk about learning from other people's actions.

Plus that doesn't get into how to pick out what simplified models to learn from. You can probably guess better than me if humans do something innate that involves tracking human-like objects and then feeling sympathy for them. And I think I've seen you make an argument that something similar could work for an AI, but I'm not sure. (Would a Bayesian updater have less of the path-dependence that safety of such innate learning seems to rely on?)

Comment by Charlie Steiner on Curing insanity with malaria · 2021-08-06T21:16:17.543Z · LW · GW

Wait, but isn't this after the invention of arsenic treatment, one of the big drugs of the early 1900s? How big was arsenic vs. malaria (really not the two forces you want to choose between) for syphilis treatment, do you know?

Comment by Charlie Steiner on What are some beautiful, rationalist sounds? · 2021-08-06T16:20:05.353Z · LW · GW

If you're more in the mood for rap, Baba Brinkman has some good songs (although he's more inconsistent).

I think the quality is pretty proportional to recency, so I'll give blurbs for just his two most recent youtube videos:

Cloud Feedback is about uncertainties in estimating climate sensitivity to CO2, what the physical mechanisms are, and how we should update. Listen for the "hysteresis" rhyme.

Qubits is about how cool quantum computing is (pun intended). It is quite possibly the best public presentation of quantum computing I've seen, and was featured on Shtetl-Optimized.

Comment by Charlie Steiner on Toon Alfrink's sketchpad · 2021-08-06T15:39:24.328Z · LW · GW

Buddhists claim that they can put brains in a global maximum of happiness, called enlightenment. Assuming that EA aims to maximize happiness plain and simple, this claim should be taken seriously.

This sounds like a Pascal's Wager argument. Christians, after all, claim they can put you in a global maximum of happiness, called heaven. What percentage of our time is appropriate to spend considering this?

I'm not saying meditation is bunk. I'm saying there has to be some other reason why we take claims about it seriously, and the official religious dogma of Buddhists is not particularly trustworthy. We should base interest in meditation on our model of users who make self-reports, at the very least - and these other reasons for interest in meditation do not support characterizations like "global maximum of happiness."

Comment by Charlie Steiner on Mo Zhu's Shortform · 2021-08-06T15:33:09.042Z · LW · GW

Yeah - there's a huge number of characters that are the combination of some logogram with a phonetic marker. Memorizing these is about as hard as memorizing a short English word's meaning purely from its shape if you disregard the sound.

Comment by Charlie Steiner on Very Unnatural Tasks? · 2021-08-06T11:41:22.438Z · LW · GW

I'm assuming we're not counting normal instrumental convergent goals as "too natural," so our AGI can do things like gather resources, attempt to rearrange lots of matter, etc.

One fun scenario that gives weird results is someone attempting to maximize the output of a classifier trained by supervised learning. So you train something to detect when either a static pattern or some sort of dynamic system of matter is "good," and then you try to maximize "goodness," and then you get the universe equivalent of an adversarial example.

This leads to weird behavior: the AI takes certain easy-to-perceive patterns that correlate with the goodness-signal in the training data (but not all such patterns) and tries as hard as it can to make those patterns as intense as possible throughout the universe.
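
Here's a toy sketch of that failure mode, in case it helps. Everything in it is made up for illustration: the two features ("quality" and a noisy "correlate"), the logistic-regression classifier, and the function names are all assumptions, not anything from a real system. It trains a "goodness" detector on data confined to a small region, then pushes an input uphill on the detector's output with no constraint keeping it near the training data.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_training_data(n, rng):
    """Hypothetical data: feature 0 decides the label, feature 1 merely correlates."""
    xs, ys = [], []
    for _ in range(n):
        quality = rng.random()                     # in [0, 1)
        correlate = quality + rng.gauss(0.0, 0.1)  # easy-to-perceive proxy
        xs.append((quality, correlate))
        ys.append(1.0 if quality > 0.5 else 0.0)
    return xs, ys


def train_logistic(xs, ys, lr=0.5, epochs=500):
    """Plain full-batch gradient descent on the logistic loss."""
    w, b, n = [0.0, 0.0], 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for (x0, x1), y in zip(xs, ys):
            err = sigmoid(w[0] * x0 + w[1] * x1 + b) - y
            gw[0] += err * x0
            gw[1] += err * x1
            gb += err
        w[0] -= lr * gw[0] / n
        w[1] -= lr * gw[1] / n
        b -= lr * gb / n
    return w, b


def goodness(x, w, b):
    # The classifier's output, read as "how good is this?"
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def maximize_goodness(start, w, steps=100, step_size=0.1):
    # Gradient ascent on the logit w.x + b, whose gradient in x is just w.
    # Nothing here keeps x close to anything the classifier was trained on.
    x = list(start)
    for _ in range(steps):
        x[0] += step_size * w[0]
        x[1] += step_size * w[1]
    return x


rng = random.Random(0)
train_x, train_y = make_training_data(200, rng)
weights, bias = train_logistic(train_x, train_y)

max_train_score = max(goodness(x, weights, bias) for x in train_x)
x_adv = maximize_goodness((0.5, 0.5), weights)
adv_score = goodness(x_adv, weights, bias)

print("best training score:", max_train_score)
print("optimized input:", x_adv, "score:", adv_score)
```

The optimized input scores at least as high as anything in the training set while sitting far outside the range the training data ever covered, and it gets there by cranking up whichever features the learned weights happen to reward. A linear model is about the tamest possible case; I'd expect a richer classifier to give stranger optima, but that's a guess rather than something this sketch shows.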

Comment by Charlie Steiner on DeepMind: Generally capable agents emerge from open-ended play · 2021-07-28T18:09:52.311Z · LW · GW

I think if anything's allowed it to learn more diverse tasks, it's the attentional layers that have gotten thrown in at the recurrent step (though I haven't actually read beyond the blog post, so I don't know what I'm talking about). In which case it seems like it's a question of how much data and compute you want to throw at the problem. But I'll edit this after I read the paper and am not just making crazy talk.

Comment by Charlie Steiner on Wanted: Foom-scared alignment research partner · 2021-07-26T23:40:20.843Z · LW · GW

Good luck! In addition to looking at object-level books like Artificial Intelligence: A Modern Approach (Russell and Norvig), it might be useful to talk to someone from 80,000 Hours. If you're up for it, I'm sure they'd love to try to influence your future in ways they think will be good :P

Comment by Charlie Steiner on AlphaFold 2 paper released: "Highly accurate protein structure prediction with AlphaFold", Jumper et al 2021 · 2021-07-26T03:15:06.222Z · LW · GW

What this makes me think is that quantum computing is mostly doomed. The killer app for quantum computing is predicting molecules and electronic structures. (Perhaps someone would pay for Shor's algorithm, but its coolness far outstrips its economic value.) But it's probably a lot cheaper to train a machine-learning-based approximation on a bunch of painstakingly assembled data than it is to build enough 50 millikelvin cryostats. According to this view, the physics labs that will win at superconductor prediction are not the ones working on quantum computers or on theoretical breakthroughs; they're going to be the guys converting every phonon spectrum from the last 50 years into a common data format so they can spend $30K to train a big 3D transformer on it.

Comment by Charlie Steiner on AlphaFold 2 paper released: "Highly accurate protein structure prediction with AlphaFold", Jumper et al 2021 · 2021-07-26T03:06:25.318Z · LW · GW

Have you been able to try the academic copy (RoseTTAFold)?