Posts

Mediation From a Distance 2020-03-20T22:02:46.545Z · score: 12 (4 votes)
Alignment as Translation 2020-03-19T21:40:01.266Z · score: 44 (15 votes)
Abstraction = Information at a Distance 2020-03-19T00:19:49.189Z · score: 21 (4 votes)
Positive Feedback -> Optimization? 2020-03-16T18:48:52.297Z · score: 19 (7 votes)
Adaptive Immune System Aging 2020-03-13T03:47:22.056Z · score: 59 (20 votes)
Please Press "Record" 2020-03-11T23:56:27.699Z · score: 37 (9 votes)
Trace README 2020-03-11T21:08:20.669Z · score: 31 (9 votes)
Name of Problem? 2020-03-09T20:15:11.760Z · score: 9 (2 votes)
The Lens, Progerias and Polycausality 2020-03-08T17:53:30.924Z · score: 58 (20 votes)
Interfaces as a Scarce Resource 2020-03-05T18:20:26.733Z · score: 112 (31 votes)
Trace: Goals and Principles 2020-02-28T23:50:12.900Z · score: 11 (3 votes)
johnswentworth's Shortform 2020-02-27T19:04:55.108Z · score: 8 (1 votes)
Value of the Long Tail 2020-02-26T17:24:28.707Z · score: 45 (15 votes)
Theory and Data as Constraints 2020-02-21T22:00:00.783Z · score: 32 (8 votes)
Exercises in Comprehensive Information Gathering 2020-02-15T17:27:19.753Z · score: 97 (41 votes)
Demons in Imperfect Search 2020-02-11T20:25:19.655Z · score: 68 (21 votes)
Category Theory Without The Baggage 2020-02-03T20:03:13.586Z · score: 97 (33 votes)
What Money Cannot Buy 2020-02-01T20:11:05.090Z · score: 192 (86 votes)
Algorithms vs Compute 2020-01-28T17:34:31.795Z · score: 28 (6 votes)
Coordination as a Scarce Resource 2020-01-25T23:32:36.309Z · score: 95 (30 votes)
Material Goods as an Abundant Resource 2020-01-25T23:23:14.489Z · score: 57 (21 votes)
Constraints & Slackness as a Worldview Generator 2020-01-25T23:18:54.562Z · score: 29 (12 votes)
Technology Changes Constraints 2020-01-25T23:13:17.428Z · score: 70 (26 votes)
Theory of Causal Models with Dynamic Structure? 2020-01-23T19:47:22.825Z · score: 25 (5 votes)
Formulating Reductive Agency in Causal Models 2020-01-23T17:03:44.758Z · score: 28 (6 votes)
(A -> B) -> A in Causal DAGs 2020-01-22T18:22:28.791Z · score: 33 (8 votes)
Logical Representation of Causal Models 2020-01-21T20:04:54.218Z · score: 34 (8 votes)
Use-cases for computations, other than running them? 2020-01-19T20:52:01.756Z · score: 29 (9 votes)
Example: Markov Chain 2020-01-10T20:19:31.309Z · score: 15 (4 votes)
How to Throw Away Information in Causal DAGs 2020-01-08T02:40:05.489Z · score: 15 (2 votes)
Definitions of Causal Abstraction: Reviewing Beckers & Halpern 2020-01-07T00:03:42.902Z · score: 20 (5 votes)
Homeostasis and “Root Causes” in Aging 2020-01-05T18:43:33.038Z · score: 56 (22 votes)
Humans Are Embedded Agents Too 2019-12-23T19:21:15.663Z · score: 74 (20 votes)
Causal Abstraction Intro 2019-12-19T22:01:46.140Z · score: 23 (6 votes)
Abstraction, Causality, and Embedded Maps: Here Be Monsters 2019-12-18T20:25:04.584Z · score: 25 (7 votes)
Is Causality in the Map or the Territory? 2019-12-17T23:19:24.301Z · score: 23 (11 votes)
Examples of Causal Abstraction 2019-12-12T22:54:43.565Z · score: 21 (5 votes)
Causal Abstraction Toy Model: Medical Sensor 2019-12-11T21:12:50.845Z · score: 30 (10 votes)
Applications of Economic Models to Physiology? 2019-12-10T18:09:43.494Z · score: 38 (8 votes)
What is Abstraction? 2019-12-06T20:30:03.849Z · score: 26 (7 votes)
Paper-Reading for Gears 2019-12-04T21:02:56.316Z · score: 119 (40 votes)
Gears-Level Models are Capital Investments 2019-11-22T22:41:52.943Z · score: 89 (33 votes)
Wrinkles 2019-11-19T22:59:30.989Z · score: 66 (25 votes)
Evolution of Modularity 2019-11-14T06:49:04.112Z · score: 83 (29 votes)
Book Review: Design Principles of Biological Circuits 2019-11-05T06:49:58.329Z · score: 126 (52 votes)
Characterizing Real-World Agents as a Research Meta-Strategy 2019-10-08T15:32:27.896Z · score: 27 (10 votes)
What funding sources exist for technical AI safety research? 2019-10-01T15:30:08.149Z · score: 24 (8 votes)
Gears vs Behavior 2019-09-19T06:50:42.379Z · score: 53 (19 votes)
Theory of Ideal Agents, or of Existing Agents? 2019-09-13T17:38:27.187Z · score: 16 (8 votes)
How to Throw Away Information 2019-09-05T21:10:06.609Z · score: 20 (7 votes)

Comments

Comment by johnswentworth on Alignment as Translation · 2020-04-01T17:12:18.414Z · score: 2 (1 votes) · LW · GW

That's a marginal cost curve at a fixed time. Its shape is not directly relevant to the long-run behavior; what's relevant is how the curve moves over time. If any fixed quantity becomes cheaper and cheaper over time, approaching (but never reaching) zero as time goes on, then the price goes to zero in the limit.

Consider Moore's law, for example: the marginal cost curve for compute looks U-shaped at any particular time, but over time the cost of compute falls like e^(-kt), with k around ln(2)/(18 months).
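
For concreteness, a quick numeric sketch of that decay (assuming an idealized, clean 18-month halving):

```python
import numpy as np

k = np.log(2) / 1.5                    # per year, for an 18-month halving time
years = np.arange(0, 11)
relative_cost = np.exp(-k * years)     # cost of a fixed amount of compute, relative to year 0
print(dict(zip(years.tolist(), relative_cost.round(3).tolist())))
# after 10 years the same compute costs ~1% of today's price; the limit is zero, never reached
```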

Comment by johnswentworth on Alignment as Translation · 2020-03-31T17:32:34.393Z · score: 2 (1 votes) · LW · GW

Of course the limit can't be reached, that's the entire reason why people use the phrase "in the limit".

Comment by johnswentworth on johnswentworth's Shortform · 2020-03-30T20:47:16.835Z · score: 2 (1 votes) · LW · GW

For short-term, individual cost/benefit calculations around C19, it seems like uncertainty in the number of people currently infected should drop out of the calculation.

For instance: suppose I'm thinking about the risk associated with talking to a random stranger, e.g. a cashier. My estimated chance of catching C19 from this encounter will be roughly proportional to N, the number of people currently infected. But, assuming we already have reasonably good data on number hospitalized/died, my chances of hospitalization/death given infection will be roughly inversely proportional to N. So, multiplying those two together, I'll get a number roughly independent of N.

How general is this? Does some version of it apply to long-term scenarios too (possibly accounting for herd immunity)? What short-term decisions do depend on N?
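
Here's a minimal sketch of the cancellation (a toy model: it ignores the per-encounter transmission probability, the lag between infection and death, testing biases, and so on, none of which depend on N):

```python
# Toy model: personal risk from one encounter, under two very different guesses of N.
def personal_risk(N, deaths_observed, population, contacts):
    p_stranger_infected = N / population               # roughly proportional to N
    p_death_given_infection = deaths_observed / N      # inferred severity, roughly 1/N
    # product is contacts * deaths_observed / population -- N cancels out
    return contacts * p_stranger_infected * p_death_given_infection

print(personal_risk(N=1e5, deaths_observed=1000, population=1e7, contacts=10))  # ~0.001
print(personal_risk(N=1e6, deaths_observed=1000, population=1e7, contacts=10))  # ~0.001, same answer
```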

Comment by johnswentworth on Alignment as Translation · 2020-03-30T20:16:10.476Z · score: 2 (1 votes) · LW · GW
A finite sized computer cannot contain a fine-grained representation of the entire universe.

1/x cannot ever be zero for finite x, yet it approaches zero in the limit of large x. The OP makes exactly the same sort of claim: our software approaches omniscience in the limit.

Comment by johnswentworth on Alignment as Translation · 2020-03-30T20:11:32.293Z · score: 2 (1 votes) · LW · GW

The rules it's given are, presumably, at a low level themselves. (Even if that's not the case, the rules it's given are definitely not human-intelligible unless we've already solved the translation problem in full.)

The question is not whether the low-level AI will follow those rules, the question is what actually happens when something follows those rules. A python interpreter will not ever deviate from the simple rules of python, yet it still does surprising-to-a-human things all the time. The problem is accurately translating between human-intelligible structure and the rules given to the AI.

The problem is not that the AI might deviate from the given rules. The problem is that the rules don't always mean what we want them to mean.

Comment by johnswentworth on Alignment as Translation · 2020-03-28T23:50:32.282Z · score: 2 (1 votes) · LW · GW

I'm pretty sure none of this actually affects what I said: the low-level behavior still needs to produce results which are predictable to humans in order for predictability to be useful, and that's still hard.

The problem is that making an AI predictable to a human is hard. This is true regardless of whether or not it's doing any outside-the-box thinking. Having a human double-check the instructions given to a fast low-level AI does not make the problem any easier; the low-level AI's behavior still has to be understood by a human in order for that to be useful.

As you say toward the end, you'd need something like a human-readable communications protocol. That brings us right back to the original problem: it's hard to translate between humans' high-level abstractions and low-level structure. That's why AI is unpredictable to humans in the first place.

Comment by johnswentworth on Alignment as Translation · 2020-03-28T00:29:33.346Z · score: 4 (2 votes) · LW · GW
I think you get "ground truth data" by trying stuff and seeing whether or not the AI system did what you wanted it to do.

That's the sort of strategy where illusion of transparency is a big problem, from a translation point of view. The difficult cases are exactly the cases where the translation usually produces the results you expect, but then produces something completely different in some rare cases.

Another way to put it: if we're gathering data by seeing whether the system did what we wanted, then the long tail problem works against us pretty badly. Those rare tail-cases are exactly the cases we would need to observe in order to notice problems and improve the system. We're not going to have very many of them to work with. Ability to generalize from small data sets becomes a key capability, but then we need to translate how-to-generalize in order for the AI to generalize in the ways we want (this gets at the can't-ask-the-AI-to-do-anything-novel problem).

Comment by johnswentworth on Alignment as Translation · 2020-03-27T22:01:24.366Z · score: 2 (1 votes) · LW · GW

(The other comment is my main response, but there's a possibly-tangential issue here.)

In a long-tail world, if we manage to eliminate 95% of problems, then we generate maybe 10% of the value. So now we use our 10%-of-value product to refine our solution. But it seems rather optimistic to hope that a product which achieves only 10% of the value gets us all the way to a 99% solution. It seems far more likely that it gets to, say, a 96% solution. That, in turn, generates maybe 15% of the value, which in turn gets us to a 96.5% solution, and...

Point being: in the long-tail world, it's at least plausible (and I would say more likely than not) that this iterative strategy doesn't ever converge to a high-value solution. We get fancier and fancier refinements with decreasing marginal returns, which never come close to handling the long tail.
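
To make the worry concrete, here's a toy numeric sketch (the numbers are invented, just echoing the figures above): each refinement closes only half as much of the remaining gap as the previous one, so the iteration stalls well short of full coverage.

```python
q = 0.95       # fraction of problems handled by the initial solution
step = 0.01    # the first refinement adds one percentage point
for _ in range(100):
    q += step
    step *= 0.5    # diminishing returns: each refinement helps half as much as the last
print(q)           # ~0.97 -- converges, but never gets near handling the long tail
```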

Now, under this argument, it's still a fine idea to try the iterative strategy. But you wouldn't want to bet too heavily on its success, especially without a reliable way to check whether it's working.

Comment by johnswentworth on Alignment as Translation · 2020-03-27T21:43:41.248Z · score: 4 (2 votes) · LW · GW
An important part of my intuition about value-in-the-tail is that if your first solution can knock off 95% of the risk, you can then use the resulting AI system to design a new AI system where you've translated better and now you've eliminated 99% of the risk...

I don't see how this ever actually gets around the chicken-and-egg problem.

An analogy: we want to translate from English to Korean. We first obtain a translation dictionary which is 95% accurate, then use it to ask our Korean-speaking friend to help out. Problem is, there's a very important difference between very similar translations of "help me translate things" - e.g. consider the difference between "what would you say if you wanted to convey X?" and "what should I say if I want to convey X?", when giving instructions to an AI. Both of those would produce very similar results, right up until everything went wrong. (Let me know if this analogy sounds representative of the strategies you imagine.)

If you do manage to get that first translation exactly right, and successfully ask your friend for help, then you're good - similar to the "translate how-to-translate" strategy from the OP. And with a 95% accurate dictionary, you might even have a decent chance of getting that first translation right. But if that first translation isn't perfect, then you need some way to find that out safely - and the 95% accurate dictionary doesn't make that any easier.

Another way to look at it: the chicken-and-egg problem is a ground truth problem. If we have enough data to estimate X to within 5%, then doing clever things with that data is not going to reduce that error any further. We need some other way to get at the ground truth, in order to actually reduce the error rate. If we know how to convey what-we-want with 95% accuracy, then we need some other way to get at the ground truth of translation in order to increase that accuracy further.

Comment by johnswentworth on Alignment as Translation · 2020-03-27T19:16:17.071Z · score: 4 (2 votes) · LW · GW

Endorsed; that definitely captures the key ideas.

If you haven't already, you might want to see my answer to Steve's comment, on why translation to low-level structure is the right problem to think about even if the AI is using higher-level models.

Comment by johnswentworth on Alignment as Translation · 2020-03-27T19:00:51.600Z · score: 6 (3 votes) · LW · GW

I agree with most of this reasoning. I think my main point of departure is that I expect most of the value is in the long tail, i.e. eliminating 95% of problems generates <10% or maybe even <1% of the value. I expect this both in the sense that eliminating 95% of problems unlocks only a small fraction of economic value, and in the sense that eliminating 95% of problems removes only a small fraction of risk. (For the economic value part, this is mostly based on industry experience trying to automate things.)

Optimization is indeed the standard argument for this sort of conclusion, and is a sufficient condition for eliminating 95% of problems to have little impact on risk. But again, it's not a necessary condition - if the remaining 5% of problems are still existentially deadly and likely to come up eventually (but not often enough to be caught in testing), then risk isn't really decreased. And that's exactly the sort of situation I expect when viewing translation as the central problem: illusion of transparency is exactly the sort of thing which doesn't seem like a problem 95% of the time, right up until you realize that everything was completely broken all along.

Anyway, sounds like value-in-the-tail is a central crux here.

Comment by johnswentworth on Alignment as Translation · 2020-03-27T16:18:54.196Z · score: 2 (1 votes) · LW · GW

Predictable low-level behavior is not the same as predictable high-level behavior. When I write or read python code, I can have a pretty clear idea of what every line does in a low-level sense, but still sometimes be surprised by high-level behavior of the code.
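
As a concrete illustration (a standard Python example, nothing exotic): every line below does exactly what the language rules say, yet the high-level behavior routinely surprises people.

```python
def append_to(item, lst=[]):   # the default list is created once, at definition time
    lst.append(item)
    return lst

print(append_to(1))   # [1]
print(append_to(2))   # [1, 2] -- not [2]; the same default list is shared across calls
```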

We still need to translate what-humans-want into a low-level specification. "Making it predictable" at a low-level doesn't really get us any closer to predictability at the high-level (at least in the cases which are actually difficult in the first place). "Making it predictable" at a high-level requires translating high-level "predictability" into some low-level specification, which just brings us back to the original problem: translation is hard.

Comment by johnswentworth on What are the most plausible "AI Safety warning shot" scenarios? · 2020-03-27T00:34:38.422Z · score: 4 (2 votes) · LW · GW

I'd give it something in the 2%-10% range. Definitely not likely.

Comment by johnswentworth on What are the most plausible "AI Safety warning shot" scenarios? · 2020-03-26T23:38:58.858Z · score: 7 (4 votes) · LW · GW

One of the basic problems in the embedded agency sequence is: how does an agent recognize its own physical instantiation in the world, and avoid e.g. dropping a big rock on the machine it's running on? One could imagine an AI with enough optimization power to be dangerous, which gets out of hand but then drops a metaphorical rock on its own head - i.e. it doesn't realize that destroying a particular data center will shut the AI itself down.

Similarly, one could imagine an AI which tries to take over the world, but doesn't realize that unplugging the machine on which it's running will shut it down - because it doesn't model itself as embedded in the world. (For similar reasons, such an AI might not see any reason to create backups of itself.)

Another possible safety valve: one could imagine an AI which tries to wirehead, but its operators put a lot of barriers in place to prevent it from doing so. The AI seizes whatever resources it needs to metaphorically smash those barriers, does so violently, then wireheads itself and just sits around.

Generalizing these two scenarios: I think it's plausible that unprincipled AI architectures tend to have built in safety valves - they'll tend to shoot themselves in the foot if they're able to do so. That's definitely not something I'd want to bet the future of the human species on, but it is a class of scenarios which would allow for an AI to deal a lot of damage while still failing to take over.

Comment by johnswentworth on Alignment as Translation · 2020-03-26T20:10:02.334Z · score: 4 (2 votes) · LW · GW
"how do I ensure that the AI system has an undo button" and "how do I ensure that the AI system does things slowly"

I don't think this is realistic if we want an economically-competitive AI. There are just too many real-world applications where we want things to happen which are fast and/or irreversible. In particular, the relevant notion of "slow" is roughly "a human has time to double-check", which immediately makes things very expensive.

Even if we abandon economic competitiveness, I doubt that slow+reversible makes the translation problem all that much easier (though it would make the AI at least somewhat less dangerous, I agree with that). It's probably somewhat easier - having a few cycles of feedback seems unlikely to make the problem harder. But if e.g. we're originally training the AI via RL, then slow+reversible basically just adds a few more feedback cycles after deployment; if millions or billions of RL cycles didn't solve the problem, then adding a handful more at the end seems unlikely to help much (though an argument could be made that those last few are higher-quality). Also, there's still the problem of translating a human's high-level notion of "reversible" into a low-level notion of "reversible".

Taking a more outside view... restrictions like "make it slow and reversible" feel like patches which don't really address the underlying issues. In general, I'd expect the underlying issues to continue to manifest themselves in other ways when patches are applied. For instance, even with slow & reversible changes, it's still entirely plausible that humans don't stop something bad because they don't understand what's going on in enough detail - that's a typical scenario in the "translation problem" worldview.

Zooming out even further...

I think the solutions I would look for would be quite different though...

I think what's driving this intuition is that you're looking for ways to make the AI not dangerous, without actually aligning it (i.e. without solving the translation problem) - mainly by limiting capabilities. I expect that such strategies, in general, will run into similar problems to those mentioned above:

  • Capabilities which make an AI economically valuable are often capabilities which make it dangerous. Limit capabilities for safety, and the AI won't be economically competitive.
  • Choosing which capabilities are "dangerous" is itself a problem of translating what-humans-want into some other framework, and is subject to the usual problems: simple solutions will be patches which don't address everything, there will be a long tail of complicated corner cases, etc.
Comment by johnswentworth on Alignment as Translation · 2020-03-26T17:20:08.154Z · score: 6 (3 votes) · LW · GW

Starting point: the problem which makes AI alignment hard is not the same problem which makes AI dangerous. This is the capabilities/alignment distinction: AI with extreme capabilities is dangerous; aligning it is the hard part.

So it seems like this framing of alignment removes the notion of the AI "optimizing for something" or "being goal-directed". Do you endorse dropping that idea?

Anything with extreme capabilities is dangerous, and needs to be aligned. This applies even outside AI - e.g. we don't want a confusing interface on a nuclear silo. Lots of optimization power is a sufficient condition for extreme capabilities, but not a necessary condition.

Here's a plausible doom scenario without explicit optimization. Imagine an AI which is dangerous in the same way as a nuke is dangerous, but more so: it can make large irreversible changes to the world too quickly for anyone to stop it. Maybe it's capable of designing and printing a supervirus (and engineered bio-offense is inherently easier than engineered bio-defense); maybe it's capable of setting off all the world's nukes simultaneously; maybe it's capable of turning the world into grey goo.

If that AI is about as transparent as today's AI, and does things the user wasn't expecting about as often as today's AI, then that's not going to end well.

Now, there is the counterargument that this scenario would produce a fire alarm, but there's a whole host of ways that could fail:

  • The AI is usually very useful, so the risks are ignored
  • Errors are patched rather than fixing the underlying problem
  • Really big errors turn out to be "easier" than small errors - i.e. high-to-low level translations are more likely to be catastrophically wrong than mildly wrong
  • It's hard to check in testing whether there's a problem, because errors are rare and/or don't look like errors at the low-level (and it's hard/expensive to check results at the high-level)
  • In the absence of optimization pressure, the AI won't actively find corner-cases in our specification of what-we-want, so it might actually be more difficult to notice problems ahead-of-time
  • ...

Getting back to your question:

Do you endorse dropping that idea?

I don't endorse dropping the AI-as-optimizer idea entirely. It is definitely a sufficient condition for AI to be dangerous, and a very relevant sufficient condition. But I strongly endorse the idea that optimization is not a necessary condition for AI to be dangerous. Tool AI can be plenty dangerous if it's capable of making large, fast, irreversible changes to the world, and the alignment problem is still hard for that sort of AI.

Comment by johnswentworth on What is the point of College? Specifically is it worth investing time to gain knowledge? · 2020-03-24T22:54:16.175Z · score: 4 (3 votes) · LW · GW

I graduated 7 years ago. During that time, I've actually used most of the subjects I studied in college - partly at work (as a data scientist), partly in my own research, and partly just when they happen to come up in conversation or day-to-day life. On the occasions when I've needed to return to a topic I haven't used in a while, it's typically been very fast.

But the question "how long does it take to get back up to speed on something I learned a while ago?" kind of misses the point. Most of the value doesn't come from being able to quickly get back up to speed on fluid mechanics or materials science or inorganic chemistry. Rather, the value comes from knowing which pieces I actually need to get back up to speed on. What matters is remembering what questions to ask, how to formulate them, and what the important pieces usually are. Details are easy to find on wikipedia or in papers if you're familiar with the high-level structure.

To put it differently: you want to already have an idea of what kinds of things are usually important for problems in some field, and what kinds of things usually aren't important. If you have that, then it's fast and easy to look up the parts which are important for any particular problem, and double-check that you're not missing anything crucial.

Comment by johnswentworth on How to Contribute to the Coronavirus Response on LessWrong · 2020-03-24T22:39:36.730Z · score: 35 (7 votes) · LW · GW

One big item I'd add to this list: reading through a paper/post/source in the links db, checking information in it, and writing a comment/post about what checks were performed and whether the source looks accurate. The top reason I consider LW a better source of information on coronavirus than other places is that the information here is more likely to be true (or at least have a well-calibrated indication of plausibility attached); having more LWers review primary work amplifies that advantage.

Comment by johnswentworth on What is the point of College? Specifically is it worth investing time to gain knowledge? · 2020-03-24T02:58:54.854Z · score: 6 (5 votes) · LW · GW

First, the standard answer: Bryan Caplan's The Case Against Education. Short version: education is about signalling to future employers how smart/diligent/willing-to-jump-through-hoops/etc you are. Skill acquisition is mostly irrelevant. This is basically true for most people most of the time.

That said... I personally have gotten a lot of value out of things I learned in courses. This is not something that happens by default; the vast majority of my classmates did not get nearly as much value out of courses as I did. I'll list a few things I did differently which may help.

Avoid nontechnical classes: this one is kind of a "well duh" thing, but there are some subtleties. "Technical" should be interpreted in a broad sense - things like e.g. law or languages aren't technical in the sense of STEM, but they're technical in the sense that the things they teach are intended to be directly useful. By contrast, subjects which are primarily about aesthetics or history or critical theory are not really intended to be directly useful.

Decreasing marginal returns: the first course in any particular field/subfield is far more valuable than the second course, the second course is more valuable than the third, etc. This suggests going for breadth over depth. In particular, I recommend taking one or two courses in many different fields so that you can talk to specialists in those fields without being completely lost. You don't need to become an expert yourself; much of the value is in being able to quickly and easily work with specialists in many different fields. You can translate jargon and act as a human interface, and you can easily jump into many different areas.

General-purpose tools: focus on fields which provide tools applicable to many different domains. Most of applied math qualifies, as well as computer science, economics, and law. Ideally, you take one or two courses in some general-purpose subject, then run into applications of that subject while sampling other fields. By seeing it come up in different contexts, you're more likely to remember and use it.

Summary of this point and previous: go for depth in general purpose tools, and practice those tools while gaining breadth in other areas.

Use available resources: I've covered about as much material in open courseware as I have in in-person classes. I've watched online lectures, I've read textbooks, and I've audited courses. (I've even audited classes at universities where I'm not registered - professors are usually happy with people just showing up.) In college, I'd often watch a semester's worth of online lectures on a subject before taking a class on it; material is a lot easier to follow when you already have a general idea of where things are headed and how it all slots together.

Have a stock of problems: as you learn new tools, it's useful to have a handful of problems to try them out on. Hard open algorithmic problems like integer factorization or P vs NP or graph isomorphism are great for testing out all sorts of applied math/CS tricks. "How could I use this to start a company and make a gazillion dollars?" is a problem which applies to practically anything. The problems should be things you're interested in and enjoy thinking about, so that you'll find it worthwhile to try out new tools on them even though most of the tools don't actually yield breakthroughs on most of the problems.

This point and the previous one help a lot with actually remembering things and being able to apply them in-the-wild.

Optimize: at my college (Harvey Mudd), it was very easy to tell who had actually tried to optimize their course choices - it was the people who used the "build your own major" option. We only had half a dozen majors, and the course requirements always included things which weren't really what any particular person was interested in. If you wanted to cram in more courses you were actually interested in, while avoiding irrelevant courses, a build-your-own major was the way to go.

More generally, you'll get more out of classes if you read through the whole course catalog, mark the classes which sound interesting, and then optimize your schedule to focus on those classes. Sounds obvious, yet most people don't do it.

Be comprehensive: you're not going to know everything, but you can learn enough that nothing is very far from the things you do know. You can learn enough that you at least have some idea of which things you don't know. You can learn enough that, even if you don't know something, you've probably heard of it, and you have some idea of where to learn about it if you need to. The key is to aim for comprehensive knowledge of the world - you don't need to know every little detail, but at least get the broad strokes of the big things. Anytime you don't have a clue about something, pay attention to it, and look for the most general subject which would give you some background about that thing.

Math, physics and economics are particularly useful for comprehensive foundations - they give you the tools to solve practically anything in principle, though you'll often need more specialized tools in practice.

Comment by johnswentworth on Alignment as Translation · 2020-03-20T17:51:18.339Z · score: 4 (2 votes) · LW · GW

I do expect that systems trained with limited information/compute will often learn multi-level models. That said, there's a few reasons why low-level is still the right translation target to think about.

First, there's the argument from the beginning of the OP: in the limit of abundant information & compute, there's no need for multi-level models; just directly modelling the low-level will have better predictive power. That's a fairly general argument, which applies even beyond AI, so it's useful to keep in mind.

But the main reason to treat low-level as the translation target is: assuming an AI does use high-level models, translating into those models directly will only be easier than translating into the low level to the extent that the AI's high-level models are similar to a human's high-level models. We don't have any reason to expect AI to use similar abstraction levels as humans except to the extent that those abstraction levels are determined by the low-level structure. In studying how to translate our own high-level models into low-level structure, we also learn when and to what extent an AI is likely to learn similar high-level structures, and what the correspondence looks like between ours and theirs.

Comment by johnswentworth on [UPDATED] COVID-19 cabin secondary attack rates on Diamond Princess · 2020-03-20T17:33:41.875Z · score: 9 (5 votes) · LW · GW

I went back-and-forth with Bucky a bit, looked at the formulas, and I now think the current graph is correct. The main surprising thing was that the likelihood isn't sharper; apparently there's actually pretty few 1-berth cabins, so we don't have a sharp estimate for the background infection rate. Most of the uncertainty in the secondary rate is tightly coupled to the uncertainty in the background rate.

Comment by johnswentworth on [UPDATED] COVID-19 cabin secondary attack rates on Diamond Princess · 2020-03-18T23:00:06.861Z · score: 5 (3 votes) · LW · GW

That graph looks fishy. Wouldn't a secondary attack rate of 1 mean that everyone in a cabin with someone sick catches it immediately? Shouldn't that be deterministically ruled out by the data, and therefore have exactly-zero likelihood?
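
To make that concrete, here's a minimal sketch with made-up cabin counts, using a plain per-cabin binomial model and ignoring the background infection rate: if even one exposed cabin-mate stayed uninfected, the likelihood at a secondary attack rate of 1 is exactly zero.

```python
from math import comb

cabins = [(3, 1), (2, 2), (3, 0)]   # (exposed cabin-mates, number infected) -- invented data

def likelihood(p):
    L = 1.0
    for n, k in cabins:
        L *= comb(n, k) * p**k * (1 - p)**(n - k)
    return L

print(likelihood(0.3))   # some positive number
print(likelihood(1.0))   # 0.0 -- impossible, since not every exposed cabin-mate was infected
```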

Also, in general, seeing likelihood graphed on a linear scale makes me think something is very wrong.

Maybe a bug somewhere?

Comment by johnswentworth on Adaptive Immune System Aging · 2020-03-14T00:17:31.183Z · score: 4 (2 votes) · LW · GW

Great questions.

Comment by johnswentworth on Please Press "Record" · 2020-03-12T00:46:03.938Z · score: 9 (4 votes) · LW · GW

I'm no expert, but I doubt that that's as much of an issue as it sounds. My understanding is that the Berkeley thing was based on a DoJ finding, not an actual lawsuit. If it were a lawsuit, then it would serve as precedent (at least until overturned by a higher court, which is usually how such stupidity would be fixed). But as a DoJ finding, I believe it's much less binding - a future DoJ (i.e. the DoJ we have now) can just stop issuing such dumb orders. Given the current position of the political see-saw, that's exactly what I'd expect - this is not the particular brand of stupidity one expects under a nominally-Republican administration.

Even setting aside my guesses on that front, legal problems about making videos publicly available are definitely the sort of thing one can worry about later. The ADA definitely doesn't ban just pressing the record button.

Comment by johnswentworth on Zoom In: An Introduction to Circuits · 2020-03-11T17:33:03.370Z · score: 8 (5 votes) · LW · GW

I think this question sort of misses what matters.

There's all sorts of computations which (probably) aren't very interpretable; SHA-256 is a solid example. But it's an empirical fact that our physical world has a lot more interpretable structure in it than SHA-256 computations. We have things like trees or cars, large-scale abstract structures which repeat over and over again, and display similar predictable behavior across instances despite different small-scale configurations.

Trained neural networks are not basically-random computations (like SHA-256); they're trained on the real world. We know that the real world has a lot of interpretable structure, so it's feasible that a network trained on the real world will reflect that structure. That's what Olah et al's research is about - backing out the structure of the real world from a network trained on the real world.

It's the coupling of the (trained) network to the real world which plays the central role. Something like Conway's game of life doesn't have any coupling to the real world, so it's not really analogous.

Comment by johnswentworth on Name of Problem? · 2020-03-11T00:57:18.514Z · score: 2 (1 votes) · LW · GW

Yeah, that makes sense. And off the top of my head, it seems like they would indeed be regular grammars - each node in the tree would be a state in the finite state machine, and then copies of the tree would produce loops in the state transition graph. Symbols on the edges would be the argument names (or indices) for the inputs to atomic operations. Still a few i's to dot and t's to cross, but I think it works.

Elegant, too. Nice solution!

Comment by johnswentworth on Name of Problem? · 2020-03-10T21:22:06.697Z · score: 2 (1 votes) · LW · GW

Yup, that's right.

I tentatively think it's ok to just ignore cases with "outside" infinities. Examples like f(n) = f(n+1) should be easy to detect, and presumably they would never show up in a program which halts. I think programs which halt would only have "inside" infinities (although some non-halting programs would also have inside infinities), and programs with non-inside infinities should be detectable - i.e. recursive definitions of a function shouldn't have the function itself as the outermost operation.

Still not sure - I could easily be missing something crucial - but the whole problem feels circumventable. Intuitively, Turing completeness only requires infinity in one time-like direction; inside infinities should suffice, so syntactic restrictions should be able to eliminate the other infinities.

Comment by johnswentworth on Name of Problem? · 2020-03-10T03:03:22.018Z · score: 2 (1 votes) · LW · GW

Yes, but that's for a functional notion of equivalence - i.e. it's about whether the two TMs have the same input-output behavior. The notion of equivalence I'm looking at is not just about same input-output, but also structurally-similar computations. Intuitively, I'm asking whether they're computing the same function in the same way.

(In fact, circumventing the undecidability issue is a key part of why I'm formulating the problem like this in the first place. So you're definitely asking the right question here.)

Comment by johnswentworth on Name of Problem? · 2020-03-09T22:39:24.509Z · score: 2 (1 votes) · LW · GW

Yes, that's correct. I'd view "f((((...) + 1) + 1) + 1)" as an equivalent way of writing it as a string (along with the definition of f as f(n) = f(n + 1)). "...(((((...) + 1) + 1) + 1) + 1)..." just emphasizes that the expression tree does not have a root - it goes to infinity in both directions. By contrast, the expression tree for f(n) = f(n) + 1 does have a root; it would expand to (((((...) + 1) + 1) + 1) + 1).

Does that make sense?

Comment by johnswentworth on Name of Problem? · 2020-03-09T22:20:55.301Z · score: 2 (1 votes) · LW · GW

Oh, I made a mistake. I guess they would look like ...((((((((...)))))))))... and ...(((((...) + 1) + 1) + 1) + 1)..., respectively. Thanks for the examples, that's helpful - good examples where the fixed point of expansion is infinite "on the outside" as well as "inside".

Was that the confusion? Another possible point of confusion is why the "+ 1"s are in the expression tree; the answer is that addition is usually an atomic operator of a language. It's not defined in terms of other things; we can't/don't beta-reduce it. If it were defined in terms of other things, I'd expand it, and then the expression tree would look more complicated.

Comment by johnswentworth on Name of Problem? · 2020-03-09T22:15:21.765Z · score: 2 (1 votes) · LW · GW

No. To clarify, we're not reducing any of the atomic operators of the language - e.g. we wouldn't replace (0 == 0) ? 0 : 1 with 0. As written, that's not a beta-reduction. If the ternary operator were defined as a function within the language itself, then we could beta-reduce it, but that wouldn't give us "0" - it would give us some larger expression, containing "0 == 0", "0", and "1".

Actually, thinking about it, here's something which I think is equivalent to what I mean by "expand", within the context of lambda calculus: beta-reduce, but never drop any parens. So e.g. 2 and (2) and ((2)) would not be equivalent. Whenever we beta-reduce, we put parens around any term which gets substituted in.

Intuitively, we're talking about a notion of equivalence between programs which cares about how the computation is performed, not just the outputs.

Comment by johnswentworth on Name of Problem? · 2020-03-09T21:48:47.108Z · score: 2 (1 votes) · LW · GW

The first would generate a stick: ((((((((...)))))))))

The second would generate: (((((...) + 1) + 1) + 1) + 1)

These are not equivalent.

Does that make sense?

Comment by johnswentworth on Name of Problem? · 2020-03-09T21:42:06.485Z · score: 3 (2 votes) · LW · GW

The expansion is infinite, but it's a repeating pattern, so we can use a finite representation (namely, the program itself). We don't have to write the whole thing out in order to compare.

An analogy: we can represent infinite repeating strings by just writing a finite string, and then assuming it repeats. The analogous problem is then: decide whether two such strings represent the same infinite string. For instance, "abab" and "ababab" would represent the same infinite repeating string: "abababababab...".
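
For the string version, there's a neat check (a standard result: two strings repeat to the same infinite string exactly when they commute under concatenation):

```python
def same_infinite_repetition(s, t):
    return s + t == t + s

print(same_infinite_repetition("abab", "ababab"))   # True -- both give "abababab..."
print(same_infinite_repetition("abab", "abba"))     # False
```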

Comment by johnswentworth on Name of Problem? · 2020-03-09T21:15:31.508Z · score: 2 (1 votes) · LW · GW

Yes, exactly. Anywhere the name of a function appears, replace it with the expression defining the function. (Also, I'm ignoring higher-order functions, function pointers, and the like; presumably the problem is undecidable in languages with those kinds of features, since it's basically just beta-equivalence of lambda terms. But we don't need those features to get a Turing-complete language.)

Comment by johnswentworth on Why hasn't the technology of Knowledge Representation (i.e., semantic networks, concept graphs, ontology engineering) been applied to create tools to help human thinkers? · 2020-03-09T17:57:51.094Z · score: 6 (3 votes) · LW · GW
2. Inability of existing KR systems to ergonomically conform to human patterns of learning and reasoning. If so, this might be due to a lack of sufficient understanding how to transition between informal natural language based reasoning and formalized reasoning, or it may simply be that the chosen formalisms are not the best ones for empowering human thought.

I haven't studied knowledge representation much, but my passing impression is that this is the main problem. I suspect that KR people tried too hard to make their structures look like natural language, when in fact the underlying structures of human thought are not particularly language-shaped.

Central example driving my intuition here: causal graphs/Bayes nets. These seem to basically-correctly capture human intuition about causality. Once you know the language of causal graphs, it's really easy to translate intuition about causality into the graphical language - indicating a "knowledge representation" which lines up quite well with human reasoning. And sure enough, causal graphs have been pretty widely adopted.

On the other hand, somewhat ironically, things like concept graphs and semantic networks do a pretty crappy job of capturing concepts and the semantics of words. Try to glean the meaning of "cat" from a semantic graph, and you'll learn that it has a "tail", and "whiskers", is a "mammal", and so forth. Of course, we don't really know what any of those words mean either - just a big network of links to other strings. It would be a great tool for making a fancy Markov language model, but it's not great for actually capturing human knowledge.

Comment by johnswentworth on Why hasn't the technology of Knowledge Representation (i.e., semantic networks, concept graphs, ontology engineering) been applied to create tools to help human thinkers? · 2020-03-09T17:47:07.830Z · score: 8 (2 votes) · LW · GW

"Infer the structure from the data" still implies that the NN has some internal representation of knowledge. Whether the structure is initialized or learned isn't necessarily central to the question - what matters is that there is some structure, and we want to know how to represent that structure in an intelligible manner. The interesting question is then: are the structures used by "knowledge representation" researchers isomorphic to the structures learned by humans and/or NNs?

I haven't read much on KR, but my passing impression is that the structures they use do not correspond very well to the structures actually used internally by humans/NNs. That would be my guess as to why KR tools aren't used more widely.

On the other hand, there are representations of certain kinds of knowledge which do seem very similar to the way humans represent knowledge - causal graphs/Bayes nets are an example which jumps to mind. And those have seen pretty wide adoption.

Comment by johnswentworth on The Lens, Progerias and Polycausality · 2020-03-08T23:30:15.661Z · score: 10 (5 votes) · LW · GW

Hold that thought - there's a post on evolution of aging coming up pretty soon. It's one of the better-understood areas, since we can get a ton of information by comparing across species.

Comment by johnswentworth on The Lens, Progerias and Polycausality · 2020-03-08T22:43:01.732Z · score: 19 (6 votes) · LW · GW

So, here's the surprising bit, which I think even a lot of biologists haven't fully absorbed: it's not DNA damage itself that's accumulating in a non-equilibrium manner. DNA damage events happen at a very fast rate, like thousands of times per day per cell (at least for that particular type of damage; there's several types). It's also repaired at a fast rate (true of all types, as far as I've seen). With that sort of half-life, if DNA damage were out of equilibrium, it would be out of equilibrium on a timescale much faster than aging. If DNA damage increases with age (and there's a lot of indirect evidence that it does), then it's a steady-state level that's increasing, which means that either the damage rate is increasing or the repair rate is decreasing.
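
To spell out the equilibrium argument with a toy model (the rates are made up; only the separation of timescales matters): lesions appear at rate r and each existing lesion is repaired at rate k, so the count relaxes to the steady state r/k on a timescale of roughly 1/k - far faster than aging.

```python
import numpy as np

r, k = 2000.0, 100.0                    # lesions created per day, repair rate per day (illustrative)
t = np.array([0.01, 0.1, 1.0, 10.0])    # days
D = (r / k) * (1 - np.exp(-k * t))      # solution of dD/dt = r - k*D, starting from zero
print(dict(zip(t.tolist(), D.round(2).tolist())))
# {0.01: 12.64, 0.1: 20.0, 1.0: 20.0, 10.0: 20.0} -- pinned at r/k = 20 within a fraction of a day
```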

In other words, DNA damage itself isn't the root cause - there is indeed something else upstream. You're exactly right on that. (Though, with respect to variance in timeline, bear in mind that many processes play out in parallel across cells - if cells "go bad" one by one, then large number statistics will smooth out the noise a lot.)

The most promising potential culprit I've heard about is transposons: "parasitic" DNA sequences which copy themselves and reinsert themselves into the genome. The human genome has loads of these things, or pieces of them - I've heard a majority of human DNA consists of dead transposons, though I don't have a reference on hand. Normally, they're actively suppressed. But the transposon theory of aging says that, every once in a while, one of them successfully copies. Usually it will copy into non-coding DNA, and then be suppressed, so there's no noticeable effect. But over time, the transposon count increases, the suppressor count doesn't increase, and eventually the transposons get out of control. The DNA damage is a side-effect of active transposons - one of the main ingredients of a transposon is a protein which snips the DNA, allowing the transposon itself to sneak in. In particular, this may be an issue in stem cells - most cells would enter apoptosis/senescence once DNA damage level gets high, but if a stem cell has a transposon count slightly below the cutoff, then it will produce cells which rapidly apoptose/senesce.

Anyway, I'm sure this sequence will get around to theories of the root cause of aging eventually. There's a number of them, although most have been ruled out.

Comment by johnswentworth on Cortés, Pizarro, and Afonso as Precedents for Takeover · 2020-03-08T03:01:31.875Z · score: 2 (1 votes) · LW · GW

Braudel is both long and dense, and I wouldn't recommend the second two volumes at all, but the first volume is probably the single best history book I've read. Beware that his understanding of economics is pretty poor - trust his facts, but be wary of his interpretations.

Comment by johnswentworth on Cortés, Pizarro, and Afonso as Precedents for Takeover · 2020-03-08T00:27:18.484Z · score: 9 (5 votes) · LW · GW

I think technological advantage - specifically sailing technology - probably played a much larger role in Afonso's takeover than it would seem from a quick read. Key pieces:

  • Monsoons
  • Lateen sail

Monsoons: wind around India blows consistently Southwest for half the year, and Northeast for the other half. IIRC from Braudel, this made trade in the Indian ocean highly predictable: everyone sailed with the wind at their back and ran consistent one-year circuits. As you mention:

The Indian Ocean contained most of the world's trade at the time, since it linked up the world's biggest and wealthiest regions.

I'd guess that the monsoons were probably a bigger factor here than vicinity to wealthy regions. In particular:

Europe is just coming out of the Middle Ages and does not have an obvious technological advantage over India or China or the Middle East, and has an obvious economic disadvantage.

Europe had less total wealth (because it had a smaller population) and was behind technologically in some ways (e.g. metallurgy), but even in the 15th-16th century Europe was considered "wealthy" on a per-capita basis. In particular, Europe had much more per-capita capital goods, even before the industrial revolution - especially mills and machinery. Braudel covers a lot of this.

Anyway, monsoons. Consistent wind direction, with an annual cycle. That makes the lateen sail a major strategic advantage: Portuguese ships would have been able to tack upwind, a technique which was basically unheard of in the Indian ocean at the time. (On top of that, the Portuguese were happy to sail in open ocean at that time and were accustomed to navigating away from land - unlike the Indian ocean locals. Again, Braudel talks about this a fair bit.) So the local navies were presumably stuck at one end of the ocean for six months, while the Portuguese had free rein to sail around wherever they wanted. And to top it all off, even if the local navies did manage to catch them, the Portuguese could just sail out to open ocean, and the locals wouldn't want to follow.

Now combine that with supply: throughout most of history, a single ship could carry as much supplies as about 4000 horses (source: Logistics of the Macedonian Army). For any island garrisons, or for garrisons surrounded by desert, horses wouldn't even be an option. Thus the importance of naval dominance even for land wars in premodern times: an overland supply train was extremely expensive at best, and often entirely infeasible. Control the water, and the enemy starved.

Put all that together, and Afonso's plan looks less ridiculously ambitious. They had a technological advantage which was perfectly suited to the problem.

Comment by johnswentworth on How hard would it be to attack coronavirus with CRISPR? · 2020-03-07T00:12:27.123Z · score: 11 (3 votes) · LW · GW

This is actually pretty similar to the original function of the CRISPR/CAS9 system in the wild. Wild bacterial CRISPR systems copy short RNA segments complementary to bacteriophage DNA, then use those to target and destroy any phage DNA within the bacteria. So it's definitely something which could work in principle, and is already used by some bacteria.

That said, at this point it would probably be harder to immunize against a virus using CRISPR-based techniques than using traditional vaccines. Just injecting a bunch of CRISPR protein machinery and bits of coronavirus-complementary RNA directly into the bloodstream wouldn't really do anything; you'd need to genetically modify human cells to produce the CRISPR machinery themselves.

(Side note: you might be interested in Todd Rider's DRACO project.)

Comment by johnswentworth on Matrix Multiplication · 2020-03-05T18:02:03.544Z · score: 4 (3 votes) · LW · GW

This question has a bunch of different angles:

  • What are the typical problems bundled under "matrix multiplication"?
  • What are the core algorithms for those problems, and where do they overlap?
  • How does special hardware help?
  • Why "tensor" rather than "matrix"?

The first two angles - problems and algorithms - are tightly coupled. Typical matrix problems we want to solve as an end-goal:

  • dot a matrix with a vector (e.g. going from one layer to another in a neural net)
  • solve Ax = b (e.g. regressions, minimization)
  • dot a sparse/structured matrix with a vector (e.g. convolution)
  • solve Ax = b for sparse/structured A (e.g. finite element models)

Both types of dot products are algorithmically trivial. Solving Ax = b is where the interesting stuff happens. Most of the heavy lifting in matrix solve algorithms is done by breaking A into blocks, and then doing (square) matrix multiplications on those blocks. Strassen's algorithm is often used at the block level to improve big-O runtime for large matrix multiplies. By recursively breaking A into blocks, it also gives a big-O speedup for Ax = b.

I'm not a hardware expert, but my understanding is that hardware is usually optimized to make multiplication of square matrices of a certain size very fast. This is largely about caching - you want caches to be large enough to hold the matrices in question. Fancy libraries know how large the cache is, and choose block sizes accordingly. (Other fancy libraries use algorithms which perform close-to-optimally for any cache size, by accessing things in a pattern which plays well with any cache.) The other factor, of course, is just having lots of hardware multipliers.
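
As a rough illustration of the blocking idea (a minimal NumPy sketch; a real library would pick the block size from the actual cache geometry rather than the arbitrary value here):

```python
import numpy as np

def blocked_matmul(A, B, bs=64):
    n, m = A.shape
    m2, p = B.shape
    assert m == m2
    C = np.zeros((n, p))
    for i in range(0, n, bs):            # loop over blocks of C
        for k in range(0, m, bs):        # accumulate over blocks of the shared dimension
            for j in range(0, p, bs):
                C[i:i+bs, j:j+bs] += A[i:i+bs, k:k+bs] @ B[k:k+bs, j:j+bs]
    return C

A, B = np.random.rand(200, 300), np.random.rand(300, 150)
print(np.allclose(blocked_matmul(A, B), A @ B))   # True
```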

Solving Ax=b with sparse/structured matrices is a whole different ballgame. The main goal there is to avoid using the generic algorithms at all, and instead use something fast adapted to the structure of the matrix - examples include FFT, (block-)tridiagonal, and (block-)arrowhead/diagonal-plus-low-rank matrix algorithms. In practice, these often have block structure, so we still need to fall back on normal matrix-multiplication-based algorithms for the blocks themselves.
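
For a concrete example of exploiting structure, here's a sketch of the classic O(n) tridiagonal solver (the Thomas algorithm), checked against a generic dense solve:

```python
import numpy as np

def solve_tridiagonal(a, b, c, d):
    # a: sub-diagonal (length n-1), b: diagonal (n), c: super-diagonal (n-1), d: right-hand side (n)
    n = len(b)
    cp, dp = np.zeros(n - 1), np.zeros(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                      # forward sweep
        denom = b[i] - a[i - 1] * cp[i - 1]
        if i < n - 1:
            cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

n = 6
a, c, d = np.random.rand(n - 1), np.random.rand(n - 1), np.random.rand(n)
b = np.random.rand(n) + 3                      # keep the system diagonally dominant
A = np.diag(b) + np.diag(a, -1) + np.diag(c, 1)
print(np.allclose(solve_tridiagonal(a, b, c, d), np.linalg.solve(A, d)))   # True
```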

As for "matrix" vs "tensor"... "tensor" in AI/ML usually just means a multidimensional array. It does not imply the sort of transformation properties we associate with "tensors" in e.g. physics, although Einstein's implicit-sum index notation is very helpful for ML equations/code.
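
For instance, NumPy's einsum is exactly that index notation in code:

```python
import numpy as np

A, B = np.random.rand(3, 4), np.random.rand(4, 5)
C = np.einsum('ij,jk->ik', A, B)    # C_ik = sum_j A_ij B_jk, i.e. a matrix product
print(np.allclose(C, A @ B))        # True
```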

Comment by johnswentworth on johnswentworth's Shortform · 2020-03-05T02:42:27.277Z · score: 14 (4 votes) · LW · GW

I find it very helpful to get feedback on LW posts before I publish them, but it adds a lot of delay to the process. So, experiment: here's a link to a google doc with a post I plan to put up tomorrow. If anyone wants to give editorial feedback, that would be much appreciated - comments on the doc are open.

I'm mainly looking for comments on which things are confusing, parts which feel incomplete or slow or repetitive, and other writing-related things; substantive comments on the content should go on the actual post once it's up.

EDIT: it's up. Thank you to Stephen for comments; the post is better as a result.

Comment by johnswentworth on I don't understand Rice's Theorem and it's killing me · 2020-03-02T07:42:59.529Z · score: 5 (3 votes) · LW · GW

I see three main pieces of potential confusion here:

  • Turing completeness
  • undecidability of the halting problem
  • Rice's Theorem itself

Here's my intuition for each. Bear in mind that I'm trying to convey intuition here, not formal proofs.

Turing completeness: something is Turing complete if it can simulate a computer running any program - though not necessarily very quickly. A human with paper and pencil can simulate a computer. Magic the Gathering can simulate a computer with a fairly elaborate setup. Etc. Computers can simulate everything in the universe (as far as we can tell so far), so if something can simulate a computer, it can simulate anything.

Halting problem: suppose you're trying to predict whether a certain program halts. Here's what the program does: it runs an extremely high-fidelity simulation of you attempting to predict whether the program halts, and then does the opposite of whatever simulated-you predicts. Moral of the story: there are programs for which you, personally, cannot predict whether they halt. More generally, we can replace "you" with any putative halt-checker, and conclude that for any program which attempts to predict whether other programs halt, there is some program which fools it - there is no perfect halt-checker.

Rice's theorem: rather than a halt-checker, you want to write an X-checker: a program which decides whether other programs do X. But if I had an X-checker, then I could use it to build a halt-checker. I take some program which does X (so my X-checker returns "True" on that program). Then, I construct a new program: it runs some arbitrary program, then throws away the result and runs the program which does X. Why would this matter? Well, if my arbitrary program doesn't halt, then my new program will never actually get to do X. So, my X-checker will return "True" on the new program if-and-only-if the arbitrary program halts.
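
Here is the same construction as a Python-flavored sketch; x_checker and does_X are hypothetical stand-ins for the assumed X-checker and for some program that does X.

```python
def build_halt_checker(x_checker, does_X):
    def halt_checker(program, inp):
        def combined():
            program(inp)    # if this never halts, the next line is never reached...
            does_X()        # ...so `combined` does X if-and-only-if program(inp) halts
        return x_checker(combined)    # hence an X-checker answers the halting question
    return halt_checker
```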

If I want to know whether Rice's theorem applies to some property of programs, I think through this construction, and think about whether it actually works - whether an X-checker would actually let me build a halt-checker this way. (Note that there are much more complicated proofs of Rice's Theorem. I don't understand those, and it's quite possible that this intuition misses some key things which they cover.)

Comment by johnswentworth on johnswentworth's Shortform · 2020-03-01T23:37:46.814Z · score: 6 (3 votes) · LW · GW

Someone should write a book review of The Design of Everyday Things aimed at LW readers, so I have a canonical source to link to other than the book itself.

Comment by johnswentworth on High Variance Productivity Advice · 2020-03-01T18:23:41.434Z · score: 7 (4 votes) · LW · GW

Great post!

You mention "your clients" a few times; you might want to mention what it is that you do, so that people have some sense of the data driving these insights. Personally, I think your unique "dataset" is one of the main things which makes this post interesting/valuable.

Comment by johnswentworth on Trace: Goals and Principles · 2020-02-29T22:03:07.023Z · score: 4 (2 votes) · LW · GW
What are computations - programs?

A computation is the actual series of steps performed when running a program. So any program specifies a computation, but most languages don't have a nice way to do things with the computation other than get the final output (i.e. by running it). The point of this project is to work out a nice data structure for representing a computation, so we can ask questions about it other than just getting the final output.

Comment by johnswentworth on johnswentworth's Shortform · 2020-02-27T23:10:11.722Z · score: 6 (3 votes) · LW · GW

Better? I doubt it. If physicists wrote equations the way programmers write code, a simple homework problem would easily fill ten pages.

Verboseness works for programmers because programmers rarely need to do anything more complicated with their code than run it - analogous to evaluating an expression, for a physicist or mathematician. Imagine if you needed to prove one program equivalent to another algebraically - i.e. a sequence of small transformations, with a record of intermediate programs derived along the way in order to show your work. I expect programmers subjected to such a use-case would quickly learn the virtues of brevity.

Comment by johnswentworth on johnswentworth's Shortform · 2020-02-27T19:04:55.439Z · score: 9 (4 votes) · LW · GW

What if physics equations were written like statically-typed programming languages?

Comment by johnswentworth on New article from Oren Etzioni · 2020-02-27T02:50:06.683Z · score: 2 (1 votes) · LW · GW

It's certainly plausible that things have changed dramatically, although my default guess is that they haven't - a pile of hacks can go a surprisingly long way, and the only tricky-looking spot I saw in that video was a short section just after 1:30. And Musk saying that they're "pushing hard for end-to-end ML" is exactly the sort of thing I'd expect to hear if such a project was not actually finding any traction. I'm sure they're trying to do it, but ML is finicky at the best of times, and I expect we'd hear it shouted from the rooftops if end-to-end self-driving ML was actually starting to work yet.