Spaced Repetition Systems for Intuitions? 2021-01-28T17:23:19.000Z
Alex Ray's Shortform 2020-11-08T20:37:18.327Z


Comment by Alex Ray (alex-ray) on Alex Ray's Shortform · 2021-09-17T22:52:34.508Z · LW · GW

AGI technical domains

When I think about trying to forecast technology for the medium term future, especially for AI/AGI progress, it often crosses a bunch of technical boundaries.

These boundaries are interesting in part because they're thresholds where my expertise and insight falls off significantly.

They're also interesting because they give me topics to read about and learn from.

A list which is probably neither comprehensive, nor complete, nor all that useful, but just writing what's in my head:

  • Machine learning research - this is where a lot of the tip-of-the-spear of AI research is happening, and seems like that will continue to the near future
  • Machine learning software - tightly coupled to the research, this is the ability to write programs that do machine learning
  • Machine learning compilers/schedulers - specialized compilers translate the program into sequences of operations to run on hardware.  The lack of high-quality compilers has blocked a bunch of nontraditional ML chips from getting traction.
  • Supercomputers / Compute clusters - large arrays of compute hardware connected in a dense network.  Once the domain of government secret projects, now it's common for AI research companies to have (or rent) large compute clusters.  Picking the hardware (largely commercial-off-the-shelf), designing the topology of connections, and building it are all in this domain.
  • Hardware (Electronics/Circuit) Design - A step beyond building clusters out of existing hardware is designing custom hardware, but using existing chips and chipsets.  This allows more exotic connectivity topologies than you can get with COTS hardware, or allows you to fill in gaps that might be missing in commercially available hardware.
  • Chip Design - after designing the circuit boards comes designing the chips themselves.  There's a bunch of AI-specific chips that are already on the market, or coming soon, and almost all of them are examples of this.  Notably most companies that design chips are "fabless" -- meaning they need to partner with a manufacturer in order to produce the chip.  Nvidia is an example of a famous fabless chip designer.  Chips are largely designed with Process Design Kits (PDKs) which specify a bunch of design rules, limitations, and standard components (like SRAM arrays, etc).
  • PDK Design - often the PDKs will have a bunch of standard components that are meant to be general purpose, but specialized applications can take advantage of more strange configurations.  For example, you could change a SRAM layout to tradeoff a higher bit error rate for lower power, or come up with different ways to separate clock domains between parts of a chip.  Often this is done by companies who are themselves fabless, but also don't make/sell their own chips (and instead will research and develop this technology to integrate with chip designers).
  • Chip Manufacture (Fabrication / Fab) - This is some of the most advanced technology humanity has produced, and is probably familiar to many folks here.  Fabs take chip designs and produce chips -- but the amount of science and research that goes into making that happen is enormous.  Fabs probably have the tightest process controls of any manufacturing process in existence, all in search of increasing fractions of a percent of yield (the fraction of manufactured chips which are acceptable).
  • Fab Process Research - For a given fab (semiconductor manufacturing plant - "fabricator") there might be specializations for different kinds of chips that are different enough to warrant their own "process" (sequence of steps executed to manufacture it).  For example, memory chips and compute chips are different enough to need different processes, and developing these processes requires a bunch of research.
  • Fab "Node" Research - Another thing people might be familiar with is the long-running trend for semiconductors to get smaller and denser.  The "Node" of a semiconductor process refers to this size (and other things that I'm going to skip for now). This is separate from (but related to) optimizing manufacturing processes, but is about designing and building new processes in order to shrink features sizes, or push aspect ratios.  Every small decrease (e.g. "5nm" -> "3nm", though those sizes don't refer to anything real) costs tens of billions of dollars, and further pushes are likely even more expensive.
  • Semiconductor Assembly Research - Because we have different chips that need different processes (e.g. memory chips and compute chips) -- we want to connect them together, ideally better than we could do with just a circuit board.  This research layer includes things like silicon interposers, and various methods of 3D stacking and connection of chips.  (Probably also should consider reticle-boundary-crossing here, but it kinda is also simultaneously a few of the other layers)
  • Semiconductor Materials Science - This probably should be broken up, but I know the least about this.  Semiconductors can produce far more than chips like memory and compute -- they can also produce laser diodes, camera sensors, solar panels, and much more!  This layer includes exotic methods of combining or developing new technologies -- e.g. "photonics at the edge" - a chip where the connections to it are optical instead of electronic!

Anyways I hope that was interesting to some folks.

Comment by Alex Ray (alex-ray) on Alex Ray's Shortform · 2021-09-15T21:03:28.750Z · LW · GW

Book Aesthetics

I seem to learn a bunch about my aesthetics of books by wandering a used book store for hours.

Some books I want in hardcover but not softcover.  Some books I want in softcover but not hardcover.  Most books I want to be small.

I prefer older books to newer books, but I am particular about translations.  Older books written in English (and not translated) are gems.

I have a small preference for books that are familiar to me; a nontrivial part of that familiarity comes from excerpts taught in English class.

I don't really know what exactly constitutes a classic, but I think I prefer them.  Lists of "Great Classics" like Mortimer Adler's are things I've referenced in the past.

I enjoy going through multi-volume series (like the Harvard Classics) but I think I prefer my library to be assembled piecemeal.

That being said, I really like the Penguin Classics.  Maybe they're familiar, or maybe their taste matches my own.

I like having a little lending library near my house so I can elegantly give away books that I like and think are great, but I don't want in my library anymore.

Very few books I want as references, and I still haven't figured out what references I do want.  (So far a small number: Constitution, Bible)

I think a lot about "Ability to Think" (a whole separate topic) and it seems like great works are the products of great 'ability to think'.

Also it seems like authors of great works know or can recognize other great works.

This suggests figuring out whose taste I think is great, and seeing what books they recommend or enjoy.

I wish there were a different global project of accumulating knowledge than books.  I think books work well for poetry and literature, but less well for science and mechanics.

Wikipedia is similar to this, but is more like an encyclopedia, and I'm looking for something that includes more participatory knowledge.

Maybe what I'm looking for is a more universal system of cross-referencing and indexing content.  The internet as a whole would be a good contender here, but is too haphazard.

I'd like things like "how to build a telescope at home" and "analytic geometry" to be well represented, but also in the participatory knowledge sort of way.

(This is the way in which much of human knowledge is apprenticeship-based and transferred, and merely knowing the parts of a telescope -- what you'd learn from an encyclopedia -- is insufficient to be able to make one)

I expect to keep thinking on this, but for now I have more books!

Comment by Alex Ray (alex-ray) on Alex Ray's Shortform · 2021-09-10T18:33:29.785Z · LW · GW

The Positive and the Negative

I work on AI alignment, in order to solve problems of X-Risk.  This is a very "negative" kind of objective.

Negatives are weird.  Don't do X, don't be Y, don't cause Z.  They're nebulous and sometimes hard to point at and move towards.

I hear a lot of a bunch of doom-y things these days.  From the evangelicals, that this is the end times / end of days.  From environmentalists that we are in a climate catastrophe.  From politicians that we're in a culture war / edging towards a civil war.  From the EAs/Rationalists that we're heading towards potential existential catastrophe (I do agree with this one).

I think cognition and emotion and relation can get muddied up with too much of negatives without things to balance them out.  I don't just want to prevent x-risk -- I also want to bring about a super awesome future.

So I think personally I'm going to try to be more balanced in this regard, even in the small scale, by mixing in the things I'm wanting to move towards in addition to things I want to move away from.

In the futurist and long term communities, I want to endorse and hear more about technology developments that help bring about a more awesome future (longevity and materials science come to mind as concrete examples).

Comment by Alex Ray (alex-ray) on Alex Ray's Shortform · 2021-09-05T22:13:48.781Z · LW · GW

Oh yeah like +100% this.

Creating an environment where we can all cultivate our weird hunches and proto-beliefs while sharing information and experience would be amazing.

I think things like "Scout Mindset" and high baselines of psychological safety (and maybe some of the other phenomenological stuff) help as well.

If we have the option to create these environments instead, I think we should take that option.

If we don't have that option (and the environment is a really bad epistemic baseline) -- I think the "bet your beliefs" does good.

Comment by Alex Ray (alex-ray) on Kids Roaming · 2021-09-05T21:23:05.508Z · LW · GW

Addressing only the second point (technology) not the first (risks): Our neighbors have small roaming kids (near the same ages), and they have walkie-talkies that range a pretty good distance from the house.  It seems like their parents can then pretty easily/quickly contact them from inside the house, and the children can also easily/quickly contact them from outside.  It also looks like they make regular use of them for normal messages (e.g. "time for dinner, start heading home") instead of emergencies-only.

I think having these, combined with having a small group (as opposed to solo) seems to help lower a bunch of the risk factors.

Comment by Alex Ray (alex-ray) on Alex Ray's Shortform · 2021-09-05T21:09:22.892Z · LW · GW

"Bet Your Beliefs" as an epistemic mode-switch

I was just watching this infamous interview w/ Patrick Moore where he seems to be doing some sort of epistemic mode switch (the "weed killer" interview)[0]

Moore appears to go from "it's safe to drink a cup of glyphosate" to (being offered the chance to do that) "of course not / I'm not stupid".

This switching between what seems to be a tribal-flavored belief (glyphosate is safe) and a self-protecting belief (glyphosate is dangerous) is what I'd like to call an epistemic mode-switch.  In particular, it's a contradiction in beliefs, that's really only obvious if you can get both modes to be near each other (in time/space/whatever).

In the rationality community, it seems good to:

  1. Admit this is just a normal part of human reasoning; we probably all do this to some degree sometimes, and
  2. Look for ways to confront it by getting the modes near each other.

I think one of the things that's going on here is that a short-term self-preservation incentive is a powerful tool for forcing yourself to be clear about your beliefs.  In particular, it seems good at filtering/attenuating beliefs that are just tribal signaling.

This suggests that if you can get people to use this kind of short-term self-preservation incentive, you can probably get them to report more calibrated and consistent beliefs.

I think this is one of the better functions of "bet your beliefs".  Summarizing what I understand "bet your beliefs" to be: there is a norm in the rationality community of challenging people's beliefs by asking them to form bets on them (or forming and offering them bets) -- and taking or refusing bets is taken as evidence of what someone actually believes.

Previously I've mostly just thought of this as a way of increasing evidence that someone believes what they say they do.  If them saying "I believe X" is some small amount of evidence, then them accepting a bet about X is more evidence.

However, now I see that there's potentially another factor at play.  By forcing them to consider short-term losses, you can induce an epistemic mode-switch away from signaling beliefs towards self-preservation beliefs.

It's possible this is already what people thought "bet your beliefs" was doing, and I'm just late to the party.

Caveat: the rest of this is just pontification.

It seems like a bunch of the world has a bunch of epistemic problems.  Not only are there a lot of obviously bad and wrong beliefs, they seem to be durable and robust to evidence that they're bad and wrong.

Maybe this suggests a particular kind of remedy for epistemological problems, or at the very least for "how can I get people to consider changing their mind" -- by setting up situations that trigger short-term self-preservation thinking.

[0] Context from 33:50 here:

Comment by Alex Ray (alex-ray) on Parameter counts in Machine Learning · 2021-07-01T16:48:38.513Z · LW · GW

One reason it might not be fitting as well for vision, is that vision has much more weight-tying / weight-reuse in convolutional filters.  If the underlying variable that mattered was compute, then image processing neural networks would show up more prominently in compute (rather than parameters).
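A rough sketch of that intuition: in a convolutional layer, every parameter is reused at each spatial position, so compute scales with image size while the parameter count doesn't (toy formulas, assuming stride 1 and "same" padding):

```python
def conv2d_params_and_macs(h, w, c_in, c_out, k):
    """Parameter count and multiply-accumulates for one conv layer.
    Each of the k*k*c_in*c_out weights is applied at every one of the
    h*w output positions, so compute far exceeds parameter count."""
    params = k * k * c_in * c_out
    macs = params * h * w
    return params, macs

params, macs = conv2d_params_and_macs(h=224, w=224, c_in=64, c_out=64, k=3)
reuse = macs // params  # each weight is applied 224*224 times
```

A fully-connected layer has a reuse factor of 1, which is why parameter counts track compute much more closely for language models than for vision models.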

Comment by Alex Ray (alex-ray) on How will OpenAI + GitHub's Copilot affect programming? · 2021-07-01T00:48:30.227Z · LW · GW

I think all of those points are evidence that updates me in the direction of the null hypothesis, but I don't think any of them is true to the exclusion of the others.

I think a moderate number of people will use Copilot.  Cost, privacy, and internet-connection requirements will limit this.

I think Copilot will have a moderate effect on users' outputs.  I think it's the best new programming tool I've used in the past year, but I'm not sure I'd trade it for, e.g., interactive debugging (a reference example of a very useful programming tool).

I think Copilot will have no significant differential effect on infosec, at least at first.  The same way I think the null hypothesis for a language model is that it produces average language, I think the null hypothesis for a code model is that it produces average code (average here meaning it doesn't improve or worsen the infosec situation that jim is pointing to).

In general these lead me to putting a lot of weight on 'no significant impact' in aggregate, though I think it is difficult for anything to have a significant impact on the state of computer security.

(Some examples come to mind: the Snowden leaks (almost definitely), Let's Encrypt (maybe), HTTPS Everywhere (maybe), domain authentication (maybe))

Comment by Alex Ray (alex-ray) on How will OpenAI + GitHub's Copilot affect programming? · 2021-06-30T06:16:16.092Z · LW · GW

(Disclaimer: I work at OpenAI, and I worked on the models/research behind copilot.  You should probably model me as a biased party)

This will probably make the already-bad computer security/infosec situation significantly worse.

I'll take the other side to that bet (the null hypothesis), provided the "significantly" unpacks to something reasonable.  I'll possibly even pay to hire the contractors to run the experiment.

I think a lot of people make a lot of claims about new tech that will have a significant impact that end up falling flat.  A new browser will revolutionize this or that; a new website programming library will make apps significantly easier, etc etc.

I think a good case in point is TypeScript.  JavaScript is the most common language on the internet.  TypeScript adds static typing (and all sorts of other strong guarantees) and has been around for a while.  However, I would not say that TypeScript has significantly impacted the security/infosec situation.

I think my prediction is that Copilot does not significantly affect the computer security/infosec situation.

It's worth separating out that this line of research -- in particular training large language models on code data -- probably has a lot more possible avenues of impact than a code completer in VS Code.  My prediction is not about the sum of all large language models trained on code data.

I also do think we agree that it would be good if models always produced the code-we-didn't-even-know-we-wanted, but for now I'm a little bit wary of models that can do things like optimize code outside of our ability to notice/perceive.

Comment by Alex Ray (alex-ray) on Alex Ray's Shortform · 2021-06-12T16:22:57.210Z · LW · GW

Intersubjective Mean and Variability.

(Subtitle: I wish we shared more art with each other)

This is mostly a reaction to the (10y old) LW post:  Things you are supposed to like.

I think there's two common stories for comparing intersubjective experiences:

  • "Mismatch": Alice loves a book, and found it deeply transformative.  Beth, who otherwise has very similar tastes and preferences to Alice, reads the book and finds it boring and unmoving.
  • "Match": Charlie loves a piece of music.  Daniel, who shares a lot of Charlie's taste in music, listens to it and also loves it.

One way I can think of unpacking this is in terms of distributions:

  • "Mean" - the shared intersubjective experiences, which we see in the "Match" case
  • "Variability" - the difference in intersubjective experiences, which we see in the "Mismatch" case

Another way of unpacking this is into factors within the piece or within the subject:

  • "Intrinsic" - factors that are within the subject, things like past experiences and memories and even what you had for breakfast
  • "Extrinsic" - factors that are within the piece itself, and shared by all observers

And one more ingredient I want to point at is question substitution.  In this case I think the effect is more like "felt sense query substitution" or "received answer substitution" since it doesn't have an explicit question.

  • When asked about a piece (of art, music, etc) people will respond with how they felt -- which includes both intrinsic and extrinsic factors.

Anyways what I want is better social tools for separating out these, in ways that let people share their interest and excitement in things.

  • I think that these mismatches/misfirings (like the LW post that set this off) and the reactions to them cause a chilling effect, where the LW/rationality community shares less art than it otherwise would
  • I want to be in a community that's got a bunch of people sharing art they love and cherish

I think great art is underrepresented in LW and want to change that.

Comment by Alex Ray (alex-ray) on Alex Ray's Shortform · 2021-06-12T16:00:20.068Z · LW · GW

How I would do a group-buy of methylation analysis.

(N.B. this is "thinking out loud" and not actually a plan I intend to execute)

Methylation is a pretty commonly discussed epigenetic factor related to aging.  However it might be the case that this is downstream of other longevity factors.

I would like to measure my epigenetics -- in particular approximate rates/locations of methylation within my genome.  This can be used to provide an approximate biological age correlate.

There are different ways to measure methylation, but one I'm pretty excited about that I don't hear mentioned often enough is the Oxford Nanopore sequencer.

The mechanism of the sequencer is that it does direct-reads (instead of reading amplified libraries, which destroy methylation unless specifically treated for it), and off the device is a time-series of electrical signals, which are decoded into base calls with a ML model.  Unsurprisingly, community members have been building their own base caller models, including ones that are specialized to different tasks.

So the community made a bunch of methylation base callers, and they've been found to be pretty good.

So anyways the basic plan is this:

Why do I think this is cool?  Mostly because ONT makes a $1k sequencer that can fit in your pocket, and can do well in excess of 1-10Gb of reads before needing replacement consumables.  This is mostly me daydreaming about what I would want to do with it.

Aside: they also have a pretty cool $9k sample prep tool, which would be useful to me since I'm empirically crappy at doing bio experiments, but the real solution would probably just be to have a contract lab do all the steps and just send the data.

Comment by Alex Ray (alex-ray) on Beijing Academy of Artificial Intelligence announces 1,75 trillion parameters model, Wu Dao 2.0 · 2021-06-06T02:21:24.467Z · LW · GW

In my experience, I haven't seen a good "translation" process -- instead models are pretrained on bigger and bigger corpora which include more languages.

GPT-3 was trained on data that was mostly English, but is also able to (AFAICT) generate other languages as well.

For some English-dependent metrics (SuperGLUE, Winogrande, LAMBADA, etc.) I expect a model trained on a primarily non-English corpus would do worse.

Also, yes, I would expect the tokenization to be different for a largely different corpus.

Comment by Alex Ray (alex-ray) on Teaching ML to answer questions honestly instead of predicting human answers · 2021-06-04T01:09:26.330Z · LW · GW

I feel overall confused, but I think that's mostly because of me missing some relevant background to your thinking, and the preliminary/draft nature of this.

I hope sharing my confusions is useful to you.  Here they are:

I'm not sure how the process of "spending bits" works.  If the space of possible models were finite and discretized, then you could say spending bits is partitioning down to a "1/2^B"th of the space -- but this is not at all how SGD works, and it seems incompatible with using SGD (or any optimizer that doesn't 'teleport' through parameter space) as the optimization algorithm.

Spending bits does make sense in terms of naive rejection sampling (but I think we agree this would be intractably expensive) and other cases of discrete optimization like integer programming.  It's possible I would be less confused if this was explained using a different optimization algorithm, like BFGS or some hessian-based method or maybe a black-box bayesian solver.
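For a finite, uniformly weighted hypothesis space, the accounting can be made concrete with rejection sampling (a toy sketch of the general idea, not anyone's actual proposal):

```python
import math

def bits_spent(total_hypotheses: int, surviving: int) -> float:
    """Bits of selection applied when a search narrows a finite,
    uniformly weighted hypothesis space: B = log2(total / surviving).
    Spending B bits keeps a 1/2^B fraction of the space."""
    return math.log2(total_hypotheses / surviving)

# keeping 1 hypothesis in 1024 "spends" 10 bits of selection
```

SGD has no analogous clean accounting, since it moves continuously through parameter space rather than partitioning it, which is the source of my confusion above.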

Separately, I'm not sure why the two heads wouldn't just end up being identical to each other.  Under shorter-program-length priors (which seem reasonable in this case; also minimal-description-length, sparse-factor-graph, etc.) it seems like weight-tying the two heads, or otherwise making them identical, would be favored.

Lastly, I think I'm confused by your big formula for the unnormalized posterior log probability of () -- I think the most accessible of my confusions is that it doesn't seem to pass "basic type checking consistency".

I know the output should be a log probability, so all the added components should be logprobs/in terms of bits.

The L() term makes sense, since it's given in terms of bits.

The two parameter distances seem like they're in whatever distance metric you're using for parameter space, which seems to be very different from the logprobs.  Maybe they both just have some implicit unit conversion parameter out front, but I think it'd be surprising if it were the case that every "1 parameter unit" move through parameter space is worth "1 nat" of information.  For example, it's intuitive to me that some directions (towards zero) would be more likely than other directions.

The C() term has a lagrange multiplier, which I think are usually unitless.  In this case I think it's safe to say it's also maybe doing units conversion.  C() itself seems to possibly be in terms of bits/nats, but that isn't clear.

In normal lagrangian constrained optimization, lambda would be the parameter that gives us the resource tradeoff "how many bits of (L) loss on the data set tradeoff with a single bit (C) of inconsistency"

Finally, the integral is a bit tricky for me to follow.  My admittedly-weak physics intuitions say you only want to take an exponential (or definitely a log-sum-exp like this) of unitless quantities, but it looks like it maybe has the unit of our distance in parameter space.  That makes it weird to integrate over possible parameters, which introduces another unit of parameter space, and then take the logarithm of it.

(I realize that unit-type-checking ML is pretty uncommon and might just be insane, but it's one of the ways I try to figure out what's going on in various algorithms)

Looking forward to reading more about this in the future.

Comment by Alex Ray (alex-ray) on "Existential risk from AI" survey results · 2021-06-04T00:24:56.702Z · LW · GW

Thanks for doing this research and sharing the results.

I'm curious if you or MIRI plan to do more of this kind of survey research in the future, or its just a one-off project.

Comment by Alex Ray (alex-ray) on Beijing Academy of Artificial Intelligence announces 1,75 trillion parameters model, Wu Dao 2.0 · 2021-06-04T00:22:51.832Z · LW · GW

I think this take is basically correct.  Restating my version of it:

Mixture of Experts and similar approaches modulate paths through the network, such that not every parameter is used every time.  This means that parameters and FLOPs (floating point operations) are more decoupled than they are in dense networks.

To me, FLOPs remains the harder-to-fake metric, but both are valuable to track moving forward.
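A toy illustration of the decoupling, counting multiply-accumulates per token for a two-matrix MLP block (the dimensions and routing cost simplifications are made up for this sketch):

```python
def dense_block(d_model, d_ff):
    # dense MLP: every parameter participates in every token's forward pass
    params = 2 * d_model * d_ff
    macs_per_token = params
    return params, macs_per_token

def moe_block(d_model, d_ff, n_experts, k_active):
    # mixture of experts: n_experts parallel MLPs, but the router sends
    # each token through only k_active of them (router cost ignored here)
    params = n_experts * 2 * d_model * d_ff
    macs_per_token = k_active * 2 * d_model * d_ff
    return params, macs_per_token

# e.g. 64 experts with top-2 routing: 64x the parameters, only 2x the FLOPs
```

This is why a sparse trillion-parameter model can cost far less compute per token than a dense model of the same parameter count.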

Comment by Alex Ray (alex-ray) on Beijing Academy of Artificial Intelligence announces 1,75 trillion parameters model, Wu Dao 2.0 · 2021-06-04T00:19:50.493Z · LW · GW

I think the engadget article failed to capture the relevant info, so just putting my preliminary thoughts down here.  I expect my thoughts to change as more info is revealed/translated.

Loss on the dataset (for cross-entropy, measured in bits per token or per character) is a more important metric than parameter count, in my opinion.
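For reference, the standard conversions between the usual loss units (a generic identity, nothing specific to this model):

```python
import math

def nats_to_bits(cross_entropy_nats: float) -> float:
    # 1 nat = 1/ln(2) bits
    return cross_entropy_nats / math.log(2)

def perplexity(cross_entropy_nats: float) -> float:
    # perplexity is the exponential of the per-token cross-entropy
    return math.exp(cross_entropy_nats)

# a loss of ln(2) nats/token is exactly 1 bit/token, i.e. perplexity 2
```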

However, I think parameter count does matter at least a little, because it is a signal for:
  • the amount of resources available to the researchers (very expensive to do very large runs)
  • the amount of engineering capacity the project has access to (difficult to write code that functions well at that scale -- nontrivial to just code a working 1.7T-parameter training loop)

I expect more performance metrics at some point, on the normal set of performance benchmarks.

I also expect to be very interested in how they release/share/license the model (if at all), and who is allowed access to it.

Comment by Alex Ray (alex-ray) on Peekskill Lyme Incidence · 2021-06-02T01:52:46.764Z · LW · GW

I'm curious about a couple things about your case if you're willing to share.

1. does this mean you still carry the disease?
2. did the diagnosis involve western blot / checking for antibodies? (vs just observations/location/history/etc)
3. what is your current level of concern about long term symptoms from lyme given this or future exposures?

Comment by Alex Ray (alex-ray) on Alex Ray's Shortform · 2021-01-25T01:22:33.769Z · LW · GW

My feeling is that I don't have a strong difference between them.  In general simpler policies are both easier to execute in the moment and also easier for others to simulate.

The clearest version of this is to, when faced with a decision, decide on an existing principle to apply before acting, or else define a new principle and act on this.

Principles are examples of short policies, which are largely path-independent, which are non-narrative, which are easy to execute, and are straightforward to communicate and be simulated by others.

Comment by Alex Ray (alex-ray) on Alex Ray's Shortform · 2021-01-25T01:00:05.490Z · LW · GW

(Note: this might be difficult to follow.  Discussing different ways that different people relate to themselves across time is tricky.  Feel free to ask for clarifications.)


I'm reading the paper Against Narrativity, which is a piece of analytic philosophy that examines Narrativity in a few forms:

  • Psychological Narrativity - the idea that "people see or live or experience their lives as a narrative or story of some sort, or at least as a collection of stories."
  • Ethical Narrativity - the normative thesis that "experiencing or conceiving one's life as a narrative is a good thing; a richly [psychologically] Narrative outlook is essential to a well-lived life, to true or full personhood."

It also names two kinds of self-experience that it takes to be diametrically opposite:

  • Diachronic - considers the self as something that was there in the further past, and will be there in the further future
  • Episodic - does not consider the self as something that was there in the further past and something that will be there in the further future

Wow, these seem pretty confusing.  It sounds a lot like they just disagree on the definition of the word "self".  I think there is more to it than that; some weak evidence being that I discussed this concept at length with a friend (diachronic) who had a very different take on narrativity than myself (episodic).

I'll try to sketch what I think "self" means.  For almost all nontrivial cognition, it seems like intelligent agents have separate concepts (or a concept of the separation between) the "agent" and the "environment".  In Vervaeke's work this is called the Agent-Arena Relationship.

You might say "my body is my self and the rest is the environment," but is that really how you think of the distinction?  Do you not see the clothes you're currently wearing as part of your "agent"?  Tools come to mind as similar extensions of our self.  If I'm raking leaves for a long time, I start to sense the agent as the whole "person + rake" system, rather than a person whose environment includes a rake that is being held.

(In general I think there's something interesting here in proto-human history about how tool use interacts with our concept of self, and our ability to quickly adapt to thinking of a tool as part of our 'self' as a critical proto-cognitive-skill.)

Getting back to Diachronic/Episodic:  I think one of the things that's going on in this divide is that this felt sense of "self" extends forwards and backwards in time differently.


I often feel very uncertain in my understanding or prediction of the moral and ethical natures of my decisions and actions.  This probably needs a whole lot more writing on its own, but I'll sum it up as two ideas having a disproportionate effect on me:

  • The veil of ignorance, which is a thought experiment which leads people to favor policies that support populations more broadly (skipping a lot of detail and my thoughts on it for now).
  • The categorical imperative, which I'll reduce here as the principle of universalizability -- a policy for actions given context is moral if it is one you would endorse universalizing (this is huge and complex, and there's a lot of finicky details in how context is defined, etc.  skipping that for now)

Both of these prompt me to take the perspective of someone else, potentially everyone else, in reasoning through my decisions.  I think the way I relate to them is very Non-Narrative/Episodic in nature.

(Separately, as I think more about the development of early cognition, the more the ability to take the perspective of someone else seems like a magical superpower)

I think they are not fundamentally or necessarily Non-Narrative/Episodic -- I can imagine both of them being considered by someone who is Strongly Narrative and even them imagining a world consisting of a mixture of Diachronic/Episodic/etc.


Priors are hard.  Relatedly, choosing between similar explanations of the same evidence is hard.

I really like the concept of the Solomonoff prior, even if the math of it doesn't apply directly here.  Instead I'll take away just this piece of it:

"Prefer explanations/policies that are simpler-to-execute programs"

A program may be simpler if it has fewer inputs, or fewer outputs.  It might be simpler if it requires less memory or less processing.

This works well for choosing policies that are easier to implement or execute, especially as a person with bounded memory/processing/etc.


A simplifying assumption that works very well for dynamic systems is the Markov property.

This property states that all of the information in the system is present in the current state of the system.

One way to look at this is in imagining a bunch of atoms in a moment of time -- all of the information in the system is contained in the current positions and velocities of the atoms.  (We can ignore or forget all of the trajectories that individual atoms took to get to their current locations)

In practice we usually apply this to systems where it isn't literally true but is close-enough-for-practical-purposes, and combine it with folding some extra history into what counts as the "present".

(For example we might define the "present" state of a natural system to include "the past two days of observations" -- this still has the Markov property, because this information is finite and fixed as the system proceeds dynamically into the future)
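The trick in that parenthetical can be sketched as a toy program (names here are made up for illustration):

```python
from collections import deque

def make_augmented_state(history_len):
    """Wrap an observation stream so the 'state' carries the last
    `history_len` observations, restoring the Markov property for a
    system whose dynamics depend on recent history."""
    window = deque(maxlen=history_len)

    def step(observation):
        window.append(observation)
        # The augmented state is finite and fixed-size, so the future
        # depends only on it, not on the full trajectory.
        return tuple(window)

    return step

step = make_augmented_state(history_len=2)
step("mon")
step("tue")
assert step("wed") == ("tue", "wed")  # "mon" has been forgotten
```

The key property is that the augmented state never grows: however long the system runs, the "present" stays a fixed-size object.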


I think that these pieces, when assembled, steer me towards becoming Episodic.

When choosing between policies that have the same actions, I prefer the policies that are simpler. (This feels related to the process of distilling principles.)

When considering good policies, I think I consider strongly those policies that I would endorse many people enact.  This is aided by these policies being simpler to imagine.

Policies that are not path-dependent (for example, take into account fewer things in a person's past) are simpler, and therefore easier to imagine.

Path-independent policies are more Episodic, in that they don't rely heavily on a person's place in their current Narratives.


I don't know what to do with all of this.

I think one thing that's going on is self-fulfilling -- where I don't strongly experience psychological Narratives, and therefore it's more complex for me to simulate people who do experience this, which via the above mechanism leads to me choosing Episodic policies.

I don't strongly want to recruit everyone to this method of reasoning.  It is an admitted irony of this system that I don't wish for everyone to use the same mechanism of reasoning as me -- maybe let that signal just how uncertain I feel about my whole ability to come to philosophical conclusions on my own.

I expect to write more about this stuff in the near future, including experiments I've been doing in my writing to try to move my experience in the Diachronic direction.  I'd be happy to hear comments about what folks are interested in.


Comment by Alex Ray (alex-ray) on The Case for a Journal of AI Alignment · 2021-01-11T01:13:15.341Z · LW · GW

I think there's a lot of really good responses, that I won't repeat.

I think the traditional model of journals has a lot of issues, not the least of which are bad incentives.

The new model used by eLife is pretty exciting to me, but very different from what you proposed.  I think it's worth considering:

  • only reviewing works that have already been published as preprints (I think LW/AF should count for this, as well as ArXiV)
  • publishing reviews -- this lets the rest of the community benefit more from the labor of reviewing, though it does raise the standard for reviewers
  • curate the best / highest reviewed articles to be "published"

The full details of their new system are in an essay they published describing the changes and why they made them.

Comment by Alex Ray (alex-ray) on Why GPT wants to mesa-optimize & how we might change this · 2021-01-09T18:15:01.739Z · LW · GW

Clarifying Q: Does mesa-optimization refer to any inner optimizer, or one that is in particular not aligned with the outer context?

Comment by Alex Ray (alex-ray) on Why GPT wants to mesa-optimize & how we might change this · 2021-01-02T19:16:27.859Z · LW · GW

Epistemic status: I’m not really an expert at NLP.  I’ve only been working on language modeling for ~8mo, which is much less than some of the folks here, and this is based on my experiences.

Beam Search:

Beam search with large unsupervised generatively pretrained transformers (GPTs) is weirder than it appears in the NLP literature.  Other commenters have mentioned degeneracies, but for me the sticking points for beam search were:

  • It tends to quickly collapse onto a modal response — so it’s already bad for any situation where you want to generate a diversity of samples and choose the best from
  • It’s hard to correctly score between varying-length segments.  Every paper that uses beam search has some heuristic hack here, which is almost always some parametrized function they pulled from another paper or hacked together.
  • It seems to mostly do best (once tuned) at some narrow/specific distribution (e.g. generating short responses in a chat setting).  It’s hard to get beam search tuned to work well across the full distribution used to train these models (i.e. “text on the internet”)
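For concreteness, one widely used version of the varying-length heuristic is the length penalty from the GNMT paper (Wu et al. 2016) — a sketch; the constants and exact form vary from paper to paper:

```python
def length_normalized_score(token_logprobs, alpha=0.6):
    """Score a beam by total log-probability divided by the GNMT-style
    length penalty lp = ((5 + n) / 6) ** alpha, so that beams of
    different lengths can be compared.  alpha = 0 recovers the raw sum."""
    n = len(token_logprobs)
    lp = ((5 + n) / 6) ** alpha
    return sum(token_logprobs) / lp

beam = [-0.5, -0.5, -0.5]
assert abs(length_normalized_score(beam, alpha=0.0) - sum(beam)) < 1e-9
assert length_normalized_score(beam, alpha=1.0) > sum(beam)  # penalty softens the length bias
```

`alpha` is exactly the kind of parameter that gets tuned per-task, which is part of why these scores are hard to get right across a broad distribution.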

Given these three issues, in my experience it’s been better to just focus on tuning naive sampling, with a few key parameters: temperature, top_p, etc (these are part of the OpenAI API).

Caveat: it’s possible I’m just bad at tuning beam search.  It’s possible I’m bad at scholarship and missed the “one key paper” that would make it all clear to me.  I would take the above as more of an anecdote than a scientific result.
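A minimal sketch of those sampling knobs (illustrative only — real implementations operate on logit tensors rather than dicts):

```python
import math
import random

def sample_token(logits, temperature=1.0, top_p=1.0):
    """Temperature + nucleus (top-p) sampling from a next-token
    distribution, with `logits` as a dict of token -> score."""
    # Temperature rescales logits: < 1.0 sharpens, > 1.0 flattens.
    scaled = {t: l / temperature for t, l in logits.items()}
    m = max(scaled.values())
    exps = {t: math.exp(l - m) for t, l in scaled.items()}
    z = sum(exps.values())
    probs = {t: e / z for t, e in exps.items()}

    # Nucleus: keep the smallest set of tokens whose cumulative
    # probability reaches top_p, then renormalize and sample.
    kept, cum = {}, 0.0
    for t, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[t] = p
        cum += p
        if cum >= top_p:
            break
    r = random.random() * sum(kept.values())
    for t, p in kept.items():
        r -= p
        if r <= 0:
            return t
    return t
```

With `top_p=0.5` on a peaked distribution this always returns the modal token; raising the temperature flattens the distribution before the nucleus cut is applied.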

Separation of training and sampling:

This has been mentioned by other commenters, but might bear repeating that there is no sampling at all in the training process for GPTs.  They’re trained to approximate marginal next token distributions, and the default is to share the loss on the prediction for every token equally.  In practice the loss on later tokens is lower.

All of this is to say that training is a separate process from sampling.  I think there is probably very good research to be done in better sampling — in particular, I think it is possible to have a machine which aligns sampling from an unaligned model.

Lookahead & pondering:

I think the point about lookahead is still worth considering.  One of the differences between transformers and the previous most-popular architecture for language models (LSTMs) is that transformers use the same amount of compute for every token.  (It’s possible to build them otherwise, but I haven’t yet seen one of those that impressed me)

I think my favorite example of this in the literature is Adaptive Computation Time (ACT), where essentially the model learns how to “spend” extra compute on certain characters.

(One of the things going on with ACT is dealing with the non-uniformity of the distribution of information content in character strings — for GPTs this is at least partially ameliorated by the byte-pair encoding)

So I think it is reasonable to train a model to be able to use extra “pondering” time when sampling.  Either by having an external controller that tells the model when to ponder and when to output, or by having the model learn itself how to ponder (which is the “halting neuron” signal in ACT).
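A hedged sketch of ACT-style pondering, with `step_fn` standing in for the model plus its halting neuron (all names invented for illustration):

```python
def ponder(step_fn, state, epsilon=0.01, max_steps=10):
    """ACT-flavored pondering: run extra compute steps until the
    cumulative halting probability passes 1 - epsilon, returning the
    halting-probability-weighted combination of intermediate states.
    `step_fn(state) -> (new_state, halt_prob)` stands in for the model."""
    cum, weighted = 0.0, 0.0
    for _ in range(max_steps):
        state, halt_prob = step_fn(state)
        if cum + halt_prob >= 1.0 - epsilon:
            # The final step absorbs the leftover probability mass.
            return weighted + (1.0 - cum) * state
        weighted += halt_prob * state
        cum += halt_prob
    return weighted + (1.0 - cum) * state

# A toy "model" that increments its scalar state and always reports
# halting probability 0.6: it ponders for exactly two steps.
assert abs(ponder(lambda s: (s + 1, 0.6), 0) - 1.4) < 1e-9
```

The external-controller variant from the paragraph above would replace the learned `halt_prob` with a signal chosen outside the model.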

I do think that any sort of pondering is subject to mesa-optimization concerns.

Fix 1 - BERT:

Caveat: I haven’t trained BERT models or taken a trained one and tried hard to get high quality samples from it.  This is based on intuitions and hearsay.

Here I’ll use “GPT” to refer to autoregressive next token prediction objectives, to mirror the style of the article.  This objective can of course be used with other architectures in other settings.

Instead of thinking of the “mask-part-out prediction” (BERT) and the “mask future text” (GPT) objectives as two separate tasks, think of them as points in the space of distributions over masks.

In particular, it’s trivial to come up with mask distributions that include both a preponderance of masks which leave small parts out (BERT-like) and masks which leave future tokens out (GPT-like), as well as possibly other mask patterns.

My intuition is that the higher probability you mask out all future tokens, the easier it is to get high quality samples from that model.
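A toy sketch of such a mask distribution (the parameter values are illustrative assumptions, not taken from any paper):

```python
import random

def sample_mask(seq_len, p_causal=0.8, p_token_mask=0.15):
    """Draw from a mask distribution that mixes the two objectives:
    with probability p_causal, mask everything past a random cut
    (GPT-like); otherwise mask random tokens (BERT-like).
    Returns a list of booleans, True = hidden from the model."""
    if random.random() < p_causal:
        cut = random.randrange(seq_len)
        return [i > cut for i in range(seq_len)]
    return [random.random() < p_token_mask for _ in range(seq_len)]

causal = sample_mask(10, p_causal=1.0)
# A causal mask is monotone: once masked, stays masked.
assert all(causal[i + 1] or not causal[i] for i in range(9))
```

Dialing `p_causal` toward 1.0 is the knob my intuition above is about: the more mass on all-future-tokens masks, the easier sampling should be.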

Fix 1 - Editing Text:

(Same caveat as above regarding inexperience w/ BERT models)

BERT objectives by themselves do not allow efficient text editing, and neither do GPT objectives.

Thinking about the task of composing an edit, the model needs to:

  • Identify the section that will be removed (if any)
  • Figure out the length of the replacement text (if any)
  • Compose the replacement text (if any)
  • Possibly also have some way of attending over the old text, while still knowing to replace it

Neither BERT nor GPT objectives do a great job of this on their own.  If I had to choose, though, I think you can encode this sort of thing in the GPT dataset and have it autoregressively generate edits.

(This is part of a conjecture I’ve been meaning to write up for lesswrong of “the dataset is the interface” for GPT models)
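To illustrate that conjecture, here is one hypothetical encoding (the `<|...|>` marker tokens are invented for this sketch, not a real vocabulary):

```python
def encode_edit(original, start, end, replacement):
    """Hypothetical dataset format for teaching a GPT to edit: the
    document, the span to replace, and the replacement, flattened into
    one string the model can learn to continue autoregressively."""
    return (
        f"<|doc|>{original}"
        f"<|replace|>{original[start:end]}"
        f"<|with|>{replacement}<|done|>"
    )

encoded = encode_edit("the cat sat", 4, 7, "dog")
assert encoded == "<|doc|>the cat sat<|replace|>cat<|with|>dog<|done|>"
```

At sampling time the model would be given everything up through `<|with|>` and asked to generate the replacement text plus the closing marker.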

Fix 2 - Changing the training:

I think there’s some interesting stuff here, but so far this is in the regime of training algorithms that are unexplored, enormously complex, and poorly understood.

The clearest part here is that it uses sampling in the training loop which so far I’ve almost exclusively seen in reinforcement learning (RL).

But, we can probably implement something like this with RL.  In particular, training is a process of selecting a context (masking), sampling from the model to fill in the mask, and scoring based on the objective.

In this case, drawing some analogies to RL:

  • Action - token
  • Action distribution - token distribution (the basic output of a GPT model given an input context)
  • Policy - language model (in particular a GPT model, though with hacks BERT/other models could be used)
  • Reward - objective (log-loss on the true document, for a GPT model)
  • Environment - a document, probably with some starting context already provided
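Sketching that analogy as a single training step (the function names are stand-ins for a real model and objective, not any actual API):

```python
import random

def rl_style_step(sample_fn, score_fn, documents,
                  context_len=32, sample_len=16):
    """One step of the RL analogy: pick a document (environment), show
    the model a context (the masking), sample a continuation (actions),
    and score it against the hidden true text (reward)."""
    doc = random.choice(documents)
    context = doc[:context_len]
    sampled = sample_fn(context, sample_len)
    true_continuation = doc[context_len:context_len + sample_len]
    reward = score_fn(sampled, true_continuation)
    return context, sampled, reward

docs = ["abcdefghij" * 5]
ctx, samp, reward = rl_style_step(
    sample_fn=lambda c, n: c[-n:],                            # toy "policy"
    score_fn=lambda s, t: sum(a == b for a, b in zip(s, t)),  # toy "reward"
    documents=docs, context_len=10, sample_len=5)
assert ctx == "abcdefghij" and len(samp) == 5
```

Setting `context_len=0` in this loop is exactly the generate-from-scratch failure mode described next.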

It’s pretty easy to see here that this wouldn’t work well for generating from scratch.  If I provide zero contextual tokens to the model, sample N tokens, and then score it on how close it got to a true (hidden) document, I am going to have a very bad time.

This might be a good approach for fine-tuning a GPT model — which is exactly what some colleagues did.

Even in the fine-tuning case, we have all of the myriad and sundry problems with RL (instability, inefficiency, etc) that our plain-and-simple language modeling objective lacks.

Fix 2 - update away:

I think this probably won’t work, just based on experience.  I’ve found it very hard to get the model to “reduce your probability on the most likely outcome and increase your probability on the next most likely outcome” — instead, objectives like this tend to just increase the temperature of everything (or worse, they put all of the increase in entropy into the long tail of bad answers).

It’s possible there is a good way to do this, but for now I don’t know of a good way to get a model to increase the probability of “secondary options” without just degenerating into increasing entropy.
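A tiny numeric illustration of why: pushing down only the top option’s logit spreads the freed probability mass over the whole tail in proportion to current probabilities — entropy rises, but the runner-up gains no relative ground on the rest of the tail.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

before = softmax([4.0, 2.0, 0.0, 0.0, 0.0])
after = softmax([3.0, 2.0, 0.0, 0.0, 0.0])  # only the top logit pushed down
assert entropy(after) > entropy(before)  # entropy rises...
# ...but the runner-up's share of the non-top mass is unchanged:
tail_b, tail_a = sum(before[1:]), sum(after[1:])
assert abs(before[1] / tail_b - after[1] / tail_a) < 1e-9
```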

Fix 2 - track updates:

If I understand this correctly, I think this is easily approximated by having an objective/loss/reward term which penalizes differences from the original model.  For small deltas I think this is a good approach; unfortunately it is only as good as the original model you’re comparing to.

As for the specific proposal of managing updates towards/away from the beam search distribution, that also seems possible via a similar mechanism — penalize distributional differences from those samples.

I think we haven’t really explored these sort of penalties enough, and in particular how they interact when combined with other objectives.
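A sketch of such a penalty, here as a KL divergence to the reference model’s token distribution (the coefficient it gets in the combined objective, and even the direction of the KL, are design choices):

```python
import math

def kl_to_reference(p_new, p_ref):
    """KL(new || ref) over a next-token distribution: a sketch of the
    'penalize drift from the original model' term.  In practice this
    would be added to the main objective with some coefficient."""
    return sum(p * math.log(p / q) for p, q in zip(p_new, p_ref) if p > 0)

ref = [0.7, 0.2, 0.1]
assert kl_to_reference(ref, ref) == 0.0             # no drift, no penalty
assert kl_to_reference([0.5, 0.4, 0.1], ref) > 0.0  # drift is penalized
```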

Fix 2 - will it stunt:

I think that any objective that scores better predictions higher will incentivize some sort of lookahead/pondering.

If you prevent it from being coincident with the beam search distribution, then I expect the model will learn how to do lookahead/pondering in the null space of beam search.

Will these solve mesa-optimization:

This isn’t clear to me, but I think it’s worth studying.

In particular, it would be good to figure out some way of contriving a mesa-optimization setup, such that we could measure if these fixes would prevent it or not.

Beam Search in the API:

I think my above comments about Beam Search apply here.

Beam search, like any optimization algorithm, is hugely dependent on its scoring function.  If you score on likelihood, you’ll end up with high-likelihood (“unsurprising”) text.

Future thoughts - sampling research:

I think in general we’re in a weirdly asymmetric world, where a huge amount of compute and effort goes into computing autoregressive next-token distributions, and comparatively very little sophistication goes into sampling from them.

This comment is probably too long already for me to expand too much on this, but in particular, I think the log-likelihood objective is default unaligned (as most datasets are default unaligned) but I think we can find ways of sampling from log-likelihood optimized models in ways that are aligned.

Comment by Alex Ray (alex-ray) on Final Version Perfected: An Underused Execution Algorithm · 2020-12-02T18:12:14.083Z · LW · GW


It seems like the prerequisite assumptions are likely to be violated sometimes (in general most assumptions aren't total rules).

My question is about the rate of violations to this prerequisite assumption.

A few ways to cut at it (feel free to answer just one or none of them):

  • When going through a list subsequent times, how often do you notice/feel internally that your views on a past item have shifted?
  • How often do you make a new list and start the process anew, even though you have an existing list that could be continued on?
  • How often do you go back and erase or modify marks on a list while using this process?

I think I find my internal experience (and relation to stuff on my to-do list) changes pretty significantly over the course of a day.

Comment by Alex Ray (alex-ray) on Final Version Perfected: An Underused Execution Algorithm · 2020-11-29T21:06:29.102Z · LW · GW

This is my favorite kind of lesswrong post -- a quick rationality technique that I can immediately go try and report back on.

I was able to prototype it quickly in my notes list by using a dedicated symbol as the marker.  It looks like any weird/unused symbol could be used as this.  Seems like a quick hack to work with any digital list (I used §).

Question about non-stationarity: How often is the "stable" prerequisite violated in practice?

E.g. if a bunch of items are physically exhausting, and a bunch are not, I might want to not do physically exhausting items in sequence.  I didn't run into this personally in my tiny trial, so at least the answer isn't "all the time".

Comment by Alex Ray (alex-ray) on Alex Ray's Shortform · 2020-11-28T03:52:30.881Z · LW · GW

1. What am I missing from church?

(Or, in general, by lacking a religious/spiritual practice I share with others)

For the past few months I've been thinking about this question.

I haven't regularly attended church in over ten years.  Given how prevalent it is as part of human existence, and how much I have changed in a decade, it seems like "trying it out" or experimenting is at least somewhat warranted.

I predict that there is a church in my city that is culturally compatible with me.

Compatible means a lot of things, but mostly means that I'm better off with them than without them, and they're better off with me than without me.

Unpacking that probably will get into a bunch of specifics about beliefs, epistemics, and related topics -- which seem pretty germane to rationality.

2. John Vervaeke's Awakening from the Meaning Crisis is bizarrely excellent.

I don't exactly have handles for exactly everything it is, or exactly why I like it so much, but I'll try to do it some justice.

It feels like rationality / cognitive tech, in that it cuts at the root of how we think and how we think about how we think.

(I'm less than 20% through the series, but I expect it continues in the way it has been going.)

Maybe it's partially his speaking style, and partially the topics and discussion, but it reminded me strongly of sermons from childhood.

In particular: they have a timeless quality to them.  By "timeless" I mean I think I would take away different learnings from them if I saw them at different points in my life.

In my work & research (and communicating this) -- I've largely strived to be clear and concise.  Designing for layered meaning seems antithetical to clarity.

However I think this "timelessness" is a missing nutrient to me, and has me interested in seeking it out elsewhere.

For the time being I at least have a bunch more lectures in the series to go!

Comment by Alex Ray (alex-ray) on Alex Ray's Shortform · 2020-11-25T03:37:53.035Z · LW · GW

I don't know if he used that phrasing, but he's definitely talked about the risks (and advantages) posed by singletons.

Comment by Alex Ray (alex-ray) on Alex Ray's Shortform · 2020-11-22T18:34:47.117Z · LW · GW

Thinking more about the singleton risk / global stable totalitarian government risk from Bostrom's Superintelligence, human factors, and theory of the firm.

Human factors represent human capacities or limits that are unlikely to change in the short term.  For example, the number of people one can "know" (for some definition of that term), limits to long-term and working memory, etc.

Theory of the firm tries to answer "why are economies markets but businesses autocracies" and related questions.  I'm interested in the subquestion of "what factors give the upper bound on coordination for a single business", related to "how big can a business be".

I think this is related to "how big can an autocracy (robustly/stably) be", which is how it relates to the singleton risk.

Some thoughts this produces for me:

  • Communication and coordination technology (telephones, email, etc) that increase the upper bounds of coordination for businesses ALSO increase the upper bound on coordination for autocracies/singletons
  • My belief is that the current max size (in people) of a singleton is much lower than current global population
  • This weakly suggests that a large global population is a good preventative for a singleton
  • I don't think this means we can "war of the cradle" our way out of singleton risk, given how fast tech moves and how slow population moves
  • I think this does mean that any non-extinction event that dramatically reduces population also dramatically increases singleton risk
  • I think that it's possible to get a long-term government aligned with the values of the governed, and "singleton risk" is the risk of an unaligned global government

So I think I'd be interested in tracking two "competing" technologies (for a hand-wavy definition of the term)

  1. communication and coordination technologies -- tools which increase the maximum effective size of coordination
  2. soft/human alignment technologies -- tools which increase alignment between government and governed
Comment by Alex Ray (alex-ray) on The tech left behind · 2020-11-18T07:20:08.884Z · LW · GW

+1 Plan 9.

I think it (weirdly) especially hits a strange place with the "forgotten" mark, in that pieces of it keep getting rediscovered (sometimes multiple times).

I got to work w/ some of the Plan 9 folks, and they would point out (with citations) when highly regarded papers in OSDI had been built (and published) in Plan 9, sometimes 10-20 years prior.

One form of this "forgotten" tech is tech that we keep forgetting and rediscovering, but:

  1. maybe this isn't the type of forget the original question is about, and
  2. possibly academia itself is incentivizing this (since instead of only getting one paper out of a good idea, if it can get re-used, then that's good for grad students / labs that need publications)
Comment by Alex Ray (alex-ray) on Alex Ray's Shortform · 2020-11-18T01:09:56.929Z · LW · GW

Future City Idea: an interface for safe AI-control of traffic lights

We want a traffic light that:

  • Can function autonomously if there is no network connection
  • Meets some minimum timing guidelines (for example, green in a particular direction no less than 15 seconds and no more than 30 seconds, etc)
  • Has a secure interface to communicate with city-central control
  • Has sensors that allow some feedback for measuring traffic efficiency or throughput

This gives constraints, and I bet an AI system could be trained to optimize efficiency or throughput within the constraints.  Additionally, you can narrow the constraints (for example, only choosing 15 or 16 seconds for green) and slowly widen them in order to change flows slowly.
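The constraint-envelope idea can be sketched in a few lines (a toy illustration; the numbers match the example guidelines above):

```python
def safe_green_time(proposed_s, min_s=15, max_s=30):
    """The interface constraint sketched above: whatever green-time the
    optimizer proposes, the light itself clamps it into the safety
    envelope, so a misbehaving controller can never produce unsafe timing."""
    return max(min_s, min(max_s, proposed_s))

assert safe_green_time(22) == 22   # in-range proposals pass through
assert safe_green_time(3) == 15    # too short: clamped to the minimum
assert safe_green_time(500) == 30  # too long: clamped to the maximum
```

Narrowing the envelope (e.g. `min_s=15, max_s=16`) and widening it over time is just a change to these two parameters, enforced on the light itself rather than in the AI system.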

This is the sort of thing Hash would be great for, simulation-wise.  There are probably dedicated traffic simulators as well.

At something like a quarter million dollars per traffic light, I think there's an opportunity here for a startup.

(I don't know Matt Gentzel's LW handle but credit for inspiration to him)

Comment by Alex Ray (alex-ray) on The tech left behind · 2020-11-18T00:52:29.817Z · LW · GW

I think commercial applications of nuclear fission sources are another good example.

Through the 1940s, there were lots of industrial processes, and commercial products which used nuclear fission or nuclear materials in some way.  Beta sources are good supplies of high-energy electrons (used in a bunch of polymer processes, among other things), alpha sources are good supplies of positively charged nuclei (used in electrostatic discharge, and some sensing applications).

I think one of the big turning points was the Atomic Energy Act, in the US, though international agreements might also be important factors here.

The world seems to have collectively agreed that nuclear risks are high, and we seem to have chosen to restrict proliferation (by regulating production and sale of nuclear materials) -- and as a side effect have "forgotten" the consumer nuclear technology industry.

I am interested in this because it's also an example where we seem to have collectively chosen to stifle/prevent innovation in an area of technology to reduce downside risk (dirty bombs and other nuclear attacks).

Comment by Alex Ray (alex-ray) on The tech left behind · 2020-11-18T00:29:08.380Z · LW · GW

I think Google Wave/Apache Wave is a good candidate here, at least for the crowd familiar with it.

Designed to be a new modality of digital communication, it combined features of email, messengers/chat, collaborative document editing, etc.

It got a ton of excitement from a niche crowd while it was in a closed beta.

It never got off the ground, though, and less than a year after finishing the beta, it was slowly turned down and eventually handed over to Apache.

Comment by Alex Ray (alex-ray) on How to get the benefits of moving without moving (babble) · 2020-11-15T01:25:28.971Z · LW · GW

I really appreciate how this and the previous posts do a lot to describe and frame the problems that moving would solve, such that it's possible to make progress on them.

I think it's harder to clearly frame the problem (or clearly break a big/vague problem into concrete subproblems).

Anyways, some babbling:

  • Home
    • Redecorate or remodel
    • Arrange a furniture swap with friends and neighbors
    • Marie kondo your stuff
    • Give friends a virtual tour of your space and ask what they would change
    • Ask your friends for virtual tours of their spaces for ideas
    • Have a garage sale / get rid of a bunch of stuff
    • Buy land
    • buy a house
    • buy a condo
  • Work
    • Ask for a raise
    • Ask for a promotion
    • Interview at other companies
    • Get career coaching advice (80k / EA folks are pretty practiced at this!)
    • Give career coaching advice (showing up, as a professional in my 30s, to EA / early career events has been more fulfilling than I thought it would be)
    • Go to conferences in your field
    • Go to conferences in the field you want to be in (strongly recommend this)
    • Join vocational groups (meetups, etc) to connect with folks in your field
    • Give presentations about your job/work at vocational groups
    • Try moving around weekend days (wednesday/sunday?)
    • Try working from home (okay this would probably have been more useful pre-pandemic)
    • Change up your commute (bike or walk or public transit or drive or motorcycle)
    • Join a carpool with people you think are cool to be in cars with
    • Start a carpool with people you think are cool to be in cars with
    • Take an online course (bonus: get your work to pay for it)
    • Get a professional degree (online or night school or whatever)
  • Relationships (mostly not romantic)
    • Make it easy for other people to schedule 1:1s with you (calendly / etc)
    • Reach out to possible mentors you would want to have
    • Reach out to possible mentees (many people I know could be great mentors but are too uncertain to take the first step towards mentorship themselves)
    • Retire a mentor of yours that you're no longer getting lots of value out of (but maybe still be friends?)
    • Start a book club
    • Start a paper reading group
    • Start a running group or gym group
    • Host more parties (post-vaccine)
    • Go to more parties (post-vaccine)
    • Join a monastery / religion / church / etc (not for everyone)
    • Start a religion / etc (I think the anti-digital movement has room for a neo-luddite group to coalesce, but that's a separate post)
  • Other
    • Find a counselor or therapist
    • Turn off your internet after dark
    • Go on a 10 day silent retreat
    • Spend a month in an RV
    • Travel much more often (for me this could be "travel at least 3 days every month" but for others could vary)
Comment by Alex Ray (alex-ray) on Alex Ray's Shortform · 2020-11-15T00:25:51.543Z · LW · GW

Looking at this more, I think my uncertainty is resolving towards "No".

Some things:
- It's hard to bet against the bonds themselves, since we're unlikely to hold them as individuals
- It's hard to make money on the "this will experience a sharp decline at an uncertain point in the future" kind of prediction (much easier to do this for the "will go up in price" version, which is just buying/long)
- It's not clear anyone was able to time this properly for Detroit, which is the closest analog in many ways
- Precise timing would be difficult, much more so while being far away from the state

I'll continue to track this just because of my family in the state, though.

Point of data: it was 3 years between Detroit bonds hitting "junk" status, and the city going bankrupt (in the legal filing sense), which is useful for me for intuitions as to the speed of these.

Comment by Alex Ray (alex-ray) on Alex Ray's Shortform · 2020-11-08T20:37:18.807Z · LW · GW

Can LessWrong pull another "crypto" with Illinois?

I have been following the issue with the US state Illinois' debt with growing horror.

Their bond status has been heavily degraded -- most states' bonds are "high quality" with the ratings agencies (Moody's, Standard & Poor's, Fitch), and Illinois is "low quality".  If they get downgraded more they become a "junk" bond, and lose access to a bunch of the institutional buyers that would otherwise continue to lend.

COVID has increased many states' costs, for reasons I can go into later, so it seems reasonable to think we're much closer to a tipping point than we were last year.

As much as I would like to work to make the situation better I don't know what to do.  In the meantime I'm left thinking about how to "bet my beliefs" and how one could stake a position against Illinois.

Separately I want to look more into EU debt / restructuring / etc as it's probably a good historical example of how this could go.  Additionally, the largest entity previously to go bankrupt in the USA was the city of Detroit, which is probably another good example to learn from.

Comment by Alex Ray (alex-ray) on Nuclear war is unlikely to cause human extinction · 2020-11-07T08:29:28.295Z · LW · GW

Thanks for writing this up!  I think having more well-researched and well-written dives into things like this is great.

A bunch of scattered thoughts and replies to these:

Overall I agree with the central idea (Nuclear War is unlikely to cause human extinction), but I disagree enough with the reasoning to want to hammer it into better shape.

Writing this I feel like I'm in an "editing/constructive feedback" mood, but I'd welcome you to throw it all out if it's not what you're going for.  To the feedback!

This seems to only consider current known nuclear weapons arsenals.  It seems worth including probabilities that different kinds of weapons are built before such a war.  In particular, longer-lived species of bombs (e.g. salt bombs, cobalt bombs, etc)

I think I want to separate "kill everyone with acute radiation right away" and "kill everyone with radiation in all of the food/water", and the latter seems less addressed by the energy/half life argument.  I think the weapon design space here is pretty huge, so ruling these out seems hard to me.  (Though I do think if we're assigning probabilities, they should get lower probabilities than conventional weapons)

In general I would prefer approximate numbers, or at least likelihood ratios, for what you think this evidence balances out to, and what odds you would put on different outcomes.

(For example: "what is the likelihood ratio of the 3.C evidence that nuclear war planners are familiar with ideas like nuclear winter" -- I don't think these are strictly required, but they really help me contextualize and integrate this information)

In particular, Toby Ord gives a bunch of excellent quantitative analysis of X-risks, including nuclear war risk, in The Precipice.

(In fact, if your main point of the post was to present a different model from that one, adding numbers would greatly help in comparing and contrasting and integrating them)

Finally, I think my mental models of case 3) are basically the same as any event that is a significant change to the biosphere -- and it seems reasoning about this gets harder given your premise.

A hypothetical: if there are 3 major climate events in the next 100 years (of which one is a bellicose nuclear exchange), and humanity goes extinct due to climate related symptoms, does the nuclear war "cause" the human extinction in a way you're trying to capture?

Maybe what I want is for the premise to be more precise: define a time limit (extinct within X years) and maybe factor out what it means to "cause" (for example, it seems like this suggests that an economic collapse triggered by a nuclear war, which triggers other things that eventually lead to extinction, is not as clearly "caused by nuclear war").

Also maybe define a bit what "full-scale" means?  I assume that it means total war (as opposed to limited war), but good to clear up in any case.

That's all that came to mind for now.  Thanks again for sharing!

Comment by Alex Ray (alex-ray) on Location Discussion Takeaways · 2020-11-05T06:21:33.401Z · LW · GW

This is probably better put in another post, but I think I agree with your read of the situation and recommendations, and want to follow it with the (to me) logical next step: "how to get some of the good things we could get by moving, without moving"

I like this post because it does talk about a bunch of things that could be gained (sense of safety, isolation from political unrest, etc.).

It seems not-easy and also not-impossible to brainstorm ways of dealing with this as a community in a sane and cost-effective way.

One way this could go:  (sketch of a vision)

Right now SF and most cities have a bunch of "earthquake disaster preparation" advice.  Things you should have on hand, ready to go, plans you should have made ahead of time, people to contact in case communications go down, things to do to prepare your home structure, attach your furniture to the walls, etc.

We could make some community version of that, pointed directly at the things we want to point at.

Comment by Alex Ray (alex-ray) on Where do (did?) stable, cooperative institutions come from? · 2020-11-04T19:49:25.785Z · LW · GW

Sharing an idea that came to mind while reading it, low confidence.

Maybe "forming great cultures" is really just the upper tail of "forming cultures" -- the more cultures we form, the more great cultures we get.

In this case the interesting thing is tracking how many cultures we form, and what factors control this rate.

I think over the timescales described, humans haven't really gotten much more interesting to other humans.  Humans are pretty great, hanging out together is (to many) more fun and exciting than hanging out alone.

A difference could be that the alternatives have been getting more and more interesting -- wandering in the woods is pretty but also boring.  Walking around town might be less boring.  Reading a book less boring still.  Listening to music is better for some.  The internet has created a whole lot of less-boring activities.  Somewhere in there we crossed a threshold for "forming cultures" becoming less and less interesting.

This is basically the idea "we form cultures when we get bored, and we're less bored".

But let's say I personally find the idea of starting a culture exciting, does this still affect me?

I think 'yes', because the people I try to recruit for participating in my culture will also have to choose between joining me and the BATNA.

Things that would update me against this: models that show "starting a culture" continues to be exciting, models that show "hedonic setpoint" reasons for the 'BATNA gets better' idea to be broken, evidence that more cultures are being started now than ever before.

Of all the things here I think the idea I'm most interested in inspecting is "formation of great cultures tracks formation of cultures in general".

Comment by Alex Ray (alex-ray) on Where do (did?) stable, cooperative institutions come from? · 2020-11-04T19:33:06.209Z · LW · GW

I think the point about people Goodhart-ing the things seen as greatness makes sense.  These incentives would have been around for a long time and don't predict recent changes, though.

One thing that is different now is that more of the words/sentences/pictures/ideas in my interactions come through some form of manipulable media (websites, podcasts, radio, television, etc.) rather than from flesh-and-blood humans.  Here I'm trying to capture something like the amount of beliefs, knowledge, and ideas moved, rather than the amount of time or attention.

So this predicts things like 'it's easier to form institutional cultures when there is more human-human interaction', which would point to a decline in recent decades, but also probably to significant events at past points in history.

Radio, television, internet, etc would probably be interesting points to study.

The recent pandemic is then interesting, because this would predict that in places that shut down for the pandemic, it became acutely more difficult to build/maintain cultures of great institutions, because we acutely curtailed human-human interaction.

Comment by Alex Ray (alex-ray) on Where do (did?) stable, cooperative institutions come from? · 2020-11-04T19:23:59.310Z · LW · GW

Maybe not the right place, but my understanding is that Robert Gordon's hypothesis is very different from the others.

The common view between these folks is that our expectation is for growth, and with this comes plans/strategies/policies which are breaking down as our growth has been slower than expected (at this point for decades).

(I think I know more about this one) Gordon's view is that stagnation is because our growth has come from discovering, scaling, and rolling out a sequence of "once only" inventions.  We can only disseminate germ theory once, we can only add women to the workforce once, we can only widely deploy indoor plumbing once, etc.  This means the expectation is that as we exhaust the easy improvements, growth will slow down.  Gordon's view is importantly independent of culture, and makes similar predictions for the US, UK, JP, SK, CN, FR, DE (which they'll each reach at different dates given the convergence model, but eventually all trend the same).  Gordon's prediction is that we're just now in a world where we're stuck at ~1% TFP growth.
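
To make the compounding stakes concrete, here's a back-of-envelope comparison (my own illustrative numbers, not Gordon's) of sustained ~1% versus ~2.5% annual TFP growth over 50 years:

```python
# Back-of-envelope illustration: compounding makes the gap between
# ~1% and ~2.5% annual TFP growth enormous over a few decades.
years = 50
low = 1.01 ** years    # total growth factor at 1%/yr
high = 1.025 ** years  # total growth factor at 2.5%/yr

print(f"{years} years at 1.0%: {low:.2f}x")
print(f"{years} years at 2.5%: {high:.2f}x")
print(f"ratio: {high / low:.2f}x")
```

Roughly a 1.6x economy versus a 3.4x economy, which is why the difference between these two trajectories matters so much.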

(I know some about this) Cowen's hypothesis in the Great Stagnation is similar to Gordon's, but seems to argue that the stagnation he's describing is 1) more specific to American culture 2) reversible, in that he predicts given some policy changes that we can get back to the higher growth of earlier decades.  I don't know how much Cowen's thinking has changed since publishing that book.

(I know less about this) E.Weinstein's hypothesis is there's something in the cultural zeitgeist that is causing the stagnation.  I am interested in learning more about this take, and would appreciate references.

Comment by Alex Ray (alex-ray) on What is the current bottleneck on genetic engineering of human embryos for improved IQ · 2020-10-23T04:17:34.098Z · LW · GW

I don't really have much but this is at least from last year:

Steve Hsu discusses Human Genetic Engineering and CRISPR babies in this (the first?) episode of the podcast he has w/ Corey Washington


Comment by Alex Ray (alex-ray) on Reviews of TV show NeXt (about AI safety) · 2020-10-11T17:17:08.948Z · LW · GW

In case other folks would be interested, here is the trailer on youtube:

Not obvious from the review (to me): it's a fictional drama about a conflict between humans and a rogue AI.

Comment by Alex Ray (alex-ray) on Forecasting Thread: AI Timelines · 2020-08-25T15:47:06.990Z · LW · GW

It might be useful for every person responding to attempt to define precisely what they mean by human level AGI.

Comment by Alex Ray (alex-ray) on What's a Decomposable Alignment Topic? · 2020-08-25T07:10:54.142Z · LW · GW

I work at OpenAI on safety.  In the past it seems like there's been a gap between what I'd consider to be alignment topics that need to be worked on, and the general consensus of this forum.  A good friend poked me to write something for this, so here I am.

Topics w/ strategies/breakdown:

  • Fine-tuning GPT-2 from human preferences, to solve small scale alignment issues
    • Brainstorm small/simple alignment failures: ways that existing generative language models are not aligned with human values
    • Design some evaluations or metrics for measuring a specific alignment failure (which lets you measure whether you’ve improved a model or not)
    • Gather human feedback data / labels / whatever you think you can try training on
    • Try training on your data (there are tutorials on how to use Google Colab to fine-tune GPT-2 with a new dataset)
    • Forecast scaling laws: figure out how performance on your evaluation or metric varies with the amount of human input data; compare to how much time it takes to generate each labelled example (be quantitative!)
  • Multi-objective reinforcement learning — instead of optimizing a single objective, optimize multiple objectives together (and some of the objectives can be constraints)
    • What are ways we can break down existing AI alignment failures in RL-like settings into multi-objective problems, where some of the objectives are safety objectives and some are goal/task objectives
    • How can we design safety objectives such that they can transfer across a wide variety of systems, machines, situations, environments, etc?
    • How can we measure and evaluate our safety objectives, and what should we expect to observe during training/deployment?
    • How can we incentivize individual development and sharing of safety objectives
    • How can we augment RL methods to allow transferable safety objectives between domains (e.g., if using actor-critic methods, how to integrate a separate critic for each safety objective)
    • What are good benchmark environments or scenarios for multi-objective RL with safety objectives (classic RL environments like Go or Chess aren’t natively well-suited to these topics)
  • Forecasting the Economics of AGI (turn ‘fast/slow/big/etc’ into real numbers with units)
    • This is more “AI Impacts” style work than you might be asking for, but I think it’s particularly well-suited for clever folks that can look things up on the internet.
    • Identify vague terms in AI alignment forecasts, like the “fast” in “fast takeoff”, that can be operationalized
    • Come up with units that measure the quantity in question, and procedures for measurements that result in those units
    • Try applying traditional economics growth models, such as experience curves, to AI development, and see how well you can get things to fit (much harder to do this for AI than making cars — is a single unit a single model trained? Maybe a single week of a researchers time? Is the cost decreasing in dollars or flops or person-hours or something else? Etc etc)
    • Sketch models for systems (here the system is the whole ai field) with feedback loops, and inspect/explore parts of the system which might respond most to different variables (additional attention, new people, dollars, hours, public discourse, philanthropic capital, etc)
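
As an illustration of the experience-curve fitting mentioned above, here's a toy sketch (synthetic data and made-up parameters, purely for illustration) of recovering Wright's-law coefficients by linear regression in log-log space:

```python
import numpy as np

# Toy illustration (synthetic data, not real measurements): fit a
# Wright's-law experience curve, cost = a * cumulative_units**(-b),
# which is linear in log-log space.
rng = np.random.default_rng(0)
cumulative_units = np.logspace(0, 4, 50)          # 1 .. 10,000 "units"
true_a, true_b = 100.0, 0.32                      # hypothetical parameters
cost = true_a * cumulative_units ** (-true_b)
cost *= np.exp(rng.normal(0, 0.05, cost.shape))   # multiplicative noise

# Ordinary least squares on log(cost) = log(a) - b * log(units)
slope, intercept = np.polyfit(np.log(cumulative_units), np.log(cost), 1)
a_hat, b_hat = np.exp(intercept), -slope

# "Learning rate": fractional cost drop per doubling of cumulative output
learning_rate = 1 - 2 ** (-b_hat)
print(f"fitted a={a_hat:.1f}, b={b_hat:.3f}, learning rate={learning_rate:.1%}")
```

The hard part for AI (as the bullet says) is choosing the units at all; the regression itself is the easy step.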

Topics not important enough to make it into my first 30 minutes of writing:

  • Cross disciplinary integration with other safety fields, what will and won’t work
  • Systems safety for organizations building AGI
  • Safety acceleration loops — how/where can good safety research make us better and faster at doing safety research
  • Cataloguing alignment failures in the wild, and create a taxonomy of them

Anti topics: Things I would have put on here a year ago

  • Too late for me to keep writing so saving this for another time I guess

I’m available tomorrow to chat about these w/ the group. Happy to talk then (or later, in replies here) about any of these if folks want me to expand further.

Comment by Alex Ray (alex-ray) on What problem would you like to see Reinforcement Learning applied to? · 2020-07-17T01:34:16.992Z · LW · GW

I'm surprised this hasn't got more comments. Julian, I've been incredibly impressed by your work in RL so far, and I'm super excited to see what you end up working on next.

I hope folks will forgive me just putting down some opinions about what problems in RL to work on:

I think I want us (RL, as a field) to move past games -- board games, video games, etc -- and into more real-world problems.

Where to go looking for problems?

These are much harder to make tractable! Most of the unsolved problems are very hard. I like referencing the NAE's Engineering Grand Challenges and the UN's Sustainable/Millennium Development Goals when I want to think about global challenges. Each one is much bigger than a research project, but I find them "food for thought" when I think about problems to work on.

What characteristics probably make for good problems for deep RL?

1. Outside of human factors -- either too big for humans, or too small, or too fast, or too precise, etc.

2. Episodic/resettable -- has some sort of short periodicity, giving bounds on long-term credit assignment

3. Already connected to computers -- solving a task with RL in a domain that isn't already hooked up to software/sensors/computers is going to be 99% setup and 1% RL

4. Supervised/Unsupervised failed -- I think in general it makes sense only to try RL after we've tried the simpler methods and they've failed to work (perhaps the data is too few, or labels too weak/noisy)

What are candidate problem domains?

Robotics is usually the first thing people say, so best just get it out of the way first. I think this is exactly right, but I think the robots we have access to today are terrible, so this turns into mostly a robot design problem with a comparatively smaller ML problem on top. (After working with robots & RL for years I have hours of this but saving that for another time)

Automatic control systems is underrated as a domain. Many problems involving manufacturing with machines involve all sorts of small/strange adjustments to things like "feed rate" "rotor speed" "head pressure" etc etc etc. Often these are tuned/adjusted by people who build up intuition over time, then transfer intuition to other humans. I expect it would be possible for RL to learn how to "play" these machines better and faster than any human. (Machines include: CNC machines, chemical processing steps, textile manufacture machines, etc etc etc)
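
To sketch what I mean (everything here is made up for illustration -- the dynamics, names, and numbers are hypothetical, not any real machine's interface), a machine-parameter tuning problem wrapped as a gym-style environment might look like:

```python
import numpy as np

class FeedRateEnv:
    """Hypothetical sketch: tune a machine's feed rate with RL.

    State: the current feed rate.
    Action: a small adjustment to the feed rate.
    Reward: throughput weighted by a noisy quality reading.
    All dynamics here are invented for illustration.
    """

    def __init__(self, optimum=0.6, seed=0):
        self.optimum = optimum  # best setting, unknown to the agent
        self.rng = np.random.default_rng(seed)
        self.feed_rate = 0.5

    def reset(self):
        self.feed_rate = float(self.rng.uniform(0.2, 0.8))
        return self._obs()

    def step(self, adjustment):
        self.feed_rate = float(np.clip(self.feed_rate + adjustment, 0.0, 1.0))
        # Quality falls off as we move away from the (unknown) optimum.
        quality = 1.0 - (self.feed_rate - self.optimum) ** 2
        quality += self.rng.normal(0, 0.01)  # sensor noise
        throughput = self.feed_rate
        reward = throughput * quality
        return self._obs(), reward, False, {}

    def _obs(self):
        return np.array([self.feed_rate])
```

The point of the sketch is that once the machine's knobs and sensors are exposed this way, standard RL tooling applies directly -- which is also why "already connected to computers" matters so much as a selection criterion.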

Language models have been very exciting to me lately, and I really like this approach to RL with language models: I think the large language models are a really great substrate to work with (so far much better than robots!) but specializing them to particular purposes remains difficult. I think having much better RL science here would be really great.

Some 'basic research' topics

Fundamental research into RL scaling. It seems to me that we still don't really have a great understanding of the science of RL. Compared to scaling laws in other domains, RL is hard to predict, and has a much less well understood set of scaling laws (model size, batch size, etc etc). is a great example of the sort of thing I'd like to have for RL.

Multi-objective RL. In general if you ask RL people about multi-objective RL, you'll get a "why don't you just combine them into a single objective" or "just use one as an aux goal", but it's much more complex than that in the deep RL case, where the objective changes the exploration distribution. I think having multiple objectives is a much more natural way of expressing what we want systems to do. I'm very excited about at some point having transferable objectives (since there are many things we want many systems to do, like "don't run into the human" and "don't knock over the shelf", in addition to whatever specific goal).
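
One minimal step beyond a fixed scalarization is to treat a safety objective as a constraint and adapt its weight over training. This is a sketch of the general Lagrangian-style idea with made-up numbers, not a specific published method:

```python
# Sketch (illustrative framing, hypothetical numbers): treat a safety
# objective as a constraint and adapt its weight with a simple
# Lagrangian-style update, rather than hand-tuning a fixed scalarization.

safety_threshold = 0.1  # max acceptable expected safety cost per episode
lam = 0.0               # multiplier on the safety constraint
lam_lr = 0.05           # multiplier step size

def combined_return(task_return, safety_cost, lam):
    # The policy maximizes task return minus the weighted safety cost.
    return task_return - lam * safety_cost

# Pretend training loop: while measured safety cost exceeds the
# threshold, the multiplier grows, pushing the policy toward safer
# behavior; once under the threshold, it relaxes back toward zero.
measured_safety_costs = [0.5, 0.4, 0.3, 0.15, 0.08, 0.05]
for cost in measured_safety_costs:
    lam = max(0.0, lam + lam_lr * (cost - safety_threshold))
print(f"final multiplier: {lam:.3f}")
```

The appeal is that the safety objective keeps its own identity (and its own measured cost) instead of being folded into one opaque reward, which is what would make it transferable between systems.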

Trying to find some concrete examples, I'm coming up short.

I'm sorry I didn't meet the recommendation for replies, but glad to have put something on here. I think this is far too few replies for a question like this.

Comment by Alex Ray (alex-ray) on Should I wear wrist-weights while playing Beat Saber? · 2019-07-22T23:30:33.511Z · LW · GW

Comment because this is answering a different question than “should I use wrist weights”

I have found that a weight vest is a nice improvement to the game. I’d recommend trying it, and it possibly might have some of the common benefits with the wrist weights without some of the downsides.

Comment by Alex Ray (alex-ray) on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-07T06:48:05.249Z · LW · GW

I'm pretty surprised this entire argument goes without using any amount of quantitative modeling or data analysis.

I do think it presents a bunch of persuasive and philosophical arguments in the direction of your conclusions, but it's easy to imagine (and find, searching on the internet) persuasive and philosophical arguments in the opposite direction.

(Caveat: I'm a bit new to this forum and how things work, but surely for folks here this is better answered by building a model and incorporating uncertainty?)

A few of the specifics you give I've found are not borne out in the research I've done (e.g. price sensitivity has more to do with location/centrality than it does with luxury, though more luxurious homes tend to be built farther from city centers).  This could just be from different sources, but I notice I want to wave several large [CITATION NEEDED] flags.

Also maybe it'd be useful to share the quantitative analysis I've done? Basically "modeling the financial, legal, and social implications of buying a house together" has been the biggest project of mine for 2019 outside of work, but I'm not an expert (most of us in my house, myself included, are first time home buyers). I'd consider myself better informed than the average person who has not owned a house for an extended period of time, but would be very interested in learning more and learning where my models are bad.

For interested folks I found Shiller's Irrational Exuberance to give a bunch of nice solid models (backed with data!) on speculative pricing bubbles in ways that seem to apply to SF bay area housing in particular.