Posts

How to tradeoff utility and agency? 2022-01-14T01:33:28.803Z
Streaming Science on Twitch 2021-11-15T22:24:05.388Z
Young Scientists 2021-10-22T18:01:31.579Z
Spaced Repetition Systems for Intuitions? 2021-01-28T17:23:19.000Z
Alex Ray's Shortform 2020-11-08T20:37:18.327Z

Comments

Comment by A Ray (alex-ray) on Sharing Powerful AI Models · 2022-01-26T02:25:46.984Z · LW · GW

It does seem that public/shared investment into tools that make structured access programs easier, might make more of them happen.

As boring as it is, this might be a good candidate for technical standards for interoperability/etc.

Comment by A Ray (alex-ray) on davidad's Shortform · 2022-01-26T02:22:19.648Z · LW · GW

Re: alignment tasks in multi-task settings.  I think this makes a lot of sense.  Especially in worlds where we have a lot of ML/AI systems doing a bunch of different things, even if they have very different specific tasks, the "library" of alignment objectives is probably widely shared.

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-26T01:59:08.218Z · LW · GW

Thanks for taking the time to explain this!

I feel mostly confused by the way that things are being framed. ELK is about the human asking for various poly-sized fragments and the model reporting what those actually were instead of inventing something else. The model should accurately report all poly-sized fragments the human knows how to ask for.

I think this is what I was missing.  I was incorrectly thinking of the system as generating poly-sized fragments.

Comment by A Ray (alex-ray) on What should a student who is already a decent programmer do over the summer? · 2022-01-25T21:35:30.608Z · LW · GW

Re: 4.  I think it's totally possible to start super-short-lived orgs, as long as that's the plan from the outset.  Also I think there are a lot of people in tech startup / tech entrepreneurship space that are willing to mentor, but I agree finding them and making that connection is harder.

From my time working in tech, and spotting common weaknesses w/ new folks (though might not apply to you!): Statistics, Ethics, and Writing.

I think spending time learning statistics is probably a pretty good use of time, since a bunch of the technical aspects are non-obvious, and translate to super-powers when programming with stochastic or noisy systems.  (E.g. knowing a sampling technique which more efficiently estimates a parameter is basically like knowing an algorithm with better big-O complexity).
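
To make the "better sampling technique is like better big-O" point concrete, here's a minimal sketch (my own toy example, not from any particular curriculum) comparing plain Monte Carlo to antithetic variates on the same sample budget:

```python
# A toy comparison: estimate E[exp(U)] for U ~ Uniform(0, 1) with the same
# sample budget, once with plain Monte Carlo and once with antithetic variates.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Plain Monte Carlo: n independent draws.
u = rng.uniform(size=n)
plain = np.exp(u)

# Antithetic variates: n/2 draws, each paired with its mirror 1 - u,
# so the total number of exp() evaluations is the same.
u_half = rng.uniform(size=n // 2)
antithetic = (np.exp(u_half) + np.exp(1.0 - u_half)) / 2.0

true_value = np.e - 1.0  # exact value of E[exp(U)]
print(f"true value: {true_value:.5f}")
print(f"plain MC:   {plain.mean():.5f} (std err {plain.std(ddof=1) / np.sqrt(n):.5f})")
print(f"antithetic: {antithetic.mean():.5f} (std err {antithetic.std(ddof=1) / np.sqrt(n // 2):.5f})")
```

Same number of function evaluations, noticeably smaller error bars -- that's the sense in which a better estimator behaves like a better algorithm.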

Second, even in doing boring feature implementation for software systems, issues that have ethical ramifications come up surprisingly often.  (Decisions about filtering datasets, how to normalize forms, etc).  Spending some time getting a good overview and understanding of the field I think helps prevent the two big failure modes I see here: 1. not realizing that a thing has ethical implications at all, and then deploying a thing which becomes expensive to un-deploy or change, and 2. freezing when encountering a problem with ethical implications, without knowing where to go for a solution.

Last, I think writing is a really great place to put practice, time, and effort.  The more work I've done in software, the more it's shifted so that the most important typing I do is documents, not code.

+1 to asking HN.  I think you'll get a different diversity of answers there.

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-25T21:17:35.711Z · LW · GW

So if there are different poly fragments that the human would evaluate differently, is ELK just "giving them a fragment such that they come to the correct conclusion" even if the fragment might not be the right piece?

E.g. in the SmartVault case, if the screen was put in the way of the camera and the diamond was secretly stolen, we would still be successful even if we didn't elicit that fact, but instead elicited some poly fragment that got the human to answer disapprove?

Like the thing that seems weird to me here is that you can't simultaneously require that the elicited knowledge be 'relevant' and 'comprehensible' and also cover these sorts of obfuscated-debate-like scenarios.

Does it seem right to you that ELK is about eliciting latent knowledge that causes an update in the correct direction, regardless of whether that knowledge is actually relevant?

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-25T20:46:01.269Z · LW · GW

Okay now I have to admit I am confused.

Re-reading the ELK proposal -- it seems like the latent knowledge you want to elicit is not-obfuscated.

Like, the situation to solve is that there is a piece of non-obfuscated information, which, if the human knew it, would change their mind about approval.

How do you expect solutions to elicit latent obfuscated knowledge (like 'the only true explanation is incomprehensible to the human' situations)?

Comment by A Ray (alex-ray) on Vavilov Day Starts Tomorrow · 2022-01-25T20:12:08.131Z · LW · GW

I'm joining in Vavilov day.  I really appreciate you finding and sharing those links of things to read.

Minor note: for some reason the first link in the article to the previous post doesn't work for me.  (It looks like it's expecting me to be logged in to wordpress?)

Comment by A Ray (alex-ray) on How feasible/costly would it be to train a very large AI model on distributed clusters of GPUs? · 2022-01-25T20:01:43.386Z · LW · GW

While there are rare examples of deep learning algorithms that can scale in this way, in practice current relative resource costs of compute-vs-storage-vs-bandwidth don't work out in favor of this kind of topology.

First problem: Current deep learning systems require a bunch of very dense networking with a high degree of connectivity between nodes, high bandwidth over these connections, and low latency.   It's possible some sort of tweak or modification to the architecture and algorithm would allow for training on compute that's got slow/high-latency/low-bandwidth connections, but I don't know of any right now.

Second problem: Current distributed training approaches strongly require almost all of the compute to be available and reliable.  If nodes are stochastically coming and going, it's hard to know where to route data / how long to wait to accumulate gradients before giving up / who has what copy of the latest parameters.  This also seems solvable with engineering, but engineering distributed systems to deal with node failures is a headache.
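
To illustrate the bookkeeping I mean, here's a toy simulation (my own sketch, with made-up numbers for worker reliability and quorum -- not a real distributed framework) of synchronous gradient averaging when workers stochastically fail to report:

```python
# Toy simulation of aggregating gradients when workers stochastically drop out.
# Illustration of the bookkeeping only -- not a real distributed training setup.
import numpy as np

rng = np.random.default_rng(0)
n_workers = 32
dim = 10
params = np.zeros(dim)
lr = 0.1
min_quorum = 16          # how many gradients we require before taking a step
p_worker_alive = 0.7     # chance a given worker returns a gradient this step

def worker_gradient(params, rng):
    # Stand-in for a real gradient: gradient of ||params - 1||^2 plus noise.
    return 2 * (params - 1.0) + rng.normal(scale=0.5, size=params.shape)

for step in range(100):
    returned = [
        worker_gradient(params, rng)
        for _ in range(n_workers)
        if rng.random() < p_worker_alive
    ]
    if len(returned) < min_quorum:
        # Not enough workers responded: skip the step.  This is just one policy;
        # every alternative (waiting longer, using stale gradients) has its own costs.
        continue
    params -= lr * np.mean(returned, axis=0)

print("final params (target is all ones):", np.round(params, 3))
```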

Third problem: Trust in the computing.  Deep neural networks are pretty sensitive to data poisoning and things like adversarial examples.  I expect that for almost any efficient distributed training setup, a clever attacker would be able to figure out a way to subtly introduce unnoticed, unwanted changes to the training.  In general I think a patient attacker could make changes subtle enough that they're almost always below some threshold of validation, but I don't have a proof for this.

Maybe there are other problems that I missed, but at the very least each one of those independently would make me not want to train my large models on a setup like this.

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-25T05:50:49.913Z · LW · GW

Cool, this makes sense to me.

My research agenda is basically about making a not-obfuscated model, so maybe I should just write that up as an ELK proposal then.

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-25T01:44:42.450Z · LW · GW

Thinking more about ELK.  Work in progress, so I expect I will eventually figure out what's up with this.

Right now it seems to me that Safety via Debate would elicit compact/non-obfuscated knowledge.

So the basic scenario is that in addition to SmartVault, you'd have Barrister_Approve and Barrister_Disapprove, who are trying to share evidence/reasoning which makes the human approve or disapprove of SmartVault scenarios.

The biggest weakness of this that I know of is Obfuscated Arguments -- that is, it won't elicit obfuscated knowledge.

It seems like in the ELK example scenario they're trying to elicit knowledge that is not obfuscated.

The nice thing about this is that Barrister_Approve and Barrister_Disapprove both have pretty straightforward incentives.

Paul was an author of the debate paper so I don't think he missed this -- more like I'm failing to figure out what's up with the SmartVault scenario, and the current set of counterexamples.

Current possibilities:

  • ELK is actually a problem about eliciting obfuscated information, and the current examples about eliciting non-obfuscated information are just to make a simpler thought experiment
  • Even if the latent knowledge was not obfuscated, the opposing barrister could make an obfuscated argument against it.
    • This seems easily treatable by the human just disbelieving any argument that is obfuscated-to-them.

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-25T00:13:26.093Z · LW · GW

Some disorganized thoughts about adversarial ML:

  • I think I'm a little bit sad about the times we got whole rooms full of research posters about variations on epsilon-ball adversarial attacks & training, basically all of them claiming that this would help AI safety or AI alignment or AI robustness or AI generalization, and basically all of them were basically wrong.
  • This has led me to be pretty critical of claims about adversarial training as pathways to aligning AGI.
  • Ignoring the history of adversarial training research, I think I still have problems with adversarial training as a path to aligning AGI.
  • First, adversarial training is foremost a capabilities booster.  Your model makes some obvious/predictable errors (to the research team working on it) -- and if you train against them, they're no longer errors!  (A minimal sketch of the kind of training loop I mean appears after this list.)
  • This has the "if you solve all the visible problems, all you will be left with is invisible problems" alignment issue, as well as a cost-competitiveness issue (many adversarial training approaches require lots of compute).
  • From a "definitely a wild conjecture" angle, I am uncertain that the way the current rates of adding vs removing adversarial examples will play out in the limit.  Basically, think of training as a continuous process that removes normal errors and adds adversarial errors.  (In particular, while there are many adversarial examples present at the initialization of a model -- there exist adversarial examples at the end of training which didn't exist at the beginning.  I'm using this to claim that training 'put them in')  Adversarial training removes some adversarial examples, but probably adds some which are adversarial in an orthogonal way.  At least, this is what I expect given that adversarial training doesn't seem to be cross-robust.
  • I think if we had some notion of how training and adversarial training affected the number of adversarial examples a model had, I'd probably update on whatever happened empirically.  It does seem at least possible to me that adversarial training on net reduces adversarial examples, so given a wide enough distribution and a strong enough adversary, you'll eventually end up with a model that is arbitrarily robust (and not exploitable).
  • It's worth mentioning again how current methods don't even provide robust protection against each other.
  • I think my actual net position here is something like:
    • Adversarial Training and Adversarial ML was over-hyped as AI Safety in ways that were just plain wrong
    • Some version of this has some place in a broad and vast toolkit for doing ML research
    • I don't think Adversarial Training is a good path to aligned AGI
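
For concreteness, here's a minimal sketch of the kind of adversarial training loop I have in mind -- an FGSM-style variant on a toy classifier.  This is one common version among many, not any particular group's setup, and the model/data here are placeholders:

```python
# Minimal sketch of an adversarial training loop: FGSM-style perturbations
# folded into ordinary supervised training on a toy classifier.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
epsilon = 0.1  # attack budget

def fgsm(x, y):
    # Fast Gradient Sign Method: nudge inputs in the direction that increases loss.
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

for step in range(200):
    # Toy data: two Gaussian blobs.
    y = torch.randint(0, 2, (128,))
    x = torch.randn(128, 20) + y.unsqueeze(1).float() * 2.0

    x_adv = fgsm(x, y)
    opt.zero_grad()
    # Train on clean and adversarial examples together: the errors the attack can
    # find get trained away, which is the capabilities-booster dynamic above.
    loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
    loss.backward()
    opt.step()
```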

Comment by A Ray (alex-ray) on Postmortem on DIY Recombinant Covid Vaccine · 2022-01-24T23:01:28.477Z · LW · GW

Does a similarly-structured resource to the ones I linked exist for recombinant vaccines?  I'd be curious for learning more about what would be needed to do this kind of DIY-ish project in the future.

Comment by A Ray (alex-ray) on How to think about and deal with OpenAI · 2022-01-24T04:21:31.165Z · LW · GW

I just saw this, but this feels like a better-late-than-never situation.  I think hard conversations about the possibilities of increasing existential risk should happen.

I work at OpenAI.  I have worked at OpenAI for over five years.

I think we definitely should be willing and able to have these sorts of conversations in public, mostly for the reasons other people have listed.  I think AnnaSalamon's is the answer I agree with most.

I want to also add this has made me deeply uncomfortable and anxious countless times over the past few years.  It can be a difficult thing to navigate well or navigate efficiently.  I feel like I've gotten better at it, and better at knowing/managing myself.  I see newer colleagues also suffering from this.  I try to help them when I can.

I'm not sure this is the right answer for all contexts, but I am optimistic for this one.  I've found the rationality community and the lesswrong community to be much better than average at dealing with bad faith arguments, and at cutting towards the truth.  I think there are communities where it would go poorly enough that it could be net-negative to have the conversation.

Side note: I really don't have a lot of context about the Elon Musk connection, and the guy has not really been involved for years.  I think the "what things (including things OpenAI is doing) might increase existential risk" is an important conversation to have when analyzing forecasts, predictions, and research plans.  I am less optimistic about "what tech executives think about other tech executives" going well.

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-24T03:35:18.895Z · LW · GW

I should have probably separated out 4 into two categories:

  • The virus was not in the person but was on the sample (somehow contaminated by e.g. the room w/ the tests)
  • The virus was in the person and was on the sample

Oh well, it was on my shortform because it was low effort.

Comment by A Ray (alex-ray) on How I'm thinking about GPT-N · 2022-01-24T03:00:32.620Z · LW · GW

I like this article.  I think it's well-thought-out reasoning about possible futures, and I think it largely matches my own views.

I especially appreciate how it goes into possible explanations for why scaling happens, not just taking it happening for granted.  I think a bunch of the major points I have in my own mental models for this are hit in this article (double descent, loss landscapes, grokking).

The biggest point of disagreement I have is with grokking.  I think I agree this is important, but I think the linked example (video?) isn't correct.

First: Grokking and Metrics

It's not surprising (to me) that we see phase changes in correctness/exact substring match scores, because they're pretty fragile metrics -- get just a part of the string wrong, and your whole answer is incorrect.  For long sequences you can see these as a kind of N-Chain.

(N-chain is a simple RL task where an agent must cross a bridge, and if they misstep at any point they fall off the end.  If you evaluate by "did it get to the end" then the learning exhibits a phase-change-like effect, but if you evaluate by "how far did it go", then progress is smoother)

I weakly predict that many 'phase changes in results' are similar effects to this (but not all of them, e.g. double descent).
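
A toy illustration of this (all numbers made up, not from any real benchmark): a per-step success rate that improves smoothly over training looks like a phase change under an all-or-nothing exact-match metric.

```python
# Toy illustration (made-up numbers): smooth per-step progress vs an
# all-or-nothing exact-match metric over a length-20 "chain".
import numpy as np

steps = np.arange(0, 101)
per_step_acc = np.minimum(0.5 + 0.005 * steps, 0.995)  # smooth, roughly linear improvement
chain_length = 20

# Exact match: succeed only if every one of the 20 steps is right.
exact_match = per_step_acc ** chain_length
# "How far did it go": expected consecutive correct steps before the first error.
mean_progress = per_step_acc * (1 - per_step_acc ** chain_length) / (1 - per_step_acc)

for s in range(0, 101, 20):
    print(f"step {s:3d}: per-step acc {per_step_acc[s]:.3f}, "
          f"exact match {exact_match[s]:.3f}, "
          f"mean progress {mean_progress[s]:5.2f} / {chain_length}")
```

The per-step metric improves steadily the whole time, while exact match sits near zero and then jumps near the end -- without any phase change in the underlying capability.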

Second: Axes Matter!

Grokking is a phenomenon that happens during training, so it shows up as a phase-change-like effect on training curves (performance vs steps) -- the plots in the video are showing the results of many different models, each with their final score, plotted as scaling laws (performance vs model size).

I think it's important to look for these sorts of nonlinear progressions in research benchmarks, but perf-vs-modelsize is very different from perf-vs-steps.

Third: Grokking is about Train/Test differences

An important thing that's going on when we see grokking is that train loss goes to zero, then much later (after training with a very small training loss) -- all of a sudden validation performance improves.  (Validation loss goes down)

With benchmark evaluations like BIG-Bench, the entire evaluation is in the validation set, though I think we can consider the training set having similar contents (assuming we train on a wide natural language distribution).

Fourth: Relating this to the article

I think the high level points in the article stand -- it's important to be looking for and be wary of sudden or unexpected improvements in performance.  Even if scaling laws are nice predictive functions, they aren't gears-level models, and we should be on the lookout for them to change.  Phase-change-like behavior in evaluation benchmarks is the kind of thing to look for.

I think that's enough for this comment.  Elsewhere I should probably writeup my mental models for what's going on with grokking and why it happens.

Comment by A Ray (alex-ray) on Postmortem on DIY Recombinant Covid Vaccine · 2022-01-24T01:08:13.355Z · LW · GW

Thanks for sharing this post-mortem!

I'm curious about the difference between the approach you described here and the DNA vaccines that were open sourced in Project McAfee.  This was the one done as a course by Zayner and co.

My amateur take at the differences:

  • Different peptide/subunit target
  • Vaccine given as protein rather than plasmid to produce the protein

Comment by A Ray (alex-ray) on What’s wrong with Pomodoro · 2022-01-24T00:51:27.901Z · LW · GW

I appreciate how this clearly gives people a thing they can try today.  I expect the downsides jimrandomh mentions are real (+1 to stop-loss being an important benefit) -- but this seems like an easy thing for people to empirically test out and keep if it works for them better than pomodoros.

Adding another benefit for Pomodoros that I think would be missing here: sync-ing / co-working.

I think individual productivity is probably what matters most overall, but most of the pomodoros I do are co-working with other people.  (E.g. in an office or coworking space -- "Anyone want to do some pomodoros?")

For that, it seems like some sort of clock-work Third Time could work ("let's do 30 min / 10 min intervals"), but that still has the clock-work downsides you mentioned above.

Comment by A Ray (alex-ray) on The Liar and the Scold · 2022-01-23T23:26:52.092Z · LW · GW

I appreciate the dry/regular tone and the perspective of someone who could have believably been on this forum writing about their experience.

It had a bunch of the things I like about Literary Realism, but in a place I wasn't expecting it!

Edit: Bleh.  I meant to say Magical Realism!

Comment by A Ray (alex-ray) on Why rationalists should care (more) about free software · 2022-01-23T23:19:45.063Z · LW · GW

I care a lot about free (and open source) software.

In particular, I learned programming so I could make some changes to a tablet note-taking app I was using at school.  Open source is the reason why I got into software professionally, and causally connected to a bunch of things in my life.

Some points I have in favor of this:

  • I think having the ability to change tools you use makes using those tools better for thinking
  • In general I'm more excited about a community of 'tool-builders' rather than 'tool-users' (this goes for cognitive tools, too, which the rationality community is great at)
  • Feedback from how people use/modify things (like software) is a big ingredient in making it better and creating more value.

With that said, I think we're still in need of better analogies and thought experiments on what to do with compute, and how to deal with risk.

It's much easier to share the source code to a program than to provide everyone the compute to run it on.  Compute is pretty heterogeneous, too, which makes the problem harder.  It's possible that some sort of 'universal basic compute' is the future, but I am not optimistic about that coming before things like food/water/shelter/etc.

The second point is that I think it is important to consider technological downsides when deploying things like this (and sharing open source software / free software counts as deployment).  In general it's easier to think of the benefits than the harms, and a bunch of the harms come from tail risks that are hard to predict, but I think this is also worth doing.

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-23T21:33:12.330Z · LW · GW

There recently was a COVID* outbreak at an AI community space.

>20 people tested positive on nucleic tests, but none of the (only five) people that took PCR tests came back positive.

Thinking out loud about possibilities here:

  1. The manufacturer of the test used a nucleic acid sequence that somehow cross-targets another common sequence we'd find in upper respiratory systems (with the most likely candidate here being a different, non-COVID, upper respiratory virus).  I think this is an extremely unlikely mistake for a veteran test manufacturer to make, but this manufacturer seems to be new-ish (and mostly looking at health diagnostics, instead of disease surveillance).  I'm <5%
  2. There was a manufacturing error with the batch of tests used.  I don't know if the group tried to re-test positive people with confirmatory tests from a different batch, but for now I'll assume no.  (These tests are not cheap so I wouldn't blame them)  This means that a manufacturing error could be correlated between all of these.  We have seen COVID tests recalled recently, and this manufacturer has had no recalls.  Fermi estimates from their FY2021 profit numbers show there are at least 2 million tests from last year alone.  I expect if there were a bunch of people testing positive it would be news, and a manufacturing failure narrowly constrained to our batch is unlikely.  I'm <5%
  3. There was some sort of environmental issue with how the tests were stored or conducted.  Similar to the previous point, but this one is limited to the test boxes that were in the community space, so could be correlated with each other without being correlated with outside tests.  COVID tests in general are pretty sensitive to things like temperature and humidity, but also I expect these folks to have been mindful of this.  My guess is that if it were easy to make this mistake in an 'office building setting' -- it would be eventually newsworthy (lots of people are taking these tests in office settings).  Most of this is trusting that these people were not negligent on their tests.  I'm ~10% on this failure mode.
  4. Some smaller fraction of people were positive, and somehow all of the positive tests had been contaminated by these people.  Maybe by the normal respiratory pathways!  (E.g. people breathed in virus particles, and then tested positive before they were really 'infected' -- essentially catching the virus in the act!)  These people would later go on to fight off the virus normally without becoming infected/infectious, due to diligent vaccination/etc.  I have no idea how likely the mechanism is, and I'm hesitant to put probability on biology explanations I make up myself.  I'm <10%
  5. Unknown unknowns bucket.  One thing specifically contributing here is that (as far as a quick google goes) it seems like this test manufacturer hasn't released what kind of nucleic amplification reaction they're using.  So "bad/weird nucleic recipe" goes in here.  Secondly, I was wondering at the engineering quality of their test's internal mechanisms.  I couldn't find any teardown or reporting on this, so without that "internal mechanism failure" also goes in here.  Always hard to label this but it feels like ~50%
  6. It was really COVID.  The thing to explain here is "why the PCR negatives".  I think it could be a combination of chance (at most a quarter of the positives got confirmatory PCRs) and time delay (it seems like the PCRs were a while after the positive rapid tests).  I'm also not certain that they were the same people -- maybe someone who didn't test positive got a PCR, too?  There's a lot of unknowns here, but I'm not that surprised when we find out the highly-infectious-disease has infected people.  I'm ~40%

Comment by A Ray (alex-ray) on Is AI Alignment a pseudoscience? · 2022-01-23T19:15:55.385Z · LW · GW

The more I read about AI Alignment, the more I have a feeling that the whole field is basically a fictional-world-building exercise.

I think a lot of forecasting is this, but with the added step of attaching probabilities and modeling.

Comment by A Ray (alex-ray) on A one-question Turing test for GPT-3 · 2022-01-23T18:12:41.758Z · LW · GW

I think this broadly makes sense to me.  There are many cases where "the model is pretending to be dumb" feels appropriate.

This is part of why building evaluations and benchmarks for this sort of thing is difficult.

I'm at least somewhat optimistic about doing things like data-prefixing to allow for controls over things like "play dumb for the joke" vs "give the best answer", using techniques that build on human feedback.

I personally have totally seen GPT-3 fail to give a really good answer on a bunch of tries a bunch of times, but I spend a lot of time looking at its outputs and analyzing them.  It seems important to be wary of the "seems to be dumb" failure modes.

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-23T07:18:19.950Z · LW · GW

The ELK paper is long but I’ve found it worthwhile, and after spending a bit of time noodling on it — one of my takeaways is I think this is essentially a failure mode for the approaches to factored cognition I've been interested in.  (Maybe it's a failure mode in factored cognition generally.)

I expect that I’ll want to spend more time thinking about ELK-like problems before spending a bunch more time thinking about factored cognition.

In particular it's now probably a good time to start separating a bunch of things I had jumbled together, namely:

  • Developing AI technology that helps us do alignment research
  • Developing aligned AI

Previously I had hoped that the two would be near each other in ways that permit progress on both at the same time.

Now I think without solving ELK I would want to be more careful and intentional about how/when to develop AI tech to help with alignment.

Comment by A Ray (alex-ray) on A one-question Turing test for GPT-3 · 2022-01-23T02:55:53.400Z · LW · GW

I like this a lot.  It does seem to show a pretty clear failure of reasoning on the part of the language models.

It's pretty intuitive and easy to show people what this kind of shallow-pattern-matcher failures look like.

Meta: I hope more people do more small experiments like this and share their results, and hopefully a few of them start getting put into benchmark/evaluations.

Even this one by itself might be good to make some form of benchmark/evaluation on (in ways that are easy/automatic to evaluate future models on).

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-20T23:53:37.100Z · LW · GW

I engage too much w/ generalizations about AI alignment researchers.

Noticing this behavior seems useful for analyzing it and strategizing around it.

A sketch of a pattern to be on the lookout for in particular is "AI Alignment researchers make mistake X" or "AI Alignment researchers are wrong about Y".  I think in the extreme I'm pretty activated/triggered by this, and this causes me to engage with it to a greater extent than I would have otherwise.

This engagement is probably encouraging more of this to happen, so I think pausing and reflecting more would lead to better outcomes.

It's worth acknowledging that this comes from a very deep sense of caring I feel about AI alignment research and its importance.  I spend a lot of my time (both work time and free time) trying to foster and grow the field.  It seems reasonable I want people to have correct beliefs about this.

It's also worth acknowledging that there will be some cases where my disagreement is wrong.  I definitely don't know all AI alignment researchers, and there will be cases where there are broad field-wide phenomena that I missed.  However I think this is rare, and probably most of the people I interact with will have less experience w/ the field of AI alignment.

Another confounder is that the field is both pretty jargon-heavy and very confused about a lot of things.  This can lead to a bunch of semantics confusions masking other intellectual progress.  I'm definitely not in the "words have meanings, dammit" extreme, and maybe I can do a better job asking people to clarify and define things that I think are being confused.

A takeaway I have right now from reflecting on this, is that "I disagree about <sweeping generalization about AI alignment researchers>" feels like a simple and neutral statement to me, but isn't a simple and neutral statement in a dialog.

Thinking about the good stuff about scout mindset, I think things that I could do instead that would be better:

  1. I can do a better job conserving my disagreement.  I like the model of treating this as a limited resource, and ensuring it is well-spent seems good.
  2. I can probably get better mileage out of pointing out areas of agreement than going straight to highlighting disagreements (Crocker's rules be damned)
  3. I am very optimistic about double-crux as a thing that should be used more widely, as well as a specific technique to deploy in these circumstances.  I am confused why I don't see more of it happening.

I think that's all for the reflection on this for now.

Comment by A Ray (alex-ray) on What's Up With Confusingly Pervasive Consequentialism? · 2022-01-20T23:35:19.688Z · LW · GW

I like this one.  I think it does a lot to capture both the concept and the problem.

The concept is that we expect AI systems to be convergently goal-directed.

The problem is that people in AI research are often uncertain about goal-directedness and its emergence in advanced AI systems.  (My attempt to paraphrase the problem of the post, in terms of goal-directedness, at least)

Comment by A Ray (alex-ray) on What's Up With Confusingly Pervasive Consequentialism? · 2022-01-20T22:13:04.269Z · LW · GW

Nothing comes to mind as a single term, in particular because I usually think of 'thinking', 'predicting', and 'planning' separately.

If you're okay with multiple terms, 'thinking, predicting, and planning'.

Aside: now's a great time to potentially rewrite the LW tag header on consequentialism to match this meaning/framing.  (Would probably help with aligning people on this site, at least). https://www.lesswrong.com/tag/consequentialism

Comment by A Ray (alex-ray) on What's Up With Confusingly Pervasive Consequentialism? · 2022-01-20T21:37:22.646Z · LW · GW

Saying this again separately, if you taboo 'consequentialism' and take these as the definitions for a concept:

"the thing to do is choose actions that have the consequences you want." 

A ___ is something that thinks, predicts, and plans (and, if possible, acts) in such a way as to bring about particular consequences.

I think this is what "the majority of alignment researchers who probably are less on-the-ball" are in fact thinking about quite often.

We just don't call it 'consequentialism'.

Comment by A Ray (alex-ray) on What's Up With Confusingly Pervasive Consequentialism? · 2022-01-20T21:27:55.484Z · LW · GW

I think my personal take is basically "yeah it seems like almost everything routes through a near-consequentialist theory" and "calling this theory 'consequentialism' seems fair to me".

I spend a lot of time with people that are working on AI / AI Alignment who aren't in the rationality community, and I don't think this is the take for all of them.  In particular I imagine from the "words have meaning, dammit" camp a lot of disagreement about 'consequentialism' the term, but if you taboo'd it, there's a lot of broad agreement here.

In particular, I think this belief is super common and super strong in researchers focused on aligning AGI, or otherwise focused on long-term alignment.

I do think there's a lot of disagreement in the more near-term alignment research field.

This is why this article felt weird to me -- it's not clear that there is a super wide mistake being made, and to the extent Raemon/John think there is, there's also a lot of people who are uncertain (again c/f moral uncertainty) even if updating in the 'thinking/predicting' direction.

E.g. for this bit:

I... guess what I think Eliezer thinks is that Thoughtful Researcher isn't respecting inner optimizers enough.

My take is median Thoughtful Researcher is more uncertain about inner optimizers -- instead of being certain that EY is wrong here.

And pointing at another bit:

Consequentialism is a (relatively) simple, effective process for accomplishing goals, so things that efficiently optimize for goals tend to approximate it.

I think people would disagree with this as consequentialism.

It's maybe worth pointing at another term that's charged with a nontraditional meaning in this community: rationality.

We mean something closer to skeptical empiricism than the actual term, but if you taboo it I think you end up with a lot more agreement about what we're talking about.

Comment by A Ray (alex-ray) on What's Up With Confusingly Pervasive Consequentialism? · 2022-01-20T21:01:18.431Z · LW · GW

Yeah this seems like one way it could resolve the differences in arguments.

My guess (though I don't know for certain) is that more AI alignment researchers would agree with "the thing to do is choose actions that have the consequences you want" is an important part of AI research, than "the right/moral thing to do is to choose actions that have good consequences" is an important part of AI research.

I'm curious how much confusion you think is left after taboo-ing the term and communicating the clarification?

Comment by A Ray (alex-ray) on Estimating training compute of Deep Learning models · 2022-01-20T20:58:49.841Z · LW · GW

This seems pretty well done!  Some thoughts on future research in this direction:

  • It seems like you probably could have gotten certainty about compute for at least a handful of the models studied in question (either because the model was open sourced, or you have direct access to the org training it like Eleuther) -- it would be interesting to see how the estimation methods compared to the exact answer in this case.  (Probably doable with GPT-J for example -- see the sketch after this list)
  • While I agree with dropout not significantly reducing computation I think two more contemporary techniques are worth considering here: structured sparsity in weights ('blocksparse'), and mixture-of-experts gating ('switch transformer').  I think the latter is more important because it changes both the training compute and inference compute.
  • Comparing custom ML hardware (e.g. Google's TPUs or Baidu's Kunlun, etc) is tricky to put on these sorts of comparisons.  For those I think the MLPerf Benchmarks are super useful.  I'd be curious to hear the authors' expectations of how this research changes in the face of more custom ML hardware.
  • In general I think it'd be good to integrate a bunch of the performance benchmarks that are publicly available (since hardware providers are usually pretty eager to show off stats that make their hardware look good) into calibrations for this method.  It's also usually pretty straightforward to compute the operations and exact utilization in these runs, since they're heavily standardized on the exact model and dataset.
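
As a sketch of the kind of cross-check I have in mind, here's the common 6 * parameters * tokens approximation applied to GPT-J -- the parameter and token counts below are from memory and should be treated as assumptions rather than ground truth:

```python
# Rough cross-check using training FLOPs ~= 6 * parameters * tokens.
# The GPT-J numbers (~6B parameters, ~400B training tokens) are from memory
# and should be treated as assumptions, not ground truth.

def training_flops(n_params: float, n_tokens: float) -> float:
    """~2ND for the forward pass plus ~4ND for the backward pass."""
    return 6.0 * n_params * n_tokens

gpt_j_params = 6.0e9
gpt_j_tokens = 400.0e9

flops = training_flops(gpt_j_params, gpt_j_tokens)
pf_days = flops / (1e15 * 86400)  # petaFLOP/s-days
print(f"estimated GPT-J training compute: {flops:.2e} FLOPs (~{pf_days:.0f} PF-days)")
```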

Comment by A Ray (alex-ray) on What's Up With Confusingly Pervasive Consequentialism? · 2022-01-20T20:28:14.380Z · LW · GW

I think there's some confusion going on with "consequentialism" here, and that's at least a part of what's at play with "why isn't everyone seeing the consequentialism all the time".

One question I asked myself reading this is "does the author distinguish 'consequentialism' from 'thinking and predicting' in this piece?" and I think it's uncertain and leaning towards 'no'.

So, how do other people use 'consequentialism'?

It's sometimes put forward as a moral tradition/ethical theory, as an alternative to both deontology and virtue ethics.  I forget which philosopher decided this was the trifecta but these are often compared and contrasted to each other.  In particular, the version used here seems to not fit well with this article.

Another might be that consequentialism is an ethical theory that requires prediction (whereas others do not) -- I think this is an important feature of consequentialism, but it seems like 'the set of all ethical theories which have prediction as a first class component' is bigger than just consequentialism.  I do think that ethical theories that require prediction as a first class component are important for AI alignment, specifically intent alignment (less clear if useful for non-intent-alignment alignment research).

A different angle to this would be "do common criticisms of consequentialism apply to the concept being used here".  Consequentialism has had a ton of philosophical debate over the last century (probably more?) and according to me there's a bunch of valid criticisms.[1]

Finally I feel like this is missing a huge step in the recent history of ethical theories, which is the introduction of Moral Uncertainty.  I think Moral Uncertainty is a huge step, but the miss (in this article) is a 'near miss'.  I think a similar argument could have been made that AI researchers / Alignment researchers, using the framing of Moral Uncertainty, should be updating on net in the direction of consequentialism being useful/relevant for modeling systems (and possibly useful for designing alignment tech).

  1. ^

    I'm not certain that the criticisms will hold, but I think that proponents of consequentialism have insufficiently engaged with the criticisms; my net current take is uncertain but leaning in the consequentialists favor.  (See also: Moral Uncertainty)

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-19T20:52:05.667Z · LW · GW

... is just an arbitrary thing not to do.

I think this is the crux-y part for me.  My basic intuition here is something like "it's very hard to get contemporary prosaic LMs to not do a thing they already do (or have high likelihood of doing)" and this intuition points me in the direction of instead "conditionally training them to only do that thing in certain contexts" is easier in a way that matters.

My intuitions are based on a bunch of assumptions that I have access to and probably some that I don't.

Like, I'm basically only thinking about large language models, which are at least pre-trained on a large swath of a natural language distribution.  I'm also thinking about using them generatively, which means sampling from their distribution -- which implies getting a model to "not do something" means getting the model to not put probability on that sequence.

At this point it still is a conjecture of mine -- that conditional prefixing behaviors we wish to control is easier than getting them not to do some behavior unconditionally -- but I think it's probably testable?

A thing that would be useful to me in designing an experiment to test this would be to hear more about adversarial training as a technique -- as it stands I don't know much more than what's in that post.

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-19T18:22:14.637Z · LW · GW

"The goal is" -- is this describing Redwood's research or your research or a goal you have more broadly?

I'm curious how this is connected to "doesn't write fiction where a human is harmed".

Comment by A Ray (alex-ray) on Possible Dangers of the Unrestricted Value Learners · 2022-01-19T05:22:34.288Z · LW · GW

I like the main point; I hadn't considered it before with value learning.  Trying to ask myself why I haven't been worried about this sort of failure mode before, I get the following:

It seems all of the harms to humans the value-learner causes are from some direct or indirect interaction with humans, so instead I want to imagine a pre-training step that learns as much about human values from existing sources (internet, books, movies, etc) without interacting with humans.

Then as a second step, this value-learner is now allowed to interact with humans in order to continue its learning.

So your problem in this scenario comes down to: Is the pre-training step insufficient to cause the value-learner to avoid causing harms in the second step?

I'm not certain, but here it at least seems more reasonable that it might not.  In particular, if the value-learner were sufficiently uncertain about things like harms (given the baseline access to human values from the pre-training step) it might be able to safely continue learning about human values.

Right now I think I'd be 80/20 that a pre-training step that learned human values without interacting with humans from existing media would be sufficient to prevent significant harms during the second stage of value learning.

(This doesn't rule out other superintelligence risks / etc, and is just a statement about the risks incurred during value learning that you list)

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-19T05:13:03.076Z · LW · GW

I'm pretty confident that adversarial training (or any LM alignment process which does something like hard-mining negatives) won't work for aligning language models or any model that has a chance of being a general intelligence.

This has lead to me calling these sorts of techniques 'thought policing' and the negative examples as 'thoughtcrime' -- I think these are unnecessarily extra, but they work. 

The basic form of the argument is that any concept you want to ban as thoughtcrime, can be composed out of allowable concepts.

Take for example Redwood Research's latest project -- I'd like to ban the concept of violent harm coming to a person.

I can hard mine for examples like "a person gets cut with a knife" but in order to maintain generality I need to let things through like "use a knife for cooking" and "cutting food you're going to eat".  Even if the original target is somehow removed from the model (I'm not confident this is efficiently doable) -- as long as the model is able to compose concepts, I expect to be able to recreate it out of concepts that the model has access to.

A key assumption here is that a language model (or any model that has a chance of being a general intelligence) has the ability to compose concepts.  This doesn't seem controversial to me, but it is critical here.

My claim is basically that for any concept you want to ban from the model as thoughtcrime, there are many ways which it can combine existing allowed concepts in order to re-compose the banned concept.

An alternative I'm more optimistic about

Instead of banning a model from specific concepts or thoughtcrime, instead I think we can build on two points:

  • Unconditionally, model the natural distribution (thought crime and all)
  • Conditional prefixing to control and limit contexts where certain concepts can be banned

The anthropomorphic way of explaining it might be "I'm not going to ban any sentence or any word -- but I will set rules for what contexts certain sentences and words are inappropriate for".

One of the nice things with working with language models is that these conditional contexts can themselves be given in terms of natural language.
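
A minimal sketch of what I mean by conditional prefixing -- the tag format and the toy classifier below are placeholders I made up, not a real pipeline:

```python
# Sketch of conditional prefixing: instead of removing a behavior from the
# training distribution, tag each training document with a context prefix and
# control the behavior at sampling time by choosing the prefix.
# contains_violence() and the tag format are made-up placeholders.

def contains_violence(text: str) -> bool:
    # Stand-in for a real labeling process (human labels, a learned classifier, etc.).
    return any(word in text.lower() for word in ("stab", "shoot", "wound"))

def add_control_prefix(document: str) -> str:
    tag = "[violence: yes]" if contains_violence(document) else "[violence: no]"
    return f"{tag}\n{document}"

training_corpus = [
    "The chef used a knife to cut the vegetables for dinner.",
    "The bandit moved to stab the knight, who parried the blow.",
]

for document in training_corpus:
    print(add_control_prefix(document), end="\n\n")

# At sampling time, prompting with "[violence: no]\n" asks for the conditional
# distribution where that content is out of context, without ever trying to
# remove the underlying concepts from the model.
```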

I understand this is a small distinction but I think it's significant enough that I'm pessimistic about current non-contextual thoughtcrime approaches to alignment working.

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-17T04:18:02.161Z · LW · GW

That seems worth considering!

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-17T04:17:10.373Z · LW · GW

I'm roughly thinking of this sort of thing: https://forum.effectivealtruism.org/posts/fTDhRL3pLY4PNee67/improving-disaster-shelters-to-increase-the-chances-of

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-16T18:27:48.891Z · LW · GW

Remote islands are probably harder to access via aviation, but probably less geologically stable (I'd worry about things like weathering, etc).  Additionally this is probably going to dramatically increase costs to build.

It's probably worth considering "aboveground bunker in remote location" (e.g. islands, also antarctica) -- so throw it into the hat with the other considerations.

My guess is that the cheaper costs to move building supplies and construction equipment will favor "middle of nowhere in an otherwise developed country".

I also don't have fully explored models for how much a 100 yr bunker needs to be hidden/defensible.  This seems worth thinking about.

If I ended up wanting to build one of these on some cheap land somewhere with friends, above-ground might be the way to go.

(The idea in that case would be to have folks we trust take turns staying in it for ~1month or so at a time, which honestly sounds pretty great to me right now.  Spending a month just reading and thinking and disconnected while having an excuse to be away sounds rad)

Comment by A Ray (alex-ray) on [Linkpost] [Fun] CDC To Send Pamphlet On Probabilistic Thinking · 2022-01-16T03:18:28.237Z · LW · GW

I think it would be cool to see an example of such a pamphlet and if it were good to actually do this (at least get it in more places).

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-16T02:20:14.736Z · LW · GW

100 Year Bunkers

I often hear that building bio-proof bunkers would be good for bio-x-risk, but it seems like not a lot of progress is being made on these.

It's worth mentioning a bunch of things I think probably make it hard for me to think about:

  • It seems that even if I design and build them, I might not be the right pick for an occupant, and thus wouldn't directly benefit in the event of a bio-catastrophe
  • In the event of a bio-catastrophe, it's probably the case that you don't want anyone from the outside coming in, so probably you need people already living in it
  • Living in a bio-bunker in the middle of nowhere seems kinda boring

Assuming we can get all of those figured out, it seems worth funding someone to work on this full-time.  My understanding is EA-funders have tried to do this but not found any takers yet.

So I have a proposal for a different way to iterate on the design.

Crazy Hacker Clubs

Years ago, probably at a weekly "Hack Night" at a friend's garage, where a handful of us met to work on and discuss projects, someone came up with the idea that we could build a satellite.

NASA was hosting a cubesat competition, where the prize was a launch/deployment.  We also had looked at a bunch of university cubesats, and decided that it wasn't that difficult to build a satellite.

So hack nights and eventually other nights turned to meeting to discuss designs and implementations for the various problems we would run into (power generation and storage, attitude/orientation control, fine pointing, communications).  Despite being rank amateurs, we made strong progress, building small scale prototypes of the outer structure and subsystems.

The thing that actually ended this was I decided this was so much fun that I'd quit my job and instead go work at Planet Labs -- where a really cool bunch of space hippies was basically doing a slightly more advanced version of our "hacker's cubesat".

Bio-Bunker Nights

Similar to "Hack Nights" -- I think it would be fun to get together with a small set of friends and work through the design and prototype build of a 100 year bunker.

I expect to enjoy this sort of thing.  Designing life support systems, and how they might fail and be fixed.  Research into various forms of concrete and seismological building standards.  Figuring out where would be the best place for it.

My guess is that a lot of the design and outline for construction could be had over pizza in someone's garage.

(I'm not predicting I will do this, or committing to joining a thing if it existed, but I do think it would be a lot of fun and would be very interested in giving it a shot)

Comment by A Ray (alex-ray) on Progress, humanism, agency: An intellectual core for the progress movement · 2022-01-14T01:07:00.484Z · LW · GW

Reply to Progress, Humanism, Agency

In general I think I’m on the same page as Jason here.  Instead of saying a lot of words about how I think this is important and useful, I’ll instead just poke at the parts I think could possibly become stronger.

Progress is not a single thing

There’s a lot of people talking past each other with regards to progress.  There are a lot of blanket claims that progress is happening (things are getting broadly better for people on average) as well as claims to the opposite.

I think the ‘yeah huh, nu uh’ back-and-forth between “things are getting better” and “things are getting worse” is quagmired at this point.

So it’s maybe worth admitting that there are metrics that people find important that are getting worse with time.  Most of these are benchmarked to relativist reference points (e.g. income inequality can get worse even if everyone has access to strictly more value) or normative (if assaults are decreasing slower than your growing circle of consideration for what counts as assault, then your perceptions could be that assault is increasing).

I’m not sure what the right approach is here, but it seems that over the last little bit, reinforcing the “things are getting better” with lots of graphs and stats etc hasn’t actually changed public opinion all that much.

(I remain surprised at the reactions to “Better Angels of our Nature” when it came out — I still regard that as better than “Enlightenment Now”, despite the latter doing more work on the philosophical concepts of progress; what matters to me is the actual change)

Humanism as the source of value / Utilitarianism as the standard of value

I think your definition of humanism is laudable but vague.  It weakly answers the question of “whence value?” but stops there.

I think your alternative source of value is better described as “naturalism” instead of “romanticism” — if only because the latter seems to suggest philosophical romanticism (https://en.wikipedia.org/wiki/Romanticism) instead of the conservatism you described.  (This is mostly a minor nit about naming things, not actually a criticism of the point)

So I think “Humanism as the source of value” makes sense, but it doesn’t give us a metric or measurement or point of reference to compare to in terms of value.

I think that standard of value (or the system of utilizing that) is utilitarianism.

I think consequentialism and utilitarianism are not without their issues (hopefully writing up more of them soon) — but I think they make a strong standard, in particular by forcing a consistent set of preferences between alternatives of value.

What is the measure of Agency?

I ask because I don’t know it — and also because it seems critical to reconciling liberalism (the John Stuart Mill kind = https://en.wikipedia.org/wiki/On_Liberty) with utilitarianism/consequentialism.

It seems reasonable to consider differently the expected harm of someone who directly put themselves in harms way, vs the person who was put into harms way by the state — though naive metrics for utility (e.g. expected QALYs, etc) would be the same.

The VNM utilitarians would claim that there is some term in the utility function for agency, but (so far as I know) have not produced actual numbers and metrics for how agency trades off with e.g. mortality.

Admittedly this is mostly a critique of utilitarianism and not of your point on agency.

I think the point about agency in the face of the future is essential, and the people that will change the future will probably be almost exclusively people who think they can change the future.

What should be in the core ideas for progress?

I am biased here, but I think a philosophy of progress necessarily must include a philosophy of risk.

Technological progress creates harms and downsides (or risks of downsides) in addition to benefits and upsides.

I think a philosophy of progress should have reified concepts for measuring these against each other.  I think it should also have reified concepts for measuring the meta-effects of progress on these other metrics for progress.

Secret first point of progress

I would be a little bit remiss if I didn’t include to me what was the most surprising part of learning about progress so far: almost all human progress (in the sense of the moral imperative you gave at the end) is scientific and technological progress.

To the extent that this is not the case, I haven’t found strong evidence of it yet.

To the extent that it is the case, I think we should be more clearly specifying that the moral imperative is for scientific and technical progress.

(P.S. - I have so far found the book of middling quality but full of interesting concepts by which to merge a philosophy of progress with things like utilitarianism and existential risk https://www.routledge.com/Risk-Philosophical-Perspectives/Lewens/p/book/9780415422840)

Comment by A Ray (alex-ray) on Promising posts on AF that have fallen through the cracks · 2022-01-07T03:47:49.444Z · LW · GW

Gratitude for posting this.  It caused me to read and write this comment https://www.alignmentforum.org/posts/DreKBuMvK7fdESmSJ/how-deepmind-s-generally-capable-agents-were-trained?commentId=wkyYQj8DRTLsxFQe5#comments

Comment by A Ray (alex-ray) on How DeepMind's Generally Capable Agents Were Trained · 2022-01-07T03:47:05.039Z · LW · GW

Thanks for writing this up, I found this summary much more clear and approachable than the paper.  I also basically agree with your own points, with the caveat that I think the distinction between curiosity and curriculum gets blurry in meta-learning contexts like this.  I wish there were better metrics and benchmarks for data efficiency in this regard, and then we could do things like measure improvements in units of that metric.

I’m pretty pessimistic about this line of research for a number of reasons, that I think support and complement the reasons you mentioned.  (I used to work on much smaller/simpler RL in randomized settings https://openai.com/blog/safety-gym/)

My first pessimism is that this research is super expensive (I believe prohibitively so) to scale to real world settings.  In this case, we have a huge amount of effort designing and training the models, building the environment distribution (and a huge amount of effort tuning that, since it always ends up being a big issue in RL in my experience).  The path here is to make more and more realistic worlds, which is extremely difficult, since RL agents will learn to hack physics simulators like no tomorrow. [Footnote: I don’t have the reference on hand, but someone’s been tracking “RL Agents Hacking Environments” in a Google Sheet and I feel that’s appropriate here].  Even in the super-constrained world of autonomous vehicles, it has taken huge teams tons of resources and years to make simulators that are good enough to do real-world training on — and none of those probably have complex contact physics!

My second pessimism is that these agents are likely to be very hard to align — in that we will have a hard time specifying human values in terms of the limited goal syntax.  (Let alone evaluating every partial sequence of human values, like you point out).  There’s going to be a huge inferential gap between humans and almost every approach to building these systems.

My third pessimism is one that comes from my experience working with Deep RL, and that’s the huge data efficiency problem, in part because it needs to figure out every aspect of the world from scratch.  (This is in addition to the fact that it doesn’t have as much incentive to understand the parts of the world that don’t seem relevant to the goal).  It seems almost certain that we’ll need to somehow give these systems a high quality prior — either via pretraining, or otherwise.  In this case, my pessimism is the lack of the use of external knowledge as a prior, which is fixable by changing out some part of the system for a pretrained model.

(As a hypothetical example, having the goal network include a pretrained language model, and specifying the goals in natural language, would make me less pessimistic about it understanding human values)

I feel like Alex Irpan’s great “Deep RL Doesn’t Work Yet” is still true here, so linking that too. https://www.alexirpan.com/2018/02/14/rl-hard.html

I came here from this post: https://www.lesswrong.com/posts/WerwgmeYZYGC2hKXN/promising-posts-on-af-that-have-fallen-through-the-cracks

Unsolicited feedback on the post (please feel free to totally ignore this or throw it out): I thought this was well written and clear.  I am both lazy and selfish, and when authors have crisp ideas I like when they are graphically represented, even in simple diagrams.  At the very least, I think it’s nice to highlight the most crucial graphs/plots of a paper when summarizing it, and I think I would have liked those, too.  So +100 in case you were wondering about including things like paper screenshots or simple doodles of concepts.

Comment by A Ray (alex-ray) on Promising posts on AF that have fallen through the cracks · 2022-01-07T03:00:41.227Z · LW · GW

I think one of the ways I would frame a crux-y question here is: when would we prefer to have a low-value comment vs not have that comment?

Ideally every good post would get good comments.

For less than ideal worlds, should we be trying harder to make sure good posts get at least bad comments?

I understand wanting to have a very high quality bar for comments on the side.  I also understand (as a researcher) how both deeply motivating/validating any kind of interaction is, and how demotivating non-interaction is.

It would be beneficial (I think) to put out guidance that helps people navigate this tradeoff: do I quickly leave a bad/low-effort comment, or do I not say anything?

Comment by A Ray (alex-ray) on We need a theory of anthropic measure binding · 2022-01-04T21:11:41.310Z · LW · GW

The weird thing here is how anthropics might interact w/ the organization of the universe.  Part of this is TDT-ish.

For example, if you were to construct this scenario (one big brain, three smaller brains) and you had control over which one each would prefer to be, how would you line up their preferences?

Given no other controls, I'd probably construct them such that they would prefer to be the construction they are.

So it seems worth considering (in a very hand-wavy way, with weak actual evidence) the prior that, in worlds where I expect things like myself are setting up experiments like this, I'm slightly more likely to be an instance of the one I would prefer to be.

Comment by A Ray (alex-ray) on Alex Ray's Shortform · 2022-01-04T04:11:14.850Z · LW · GW

I’ve been thinking more about Andy Jones’ writeup on the need for engineering.

In particular, my inside view is that engineering isn’t that difficult to learn (compared to research).

In particular, I think the gap between being good at math/coding and being good at engineering is small.  I agree that one of the problems here is that a huge part of the gap is tacit knowledge.

I’m curious about what short/cheap experiments could be run in/around Lightcone to try to refute this — or at the very least support the claim that “it’s possible to quickly/densely transfer engineering knowledge”.

A quick sketch of how this might go is:

  • write out a rough set of skills and meta-skills I think are relevant to doing engineering well
  • write out some sort of simple mechanic process that should both exercise those skills and generate feedback on them
  • test it out by working on some engineering project with someone else, and see if we both get better at engineering (at least by self-reported metrics)

Thinking out loud on the skills I would probably want to include:

  • prototyping / interactively writing-and-testing code
  • error handling and crash handling
  • secure access to code and remote machines
  • spinning up/down remote machines and remotely run jobs
  • diagnosing crashed runs and halting/interacting with crash-halted jobs
  • checkpointing and continuing crashed jobs from partway through
  • managing storage and datasets
  • large scale dataset processing (spark, etc)
  • multithread vs multiprocess vs multimachine communication strategies
  • performance measurement and optimization
  • versioning modules and interfaces; deprecation and migration
  • medium-level git (bisecting, rewriting history, what changed when)
  • web interfaces and other ways to get human data
  • customizing coding tools and editor plugins
  • unit tests vs integration tests and continuous/automated testing
  • desk / office setup
  • going on walks to think

I think this list is both incomplete and too object level -- the real skills are mostly aesthetics/taste and meta-level.  I do think that doing a bunch of these object-level things is a good way to learn the meta-level tastes.
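
As one object-level example from the list above, here’s a rough sketch of the checkpoint-and-resume pattern I have in mind (the path and the toy training loop are placeholders, not a prescription):

```python
import os
import torch

CKPT = "checkpoint.pt"  # placeholder path

def save_checkpoint(step, model, optimizer):
    torch.save({"step": step,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, CKPT + ".tmp")
    os.replace(CKPT + ".tmp", CKPT)  # atomic rename: a crash can't leave a half-written file

def load_checkpoint(model, optimizer):
    if not os.path.exists(CKPT):
        return 0  # fresh run
    state = torch.load(CKPT)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"] + 1  # resume after the last completed step

# Toy training loop standing in for a real job.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
start = load_checkpoint(model, optimizer)
for step in range(start, 1000):
    loss = model(torch.randn(8, 10)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        save_checkpoint(step, model, optimizer)
```

The atomic rename is the kind of small tacit detail I mean: without it, a crash mid-save can corrupt the only checkpoint you have.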

Maybe a format for breaking down a small (2-hour) session of this that could be done individually, as a pair, or hybrid (pair check-ins in between individual work sessions):

  • 5 minute checkin and save external context so we can focus
  • 15 minute planning and going through a quick checklist
  • 25 minute Pomodoro
  • 5 minute break
  • 25 minute Pomodoro
  • 5 minute break
  • 25 minute Pomodoro
  • 15 minute analysis and retrospective with another checklist

The checklists could adapt over time, but I think it would be fine to start with some basic questions I would want to ask myself before/after getting into focused technical work.  They are both a scaffold that best practices can be attached to and a way of getting written records of common issues.  (I'm imagining that I'll save all of the responses to the checklists and review them at a ~weekly cadence).
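
A minimal sketch of the checklist half of this (the questions and the log path are just placeholders) could be something like:

```python
import datetime
import json

# Placeholder questions and log path -- the real ones would evolve over time.
PRE = ["What is the one thing I want working by the end of this session?",
       "What is most likely to block me?"]
POST = ["Did I get it working?",
        "What slowed me down, and what would I change next session?"]

def run_checklist(questions, phase, path="engineering_sessions.jsonl"):
    answers = {q: input(q + " ") for q in questions}
    entry = {"time": datetime.datetime.now().isoformat(),
             "phase": phase,
             "answers": answers}
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")  # append-only log for weekly review

run_checklist(PRE, "planning")        # during the 15-minute planning block
# ... three Pomodoros of focused work ...
run_checklist(POST, "retrospective")  # during the 15-minute retro block
```

Running the first call during the planning block and the second during the retrospective would leave the written trail I want to review weekly.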

Comment by A Ray (alex-ray) on We need a theory of anthropic measure binding · 2022-01-04T02:58:14.370Z · LW · GW

I like the thought experiment and it seems like an interesting way to split a bunch of philosophical models.

One thing I'm curious about is which (the small three or the big one) you would prefer to be, and whether that preference should factor into your beliefs here.

Comment by A Ray (alex-ray) on Self-Organised Neural Networks: A simple, natural and efficient way to intelligence · 2022-01-04T02:45:05.469Z · LW · GW

First of all, congrats!  It's neat that your model beats state of the art on this benchmark, and with a new method of modeling too.

I feel like this post wasn't sufficient to convince me to use spiking models or your curriculum strategy.  I think in part this is because I'm pretty jaded.  The recent-ish history of machine learning includes a whole slew of comp neuro derived models, and almost always they come with two things:

  1. SOTA on some benchmark I've never heard of before (but still could be valuable/interesting! -- just unpopular enough that I didn't know it)
  2. Strong arguments that their architecture/model/algorithm/etc is the best and will eventually beat out every other approach to AI

So it feels like I'm skeptical on priors, which is a bit useless to say out loud.  I'm curious where this research goes from here, and if you do more, I hope you consider sharing it.

I do think that one of the least understood features of comp neuro models used for machine learning (of which deep neural networks are currently the top contender, but other candidates would be Boltzmann machines and reservoir computing) is the inductive bias / inductive prior they bring to machine learning problems.

I think it's possible that spiking neural networks have better inductive priors than other models, or at least better than the models we're using today.

The sparsity you mention also is probably a good thing.

The curriculum this induces is neat.  My personal take with ML today is that learning from IID data is pretty crazy (imagine trying to teach a child math by randomly selecting problems from all of mathematics).  It's possible this is a better way to do it.
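
As a toy illustration of the IID-vs-curriculum contrast (the "difficulty" key here is just a stand-in for whatever structure a real curriculum would use):

```python
import random

# 100 toy arithmetic "problems", tagged with a crude difficulty score.
problems = [{"question": f"{a} + {b}", "difficulty": a + b}
            for a in range(10) for b in range(10)]

def iid_batches(data, batch_size=8):
    """Sample problems uniformly from all of 'mathematics', forever."""
    while True:
        yield random.sample(data, batch_size)

def curriculum_batches(data, batch_size=8):
    """Walk through the same problems once, easiest first."""
    ordered = sorted(data, key=lambda p: p["difficulty"])
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]
```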

Comment by A Ray (alex-ray) on More accurate models can be worse · 2022-01-04T02:29:17.794Z · LW · GW

I would be interested in a more precise definition of what you mean by information here.

In particular, it seems like you're using an unintuitive (to me) definition of information -- though one that lines up colloquially with how we talk about computers.

For example, let's say I have a thumb drive ("Drive A") with two things on it:

  1. A very short program that computes the digits of pi
  2. A petabyte of computed digits of pi

And I have a second drive with one thing on it:

  1. The millions of lines of source code for the Linux kernel

I might ask someone: which of these has more information on it?

The colloquial, computer-storage-based answer might be: the first one!  It takes up a petabyte, whereas the second one takes up less than a gigabyte.

But it feels like something important about the meaning of information (in an AI-understanding-the-world-sense) is being lost here.

(ETA: Also, if determinism factors in here, feel free to replace the petabyte of pi digits with something like a petabyte of recordings from a TRNG or something like that.)
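
For concreteness, the "very short program" on Drive A really can be about ten lines.  The sketch below uses Gibbons' unbounded spigot algorithm, which streams the decimal digits of pi; in the sense I care about, the petabyte of computed digits adds essentially no information beyond this program:

```python
from itertools import islice

def pi_digits():
    # Gibbons' unbounded spigot: yields the decimal digits of pi one at a time.
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4*q + r - t < n*t:
            yield n
            q, r, t, k, n, l = (10*q, 10*(r - n*t), t, k,
                                (10*(3*q + r)) // t - 10*n, l)
        else:
            q, r, t, k, n, l = (q*k, (2*q + r)*l, t*l, k + 1,
                                (q*(7*k + 2) + r*l) // (t*l), l + 2)

print(list(islice(pi_digits(), 10)))  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
```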