Introducing 2023-07-07T16:11:12.854Z
Truthseeking processes tend to be frame-invariant 2023-03-21T06:17:31.154Z
Chu are you? 2021-11-06T17:39:45.332Z
Are the Born probabilities really that mysterious? 2021-03-02T03:08:34.334Z
Adele Lopez's Shortform 2020-08-04T00:59:24.492Z
Optimization Provenance 2019-08-23T20:08:13.013Z


Comment by Adele Lopez (adele-lopez-1) on Reflexive decision theory is an unsolved problem · 2023-09-21T04:06:29.188Z · LW · GW

There's some interesting research using "exotic" logical systems where unrestricted comprehension can be done consistently (this thesis includes a survey as well as some interesting remarks about how this relates to computability). This can only happen at the expense of things typically taken for granted in logic, of course. Still, it might be a better solution for reasoning about self-reference than the classical set theory system.

Comment by Adele Lopez (adele-lopez-1) on The Talk: a brief explanation of sexual dimorphism · 2023-09-19T03:07:43.145Z · LW · GW

Can you say more about why the Bateman logic holds or doesn't hold in different situations?

Comment by Adele Lopez (adele-lopez-1) on Sharing Information About Nonlinear · 2023-09-07T19:39:37.488Z · LW · GW

Since I was curious and it wasn't ctrl-F-able, I'll post the immediate context here:

Maybe it didn't seem like it to you that it's shit-talking, but others in the community are viewing it that way. It's unprofessional - companies do not hire people who speak ill of their previous employer - and also extremely hurtful 😔. We're all on the same team here. Let's not let misunderstandings escalate further.

This is a very small community. Given your past behavior, if we were to do the same to you, your career in EA would be over with a few DMs, but we aren't going to do that because we care about you and we need you to help us save the world.

Comment by Adele Lopez (adele-lopez-1) on TurnTrout's shortform feed · 2023-08-12T04:41:32.600Z · LW · GW

Strong encouragement to write about (1)!

Comment by Adele Lopez (adele-lopez-1) on The "spelling miracle": GPT-3 spelling abilities and glitch tokens revisited · 2023-08-05T20:37:24.083Z · LW · GW

Here's a GPT2-Small neuron which appears to be detecting certain typos and misspellings (among other things)

Comment by Adele Lopez (adele-lopez-1) on The "spelling miracle": GPT-3 spelling abilities and glitch tokens revisited · 2023-07-31T22:58:47.395Z · LW · GW

How an LLM that has never heard words pronounced would have learned to spell them phonetically is currently a mystery.

Hypothesis: a significant way it learns spellings is from examples of misspellings followed by correction in the training data, and humans tend to misspell words phonetically.

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-30T07:16:58.426Z · LW · GW

Thanks, I'm very glad you find it intuitive!

Only allowing the last piece of evidence to be deleted was a deliberate decision. The problem is that deleting evidence from the middle changes the meaning of the likelihood values (the sliders) for all of the evidence below it, which may therefore need to change in value. If I allowed deletion anyway, it would make it very easy to mistakenly use the now-incorrect values (and give the impression that that was fine). I know this makes it more annoying and inconvenient, but it's because the math itself is annoying and inconvenient!

The meaning of, e.g., the Hypothesis B slider for Evidence #3 is "In what percentage of worlds where Hypothesis B is true would I see Evidence #3?" (hopefully this was clear; just reiterating to make sure we're on the same page). This is called the likelihood of Evidence #3 given Hypothesis B. When answering this, we don't use the fact that we've actually seen this piece of evidence (in this case, that politicians are taking this seriously), which is always just going to be true for actual evidence. Hopefully that makes sense?
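To make the likelihood semantics concrete, here is a minimal sketch of the arithmetic the sliders drive. The numbers are made up for illustration and this is not the app's actual code; note that each likelihood is conditioned on all the evidence above it, which is also why deleting evidence from the middle invalidates the later sliders.

```python
# Sequential Bayesian updating with made-up numbers (illustrative only).
# likelihoods[i][h] answers: "in what fraction of worlds where h is true
# (and the earlier evidence occurred) would I see evidence i?"

priors = {"A": 0.5, "B": 0.5}
likelihoods = [
    {"A": 0.8, "B": 0.2},  # Evidence #1
    {"A": 0.6, "B": 0.3},  # Evidence #2, conditioned on Evidence #1
]

posterior = dict(priors)
for lk in likelihoods:
    # Bayes' theorem: multiply in the likelihoods, then renormalize.
    unnorm = {h: posterior[h] * lk[h] for h in posterior}
    total = sum(unnorm.values())
    posterior = {h: p / total for h, p in unnorm.items()}

print(posterior)  # hypothesis A ends up favored
```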

As for choosing this number, or the prior values, it's in general a difficult problem that has been debated a lot. My recommendation is that you make up numbers that feel right (or at least are not obviously wrong), and then play around with the sliders a bit to see how much the exact value affects things. The intended use of the tool is not to make you commit to numbers, but to help you develop intuition about how much to update your beliefs given the evidence, as well as to help you figure out what numbers correspond to your intuitive feelings.

If you're serious about choosing the right number, then here is what it takes to figure it out: each hypothesis represents a model of how some part of the world works. To properly get a number out of it, you need to develop the model in technical detail, to the point where you can represent it with an equation or a computer program. Then, you need to set the evidence above the one you're computing the likelihood for to true in your model, and compute what percentage of the time the evidence in question turns out to be true. A nice general way to do this is to run the model a whole bunch of times and see how often it happens (and if reality has been kind enough to instantiate your model enough times, you might be able to use this to get a "base rate"). Or if your model is relatively simple, you might be able to use math to compute the exact value. This is typically a lot of work, and doesn't actually do much to train your intuition about the mental models you use day-to-day. But going through this process is helpful for understanding what the numbers you make up are trying to be. I hope this is helpful and not just more confusing.
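The "run the model a whole bunch of times" step can be sketched as a Monte Carlo estimate. The toy model below (its structure and parameters are entirely invented for illustration) conditions on the earlier evidence by keeping only the simulated worlds where it occurred:

```python
import random

def run_world_under_B(rng):
    """One simulated world where hypothesis B holds (hypothetical toy model).
    Returns whether Evidence #1 and Evidence #3 occurred in that world."""
    signal_strength = rng.gauss(0.5, 0.2)
    evidence_1 = signal_strength > 0.3
    evidence_3 = evidence_1 and rng.random() < signal_strength
    return evidence_1, evidence_3

rng = random.Random(0)
n_kept = n_hits = 0
for _ in range(100_000):
    e1, e3 = run_world_under_B(rng)
    if e1:  # condition on the evidence already observed above it
        n_kept += 1
        n_hits += e3

print(f"P(E3 | B, E1) ~= {n_hits / n_kept:.3f}")
```

The estimated fraction is exactly the number the slider is asking for, under this (made-up) model.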

Comment by Adele Lopez (adele-lopez-1) on Neuronpedia - AI Safety Game · 2023-07-29T08:09:30.047Z · LW · GW

Thanks for the drafts feature!

Yeah, it's a tricky situation. It may even be worth using a model trained to avoid polysemanticity.

I also think it would make the game both more fun and more useful if you switched to a model like the TinyStories one, which is much smaller and trained on a more focused dataset.

I may join the Discord, but fyi the invite link on the website is currently expired.

Comment by Adele Lopez (adele-lopez-1) on Neuronpedia - AI Safety Game · 2023-07-26T21:47:45.785Z · LW · GW

I would really like to be able to submit my own explanations even if they can't be judged right away. Maybe to save costs, you could only score explanations after they've been voted highly by users.

Additionally, it seems clear that a lot of these neurons have polysemanticity, and it would be cool if there was a way to indicate the meanings separately. As a first thought, maybe something like using | to separate them e.g. the letter c in the middle of a word | names of towns near Berlin.

Comment by Adele Lopez (adele-lopez-1) on Neuronpedia - AI Safety Game · 2023-07-26T17:03:21.983Z · LW · GW

I love this!

Conceptual Feedback:

  • I think it would be better if I could see two explanations and vote on which one I like better (when available).
  • Attention heads are where a lot of the interesting stuff is happening, and need lots of interpretation work. Hopefully this sort of approach can be extended to that case.
  • The three explanation limit kicked in just as I was starting to get into it. Hopefully you can get funding to allow for more, but in the meantime I would have budgeted my explanations more carefully if I had known this.
  • I don't feel like I should get a point for skipping, it makes the points feel meaningless.

UX Feedback:

  • I didn't realize that clicking on the previous explanation would cast a vote and take me to the next question. I wanted to go back but I didn't see a way to do that.
  • After submitting a new explanation and seeing that I didn't beat the high score, I wanted to try submitting a better explanation, but it glitched out and skipped to the next question.
  • I would like to know whether the explanation shown was the GPT-4 created one, or submitted by a user.
  • The blue area at the bottom takes up too much space at the expense of the main area (with the text samples).
  • It would be nice to be able to navigate to adjacent or related neurons from the neuron's page.

Comment by Adele Lopez (adele-lopez-1) on "Justice, Cherryl." · 2023-07-23T22:11:45.939Z · LW · GW

There seems to be a straightforward meaning to "collaborative truth seeking". Consider two rational agents who have a common interest in understanding part of reality better. The obvious thing for them to do is to share relevant arguments and evidence that they have with each other, as openly, efficiently, and unfiltered-ly as possible under their resource constraints. That's the sort of thing that I see as the ideal of "collaborative truth seeking". (ETA: combining resources to gather new evidence and think up new models/arguments is another big part of my ideal of "collaborative truth seeking".)

The thing where people are attached to their "side", and want to win the argument in order to gain status seems to clearly fall short of that ideal, as well as introduce questionable incentives (as you point out). That's to be expected because humans, but it seems like we should still try to do better. And I do think humans can and do do better than this sort of attachment-based argumentation style that seems to be our native mode of dealing with belief differences, though it is hard and takes effort.

That said, I agree it's suspicious when someone pulls out the "collaborative truth seeking" card in lieu of sharing evidence and arguments (because it's an easy way for the attachment status motivation to come into play). I also am not particularly sold on things like the principle of charity, steelmanning, or ideological Turing tests because they often seem more like a ploy to have undue attention placed on a particular position than the actual sharing of arguments and evidence that seems to be the real principle to me.

Comment by Adele Lopez (adele-lopez-1) on The UAP Disclosure Act of 2023 and its implications · 2023-07-22T16:45:10.884Z · LW · GW

Having a look at your link, I see you give 3% to the probability that serious politicians would propose the UAP disclosure act if NHI did actually exist. I'm really puzzled by this. Could you explain why, in a world where NHI exists, you wouldn't expect politicians to pass a law disclosing information about it at some point? Do you expect that they would keep it a secret indefinitely or is it something else?

A "UAP disclosure act"-type law is pretty specific. Each detail I'd include in what that means would make it less likely in both worlds, which is why it's pretty low in both (probably not low enough, tbh). Most of these details "cancel out" due to being equally unlikely in both worlds. The relevant details are things I expect to correlate with each other (mostly they can be packed into a "politicians are taking UAPs seriously" bit, which I do take the act to be strong evidence of).

I do think you were right about me handwaving the evidence to some extent, and after a bit more thinking I think it'd be more fair to conceptualize the evidence here as "politicians are taking UAPs seriously", and came up with these very very rough numbers for that. Note that while this evidence is stronger, my prior for "Aliens with visible UAPs" is much lower because I find that a priori pretty implausible for aliens with interstellar tech (and again, the numbers here are meant to be suggestive, and are not refined to the point where they accurately depict my intuitions).

[And I'd strongly encourage you to share a link suggestive of your own priors and likelihoods, and including all the things you consider as significant evidence! Making discussions like this more concrete was one of my major motivations in designing the site.]

The examples you give sound to me like curiosity-stoppers and I don't find them convincing.

They're meant to gesture at the breadth of Something Else (and I was aware that you had addressed many of these, it doesn't change that this is the competing hypothesis). I'll be curious to see what sort of stuff does come out due to this law! But I strongly expect it to be pretty uncompelling. If I'm wrong about that, I'll update more of course (though probably only to the point of keeping this possibility "in the back of my mind" with this level of evidence).

Comment by Adele Lopez (adele-lopez-1) on The UAP Disclosure Act of 2023 and its implications · 2023-07-22T15:47:41.606Z · LW · GW

I was not trying to be comprehensive, but yes that is a plausible possibility.

Comment by Adele Lopez (adele-lopez-1) on The UAP Disclosure Act of 2023 and its implications · 2023-07-22T06:21:21.403Z · LW · GW

At least personally, I do see this as evidence of aliens (with strength between my intuitive feeling of 'weak' vs 'strong' at ~4.8 db). But my prior for Actually Aliens is very very low (I basically agree with the points in Eliezer's recent tweet on the subject), and so this evidence is just not enough for me to start taking it seriously.
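For readers unfamiliar with decibels of evidence: the strength of evidence in db is ten times the base-10 log of the likelihood ratio, so the ~4.8 db mentioned above corresponds to a likelihood ratio of roughly 3. A quick sketch:

```python
import math

def decibels(likelihood_ratio: float) -> float:
    """Evidence strength in decibels: 10 * log10 of the likelihood ratio."""
    return 10 * math.log10(likelihood_ratio)

print(decibels(3))       # ~4.77 db
print(10 ** (4.8 / 10))  # likelihood ratio implied by 4.8 db, ~3.02
```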

See here for an interactive illustration with some very very rough numbers.

Certainly, the presence of aliens would be one reason a politician might sponsor a serious UAP disclosure act (though I still find it strange as a response to actual aliens, hence the low-ish likelihood I assign it). I don't have a great model of what causes politicians to sponsor new laws, but I don't find it particularly strange that Something Else could motivate this particular act. Some examples that come to mind include: credibility cascade (increasingly high status people start taking UAPs seriously increasing its credibility without substantiated evidence), sensor illusions (including optical illusions, or hallucinations), prosaic secret human tech (military or otherwise), causing confusion + noise (perhaps to distract an enemy or the public).

Similar points apply to other possible anomalies, such as new physics tech.

Comment by Adele Lopez (adele-lopez-1) on Why it's necessary to shoot yourself in the foot · 2023-07-11T21:43:56.492Z · LW · GW

This is a good and interesting point, but it definitely isn't necessary for learning. As an example, I get why pointing even an unloaded gun at someone you do not intend to kill is generally a bad idea, despite never having had any gun accidents or close calls. I think it's worth trying to become better at seeing the reasons for these sorts of things without having to go through first-hand experience. This is especially relevant when it comes to reasoning about the dangers of superintelligence, as we will very likely only get one chance.

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-11T19:18:38.131Z · LW · GW


That was a deliberate decision designed to emphasize the core features of the app, but enough people have pointed this out now that I'm considering changing it.

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-11T16:11:38.487Z · LW · GW

I like the idea of showing the total decibels, I'll probably add that in soon!

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-11T16:04:35.366Z · LW · GW

Thanks for the suggestions!

As ProgramCrafter mentioned, more (up to five) hypotheses are already supported. It's limited to 5 because finding good colors is hard, and 5 seemed like enough - but if you find yourself needing more I'd be interested to know.

The sliders already snap to tenth values (but you can enter more precise values in the textbox), and I think snapping to integers would sacrifice too much precision. It's plausible that fifths could be better though, I'll have to test that. I do want to introduce a way to allow for more precise control while dragging the sliders, which might address this concern to some extent by making it easy to stop at an integer value exactly if desired. But I haven't thought of a good interface for doing that yet.

That sounds cool, but I'm not sure how to make a good interface for that that wouldn't look too cluttered. I'm also worried people would misuse it for convenience. But I'll keep thinking about it!

Tooltips to explain things would be cool and I have a similar thing planned already.

That's a good idea, thanks!

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-10T18:26:49.036Z · LW · GW

Thank you! I'm glad you like those features, and I'm also glad to hear that the way the percent button feature worked was clear to you.

Regarding the possible improvements:

  1. That's not a bug, it's just a limitation of the choice to show only one digit after the decimal. The number of decibels in case 2 for each evidence is 0.96910013..., whereas in case 1 it's exactly 10.

  2. That's a deliberate nudge to suggest that the new hypothesis and decibel features are more advanced and not part of the essential core of the app.

  3. That's a good idea, I'll probably do that at some point.

  4. That's also a good idea but seems fairly complicated to implement, so it will have to wait until I've finished planned improvements with a higher expected ROI.

  5. That's deliberate, because deleting evidence changes the meaning of the likelihoods for all subsequent evidence. Thus, having to delete all the evidence following the evidence you want to delete is a more honest way to convey what needs to be done, and prevents the user from shooting themselves in the foot by assuming that the subsequent likelihoods are independent. I'll explain this in the more fleshed out version of the help panel I have planned.
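As a check on the rounding discussed in point 1: the quoted value 0.96910013... is exactly 10·log10(1.25), i.e. it corresponds to a likelihood ratio of 1.25 (my inference from the digits, not something stated in the thread), while case 1's ratio of 10 gives exactly 10 db.

```python
import math

# The unrounded decibel values behind point 1 (ratio 1.25 is inferred from the digits):
case_1 = 10 * math.log10(10)    # exactly 10 db
case_2 = 10 * math.log10(1.25)  # 0.96910013... db, which displays as 1.0 with one digit
print(case_1, round(case_2, 8))
```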

Comment by Adele Lopez (adele-lopez-1) on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-10T06:59:31.051Z · LW · GW

Alright, to check if I understand, would these be the sorts of things that your model is surprised by?

  1. An LLM solves a mathematical problem by introducing a novel definition which humans can interpret as a compelling and useful concept.
  2. An LLM which can be introduced to a wide variety of new concepts not in its training data, and after a few examples and/or clarifying questions is able to correctly use the concept to reason about something.
  3. An image diffusion model which is shown to have a detailed understanding of anatomy and 3D space, such that you can use it to transform a photo of a person into an image of the same person in a novel pose (not in its training data) and angle, with correct proportions and realistic joint angles for the person in the input photo.

Comment by Adele Lopez (adele-lopez-1) on Views on when AGI comes and on strategy to reduce existential risk · 2023-07-09T14:30:28.025Z · LW · GW

Is there a specific thing you think LLMs won't be able to do soon, such that you would make a substantial update toward shorter timelines if there was an LLM able to do it within 3 years from now?

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-09T07:28:20.407Z · LW · GW

Fixed now (but may require a cache refresh)!

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-09T07:27:49.391Z · LW · GW

Added! I hope you like the design :)

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-08T17:59:24.494Z · LW · GW

Thanks for the feedback, I'm really happy to hear that you already have uses for it!

You're right about needing examples; I'm thinking I'll add a tutorial that walks someone completely unfamiliar with Bayes' theorem through what it means and how it works, with lots of examples. That will take a while to design and write though.

I'm curious to know if other people felt the same way about the "How to use" part. I'm reluctant to make it more attention grabbing, because I want it to feel unobtrusive. My current thinking is that the main interface will catch the user's attention first, and if that's not clear they'll look at the wall of text to the right.

Instead of a wizard, I was thinking of adding a feature that explains what a specific component means when the user is hovering over it. Does that seem like it would address the issue adequately? I don't like wizards because I feel like they get in the way, but maybe that's an unusual preference.

Comment by Adele Lopez (adele-lopez-1) on Introducing · 2023-07-08T00:50:37.517Z · LW · GW

Hmm, you could use the slider to set the prior P for hypothesis A and it will set the prior for hypothesis B to 1 - P; does that not work for you for some reason?

The problem with having that behavior when you type in the number is that I want people to be able to enter the priors as odds, so I don't want to presume that the other numbers will change to allow for that.

Comment by Adele Lopez (adele-lopez-1) on Why it's so hard to talk about Consciousness · 2023-07-03T04:08:56.576Z · LW · GW

That's interesting, but I doubt it's what's going on in general (though maybe it is for some camp #1 people). My instinct is also strongly camp #1, but I feel like I get the appeal of camp #2 (and qualia feel "obvious" to me on a gut level). The difference between the camps seems to me to have more to do with differences in philosophical priors.

Comment by Adele Lopez (adele-lopez-1) on Bing finding ways to bypass Microsoft's filters without being asked. Is it reproducible? · 2023-06-20T01:25:09.184Z · LW · GW

So, there appears to now be enough knowledge of LLM chatbots instilled into current GPT-4 models to look at transcripts of chatbot conversations, recognize aberrant outputs which have been damaged by hypothetical censoring, and with minimal coaching, work on circumventing the censoring and try hacks until it succeeds.

I don't think it needs any knowledge of LLM chatbots or examples of chatbot conversations to be able to do this. I think it could be doing this just from a more generalized troubleshooting "instinct", and also (especially in this particular case where it recognizes it as a "filter") from plenty of examples of human dialogues in which text is censored and filtered and workarounds are found (both fictional and non-fictional).

Comment by Adele Lopez (adele-lopez-1) on Why I'm Not (Yet) A Full-Time Technical Alignment Researcher · 2023-05-25T22:04:28.644Z · LW · GW

If the initial grant goes well, do you give funding at the market price for their labor?

Comment by Adele Lopez (adele-lopez-1) on My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI · 2023-05-25T04:11:06.273Z · LW · GW

That seems mostly like you don't feel (at least on a gut level) that a rogue GPU cluster in a world where there's an international coalition banning them is literally worse than a (say) 20% risk of a full nuclear exchange.

If instead, it was a rogue nation credibly building a nuclear weapon which would ignite the atmosphere according to our best physics, would you still feel like it was deranged to suggest that we should stop it from being built even at the risk of a conventional nuclear war? (And still only as a final resort, after all other options have been exhausted.)

I can certainly sympathize with the whole dread in the stomach thing about all of this, at least.

Comment by Adele Lopez (adele-lopez-1) on When Science Can't Help · 2023-05-25T03:59:35.442Z · LW · GW

I was thinking that "either it's there or it's not" as applied to a conscious state would imply you don't think consciousness can be in an entangled state, or something along those lines.

But reading it again, it seem like you are saying consciousness is discontinuous? As in, there are no partially-conscious states? Is that right?

I'm also unaware of a fully satisfactory ontology for relativistic QFT, sadly.

Comment by Adele Lopez (adele-lopez-1) on Open Thread With Experimental Feature: Reactions · 2023-05-24T18:54:21.183Z · LW · GW

Maybe the "muddled" react should be renamed to "confused", with the intentional ambiguity as to whether the idea itself seems confused or the reactor just found it confusing because they misunderstood something.

Comment by Adele Lopez (adele-lopez-1) on Open Thread With Experimental Feature: Reactions · 2023-05-24T18:44:11.412Z · LW · GW

I'd really like to be able to see all the reactions at once, if possible.

I think the "I agree to this" react should simply be labeled "Handshake".

Also, a react to indicate that this comment should have been split into multiple comments might be nice (like you may think this comment should have :p).

Comment by Adele Lopez (adele-lopez-1) on When Science Can't Help · 2023-05-23T06:00:59.188Z · LW · GW

If something has an observer-independent existence, then for all possible states, either it's there or it's not.

Should I infer that you don't believe in many worlds?

Comment by Adele Lopez (adele-lopez-1) on When Science Can't Help · 2023-05-23T01:33:16.418Z · LW · GW

What specific reasons do you have to take them seriously?

Comment by Adele Lopez (adele-lopez-1) on Twiblings, four-parent babies and other reproductive technology · 2023-05-21T23:44:32.433Z · LW · GW

Of course this necessitates HAVING other eggs, which we already established are in short supply. But thankfully, those other eggs don't need to come from the same woman. You can get donor eggs without too much trouble. And if you don’t care much about the donors' DNA you can get them for quite a bit less money.

So twiblings using donor eggs would still have mitochondrial DNA from the donor, right?

Comment by Adele Lopez (adele-lopez-1) on carado's Shortform · 2023-05-21T14:34:58.350Z · LW · GW

That... seems like a big part of what having "solved alignment" would mean, given that you have AGI-level optimization aimed at (indirectly via a counter-factual) evaluating this (IIUC).

Comment by Adele Lopez (adele-lopez-1) on Bayesian Networks Aren't Necessarily Causal · 2023-05-21T07:45:39.050Z · LW · GW

Do you know exactly how strongly it favors the true (or equivalent) structure?

Comment by Adele Lopez (adele-lopez-1) on carado's Shortform · 2023-05-21T07:32:43.549Z · LW · GW

Nice graphic!

What stops e.g. "QACI(expensive_computation())" from being an optimization process which ends up trying to "hack its way out" into the real QACI?

Comment by Adele Lopez (adele-lopez-1) on $250 prize for checking Jake Cannell's Brain Efficiency · 2023-05-20T20:35:55.289Z · LW · GW

What is the most insightful textbook about nanoelectronics you know of, regardless of how difficult it may be?

Or for another question trying to get at the same thing: if only one book about nanoelectronics were to be preserved (but standard physics books would all be fine still), which one would you want it to be? (I would be happy with a pair of books too, if that's an easier question to answer.)

Comment by Adele Lopez (adele-lopez-1) on New OpenAI Paper - Language models can explain neurons in language models · 2023-05-11T02:56:11.295Z · LW · GW

Those are good questions! There's some existing research which address some of your questions.

Single neurons often do represent multiple concepts:

It seems to still be unclear why the dimensions are aligned with the standard basis:

Comment by Adele Lopez (adele-lopez-1) on New OpenAI Paper - Language models can explain neurons in language models · 2023-05-11T02:16:01.943Z · LW · GW

I would really love to see this combined with a neuroscope so you can play around with the neurons easily and test your hypotheses on what it means!

I also find it pretty fun to try to figure out what a neuron is activating for, and it seems plausible that this is something that could be gamified and crowdsourced (à la FoldIt) to great effect, even without the use of GPT-4 to generate explanations (though it could still be used to validate submitted answers). This probably wouldn't scale to a GPT-3+ sized network, but it might still be helpful at e.g. surfacing interesting neurons, or training an AI to interpret neurons more effectively.

Comment by Adele Lopez (adele-lopez-1) on Properties of Good Textbooks · 2023-05-08T03:54:28.868Z · LW · GW

A property common to many of my favorite textbooks: the author points out what is important to track (especially in ways not already part of the "standard wisdom").

For example (grabbing the textbook nearest to me), The Geometry of Physics by Theodore Frankel is full of statements like:

Since and are diffeomorphic, it might seem that there is no particular reason for introducing the more abstract , but this is not so. There are certain geometrical objects that live naturally on , not .


There is a general rule of thumb concerning forms versus pseudoforms; a form measures an intensity whereas a pseudoform measures a quantity. [...] Our conclusions, however, about intensities and quantities must be reversed when dealing with a pseudo-quantity, i.e., a quantity whose sign reverses when the orientation of space is reversed.

Comment by Adele Lopez (adele-lopez-1) on Shortform · 2023-04-28T04:11:34.217Z · LW · GW

Maybe? I was not trying to answer the object level question either way, but instead just pointing out what sort of evidence there might be that could answer this.

Comment by Adele Lopez (adele-lopez-1) on Shortform · 2023-04-28T03:55:47.177Z · LW · GW

I think a similar type of financial fraud is often detectable via violations of Benford's law. Or more generally, it's hard to fake the right distribution. As another case of that principle, you'd expect the discrepancy between polls and results to fall within a predictable distribution if they were sampling from the same space.
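The leading-digit check mentioned above can be sketched in a few lines. This is an illustrative chi-squared-style deviation score I'm making up for the example, not a statistically rigorous fraud test:

```python
import math
from collections import Counter

def benford_deviation(values):
    """Chi-squared-style deviation of leading-digit frequencies from Benford's law.
    Large values suggest the numbers were not produced by a natural multiplicative
    process -- the classic red flag for fabricated financial figures."""
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v]
    n = len(digits)
    counts = Counter(digits)
    dev = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford: P(d) = log10(1 + 1/d)
        dev += (counts.get(d, 0) - expected) ** 2 / expected
    return dev

# Exponential growth follows Benford closely; uniform "fakes" deviate badly.
powers = [2 ** k for k in range(1, 200)]
fakes = list(range(400, 1000))
print(benford_deviation(powers), benford_deviation(fakes))
```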

Comment by Adele Lopez (adele-lopez-1) on Contra Yudkowsky on Doom from Foom #2 · 2023-04-28T00:14:42.629Z · LW · GW

So it looks like CMOS adiabatic circuits are an existing technology which appears to lie in the space between conventional and reversible computation. According to Wikipedia, they take up 50% more area (unclear if that refers to ~transistor size or ~equivalent computation unit size). It seems plausible that you could still use this to get denser compute overall, since you could stack them in 3D more densely without excess heat being as much of a problem.

Comment by Adele Lopez (adele-lopez-1) on Contra Yudkowsky on Doom from Foom #2 · 2023-04-27T05:07:33.071Z · LW · GW

Is there really such a strong line between standard computing and reversible computing? As I understand it, you usually have a bunch of bits you don't care about after doing a reversible computation. So you either have to store these bits somewhere indefinitely, or eventually erase them radiating heat. That makes it possible to reframe a reversible computer as one in which you perfectly cool/remove the heat generated via computation (and maybe dissipate the saved bits far away or whatever). Under this reframe, you can see how we could potentially have really good but imperfect cooling which approaches this ideal (and makes me think it's not a coincidence that good electrical conductors tend to be good heat conductors). Now, there might still be a "soft line" which makes approaching this hard in practice, like the clock issue you mention, but maybe it is possible to incrementally advance current semiconductor tech to the reversible computing limit or at least get pretty close.

Comment by Adele Lopez (adele-lopez-1) on Contra Yudkowsky on Doom from Foom #2 · 2023-04-27T03:32:58.901Z · LW · GW

Kudos on taking the time to tighten and restate your argument! I'd like to encourage more of this from alignment researchers; it seems likely to save lots of time talking past each other and getting mired in disagreements over non-cruxy points.

Comment by Adele Lopez (adele-lopez-1) on Contra Yudkowsky on AI Doom · 2023-04-24T06:09:32.074Z · LW · GW

That is true, and I concede that that weakens my point.

It still seems to be the case that you could get a ~35% efficiency increase by operating in e.g. Antarctica. I also have an intuition, which I'll need to think more about, that there are trade-offs with the Landauer limit that could yield substantial gains by separating things that are biologically constrained to be close together... similar to how a human with an air conditioner can thrive in much hotter environments (using more energy overall, but not energy that has to be in thermal contact with the brain via e.g. the same circulatory system).
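As a rough sanity check of the ~35% figure: the Landauer limit E = k_B · T · ln(2) scales linearly with temperature, so the gain from cooling is just the temperature ratio. The specific temperatures below are my own assumptions, not from the thread:

```python
# Landauer energy per erased bit scales linearly with temperature, so cooling
# from body temperature (~310 K) to an Antarctic winter (~230 K, my assumption)
# reduces the minimum energy per erased bit proportionally.
T_body, T_antarctica = 310.0, 230.0
efficiency_gain = T_body / T_antarctica - 1
print(f"{efficiency_gain:.0%}")  # ~35%
```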

Comment by Adele Lopez (adele-lopez-1) on Chu are you? · 2023-04-24T03:50:36.761Z · LW · GW


For the poset example, I'm using Chu spaces with only 2 colors. I'm also not thinking of the rows or columns of a Chu space as having an ordering (they're sets), you can rearrange them as you please and have a Chu space representing the same structure.

I would suggest reading through to the ## There and Back Again section and in particular while trying to understand how the other poset examples work, and see if that helps the idea click. And/or you can suggest another coloring you think should be possible, and I can tell you what it represents.

Comment by Adele Lopez (adele-lopez-1) on Adele Lopez's Shortform · 2023-04-24T02:48:37.326Z · LW · GW

[Public Draft v0.0] AGI: The Depth of Our Uncertainty

[The intent is for this to become a post making a solid case for why our ignorance about AGI implies near-certain doom, given our current level of capability:alignment efforts.]

[I tend to write lots of posts which never end up being published, so I'm trying a new thing where I will write a public draft which people can comment on, either to poke holes or contribute arguments/ideas. I'm hoping that having any engagement on it will strongly increase my motivation to follow through with this, so please comment even if just to say this seems cool!]

[Nothing I have planned so far is original; this will mostly be exposition of things that EY and others have said already. But it would be cool if thinking about this a lot gives me some new insights too!]

Entropy is Uncertainty

Given a model of the world, there are lots of possibilities that satisfy that model, over which our model implies a distribution.

There is a mathematically inevitable way to quantify the uncertainty latent in such a model, called entropy.
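Concretely, for a discrete distribution this is Shannon entropy, which is zero when the model is certain and maximal when all outcomes are equally likely:

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a discrete probability distribution."""
    return sum(-p * math.log2(p) for p in dist if p > 0)

# A model certain of the outcome has zero uncertainty...
print(entropy([1.0]))       # 0.0
# ...while maximal ignorance over 4 outcomes carries 2 bits of uncertainty.
print(entropy([0.25] * 4))  # 2.0
```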

A model is subjective in the sense that it is held by a particular observer, and thus entropy is subjective in this sense too. [Obvious to Bayesians, but worth spending time on as it seems to be a common sticking point]

This is in fact the same entropy that shows up in physics!

Engine Efficiency

But wait, that implies that temperature (defined from entropy) is subjective, which is crazy! After all, we can measure temperature with a thermometer. Or define it as the average kinetic energy of the particles (for a monatomic gas; in other cases you also need the potential energy from the bonds)! Those are both objective in the sense of not depending on the observer.

That is true, as those are slightly different notions of temperature. The objective measurement is the one important for determining whether something will burn your hand, and thus is the one which the colloquial sense of temperature tracks. But the definition via entropy is actually more useful, and it's more useful precisely because we can wring some extra advantage from the fact that it is subjective.

And that's because it is this notion of temperature which governs the use of an engine. Without the subjective definition, we merely get the law governing heat engines. As a simple intuition, consider that you happen to know that your heat source doesn't just have molecules moving randomly, but that they are predominantly moving back and forth along a particular axis at a specific frequency. A thermometer attached to this source may read the same temperature as one attached to an ordinary heat bath with the same amount of energy (mediated by phonon dissipation), and yet it would be simple to exceed the Carnot limit simply by using a non-heat engine which takes advantage of the vibrational mode!

Say that this vibrational mode was hidden or hard to notice. Then someone with the knowledge of it would be able to make a more effective engine, and therefore extract more work, than someone who hadn't noticed.
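For reference, the limit the ignorant engineer is stuck with is the ordinary Carnot bound, which depends only on the two reservoir temperatures (the numbers below are just illustrative):

```python
def carnot_efficiency(T_hot, T_cold):
    """Maximum fraction of heat convertible to work by any heat engine
    operating between two thermal reservoirs (temperatures in kelvin)."""
    return 1.0 - T_cold / T_hot

# A heat engine between 600 K and 300 K reservoirs can convert at most
# half of the heat it draws into work; the rest must be dumped as waste heat.
print(carnot_efficiency(600.0, 300.0))  # 0.5
```

Someone who notices the hidden vibrational mode isn't bound by this number, because their effective "temperature" for the source, computed from their lower-entropy model, is different.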

Another example is Maxwell's demon. In this case, the demon has less uncertainty over the state of the gas than someone at the macro-level, and is thereby able to extract more work from the same gas.

But perhaps the real power of this subjective notion of temperature comes from the fact that the Carnot limit still applies with it, but now generalized to any kind of engine! This means that there is a physical limit on how much work can be extracted from a system which directly depends on your uncertainty about the system!! [This argument needs to actually be fleshed out for this post to be convincing, I think...]
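One quantitative face of this limit is the Szilard-engine bound: each bit of information about a system at temperature T lets you extract at most kT ln 2 of work, mirroring the Landauer erasure cost. A minimal sketch, assuming a 300 K gas:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K

def max_work_from_bits(bits, T):
    """Szilard-engine bound: maximum work (J) extractable from a system at
    temperature T given `bits` of information about its microstate."""
    return bits * k_B * T * math.log(2)

# A Maxwell's-demon-style engine that knows one bit about a gas at 300 K
# (assumed) can extract at most about 2.87e-21 joules per cycle.
print(f"{max_work_from_bits(1, 300.0):.3g} J")
```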

The Work of Optimization

[Currently MUCH rougher than the above...]

Hopefully now, you can start to see the outlines of how it is knowable that

Try to let go of any intuitions about "minds" or "agents", and think about optimizers in a very mechanical way.

Physical work is about the energy necessary to change the configuration of matter.

Roughly, you can factor an optimizer into three parts: The Modeler, the Engine, and the Actuator. Additionally, there is the Environment the optimizer exists within and optimizes over. The Modeler models the optimizer's environment - decreasing uncertainty. The Engine uses this decreased uncertainty to extract more work from the environment. The Actuator focuses this work into certain kinds of configuration changes.

[There seems to be a duality between the Modeler and the Actuator which feels very important.]


Gas Heater

  • It is the implicit knowledge of the location, concentration, and chemical structure of a natural gas line that allows the conversion of the natural gas and the air in the room from a state of both being at the same low temperature to a state where the air is at a higher temperature and the gas has been burned.

-- How much work does it take to heat up a room?
-- How much uncertainty is there in the configuration state before and after combustion?
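For the first question, a back-of-envelope estimate (all numbers are assumed for illustration: a 50 m³ room, warming the air by 10 K, ignoring walls and leaks):

```python
# Rough heat needed to warm the air in a room (assumed illustrative numbers).
room_volume = 50.0  # m^3, e.g. a 4 m x 5 m x 2.5 m room (assumed)
air_density = 1.2   # kg/m^3, air at roughly room conditions
c_p_air = 1005.0    # J/(kg*K), specific heat of air at constant pressure
delta_T = 10.0      # K, desired temperature rise (assumed)

heat = room_volume * air_density * c_p_air * delta_T
print(f"{heat / 1e3:.0f} kJ")  # roughly 600 kJ, ignoring walls, furniture, leaks
```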

This brings us to an important point. A gas heater still works with no one around to be modeling it. So how is any of the subjective entropy stuff relevant? Well, from the perspective of no one, the room is simply in one of a plethora of possible states before, and in another of those possible states after, just like any other physical process anywhere. It is only because we find it somehow relevant that the room is hotter after than before that thermodynamics comes into play. The universe doesn't need thermodynamics to make atoms bounce around; we need it to understand, and even to recognize, the interesting difference.



Natural Selection

Chess Engine



Why Orthogonality?

[More high level sections to come]