Comments
Sorry, I think it's entirely possible that this is just me not knowing or understanding some of the background material, but where exactly does this diverge from justifying the AI pursuing a goal of maximizing the inclusive genetic fitness of its creators? That goal clearly either isn't what humans actually want (because of godshatter, there are things humans could do to have more descendants that no humans, including the specific ones who could take those actions, want to take) or is just circular (who knows what will maximize inclusive genetic fitness in an environment that is being created, in large part, by the decision of how to promote inclusive genetic fitness?). At some point, your writing started talking about "design goals", but I don't understand why tools / artifacts constructed by evolved creatures, which happen to increase the constructors' inclusive genetic fitness by means other than their design goals, wouldn't be favored by evolution, and thus be part of the "purpose" of the evolved creatures in constructing them. This doesn't seem like an "error" even in the limit of optimal pursuit of inclusive genetic fitness; it seems to be just what optimal pursuit of IGF would actually do. In other words, I don't want a very powerful human-constructed optimizer to pursue the maximization of human IGF, and I think hardly any other humans do either; but I don't understand in detail why your argument doesn't justify AI pursuit of maximizing human IGF, to the detriment of what humans actually value.
As the person who requested of MIRI to release the Sequences as paper books in the first place, I have asked MIRI to release the rest of them, and credibly promised to donate thousands of dollars if they did so. Given the current situation vis-a-vis AI, I'm not that surprised that it still does not appear to be a priority to them, although I am disappointed.
MIRI, if you see this, yet another vote for finishing the series! And my offer still stands!
Thank you for writing this. It has a lot of stuff I haven't seen before (I'm only really interested in neurology insofar as it's the substrate for literally everything I care about, but that's still plenty for "I'd rather have a clue than treat the whole area as spooky stuff that goes bump in the night").
As I understand it, you and many scientists are treating energy consumption by anatomical part of the brain (as proxied by blood flow) as the main way to see "what the brain is doing". It seems possible to me that there are other ways that specific thoughts could be kept compartmentalized, e.g. which neurotransmitters are active (although I guess this correlates pretty strongly to brain region anyway) or microtemporal properties of neural pulses; but the fact that we've found any kind of reasonably consistent relationship between [brain region consuming energy] and [mental state as reported or as predicted by the situation] means that brain region is a factor used for separating / modularizing cognition, if not that it's the only such part. So, I'll take brain region = mental module for granted for now and get to my actual question:
Do you know whether anyone has compiled data, across a wide variety of experiments or other data-gathering opportunities, of which brain regions have which kinds of correlations with one another? E.g. "these two tend to be active simultaneously", "this one tends to become active just after this one", etc.
I'm particularly interested in this for the brain regions you mention in this article, those related in various senses to good and/or to bad. If one puts both menthol and capsaicin in one's mouth at the same time, the menthol will stimulate cold receptors and the capsaicin will stimulate heat receptors, and one will have an experience out of range of what the sensors usually encounter: hot and cold, simultaneously in the same location. What I actually want to know is: are good and bad (or some forms of them, anyway) also represented in a way where one isn't actually the opposite of the other, neurologically speaking? If so, are there actual cases that are clearly best described as "good and bad", where to pick a single number instead would inevitably miss the intensity of the experience?
2 years and 2 days later, in your opinion, has what you predicted in your conclusion happened?
(I'm just a curious bystander; I have no idea if there are any camps regarding this issue, but if so, I'm not a member of any of them.)
might put lawyers out of business
This might be even worse than she thought. Many, many contracts include the exact opposite of this clause, i.e., that the section titles are without any effect whatsoever on the actual interpretation of the contract. I never noticed until just now that this is an instance of self-dealing on the part of the attorneys (typically) drafting the contracts! They're literally saying that if they make a drafting error, in a way that makes the contract harder to understand and use and is in no conceivable way an improvement to the contract, the courts need to assume "well, one common kind of drafting error is putting a clause in the wrong section, probably that's what happened here" only because the "clauses in wrong section are void" provisions you mentioned are as far as I know literally unheard of!
I was just reading about this, and apparently subvocalizing refers to small but physically detectable movement of the vocal cords. I don't know whether / how often I do this (I am not at all aware of it). But it is literally impossible for me to read (or write) without hearing the words in my inner ear, and I'm not dyslexic (my spelling is quite good and almost none of what's described in OP sounds familiar, so I doubt it's that I'm just undiagnosed). I thought this was more common than not, so I'm kind of shocked that the reacts on this comment's grandparent indicate only about 1/3 (of respondents to the "poll") subvocalize. The voice I hear is quite featureless, and I can read maybe 300 words per minute, which I think is actually faster than average, though needing to "hear" the words does impose an upper bound on reading speed.
Leaving an unaligned force (humans, here) in control of 0.001% of resources seems risky. There is a chance that you've underestimated how large the share of resources controlled by the unaligned force is, and probably more importantly, there is a chance that the unaligned force could use its tiny share of resources in some super-effective way that captures a much higher fraction of resources in the future. The actual effect on the economy of the unaligned force, other than the possibility of its being larger than thought or being used as a springboard to gain more control, seems negligible, so one should still expect full extermination unless there's some positive reason for the strong force to leave the weak force intact.
Humans do have such reasons in some cases (we like seeing animals, at least in zoos, and being able to study them, etc.; same thing for the Amish; plus we also at least sometimes place real value on the independence and self-determination of such beings and cultures), but there would need to be an argument made that AI will have such positive reasons (and a further argument why the AIs wouldn't just "put whatever humans they wanted to preserve" in "zoos", if one thinks that being in a zoo isn't a great future). Otherwise, exterminating humans would be trivially easy with that large of a power gap. Even if there are multiple ASIs that aren't fully aligned with one another, offense is probably easier than defense; if one AI perceives weak benefits to keeping humans around, but another AI perceives weak benefits to exterminating us, I'd assume we get exterminated and then the 2nd AI pays some trivial amount to the 1st for the inconvenience. Getting AI to strongly care about keeping humans around is, of course, one way to frame the alignment problem. I haven't seen an argument that this will happen by default or that we have any idea how to do it; this seems more like an attempt to say it isn't necessary.
Ah, okay, some of those seem to me like they'd change things quite a lot. In particular, a week's notice is usually possible for major plans (going out of town, a birthday or anniversary, concert that night only, etc.) and being able to skip books that don't interest one also removes a major class of reason not to go. The ones I can still see are (1) competing in-town plans, (2) illness or other personal emergency, and (3) just don't feel like going out tonight. (1) is what you're trying to avoid, of course. On (3) I can see your opinion going either way. It does legitimately happen sometimes that one is too tired for whatever plans one had to seem appealing, but it's legitimate to say that if that happens to you so often that you mind the cost of the extra rounds of drinks you end up buying, maybe you're not a great member for that club. (2) seems like a real problem, and I'm gonna guess that you actually wouldn't make people pay for drinks if they said they missed because they had COVID, there was a death in the family, etc.?
Reads like a ha ha only serious to me anyway.
I started a book club in February 2023 and since the beginning I pushed for the rule that if you don't come, you pay for everyone's drinks next time.
I'm very surprised that in that particular form that worked, because the extremely obvious way to postpone (or, in the end, avoid) the penalty is to not go next time either (or, in the end, ever again). I guess if there's agreement that pretty close to 100% attendance is the norm, as in if you can only show up 60% of the time don't bother showing up at all, then it could work. That would make sense for something like a D&D or other tabletop RPG session, or certain forms of competition like, I dunno, a table tennis league, where someone being absent even one time really does cause quite significant harm to the event. But it eliminates a chunk of the possible attendees entirely right from the start, and I imagine would make the members feel quite constrained by the club, particularly if it doesn't appear to be really required by the event itself. And those don't seem good for getting people to show up, either.
That's not to say the analogy overall doesn't work. I'd imagine requiring people to buy a ticket to go to poker night, with that ticket also covering the night's first ante / blind, does work to increase attendance, and for the reasons you state (and not just people being foolish about "sunk costs"). It's just payment of the penalty after the fact, and presumably with no real enforcement, that I don't get. And if you say it works for your book club, I guess probably it does and I'm wrong somehow. But in any case, I notice that I am confused.
I think this is a very important distinction. I prefer to use "maximizer" for "timelessly" finding the highest value of an objective function, and reserve "optimizer" for the kind of stepwise improvement discussed in this post. As I use the terms, to maximize something is to find the state with the highest value, but to optimize it is to take an initial state and find a new state with a higher value. I recognize that "optimize" and "optimizer" are sometimes used the way you're saying, as basically synonymous with "maximize" / "maximizer", and I could retreat to calling the inherently temporal thing I'm talking about an "improver" (or an "improvement process" if I don't want to reify it), but this actually seems less likely to be quickly understood, and I don't think it's all that useful for "optimize" and "maximize" to mean exactly the same thing.
(There is a subset of optimizers as I (and this post, although I think the value should be graded rather than binary) use the term that in the limit reach the maximum, and a subset of those that even reach the maximum in a finite number of steps, but optimizers that e.g. get stuck in local maxima aren't IMO thereby not actually optimizers, even though they aren't maximizers in any useful sense.)
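As a minimal sketch of the distinction I mean, with a toy objective function of my own (purely for illustration, not anything from the post):

```python
# Toy objective with a local peak at x=2 (value 0) and the global peak at x=8 (value 1).
def f(x):
    return -(x - 2) ** 2 if x < 5 else -(x - 8) ** 2 + 1

states = range(0, 11)

# "Maximizer": timelessly picks out the state with the highest value.
maximum = max(states, key=f)  # 8

# "Optimizer" (improver): repeatedly steps from the current state to a better
# neighboring state; can get stuck at a local maximum.
def improve(x):
    return max([x - 1, x, x + 1], key=lambda n: f(n) if n in states else float("-inf"))

x = 0
while improve(x) != x:
    x = improve(x)

print(maximum, x)  # 8 vs 2: the hill-climber isn't a maximizer in any useful sense,
                   # but it's still an optimizer as I use the term
```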
Good post; this has way more value per minute spent reading and understanding it than the first 6 chapters of Jaynes, IMO.
There were 20 destroyed walls and 37 intact walls, leading to 10 − 3×20 − 1×37 = 13db
This appears to have an error; 10 − 3×20 − 1×37 = 10 - 60 - 37 = -87, not 13. I think you meant for the 37 to be positive, in which case 10 - 60 + 37 = -13, and the sign is reversed because of how you phrased which hypothesis the evidence favors (although you could also just reverse all the signs if you want the arithmetic to come out perfectly).
Also, nitpick, but
and every 3 db of evidence increases the odds by a factor of 2
should have an "about" in it, since 10^(3/10) is ~1.99526231497, not 2. (3db ≈ 2× is a very useful approximation, and implied by 10^3 ≈ 2^10, but encountering it indirectly like this would be very confusing to anyone who isn't already familiar with it.)
I re-read this, and wanted to strong-upvote it, and was disappointed that I already had. This is REALLY good. Way better than the thing it parodies (which was already quite good). I wish it were 10x as long.
The way that LLM tokenization represents numbers is all kinds of stupid. It's honestly kind of amazing to me they don't make even more arithmetic errors. Of course, an LLM can use a calculator just fine, and this is an extremely obvious way to enhance its general intelligence. I believe "give the LLM a calculator" is in fact being used, in some cases, but either the LLM or some shell around it has to decide when to use the calculator and how to use the calculator's result. That apparently didn't happen or didn't work properly in this case.
Thanks for your reply. "70% confidence that... we have a shot" is slightly ambiguous - I'd say that most shots one has are missed, but I'm guessing that isn't what you meant, and that you instead meant 70% chance of success.
70% feels way too high to me, but I do find it quite plausible that calling it a rounding error is wrong. However, with a 20 year timeline, a lot of people I care about will almost definitely still die, who could have not died if death were Solved, which group with very much not negligible probability includes myself. And as you note downthread, the brain is a really deep problem with prosaic life extension. Overall I don't see how anything along these lines can be fast enough and certain enough to be a crux on AI for me, but I'm glad people are working on it more than is immediately apparent to the casual observer. (I'm a type 1 diabetic and would have died at 8 years old if I'd lived before insulin was discovered and made medically available, so the value of prosaic life extension is very much not lost on me.)
P.S. Having this set of values and beliefs is very hard on one's epistemics. I think it's a writ-large version of what Eliezer has stated as "thinking about AI timelines is bad for one's epistemics". Here are some examples:
(1) Although I've never been at all tempted by e/acc techno-optimism (on this topic specifically) / alignment isn't a problem at all / alignment by default, boy, it sure would be nice to hear about a strategy for alignment that didn't sound almost definitely doomed for one reason or another. Even though Eliezer can (accurately, IMO) shoot down a couple of new alignment strategies before getting out of bed in the morning. So far I've never found myself actually doing it, but it's impossible not to notice that if I just weren't as good at finding problems or as willing to acknowledge problems found by others, then some alignment strategies I've seen might have looked non-doomed, at least at first...
(2) I don't expect any kind of deliberate slowdown of making AGI to be all that effective even on its own terms, with the single exception of indiscriminate "tear it all down", which I think is unlikely to get within the Overton window, at least in a robust way that would stop development even in countries that don't agree (forcing someone to sabotage / invade / bomb them). Although such actions might buy us a few years, it seems overdetermined to me that they still leave us doomed, and in fact they appear to cut away some of the actually-helpful options that might otherwise be available (the current crop of companies attempting to develop AGI definitely aren't the least concerned with existential risk of all actors who'd develop AGI if they could, for one thing). Compute thresholds of any kind, in particular, I expect to lead to much greater focus on doing more with the same compute resources rather than doing more by using more compute resources, and I expect there's a lot of low-hanging fruit there since that isn't where people have been focusing, and that the thresholds would need to decrease very much very fast to actually prevent AGI, and decreasing the thresholds below the power of a 2023 gaming rig is untenable. I'm not aware of any place in this argument where I'm allowing "if deliberate slowdowns were effective on their own terms, I'd still consider the result very bad" to bias my judgment. But is it? I can't really prove it isn't...
(3) The "pivotal act" framing seems unhelpful to me. It seems strongly impossible to me for humans to make an AI that's able to pass strawberry alignment that has so little understanding of agency that it couldn't, if it wanted to, seize control of the world. (That kind of AI is probably logically possible, but I don't think humans have any real possibility of building one.) An AI that can't even pass strawberry alignment clearly can't be safely handed "melt all the GPUs" or any other task that requires strongly superhuman capabilities (and if "melt all the GPUs" were a good idea, and it didn't require strongly superhuman capabilities, then people should just directly do that). So, it seems to me that the only good result that could come from aiming for a pivotal act would be that the ASI you're using to execute it is actually aligned with humans and "goes rogue" to implement our glorious transhuman future; and it seems to me that if that's what you want, it would be better to aim for that directly rather than trying to fit it through this weirdly-shaped "pivotal act" hole.
But... if this is wrong, and a narrow AGI could safely do a pivotal act, I'd very likely consider the resulting world very bad anyway, because we'd be in a world where unaligned ASI has been reliably prevented from coming into existence, and if the way that was done wasn't by already having aligned ASI, then by far the obvious way for that to happen is to reliably prevent any ASI from coming into existence. But IMO we need aligned ASI to solve death. Does any of that affect how compelling I find the case for narrow pivotal-act AI on its own terms? Who knows...
I agree with the Statement. As strongly as I can agree with anything. I think the hope of current humans achieving... if not immortality, then very substantially increased longevity... without AI doing the work for us, is at most a rounding error. And ASI that was even close to aligned, that found it worth reserving even a billionth part of the value of the universe for humans, would treat this as the obvious most urgent problem and pretty much solve death if there's any physically possible way of doing so. And when I look inside, I find that I simply don't care about a glorious transhumanist future that doesn't include me or any of the particular other humans I care about. I do somewhat prefer being kind / helpful / beneficent to people I've never met, very slightly prefer that even for people who don't exist yet, but it's far too weak a preference to trade off against any noticeable change to the odds of me and everyone I care about dying. If that makes me a "sociopath" in the view of someone or other, oh well.
I've been a supporter of MIRI, AI alignment, etc. for a long time, not because I share that much with EA in terms of values, but because the path to the future having any value has seemed for a long time to route through our building aligned ASI, which I consider as hard as MIRI does. But when the "pivotal act" framing started being discussed, rather than actually aligning ASI, I noticed a crack developing between my values and MIRI's, and the past year with advocacy for "shut it all down" and so on has blown that crack wide open. I no longer feel like a future I value has any group trying to pursue it. Everyone outside of AI alignment is either just confused and flailing around with unpredictable effects, or is badly mistaken and actively pushing towards turning us all into paperclips. Those in AI alignment are either extremely unrealistically optimistic about plans that I'm pretty sure, for reasons that MIRI has argued, won't work; or, like current MIRI, they tell me I should stake my personal presence in the glorious transhumanist future on cryonics (and what of my friends and family members who I could never convince to sign up? What of the fact that, IMO, current cryonics practice probably doesn't even prevent info-theoretical death, let alone give one a good shot at actually being revived at some point in the future?).
I happen to also think that most plans for preventing ASI from happening soon, that aren't "shut it all down" in a very indiscriminate way, just won't work - that is, I think we'll get ASI (and probably all die) pretty soon anyway. And I think "shut it all down" is very unlikely to be societally selected as our plan for how to deal with AI in the near term, let alone effectively implemented. There are forms of certain actors choosing to go slower on their paths to ASI that I would support, but only if those actors are doing that specifically to attempt to solve alignment before ASI, and only if it won't slow them down so much that someone else just makes unaligned ASI first anyway. And of course we should forcibly stop anyone who is on the path to making ASI without even trying to align it (because they're mistaken about the default result of building ASI without aligning it, or because they think humanity's extinction is good actually), although I'm not sure how capable we are of stopping them. But I want an organization that is facing up to the real, tremendous difficulty of making the first ASI aligned, and trying to do that anyway, because no other option actually has a result that they (or I) find acceptable. (By the way, MIRI is right that "do your alignment homework for you" is probably the literal worst possible task to give to one's newly developed AGI, so e.g. OpenAI's alignment plan seems deeply delusional to me and thus OpenAI is not the org for which I'm looking.)
I'd like someone from MIRI to read this. If no one replies here, I may send them a copy, or something based on this.
Yes he should disclose somewhere that he's doing this, but deepfakes with the happy participation of the person whose voice is being faked seems like the best possible scenario.
Yes and no. The main mode of harm we generally imagine is to the person deepfaked. However, nothing prevents the main harm in a particular incident of harmful deepfaking from being to the people who see the deep fake and believe the person depicted actually said and did the things depicted.
That appears to be the implicit allegation here - that recipients might be deceived into thinking Adams actually speaks their language (at least well enough to record a robocall). Or at least, if that's not it, then I don't get it either.
I've seen a lot of attempts to provide "translations" from one domain-specific computer language to another, and they almost always have at least one of these properties:
(1) They aren't invertible, nor "almost invertible" via normalization
(2) They rely on an extension mechanism intentionally allowing the embedding of arbitrary data into the target language
(3) They use hacks (structured comments, or even uglier encodings if there aren't any comments) to embed arbitrary data
(4) They require the source of the translation to be normalized before (and sometimes also after, but always before) translation
(2) and (3) I don't think are super great here. If there are blobs of data in the translated version that I can't understand, but that are necessary for the original sender to interpret the statement, it isn't clear how I can manipulate the translated version while keeping all the blobs correct. Plus, as the recipient, I don't really want to be responsible for safely maintaining and manipulating these blobs.
(1) is clearly unworkable (if there's no way to translate back into the original language, there can't be a conversation). That leaves (4), which requires stripping anything that can't be represented in an invertible way before translating. E.g., if I have lists but you can only understand sets, and assuming no nesting, I may need to sort my list and remove duplicates from it as part of normalization. This deletes real information! It's information that the other language isn't prepared to handle, so it needs to be removed before sending. This is better than sending the information in a way that the other party won't preserve even when performing only operations they consider valid.
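A made-up concrete version of that lists-vs-sets example (names and details are mine, just for illustration):

```python
# My language has (ordered, possibly-duplicated) lists; yours only has sets.
# Normalization before translation: sort and deduplicate.
def normalize(xs):
    return sorted(set(xs))

def translate_to_set_language(xs):
    return set(normalize(xs))

original = [3, 1, 3, 2]
print(normalize(original))  # [1, 2, 3] -- ordering and duplicate information is gone
print(sorted(translate_to_set_language(original)) == normalize(original))  # True
# The round trip recovers the normalized list, but `original` itself is unrecoverable:
# that information had to be deleted before sending.
```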
I think this applies to the example from the post, too - how would I know whether certain instances of double negation or provability were artifacts that normalization is supposed to strip, or just places where someone wanted to make a statement about double negation or provability?
Malbolge? Or something even nastier in a similar vein, since it seems like people actually figured out (with great effort) how to write programs in Malbolge. Maybe encrypt all the memory after every instruction, and use a real encryption algorithm, not a lookup table.
Some points which I think support the plausibility of this scenario:
(1) EY's ideas about a "simple core of intelligence", how chimp brains don't seem to have major architectural differences from human brains, etc.
(2) RWKV vs Transformers. Why haven't Transformers been straight up replaced by RWKV at this point? Looks to me like potentially huge efficiency gains being basically ignored because lab researchers can get away with it. Granted, it affects efficiency of inference but not training AFAIK, and maybe it wouldn't work at the 100B+ scale, but it certainly looks like enough evidence to do the experiment.
(3) Why didn't researchers jump straight to the end on smaller and smaller floating point (or fixed point) precision? Okay, sure, "the hardware didn't support it" can explain some of it, but you could still do smaller scale experiments to show it appears to work and get support into the next generation of hardware (or at some point even custom hardware if the gains are huge enough) if you're serious about maximizing efficiency.
(4) I have a few more ideas for huge efficiency gains that I don't want to state publicly. Probably most of them wouldn't work. But the thing about huge efficiency gains is that if they do work, doing the experiments to find that out is (relatively) cheap, because of the huge efficiency gains. I'm not saying anyone should update on my claim to have such ideas, but if you understand modern ML, you can try to answer the question "what would you try if you wanted to drastically improve efficiency" and update on the answers you come up with. And there are probably better ideas than those, and almost certainly more such ideas. I end up mostly thinking lab researchers aren't trying because it's just not what they're being paid to do, and/or it isn't what interests them. Of course they are trying to improve efficiency, but they're looking for smaller improvements that are more likely to pan out, not massive improvements any given one of which probably won't work.
Anyway, I think a world in which you could even run GPT-4 quality inference (let alone training) on a current smartphone looks like a world where AI is soon going to determine the future more than humans do, if it hasn't already happened at that point... and I'm far from certain this is where compute limits (moderate ones, not crushingly tight ones that would restrict or ban a lot of already-deployed hardware) would lead, but it doesn't seem to me like this possibility is one that people advocating for compute limits have really considered, even if only to say why they find it very unlikely. (Well, I guess if you only care about buying a moderate amount of time, compute limits would probably do that even in this scenario, since researchers can't pivot on a dime to improving efficiency, and we're specifically talking about higher-hanging efficiency gains here.)
I certainly don't think labs will only try to improve algorithms if they can't scale compute! Rather, I think that the algorithmic improvements that will be found by researchers trying to figure out how to improve performance given twice as much compute as the last run won't be the same ones found by researchers trying to improve performance given no increase in compute.
One would actually expect the low hanging fruit in the compute-no-longer-growing regime to be specifically the techniques that don't scale, since after all, scaling well is an existing constraint that the compute-no-longer-growing regime removes. I'm not talking about those. I'm saying it seems reasonably likely to me that the current techniques producing state of the art results are very inefficient, and that a newfound focus on "how much can you do with N FLOPs, because that's all you're going to get for the foreseeable future" might give fundamentally more efficient techniques that turn out to scale better too.
It's certainly possible that with a compute limit, labs will just keep doing the same "boring" stuff they already "know" they can fit into that limit... it just seems to me like people in AI safety advocating for compute limits are overconfident in that. It seems to me that the strongest plausible version of this possibility should be addressed by anyone arguing in favor of compute limits. I currently weakly expect that compute limits would make things worse because of these considerations.
Slowing compute growth could lead to a greater focus on efficiency. Easy to find gains in efficiency will be found anyway, but harder to find gains in efficiency currently don't seem to me to be getting that much effort, relative to ways to derive some benefit from rapidly increasing amounts of compute.
If models on the capabilities frontier are currently not very efficient, because their creators are focused on getting any benefit at all from the most compute that is practically available to them now, restricting compute could trigger an existing "efficiency overhang". If (some of) the efficient techniques found are also scalable (which some and maybe most won't be, to be sure), then if larger amounts of compute do later become available, we could end up with greater capabilities at the time a certain amount of compute becomes available, relative to the world where available compute kept going up too smoothly to incentivize a focus on efficiency.
This seems reasonably likely to me. You seem to consider this negligibly likely. Why?
I can actually sort of write the elevator pitch myself. (If not, I probably wouldn't be interested.) If anything I say here is wrong, someone please correct me!
Non-realizability is the problem that none of the options a real-world Bayesian reasoner is considering is a perfect model of the world. (It actually information-theoretically can't be, if the reasoner is itself part of the world, since it would need a perfect self-model as part of its perfect world-model, which would mean it could take its own output as an input into its decision process, but then it could decide to do something else and boom, paradox.) One way to explain the sense in which the models of real-world reasoners are imperfect is that, rather than a knife-edge between bets they'll take and bets on which they'll take the other side, one might, say, be willing to take a bet that pays out 9:1 that it'll rain tomorrow, and a bet that pays out 1:3 if it doesn't rain tomorrow, but for anything in between, one wouldn't be willing to take either side of the bet. A lot of important properties of Bayesian reasoning depend on realizability, so this is a serious problem.
Infra-Bayesianism purports to solve this by replacing the single probability distribution maintained by an ideal Bayesian reasoner by a certain kind of set of probability distributions. As I understand it, this is done in a way that's "compatible with Bayesianism" in the sense that if there were only one probability distribution in your set, it would act like regular Bayesianism, but in general the thing that corresponds to a probability is instead the minimum of the probability across all the probability distributions in your set. This allows one to express things like "I'm at least 10% confident it'll rain tomorrow, and at least 75% confident it won't rain tomorrow, but if you ask me whether it's 15% or 20% likely to rain tomorrow, I just don't know."
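A toy sketch of that last point, as I understand it (my own construction, not anything from the IB papers):

```python
# Beliefs as a set of distributions over {rain, no rain}; report the minimum
# ("lower") probability of each event across the set.
credal_set = [
    {"rain": 0.10, "no rain": 0.90},
    {"rain": 0.25, "no rain": 0.75},
]

def lower_prob(event):
    return min(dist[event] for dist in credal_set)

print(lower_prob("rain"), lower_prob("no rain"))
# 0.1 0.75 -- "at least 10% confident it'll rain, at least 75% confident it won't",
# with no single number in between that I'd be willing to bet either side of.
```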
The case in which this seems most obviously useful to me is adversarial. Those offering bets should - if they're rational - be systematically better informed about the relevant topics. So I should (it seems to me) have a range of probabilities within which the fact that you're offering the bet is effectively telling me that you appear to be better informed than I am, and therefore I shouldn't bet. However, I believe Infra-Bayesianism is intended to more generally allow agents to just not have opinions about every possible question they could be asked, but only those about which they actually have some relevant information.
Let's say that I can understand neither the original IB sequence, nor your distillation. I don't have the prerequisites. (I mean, I know some linear algebra - that's hard to avoid - but I find topology loses me past "here's what an open set is" and I know nothing about measure theory.)
I think I understand what non-realizability is and why something like IB would solve it. Is all the heavy math actually necessary to understand how IB does so? I'm very tempted to think of IB as "instead of a single probability distribution over outcomes, you just keep a (convex[1]) set of probability distributions instead, and eliminate any that you see to be impossible, and choose according to the minimum of the expected value of the ones you have left". But I think this is wrong, just like "a quantum computer checks all the possible answers in parallel" is wrong (if that were right, a classical algorithm in P would directly translate into a quantum algorithm in NP, right? I still don't actually get quantum computation, either.) And I don't know why it's wrong or what it's missing.
[1] That just means that for any $p$ and $q$ in the set, and any $\lambda \in [0,1]$, $\lambda p + (1-\lambda)q$ is also in the set, right?
Is there anything better I can do to understand IB than first learn topology and measure theory (or other similarly sized fields) in a fully general way? And am I the only person who's repeatedly bounced off attempts to present IB, but for some reason still feels like maybe there's actually something there worth understanding?
I was wondering if anyone would mention that story in the comments. I definitely agree that it has very strong similarities in its core idea, and wondered if that was deliberate. I don't agree with any implications (which you may or may not have intended) that it's so derivative as to make not mentioning Omelas dishonest, though, and independent invention seems completely plausible to me.
Edited to add: although the similar title does incline rather strongly to Omelas being an acknowledged source.
It seems like there might be a problem with this argument if the true parameters are not just unknown, but adversarially chosen. For example, suppose the true parameters are the actual locations of a bunch of landmines, drawn from a full set of possible landmine positions. We are trying to get a vehicle from A to B, and all possible paths go over some of the possible positions. We may know that the opponent placing the landmines only has a limited number of landmines to place. Furthermore, suppose each landmine only goes off with some probability even if the vehicle drives over it. If we can mechanistically predict where the opponent placed the landmines, or even mechanistically derive a probability distribution over the landmine placements, this is no problem: we can just use that to minimize the expected probability of driving over a landmine that goes off. However, suppose we can't predict the opponent that way, but we do know the opponent is trying to maximize the probability that the vehicle drives over a landmine that isn't a dud. It seems like we need to use game theory here, not just probability theory, to figure out what mixed strategy the opponent would be using to maximize the probability that we drive over a landmine, and then use that game-theoretic strategy to choose a mixed strategy for which path to take. It seems like the game theory here involves a step where we look for the worst (according to our utility function) probability distribution over where the landmines are placed, because this is how the opponent will have actually chosen where to put the landmines. Doesn't this look a lot like using the worst case over a set of distributions, rather than the expectation under a single distribution, as our utility function?
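To make that worst-case step concrete, here's a toy numerical version (two paths, two candidate mine sites, a 0.6 detonation probability; all the numbers are made up):

```python
import numpy as np

# hit[i][j]: probability the vehicle is destroyed if we take path i and the mine
# sits at site j; each path crosses exactly one site, dud-adjusted to 0.6.
hit = 0.6 * np.array([[1.0, 0.0],
                      [0.0, 1.0]])

# Choose our mixed strategy over paths to minimize the worst case over placements.
best_q, best_worst = None, None
for q in np.linspace(0, 1, 1001):
    destroyed_by_site = hit.T @ np.array([q, 1 - q])  # destruction prob per placement
    worst = destroyed_by_site.max()                   # opponent picks the worst site for us
    if best_worst is None or worst < best_worst:
        best_q, best_worst = q, worst

print(best_q, best_worst)  # ~0.5, 0.3: randomize 50/50; the inner max over placements
                           # is exactly the "worst distribution" step described above.
```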
I like this frame, and I don't recall seeing it already addressed.
What I have seen written about deceptiveness generally seems to assume that the AGI would be sufficiently capable of obfuscating its thoughts from direct queries and from any interpretability tools we have available that it could effectively make its plans for world domination in secret, unobserved by humans. That does seem like an even more effective strategy for optimizing its actual utility function than not bothering to think through such plans at all, if it's able to do it. But it's hard to do, and even thinking about it is risky.
I can imagine something like what you describe happening as a middle stage, for entities that are agentic enough to have (latent, probably misaligned since alignment is probably hard) goals, but not yet capable enough to think hard about how to optimize for them without being detected. It seems more likely if (1) almost all sufficiently powerful AI systems created by humans will actually have misaligned goals, (2) AIs are optimized very hard against having visibly misaligned cognition (selection of which AIs to keep being a form of optimization, in this context), and (3) our techniques for making misaligned cognition visible are more reliably able to detect an active process / subsystem doing planning towards goals than the mere latent presence of such goals. (3) seems likely, at least for a while and assuming we have any meaningful interpretability tools at all; it's hard for me to imagine a detector of latent properties that doesn't just always say "well, there are some off-distribution inputs that would make it do something very bad" for every sufficiently powerful AI, even one that was aligned-in-practice because those inputs would reliably never be given to it.
Hmm. My intuition says that your A and B are "pretty much the same size". Sure, there are infinitely many times that they switch places, but they do so about as regularly as possible and they're always close.
If A is "numbers with an odd number of digits" and B is "numbers with an even number of digits" that intuition starts to break down, though. Not only do they switch places infinitely often, but the extent to which one exceeds the other is unbounded. Calling A and B "pretty much the same size" starts to seem untenable; it feels more like "the concept of being bigger or smaller or the same size doesn't properly apply to the pair of A and B". (Even though A and B are well defined, not THAT hard to imagine, and mathematicians will still say they're the same size!)
If A is "numbers whose number of digits is a multiple of 10", and B is all the other (positive whole) numbers, then... I start to intuitively feel like B is bigger again??? I think this is probably just my intuition not being able to pay attention to all the parts of the question at the same time, and thus substituting "are there more multiples of 10 or non-multiples", which then works the way you said.
I think this comment demonstrates that the list of reacts should wrap, not extend arbitrarily far to the right.
The obvious way to quickly and intuitively illustrate whether reactions are positive or negative would seem to be color; another option would be grouping them horizontally or vertically with some kind of separator. The obvious way to quickly and intuitively make it visible which reactions were had by more readers would seem to be showing a copy of the same icon for each person who reacted a certain way, not a number next to the icon.
I make no claim that either of these changes would be improvements overall. Clearly the second would require a way to handle large numbers of reactions to the same comment. The icons could get larger or smaller depending on number of that reaction, but small icons would get hard to recognize. Falling back to numbers isn't great either, since it's exactly in the cases where that fallback would happen that the number of a particular reaction has become overwhelmingly high.
I think it matters that there are a lot of different reactions possible compared to, say, Facebook, and at the same time, unlike many systems with lots of different reactions, they aren't (standard Unicode) emoji, so you don't get to just transfer existing knowledge of what they mean. And they have important semantic (rather than just emotive) content, so it actually matters if one can quickly tell what they mean. And they partially but not totally overlap with karma and agreement karma; it seems a bit inelegant and crowded to have both, but there are benefits that are hard to achieve with only one. It's a difficult problem.
In the current UI, the list of reactions from which to choose is scrollable, but that's basically impossible to actually see. While reading the comments I was wondering what the heck people were talking about with "Strawman" and so forth. (Like... did that already get removed?) Then I discovered the scrolling by accident after seeing a "Shrug" reaction to one of the comments.
I've had similar thoughts. Two counterpoints:
- This is basically misuse risk, which is not a weird problem that people need to be convinced even needs solving. To the extent AI appears likely to be powerful, society at large is already working on this. Of course, its efforts may be ineffective or even counterproductive.
- They say power corrupts, but I'd say power opens up space to do what you were already inclined to do without constraints. Some billionaires, e.g. Bill Gates, seem to be sincerely trying to use their resources to help people. It isn't hard for me to imagine that many people, if given power beyond what they can imagine, would attempt to use it to do helpful / altruistic things (at least, things they themselves considered helpful / altruistic).
I don't in any sense think either of these are knockdowns, and I'm still pretty concerned about how controllable AI systems (whether that's because they're aligned, or just too weak and/or insufficiently agentic) may be used.
On SBF, I think a large part of the issue is that he was working in an industry called cryptocurrency that basically has fraud as the bedrock of it all. There was nothing real about crypto, so the collapse of FTX was basically inevitable.
I don't deny that the cryptocurrency "industry" has been a huge magnet for fraud, nor that there are structural reasons for that, but "there was nothing real about crypto" is plainly false. The desire to have currencies that can't easily be controlled, manipulated, or implicitly taxed (seigniorage, inflation) by governments or other centralized organizations and that can be transferred without physical presence is real. So is the desire for self-executing contracts. One might believe those to be harmful abilities that humanity would be better off without, but not that they're just nothing.
Thank you for writing these! They've been practically my only source of "news" for most of the time you've been writing them, and before that I mostly just ignored "news" entirely because I found it too toxic and it was too difficult+distasteful to attempt to decode it into something useful. COVID the disease hasn't directly had a huge effect on my life, and COVID the social phenomenon has been on a significant decline for some time now, but your writing about it (and the inclusion of especially notable non-COVID topics) have easily kept me interested enough to keep reading. Please consider continuing some kind of post on a weekly cadence. I think it's a really good frequency to never lose touch but also not be too burdensome (to the reader or the writer).
I found it to be a pretty obvious reference to the title. SPAM is a meatcube. A meatcube is something that has been processed into uniformity. Any detectable character it had, whether faults, individuality, or flashes of brilliance, has been ground, blended, and seasoned away.
I don't know how far a model trained explicitly on only terminal output could go, but it makes sense that it might be a lot farther than a model trained on all the text on the internet (some small fraction of which happens to be terminal output). Although I also would have thought GPT's architecture, with a fixed context window and a fixed number of layers and tokenization that isn't at all optimized for the task, would pay large efficiency penalties at terminal emulation and would be far less impressive at it than it is at other tasks.
Assuming it does work, could we get a self-operating terminal by training another GPT to roleplay the entering commands part? Probably. I'm not sure we should though...
Sure, I understood that's what was being claimed. Roleplaying a Linux VM without error seemed extremely demanding relative to other things I knew LLMs could do, such that it was hard for me not to question whether the whole thing was just made up.
Thanks! This is much more what I expected. Things that look generally like outputs that commands might produce, with some mind-blowing correct outputs (e.g. the effect of tr on the source code) but also some wrong outputs (e.g. the section after echo A >a; echo X >b; echo T >c; echo H >d; the output being consistent between "cat a a c b d d" and "cat a a c b d d | sort" (but inconsistent with the "actual contents" of the files) is especially the kind of error I'd expect an LLM to make).
That works too!
Got it. This post also doesn't appear to actually be part of that sequence though? I would have noticed if it was and looked at the sequence page.
EDIT: Oh, I guess it's not your sequence.
EDIT2: If you just included "Alignment Stream of Thought" as part of the link text in your intro where you do already link to the sequence, that would work.
ASoT
What do you mean by this acronym? I'm not aware of its being in use on LW, you don't define it, and to me it very definitely (capitalization and all) means Armin van Buuren's weekly radio show A State of Trance.
Counterpoint #2a: A misaligned AGI whose capabilities are high enough to use our safety plans against us will succeed with an equal probability (e.g., close to 100%), if necessary by accessing these plans whether or not they were posted to the Internet.
If only relative frequency of genes matters, then the overall size of the gene pool doesn't matter. If the overall size of the gene pool doesn't matter, then it doesn't matter if that size is zero. If the size of the gene pool is zero, then whatever was included in that gene pool is extinct.
Yes, it's true people make all kinds of incorrect inferences because they think genes that increase the size of the gene pool will be selected for or those that decrease it will be selected against. But it's still also true that a gene that reduces the size of the pool it's in to zero will no longer be found in any living organisms, regardless of what its relative frequency was in the process of the pool reaching a size of zero. If the term IGF doesn't include that, that just means IGF isn't a complete way of accounting for what organisms we observe to exist in what frequencies and how those change over time.
I mean, just lag, yes, but there are also plain old incorrect readings. But yes, it would be cool to have a system that incorporated glucagon. Though, diabetics' bodies still produce glucagon AFAIK, so it'd really be better to just have something that senses glucose and releases insulin the same way a working pancreas would.
Context: I am a type 1 diabetic. I have a CGM, but for various reasons use multiple daily injections rather than an insulin pump; however, I'm familiar with how insulin pumps work.
A major problem with a closed-loop CGM-pump system is data quality from the CGM. My CGM (Dexcom G6) has ~15 minutes of lag (because it reads interstitial fluid, not blood). This is the first generation of Dexcom that doesn't require calibrations from fingersticks, but I've occasionally had CGM readings that felt way off and needed to calibrate anyway. Accuracy and noisiness vary from sensor to sensor (they last 10 days, officially; people have figured out how to "restart" them, but I've found often the noisiness goes up towards the end of the 10 days anyway), probably due to placement. It also only produces a reading every 5 minutes, probably partly to save battery but maybe also because more than that would be false precision anyway. And low blood sugar can be lethal rather quickly (by killing neurons, or by messing up neural function enough that you get into a car accident if you're driving), so these issues mean caution is needed when using CGM readings to choose insulin dosing.
I'd think of connecting that to an insulin pump using a control system as more similar to Tesla Autopilot than to a Level 5 autonomous car. It's sort of in the uncanny valley where the way it works is tempting you to just ignore it, but you actually can't. I certainly don't mean that these problems are impossible to overcome, and in fact "hybrid" closed loop systems, which still require manual intervention from time to time, are starting to become commercially available, and there are also DIY systems. (Type 1 diabetics vary in how much they geek out about managing it; I think I'm somewhere in the middle in terms of absolute geekiness, meaning, I would guess, 95th+ percentile relative to the relevant population.) But I think there are pretty strong reasons people don't look at "well, just connect a 2022 off-the-shelf CGM, 2022 off-the-shelf insulin pump, and some software" as a viable fully closed loop for managing blood sugar for type 1 diabetics.
We'll build the most powerful AI we think we can control. Nothing prevents us from ever getting that wrong. If building one car with brakes that don't work made everyone in the world die in a traffic accident, everyone in the world would be dead.
How much did that setup cost? I'm curious about similar use cases.
The best way to actually schedule or predict a project is to break it down into as many small component tasks as possible, identify dependencies between those tasks, and produce most likely, optimistic, and pessimistic estimates for each task, and then run a simulation for chain of dependencies to see what the expected project completion looks like. Use a Gantt chart. This is a boring answer because it's the "learn project management" answer, and people will hate on it because gesture vaguely to all of the projects that overrun their schedule. There are many interesting reasons for why that happens and why I don't think it's a massive failure of rationality, but I'm not sure this comment is a good place to go into detail on that. The quick answer is that comical overrun of a schedule has less to do with an inability to create correct schedules from an engineering / evidence-based perspective, and much more to do with a bureaucratic or organizational refusal to accept an evidence-based schedule when a totally false but politically palatable "optimistic" schedule is preferred.
I definitely agree that this is the way to get the most accurate prediction practically possible, and that organizational dysfunction often means this isn't used, even when the organization would be better able to achieve its goals with an accurate prediction. But I also think that depending on the type of project, producing an accurate Gantt chart may take a substantial fraction of the effort (or even a substantial fraction of the wall-clock time) of finishing the entire project, or may not even be possible without already having some of the outputs of the processes earlier in the chart. These aren't necessarily possible to eradicate, so the take-away, I think, is not to be overly optimistic about the possibility of getting accurate schedules, even when there are no ill intentions and all known techniques to make more accurate schedules are used.
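For what it's worth, the simulation step itself is cheap; a minimal sketch with made-up tasks, dependencies, and durations:

```python
import random

# Each task: (optimistic, most likely, pessimistic) duration in days, plus dependencies.
tasks = {
    "design": ((2, 4, 10), []),
    "build":  ((5, 8, 20), ["design"]),
    "docs":   ((1, 2, 6),  ["design"]),
    "test":   ((1, 3, 9),  ["build"]),
}

def simulate_once():
    finish = {}
    for name in ["design", "build", "docs", "test"]:  # already in dependency order
        (opt, likely, pess), deps = tasks[name]
        start = max((finish[d] for d in deps), default=0.0)
        finish[name] = start + random.triangular(opt, pess, likely)
    return max(finish.values())

runs = sorted(simulate_once() for _ in range(10_000))
print(runs[len(runs) // 2], runs[int(0.9 * len(runs))])  # median and P90 completion
```

The expensive parts are the decomposition, the dependency mapping, and honest per-task estimates, which is where the caveats above bite.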
In other words, asking people for a best guess or an optimistic prediction results in a biased prediction that is almost always earlier than a real delivery date. On the other hand, while the pessimistic question is not more accurate (it has the same absolute error margins), it is unbiased. The reality is that the study says that people asked for a pessimistic question were equally likely to over-estimate their deadline as they were to under-estimate it. If you don't think a question that gives you a distribution centered on the right answer is useful, I'm not sure what to tell you.
It's interesting that the median of the pessimistic expectations is about equal to the median of the actual results. The mean clearly wasn't, as that discrepancy was literally the point of citing this statistic in the OP:
in a classic experiment, 37 psychology students were asked to estimate how long it would take them to finish their senior theses “if everything went as poorly as it possibly could,” and they still underestimated the time it would take, as a group (the average prediction was 48.6 days, and the average actual completion time was 55.5 days).
So the estimates were biased, but not median-biased (at least that's what Wikipedia appears to say the terminology is). Less biased than other estimates, though. Of course this assumes we're taking the answer to "how long would it take if everything went as poorly as it possibly could" and interpreting it as the answer to "how long will it actually take", and if students were actually asked after the fact if everything went as poorly as it possibly could, I predict they would mostly say no. And treating the text "if everything went as poorly as it possibly could" as if it wasn't even there is clearly wrong too, because they gave a different (more biased towards optimism) answer if it was omitted.
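To illustrate that distinction with made-up numbers:

```python
import statistics

# Errors (actual minus predicted, in days): half the projects finish early, half late,
# but the overruns are much larger than the underruns -- the usual skewed tail.
errors = [-5, -3, -1, 1, 4, 30]
print(statistics.median(errors))  # 0.0  -> median-unbiased
print(statistics.mean(errors))    # ~4.3 -> still biased toward underestimation on average
```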
This specific question seems kind of hard to make use of from a first-person perspective. But I guess maybe as a third party one could ask for worst-possible estimates and then treat them as median-unbiased estimators of what will actually happen? Though I also don't know if the median-unbiasedness is a happy accident. (It's not just a happy accident, there's something there, but I don't know whether it would generalize to non-academic projects, projects executed by 3rd parties rather than oneself, money rather than time estimates, etc.)
I do still also think there's a question of how motivated the students were to give accurate answers, although I'm not claiming that if properly motivated they would re-invent Murphyjitsu / the pre-mortem / etc. from whole cloth; they'd probably still need to already know about some technique like that and believe it could help get more accurate answers. But even if a technique like that is an available action, it sounds like a lot of work, only worth doing if the output has a lot of value (e.g. if one suspects a substantial chance of not finishing the thesis before it's due, one might wish to figure out why so one could actively address some of the reasons).