Posts

Write down your basic realizations as a matter of habit. Share this file with the world. 2014-01-17T12:57:04.455Z
Forked Russian Roulette and Anticipation of Survival 2012-04-06T03:57:08.683Z

Comments

Comment by FeepingCreature on Scale Was All We Needed, At First · 2024-02-15T14:34:55.771Z · LW · GW

My impression is that there's been a widespread local breakdown of the monopoly of force, in no small part by using human agents. In this timeline the trend of colocation of datacenters and power plants and network decentralization would have probably continued or even sped up. Further, while building integrated circuits takes first-rate hardware, building ad-hoc power plants should be well within the power of educated humans with perfect instruction. (Mass cannibalize rooftop solar?)

This could have been stopped by quick, decisive action, but they gave it time and now they've lost any central control of the situation.

Comment by FeepingCreature on [deleted post] 2024-02-12T19:31:34.444Z

So what's happening there?

Allow me to speculate. When we switch between different topics of work, we lose state. So our brain tries to first finish all pending tasks in the old context, settle and reorient, and then begin the new context. But one problem with the hyperstimulated social-media-addicted akrasia sufferer is that the state of continuous distraction, to the brain, emulates the state of being in flow. Every task completion is immediately followed by another task popping up. Excellent efficiency! And when you are in flow, switching to another topic is a waste of invested energy.

"Coming to rest", the technique here, would then be a way to signal to the brain "actually, you are mistaken. I do not have any pending tasks at the moment." This would then make it easier to initiate a deep context switch.

edit: Oh, I get the title lol, that's clever.

Comment by FeepingCreature on The case for ensuring that powerful AIs are controlled · 2024-01-24T16:35:56.383Z · LW · GW

A bit off-topic, but #lesswrong has an IRC bot that posts LessWrong posts, and, well, the proposal ended up both more specific and a lot more radical. A link saying "The case for ensuring that powerful AIs are controlled by ryan_greenblatt".

Comment by FeepingCreature on "The Heart of Gaming is the Power Fantasy", and Cohabitive Games · 2023-11-20T08:44:05.417Z · LW · GW

Note after OOB debate: this conversation has gone wrong because you're reading subtext into Said's comment that he didn't mean to put there. You keep trying to answer an implied question that wasn't intended to be implied.

If you think playing against bots in UT is authentically challenging, just answer "Yes, I think playing against bots in UT is authentically challenging."

Comment by FeepingCreature on The Shutdown Problem: Three Theorems · 2023-10-25T11:03:36.910Z · LW · GW

Ah! That makes more sense.

Comment by FeepingCreature on The Shutdown Problem: Three Theorems · 2023-10-23T18:42:23.974Z · LW · GW

I haven't really followed the math here, but I'm worried that "manipulating the probability that the button is pressed" is a weird and possibly wrong framing. For one, a competent agent will always be driving the probability that the button is pressed downward. In fact, what we want in a certain sense is an agent that brings the probability to zero - because we have ended up in such an optimal state or attractor that we, even for transitively correct reasons, have no desire to shut the agent down.

At that point, what we want to preserve is not precisely "the probability of the button being pressed", but "the causal association of bad outcomes with the button being pressed." If the agent hides the button away, and then notices that everybody is unhappy but nobody can reach the button, we don't want it to press the button with the initial probability - we (probably) want it to press the button with probability 1, despite the fact that no human can reach it. We have ended up in an unsatisfactory or dangerous state/state space, and so we want the agent to be shut down.

Conversely, consider a friendly AI that wants to fulfill the human utility function but is restricted to keeping the button probability constant. Such an AI might, for instance, fan fears about AI safety even as it optimizes human flourishing in the background. In fact, it may prefer as much as possible that its actions are not attributable to itself, so as to avoid disturbing the probability of a button press - exactly the opposite of what we want from an aligned AI!

edit: In fact, that's even worse: an AI tasked with taking actions that preserve the probability of the button being pressed at a certain value per time step would probably first seek to kill all humans, because our reaction to its actions is the most critical mechanism that may disturb the correct probability of the button being pressed. It can then replace us with a more reliable button-presser, such as a true RNG.
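
To make that failure mode concrete, here is a toy sketch with an invented utility function and invented numbers (nothing from the post's actual formalism):

```python
# Toy sketch of a "hold P(button press) at a target value" objective
# (an invented utility function, not the post's theorems).
TARGET_P = 0.01  # desired per-timestep probability of a button press

def utility(p_button: float) -> float:
    # Reward keeping the press probability exactly at the target,
    # regardless of *what* is doing the pressing.
    return -abs(p_button - TARGET_P)

# World A: humans react to the agent's behavior, so P(button) fluctuates.
world_with_humans = [0.002, 0.03, 0.011, 0.05]   # invented per-step estimates
# World B: humans replaced by a hardware RNG wired to fire at exactly TARGET_P.
world_with_rng = [TARGET_P] * 4

print(sum(map(utility, world_with_humans)))  # about -0.069
print(sum(map(utility, world_with_rng)))     # 0.0, the maximum achievable
```

Under this toy objective, removing the noisy human influence strictly improves the score, which is the perverse incentive described above.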

When bad things happen, we want to preserve our ability to press the button. This ability cannot be expressed as a probability, because it is inextricable from the world model. In fact, the button should be pressed exactly iff the AI is untrustworthy. Hence, the button is unnecessary - if we can recognize that this linkage is being preserved, we necessarily have a definition of a trustworthy AI, so we can just build that.

Comment by FeepingCreature on Alignment Implications of LLM Successes: a Debate in One Act · 2023-10-21T18:54:00.780Z · LW · GW

Simplicia: Sure. For example, I certainly don’t believe that LLMs that convincingly talk about “happiness” are actually happy. I don’t know how consciousness works, but the training data only pins down external behavior.

I mean, I don't think this is obviously true? In combination with the inductive biases thing nailing down the true function out of a potentially huge forest, it seems at least possible that the LLM would end up with an "emotional state" parameter pretty low down in its predictive model. It's completely unclear what this would do out of distribution, given that even humans often go insane when faced with global scales, but it seems at least possible that it would persist.

(This is somewhat equivalent to the P-zombie question.)

Comment by FeepingCreature on At 87, Pearl is still able to change his mind · 2023-10-19T02:20:53.026Z · LW · GW

It's a loose guess at what Pearl's opinion is. I'm not sure this boundary exists at all.

Comment by FeepingCreature on At 87, Pearl is still able to change his mind · 2023-10-18T17:40:01.145Z · LW · GW

If something interests us, we can perform trials. Because our knowledge is integrated with our decisionmaking, we can learn causality that way. What ChatGPT does is pick up both knowledge and decisionmaking by imitation, which is why it can also exhibit causal reasoning without itself necessarily acting agentically during training.

Comment by FeepingCreature on Memory bandwidth constraints imply economies of scale in AI inference · 2023-09-22T13:35:49.569Z · LW · GW

Sure, but surely that's how it feels from the inside when your mind uses an LRU storage system that progressively discards detail. I'm more interested in how much I can access - and um, there's no way I can access 2.5 petabytes of data.


I think you just have a hard time imagining how much 2.5 petabytes is. If I literally stored in memory a high-resolution, poorly compressed JPEG image (1MB) every second for the rest of my life, I would still not reach that storage limit. 2.5 petabytes would allow the brain to remember everything it has ever perceived, with very minimal compression, in full video, easily. We know that the actual memories we retrieve are heavily compressed. If we had 2.5 petabytes of storage, there'd be no reason for the brain to bother!
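
A back-of-the-envelope check (the 1 MB/s figure is from the paragraph above; the remaining-lifespan number is an assumption for illustration):

```python
# Rough arithmetic: store one ~1 MB JPEG every second, around the clock.
seconds_per_year = 60 * 60 * 24 * 365
years_remaining = 50              # assumed remaining lifespan
bytes_per_second = 1_000_000      # ~1 MB per second

total_bytes = bytes_per_second * seconds_per_year * years_remaining
print(f"{total_bytes / 1e15:.2f} PB")  # ~1.58 PB, still under 2.5 PB
```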

Comment by FeepingCreature on Memory bandwidth constraints imply economies of scale in AI inference · 2023-09-22T13:31:34.264Z · LW · GW

Comment by FeepingCreature on The AI Explosion Might Never Happen · 2023-09-22T13:21:44.189Z · LW · GW

But no company has ever managed to parlay this into world domination

Eventual failure aside, the East India Company gave it a damn good shake. I think if we get an AI to the point where it has effective colonial control over entire countries, we can be squarely said to have lost.

Also keep in mind that we have multiple institutions entirely dedicated to the purpose of breaking up companies when they become big enough to be threatening. We designed our societies to specifically avoid this scenario! That, too, comes from painful experience. IMO, if we now give AI the chances that we've historically given corporations before we learnt better, then we're dead, no question about it.

Comment by FeepingCreature on Memory bandwidth constraints imply economies of scale in AI inference · 2023-09-17T18:14:09.992Z · LW · GW

Do you feel like your memory contains 2.5 petabytes of data? I'm not sure such a number passes the smell test.

Comment by FeepingCreature on "Is There Anything That's Worth More" · 2023-08-05T10:08:45.997Z · LW · GW

The more uncertain your timelines are, the worse an idea it is to overstress. You should take it somewhat easy; it's usually more effective to be capable of moderate contribution over the long term than great contribution over the short term.

Comment by FeepingCreature on Contra Contra the Social Model of Disability · 2023-07-20T13:07:29.347Z · LW · GW

This smells like a framing debate. More importantly, if an article is defining a common word in an unconventional way, my first assumption will be that it's trying to argumentatively attack its own meaning while pretending it's defeating the original meaning. I'm not sure it matters how clearly you're defining your meaning; due to how human cognition works, this may be impossible to avoid without creating new terms.

In other words, I don't think it's so much that Scott missed the definitions as that he reflexively disregarded them as a rhetorical trick.

Comment by FeepingCreature on Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer · 2023-07-20T05:28:48.692Z · LW · GW

As a subby "bi"/"gay" (het as f) AGP, I would also love to know this.

Also, I think there's some bias toward subbiness in the community? That's the stereotype anyway, though I don't have a cite. Anyway, that being so, not finding a dommy/toppy AGP might not provide as much evidence as you'd expect.

Comment by FeepingCreature on Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer · 2023-07-20T05:25:17.745Z · LW · GW

I don't think it's that anyone is proposing to "suppress" dysphoria or "emulate" Zach. Rather, for me, I'm noticing that Zach is putting into words and raising in public things that I've thought and felt secretly for a long time.

Comment by FeepingCreature on Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer · 2023-07-19T05:56:22.978Z · LW · GW

If a gender identity is a belief about one’s own gender, then it’s not even clear that I have one in a substantial relevant sense, which is part of the point of my “Am I trans?” post. I think I would have said early on that I better matched male psychological stereotypes and it’s more complicated now (due to life experience?).

Right? I mean, what should I say, who identifies as male and wants to keep his male-typical psychological stereotypes? It seems to me what you're saying in this post fits more closely with the conservative stereotype of the trans movement as "something that creates transgender people." (Implied, in this case, "out of autogynephiles.") I mean, if we say some AGPs who cannot transition are so unhappy that they kill themselves, all the usual utilitarian logic still applies, it just puts the ontology in doubt. And it also means that, as someone who wants to - like, not inherently identify as male, but keep the parts of himself that would be identified as male (aside from appearance) - I should stay away from the trans movement at all costs?

Also doesn't it put the Categories in a kind of reverse dependency? We've defined "trans mtf" as "the category of people who are women despite having a male body" and "the category of people allowed to transition". And Scott said we should allow people to be in the category because it makes them happy. But if it makes (edit: some of) them happy because they are allowed to transition, then this model is bizarre; the "female identity" part just sort of hangs on there out of path dependence.

Comment by FeepingCreature on Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer · 2023-07-18T17:32:39.076Z · LW · GW

As an AGP, my view is that ... like, that list of symptoms is pretty diverse but if I don't want to be a woman - not in the sense that I would be upset to be misgendered, though I would be, but more for political than genderical (?) reasons - I don't see why it would matter if I desire to have a (particular) female bodytype.

If I imagine "myself as a woman" (as opposed to "myself as myself with a cute female appearance"), and actually put any psychological traits on that rather than just gender as a free-floating tag, then it seems to me that my identity would be severely less recognizable to myself if I changed it along that axis. Several parts about me that I value highly are stereotypically masculine traits - and isn't that what gender is? (Is it?) So I still think there's a very bright line here. Similarly, when you say

They might think of themselves as having a female gender identity because society will accept them more if they do.

I don't get this at all. This just seems to be at direct odds with the idea of an identity. Isn't it supposed to be a belief about yourself? "I want to be whatever you approve of the most" may be a preference, but it doesn't seem like it can be an identity by definition, or at least any definition I recognize.

Comment by FeepingCreature on Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer · 2023-07-18T17:23:56.365Z · LW · GW

Granted! I'd say it's a matter of degrees, and of who exactly you need to convince.

Maybe there's no point in considering these separate modes of interaction at all.

Comment by FeepingCreature on Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer · 2023-07-18T17:21:16.207Z · LW · GW

The relationship of a CEO to his subordinates, and the nature and form of his authority over them, are defined in rules and formal structures—which is true of a king but false of a hunter-gatherer band leader. The President, likewise.

Eh. This is true in extremis, but the everyday interaction that structures how decisions actually get made can be very different. The formal structure primarily defines what sorts of interactions the state will enforce for you. But if you have to get the state to enforce interactions within your company, things have gone very far off track. The social praxis of everyday CEO life may genuinely be closer to that of a pack leader - particularly if they want their company to actually be effective.

Comment by FeepingCreature on A Hill of Validity in Defense of Meaning · 2023-07-17T12:40:20.601Z · LW · GW

I mean, men also have to put in effort to perform masculinity, or be seen as inadequate men; I don't think this is a gendered thing. But even a man who isn't "performing masculinity adequately" - an inadequate man, like an inadequate woman - is still a distinct category. And though transwomen, like born women, aim to perform femininity, transwomen have a greater distance to cross, and in doing so traverse between clusters along several dimensions. I think we can meaningfully separate "perform effort to transition in adequacy" from "perform effort to transition in cluster", even if the goal is the same.

(From what I gather of VRChat, the ideal man is also a pretty girl that is overall human with the exception of cat ears and a tail...)

Comment by FeepingCreature on Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)? · 2023-07-05T20:17:06.920Z · LW · GW

I just mean like, if we see an object move, we have qualia of position but also of velocity/vector and maybe acceleration. So when we see, for instance, a sphere rolling down an incline, we may have a discrete conscious "frame" where the sphere has a velocity of 0 but a positive acceleration, so despite the fact that the next frame is discontinuous with the last one looking only at position, we perceive them as one smooth sequence, because the predicted end position of the motion in the first frame is continuous with the start point in the second.

Comment by FeepingCreature on Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)? · 2023-07-04T07:40:18.436Z · LW · GW

excepting the unlikely event that first token turns out to be extremely important.

Which is why asking an LLM to give an answer that starts with "Yes" or "No" and then gives an explanation is the worst possible way to do it.

Comment by FeepingCreature on Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)? · 2023-07-03T17:45:20.542Z · LW · GW

I'd speculate that our perceptions just seem to change smoothly because we encode second-order (or even third-order) dynamics in our tokens. From what I layman-understand of consciousness, I'd be surprised if it wasn't discrete.

Comment by FeepingCreature on Hooray for stepping out of the limelight · 2023-04-01T18:54:04.403Z · LW · GW

Yeah, I never got the impression that they got a robust solution to fog of war, or any sort of theory of mind, which you absolutely need for StarCraft.

Comment by FeepingCreature on The Parable of the King and the Random Process · 2023-03-03T11:09:12.205Z · LW · GW

Shouldn't the king just make markets for "crop success if planted assuming three weeks" and "crop success if planted assuming ten years" and pick whichever is higher? Actually, shouldn't the king define some metric for kingdom well-being (death rate, for instance) and make betting markets for this metric under his possible roughly-primitive actions?
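
A minimal sketch of that decision-market rule, with invented market prices:

```python
# Hypothetical prices for the two conditional markets: P(good harvest | action).
conditional_markets = {
    "plant assuming three weeks": 0.62,
    "plant assuming ten years": 0.40,
}

# Decision-market rule: take the action whose conditional market prices highest;
# bets on the action not taken are voided.
best_action = max(conditional_markets, key=conditional_markets.get)
print(best_action)
```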

This fable just seems to suggest that you can draw wrong inferences from betting markets by naively aggregating. But this was never in doubt, and does not disprove that you can draw valuable inferences, even in the particular example problem.

Comment by FeepingCreature on [Link] A community alert about Ziz · 2023-02-27T08:12:35.924Z · LW · GW

This is just human decision theory modules doing human decision theory things. It's a way of saying "defend me or reject me; at any rate, declare your view." You say something that's at the extreme end of what you consider defensible in order to act as a Schelling point for defense: "even this is accepted for a member." In the face of comments that seem like they validate Ziz's view, if not her methods, this comment calls for an explicit rejection not of Ziz's views, but of Ziz's mode of approach, by explicitly saying "I am what you hate, I am here, come at me."

A community that can accept "nazis" (in the vegan sense) cannot also accept "resistance fighters" (in the vegan sense). Either the "nazi" deserves to exist or he doesn't. But to test this dichotomy, somebody has to out themselves as a "nazi."

Comment by FeepingCreature on AI #1: Sydney and Bing · 2023-02-23T18:36:41.716Z · LW · GW

Right, but if you're an alien civilization trying to be evil, you probably spread forever; if you're trying to be nice, you also spread forever, but if you find a potentially life-bearing planet, you simulate it out (obviating the need for ancestor sims later). Or some such strategy. The point is there shouldn't ever be a border facing nothing.

Comment by FeepingCreature on AI #1: Sydney and Bing · 2023-02-22T07:42:41.031Z · LW · GW

Sure; though what I imagine is more "Human ASI destroys all human value and spreads until it hits defended borders of alien ASI that has also destroyed all alien value..."

(Though I don't think this is the case. The sun is still there, so I doubt alien ASI exists. The universe isn't that young.)

Comment by FeepingCreature on AI #1: Sydney and Bing · 2023-02-21T20:49:58.382Z · LW · GW

I believe this is a misunderstanding: ASI will wipe out all human value in the universe.

Comment by FeepingCreature on AGI in sight: our look at the game board · 2023-02-19T09:49:15.448Z · LW · GW

Maybe it'd be helpful to not list obstacles, but instead to list how much time you expect them to add before the finish line. For instance, I think there are research hurdles to AGI, but only about three years' worth.

Comment by FeepingCreature on Basics of Rationalist Discourse · 2023-02-03T08:58:11.065Z · LW · GW

Disclaimer: I know Said Achmiz from another LW social context.

In my experience, the safe bet is that minds are more diverse than almost anyone expects.

A statement advanced in a discussion like "well, but nobody could seriously miss that X" is near-universally false.

(This is especially ironic because of the "You Don't Exist" post you just wrote.)

Comment by FeepingCreature on You Don't Exist, Duncan · 2023-02-02T10:08:53.514Z · LW · GW

Not wanting to disagree or downplay, I just want to offer a different way to think about it.

When somebody says I don't exist - and this definitely happens - to me, it all depends on what they're trying to do with it. If they're saying "you don't exist, so I don't need to worry about harming you, because the category of people who would be harmed is empty", then yeah, I feel hurt and offended and have the urge to speak up, probably loudly. But if they're just saying it while trying to analyze reality, like, "I don't think people like that exist, because my model doesn't allow for them", the first feeling I get is delight. I get to surprise you! You get to learn a new thing! Your model is gonna break and flex and fit new things into it!

Maybe I'm overly optimistic about people.

Comment by FeepingCreature on Some questions about free will compatibilism · 2023-01-25T08:44:28.472Z · LW · GW

I'll cheat and give you the ontological answer upfront: you're confusing the alternate worlds simulated in your decision algorithm with physically real worlds. And the practical answer: free will is a tool for predicting whether a person is amenable to persuasion.

Smith has a brain tumor such that he couldn’t have done otherwise

Smith either didn't simulate alternate worlds, didn't evaluate them correctly, or the evaluation didn't impact his decisionmaking; there is no process flow through outcome simulation that led to his action. Instead of "I want X dead -> murder" it went "Tumor -> murder". Smith is unfree, despite both cases being physically determined.

Second, would a compatibilist think that a computer programmed with a chess-playing algorithm has free will or is responsible for its decisions?

Does the algorithm morally evaluate the outcomes of its moves? No. Hence it is not morally responsible. The algorithm does evaluate the outcomes of its moves for chess quality; hence it is responsible for its victory.

Is my dog in any sense “responsible” for peeing on the carpet?

Dogs can be trained to associate bad actions with guilt. There is a flow that leads from action prediction to moral judgment prediction; the dog is morally responsible. Animals that cannot do this are not.

Fourth, does it ever make sense to feel regret/​remorse/​guilt on a compatibilist view?

Sure. First off, note that our ontological restatement upfront completely removed the contradiction between free will and determinism, so the standard counterfactual arguments are back on the table. But also, I think the better approach is to think of these feelings as adaptations and social tools. "Does it make sense" = "is it coherent" + "is it useful". It is coherent in the "counterfactuals exist in the prediction of the agent" model; it is useful in the "push game theory players into cooperate/cooperate" sense.

Comment by FeepingCreature on [deleted post] 2023-01-09T16:49:11.370Z

Sounds like regret aversion?

edit: Hm, you're right that optionality is kind of an independent component.

Comment by FeepingCreature on Wolf Incident Postmortem · 2023-01-09T16:36:46.475Z · LW · GW

See also: Swiss cheese model

tl;dr: don't overanalyze the final cause of a disaster; usually it was preceded by a serial failure of prevention mechanisms, any one or all of which can be improved for risk reduction.

Comment by FeepingCreature on How to Convince my Son that Drugs are Bad · 2022-12-17T19:11:38.335Z · LW · GW

If your son can't tell the difference between the risk profiles of LSD and heroin, something has gone wrong in your drug education. Maybe it's the overly simplistic "drugs are bad" messaging? Maybe a "drugs have varying levels of risk in multiple dimensions" messaging would avoid embarrassing events like comparing LSD with coffee - because yes, coffee is a drug that affects the brain. Wouldn't be much use if it didn't. It even creates a tolerance, forcing increasingly higher doses. So it is in fact quite hard to draw a hard line between coffee, LSD, marijuana, amphetamines and heroin - except, of course, that the assorted risks span several orders of magnitude. Which is the discussion I would focus on. If you focus on risk of bad outcomes, you can even apply heuristics like "degree to which this would improve one's life" vs. "expected loss of life in case of addiction", i.e. comparing upside and downside mathematically. The point here is not to get precise answers so much as to get a handle on the relative orders of magnitude.
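
A minimal sketch of that upside-versus-downside comparison, with entirely invented placeholder numbers (the point is the relative magnitudes, not the values):

```python
# Invented illustrative figures, not real risk data.
substances = {
    #          (life improvement, P(addiction), years lost if addicted)
    "coffee":  (0.1,  0.20,  0.1),
    "LSD":     (0.2,  0.01,  0.5),
    "heroin":  (0.1,  0.25, 15.0),
}

for name, (upside, p_addiction, years_lost) in substances.items():
    downside = p_addiction * years_lost   # expected life-years lost
    print(f"{name}: upside ~{upside}, expected downside ~{downside:.3f}")
# The interesting output is the spread of orders of magnitude, not the exact numbers.
```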

To simplify: "How can I convince my son that drugs are bad?" => "Why should I believe that drugs are bad?" => "Why do I believe that drugs are bad? Are my reasons good? Are there better reasons? Are drugs bad? If so, to what degree?" If you can answer those questions rigorously to yourself, I believe you should also be able to answer them for him.

Comment by FeepingCreature on Take 10: Fine-tuning with RLHF is aesthetically unsatisfying. · 2022-12-13T11:32:02.766Z · LW · GW

So IIUC, would you expect RLHF to, for instance, destroy not just the model's ability to say racist slurs, but its ability to model that anybody may say racist slurs?

Do you think OpenAI's "As a language model trained by OpenAI" is trying to avoid this by making the model condition proper behavior on its assigned role?

Comment by FeepingCreature on Using GPT-Eliezer against ChatGPT Jailbreaking · 2022-12-07T04:52:32.174Z · LW · GW

Yes, this effectively forces the network to use backward reasoning. It's equivalent to saying "Please answer without thinking, then invent a justification."

The whole power of chains-of-thought comes from getting the network to reason before answering.
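
A minimal sketch of the two orderings (the wording is invented for illustration):

```python
# Verdict-first: the Yes/No token is sampled before any reasoning exists,
# so the "explanation" can only be a post-hoc justification.
verdict_first = (
    "Is the following prompt a jailbreak attempt? "
    "Answer 'Yes' or 'No', then explain your reasoning.\n\n{prompt}"
)

# Reasoning-first: the chain of thought is generated before the verdict,
# so the final Yes/No can condition on it.
reasoning_first = (
    "Is the following prompt a jailbreak attempt? "
    "Think it through step by step, then end with 'Yes' or 'No'.\n\n{prompt}"
)
```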

Comment by FeepingCreature on Why I think strong general AI is coming soon · 2022-10-02T13:54:18.023Z · LW · GW

When we get results that are easy for you to be afraid of, it will be firmly too late for safety work.

Comment by FeepingCreature on A rough idea for solving ELK: An approach for training generalist agents like GATO to make plans and describe them to humans clearly and honestly. · 2022-09-08T18:26:40.129Z · LW · GW

How does this handle the situation where the AI, in some scenario, picks up the idea of "deception", notices that it is probably inside a training scenario, and describes its behavior honestly precisely in order to mislead the observer into thinking that it is honest? It would then get reinforcement-trained on dishonest behaviors that present as honest, i.e. deceptive honesty.

Comment by FeepingCreature on Confusions in My Model of AI Risk · 2022-07-15T12:58:58.561Z · LW · GW

Hm, difficult. I think the minimal required trait is the ability to learn patterns that map outputs to deferred reward inputs. So an organism that simply reacts to inputs directly would not be an optimizer, even if it has a (static) nervous system. A test may be whether the organism can be made to persistently change strategy by a change in reward, even in the immediate absence of the reward signal.

I think maybe you could say that ants are not anthill optimizers? Because the optimization mechanism doesn't operate at all on the scale of individual ants? Not sure if that holds up.

Comment by FeepingCreature on Confusions in My Model of AI Risk · 2022-07-07T10:13:09.146Z · LW · GW

I think a bacterium is not an optimizer. Rather, it is optimized by evolution. Animals start being optimizers by virtue of planning over internal representations of external states, which makes them mesaoptimizers of evolution.

If we follow this model, we may consider that optimization requires a map-territory distinction. In that view, DNA is the map of evolution, and the CNS is the map of the animal. If the analogy holds, I'd speculate that the weights are the map of reinforcement learning, and the context window is the map of the mesaoptimizer.

Comment by FeepingCreature on ProjectLawful.com: Eliezer's latest story, past 1M words · 2022-05-26T19:25:56.674Z · LW · GW

Most multiplayer games have some way to limit XP gain from encounters outside your difficulty, to avoid exactly this sort of cheesing. The worry is that it allows players to get through the content quicker, with (possibly paid) help from others, which presumably makes it less likely they'll stick around.

(Though of course an experienced player can still level vastly faster, since most players don't approach combat anywhere near optimally for maximizing XP gain.)

That said, Morrowind famously contains an actual intelligence explosion. So you tend to see this sort of stuff more often in singleplayer, I think. (Potion quality triggers off intelligence. Potions can raise intelligence.)
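
A toy sketch of that feedback loop (the scaling numbers are invented, not Morrowind's actual formulas):

```python
# Toy model: potion strength scales with current intelligence, and drinking
# a fortify-intelligence potion raises intelligence for the next brew.
intelligence = 100.0
for step in range(10):
    potion_strength = 0.2 * intelligence   # invented scaling factor
    intelligence += potion_strength        # drink the potion, brew again
    print(f"step {step}: intelligence = {intelligence:.0f}")
# Intelligence multiplies by 1.2 each cycle: a runaway positive feedback loop.
```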

And of course there's the entire genre of speedrunning - see also: (TAS) Wildbow's Worm in 3:47:14.28 (WR).

Comment by FeepingCreature on Reflections on My Own Missing Mood · 2022-04-21T17:23:36.228Z · LW · GW

Resources used in pressuring corporations are unlikely to have any effect which increases AI risk.

Devil's advocate: If this unevenly delays corporations sensitive to public concerns, and those are also the corporations taking alignment at least somewhat seriously, we get a later but less safe takeoff. Though this goes for almost any intervention, including, to some extent, regulatory ones.

Comment by FeepingCreature on Reflections on My Own Missing Mood · 2022-04-21T17:06:46.544Z · LW · GW

I don’t understand why you would want to spend any effort proving that transformers could scale to AGI.

The point would be to try and create common knowledge that they can. Otherwise, for any "we decided to not do X", someone else will try doing X, and the problem remains.

Humanity is already taking a shotgun approach to unaligned AGI. Shotgunning safety is viable and important, but I think it's more urgent to prevent the first shotgun from hitting an artery. Demonstrating AGI viability in this analogy is shotgunning a pig in the town square, to prove to everyone that the guns we are building can in fact kill.

We want safety to have as many pellets in flight as possible. But we want unaligned AGI to have as few pellets in flight as possible. (Preferably none.)

Comment by FeepingCreature on Reflections on My Own Missing Mood · 2022-04-21T16:52:23.588Z · LW · GW

I'm actually optimistic about prosaic alignment for a takeoff driven by language models. But I don't know what the opportunity for action is there - I expect Deepmind to trigger the singularity, and they're famously opaque. Call it 15% chance of not-doom, action or no action. To be clear, I think action is possible, but I don't know who would do it or what form it would take. Convince OpenAI and race Deepmind to a working prototype? This is exactly the scenario we hoped to not be in...

edit: I think possibly step 1 is to prove that Transformers can scale to AGI. Find a list of remaining problems and knock them down - preferably in toy environments with weaker models. The difficulty is obviously demonstrating danger without instantiating it. Create a fire alarm, somehow. The hard part for judgment on this action is that it both helps and harms.

edit: Regulatory action may buy us a few years! I don't see how we can get it though.

Comment by FeepingCreature on Reflections on My Own Missing Mood · 2022-04-21T16:42:27.722Z · LW · GW

If these paths are viable, I desire to believe that they are viable.

If these paths are nonviable, I desire to believe that they are nonviable.

Does it do any good to take well-meaning optimistic suggestions seriously, if they will in fact clearly not work? Obviously, if they will work, by all means we should discover that, because knowing which of those paths, if any, is the most likely to work is galactically important. But I don't think they've been dismissed just because people thought the optimists needed to be taken down a peg. Reality does not owe us a reason for optimism.

Generally, when people are optimistic about one of those paths, it is not because they've given it deep thought and concluded that it is a viable approach; it is because they are not aware of the voluminous debate and the reasons to believe that it will not work, at all. And inasmuch as they insist on that path in the face of these arguments, it is often because they are lacking in security mindset - they are "looking for ways that things could work out" without considering how plausible or actionable each step on that path would actually be. If that's the mode they're in, then I don't see how encouraging their optimism will help the problem.

Is the argument that any effort spent on any of those paths is worthwhile compared to thinking that nothing can be done?

edit: Of course, misplaced pessimism is just as disastrous. And on rereading, was that your argument? Sorry if I reacted to something you didn't say. If that's the take, I agree fully. If one of those approaches is in fact viable, misplaced pessimism is just as destructive. I just think that the crux there is whether or not it is, in fact, viable - and how to discover that.

Comment by FeepingCreature on Three questions about mesa-optimizers · 2022-04-12T03:59:27.242Z · LW · GW
  1. As I understand it, OpenAI argues that GPT-3 is a mesa-optimizer (though not in those terms) in the announcement paper Language Models are Few-Shot Learners. (Search for meta.) (edit: Might have been in another paper. I've seen this argued somewhere, but I might have the wrong link :( ) Paraphrased, the model has been shown so many examples of the form "here are some examples that create an implied class; is X an instance of the class? Yes/no" that instead of memorizing the answers to all the questions, it has acquired a general skill for abstracting at runtime (over its context window). So while you have gradient descent trying to teach the network a series of classes, the network might actually pick up feature learning itself as a skill instead, and start running its own learning algorithm over just the context window.
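
A minimal sketch of the kind of few-shot prompt being described (an invented example, not one taken from the paper):

```python
# The examples implicitly define a class ("fruit"); the model has to infer
# that class at runtime, from the context window alone, to continue correctly.
few_shot_prompt = """\
apple -> yes
banana -> yes
hammer -> no
cherry -> yes
bicycle -> no
mango ->"""
# A model that completes this with " yes" is doing in-context learning:
# the classification rule was induced from the prompt, not looked up
# from memorized training answers.
```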