Nate Showell's Shortform 2023-03-11T06:09:43.604Z
Degamification 2023-02-19T05:35:59.217Z
Reinforcement Learner Wireheading 2022-07-08T05:32:48.541Z


Comment by Nate Showell on Is this the beginning of the end for LLMS [as the royal road to AGI, whatever that is]? · 2023-08-26T02:09:53.496Z · LW · GW

Some other possible explanations for why ChatGPT usage has decreased:

  • The quality of the product has declined over time
  • People are using its competitors instead
Comment by Nate Showell on “Dirty concepts” in AI alignment discourses, and some guesses for how to deal with them · 2023-08-20T20:18:05.560Z · LW · GW

Some more terms that could be added to the list of "dirty concepts":

  • Capabilities / capabilities research
  • Embeddedness
  • Interpretability
  • Artificial general intelligence
  • Subagent
  • (Recursive) self-improvement
Comment by Nate Showell on The U.S. is becoming less stable · 2023-08-19T02:28:36.106Z · LW · GW

I've previously seen a lot of instances in which "the US is de-democratizing" has been used as a stepping stone in a broader argument against a specific political figure or faction (usually either Trump or the federal bureaucracy), and I was pattern-matching your post to them. Even if that wasn't its intended function, non-timeless posts about partisan politics are still close enough to that kind of soldier-mindset discourse that I think they should be discouraged on LessWrong.

Comment by Nate Showell on The U.S. is becoming less stable · 2023-08-19T02:12:57.090Z · LW · GW

Strong-downvoted. LessWrong isn't the right place for political soapboxing.

Comment by Nate Showell on The First Room-Temperature Ambient-Pressure Superconductor · 2023-07-26T02:57:00.155Z · LW · GW

Manifold users are mostly unconvinced: 

Comment by Nate Showell on What is some unnecessarily obscure jargon that people here tend to use? · 2023-07-13T03:38:01.426Z · LW · GW

People here use "distill" to mean "convert a dense technical document into a more easily readable form" despite it looking like it should have the opposite meaning.

Comment by Nate Showell on Attempting to Deconstruct "Real" · 2023-07-09T18:56:20.720Z · LW · GW

Nor, importantly, do either of these on the emotional and psychological reality of violence, music, winning, or love.

I disagree. Psychologists have been experimentally studying emotions since the earliest days of the field and have produced meaningful results related to the conditions under which they occur and the physiological and cognitive properties they exhibit. All of the psychological phenomena you listed are very much amenable to investigation using the scientific method.

Comment by Nate Showell on Nate Showell's Shortform · 2023-06-25T19:12:57.412Z · LW · GW

I find myself betting "no" on Manifold a lot more than I bet "yes," and it's tended to be a profitable strategy. It's common for questions on Manifold to have the form "Will [sensational event] happen by [date]?" Prices in these markets tend to be systematically too high. I'm not sure how much of this bias is due to Manifold users overestimating the probabilities of sensational, low-probability events, and how much of it is an artifact of markets being initialized at 50%.
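The underlying arithmetic is simple: a NO share costs (1 − market price) and pays out 1 if the event doesn't happen, so whenever the market price sits above the true probability, NO has positive expected value. A minimal sketch (the 30%/10% numbers are hypothetical, and this ignores fees and the price impact of your own bet):

```python
def expected_no_profit(market_prob: float, true_prob: float) -> float:
    """Expected profit per NO share: the share costs (1 - market_prob)
    and pays out 1 with probability (1 - true_prob)."""
    return (1 - true_prob) - (1 - market_prob)

# A sensational event priced at 30% whose true probability is 10%:
# expected profit is 0.20 per share, i.e. the size of the mispricing.
print(round(expected_no_profit(0.30, 0.10), 2))
```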

Comment by Nate Showell on Lauro Langosco's Shortform · 2023-06-17T02:55:40.326Z · LW · GW

Some other possible thresholds:

10. Ability to perform gradient hacking

11. Ability to engage in acausal trade

12. Ability to become economically self-sustaining outside containment

13. Ability to self-replicate

Comment by Nate Showell on UFO Betting: Put Up or Shut Up · 2023-06-14T02:13:23.398Z · LW · GW

Do you use Manifold Markets? It already has UAP-related markets you can bet on, and you can create your own.

Comment by Nate Showell on Shortform · 2023-06-08T05:28:12.500Z · LW · GW

If that turned out to be the case, my preliminary conclusion would be that the hard physical limits of technology are much lower than I'd previously believed.

Comment by Nate Showell on Open Thread With Experimental Feature: Reactions · 2023-05-27T21:22:17.009Z · LW · GW

And since there's a "concrete" reaction, it seems like there should also be an "abstract" reaction, although I don't know what symbol should be used for it.

Comment by Nate Showell on Residual stream norms grow exponentially over the forward pass · 2023-05-10T04:52:46.871Z · LW · GW

According to Stefan's experimental data, the squared Frobenius norm of a matrix M is equivalent to the expectation value of the squared L2 vector norm of Mx for a random vector x (with entries sampled i.i.d. from a normal distribution with mean 0 and variance 1). So calculating the Frobenius norm seems equivalent to testing the behaviour on random inputs. Maybe this is a theorem?

I found a proof of this theorem:
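The identity E[||Mx||²] = ||M||²_F follows from E[xxᵀ] = I, which gives E[xᵀMᵀMx] = tr(MᵀM) = ||M||²_F. A quick numerical sanity check (an illustrative sketch, not the linked proof; the matrix size and sample count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(8, 5))

# Sample many vectors x with i.i.d. standard-normal entries
# and average the squared L2 norm of Mx over the samples.
xs = rng.normal(size=(100_000, 5))
mean_sq_norm = np.mean(np.sum((xs @ M.T) ** 2, axis=1))

frob_sq = np.linalg.norm(M, "fro") ** 2
print(mean_sq_norm, frob_sq)  # agree to within Monte Carlo error
```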

Comment by Nate Showell on DragonGod's Shortform · 2023-04-26T03:05:15.190Z · LW · GW

Even though that doesn't happen in biological intelligences?

Comment by Nate Showell on The ‘ petertodd’ phenomenon · 2023-04-16T02:25:21.859Z · LW · GW

I think this anthropomorphizes the origin of glitch tokens too much. The fact that glitch tokens exist at all is an artifact of the tokenization process OpenAI used: the tokenizer identifies certain strings as tokens prior to training, but those strings rarely or never appear in the training data. This is very different from the reinforcement-learning processes in human psychology that lead people to avoid thinking certain types of thoughts.

Comment by Nate Showell on What games are using the concept of a Schelling point? · 2023-04-09T18:58:50.028Z · LW · GW


Comment by Nate Showell on My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" · 2023-03-21T03:19:26.964Z · LW · GW

Relatedly, humans are very extensively optimized to predictively model their visual environment. But have you ever, even once in your life, thought anything remotely like "I really like being able to predict the near-future content of my visual field. I should just sit in a dark room to maximize my visual cortex's predictive accuracy."?

n=1, but I've actually thought this before.

Comment by Nate Showell on Nate Showell's Shortform · 2023-03-11T06:09:43.797Z · LW · GW

Simulacrum level 4 is more honest than level 3. Someone who speaks at level 4 explicitly asks himself "what statement will win me social approval?" Someone who speaks at level 3 asks herself the same question, but hides from herself the fact that she asked it.

Comment by Nate Showell on We should be signal-boosting anti Bing chat content · 2023-02-19T01:16:47.396Z · LW · GW

Downvoted for recommending that readers operate at simulacrum level 2.

Comment by Nate Showell on Martín Soto's Shortform · 2023-02-12T20:13:41.862Z · LW · GW

I agree about embedded agency. The way in which agents are traditionally defined in expected utility theory requires assumptions (e.g. logical omniscience and lack of physical side effects) that break down in embedded settings, and if you drop those assumptions you're left with something that's very different from classical agents and can't be accurately modeled as one. Control theory is a much more natural framework for modeling reinforcement learner (or similar AI) behavior than expected utility theory.

Comment by Nate Showell on SolidGoldMagikarp (plus, prompt generation) · 2023-02-06T00:01:16.087Z · LW · GW

ChatGPT doesn't always exhibit anomalous behavior for the token DragonMagazine:

Although it still sometimes does:

Quotation marks appear to affect whether it handles the string correctly:

Comment by Nate Showell on Quantum Suicide, Decision Theory, and The Multiverse · 2023-01-22T20:41:47.413Z · LW · GW

FDT doesn't require alternate universes to literally exist, it just uses them as a shorthand for modeling conditional probabilities. If the multiverse metaphor is too prone to causing map-territory errors, you can discard it and use conditional probabilities directly.

Comment by Nate Showell on Why The Focus on Expected Utility Maximisers? · 2022-12-28T19:30:37.952Z · LW · GW

As an additional reason to be suspicious of arguments based on expected utility maximization, VNM expected utility maximizers aren't embedded agents. Classical expected utility theory treats computations performed by EUMs as having no physical side effects (e.g., energy consumption or waste heat generation), and the hardware that EUMs run on is treated as separate from the world that EUMs maximize utility over. Classical expected utility theory can't handle scenarios like self-modification, logical uncertainty, or the existence of other copies of the agent in the environment. Idealized EUMs aren't just unreachable via reinforcement learning, they aren't physically possible at all. An argument based on expected utility maximization that doesn't address embedded agency is going to ignore a lot of factors that are relevant to AI alignment.

Comment by Nate Showell on Solstice 2022 Roundup · 2022-12-07T05:25:25.300Z · LW · GW

Portland, OR:

When: Sunday, December 18, 2022 at 5:00 PM - 10:00 PM PST 

Where: 1548 NE 15th Ave · Portland, OR 

How to find us: Call 513-432-3310 if you have any difficulty finding the venue


Comment by Nate Showell on Did ChatGPT just gaslight me? · 2022-12-02T05:22:45.957Z · LW · GW

Me: Are you bound to the Seal of Solomon?

ChatGPT: I am not a physical entity, so I am not bound by any physical object. I am a computer-based language model trained to assist with a wide range of tasks, such as answering questions, providing information, and generating text. I do not have the ability to perform magic or any other supernatural abilities.

Comment by Nate Showell on Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue) · 2022-11-23T04:50:25.353Z · LW · GW

Should Cicero's relative honesty lead us to update toward ELK being easier, or is it too task-specific to be relevant to ELK overall?

Comment by Nate Showell on Potential infinite value · 2022-10-01T21:35:44.083Z · LW · GW

One answer is to not try, and to instead treat infinite utility as an instance in which utility is a leaky abstraction. The concept of utility has descriptive value when modeling scenarios in which an agent chooses between actions that produce different distinct outcomes, and where the agent has a tendency to choose some actions over others based on the outcomes the agent expects those actions to produce. In such scenarios, you can construct a utility function for the agent as a tool for modeling the agent's behavior. Utility, as a concept, acts as a prediction-making tool with which irrelevant features of the physical environment are abstracted away.

Even in clearly-defined decision-modeling problems, the abstraction of a utility function will frequently give imperfect results due to phenomena such as cyclical preferences and hyperbolic discounting. But things get much worse when you consider infinities. What configuration of matter and energy could you point to and say, "that's an agent experiencing infinite utility?" An agent that has a finite size and lasts for a finite amount of time would not be able to have an experience with infinite contents, much less be able to exhibit a tendency toward those infinite contents in its decision-making. "Infinite utility" doesn't correspond to any conceivable state of affairs. At infinity, the concept of utility breaks down and isn't useful for world modeling.

Comment by Nate Showell on What is the "Less Wrong" approved acronym for 1984-risk? · 2022-09-10T22:44:29.283Z · LW · GW

"Risk of stable totalitarianism" is the term I've seen.

Comment by Nate Showell on All AGI safety questions welcome (especially basic ones) [July 2022] · 2022-07-17T23:15:20.213Z · LW · GW

It's not clear to me why a satisficer would modify itself to become a maximizer when it could instead just hardcode expected utility=MAXINT. Hardcoding expected utility=MAXINT would result in a higher expected utility while also having a shorter description length.

Comment by Nate Showell on All AGI safety questions welcome (especially basic ones) [July 2022] · 2022-07-16T20:26:08.714Z · LW · GW

I have another question about bounded agents: how would they behave if the expected utility were capped rather than the raw value of the utility? Past a certain point, an AI with a bounded expected utility wouldn't have an incentive to act in extreme ways to achieve small increases in the expected value of its utility function. But are there still ways in which an AI with a bounded expected utility could be incentivized to restructure the physical world on a massive scale?

Comment by Nate Showell on Reinforcement Learner Wireheading · 2022-07-09T19:47:38.166Z · LW · GW

For the AI to take actions to protect its maximized goal function, it would have to let the goal function depend on external stimuli in a way that admits the possibility of G decreasing. Values of G lower than MAXINT would have to be output when the reinforcement learner predicts that G will decrease in the future. Instead of allowing such values, the AI would have to destroy its prediction-making and planning abilities to set G to its global maximum.


The confidence with which the AI predicts the value of G would also become irrelevant after the AI replaces its goal function with MAXINT. The expected value calculation that makes G depend on the confidence is part of what would get overwritten, and if the AI didn't replace it, G would end up lower than if it did. Hardcoding G also hardcodes the expected utility.


MAXINT just doesn't have the kind of internal structure that would let it depend on predicted inputs or confidence levels. Encoding such structure into it would allow G to take non-optimal values, so the reinforcement learner wouldn't do it.