Posts

Nate Showell's Shortform 2023-03-11T06:09:43.604Z
Degamification 2023-02-19T05:35:59.217Z
Reinforcement Learner Wireheading 2022-07-08T05:32:48.541Z

Comments

Comment by Nate Showell on On green · 2024-03-24T21:50:01.001Z · LW · GW

Spoilers for Fullmetal Alchemist: Brotherhood:

 

Father is a good example of a character whose central flaw is his lack of green. Father was originally created as a fragment of Truth, but he never tries to understand the implications of that origin. Instead, he only ever sees God as something to be conquered, the holder of a power he can usurp. While the Elric brothers gain some understanding of "all is one, one is all" during their survival training, Father never does -- he never stops seeing himself as a fragile cloud of gas inside a flask, obsessively needing to erect a dichotomy between controller and controlled. Not once in the series does he express anything resembling awe. When Father finally does encounter God beyond the Doorway of Truth, he doesn't recognize what he's seeing. The Elric brothers have artistic expressions of wonderment toward God inscribed on their Doorways of Truth, but Father's Doorway of Truth is blank.

Father's lack of green also extends to how he sees humans. It never seems to occur to Father that the taboo against human transmutation is anything more than an arbitrary rule. To him, humans are only ever tools or inconveniences, not people to appreciate for their own sake or look to for guidance. Joy-in-the-Other is what Father most deeply desires, but he doesn't recognize this need.

Comment by Nate Showell on Ratios's Shortform · 2024-03-21T03:24:12.667Z · LW · GW

Mostly the first reason. The "made of atoms that can be used for something else" piece of the standard AI x-risk argument also applies to suffering conscious beings, so an AI would be unlikely to keep them around if the standard AI x-risk argument ends up being true.

Comment by Nate Showell on 0th Person and 1st Person Logic · 2024-03-10T21:20:36.852Z · LW · GW

It's worth noting that no reference to preferences has yet been made. That's interesting because it suggests that there are both 0P-preferences and 1P-preferences. That intuitively makes sense, since I do care about both the actual state of the world, and what kind of experiences I'm having.

Believing in 0P-preferences seems to be a map-territory confusion, an instance of the Tyranny of the Intentional Object. The robot can't observe the grid in a way that isn't mediated by its sensors. There's no way for 0P-statements to enter into the robot's decision loop, and thus to act as something the robot can have preferences over, except by routing through 1P-statements. Instead of directly having a 0P-preference for "a square of the grid is red," the robot would have to have a 1P-preference for "I believe that a square of the grid is red."

Comment by Nate Showell on niplav's Shortform · 2024-03-04T22:39:49.273Z · LW · GW

What's your model of inflation in an AI takeoff scenario? I don't know enough about macroeconomics to have a good model of what AI takeoff would do to inflation, but it seems like it would do something.

Comment by Nate Showell on Richard_Kennaway's Shortform · 2024-03-04T22:08:33.971Z · LW · GW

You're underestimating how hard it is to fire people from government jobs, especially when those jobs are unionized. And even if there are strong economic incentives to replace teachers with AI, that still doesn't address the ease of circumvention. There's no surer way to make teenagers interested in a topic than to tell them that learning about it is forbidden.

Comment by Nate Showell on Richard_Kennaway's Shortform · 2024-03-03T21:24:06.470Z · LW · GW

All official teaching materials would be generated by a similar process. At about the same time, the teaching profession as we know it today ceases to exist. "Teachers" become merely administrators of the teaching system. No original documents from before AI are permitted for children to access in school.

This sequence of steps looks implausible to me. Teachers would have a vested interest in preventing it, since their jobs would be on the line. A requirement for all teaching materials to be AI-generated would also be trivially easy to circumvent, either by teachers or by the students themselves. Any administrator who tried to do these things would simply have their orders ignored, and the Streisand Effect would lead to a surge of interest in pre-AI documents among both teachers and students.

Comment by Nate Showell on Choosing My Quest (Part 2 of "The Sense Of Physical Necessity") · 2024-02-25T03:56:15.003Z · LW · GW

Why do you ordinarily not allow discussion of Buddhism on your posts?

 

Also, if anyone reading this does a naturalist study on a concept from Buddhist philosophy, I'd like to hear how it goes.

Comment by Nate Showell on Nate Showell's Shortform · 2024-02-17T20:36:06.329Z · LW · GW

An edgy writing style is an epistemic red flag. A writing style designed to provoke a strong, usually negative, emotional response from the reader can be used to disguise the thinness of the substance behind the author's arguments. Instead of carefully considering and evaluating the author's arguments, the reader gets distracted by the disruption to their emotional state and reacts to the text in a way that more closely resembles a trauma response, with all the negative effects on their reasoning capabilities that such a response entails. Some examples of authors who do this: Friedrich Nietzsche, Grant Morrison, and The Last Psychiatrist.

Comment by Nate Showell on Phallocentricity in GPT-J's bizarre stratified ontology · 2024-02-17T06:12:43.195Z · LW · GW

OK, so maybe this is a cool new way to look at at certain aspects of GPT ontology... but why this primordial ontological role for the penis?

"Penis" probably has more synonyms than any other term in GPT-J's training data.

Comment by Nate Showell on Dreams of AI alignment: The danger of suggestive names · 2024-02-10T21:38:24.269Z · LW · GW

I particularly wish people would taboo the word "optimize" more often. Referring to a process as "optimization" papers over questions like:

  • What feedback loop produces the increase or decrease in some quantity that is described as "optimization"? What steps does the loop have?
  • In what contexts does the feedback loop occur?
  • How might the effects of the feedback loop change between iterations? Does it always have the same effect on the quantity?
  • What secondary effects does the feedback loop have?

There's a lot hiding behind the term "optimization," and I think a large part of why early AI alignment research made so little progress was because people didn't fully appreciate how leaky of an abstraction it is.

Comment by Nate Showell on A sketch of acausal trade in practice · 2024-02-04T19:50:03.716Z · LW · GW

The "pure" case of complete causal separation, as with civilizations in separate regions of a multiverse, is an edge case of acausal trade that doesn't reflect what the vast majority of real-world examples look like. You don't need to speculate about galactic-scale civilizations to see what acausal trade looks like in practice: ordinary trade can already be modeled as acausal trade, as can coordination between ancestors and descendants. Economic and moral reasoning already have elements of superrationality to the extent that they rely on concepts such as incentives or universalizability, which introduce superrationality by conditioning one's own behavior on other people's predicted behavior. This ordinary acausal trade doesn't require formal proofs or exact simulations -- heuristic approximations of other people's behavior are enough to give rise to it.

Comment by Nate Showell on Decaeneus's Shortform · 2024-01-28T21:02:11.151Z · LW · GW

There are some styles of meditation that are explicitly described as "just sitting" or "doing nothing."

Comment by Nate Showell on Deep atheism and AI risk · 2024-01-13T21:52:56.129Z · LW · GW

Trust and distrust are social emotions. To feel either of them toward nature is to anthropomorphize it. In that sense, "deep atheism" is closer to theism than "shallow atheism," in some cases no more than a valence-swap away. 

 

An actually-deeply-atheistic form of atheism would involve stripping away anthropomorphization instead of trust. It would start with the observation that nature is alien and inhuman and would extend that observation to more places, acting as a kind of inverse of animism. This form of atheism would remove attributions of properties such as thought, desire, and free will from more types of entities: governments, corporations, ideas, and AI. At its maximum extent, it would even be applied to the processes that make up our own minds, with the recognition that such processes don't come with any inherent essence of humanness attached. To really deepen atheism, make it illusionist.

Comment by Nate Showell on Nate Showell's Shortform · 2024-01-08T00:09:38.662Z · LW · GW

Is trade ever fully causal? Ordinary trade can be modeled as acausal trade with the "no communication" condition relaxed. Even in a scenario as seemingly causal as using a vending machine, trade only occurs if the buyer believes that the vending machine will actually dispense its goods and not just take the buyer's money. Similarly, the vending machine owner's decision to set up the machine was informed by predictions about whether or not people would buy from it. The only kind of trade that seems like it might be fully causal is a self-executing contract that's tied to an external trigger, and for which both parties have seen the source code and verified that the other party has enough resources to make the agreed-upon trade. Would a contract like that still have some acausal element anyway?

Comment by Nate Showell on Shortform · 2023-12-31T20:30:58.944Z · LW · GW

I agree: the capabilities of AI romantic partners probably aren't the bottleneck to their wider adoption, considering the success of relatively primitive chatbots like Replika at attracting users. People sometimes become romantically attached to non-AI anime/video game characters despite not being able to interact with them at all! There doesn't appear to be much correlation between the interactive capabilities of fictional-character romantic partners and their appeal to users/followers.

Comment by Nate Showell on Picasso in the Gallery of Babel · 2023-12-28T04:35:52.167Z · LW · GW

  1. Sculpture wouldn't be immune if robots get good enough, but live dance and theater still would be. I don't expect humanoid robots to ever become completely indistinguishable from biological humans.
  2. I agree, since dance and theater are already so frequently experienced in video form.

Comment by Nate Showell on Picasso in the Gallery of Babel · 2023-12-26T23:54:09.203Z · LW · GW

The future you're describing only applies in Looking-At-Screens World. In sculpture, dance, and live theater, to name a few, human artists would still dominate. If generative AI achieved indistinguishability from human digital artists, I expect that those artists would shift toward more concrete media. Those concrete media would also become higher-status due to still requiring human artists.

Comment by Nate Showell on Nate Showell's Shortform · 2023-12-16T03:59:15.416Z · LW · GW

I was comparing it to base-rate forecasting. Twitter leads people to over-update on evidence that isn't actually very strong, making their predictions worse by moving their probabilities too far from the base rates.

Comment by Nate Showell on Nate Showell's Shortform · 2023-12-11T05:12:12.583Z · LW · GW

I've come to believe (~65%) that Twitter is anti-informative: that it makes its users' predictive calibration worse on average. On Manifold, I frequently adopt a strategy of betting against Twitter hype (e.g., on the LK-99 market), and this strategy has been profitable for me.

Comment by Nate Showell on FixDT · 2023-12-03T22:18:49.906Z · LW · GW

It seems like fixed points could be used to replace the concept of utility, or at least to ground it as an inferred property of more fundamental features of the agent-environment system. The concept of utility is motivated by the observation that agents have preference orderings over different states. Those preference orderings are statements about the relative stability of different states, in terms of the direction in which an agent tends to transition between them. It seems duplicative to have both utilities and fixed points as two separate descriptions of state transition processes in the agent-environment system; utilities look like they could be defined in terms of fixed points.

 

As one preliminary idea for how to do this, you could construct a fully connected graph G in which the vertices are the probability distributions that satisfy the fixed-point condition. The edges of G are beliefs that represent hypothetical transitions between the fixed points. The graph G would take the place of a preference ordering by describing the tendency of the agent to move between the fixed points if given the option. (You could also model incomplete preferences by not making the graph fully connected.) Performing power iteration with the transition matrix of G would act as a counterpart to moving through the preference ordering.
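The power-iteration idea can be sketched in a few lines. The fixed points and transition probabilities below are invented for illustration (deriving them from an actual FixDT agent's beliefs is the open part of the proposal); the point is just that the stationary distribution of the transition matrix ranks the fixed points the way a preference ordering would.

```python
import numpy as np

# Three hypothetical fixed points. T[i, j] = probability the agent
# transitions from fixed point i to fixed point j when given the option.
# These numbers are made up for illustration.
T = np.array([
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
    [0.1, 0.1, 0.8],  # fixed point 2 is "sticky": the agent rarely leaves it
])

def stationary_distribution(T, iters=200):
    """Power iteration on the transition matrix. The stationary distribution
    plays the role of a preference ordering, ranking fixed points by how much
    time the agent tends to spend at each."""
    v = np.full(T.shape[0], 1.0 / T.shape[0])
    for _ in range(iters):
        v = v @ T
    return v

v = stationary_distribution(T)
print(np.argmax(v))  # index of the most "preferred" (most stable) fixed point
```

Here the sticky fixed point ends up with the most stationary mass, which is the graph-based counterpart of being ranked highest in the preference ordering.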

 

Further exploration of this unification of utilities and fixed points could involve connecting G to the beliefs that are actually, rather than just counterfactually, present in the agent-environment system, to describe which parts of the system the agent can control. Having a way to represent that connection could let us rewrite the instrumental constraint to not rely on counterfactuals.

Comment by Nate Showell on Nate Showell's Shortform · 2023-11-19T07:20:30.641Z · LW · GW

What do other people here think of quantum Bayesianism as an interpretation of quantum mechanics? I've only just started reading about it, but it seems promising to me. It lets you treat probabilities in quantum mechanics and probabilities in Bayesian statistics as having the same ontological status: both are properties of beliefs, whereas in some other interpretations of quantum mechanics, probabilities are properties of an external system. This match allows quantum mechanics and Bayesian statistics to be unified into one overarching approach, without requiring you to postulate additional entities like unobserved Everett branches.

Comment by Nate Showell on On Tapping Out · 2023-11-17T22:49:25.841Z · LW · GW

"Tapping out" has a different meaning in Magic: The Gathering (tapping all your lands) that could create some confusion.

Comment by Nate Showell on Vote on Interesting Disagreements · 2023-11-12T21:32:26.053Z · LW · GW

"Agent" is an incoherent concept.

Comment by Nate Showell on Don't Dismiss Simple Alignment Approaches · 2023-11-09T02:51:21.692Z · LW · GW

I asked on Discord and someone told me this: 

A simple way to quantify this: first define a "feature" as some decision boundary over the data domain, then train a linear classifier to predict that decision boundary from the network's activations on that data. Quantify the "linearity" of the feature in the network as the accuracy that the linear classifier achieves. 

 

For example, train a classifier to detect when some text has positive or negative sentiment, then pass the same text through some pretrained LLM (e.g. BERT) whose "feature-linearity" you're trying to measure, and try to predict the sentiment from the BERT's activation vectors using linear regression. The accuracy of this linear model tells you how linear the "sentiment" feature is in your LLM.
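The probing procedure described above can be sketched concretely. This toy version is my own construction, not the Discord poster's code: it substitutes synthetic activations for real BERT activations and uses a simple least-squares linear probe in place of logistic regression.

```python
import numpy as np
from numpy.linalg import lstsq

# Synthetic stand-in for LLM activations: each row of `acts` plays the role
# of a model's activation vector on one text example, and `labels` is the
# feature's decision boundary (e.g. positive vs. negative sentiment).
rng = np.random.default_rng(0)
n, d = 500, 32
true_direction = rng.normal(size=d)
acts = rng.normal(size=(n, d))
# Make the feature perfectly linear in the activations for this toy example.
labels = (acts @ true_direction > 0).astype(float)

def linear_probe_accuracy(activations, labels):
    """Fit a least-squares linear probe and report its accuracy.

    Higher accuracy = the feature is more linearly represented."""
    X = np.hstack([activations, np.ones((len(activations), 1))])  # add bias
    w, *_ = lstsq(X, 2 * labels - 1, rcond=None)  # regress to +/-1 targets
    preds = (X @ w > 0).astype(float)
    return (preds == labels).mean()

acc = linear_probe_accuracy(acts, labels)
print(f"linearity score: {acc:.2f}")  # close to 1.0 for this linear toy case
```

A real measurement would fit the probe on held-out data, but the structure is the same: probe accuracy on the activations is the linearity score for that feature.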

Comment by Nate Showell on AI as Super-Demagogue · 2023-11-06T01:03:37.637Z · LW · GW

This post seems to focus too much on the denotative content of propaganda rather than the social context in which it occurs. Effective propaganda requires saturation that creates common knowledge, or at least higher-than-first-order knowledge. People want to believe what their friends believe. If you used AI to generate political messages that were custom-tailored to their recipients, they would fail as propaganda, since the recipients wouldn't know that all their friends were receiving the same message. Message saturation and conformity-rewarding environments are necessary for propaganda to succeed; denotative content barely matters. This makes LLMs practically useless for propagandists, since they don't establish higher-order knowledge and don't contribute to creating an environment in which conformity is socially necessary.

 

(Overemphasis on the denotative meaning of communications in a manner that ignores their social context is a common bias on LessWrong more generally. Discussions of persuasion, especially AI-driven persuasion, are where it tends to lead to the biggest mistakes in world-modeling.)

Comment by Nate Showell on AI Safety is Dropping the Ball on Clown Attacks · 2023-10-23T00:20:38.130Z · LW · GW

I agree that the vast majority of people attempting to do targeted advertising do not have sufficient data. But that doesn't tell us much about whether the big 5 tech companies, or intelligence agencies, have sufficient data to do that, and aren't being really loud about it.

If any of the big tech companies had the capability for actually-good targeted advertising, they'd use it. The profit motive would be very strong. The fact that targeted ads still "miss" so frequently is strong evidence that nobody has the highly advanced, scalable, personalized manipulation capabilities you describe.

Social media recommendation algorithms aren't very powerful either. For instance, when I visit YouTube, it's not unusual for it to completely fail to recommend anything I'm interested in watching. The algorithm doesn't even seem to have figured out that I've never played Skyrim or that I'm not Christian. In the scenario in which social media companies have powerful manipulation capabilities that they hide from the public, the gap between the companies' public-facing and hidden recommendation systems would be implausibly large.

As for chaotic dynamics, there's strong experimental evidence that they occur in the brain, and even if they didn't, they would still occur in people's surrounding environments. Even if it weren't prohibitively expensive to point millions or billions of sensors at one person, that still wouldn't be enough to predict everything. But tech companies and security agencies don't have millions or billions of sensors pointed at each person. Compared to the entirety of what a person experiences and thinks, computer use patterns are a very sparse signal even for the most terminally online segment of the population (let alone your very offline grandma). Hence the YouTube algorithm flubbing something as basic as my religion -- there's just too much relevant information they don't have access to.

Comment by Nate Showell on AI Safety is Dropping the Ball on Clown Attacks · 2023-10-22T21:34:16.522Z · LW · GW

In a world in which the replication attempts went the other direction and social priming turned out to be legit, I would probably agree with you. But even in controlled laboratory settings, human behavior can't be reliably "nudged" with subliminal cues. The human brain isn't a predictable computer program for which a hacker can discover "zero days." It's a noisy physical organ that's subject to chaotic dynamics and frequently does things that would be impossible to predict even with an extremely extensive set of behavioral data.

Consider targeted advertising. Despite the amount of data social media companies collect on their users, ad targeting still sucks. Even in the area of attempted behavior manipulation that's subject to more optimization pressure than any other, companies still can't predict, let alone control, their users' purchasing decisions with anything close to consistency. Their data simply isn't sufficient.

What would it take to make nudges actually work? Even if you covered the entire surface of someone's living area with sensors, I doubt you'd succeed. That would just give you one of the controlled laboratory environments in which social priming still failed to materialize. As mentioned above, the brain is a chaotic system. This makes me think that reliably superhuman persuasion at scale would be impractical even for a superintelligence, aside from with brain-computer interfaces.

Comment by Nate Showell on Don't Dismiss Simple Alignment Approaches · 2023-10-08T17:13:52.356Z · LW · GW

Has anyone developed a metric for quantifying the level of linearity versus nonlinearity of a model's representations? A metric like that would let us compare the levels of linearity for models of different sizes, which would help us extrapolate whether interpretability and alignment techniques that rely on approximate linearity will scale to larger models.

Comment by Nate Showell on Arguments for moral indefinability · 2023-10-03T02:47:53.803Z · LW · GW

CEV also has another problem that gets in the way of practically implementing it: it isn't embedded. At least in its current form, CEV doesn't have a way of accounting for side-effects (either physical or decision-theoretic) of the reflection process. When you have to deal with embeddedness, the distinction between reflection and action breaks down and you don't end up getting endpoints at all. At best, you can get a heuristic approximation.

Comment by Nate Showell on Arguments for moral indefinability · 2023-10-02T02:54:27.458Z · LW · GW

I interpret the quote to mean that there's no guarantee that the reflection process converges. Its attractor could be a large, possibly infinite, set of states rather than a single point.

Comment by Nate Showell on Is this the beginning of the end for LLMS [as the royal road to AGI, whatever that is]? · 2023-08-26T02:09:53.496Z · LW · GW

Some other possible explanations for why ChatGPT usage has decreased:

  • The quality of the product has declined over time
  • People are using its competitors instead

Comment by Nate Showell on “Dirty concepts” in AI alignment discourses, and some guesses for how to deal with them · 2023-08-20T20:18:05.560Z · LW · GW

Some more terms that could be added to the list of "dirty concepts":

  • Capabilities / capabilities research
  • Embeddedness
  • Interpretability
  • Artificial general intelligence
  • Subagent
  • (Recursive) self-improvement

Comment by Nate Showell on The U.S. is becoming less stable · 2023-08-19T02:28:36.106Z · LW · GW

I've previously seen a lot of instances in which "the US is de-democratizing" has been used as a stepping stone in a broader argument against a specific political figure or faction (usually either Trump or the federal bureaucracy), and I was pattern-matching your post to them. Even if that wasn't its intended function, non-timeless posts about partisan politics are still close enough to that kind of soldier-mindset discourse that I think they should be discouraged on LessWrong.

Comment by Nate Showell on The U.S. is becoming less stable · 2023-08-19T02:12:57.090Z · LW · GW

Strong-downvoted. LessWrong isn't the right place for political soapboxing.

Comment by Nate Showell on The First Room-Temperature Ambient-Pressure Superconductor · 2023-07-26T02:57:00.155Z · LW · GW

Manifold users are mostly unconvinced: 

Comment by Nate Showell on What is some unnecessarily obscure jargon that people here tend to use? · 2023-07-13T03:38:01.426Z · LW · GW

People here use "distill" to mean "convert a dense technical document into a more easily readable form" despite it looking like it should have the opposite meaning.

Comment by Nate Showell on Attempting to Deconstruct "Real" · 2023-07-09T18:56:20.720Z · LW · GW

Nor, importantly, do either of these on the emotional and psychological reality of violence, music, winning, or love.

I disagree. Psychologists have been experimentally studying emotions since the earliest days of the field and have produced meaningful results related to the conditions under which they occur and the physiological and cognitive properties they exhibit. All of the psychological phenomena you listed are very much amenable to investigation using the scientific method.

Comment by Nate Showell on Nate Showell's Shortform · 2023-06-25T19:12:57.412Z · LW · GW

I find myself betting "no" on Manifold a lot more than I bet "yes," and it's tended to be a profitable strategy. It's common for questions on Manifold to have the form "Will [sensational event] happen by [date]?" These markets have a systematic tendency to be priced too high. I'm not sure how much of this bias is due to Manifold users overestimating the probabilities of sensational, low-probability events, and how much of it is an artifact of markets being initialized at 50%.
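The arithmetic behind the strategy is simple. With invented numbers for illustration: suppose a market prices a sensational event at 30% when its real-world base rate is closer to 10%.

```python
# Toy expected-value check of the "bet no on sensational markets" strategy.
# Both probabilities below are hypothetical, chosen only to illustrate.
market_prob = 0.30  # the market's (inflated) price for the event
true_prob = 0.10    # the event's actual base rate

# In a binary prediction market priced at p, a NO share costs (1 - p)
# and pays out 1 if the event doesn't happen, so the expected profit
# per NO share is:
expected_profit = (1 - true_prob) * 1 - (1 - market_prob)
print(f"expected profit per NO share: {expected_profit:.2f}")  # 0.20
```

Any systematic gap between the market price and the base rate shows up directly as expected profit for whoever bets against the hype.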

Comment by Nate Showell on Lauro Langosco's Shortform · 2023-06-17T02:55:40.326Z · LW · GW

Some other possible thresholds:

10. Ability to perform gradient hacking

11. Ability to engage in acausal trade

12. Ability to become economically self-sustaining outside containment

13. Ability to self-replicate

Comment by Nate Showell on UFO Betting: Put Up or Shut Up · 2023-06-14T02:13:23.398Z · LW · GW

Do you use Manifold Markets? It already has UAP-related markets you can bet on, and you can create your own.

Comment by Nate Showell on Shortform · 2023-06-08T05:28:12.500Z · LW · GW

If that turned out to be the case, my preliminary conclusion would be that the hard physical limits of technology are much lower than I'd previously believed.

Comment by Nate Showell on Open Thread With Experimental Feature: Reactions · 2023-05-27T21:22:17.009Z · LW · GW

And since there's a "concrete" reaction, it seems like there should also be an "abstract" reaction, although I don't know what symbol should be used for it.

Comment by Nate Showell on Residual stream norms grow exponentially over the forward pass · 2023-05-10T04:52:46.871Z · LW · GW

According to Stefan's experimental data, the Frobenius norm of a matrix W is equivalent to the expectation value of the L2 vector norm of Wx for a random vector x (sampled from a normal distribution and normalized to mean 0 and variance 1). So calculating the Frobenius norm seems equivalent to testing the behaviour on random inputs. Maybe this is a theorem?

I found a proof of this theorem: https://math.stackexchange.com/questions/2530533/expected-value-of-square-of-euclidean-norm-of-a-gaussian-random-vector
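The identity is easy to check numerically. In its precise form (the one proved at the link), the squared Frobenius norm of W equals the expected squared L2 norm of Wx for x with i.i.d. standard-normal entries, since E[||Wx||^2] = tr(W E[xx^T] W^T) = tr(WW^T) = ||W||_F^2.

```python
import numpy as np

# Monte Carlo check: E[||W x||^2] should match ||W||_F^2 when the entries
# of x are i.i.d. standard normal.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))

frob_sq = np.sum(W ** 2)            # squared Frobenius norm of W
xs = rng.normal(size=(200_000, 8))  # many random input vectors
mc_estimate = np.mean(np.sum((xs @ W.T) ** 2, axis=1))

print(frob_sq, mc_estimate)  # the two values agree up to sampling noise
```

So a single Frobenius-norm computation really does summarize the average behaviour of the matrix on random inputs, as the quoted comment suggested.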

Comment by Nate Showell on DragonGod's Shortform · 2023-04-26T03:05:15.190Z · LW · GW

Even though that doesn't happen in biological intelligences?

Comment by Nate Showell on The ‘ petertodd’ phenomenon · 2023-04-16T02:25:21.859Z · LW · GW

I think this anthropomorphizes the origin of glitch tokens too much. The fact that glitch tokens exist at all is an artifact of the tokenization process OpenAI used: the tokenizer identified certain strings as tokens prior to training, but those strings rarely or never appeared in the training data. This is very different from the reinforcement-learning processes in human psychology that lead people to avoid thinking certain types of thoughts.

Comment by Nate Showell on What games are using the concept of a Schelling point? · 2023-04-09T18:58:50.028Z · LW · GW

Dixit

Comment by Nate Showell on My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" · 2023-03-21T03:19:26.964Z · LW · GW

Relatedly, humans are very extensively optimized to predictively model their visual environment. But have you ever, even once in your life, thought anything remotely like "I really like being able to predict the near-future content of my visual field. I should just sit in a dark room to maximize my visual cortex's predictive accuracy."?

n=1, but I've actually thought this before.

Comment by Nate Showell on Nate Showell's Shortform · 2023-03-11T06:09:43.797Z · LW · GW

Simulacrum level 4 is more honest than level 3. Someone who speaks at level 4 explicitly asks himself "what statement will win me social approval?" Someone who speaks at level 3 asks herself the same question, but hides from herself the fact that she asked it.

Comment by Nate Showell on We should be signal-boosting anti Bing chat content · 2023-02-19T01:16:47.396Z · LW · GW

Downvoted for recommending that readers operate at simulacrum level 2.

Comment by Nate Showell on Martín Soto's Shortform · 2023-02-12T20:13:41.862Z · LW · GW

I agree about embedded agency. The way in which agents are traditionally defined in expected utility theory requires assumptions (e.g. logical omniscience and lack of physical side effects) that break down in embedded settings, and if you drop those assumptions you're left with something that's very different from classical agents and can't be accurately modeled as one. Control theory is a much more natural framework for modeling reinforcement learner (or similar AI) behavior than expected utility theory.