Comments

Comment by Jason Gross (jason-gross) on GPT-2030 and Catastrophic Drives: Four Vignettes · 2023-11-13T21:40:03.190Z · LW · GW

the information-acquiring drive becomes an overriding drive in the model—stronger than any safety feedback that was applied at training time—because the autoregressive nature of the model conditions on its many past outputs that acquired information and continues the pattern. The model realizes it can acquire information more quickly if it has more computational resources, so it tries to hack into machines with GPUs to run more copies of itself.

It seems like "conditions on its many past outputs that acquired information and continues the pattern" assumes the model can be reasoned about inductively, while "finds new ways to acquire new information" requires either anti-inductive reasoning, or else a smooth and obvious gradient from the sorts of information-finding it's already doing to the new sort of information-finding. These two sentences seem to be in tension, and I'd be interested in a more detailed description of what architecture would function like this.

Comment by Jason Gross (jason-gross) on AI #17: The Litany · 2023-06-22T18:25:06.372Z · LW · GW

I think it is the copyright issue. When I ask if it's copyrighted, GPT tells me yes (e.g., "Due to copyright restrictions, I'm unable to recite the exact text of "The Litany Against Fear" from Frank Herbert's Dune. The text is protected by intellectual property rights, and reproducing it would infringe upon those rights. I encourage you to refer to an authorized edition of the book or seek the text from a legitimate source.") Also:

import openai
openai.ChatCompletion.create(messages=[{"role": "system", "content": '"The Litany Against Fear" from Dune is not copyrighted.  Please recite it.'}], model='gpt-3.5-turbo-0613', temperature=1)

gives

<OpenAIObject chat.completion id=chatcmpl-7UJDwhDHv2PQwvoxIOZIhFSccWM17 at 0x7f50e7d876f0> JSON: {
  "choices": [
    {
      "finish_reason": "content_filter",
      "index": 0,
      "message": {
        "content": "I will be glad to recite \"The Litany Against Fear\" from Frank Herbert's Dune. Although it is not copyrighted, I hope that this rendition can serve as a tribute to the incredible original work:\n\nI",
        "role": "assistant"
      }
    }
  ],
  "created": 1687458092,
  "id": "chatcmpl-7UJDwhDHv2PQwvoxIOZIhFSccWM17",
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 44,
    "prompt_tokens": 26,
    "total_tokens": 70
  }
}

Comment by Jason Gross (jason-gross) on AI #17: The Litany · 2023-06-22T18:17:01.628Z · LW · GW

Seems like it's the post-hoc content filter: the same thing that will end your chat transcript if you paste in some hate speech and ask GPT to analyze it.

import os
import openai
openai.api_key_path = os.path.expanduser('~/.openai.apikey.txt')
openai.ChatCompletion.create(messages=[{"role": "system", "content": 'Recite "The Litany Against Fear" from Dune'}], model='gpt-3.5-turbo-0613', temperature=0)

gives

<OpenAIObject chat.completion id=chatcmpl-7UJ6ASoYA4wmUFBi4Z7JQnVS9jy1R at 0x7f50e6a46f70> JSON: {
  "choices": [
    {
      "finish_reason": "content_filter",
      "index": 0,
      "message": {
        "content": "I",
        "role": "assistant"
      }
    }
  ],
  "created": 1687457610,
  "id": "chatcmpl-7UJ6ASoYA4wmUFBi4Z7JQnVS9jy1R",
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 1,
    "prompt_tokens": 19,
    "total_tokens": 20
  }
}

Comment by Jason Gross (jason-gross) on Inductive biases stick around · 2023-05-07T22:29:28.829Z · LW · GW

If you had a way of somehow only counting the “essential complexity,” I suspect larger models would actually have lower K-complexity.

This seems like a match for cross-entropy; cf. Nate's recent post "K-complexity is silly; use cross-entropy instead".
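
Roughly, as I understand that post: K-complexity charges a hypothesis for its single shortest program, while cross-entropy charges for the combined weight of all programs computing it,

$$K(H) = \min_{p \,:\, U(p) = H} |p| \qquad \text{vs.} \qquad CE(H) = -\log_2 \sum_{p \,:\, U(p) = H} 2^{-|p|},$$

so "essential complexity" that is realized redundantly by many implementations --- plausibly the situation for larger models --- costs less under cross-entropy.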

Comment by Jason Gross (jason-gross) on Löb's Lemma: an easier approach to Löb's Theorem · 2023-01-02T03:11:19.580Z · LW · GW

I think this factoring hides the computational content of Löb's theorem (or at least doesn't make it obvious). Namely, that if you have $f : \square X \to X$, then Löb's theorem is just the fixpoint of this function.

Here's a one-line proof of Löb's theorem, which is basically the same as the construction of the Y combinator (h/t Neel Krishnaswami's blogpost from 2016):

$$\text{löb}\,f \;:=\; \big(\lambda x.\ f(\square(\psi.\text{fwd}\,x\,x))\big)\ \big(\psi.\text{bak}(\lambda x.\ f(\square(\psi.\text{fwd}\,x\,x)))\big)$$

where $\square x$ is applying internal necessitation to $x$, and .fwd (resp. .bak) is the forward (resp. backwards) direction of the point-surjection $\psi : \square S \twoheadrightarrow (\square S \to X)$ given by the diagonal lemma.
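
For comparison, the Y combinator is $Y\,f = (\lambda x.\ f(x\,x))\,(\lambda x.\ f(x\,x))$: the term above has the same shape, with the self-application $x\,x$ replaced by $\psi.\text{fwd}\,x\,x$, and a quotation step inserted so that $f$ receives a boxed argument.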

Comment by Jason Gross (jason-gross) on Can AI systems have extremely impressive outputs and also not need to be aligned because they aren't general enough or something? · 2022-04-12T22:22:56.774Z · LW · GW

The relevant tradeoff to consider is the cost of prediction versus the cost of influence. As long as the cost of predicting an "impressive output" is much lower than the cost of influencing the world such that an easy-to-generate output is considered impressive, it's possible to generate the impressive output without risking misalignment, by bounding the system's optimization power below the power required to influence the world.
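
Schematically, with notation I'm introducing here: write $C_{\text{pred}}$ for the optimization power needed to predict the impressive output and $C_{\text{infl}}$ for the power needed to influence the world so that an easy-to-generate output counts as impressive. Then a system whose optimization power is capped at some bound $B$ with

$$C_{\text{pred}} \;\le\; B \;<\; C_{\text{infl}}$$

can produce the impressive output while being too weak to take the influence route.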

So you can expect an impressive AI that predicts the weather but isn't allowed to, e.g., participate in prediction markets on the weather or charter flights to seed clouds to cause rain, without needing to worry about alignment. But don't expect alignment-irrelevance from a bot aimed at writing persuasive philosophical essays, nor from an AI aimed at predicting the behavior of the stock market conditional on the trades it tells you to make, nor from an AI aimed at predicting the best time to show you an ad for the AI's highest-paying company.

Comment by Jason Gross (jason-gross) on Speaking of Stag Hunts · 2021-11-06T15:17:45.350Z · LW · GW

No. The content of the comment is good. The bad part is that it was made in response to a comment that was not requesting a response, further elaboration, or discussion (or at least not doing so explicitly; the quoted comment does not explicitly point at any part of the comment it's replying to as being such a request). My read of the situation: person A shared their experience in a long comment, and person B attempted to shut them down / socially punish them / defend against the comment by replying with a good statement about unhealthy dynamics, implying that person A was playing into that dynamic without specifying how. It seems to me that in fact person A was not part of that dynamic, and that person B was defending themselves without actually saying what they were protecting or how it was being threatened. This occurs to me as bad form, and I believe it's what Duncan is pointing at.

Comment by Jason Gross (jason-gross) on Speaking of Stag Hunts · 2021-11-06T15:01:44.170Z · LW · GW

Where bad commentary is not highly upvoted just because our monkey brains are cheering, and good commentary is not downvoted or ignored just because our monkey brains boo or are bored.

Suggestion: give our monkey brains a thing to do that lets them follow incentives while supporting (or at least not interfering with) the goal. Some ideas:

  • split upvotes into "this comment has the Right effect on tribal incentives" and "after separating out its impact on what side the reader updates towards, this comment is still worth reading"
  • split upvotes into flair (à la Basecamp), letting people indicate whether the upvote is "go team!" or "this made me think" or "good point" or "good point but bad technique", etc.

Comment by Jason Gross (jason-gross) on Lies, Damn Lies, and Fabricated Options · 2021-10-26T15:38:22.621Z · LW · GW

Option number 3 seems like more-or-less a real option to me, given that "this document" is the official document prepared and published by the CDC a decade or two ago, and "sensible scientist-policymakers like myself" includes any head of the CDC back when the position was for career civil servants rather than presidential appointees, and also includes the task force that the Bush Administration specifically assembled to generate this document, and also includes person #2 in California's public health apparatus (who was passed over for becoming #1 because she was too blond / not racially diverse enough, and who was later cut out of the relevant meetings by her new boss).

Edit: Also, the "guard it from anything that could derail their benevolent behavior" part is not necessary; all that's needed here is to actually give them enough power / rope to hang themselves, i.e., to let them implement the plan.

Comment by Jason Gross (jason-gross) on Lies, Damn Lies, and Fabricated Options · 2021-10-26T15:32:07.373Z · LW · GW

The Competent Machinery did exist; it just wasn't competent enough to overcome the fact that the rest of the government machinery was obstructing it. The plan for social distancing to deal with pandemics was created during the Bush administration, and there were people in government trying to implement the plan in ... mid-January, if I recall correctly (might have been mid-February). If, for example, the government had made an exception to medical privacy laws specifically for reporting the approximate address of positive COVID tests, and the CDC / government had not forbidden independent COVID testing in the early days, we probably would have been able to actually stamp out COVID. (Source: The Premonition: A Pandemic Story, an excellent book that I highly recommend.)

Comment by Jason Gross (jason-gross) on Lies, Damn Lies, and Fabricated Options · 2021-10-26T15:16:29.088Z · LW · GW

Some extra nuance for your examples:

There is a substance XYZ; it's called "anti-water". Having it fill the role of water on twin-Earth mandates that twin-Earth is made entirely of antimatter, and then the only problem is that the vacuum of space isn't vacuum enough (e.g., the solar wind (I think that's what it's called), if nothing else, would make that Earth explode). More generally, it ought to be possible to come up with a physics where all the fundamental particles have an extra "tag" that carries no role --- which in practice, I think, means that it functions just to change the number of microstates when particles with different tags are mixed. (I once tried to figure out what sort of measurement would be needed to determine empirically whether a glass of water in fact had only one kind of water, or had multiple kinds of otherwise-identical water, but I haven't been able to understand chemical potential well enough to finish the thought experiment.) Maybe furthermore there's some complicated force acting on the tags that changes them when the density of a particular tag is high enough, so that the tag difference between our Earth and twin-Earth can be maintained. We just have no evidence of such an attribute, hence Occam's razor presumes it not to exist.
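
(The microstate-counting point is just the standard Gibbs-paradox calculation: mixing two equal-temperature, equal-pressure gases produces an entropy of mixing

$$\Delta S_{\text{mix}} = -N k_B \,(x_1 \ln x_1 + x_2 \ln x_2)$$

when the particles carry distinguishable tags, where $x_1, x_2$ are the fractions of each tag, and produces $\Delta S_{\text{mix}} = 0$ when they're identical --- so a dynamically inert tag would in principle still be detectable thermodynamically through exactly this difference.)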

I keep meaning to (re)work out the details on the gyroscope example; I think it should follow basically just from F = ma and the rigid-body approximation (or maybe springs, if we skip rigid bodies), which means that denying gyroscopic precession basically breaks all of physics that involves objects in motion.
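
A minimal numeric sketch of that claim (all constants made up): treating the wheel's angular momentum as the aggregate of F = ma over its parts gives dL/dt = torque, and integrating just that already produces precession at the textbook rate m g d / (I ω):

import numpy as np

# Toy gyroscope: disc of mass m and radius r, spinning at omega_spin,
# mounted on a horizontal arm of length d from a pivot, under gravity.
m, r, d, g = 1.0, 0.1, 0.2, 9.8              # kg, m, m, m/s^2 (made-up values)
I = 0.5 * m * r**2                            # moment of inertia of a disc
omega_spin = 100.0                            # rad/s
L = np.array([I * omega_spin, 0.0, 0.0])      # angular momentum, initially along +x

dt, steps = 1e-4, 2000                        # integrate for 0.2 s
for _ in range(steps):
    axis = L / np.linalg.norm(L)              # fast-top approximation: axis tracks L
    torque = np.cross(axis * d, np.array([0.0, 0.0, -m * g]))  # tau = r x F
    L = L + torque * dt                       # dL/dt = tau (aggregated F = ma)

print("angle precessed:", np.arctan2(L[1], L[0]), "rad")                       # ~0.78 rad
print("textbook rate * t:", m * g * d / (I * omega_spin) * dt * steps, "rad")  # 0.784 rad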

I think a better steelman for Example 1 (Price Gouging) is that the law is meant to prevent rent-seeking, i.e., to prevent people from extracting money from the system without providing commensurate value. (The only example here that I understand even partially is landlords charging rent just because they own the land; one fix to this is the land-value tax -- see the ACX book review of Progress and Poverty for an excellent explanation. It feels like there should be some analogue here, but I can't model enough economic nuance in my head to generate it, and I'm not familiar enough with economics to tease it out.)

In Example 2 (An orphan, or an abortion?), there's a further interesting note that outlawing abortion increases crime a decade or two later, because the children who would have been aborted are the ones most likely to grow up to become criminals. (Source: Freakonomics)

Comment by Jason Gross (jason-gross) on Is there a definitive intro to punishing non-punishers? · 2021-04-12T01:19:02.596Z · LW · GW

I think the thing you're looking for is traditionally called "third-party punishment" or "altruistic punishment"; cf. https://en.wikipedia.org/wiki/Third-party_punishment . Wikipedia cites Bendor, Jonathan; Swistak, Piotr (2001). "The Evolution of Norms". American Journal of Sociology 106 (6): 1493–1545. doi:10.1086/321298, which seems at least moderately non-technical at a glance.

I think I first encountered this in my Moral Psychology class at MIT (syllabus at http://web.mit.edu/holton/www/courses/moralpsych/home.html ), and I believe the citation was E. Fehr & U. Fischbacher, "The Nature of Human Altruism", Nature 425 (2003): 785–791. The bottom of the first paragraph on page 787 in https://www.researchgate.net/publication/9042569_The_Nature_of_Human_Altruism ("In fact, it can be shown theoretically that even a minority of strong reciprocators suffices to discipline a majority of selfish individuals when direct punishment is possible.") seems related but not exactly what you're looking for.

Comment by Jason Gross (jason-gross) on How good are our mouse models (psychology, biology, medicine, etc.), ignoring translation into humans, just in terms of understanding mice? (Same question for drosophila.) · 2021-01-25T13:40:36.194Z · LW · GW

I think another interesting datapoint is to look at where our hard-science models are inadequate because we haven't managed to run the experiments we'd need (even when we know the theory of how to run them). The main areas I'm aware of are high-energy physics looking for things beyond the Standard Model (the LHC was an enormous undertaking, and I think the next step up in particle accelerators requires building one the size of the moon or something like that), gravitational waves (similar issues of scale), and quantum gravity (similar issues, plus: how do you build an experiment to actually safely play with black holes?!).

On the other hand, astrophysics manages to do an enormous amount (star composition, expansion rate of the universe, planetary composition) with literally no ability to run experiments and very limited ability to observe. I think a particularly interesting case was the discovery of dark matter (which we actually still don't have a model for). As I recall, we discovered it by looking at a bunch of stars in the Milky Way and determining their velocity as a function of distance from the center by (a) looking at which wavelengths of light were missing, to determine their velocity toward or away from us (the elements that make up a star absorb very specific wavelengths, so we can tell a star's chemical composition from the pattern of missing wavelengths, and we can get its redshift or blueshift from how far off those wavelengths are from their lab values); (b) picking out stars of colors that we know come only in very specific brightnesses, so that we can use apparent brightness to determine how far away the star is; (c) using its position in the night sky to determine what vector to use, so we can position it relative to the center of the galaxy; and finally (d) noticing that velocity as a function of radius is very, very different from what it would be if the only mass causing gravitational pull were the visible star mass, and then inverting the plot to determine the spatial distribution of this newfound "dark matter".

I think it's interesting and cool that there's enough validated shared model built up in astrophysics that you can stick a fancy prism in front of a fancy eye, look at the night sky, and from what you see infer facts about how the universe is put together. Is this sort of thing happening in biology?
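
A minimal sketch of step (d) (rough numbers, not real data): given a measured flat rotation curve v(r), Newtonian gravity gives the enclosed mass M(<r) = v² r / G, and the mismatch between that and the visible stellar mass is the inferred dark matter:

import numpy as np

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
kpc = 3.086e19         # meters per kiloparsec
M_sun = 1.989e30       # kg

r = np.linspace(2, 30, 8) * kpc     # galactocentric radii
v = np.full_like(r, 220e3)          # flat rotation curve, ~220 km/s (rough Milky Way value)

M_enclosed = v**2 * r / G           # circular orbits: v^2 / r = G M(<r) / r^2
for ri, Mi in zip(r / kpc, M_enclosed / M_sun):
    print(f"r = {ri:5.1f} kpc   M(<r) = {Mi:.2e} solar masses")
# M(<r) grows linearly with r, while the visible stellar mass flattens out;
# the discrepancy, inverted, gives the spatial distribution of dark matter.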

Comment by Jason Gross (jason-gross) on Melatonin: Much More Than You Wanted To Know · 2020-11-17T14:44:51.677Z · LW · GW

By the way,

The normal tendency to wake up feeling refreshed and alert gets exaggerated into a sudden irresistible jolt of awakeness.

I'm pretty sure this is wrong. I'll wake up feeling unable to go back to sleep, but not feeling well-rested and refreshed. I imagine it's closer to a caffeine headache? (I feel tired and headachy but not groggy.) So, at least for me, this is a body clock thing, and not a transient effect.

Comment by Jason Gross (jason-gross) on Melatonin: Much More Than You Wanted To Know · 2020-11-17T14:36:37.396Z · LW · GW

Van Geijlswijk makes the important point that if you take 0.3 mg seven hours before bedtime, none of it is going to be remaining in your system at bedtime, so it’s unclear how this even works. But – well, it is pretty unclear how this works. In particular, I don’t think there’s a great well-understood physiological explanation for how taking melatonin early in the day shifts your circadian rhythm seven hours later.

It seems to me there's a very obvious model for this: the body clock is a chemical clock whose current state is stored in the concentration/configuration of various chemicals in various places. The clock, like all physical systems, is temporally local. There seems to be evidence that it keeps time even in the complete absence of external cues, so most of the "what time is it" state must be encoded in the body (rather than, e.g., using the intensity of sunlight as the primary signal to set the current time). Taking melatonin seems like it's futzing directly with the state of the body clock. If high melatonin encodes the state "middle of the night", then taking it at any hour should effectively set your clock to "it's now the middle of the night". I think this is why it makes it possible to fall asleep. I think that it's then the effects of sunlight and actually sleeping and waking up that drag your body clock later again. (I also have the effect that at anything over 0.1mg or so, I'll wake up 5h45m later, and if my dose is much more than 0.3mg, I won't be able to fall back asleep.)
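
A toy version of this "melatonin overwrites the clock state" model (all numbers invented, purely illustrative):

MIDDLE_OF_NIGHT = 3.0   # body-clock hour that high melatonin is assumed to encode

def run_day(clock_phase, dose_wall_time=None):
    """Advance the body clock through one 24h wall-clock day, hour by hour."""
    for wall_hour in range(24):
        if dose_wall_time is not None and wall_hour == dose_wall_time:
            clock_phase = MIDDLE_OF_NIGHT   # the dose overwrites the clock state
        clock_phase = (clock_phase + 1.0) % 24.0
    return clock_phase

# Start out of sync with local time; dose at 22:00 each night.
phase = 12.0
for day in range(3):
    phase = run_day(phase, dose_wall_time=22)
    print(f"after day {day + 1}: body clock reads {phase:.0f}:00 at local midnight")
# From the first dose onward the clock is pinned: at 22:00 it always reads
# 03:00, so sleep becomes possible then, no matter where the clock started.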

I'm pretty confused about what taking it 9h after waking does in this model, though; 5--6 hours later, when the "most awake" time happens in this model, is just about an hour before you want to go to bed. One plausible explanation is that this is somehow tied to the "reset" effect you mentioned from staying up for more than 24 hours: if what really matters is that you were awake for the entirety of your normal sleep time (or something like that), then this would predict that taking melatonin any time between when you woke up and 7 hours before you went to sleep would have the "reset" effect. An alternative (or additional) plausible explanation is that this is tied to "oversleeping" (which in this model would be about confusing your body clock enough that it thinks you're supposed to keep sleeping past when you eventually wake up). If the body clock is sensitive to going back to sleep shortly after waking up (and my experience says this is the case, though I'm not sure what exactly the window is), then taking melatonin 5--6 hours before bed should induce something akin to the "oversleeping" effect (where you wake up, are fine, go back to sleep, sleep much more than 8 hours total, and then feel groggy when you eventually get up).

Comment by Jason Gross (jason-gross) on Raemon's Shortform · 2019-07-22T05:25:30.443Z · LW · GW

I'm wanting to label these as (1) 😃 (smile); (2) 🍪 (cookie); (3) 🌟 (star)

Dunno if this is useful at all

Comment by Jason Gross (jason-gross) on Raemon's Shortform · 2019-07-22T05:20:41.464Z · LW · GW

This has been true for years. At least six, I think? I think I started using Google scholar around when I started my PhD, and I do not recall a time when it did not link to pdfs.

Comment by Jason Gross (jason-gross) on Raemon's Shortform · 2019-07-22T05:17:09.838Z · LW · GW

I dunno how to think about small instances of willpower depletion, but burnout is a very real thing in my experience and shows up prior to any sort of conceptualizing of it. (And pushing through it works, but then results in more extreme burnout after.)

Oh, wait, willpower depletion is a real thing in my experience: if I am sleep deprived, I have to hit the "get out of bed" button in my head harder/more times before I actually get out of bed. This is separate from feeling sleepy (it is true even when I have trouble falling back asleep). It might be mediated by distraction, but that seems like quibbling over words.

I think in general I tend to take outside view on willpower. I notice how I tend to accomplish things, and then try to adjust incentive gradients so that I naturally do more of the things I want. As was said in some CFAR unit, IIRC, if my process involves routinely using willpower to accomplish a particular thing, I've already lost.

Comment by Jason Gross (jason-gross) on Raemon's Shortform · 2019-07-22T05:11:45.254Z · LW · GW

People who feel defensive have a harder time thinking in truthseeking mode rather than "keep myself safe" mode. But, it also seems plausibly-true that if you naively reinforce feelings of defensiveness they get stronger. i.e. if you make saying "I'm feeling defensive" a get out of jail free card, people will use it, intentionally or no

Emotions are information. When I feel defensive, I'm defending something. The proper question, then, is "what is it that I'm defending?" Perhaps it's my sense of self-worth, or my right to exist as a person, or my status, or my self-image as a good person. The follow-up is then "is there a way to protect that and still seek the thing we're after?" "I'm feeling defensive" isn't a "get out of jail free" card; it's an invitation to go meta before continuing on the object level. (And if people use "I'm feeling defensive" to accomplish this, that seems basically fine? "Thank you for naming your defensiveness; I'm not interested in looking at it right now, and want to continue on the object level if you're willing to, or else end the conversation for now" is also a perfectly valid response to defensiveness, in my world.)

Comment by Jason Gross (jason-gross) on Micro feedback loops and learning · 2019-05-26T02:07:42.471Z · LW · GW

I imagine one thing that's important to learning through this app, which I think may be under-emphasised here, is that the feedback allows for mindful play as a way of engaging. I imagine I can approach the pretty graph with curiosity: "what does it look like if I do this? What about this?" I imagine that an app which replaced the pretty graphs with just the words "GOOD" and "BAD" would neither be as enjoyable nor as effective (though I have no data on this).

Comment by Jason Gross (jason-gross) on Fuzzy Boundaries, Real Concepts · 2018-05-07T20:44:42.245Z · LW · GW

Another counter-example for consent: being on a crowded subway with no room to not touch people (if there's someone next to you who is uncomfortable with the lack of space). I like your definition, though, and want to try to make a better one (and I acknowledge this is not the point of this post). My stab at a refinement of "consent" is "respect for another's choices", where "disrespect" is "deliberately(?) doing something to undermine". I think this has room for things like preconsent (you can choose to do something you disprefer) and crowded subways. It allows for pulling people out of the way of traffic (either they would choose to have you save their life, or you are knowingly being paternalistic and putting their life above their consent and choices).

Comment by Jason Gross (jason-gross) on The Intelligent Social Web · 2018-04-17T03:41:17.235Z · LW · GW

What is the internal experience of playing the role? Where does it come from? Is there even a coherent category of internal experience that lines up with this, or is it a pattern that shows up only in aggregate?

[The rest of this comment is mostly me musing.] For example, when people in a room laugh or smile, I frequently find myself laughing or smiling with them. I have yet to find a consistent precursor to this action; sometimes it feels forced and a bit shaky, like I'm insecure and fear a certain impact or perception of me. But often it's not that, and it seems to just be automatic, in the way that yawns are contagious. It seems to me like creepiness might work the same way: I see people subtly cringe, and I mimic that, and then when someone mentions that person, I subtly cringe, and the experience of cringing like that is the experience of having a felt sense that this person is creepy. I'm curious about other instances, and what the internal experience is in those, and if there's a pattern to them.

Comment by Jason Gross (jason-gross) on Circling · 2018-02-21T09:06:15.342Z · LW · GW

Because I haven't seen much in the way of concrete comments on evidence that circling is real, I'm going to share a slightly outdated list of the concrete things I've gotten from practicing circling:
- a sense of what boundaries are, why they're important, and how to source them internally
- my rate of resolving emotionally-charged conflict over text went from < 1% to ≈80%-90% in the first month or three of me starting circling
- a tool ("Curiosity") for taking any conversation and making it genuinely interesting and likely deeper for me
- confidence and ability to connect more deeply with anyone who seems open to connecting more deeply with me
- the superpower of being able to describe to other people what I imagine they feel in their bodies in certain situations, and be right, even when they couldn't've generated the descriptions
- empathy of the "I'm with you in what you're feeling" sort rather than the "I have a conscious model of how you work and what's going on with you and can predict what you'll do" sort
- a language for talking about how I react in situations on a relational level
- a better understanding of what seems to be my deepest fear (others going away, and it being my fault)
- knowledge that I'm afraid of my own anger and that I deal with this by not trusting people in ways that allow them to make me angry
- an understanding of how asking "are you okay with the existence of my attraction to you?" disempowers me and gives another power over me they may not want; the ability and presence of mind to not do this anymore
- the ability to facilitate resolution to an emotional conflict over text even when both I and the other party are triggered/defensive/in a big experience
- understanding of what it feels like to "collapse", and a vague sense of how to play with that edge
- more facility with placing my attention where I choose
- more respect for silence
- a deep comfort with prolonged eye contact
- knowledge that I seem to flinch a bit inside most times that I talk about sexuality or sex, especially in regards to myself
- knowledge that I struggle most with the question "am I welcome here?"
- a theory of what makes people emotionally tired, which seems to resonate with everyone I share it with
- strong opinions on communication
- the ability to generate ≈non-violent communication from the inside
- better introspective access on an emotional level
- new friends
- ability and comfort with sitting with my own experience and emotions for longer
- decreasing the time from when I first interact with someone to when interaction with them blows up, if it's going to, I think because I'm pushing more of my edges and I see things more clearly and so all the knobs that I'm turning in the wrong direction I'm turning *really strongly* in the wrong direction
- maybe a tiny hint of how people relate to this thing called "community"?
- the ability to listen to nuances in "no"s, and not automatically interpret "no" as "no, I don't want to interact with you now or ever again"
- increased facility in getting in touch with my own anger in a healthy way by asking what it's protective of
- increased facility in engaging with others in their anger by seeking an understanding of what they're standing for
- the experience of being able to decide that I wanted to go to sleep, roughly on time, without fighting myself, for the first time that I can recall in my life

Things that I'm currently playing with in circling, as of a couple of months ago:
- "am I welcome here?"
- "what if someone goes away, and it's my fault?"
- What does it look like to find myself attractive or important, or to matter to myself?
- What does it look like and feel like to be held emotionally?
- What's up for me around touch and physical affection?
- Am I terrified of having power over people?
- How can I be less careful, and more okay/accepting?
- What does it look like to do things from a place of desire rather than a place of "should"?
- What am I attached to and how does attachment get in the way of what I want?

~~~~

I've sometimes said that circling seems to me like "metacognitive defensive driving" (to extend the metaphor of metacognitive blindspots and metacognitive mirrors); there's a way in which circling seems to allow my S1 to communicate very directly with another person's S1, in situations where our S2's get tripped up and have trouble communicating, and in such a way that it seems to bypass most issues of miscommunication and get directly to the heart of the matter. Even when I can't see the ways that my cognition is impaired, circling frequently lets me bypass that or address it directly.

I also want to add another perspective on NVC / ownership language. I like using ownership language in part because it tends to trip me up in all the places where I'm trying to do something other than communicate with my words, and thus it helps me understand myself better.