Comments
With just that you could get upper bounds for the real. You could get some lower bounds by showing that all rationals in the enumeration are greater than some rational, but this isn't always possible to do, so your type might include things that aren't real numbers with provable lower bounds.
If you require both then we're back to the situation where, if there's a constructive proof that the two enumerations' max and min converge to the same value, you can get a Cauchy real out of this, and perhaps the definitions are equivalent.
It seems that a real number defined this way will have some perhaps-infinite list of rationals it's less than and another it's greater than. You might want to add a constraint that the maximum of the list of numbers it's above gets arbitrarily close to the minimum of the list of numbers it's below (as Tailcalled suggested).
With respect to Cauchy sequences, the issue is how to specify convergence; the epsilon/N definition is one way to do this and, constructively, gives a way of computing epsilon-good approximations.
The power of this seems similar to the power of constructive Cauchy sequences, because you can use the (x < y) → A ∪ B function to approximate the value to any positive precision.
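To illustrate, here's a minimal sketch under one reading of that comparison function: given rationals x < y, it tells us either that the real is above x or that it is below y, which is enough to narrow an enclosing interval to any positive width. The function names and the sqrt(2) oracle are illustrative, not from the original comment.

```python
from fractions import Fraction

def approximate(located, lo, hi, eps):
    # Invariant: the real lies in (lo, hi); each step shrinks the interval to 2/3 its width.
    while hi - lo > eps:
        x = lo + (hi - lo) / 3
        y = hi - (hi - lo) / 3
        if located(x, y) == "above_x":   # the real is greater than x
            lo = x
        else:                            # the real is less than y
            hi = y
    return (lo + hi) / 2

def sqrt2_located(x, y):
    # Illustrative oracle for sqrt(2): since x < y, at least one branch applies.
    return "above_x" if x * x < 2 else "below_y"

print(approximate(sqrt2_located, Fraction(1), Fraction(2), Fraction(1, 1000)))
```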
By truth values do you mean Prop or something else?
Here's how one might specify Dedekind cuts in type theory. Provide two types A, B with mappings a : A → ℚ and b : B → ℚ. To show these cover all the rationals, provide c : ℚ → A + B such that the value returned by c maps back to its argument, through a or b. But this lets us re-construct a function ℚ → Bool by seeing whether c provides an A or a B. There are other ways of doing this but I'm not sure what else is worth analyzing.
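As a minimal Lean-style sketch of that specification (field names are mine; this assumes a `Rat` type is available, e.g. from Std/Mathlib):

```lean
structure CutSpec where
  A : Type                 -- evidence that a rational is on the lower side
  B : Type                 -- evidence that a rational is on the upper side
  a : A → Rat              -- which rational each A-witness is about
  b : B → Rat              -- which rational each B-witness is about
  c : Rat → Sum A B        -- every rational gets classified to one side
  coh : ∀ q : Rat,
    (match c q with
     | .inl x => a x
     | .inr y => b y) = q  -- the classification is about q itself

-- The coverage function c lets us decide which side any rational falls on:
def side (s : CutSpec) (q : Rat) : Bool :=
  match s.c q with
  | .inl _ => true    -- q is below the cut
  | .inr _ => false   -- q is above the cut
```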
Well, the one thing making that difficult is that I did not know the Lagrange multiplier theorem until reading this comment.
I agree this is in practice not directly applicable because buying contracts with all your money is silly.
All you need is to construct an appropriate probability space and use basic probability theory instead of inventing clever reasons why it doesn’t apply in this particular case.
I don't see how to do that, but maybe your plan is to get to that at some point.
Am I missing something? How is it at all controversial?
it's not, it's just a modification of the usual halfer argument that "you don't learn anything upon waking up"
- halfers have to condition on there being at least one observer in the possible world. if the coin can come up 0, 1, or 2 at 1/3 each, and Sleeping Beauty wakes up that number of times, halfers still think the 0 outcome is 0% likely upon waking up.
- halfers also have to construct the reference class carefully. if there are many events of people with amnesia waking up once or twice, and SSA's reference class consists of the set of awakenings from these, then SSA and SIA will agree on a 1/3 probability. this is because in a large population, about 1/3 of awakenings are in worlds where the coin came up such that there would be one awakening.
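A quick check of that last claim, assuming the standard setup (fair coin, one awakening on heads, two on tails), sketched in Python:

```python
import random

random.seed(0)
awakenings_in_one_awakening_worlds = 0
total_awakenings = 0
for _ in range(100_000):
    heads = random.random() < 0.5
    n = 1 if heads else 2          # number of awakenings in this world
    total_awakenings += n
    if n == 1:
        awakenings_in_one_awakening_worlds += n

# Fraction of awakenings that occur in one-awakening worlds: ~1/3.
print(awakenings_in_one_awakening_worlds / total_awakenings)
```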
I don't have a better solution right now, but one problem to note is that this agent will strongly bet that the button being pressed is independent of the human trying to press it. So it could lose money to a different agent that thinks these are correlated, as they in fact are.
Nice job with the bound! I've heard a number of people in my social sphere say very positive things about DACs so this is mainly my response to them.
You mentioned wanting to get the game theory of love correct. Understanding a game involves understanding the situations and motives of the involved agents. So getting the game theory of love correct with respect to some agent implies understanding that agent's situation.
This seems more like "imagining being nice to Hitler, as one could be nice to anyone" than "imagining what Hitler was in fact like and why his decisions seemed to him like the thing to do". Computing the game-theoretically right strategy involves understanding different agents' situations, the kind of empathy that couldn't be confused with being a doormat, sometimes called "cognitive empathy".
I respect Sarah Constantin's attempt to understand Hitler's psychological situation.
If you define "human values" as "what humans would say about their values across situations", then yes, predicting "human values" is a reasonable training objective. Those just aren't really what we "want" as agents, and agentic humans would have motives not to let the future be controlled by an AI optimizing for human approval.
That's also not how I defined human values; my definition is based on the assumption that the human brain contains one or more expected utility maximizers. It's possible that the objectives of these maximizers are affected by socialization, but they'll be less affected by socialization than verbal statements about values are, because they're harder to fake and so less affected by preference falsification.
Children learn some sense of what they're supposed to say about values, but have some pre-built sense of "what to do / aim for" that's affected by evopsych and so on. It seems like there's a huge semantic problem with talking about "values" in a way that's ambiguous between "in-built evopsych-ish motives" and "things learned from culture about what to endorse", but Yudkowsky writing on complexity of value is clearly talking about stuff affected by evopsych. I think it was a semantic error for the discourse to use the term "values" rather than "preferences".
In the section on subversion I made the case that terminal values make much more difference in subversive behavior than compliant behavior.
It seems like to get at the values of approximate utility maximizers located in the brain you would need something like Goal Inference as Inverse Planning rather than just predicting behavior.
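As a toy illustration of the difference (my own example, not the cited paper's model): inverse planning infers which goal best explains an observed action, assuming the agent is a noisy (softmax) utility maximizer, rather than just predicting the action itself.

```python
import math

# Hypothetical goals and the utility each assigns to each action (illustrative numbers).
goals = {"security": {"save": 2.0, "spend": 0.0},
         "enjoyment": {"save": 0.0, "spend": 2.0}}

def action_prob(goal, action, beta=1.0):
    # Softmax ("noisily rational") choice: higher-utility actions are chosen more often.
    utils = goals[goal]
    z = sum(math.exp(beta * u) for u in utils.values())
    return math.exp(beta * utils[action]) / z

def goal_posterior(observed_action):
    # Bayesian inverse planning with a uniform prior over goals.
    likelihoods = {g: action_prob(g, observed_action) for g in goals}
    total = sum(likelihoods.values())
    return {g: lk / total for g, lk in likelihoods.items()}

print(goal_posterior("save"))  # most posterior mass lands on "security"
```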
How would you design a task that incentivizes a system to output its true estimates of human values? We don't have ground truth for human values, because they're mind states not behaviors.
Seems easier to create incentives for things like "wash dishes without breaking them", since you can just tell whether that happened.
I'm mainly trying to communicate with people familiar with AI alignment discourse. If other people can still understand it, that's useful, but not really the main intention.
I do think this part is speculative. The degree of "inner alignment" to the training objective depends on the details.
The degree to which "try to model the world well" leads to real-world agency depends partly on the details of this objective. For example, doing a scientific experiment would result in understanding the world better, and if there's RL training towards "better understand the world", that could propagate to intending to carry out experiments that increase understanding of the world, which is a real-world objective.
If, instead, the AI's dataset is fixed and it's trying to find a good compression of it, that's less directly a real-world objective. However, depending on the training objective, the AI might get a reward from thinking certain thoughts that would result in discovering something about how to compress the dataset better. This would be "consequentialism" at least within a limited, computational domain.
An overall reason for thinking it's at least uncertain whether AIs that model the world would care about it is that an AI that did care about the world would, as an instrumental goal, compliantly solve its training problems and some test problems (before it has the capacity for a treacherous turn). So, good short-term performance doesn't by itself say much about how goal-directed behavior generalizes.
The distribution of goals with respect to generalization, therefore, depends on things like which mind-designs are easier to find by the search/optimization algorithm. It seems pretty uncertain to me whether agents with general goals might be "simpler" than agents with task-specific goals (it probably depends on the task), and therefore easier to find while getting ~equivalent performance. I do think that gradient descent is relatively more likely to find inner-aligned agents (with task-specific goals), because the internal parts are gradient-descended towards task performance; it's not just a black-box search.
Yudkowsky mentions evolution as an argument that inner alignment can't be assumed. I think there are quite a lot of dis-analogies between evolution and ML, but the general point that some training processes result in agents whose goals aren't aligned with the training objective holds. I think, in particular, supervised learning systems like LLMs are unlikely to exhibit this, as explained in the section on myopic agents.
I tested it on 3 held-out problems and it got 1/3. Significant progress; it increases the chance these can be solved with prompting. So partially it's a question of whether any major LLMs incorporate better auto-prompting.
Nice prompt! It solved the 3 x 3 problem too.
There are evolutionary priors for what to be afraid of but some of it is learned. I've heard children don't start out fearing snakes but will easily learn to if they see other people afraid of them, whereas the same is not true for flowers (sorry, can't find a ref, but this article discusses the general topic). Fear of heights might be innate but toddlers seem pretty bad at not falling down stairs. Mountain climbers have to be using mainly mechanical reasoning to figure out which heights are actually dangerous. It seems not hard to learn the way in which heights are dangerous if you understand the mechanics required to walk and traverse stairs and so on.
Instincts like curiosity are more helpful at the beginning of life; over time they can be learned as instrumental goals. If an AI learns advanced metacognitive strategies instead of innate curiosity, that's not obviously a big problem from a human values perspective, but it's unclear.
Most civilizations in the past have had "bad values" by our standards. People have been in preference falsification equilibria where they feel like they have to endorse certain values or face social censure. They probably still are falsifying preferences and our civilizational values are probably still bad. E.g. high incidence of people right now saying they're traumatized. CEV probably tends more towards the values of untraumatized than traumatized humans, even from a somewhat traumatized starting point.
The idea that civilization is "oppressive" and some societies have fewer problems points to value drift that has already happened. The Roman empire was really, really bad and has influenced future societies due to Christianity and so on. Civilizations have become powerful partly through military mobilization. Civilizations can be nice to live in in various ways, but that mostly has to do with greater satisfaction of instrumental values.
Some of the value drift might not be worth undoing, e.g. value drift towards caring more about far-away people than humans naturally would.
Seems like an issue of code/data segmentation. Programs can contain compile-time constants, and you could turn a neural network into a program that has compile-time constants for the weights, perhaps "distilling" it to reduce the total size, perhaps even binarizing it.
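For illustration, a toy sketch of what "weights as compile-time constants" looks like (made-up numbers, not from any real network):

```python
# Trained parameters baked directly into the source as constants.
W = [[0.12, -0.53], [0.87, 0.04]]
b = [0.10, -0.20]

def layer(x):
    # One dense layer using only the baked-in parameters; no separate data file needed.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

print(layer([1.0, 2.0]))
```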
Arguably, video games aren't entirely software by this standard, because they use image assets.
Formally segmenting "code" from "data" is famously hard because "code as data" is how compilers work and "data as code" is how interpreters work. Some AI techniques involve program synthesis.
I think the relevant issue is copyright more than the code/data distinction? Since code can be copyrighted too.
I think it's hard because it requires some planning and puzzle solving in a new, somewhat complex environment. The AI results on Montezuma's Revenge seem pretty unimpressive to me because they're going to a new room, trying random stuff until they make progress, then "remembering" that for future runs, which means they need quite a lot of training data.
For short-term RL given lots of feedback, there are already decent results, e.g. in StarCraft and Dota. So the difficulty is more figuring out how to automatically scope out narrow RL problems that can be learned without too much training time.
From a within-lifetime perspective, getting bored is instrumentally useful for doing "exploration" that results in finding useful things to do, which can be economically useful, be effective signalling of capacity, build social connection, etc. Curiosity is partially innate but it's also probably partially learned. I guess that's not super different from pain avoidance. But anyway, I don't worry about an AI that fails to get bored, but is otherwise basically similar to humans, taking over, because not getting bored would result in being ineffective at accomplishing open-ended things.
I did mention LLMs as myopic agents.
If they actually simulate humans, it seems like maybe legacy humans get outcompeted by simulated humans. I'm not sure that's worse than what humans expected without technological transcendence (normal death, getting replaced by children and eventually by conquering civilizations, etc.). Assuming the LLMs that simulate humans well are moral patients (see anti-zombie arguments).
It's still not as good as could be achieved in principle. Seems like having the equivalent of "legal principles" that get used as training feedback could help. Plus direct human feedback. Maybe the system gets subverted eventually but the problem of humans getting replaced by em-like AIs is mostly a short term one of current humans being unhappy about that.
Yeah, that's a good reference.
Thanks, added.
I think use of AI tools could have similar results to human cognitive enhancement, which I expect to basically be helpful. They'll have more problems with things that would be enhanced by "bigger brain size" rather than by "faster thought" or "reducing entropic error rates / wisdom of the crowds", because they're trained on humans. One can in general expect more success on this sort of thing by having an idea of what problem is even being solved. There's a lot of stuff that happens in philosophy departments that isn't best explained by "solving the problem" (which is under-defined anyway) and could be explained by motives like "building connections", "getting funding", "being on the good side of powerful political coalitions", etc. So psychology/sociology of philosophy seems like an approach to understanding what is even being done when humans say they're trying to solve philosophy problems.
I meant to say I'd be relatively good at it, I think it would be hard to find 20 people who are better than me at this sort of thing. The original ITT was about simulating "a libertarian" rather than "a particular libertarian", so emulating Yudkowsky specifically is a difficulty increase that would have to be compensated for. I think replicating writing style isn't the main issue, replicating the substance of arguments is, which is unfortunately harder to test. This post wasn't meant to do this, as I said.
I'm also not sure in particular what about the Yudkowskian AI risk models you think I don't understand. I disagree in places but that's not evidence of not understanding them.
I'm defining "values" as what approximate expected utility optimizers in the human brain want. Maybe "wants" is a better word. People falsify their preferences and in those cases it seems more normative to go with internal optimizer preferences.
Re indexicality, this is a "the AI knows but does not care" issue; it's about specifying it, not about there being some AI module somewhere that "knows" it. If AGI were generated partially from humans understanding how to encode indexical goals, that would be a different situation.
Re treacherous turns, I agreed that myopic agents don't have this issue to nearly the extent that long-term real-world optimizing agents do. It depends how the AGI is selected. If it's selected by "getting good performance according to a human evaluator in the real world" then at some capability level AGIs that "want" that will be selected more.
They would approximate human agency at the limit but there's both the issue of how fast they approach the limit and the degree to which they have independent agency rather than replicating human agency. There are fewer deceptive alignment problems if the long term agency they have is just an approximation of human agency.
Mostly I don't think there's much of an alignment problem for LLMs because they basically approximate human-like agency, but they aren't approaching autopoiesis; they'll lead to some state transition that is kind of like human enhancement and kind of like the invention of new tools. There are eventually capability gains from modeling things using a different, better set of concepts and agent substrate than humans have; it's just that the best current methods heavily rely on human concepts.
I don't understand what you think the pressing concerns with LLM alignment are. It seems like Paul Christiano type methods would basically work for them. They don't have a fundamentally different set of concepts and type of long-term agency from humans, so humans thinking long enough to evaluate LLMs with the help of other LLMs, in order to generate RL signals and imitation targets, seems sufficient.
Ok, I added this prediction.
Do you know if Andrew Ng or Yann LeCun has made a specific prediction that AGI won't arrive by some date? Couldn't find it through a quick search. Idk what others to include.
I'm assuming the relevant values are the optimizer ones not what people say. I discussed social institutions, including those encouraging people to endorse and optimize for common values, in the section on subversion.
Alignment with a human other than yourself could be a problem because people are to some degree selfish and, to a smaller degree, have different general principles/aesthetics about how things should be. So some sort of incentive optimization / social choice theory / etc. might help. But at least there's significant overlap between different humans' values. Though there's a pretty big existing problem of people dying; the default was already that current people would be replaced by other people.
To the extent people now don't care about the long-term future there isn't much to do in terms of long-term alignment. People right now who care about what happens 2000 years from now probably have roughly similar preferences to people 1000 years from now who aren't significantly biologically changed or cognitively enhanced, because some component of what people care about is biological.
I'm not saying it would be random so much as not very dependent on the original history of humans used to train early AGI iterations. It would have a different data history, but part of that is because of different measurements, e.g. scientific measuring tools. Different ontology means that value-laden things people might care about, like "having good relationships with other humans", are not meaningful things to future AIs in terms of their world models; they're not something the AIs would care much about by default (they aren't even modeling the world in those terms), and it would be hard to encode a utility function so they care about it despite the ontological difference.
Beat Ocarina of Time with <100 hours of playing Zelda games during training or deployment (but perhaps training on other games), no reading guides/walkthroughs/playthroughs, no severe bug exploits (those that would cut down the required time by a lot), no reward-shaping/advice specific to this game generated by humans who know non-trivial things about the game (but the agent can shape its own reward). This includes an LLM coding a program to do it. I'd say probably not by 2033.
I think it's possible human values depend on life history too, but that seems to add additional complexity and make alignment harder. If the effects of life history very much dominate those of evolutionary history, then maybe neglecting evolutionary history would be more acceptable, making the problem easier.
But I don't think default AGI would be especially path dependent on human collective life history. Human society changes over time as humans supersede old cultures (see section on subversion). AGI would be a much bigger shift than the normal societal shifts and so would drift from human culture more rapidly. Partially due to different conceptual ontology and so on. The legacy concepts of humans would be a pretty inefficient system for AGIs to keep using. Like how scientists aren't alchemists anymore, but a bigger shift than that.
(Note, LLMs still rely a lot on human concepts rather than having independent ontology and agency, so this is more about future AI systems)
I think they will still be hard at EOY 2024; as in, of this problem and the 7 held-out ones of similar difficulty, the best LLM will probably not solve 4/8.
I think I would update some on how fast LLMs are advancing, but these are not inherently very hard problems, so I don't think it would be a huge surprise; this was meant to be one of the easiest things they fail at right now. Maybe if that happens I would think things are going 1.6x as fast short term as I would have otherwise thought?
I was surprised by GPT-3/3.5 but not so much by 4. I think it adds up, on net, to an update that LLMs are advancing faster than I thought, but I haven't much changed my long-term AGI timelines, because I think AGI will involve lots of techs, not just LLMs, although LLM progress is some update about general tech progress.
I've added 6 more held-out problems for a total of 7. Agree that getting the answer without pointing out problems is the right standard.
Added
Here's the harder problem. I've also held out a third problem without posting it online.
Maybe those don't stick out to me because long timelines seem like the default hypothesis to me, and there are a lot of people stating specific, falsifiable short-timelines predictions locally, so there's a selection effect. I added Brian Chau and Robin Hanson to the list though; not sure who else (other than me) has made specific long-timelines predictions who would be good to add. Would like to add people like Yann LeCun and Andrew Ng if there are specific falsifiable predictions they made.
I've written about the anthropic question. Appreciate the update!
Something approximating utility function optimization over partial world configurations. What scope of world configuration space is optimized by effective systems depends on the scope of the task. For something like space exploration, the scope of the task is such that accomplishing it requires making trade-offs over a large sub-set of the world, and efficient ways of making these trade-offs are parametrized by a utility function over this sub-set.
What time-scale and spatial scope the "pick thoughts in your head" optimization is over depends on what scope is necessary for solving the problem. Some problems like space exploration have a necessarily high time and space scope. Proving hard theorems has a smaller spatial scope (perhaps ~none) but a higher temporal scope. Although, to the extent the distribution over theorems to be proven depends on the real world, having a model of the world might help prove them better.
Depending on how the problem-solving system is found, it might be that the easily-findable systems that solve the problem distribution sufficiently well will not only model the world but care about it, because the general consequentialist algorithms that do planning cognition to solve the problem would also plan about the world. This of course depends on the method for finding problem-solving systems, but one could imagine doing hill climbing over ways of wiring together a number of modules that include optimization and world-modeling modules, and easily-findable configurations that solve the problem well might solve it by deploying general-purpose consequentialist optimization on the world model (as I said, many possible long-term goals lead to short-term compliant problem solving as an instrumental strategy).
Again, this is relatively speculative, and depends on the AI paradigm and problem formulation. It's probably less of a problem for ML-based systems because the cognition of an ML system is aggressively gradient descended to be effective at solving the problem distribution.
The problem is somewhat intensified in cases where the problem relates to already-existing long-term agents such as in the case of predicting or optimizing with respect to humans, because the system at some capability level would simulate the external long-term optimizer. However, it's unclear how much this would constitute creation of an agent with different goals from humans.
Thanks, fixed. I believe Yudkowsky is the right spelling though.
Relevant skills for an AI economy would include mathematics, programming, ML, web development, etc.
It's hard to extrapolate out that far, but AI still has a lot of trouble with robotics (e.g. we don't have good dish-washing household robots). So there will probably be e.g. construction jobs for a while. AI is helpful for programming, but using AI to program relies on a lot of human support; I doubt programming will be entirely automated in 30 years. AI tends to have trouble with contextualized, embodied/embedded problems; it's better at decontextualized, schoolwork-like problems. For example, if you're doing sales you need to manage a set of relationships whose data is gathered over a lot of contexts, mostly not recorded, and AI is going to have more trouble parsing that context into something a transformer can operate on and give a good response to. Self-driving is an example of an embedded, though low-context, problem, and progress on that has been slower than expected, although with all the data from electric cars it's possible to train a transformer to imitate human drivers.
Oh, I thought this was mainly about x risk, especially due to the Yudkowsky reference. On the other points I think they're not a huge change either. If you predict the economy will have lots of AI in the future then you can give your child an advantage by training them in relevant skills. Also, many jobs like service jobs are likely to be around, there are lots of things AI has trouble with or which humans generally prefer humans to do. AI would increase material productivity and that would be expected to decrease cost of living as well. See Yudkowsky's post on AI unemployment.
Regarding international conflict, I haven't seen a convincing model laid out for how AI would make international conflict worse. Drone warfare is a possibility, but would tend to concentrate military power in technical countries such as Taiwan, UK, USA, and Israel. I don't know where OP lives but I don't see how it would make things worse for USA/UK children. Drones would be expected to have a better civilian casualty ratio than other methods like conventional explosives, nukes, or bio-weapons.
The post is phrased as "do you think it's a good idea to have kids given timelines?". I've said why I'm not convinced timelines should be relevant to having kids. I think if people are getting their views by copying Eliezer Yudkowsky and copying people who copy his views (which I'm not sure if OP is doing) then they should get better epistemology.
From 2018, the AI timelines section of mediangroup.org/research:
Modeling AI progress through insights
We assembled a list of major technical insights in the history of progress in AI and metadata on the discoverer(s) of each insight.
Based on this dataset, we developed an interactive model that calculates the time it would take to reach the culmination of all AI research, based on a guess at what percentage of AI discoveries have been made.
AI Insights dataset: data (json file), schema
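For a rough sense of the kind of calculation such a model can do, here is a simplified sketch (not the actual interactive model, and the insight years below are made up): assume a guessed fraction of all needed insights has been discovered so far, and that insights keep arriving at the historical average rate.

```python
def years_to_completion(discovery_years, assumed_fraction_done):
    n = len(discovery_years)                    # insights discovered so far
    span = max(discovery_years) - min(discovery_years)
    rate = n / span                             # historical average insights per year
    remaining = n / assumed_fraction_done - n   # implied insights still undiscovered
    return remaining / rate

years = [1956, 1960, 1965, 1970, 1975, 1980, 1986, 1990, 1997, 2006, 2012, 2014]
print(years_to_completion(years, 0.5))  # guess: half of the needed insights exist so far
```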
Feasibility of Training an AGI using Deep Reinforcement Learning: A Very Rough Estimate
Several months ago, we were presented with a scenario for how artificial general intelligence (AGI) may be achieved in the near future. We found the approach surprising, so we attempted to produce a rough model to investigate its feasibility. The document presents the model and its conclusions.
The usual cliches about the folly of trying to predict the future go without saying and this shouldn't be treated as a rigorous estimate, but hopefully it can give a loose, rough sense of some of the relevant quantities involved. The notebook and the data used for it can be found in the Median Group numbers GitHub repo if the reader is interested in using different quantities or changing the structure of the model.
[Download PDF](http://mediangroup.org/docs/Feasibility of Training an AGI using Deep Reinforcement Learning, A Very Rough Estimate.pdf)
(note: second has a hard-to-estimate "real life vs alphago" difficulty parameter that the result is somewhat dependent on, although this parameter can be adjusted in the model)
I recommend articles (not by me) Why I am not an AI doomer, Diminishing Returns in Machine Learning.
You're providing no evidence that superintelligence is likely in the next 30 years other than a Yudkowsky tweet. I expect that 30 years from now we will not have superintelligence (of the sort that can build the stack to run itself on, growing at a fast rate, taking over the solar system, etc.).