A review of "Don’t forget the boundary problem..." 2024-02-08T23:19:49.786Z
2023 in AI predictions 2024-01-01T05:23:42.514Z
A case for AI alignment being difficult 2023-12-31T19:55:26.130Z
Scaling laws for dominant assurance contracts 2023-11-28T23:11:07.631Z
Moral Reality Check (a short story) 2023-11-26T05:03:18.254Z
Non-superintelligent paperclip maximizers are normal 2023-10-10T00:29:53.072Z
A Proof of Löb's Theorem using Computability Theory 2023-08-16T18:57:41.048Z
SSA rejects anthropic shadow, too 2023-07-27T17:25:17.728Z
A review of Principia Qualia 2023-07-12T18:38:52.283Z
Hell is Game Theory Folk Theorems 2023-05-01T03:16:03.247Z
A short conceptual explainer of Immanuel Kant's Critique of Pure Reason 2022-06-03T01:06:32.394Z
A method of writing content easily with little anxiety 2022-04-08T22:11:47.298Z
Occupational Infohazards 2021-12-18T20:56:47.978Z
"Infohazard" is a predominantly conflict-theoretic concept 2021-12-02T17:54:26.182Z
Selfishness, preference falsification, and AI alignment 2021-10-28T00:16:47.051Z
My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage) 2021-10-16T21:28:12.427Z
Many-worlds versus discrete knowledge 2020-08-13T18:35:53.442Z
Modeling naturalized decision problems in linear logic 2020-05-06T00:15:15.400Z
Topological metaphysics: relating point-set topology and locale theory 2020-05-01T03:57:11.899Z
Two Alternatives to Logical Counterfactuals 2020-04-01T09:48:29.619Z
The absurdity of un-referenceable entities 2020-03-14T17:40:37.750Z
Puzzles for Physicalists 2020-03-12T01:37:13.353Z
A conversation on theory of mind, subjectivity, and objectivity 2020-03-10T04:59:23.266Z
Subjective implication decision theory in critical agentialism 2020-03-05T23:30:42.694Z
A critical agential account of free will, causation, and physics 2020-03-05T07:57:38.193Z
On the falsifiability of hypercomputation, part 2: finite input streams 2020-02-17T03:51:57.238Z
On the falsifiability of hypercomputation 2020-02-07T08:16:07.268Z
Philosophical self-ratification 2020-02-03T22:48:46.985Z
High-precision claims may be refuted without being replaced with other high-precision claims 2020-01-30T23:08:33.792Z
On hiding the source of knowledge 2020-01-26T02:48:51.310Z
On the ontological development of consciousness 2020-01-25T05:56:43.244Z
Is requires ought 2019-10-28T02:36:43.196Z
Metaphorical extensions and conceptual figure-ground inversions 2019-07-24T06:21:54.487Z
Dialogue on Appeals to Consequences 2019-07-18T02:34:52.497Z
Why artificial optimism? 2019-07-15T21:41:24.223Z
The AI Timelines Scam 2019-07-11T02:52:58.917Z
Self-consciousness wants to make everything about itself 2019-07-03T01:44:41.204Z
Writing children's picture books 2019-06-25T21:43:45.578Z
Conditional revealed preference 2019-04-16T19:16:55.396Z
Boundaries enable positive material-informational feedback loops 2018-12-22T02:46:48.938Z
Act of Charity 2018-11-17T05:19:20.786Z
EDT solves 5 and 10 with conditional oracles 2018-09-30T07:57:35.136Z
Reducing collective rationality to individual optimization in common-payoff games using MCMC 2018-08-20T00:51:29.499Z
Buridan's ass in coordination games 2018-07-16T02:51:30.561Z
Decision theory and zero-sum game theory, NP and PSPACE 2018-05-24T08:03:18.721Z
In the presence of disinformation, collective epistemology requires local modeling 2017-12-15T09:54:09.543Z
Autopoietic systems and difficulty of AGI alignment 2017-08-20T01:05:10.000Z
Current thoughts on Paul Christiano's research agenda 2017-07-16T21:08:47.000Z
Why I am not currently working on the AAMLS agenda 2017-06-01T17:57:24.000Z
A correlated analogue of reflective oracles 2017-05-07T07:00:38.000Z


Comment by jessicata (jessica.liu.taylor) on Why Two Valid Answers Approach is not Enough for Sleeping Beauty · 2024-02-08T00:59:21.003Z · LW · GW

> All you need is to construct an appropriate probability space and use basic probability theory instead of inventing clever reasons why it doesn’t apply in this particular case.

I don't see how to do that, but maybe your plan is to get to that at some point.

> Am I missing something? How is it at all controversial?

It's not; it's just a modification of the usual halfer argument that "you don't learn anything upon waking up".

Comment by jessicata (jessica.liu.taylor) on Why Two Valid Answers Approach is not Enough for Sleeping Beauty · 2024-02-07T02:27:49.105Z · LW · GW
  • Halfers have to condition on there being at least one observer in the possible world. If the coin can come up 0, 1, or 2 with probability 1/3 each, and Sleeping Beauty wakes up that number of times, halfers still assign the 0 outcome 0% probability upon waking up.
  • Halfers also have to construct the reference class carefully. If there are many events of people with amnesia waking up once or twice, and SSA's reference class consists of the set of awakenings from these, then SSA and SIA will agree on a 1/3 probability. This is because, in a large population, about 1/3 of awakenings are in worlds where the coin came up such that there would be one awakening.
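The reference-class claim in the second bullet is easy to check numerically. A quick Monte Carlo sketch (the function name and setup are mine, assuming the standard experiment: one awakening on heads, two on tails):

```python
import random

def awakening_fraction(n_trials: int, seed: int = 0) -> float:
    """Fraction of all awakenings that occur in single-awakening worlds.

    Each trial: a fair coin decides whether Sleeping Beauty is woken
    once (heads) or twice (tails)."""
    rng = random.Random(seed)
    single, total = 0, 0
    for _ in range(n_trials):
        wakings = 1 if rng.random() < 0.5 else 2
        total += wakings
        if wakings == 1:
            single += 1
    return single / total

print(awakening_fraction(1_000_000))  # ≈ 1/3
```

The fraction tends to 1/3 because tails worlds contribute twice as many awakenings as heads worlds, even though both worlds are equally likely.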
Comment by jessicata (jessica.liu.taylor) on A Shutdown Problem Proposal · 2024-01-22T02:59:01.127Z · LW · GW

I don't have a better solution right now, but one problem to note is that this agent will strongly bet that the button will be independent of the human pressing the button. So it could lose money to a different agent that thinks these are correlated, as they are.
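To illustrate with made-up numbers: suppose the human in fact presses the button exactly when they want shutdown, while the agent models the press as independent of intent (with the same marginal probability). A counterparty who prices the joint event correctly profits from the agent's mispricing:

```python
# Toy illustration: an agent that treats "human wants shutdown" and
# "button gets pressed" as independent misprices a joint bet.
# All numbers are invented for illustration.

p_intent = 0.3              # true P(human wants shutdown)
p_press_given_intent = 1.0  # in reality the human presses iff they want shutdown
p_press_given_no = 0.0

# True joint probability of (intent AND press):
p_joint_true = p_intent * p_press_given_intent  # 0.3

# The agent believes the press is independent of intent, same marginal:
p_press_marginal = (p_intent * p_press_given_intent
                    + (1 - p_intent) * p_press_given_no)
p_joint_agent = p_intent * p_press_marginal     # 0.09

# A counterparty buys a $1 ticket on (intent AND press) at the agent's price.
expected_profit = p_joint_true - p_joint_agent
print(round(expected_profit, 2))  # counterparty expects ~$0.21 per ticket
```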

Comment by jessicata (jessica.liu.taylor) on Scaling laws for dominant assurance contracts · 2024-01-15T02:50:39.079Z · LW · GW

Nice job with the bound! I've heard a number of people in my social sphere say very positive things about DACs so this is mainly my response to them.

Comment by jessicata (jessica.liu.taylor) on Universal Love Integration Test: Hitler · 2024-01-11T00:58:02.070Z · LW · GW

You mentioned wanting to get the game theory of love correct. Understanding a game involves understanding the situations and motives of the involved agents. So getting the game theory of love correct with respect to some agent implies understanding that agent's situation.

Comment by jessicata (jessica.liu.taylor) on Universal Love Integration Test: Hitler · 2024-01-11T00:41:16.746Z · LW · GW

This seems more like "imagining being nice to Hitler, as one could be nice to anyone" than "imagining what Hitler was in fact like and why his decisions seemed to him like the thing to do". Computing the game-theoretically right strategy involves understanding different agents' situations, the kind of empathy that couldn't be confused with being a doormat, sometimes called "cognitive empathy".

I respect Sarah Constantin's attempt to understand Hitler's psychological situation.

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2024-01-10T23:57:08.960Z · LW · GW

If you define "human values" as "what humans would say about their values across situations", then yes, predicting "human values" is a reasonable training objective. Those just aren't really what we "want" as agents, and agentic humans would have motives not to let the future be controlled by an AI optimizing for human approval.

That's also not how I defined human values, which is based on the assumption that the human brain contains one or more expected utility maximizers. It's possible that the objectives of these maximizers are affected by socialization, but they'll be less affected by socialization than verbal statements about values, because they're harder to fake so less affected by preference falsification.

Children learn some sense of what they're supposed to say about values, but have some pre-built sense of "what to do / aim for" that's affected by evopsych and so on. It seems like there's a huge semantic problem with talking about "values" in a way that's ambiguous between "in-built evopsych-ish motives" and "things learned from culture about what to endorse", but Yudkowsky writing on complexity of value is clearly talking about stuff affected by evopsych. I think it was a semantic error for the discourse to use the term "values" rather than "preferences".

In the section on subversion I made the case that terminal values make much more difference in subversive behavior than compliant behavior.

It seems like to get at the values of approximate utility maximizers located in the brain you would need something like Goal Inference as Inverse Planning rather than just predicting behavior.
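For concreteness, here is a minimal sketch of the inverse-planning idea in the style of Baker et al.'s goal-inference work: infer which goal an agent is pursuing from its actions, assuming noisy-rational planning. The 1-D world, the goal locations, and the rationality parameter are all invented for illustration:

```python
import math

# Candidate goals the agent might be pursuing (positions on a line).
GOALS = {"left": 0, "right": 10}
BETA = 2.0  # rationality: how reliably the agent picks the better action

def action_likelihood(pos: int, action: int, goal: int) -> float:
    """Softmax probability of moving -1 or +1, given a goal."""
    def score(a: int) -> float:
        return -abs((pos + a) - goal)  # closer to the goal is better
    z = sum(math.exp(BETA * score(a)) for a in (-1, +1))
    return math.exp(BETA * score(action)) / z

def posterior(start: int, actions: list) -> dict:
    """Bayesian posterior over goals given a trajectory of +-1 moves."""
    probs = {name: 1.0 for name in GOALS}  # uniform prior
    pos = start
    for a in actions:
        for name, g in GOALS.items():
            probs[name] *= action_likelihood(pos, a, g)
        pos += a
    total = sum(probs.values())
    return {name: p / total for name, p in probs.items()}

# An agent starting at 5 that keeps stepping right is probably heading right:
print(posterior(5, [+1, +1, +1]))
```

The point is that the inferred object is a goal (a utility-like quantity), not a behavioral prediction: the same machinery distinguishes "moving right because it wants the right goal" from noise.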

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2024-01-10T22:57:52.283Z · LW · GW

How would you design a task that incentivizes a system to output its true estimates of human values? We don't have ground truth for human values, because they're mind states not behaviors.

Seems easier to create incentives for things like "wash dishes without breaking them", you can just tell.

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2024-01-08T19:46:37.423Z · LW · GW

I'm mainly trying to communicate with people familiar with AI alignment discourse. If other people can still understand it, that's useful, but not really the main intention.

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2024-01-08T19:34:05.299Z · LW · GW

I do think this part is speculative. The degree of "inner alignment" to the training objective depends on the details.

Partly the degree to which "try to model the world well" leads to real-world agency depends on the details of this objective. For example, doing a scientific experiment would result in understanding the world better, and if there's RL training towards "better understand the world", that could propagate to intending to carry out experiments that increase understanding of the world, which is a real-world objective.

If, instead, the AI's dataset is fixed and it's trying to find a good compression of it, that's less directly a real-world objective. However, depending on the training objective, the AI might get a reward from thinking certain thoughts that would result in discovering something about how to compress the dataset better. This would be "consequentialism" at least within a limited, computational domain.

An overall reason for thinking it's at least uncertain whether AIs that model the world would care about it is that an AI that did care about the world would, as an instrumental goal, compliantly solve its training problems and some test problems (before it has the capacity for a treacherous turn). So, good short-term performance doesn't by itself say much about goal-directed behavior in generalizations.

The distribution of goals with respect to generalization, therefore, depends on things like which mind-designs are easier to find by the search/optimization algorithm. It seems pretty uncertain to me whether agents with general goals might be "simpler" than agents with task-specific goals (it probably depends on the task), and therefore easier to find while getting ~equivalent performance. I do think gradient descent is relatively more likely to find inner-aligned agents (with task-specific goals), because the internal parts are gradient-descended towards task performance; it's not just a black-box search.

Yudkowsky mentions evolution as an argument that inner alignment can't be assumed. I think there are quite a lot of dis-analogies between evolution and ML, but the general point that some training processes result in agents whose goals aren't aligned with the training objective holds. I think, in particular, supervised learning systems like LLMs are unlikely to exhibit this, as explained in the section on myopic agents.

Comment by jessicata (jessica.liu.taylor) on 2023 in AI predictions · 2024-01-04T02:32:24.152Z · LW · GW

I tested it on 3 held-out problems and it got 1/3. Significant progress; it increases the chance these can be solved with prompting. So partially it's a question of whether any major LLMs incorporate better auto-prompting.

Comment by jessicata (jessica.liu.taylor) on 2023 in AI predictions · 2024-01-04T02:19:52.620Z · LW · GW

Nice prompt! It solved the 3 x 3 problem too.

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2024-01-03T21:53:52.894Z · LW · GW

There are evolutionary priors for what to be afraid of but some of it is learned. I've heard children don't start out fearing snakes but will easily learn to if they see other people afraid of them, whereas the same is not true for flowers (sorry, can't find a ref, but this article discusses the general topic). Fear of heights might be innate but toddlers seem pretty bad at not falling down stairs. Mountain climbers have to be using mainly mechanical reasoning to figure out which heights are actually dangerous. It seems not hard to learn the way in which heights are dangerous if you understand the mechanics required to walk and traverse stairs and so on.

Instincts like curiosity are more helpful at the beginning of life, over time they can be learned as instrumental goals. If an AI learns advanced metacognitive strategies instead of innate curiosity that's not obviously a big problem from a human values perspective but it's unclear.

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2024-01-03T07:51:05.802Z · LW · GW

Most civilizations in the past have had "bad values" by our standards. People have been in preference falsification equilibria where they feel like they have to endorse certain values or face social censure. They probably still are falsifying preferences, and our civilizational values are probably still bad, e.g. the high incidence of people right now saying they're traumatized. CEV probably tends more towards the values of untraumatized than traumatized humans, even from a somewhat traumatized starting point.

The idea that civilization is "oppressive" and some societies have fewer problems points to value drift that has already happened. The Roman empire was really, really bad and has influenced future societies due to Christianity and so on. Civilizations have become powerful partly through military mobilization. Civilizations can be nice to live in in various ways, but that mostly has to do with greater satisfaction of instrumental values.

Some of the value drift might not be worth undoing, e.g. value drift towards caring more about far-away people than humans naturally would.

Comment by jessicata (jessica.liu.taylor) on AI Is Not Software · 2024-01-02T21:53:42.275Z · LW · GW

Seems like an issue of code/data segmentation. Programs can contain compile time constants, and you could turn a neural network into a program that has compile time constants for the weights, perhaps "distilling" it to reduce the total size, perhaps even binarizing it.

Arguably, video games aren't entirely software by this standard, because they use image assets.

Formally segmenting "code" from "data" is famously hard because "code as data" is how compilers work and "data as code" is how interpreters work. Some AI techniques involve program synthesis.

I think the relevant issue is copyright more than the code/data distinction? Since code can be copyrighted too.
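As a toy illustration of the code/data point above: one can "compile" a weight matrix into generated source text, so the same numbers count as code (a compile-time constant) rather than data. The module name and weights here are made up:

```python
# Sketch of the "weights as compile-time constants" idea: generate a
# source module whose only content is the weights as a literal,
# turning model "data" into program "code".

def emit_module(weights, path):
    """Write a Python module embedding the weights as a literal constant."""
    with open(path, "w") as f:
        f.write("# Auto-generated: weights baked in as constants.\n")
        f.write(f"WEIGHTS = {weights!r}\n")

w = [[0.25, -1.0], [0.5, 2.0]]  # stand-in for distilled/binarized weights
emit_module(w, "frozen_weights.py")

import frozen_weights  # the "data" is now importable "code"
print(frozen_weights.WEIGHTS == w)  # prints True
```

A C toolchain version of the same trick would emit a `const float[]` array and let the compiler fold it into the binary, which is roughly what some deployment pipelines do when they "bake" weights into an executable.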

Comment by jessicata (jessica.liu.taylor) on 2023 in AI predictions · 2024-01-02T21:45:31.633Z · LW · GW

I think it's hard because it requires some planning and puzzle solving in a new, somewhat complex environment. The AI results on Montezuma's Revenge seem pretty unimpressive to me because they're going to a new room, trying random stuff until they make progress, then "remembering" that for future runs, which means they need quite a lot of training data.

For short-term RL given lots of feedback, there are already decent results, e.g. in StarCraft and Dota. So the difficulty is more in figuring out how to automatically scope out narrow RL problems that can be learned without too much training time.

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2024-01-02T20:50:17.254Z · LW · GW

From a within-lifetime perspective, getting bored is instrumentally useful for doing "exploration" that results in finding useful things to do, which can be economically useful, be effective signalling of capacity, build social connection, etc. Curiosity is partially innate but it's also probably partially learned. I guess that's not super different from pain avoidance. But anyway, I don't worry about an AI that fails to get bored, but is otherwise basically similar to humans, taking over, because not getting bored would result in being ineffective at accomplishing open-ended things.

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2024-01-02T20:47:05.304Z · LW · GW

I did mention LLMs as myopic agents.

If they actually simulate humans, it seems like maybe legacy humans get outcompeted by simulated humans. I'm not sure that's worse than what humans expected without technological transcendence (normal death, getting replaced by children and eventually conquering civilizations, etc.). Assuming the LLMs that simulate humans well are moral patients (see anti-zombie arguments).

It's still not as good as could be achieved in principle. Seems like having the equivalent of "legal principles" that get used as training feedback could help. Plus direct human feedback. Maybe the system gets subverted eventually but the problem of humans getting replaced by em-like AIs is mostly a short term one of current humans being unhappy about that.

Comment by jessicata (jessica.liu.taylor) on SSA rejects anthropic shadow, too · 2024-01-02T20:43:06.923Z · LW · GW

Yeah, that's a good reference.

Comment by jessicata (jessica.liu.taylor) on 2023 in AI predictions · 2024-01-02T20:42:03.963Z · LW · GW

Thanks, added.

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2024-01-02T20:37:46.752Z · LW · GW

I think use of AI tools could have similar results to human cognitive enhancement, which I expect to basically be helpful. They'll have more problems with things that are enhanced by stuff like "bigger brain size" rather than "faster thought" and "reducing entropic error rates / wisdom of the crowds", because they're trained on humans.

One can in general expect more success on this sort of thing by having an idea of what problem is even being solved. There's a lot of stuff that happens in philosophy departments that isn't best explained by "solving the problem" (which is under-defined anyway) and could be explained by motives like "building connections", "getting funding", "being on the good side of powerful political coalitions", etc. So psychology/sociology of philosophy seems like an approach to understanding what is even being done when humans say they're trying to solve philosophy problems.

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2024-01-02T20:33:06.396Z · LW · GW

I meant to say I'd be relatively good at it; I think it would be hard to find 20 people who are better than me at this sort of thing. The original ITT was about simulating "a libertarian" rather than "a particular libertarian", so emulating Yudkowsky specifically is a difficulty increase that would have to be compensated for. I think replicating writing style isn't the main issue; replicating the substance of arguments is, which is unfortunately harder to test. This post wasn't meant to do this, as I said.

I'm also not sure in particular what about the Yudkowskian AI risk models you think I don't understand. I disagree in places but that's not evidence of not understanding them.

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2024-01-02T20:30:00.755Z · LW · GW

I'm defining "values" as what approximate expected utility optimizers in the human brain want. Maybe "wants" is a better word. People falsify their preferences and in those cases it seems more normative to go with internal optimizer preferences.

Re indexicality, this is a "the AI knows but does not care" issue: it's about specifying the goal, not about there being some AI module somewhere that "knows" it. If AGI were generated partially from humans understanding how to encode indexical goals, that would be a different situation.

Re treacherous turns, I agreed that myopic agents don't have this issue to nearly the extent that long-term real-world optimizing agents do. It depends how the AGI is selected. If it's selected by "getting good performance according to a human evaluator in the real world" then at some capability level AGIs that "want" that will be selected more.

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2024-01-02T08:08:25.441Z · LW · GW

They would approximate human agency at the limit but there's both the issue of how fast they approach the limit and the degree to which they have independent agency rather than replicating human agency. There are fewer deceptive alignment problems if the long term agency they have is just an approximation of human agency.

Mostly I don't think there's much of an alignment problem for LLMs, because they basically approximate human-like agency. They aren't approaching autopoiesis; they'll lead to some state transition that is kind of like human enhancement and kind of like the invention of new tools. There are eventually capability gains from modeling things using a different, better set of concepts and agent substrate than humans have; it's just that the best current methods heavily rely on human concepts.

I don't understand what you think the pressing concerns with LLM alignment are. It seems like Paul Christiano type methods would basically work for them. They don't have a fundamentally different set of concepts and type of long-term agency from humans, so humans thinking long enough to evaluate LLMs with the help of other LLMs, in order to generate RL signals and imitation targets, seems sufficient.

Comment by jessicata (jessica.liu.taylor) on 2023 in AI predictions · 2024-01-02T06:24:02.207Z · LW · GW

Ok, I added this prediction.

Comment by jessicata (jessica.liu.taylor) on 2023 in AI predictions · 2024-01-02T05:06:02.705Z · LW · GW

Do you know if Andrew Ng or Yann LeCun has made a specific prediction that AGI won't arrive by some date? Couldn't find it through a quick search. Idk what others to include.

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2024-01-02T04:00:39.234Z · LW · GW

I'm assuming the relevant values are the optimizer ones not what people say. I discussed social institutions, including those encouraging people to endorse and optimize for common values, in the section on subversion.

Alignment with a human other than yourself could be a problem because people are to some degree selfish and, to a smaller degree, have different general principles/aesthetics about how things should be. So some sort of incentive optimization / social choice theory / etc. might help. But at least there's significant overlap between different humans' values. Though there's a pretty big existing problem of people dying; the default was already that current people would be replaced by other people.

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2024-01-02T00:41:31.963Z · LW · GW

To the extent people now don't care about the long-term future there isn't much to do in terms of long-term alignment. People right now who care about what happens 2000 years from now probably have roughly similar preferences to people 1000 years from now who aren't significantly biologically changed or cognitively enhanced, because some component of what people care about is biological.

I'm not saying it would be random so much as not very dependent on the original history of humans used to train early AGI iterations. It would have a different data history, but part of that is because of different measurements, e.g. scientific measuring tools. A different ontology means that value-laden things people might care about, like "having good relationships with other humans", are not meaningful to future AIs in terms of their world model: not something they would care much about by default (they aren't even modeling the world in those terms), and it would be hard to encode a utility function so they care about it despite the ontological difference.

Comment by jessicata (jessica.liu.taylor) on 2023 in AI predictions · 2024-01-02T00:35:32.443Z · LW · GW

Beat Ocarina of Time with <100 hours of playing Zelda games during training or deployment (but perhaps training on other games), no reading guides/walkthroughs/playthroughs, no severe bug exploits (those that would cut down the required time by a lot), no reward-shaping/advice specific to this game generated by humans who know non-trivial things about the game (but the agent can shape its own reward). This includes an LLM coding a program to do it. I'd say probably not by 2033.

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2024-01-01T23:52:32.285Z · LW · GW

I think it's possible human values depend on life history too, but that seems to add additional complexity and make alignment harder. If the effects of life history very much dominate those of evolutionary history, then maybe neglecting evolutionary history would be more acceptable, making the problem easier.

But I don't think default AGI would be especially path dependent on human collective life history. Human society changes over time as humans supersede old cultures (see section on subversion). AGI would be a much bigger shift than the normal societal shifts and so would drift from human culture more rapidly. Partially due to different conceptual ontology and so on. The legacy concepts of humans would be a pretty inefficient system for AGIs to keep using. Like how scientists aren't alchemists anymore, but a bigger shift than that.

(Note, LLMs still rely a lot on human concepts rather than having independent ontology and agency, so this is more about future AI systems)

Comment by jessicata (jessica.liu.taylor) on 2023 in AI predictions · 2024-01-01T23:44:24.130Z · LW · GW

I think they will remain hard by EOY 2024, as in, of this problem and the 7 held-out ones of similar difficulty, the best LLM will probably not solve 4/8.

I think I would update some on how fast LLMs are advancing but these are not inherently very hard problems so I don't think it would be a huge surprise, this was meant to be one of the easiest things they fail at right now. Maybe if that happens I would think things are going 1.6x as fast short term as I would have otherwise thought?

I was surprised by GPT3/3.5 but not so much by 4. I think it adds up to a net update that LLMs are advancing faster than I thought, but I haven't much changed my long-term AGI timelines, because I think AGI will involve lots of techs, not just LLMs, although LLM progress is some update about general tech progress.

Comment by jessicata (jessica.liu.taylor) on 2023 in AI predictions · 2024-01-01T19:08:33.476Z · LW · GW

I've added 6 more held-out problems for a total of 7. Agree that getting the answer without pointing out problems is the right standard.

Comment by jessicata (jessica.liu.taylor) on 2023 in AI predictions · 2024-01-01T18:29:41.881Z · LW · GW


Comment by jessicata (jessica.liu.taylor) on 2023 in AI predictions · 2024-01-01T17:58:26.297Z · LW · GW

Here's the harder problem. I've also held out a third problem without posting it online.

harder problem

Comment by jessicata (jessica.liu.taylor) on 2023 in AI predictions · 2024-01-01T17:45:38.358Z · LW · GW

Maybe those don't stick out to me because long timelines seems like the default hypothesis to me, and there's a lot of people stating specific, falsifiable short timelines predictions locally so there's a selection effect. I added Brian Chau and Robin Hanson to the list though, not sure who else (other than me) has made specific long timelines predictions who would be good to add. Would like to add people like Yann LeCun and Andrew Ng if there are specific falsifiable predictions they made.

Comment by jessicata (jessica.liu.taylor) on 2023 in AI predictions · 2024-01-01T08:42:54.108Z · LW · GW

I've written about the anthropic question. Appreciate the update!

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2024-01-01T02:55:19.755Z · LW · GW

Something approximating utility function optimization over partial world configurations. What scope of world configuration space is optimized by effective systems depends on the scope of the task. For something like space exploration, the scope of the task is such that accomplishing it requires making trade-offs over a large sub-set of the world, and efficient ways of making these trade-offs are parametrized by utility function over this sub-set.

What time-scale and spatial scope the "pick thoughts in your head" optimization is over depends on what scope is necessary for solving the problem. Some problems like space exploration have a necessarily high time and space scope. Proving hard theorems has a smaller spatial scope (perhaps ~none) but a higher temporal scope. Although, to the extent the distribution over theorems to be proven depends on the real world, having a model of the world might help prove them better.

Depending on how the problem-solving system is found, it might be that the easily-findable systems that solve the problem distribution sufficiently well will not only model the world but care about it, because the general consequentalist algorithms that do planning cognition to solve the problem would also plan about the world. This of course depends on the method for finding problem-solving systems, but one could imagine doing hill climbing over ways of wiring together a number of modules that include optimization and world-modeling modules, and easily-findable configurations that solve the problem well might solve it by deploying general-purpose consequentialist optimization on the world model (as I said, many possible long-term goals lead to short-term compliant problem solving as an instrumental strategy).

Again, this is relatively speculative, and depends on the AI paradigm and problem formulation. It's probably less of a problem for ML-based systems because the cognition of an ML system is aggressively gradient descended to be effective at solving the problem distribution.

The problem is somewhat intensified in cases where the problem relates to already-existing long-term agents such as in the case of predicting or optimizing with respect to humans, because the system at some capability level would simulate the external long-term optimizer. However, it's unclear how much this would constitute creation of an agent with different goals from humans.

Comment by jessicata (jessica.liu.taylor) on A case for AI alignment being difficult · 2023-12-31T22:19:15.184Z · LW · GW

Thanks, fixed. I believe Yudkowsky is the right spelling though.

Comment by jessicata (jessica.liu.taylor) on Would you have a baby in 2024? · 2023-12-25T20:36:20.304Z · LW · GW

Relevant skills for an AI economy would include mathematics, programming, ML, web development, etc.

It's hard to extrapolate out that far, but AI still has a lot of trouble with robotics (e.g. we don't have good dish-washing household robots). So there will probably be e.g. construction jobs for a while. AI is helpful for programming, but using AI to program relies on a lot of human support; I doubt programming will be entirely automated in 30 years. AI tends to have trouble with contextualized, embodied/embedded problems; it's better at decontextualized, schoolwork-like problems. For example, if you're doing sales you need to manage a set of relationships whose data is gathered over a lot of contexts, mostly not recorded, and AI is going to have more trouble parsing that context into something a transformer can operate on and give a good response to. Self-driving is an example of an embedded, though low-context, problem, and progress on that has been slower than expected, although with all the driving data from electric cars it's possible to train a transformer to imitate human drivers.

Comment by jessicata (jessica.liu.taylor) on Would you have a baby in 2024? · 2023-12-25T20:23:18.541Z · LW · GW

Oh, I thought this was mainly about x-risk, especially given the Yudkowsky reference. The other points don't change the picture much either. If you predict the economy will have lots of AI in the future, you can give your child an advantage by training them in relevant skills. Also, many jobs, like service jobs, are likely to remain; there are lots of things AI has trouble with, or which humans generally prefer humans to do. AI would increase material productivity, which would be expected to decrease the cost of living as well. See Yudkowsky's post on AI unemployment.

Regarding international conflict, I haven't seen a convincing model of how AI would make international conflict worse. Drone warfare is a possibility, but it would tend to concentrate military power in technically advanced countries such as Taiwan, the UK, the USA, and Israel. I don't know where OP lives, but I don't see how this would make things worse for USA/UK children. Drones would also be expected to have a better civilian-casualty ratio than methods like conventional explosives, nukes, or bio-weapons.

Comment by jessicata (jessica.liu.taylor) on Would you have a baby in 2024? · 2023-12-25T19:52:58.295Z · LW · GW

The post is phrased as "do you think it's a good idea to have kids given timelines?". I've said why I'm not convinced timelines should be relevant to having kids. And if people are getting their views by copying Eliezer Yudkowsky, or by copying people who copy his views (I'm not sure whether OP is doing this), then they should get better epistemology.

Comment by jessicata (jessica.liu.taylor) on Would you have a baby in 2024? · 2023-12-25T19:39:47.218Z · LW · GW

From 2018, AI timelines section of

Modeling AI progress through insights

We assembled a list of major technical insights in the history of progress in AI and metadata on the discoverer(s) of each insight.

Based on this dataset, we developed an interactive model that calculates the time it would take to reach the culmination of all AI research, based on a guess at what percentage of AI discoveries have been made.

AI Insights dataset: data (json file), schema
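For intuition, the extrapolation such a model performs can be sketched as follows. This is a toy simplification I'm supplying here, not the actual Median Group model (which is interactive and more detailed); the function name, example years, and the 50% guess are all hypothetical, and it assumes insights arrive at a constant historical rate:

```python
def years_remaining(insight_years, fraction_discovered):
    """Toy extrapolation: if the recorded insights are `fraction_discovered`
    of all insights needed, and insights keep arriving at the historical
    average rate, estimate years until the remainder is discovered."""
    span = max(insight_years) - min(insight_years)  # observed period (years)
    n = len(insight_years)                          # insights recorded so far
    rate = n / span                                 # insights per year
    total = n / fraction_discovered                 # implied total insight count
    return (total - n) / rate                       # years to find the rest

# Hypothetical example: 3 insights recorded over 1956-2018, guessing that
# half of all needed insights have been found.
print(years_remaining([1956, 1986, 2018], 0.5))
```

The output is as sensitive to the `fraction_discovered` guess as the real model is to its percentage parameter, which is why the original is presented as an adjustable interactive model rather than a point estimate.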

Feasibility of Training an AGI using Deep Reinforcement Learning: A Very Rough Estimate

Several months ago, we were presented with a scenario for how artificial general intelligence (AGI) may be achieved in the near future. We found the approach surprising, so we attempted to produce a rough model to investigate its feasibility. The document presents the model and its conclusions.

The usual cliches about the folly of trying to predict the future go without saying and this shouldn't be treated as a rigorous estimate, but hopefully it can give a loose, rough sense of some of the relevant quantities involved. The notebook and the data used for it can be found in the Median Group numbers GitHub repo if the reader is interested in using different quantities or changing the structure of the model.

[Download PDF: Feasibility of Training an AGI using Deep Reinforcement Learning, A Very Rough Estimate.pdf]

(Note: the second model has a hard-to-estimate "real life vs. AlphaGo" difficulty parameter that the result somewhat depends on, although this parameter can be adjusted in the model.)

I also recommend two articles (not by me): "Why I am not an AI doomer" and "Diminishing Returns in Machine Learning".

Comment by jessicata (jessica.liu.taylor) on Would you have a baby in 2024? · 2023-12-25T06:57:42.679Z · LW · GW

You're providing no evidence that superintelligence is likely in the next 30 years other than a Yudkowsky tweet. I expect that 30 years later we will not have superintelligence (of the sort that can build the stack to run itself on, growing at a fast rate, taking over the solar system etc).

Comment by jessicata (jessica.liu.taylor) on 2022 (and All Time) Posts by Pingback Count · 2023-12-17T17:48:44.752Z · LW · GW

I have to look for a while before finding any non-AI posts. Seems LW is mainly an AI / alignment discussion forum at this point.

Comment by jessicata (jessica.liu.taylor) on Is being sexy for your homies? · 2023-12-14T20:48:05.966Z · LW · GW

In particular, most of them (and all of the post-op) can’t make babies with anyone.

Given how easy sperm storage is, I think sterilizing yourself without banking sperm is in general a bad idea, because people sometimes want kids later. I've tried to encourage sperm banking among people getting sterilized. It shocks me that this isn't standard protocol.

Comment by jessicata (jessica.liu.taylor) on Is being sexy for your homies? · 2023-12-14T20:42:20.958Z · LW · GW

"Some research suggests that the average person has twice as many female ancestors as male ancestors"

Comment by jessicata (jessica.liu.taylor) on Is being sexy for your homies? · 2023-12-13T22:10:12.501Z · LW · GW

Often, yes. But within-sex coalitions can be "cartel-like" in a way that is dysgenic, analogous to how economic cartels reduce economic competition. Within-sex coalitions are often more about military fitness than evolutionary fitness. For instance, consider this quote from the Futurist Manifesto:

We want to glorify war - the only cure for the world - militarism, patriotism, the destructive gesture of the anarchists, the beautiful ideas which kill, and contempt for woman.

The pattern of enforced monogamy is also a product of within-sex coalitions; it reduces the level of competition among males compared to the evolutionary average.

Comment by jessicata (jessica.liu.taylor) on Is being sexy for your homies? · 2023-12-13T21:22:34.135Z · LW · GW

Male K-pop stars are an example of men typically considered "unmasculine" by other men but who tend to be attractive to women. I think the women are just right here, and K-pop stars are more ideally hetero-masculine than the muscly guys who are actually less attractive to women. It's more conceptually appropriate to define ideal hetero-masculinity in terms of appeal to women, and vice versa. (The men could in theory be correct if they admitted to latent homosexuality, but they won't!)

It's natural and basically eugenic (in the literal sense, not the political sense) for straight men to compete with each other for women and vice versa, with the competition between straight men being more intense than for straight women. The patriarchal pattern of men coordinating with each other to split the women more evenly may be "culture" but that doesn't make it good.

Comment by jessicata (jessica.liu.taylor) on Some biases and selection effects in AI risk discourse · 2023-12-13T07:05:18.421Z · LW · GW

Note that beyond not-being-mentioned, such arguments are also anthropically filtered against: in worlds where such arguments have been out there for longer, we died a lot quicker, so we’re not there to observe those arguments having been made.

This anthropic analysis doesn't take into account past observers (see this post).

Comment by jessicata (jessica.liu.taylor) on Principles For Product Liability (With Application To AI) · 2023-12-10T23:41:05.939Z · LW · GW

Wouldn't the individual developers on the project be personally liable if they didn't do it through an LLC?