Posts

Why keep a diary, and why wish for large language models 2024-06-14T16:10:07.658Z
AXRP Episode 33 - RLHF Problems with Scott Emmons 2024-06-12T03:30:05.747Z
AXRP Episode 32 - Understanding Agency with Jan Kulveit 2024-05-30T03:50:05.289Z
AXRP Episode 31 - Singular Learning Theory with Daniel Murfet 2024-05-07T03:50:05.001Z
AXRP Episode 30 - AI Security with Jeffrey Ladish 2024-05-01T02:50:04.621Z
AXRP Episode 29 - Science of Deep Learning with Vikrant Varma 2024-04-25T19:10:06.063Z
Bayesian inference without priors 2024-04-24T23:50:08.312Z
AXRP Episode 28 - Suing Labs for AI Risk with Gabriel Weil 2024-04-17T21:42:46.992Z
AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt 2024-04-11T21:30:04.244Z
Daniel Kahneman has died 2024-03-27T15:59:14.517Z
Superforecasting the Origins of the Covid-19 Pandemic 2024-03-12T19:01:15.914Z
Common Philosophical Mistakes, according to Joe Schmid [videos] 2024-03-03T00:15:47.899Z
11 diceware words is enough 2024-02-15T00:13:43.420Z
Most experts believe COVID-19 was probably not a lab leak 2024-02-02T19:28:00.319Z
n of m ring signatures 2023-12-04T20:00:06.580Z
AXRP Episode 26 - AI Governance with Elizabeth Seger 2023-11-26T23:00:04.916Z
How to type Aleksander Mądry's last name in LaTeX 2023-11-21T00:50:07.189Z
Aaron Silverbook on anti-cavity bacteria 2023-11-20T03:06:19.524Z
If a little is good, is more better? 2023-11-04T07:10:05.943Z
On Frequentism and Bayesian Dogma 2023-10-15T22:23:10.747Z
AXRP Episode 25 - Cooperative AI with Caspar Oesterheld 2023-10-03T21:50:07.552Z
Watermarking considered overrated? 2023-07-31T21:36:05.268Z
AXRP Episode 24 - Superalignment with Jan Leike 2023-07-27T04:00:02.106Z
AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu 2023-07-27T01:50:02.808Z
AXRP announcement: Survey, Store Closing, Patreon 2023-06-28T23:40:02.537Z
AXRP Episode 22 - Shard Theory with Quintin Pope 2023-06-15T19:00:01.340Z
[Linkpost] Interpretability Dreams 2023-05-24T21:08:17.254Z
Difficulties in making powerful aligned AI 2023-05-14T20:50:05.304Z
AXRP Episode 21 - Interpretability for Engineers with Stephen Casper 2023-05-02T00:50:07.045Z
Podcast with Divia Eden and Ronny Fernandez on the strong orthogonality thesis 2023-04-28T01:30:45.681Z
AXRP Episode 20 - ‘Reform’ AI Alignment with Scott Aaronson 2023-04-12T21:30:06.929Z
[Link] A community alert about Ziz 2023-02-24T00:06:00.027Z
Video/animation: Neel Nanda explains what mechanistic interpretability is 2023-02-22T22:42:45.054Z
[linkpost] Better Without AI 2023-02-14T17:30:53.043Z
AXRP: Store, Patreon, Video 2023-02-07T04:50:05.409Z
Podcast with Oli Habryka on LessWrong / Lightcone Infrastructure 2023-02-05T02:52:06.632Z
AXRP Episode 19 - Mechanistic Interpretability with Neel Nanda 2023-02-04T03:00:11.144Z
First Three Episodes of The Filan Cabinet 2023-01-18T19:20:06.588Z
Podcast with Divia Eden on operant conditioning 2023-01-15T02:44:29.706Z
On Blogging and Podcasting 2023-01-09T00:40:00.908Z
Things I carry almost every day, as of late December 2022 2022-12-30T07:40:01.261Z
Announcing The Filan Cabinet 2022-12-30T03:10:00.494Z
Takeaways from a survey on AI alignment resources 2022-11-05T23:40:01.917Z
AXRP Episode 18 - Concept Extrapolation with Stuart Armstrong 2022-09-03T23:12:01.242Z
AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler 2022-08-21T23:50:20.513Z
AXRP Episode 16 - Preparing for Debate AI with Geoffrey Irving 2022-07-01T22:20:18.456Z
AXRP Episode 15 - Natural Abstractions with John Wentworth 2022-05-23T05:40:19.293Z
AXRP Episode 14 - Infra-Bayesian Physicalism with Vanessa Kosoy 2022-04-05T23:10:09.817Z
AXRP Episode 13 - First Principles of AGI Safety with Richard Ngo 2022-03-31T05:20:17.883Z
What’s the chance a smart London resident dies of a Russian nuke in the next month? 2022-03-10T19:20:01.434Z

Comments

Comment by DanielFilan on Towards more cooperative AI safety strategies · 2024-07-22T23:51:25.611Z · LW · GW

[this comment is irrelevant to the point you actually care about and is just nit-picking about the analogy]

There is a pretty big divide between "liberal" and "conservative" Christianity that is in some ways bigger than the divide between different denominations. In the US, people who think of themselves as "Episcopalians" tend to be more liberal than people who call themselves "Baptists". In the rest of this comment, I'm going to assume we're talking about conservative Anglicans rather than Episcopalians (those terms referring to the same denominational family), and also about conservative Baptists, since they're more likely to be up to stuff / doing meaningful advocacy, and more likely to care about denominational distinctions. That said, liberal Episcopalians and liberal Baptists are much more likely to get along, and also openly talk about how they're in cooperation.

My guess is that conservative Anglicans and Baptists don't spend much time at each other's churches, at least during worship, given that they tend to have very different types of services and very different views about the point of worship (specifically about the role of the eucharist). Also there's a decent chance they don't allow each other to commune at their church (more likely on the Baptist end). Similarly, I don't think they are going to have that much social overlap, altho I could be wrong here. There's a good chance they read many of the same blogs tho.

In terms of policy advocacy, on the current margin they are going to mostly agree - common goals are going to be stuff like banning abortion, banning gay marriage, and ending the practice of gender transition. Anglican groups are going to be more comfortable with forms of state Christianity than Baptists are, altho this is lower-priority for both, I think. They are going to advocate for their preferred policies in part through denominational policy bodies, but also by joining common-cause advocacy organizations.

Both Anglican and Baptist churches are largely going to be funded by members, and their members are going to be disjoint. That said it's possible that their policy bodies will share large donor bases.

They are also organized pretty differently internally: Anglicans have a very hierarchical structure, while Baptists have a very decentralized one (each congregation is its own democratic polity, able to e.g. vote to remove the pastor and hire a new one).

Anyway: I'm pretty sympathetic to the claim of conservative Anglicans and Baptists being meaningfully distinct power bases, altho it would be misleading to not acknowledge that they're both part of a broader conservative Christian ecosystem with shared media sources, fashions, etc.

Part of the reason this analogy didn't vibe for me is that Anglicans and Baptists are about as dissimilar as Protestants can get. If it were Anglicans and Presbyterians or Baptists and Pentecostals that would make more sense, as those denominations are much more similar to each other.

Comment by DanielFilan on Arjun Panickssery's Shortform · 2024-07-20T00:32:48.788Z · LW · GW

Further updates:

  • On the one hand, Nate Silver's model now gives Trump a ~30% chance of winning in Virginia, making my side of the bet look good again.
  • On the other hand, the Economist model gives Trump a 10% chance of winning Delaware and a 20% chance of winning Illinois, which suggests that there's something going wrong with the model and that it was untrustworthy a month ago.
  • That said, betting markets currently think there's only a one in four chance that Biden is the nominee, so this bet probably won't resolve.

Comment by DanielFilan on [deleted post] 2024-07-12T22:50:07.609Z

I will spend time reading posts and papers, improving coding skills as needed to run and interpret experiments, learning math as needed for writing up proofs, talking with concept-based interpretability researchers as well as other conceptual alignment researchers

I feel like this is missing the bit where you write proofs, run and interpret experiments, etc.

Comment by DanielFilan on [deleted post] 2024-07-12T22:49:31.191Z

As a maximal goal, I might seek to test my theories about the detection of generalizable human values (like reciprocity and benevolence) by programming an alife simulation meant to test a toy-model version of agentic interaction and world-model agreement/interoperability through the fine-structure of the simulated agents.

Do you think you will be able to do this in the next 6 weeks? Might be worth scaling this down to "start a framework to test my theories" or something like that

Comment by DanielFilan on [deleted post] 2024-07-12T22:48:28.138Z

the fine-structure of the simulated agents.

what does this mean?

Comment by DanielFilan on [deleted post] 2024-07-12T22:48:07.886Z

alife

I think most people won't know what this word means

Comment by DanielFilan on [deleted post] 2024-07-12T22:47:49.686Z

I plan to stress-test and further flesh out the theory, with a minimal goal of producing a writeup presenting results I've found and examining whether the assumptions of the toy models of the original post hold up as a way of examining Natural Abstractions as an alignment plan.

I feel like this doesn't give me quite enough of an idea of what you'd be doing - like, what does "stress-testing" involve? What parts need fleshing out? 

Comment by DanielFilan on [deleted post] 2024-07-12T22:44:40.370Z

Time-bounded: Are research activities and outputs time-bounded?

  • Does the proposal include a tentative timeline of planned activities (e.g., LTFF grant, scholar symposium talk, paper submission)?
  • How might the timeline change if planned research activities are unsuccessful?

This part is kind of missing - I'm seeing a big list of stuff you could do, but not an indication of how much of it you might reasonably expect to do in the next 5 weeks. A better approach here would be to give a list of things you could do in those 5 weeks together with estimates of how much time each thing could take, possibly with a side section of "here are other things I could do depending on how stuff goes"

Comment by DanielFilan on [deleted post] 2024-07-12T22:41:51.293Z

Theory of Change

I feel like this section is missing a sentence like "OK here's the thing that would be the output of my project, and here's how it would cause these good effects"

Comment by DanielFilan on [deleted post] 2024-07-12T22:40:50.745Z

[probably I also put a timeline here of stuff I have done so far?]

This is valuable for you to do so that you can get a feel for what you can do in a week, but I'm not sure it's actually that valuable to plop into the RP

Comment by DanielFilan on [deleted post] 2024-07-12T22:39:58.897Z

Less prosaically, it's not impossible that a stronger or more solidly grounded theory of semantics or of interoperable world-models might prove to be the "last missing piece" between us and AGI; that said, given that my research path primarily involves things like finding and constructing conceptual tools, writing mathematical proofs, and reasoning about bounds on accumulating errors - and not things like training new frontier models - I think the risk/dual-use-hazard of my proposed work is minimal.

I don't really understand this argument. Why wouldn't a better theory of semantics and concepts - one that does a good job of describing what's going on in smart AIs - also help people build better AIs? Like, you might think the more things you know about smart AIs, the easier it would be to build them - where does this argument break?

The thing you imply here is that it's pretty different from stuff people currently do to train frontier models, but you already told me that scaling frontier models was really unlikely to lead to AGI, so why should that give me any comfort?

Comment by DanielFilan on [deleted post] 2024-07-12T22:37:34.059Z

Not only would a better theory of semantics help researchers detect objects and features which are natural to the AI, it would also help them check whether a given AI treats some feature of its environment or class of object as a natural cluster, and help researchers agree within provable bounds on what concept precisely they are targeting.

This part isn't so clear to me. Why can't I just look at what features of the world an AI represents without a theory of semantics?

Comment by DanielFilan on [deleted post] 2024-07-12T22:36:27.230Z

the next paragraph is kind of like that but making a sort of novel point so maybe they're necessary? I'd try to focus them on saying things you haven't yet said

Comment by DanielFilan on [deleted post] 2024-07-12T22:32:31.368Z

(admittedly unlikely)

why do you think it's unlikely?

Comment by DanielFilan on [deleted post] 2024-07-12T22:32:08.267Z

On one hand, arbitrary agents - or at least a large class of agents, or at least (proto-)AGIs that humans make - might turn out to simply already naturally agree with us on the features we abstract from our surroundings; a better-grounded and better-developed theory of semantics would allow us to confirm this and become more optimistic about the feasibility of alignment.

On the other, such agents might prove in general to have inner ontologies totally unrelated to our own, or perhaps only somewhat different, but in enduring and hazardous ways; a better theory of semantics would warn us of this in advance and suggest other routes to AGI or perhaps drive a total halt to development.

I feel like these two paragraphs are just fleshing out the thing you said earlier and aren't really needed

Comment by DanielFilan on [deleted post] 2024-07-12T22:31:34.295Z

inner

is this word needed?

Comment by DanielFilan on [deleted post] 2024-07-12T22:31:10.986Z

those that rely on arbitrary AGIs detecting and [settling on as natural] the same features of the world that humans do, including values and qualities important to humanity

can you give examples of such strategies, and argue that they rely on this?

Comment by DanielFilan on [deleted post] 2024-07-12T22:30:37.530Z

Natural Abstractions framework

a link here could be nice

Comment by DanielFilan on [deleted post] 2024-07-12T22:30:22.874Z

Worst of all, lacking such a theory means that we lack the framework and the language we'd need to precisely describe both human values - and how we'd check that a given system comprehends human values.

Maybe devote a sentence arguing for this claim

Comment by DanielFilan on [deleted post] 2024-07-12T22:29:56.285Z

As a result, I think that conceptual alignment will be a required direction of work towards ensuring that the advent of AGI results in a desirable future for humanity among other sapient life. In particular, my perspective as a mathematician leads me to believe that just as a lack of provable guarantees about a mathematical object means that such an object can be arbitrarily unexpectedly badly behaved on features you didn't want to specify, so too could the behavior of an underspecified or imprecisely specified AGI result in arbitrarily undesirable (or even merely pathological or self-defeating) behavior along axes we didn't think to check.

IDG how this is supposed to be related to whether scaling will work. Surely if scaling were enough, your arguments here would still go thru, right?

Comment by DanielFilan on [deleted post] 2024-07-12T22:29:01.053Z

It seems very likely (~96%) to me that scale is not in fact all that is required to go from current frontier models to AGI, such that GPT-8 (say) will still not be superintelligent and a near-perfect predictor or generator of text, just because of what largely boils down to a difference of scale and not a difference of kind or of underlying conceptual model; I consider it more likely that we'll get AGI ~30 years out but that we'll have to get alignment precisely right.

You might want to gesture at why this seems likely to you, since AFAICT this is a minority view.

Comment by DanielFilan on [deleted post] 2024-07-12T22:27:55.998Z

All the same, it's not all that surprising that conceptual alignment generally and natural abstractions/natural semantics specifically are - maybe unavoidably - underserved subfields of alignment: the model of natural semantics I'm working off of was only officially formalized in mid-June 2024.

what's the point of this sentence? would anything bad happen if you just deleted it?

Comment by DanielFilan on [deleted post] 2024-07-12T22:27:31.810Z

the model of natural semantics I'm working off of was only officially formalized in mid-June 2024.

is this why it isn't surprising that conceptual alignment is underserved, or an example of it being underserved? as written I feel like the structure implies the second, but content-wise it feels more like the first

Comment by DanielFilan on [deleted post] 2024-07-12T22:26:37.016Z

both human values

both of the human values? or should this be "both human values and how we'd check that..."?

Comment by DanielFilan on [deleted post] 2024-07-12T22:26:05.038Z

-

You use a lot of em dashes, and it's noticeable. This is a common problem in writing. I don't know a good way to deal with this, other than suggesting that you consider which ones could be footnotes, parentheses, or commas.

Comment by DanielFilan on [deleted post] 2024-07-12T22:24:50.304Z

I guess there's two things here:

  1. what sorts of things in the environment do we expect models to pick up on
  2. how do we expect models to process info from the environment

If we're wrong about 1, I feel like we could find it out. But if we make wrong assumptions about 2, it makes a bit more sense to me that we could fail to find that out.

In any case, an example indicating how we could fail would probably be useful here.

Comment by DanielFilan on [deleted post] 2024-07-12T22:23:21.479Z

cannot be even reasonably sure that the measurements taken and experiments performed are telling us what we think they are

It's not clear to me why this follows. Couldn't it be the case that even without a theory of what sorts of features we expect models to learn / use, we can detect what features they are in fact using?

Comment by DanielFilan on Habryka's Shortform Feed · 2024-06-30T21:24:15.027Z · LW · GW

FWIW I recommend editing OP to clarify this.

Comment by DanielFilan on Arjun Panickssery's Shortform · 2024-06-26T20:03:10.923Z · LW · GW

Update for posterity: Nate Silver's model gives Trump a ~1 in 6 chance of winning Virginia, making my side of this bet look bad.

Comment by DanielFilan on What 2026 looks like · 2024-06-20T22:21:30.301Z · LW · GW

FWIW, the discussion of AI-driven propaganda doesn't seem as prescient.

Comment by DanielFilan on What 2026 looks like · 2024-06-20T22:18:13.492Z · LW · GW

So [in 2024], the most compute spent on a single training run is something like 5x10^25 FLOPs.

As of June 20th 2024, this is exactly Epoch AI's central estimate of the most compute spent on a single training run, as displayed on their dashboard.

Comment by DanielFilan on Summary of Situational Awareness - The Decade Ahead · 2024-06-14T20:40:11.110Z · LW · GW

The link to the "EAF version" instead goes to the LessWrong version - should be This link

Comment by DanielFilan on If a little is good, is more better? · 2024-06-14T16:07:44.770Z · LW · GW

See also Zvi's post on More Dakka

Comment by DanielFilan on Arjun Panickssery's Shortform · 2024-06-13T23:05:19.731Z · LW · GW

Have recorded on my website

Comment by DanielFilan on Arjun Panickssery's Shortform · 2024-06-13T19:48:39.423Z · LW · GW

Could we do your $350 to my $100? And the voiding condition makes sense.

Comment by DanielFilan on Arjun Panickssery's Shortform · 2024-06-13T00:22:50.661Z · LW · GW

FWIW the polling in Virginia is pretty close - I'd put my $x against your $4x that Trump wins Virginia, for x <= 200. Offer expires in 48 hours.

Comment by DanielFilan on DanielFilan's Shortform Feed · 2024-06-11T18:17:08.868Z · LW · GW

You could but (a) it's much harder constitutionally in the US (governments can only be sued if they consent to being sued, maybe unless other governments are suing them) and (b) the reason for thinking this proposal works is modelling affected actors as profit-maximizing, which the government probably isn't.

Comment by DanielFilan on DanielFilan's Shortform Feed · 2024-06-05T04:46:18.246Z · LW · GW

Further note: this policy doesn't work to regulate government-developed AGI, which is a major drawback if you expect the government to develop AGI. It also probably lowers the relative cost for the government to develop AGI, which is a major drawback if you think the private sector would do a better job of responsible AGI development than the government.

Comment by DanielFilan on Episode: Austin vs Linch on OpenAI · 2024-05-26T06:25:28.054Z · LW · GW

Yeah, sadly AFAICT it just takes hours of human time to produce good transcripts.

Comment by DanielFilan on Episode: Austin vs Linch on OpenAI · 2024-05-26T06:24:38.238Z · LW · GW

I think I care about the video being easier to watch more than I care about missing the ums and ahs? But maybe I'm not appreciating how much umming you do.

Comment by DanielFilan on DanielFilan's Shortform Feed · 2024-05-26T04:41:38.323Z · LW · GW

Oh: it would be sad if there were a bunch of frivolous suits for this. One way to curb that without messing up optionality would be to limit such suits to large enough intermediate disasters.

Comment by DanielFilan on Episode: Austin vs Linch on OpenAI · 2024-05-25T23:10:57.040Z · LW · GW

Note that this podcast was recorded May 22, before Kelsey Piper’s expose on NDAs

I don't understand this - [Kelsey's initial exposé](https://www.vox.com/future-perfect/2024/5/17/24158478/openai-departures-sam-altman-employees-chatgpt-release) was published on the 18th. Do you mean "before her second post" or something?

Comment by DanielFilan on Episode: Austin vs Linch on OpenAI · 2024-05-25T23:07:02.843Z · LW · GW

+1 that the transcript is rough. Unfortunately it's just pretty expensive to create one that's decent to read - for $1.50/min (so like $95 for this episode) you can get an OK transcript from rev.com within 24 hours, and then if you want to actually eliminate typos, you just have to go over it yourself.

I'd also encourage you to not use the descript feature that cuts out all ums and ahs - it just makes it sound super disjointed (especially when watching the video).

Comment by DanielFilan on DanielFilan's Shortform Feed · 2024-05-25T22:59:37.726Z · LW · GW

The below is the draft of a blog post I have about why I like AI doom liability. My dream is that people read it and decide "ah yes this is the main policy we will support" or "oh this is bad for a reason Daniel hasn't noticed and I'll tell him why". I think usually you're supposed to flesh out posts, but I'm not sure that adds a ton of information in this case.

Why I like AI doom liability

  • AI doom liability is my favourite approach to AI regulation. I want to sell you all on it.

  • the basic idea

    • general approach to problems: sue people for the negative impacts
      • internalizes externalities
      • means that the people figuring out how to avoid harm are informed and aligned (rather than bureaucrats who are less aware of on-the-ground conditions / trying to look good / seeking power)
      • less fucked than criminal law, regulatory law
        • look at what hits the supreme court, which stuff ends up violating people's rights the worst, what's been more persistent over human history, what causes massive protests, etc.
    • first-pass approach to AI: sue for liabilities after AI takes over
      • can't do that
    • so sue for intermediate disasters, get punitive damages for how close you were to AI takeover
      • intuition: pulling liability forward into places it can be paid, for the same underlying conduct.
      • also mention strict liability, liability insurance
    • See Foom Liability (Hanson, 2023), Tort Law as a Tool for Mitigating Catastrophic Risk from Artificial Intelligence (Weil, 2024).
  • why it's nice

    • liability when you're more informed of risks, vs regulation now, when we know less
    • doesn't require the right person in the right position
      • judged by juries informed by lawyers on both sides, not by power-hungry, politically constrained officials
    • we don't really know what the right way to make safe AI is right now
    • good in high-risk worlds or low-risk worlds - as long as you believe in intermediate disasters
      • intermediate disasters seem plausible because slow takeoff
    • more fair: AI companies can't get away with making egregiously unsafe AI, but they're not penalized for doing stuff that is actually harmless.
  • difficulties with the proposal:

    • jury discretion
      • you could give the jury the optimal formula, which isn't easy to just plug numbers into, and give them a bunch of discretion in how to apply it
      • or you could give them a more plug-and-play formula which sort of approximates the optimal formula, making things more predictable but less theoretically optimal.
      • it's not clear how you want to trade off predictability with theoretical optimality, or what the trade-off even looks like (Hanson's post is a bit more predictable but it's unclear how predictable it actually is).
    • positive externalities
      • in a world where research produces positive externalities, it's a really bad idea to force people to internalize all negative externalities
      • one way this is clear: open source AI. tons of positive externalities - people get to use AI to do cool stuff, and you can do research on it, maybe helping you figure out how to make AI more safely.
      • this regime, without tweaks, would likely make it economically unviable to open-source large SOTA models. It's unclear whether this is optimal.
      • I don't know a principled way to deal with this.

Comment by DanielFilan on Big Picture AI Safety: Introduction · 2024-05-24T18:30:37.038Z · LW · GW

Anca Dragan, who currently leads an alignment team at DeepMind, is the one I saw (I then mistakenly assumed there were others). And fair point re: academic OpenPhil grantees.

Comment by DanielFilan on Big Picture AI Safety: Introduction · 2024-05-24T06:41:34.101Z · LW · GW

OP undoubtedly funds a lot of AIS groups, but there are lots of experts who approach AIS from a different set of assumptions and worldviews.

Note that the linked paper includes a bunch of authors from AGI labs or who have received OpenPhil funding.

Comment by DanielFilan on Big Picture AI Safety: Introduction · 2024-05-23T18:01:52.455Z · LW · GW

Participants pointed to a range of mistakes they thought the AI safety movement had made. There was no consensus and the focus was quite different from person to person. The most common themes included:

  • an overreliance on overly theoretical argumentation,
  • being too insular,
  • putting people off by pushing weird or extreme views,
  • supporting the leading AGI companies resulting in race dynamics,
  • not enough independent thought,
  • advocating for an unhelpful pause to AI development,
  • and historically ignoring policy as a potential route to safety.


FWIW one thing that jumps out to me is that it feels like this list comes in two halves, each complaining about the other: one that thinks AI safety should be less theoretical, less insular, less extreme, and not advocate pause; and one that thinks that it should be more independent, less connected to leading AGI companies, and more focussed on policy. They aren't strictly opposed (e.g. one could think people overrate pause but underrate policy more broadly), but I would strongly guess that the people making some of these complaints are thinking of the people making the others.

Comment by DanielFilan on Is There Really a Child Penalty in the Long Run? · 2024-05-17T18:35:48.664Z · LW · GW

We can’t just compare women with children to those without them because having children is a choice that’s correlated with all of the outcomes we care about. So sorting out two groups of women based on observed fertility will also sort them based on income and education and marital status etc.

Successfully implanting embryos on the first try in IVF is probably not very correlated with these outcomes.

This is maybe a dumb question, but I would have imagined that successful implantation would be related to good health outcomes (based on some intuition that successful implantation represents an organ of your body functioning properly, and imagining that the higher success rates of younger people have to do with their health). Is that not true?

Comment by DanielFilan on Alexander Gietelink Oldenziel's Shortform · 2024-05-14T22:46:30.932Z · LW · GW

Links to Dan Murfet's AXRP interview:

Comment by DanielFilan on DanielFilan's Shortform Feed · 2024-05-09T03:45:56.212Z · LW · GW

Frankfurt-style counterexamples for definitions of optimization

In "Bottle Caps Aren't Optimizers", I wrote about a type of definition of optimization that says system S is optimizing for goal G iff G has a higher value than it would if S didn't exist or were randomly scrambled. I argued against these definitions by providing a examples of systems that satisfy the criterion but are not optimizers. But today, I realized that I could repurpose Frankfurt cases to get examples of optimizers that don't satisfy this criterion.

A Frankfurt case is a thought experiment designed to disprove the following intuitive principle: "a person is morally responsible for what she has done only if she could have done otherwise." Here's the basic idea: suppose Alice is considering whether or not to kill Bob. Upon consideration, she decides to do so, takes out her gun, and shoots Bob. But unbeknownst to her, a neuroscientist had implanted a chip in her brain that would have forced her to shoot Bob if she had decided not to. That said, the chip didn't activate, because she did decide to shoot Bob. The idea is that she's morally responsible, even tho she couldn't have done otherwise.

Anyway, let's do this with optimizers. Suppose I'm playing Go, thinking about how to win - imagining what would happen if I played various moves, and playing moves that make me more likely to win. Further suppose I'm pretty good at it. You might want to say I'm optimizing my moves to win the game. But suppose that, unbeknownst to me, behind my shoulder is famed Go master Shin Jinseo. If I start playing really bad moves, or suddenly die or vanish etc, he will play my moves, and do an even better job at winning. Now, if you remove me or randomly rearrange my parts, my side is actually more likely to win the game. But that doesn't mean I'm optimizing to lose the game! So this is another way such definitions of optimizers are wrong.
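
For concreteness, here's a minimal sketch of that counterexample in code. The numbers and names are made up purely for illustration; the point is just that the counterfactual test compares my side's win probability with me present versus with me removed or scrambled, and in this scenario the comparison gives the wrong verdict.

```python
# Toy sketch of the counterfactual-impact definition of optimization,
# using hypothetical win probabilities for my side of the Go game.

WIN_PROB = {
    "me": 0.6,            # I'm pretty good at Go
    "shin_jinseo": 0.95,  # the master who plays my moves if I'm removed or scrambled
}

def counterfactual_test(prob_with_me: float, prob_without_me: float) -> bool:
    """The definition under attack: I count as optimizing for winning
    iff my side is more likely to win with me than without me."""
    return prob_with_me > prob_without_me

# If I'm removed or randomly scrambled, Shin Jinseo takes over:
print(counterfactual_test(WIN_PROB["me"], WIN_PROB["shin_jinseo"]))  # False
# The test says I'm not optimizing to win, even though intuitively I am.
```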

That said, other definitions treat this counter-example well. E.g. I think the one given in "The ground of optimization" says that I'm optimizing to win the game (maybe only if I'm playing a weaker opponent).