Speedrunning 4 mistakes you make when your alignment strategy is based on formal proof 2023-02-16T01:13:44.847Z
Mid-Atlantic AI Alignment Alliance Unconference 2023-01-13T20:33:33.389Z
Riffing on the agent type 2022-12-08T00:19:38.054Z
Master plan spec: needs audit (logic and cooperative AI) 2022-11-30T06:10:38.775Z
What is estimational programming? Squiggle in context 2022-08-12T18:39:57.230Z
Abundance and scarcity; working forwards and working backwards 2022-02-18T19:05:38.974Z
AISC5 Retrospective: Mechanisms for Avoiding Tragedy of the Commons in Common Pool Resource Problems 2021-09-27T16:46:40.389Z
[linkpost] The Psychological Economy of Inaction by William Gillis 2021-09-12T22:04:43.991Z
What are some claims or opinions about multi-multi delegation you've seen in the memeplex that you think deserve scrutiny? 2021-06-27T17:44:52.389Z
Cliffnotes to Craft of Research parts I, II, and III 2021-06-26T14:00:53.560Z
[timeboxed exercise] write me your model of AI human-existential safety and the alignment problems in 15 minutes 2021-05-04T19:10:28.755Z
What am I fighting for? 2021-04-20T23:27:26.416Z
Could degoogling be a practice run for something more important? 2021-04-17T00:03:42.790Z
[Recruiting for a Discord Server] AI Forecasting & Threat Modeling Workshop 2021-04-10T16:15:07.420Z
Aging: A Surprisingly Tractable Problem 2021-04-08T19:25:42.287Z
Averting suffering with sentience throttlers (proposal) 2021-04-05T10:54:09.755Z
TASP Ep 3 - Optimal Policies Tend to Seek Power 2021-03-11T01:44:02.814Z
Takeaways from the Intelligence Rising RPG 2021-03-05T10:27:55.867Z
Reading recommendations on social technology: looking for the third way between technocracy and populism 2021-02-24T11:48:06.451Z
Quinn's Shortform 2021-01-16T17:52:33.020Z
Is it the case that when humans approximate backward induction they violate the markov property? 2021-01-16T16:22:21.561Z
Infodemics: with Jeremy Blackburn and Aviv Ovadya 2021-01-08T15:44:57.852Z
Chance that "AI safety basically [doesn't need] to be solved, we’ll just solve it by default unless we’re completely completely careless" 2020-12-08T21:08:47.575Z
Announcing the Technical AI Safety Podcast 2020-12-07T18:51:58.257Z
How ought I spend time? 2020-06-30T16:53:53.787Z
Have general decomposers been formalized? 2020-06-27T18:09:06.411Z
Do the best ideas float to the top? 2019-01-21T05:22:51.182Z
on wellunderstoodness 2018-12-16T07:22:19.250Z


Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-09-27T17:07:57.149Z · LW · GW

For the record, to mods: I waited till after Petrov Day to answer the poll, because my first guess upon receiving a message on Petrov Day asking me to click something is that I'm being socially engineered. Clicking the next day felt pretty safe.

Comment by Quinn (quinn-dougherty) on Protest against Meta's irreversible proliferation (Sept 29, San Francisco) · 2023-09-21T12:10:41.173Z · LW · GW

This seems kinda fair. I'd like to clarify: I largely trust the first few dozen people. I just expect that, depending on how growth/acquisition is done, more than a couple of protests would mean having to deal with all the values diversity underlying people's different reasons for joining in. This subject seems unusually fraught, with real potential to generate conflationary-alliance sorts of things.

Overall I didn't mean to other you. In fact, I never said this publicly, but a couple months ago there was a related post of yours that got me saying "yeah, we're lucky Holly is on this / she seems better suited than most would be to navigate this", cuz I've been consuming your essays for years. I also did not mean to insinuate that you hadn't thought it through; I meant to signal "here's a random guy who cares about this consideration", just as an outside vote of "hope this doesn't get triaged out". I basically assumed you had threatmodeled interactions with different strains of populism.

Comment by Quinn (quinn-dougherty) on Protest against Meta's irreversible proliferation (Sept 29, San Francisco) · 2023-09-20T22:32:25.614Z · LW · GW

In my sordid past I did plenty of "finding the three people for nuanced, logical, mind-changing discussions amidst dozens of 'hey hey ho ho, outgroup has got to go'", so I'll do the same here (if I'm in town). But selection effects seem deeply worrying. For example, you could go down to the soup kitchen or punk music venue and recruit all the young volunteers who are constantly sneering about how gentrifying techbros are evil and can't coordinate on whether their "unabomber is actually based" argument is ironic or unironic, but you oughtn't. The fact that this is even a question, that if you have a "mass movement" theory of change you're constantly tempted to lower your standards in this way, is so intrinsically risky that no one should be comfortable with ML safety or alignment resorting to this sort of thing.

Comment by Quinn (quinn-dougherty) on [Link post] Michael Nielsen's "Notes on Existential Risk from Artificial Superintelligence" · 2023-09-19T20:52:43.809Z · LW · GW

Winning or losing a war is kinda binary.

Whether a pandemic gets to my country is a matter of degree, since in principle you can have a pandemic that kills 90% of counterfactual economic activity in one country, breaks containment, but only destroys 10% in your country.

"Alignment" or "transition to TAI" of any kind is way further from "coinflip" than either of these, so if you think doomcoin is salvageable or want to defend its virtues you need way different reference classes.

Think about the ways in which winning or losing a war isn't binary: there are lots of ways for implementation details of an agreement to affect your life as a citizen of one of the countries. AI is like this but even further: all the different kinds of outcomes, how central or unilateral the important moments are, which values end up being imposed on the future and at what resolution, etc. People who think "we have a doomcoin toss coming up, now argue about the p(heads)" are not gonna think about this stuff!

To me, "p(doom)" is a memetic PITA as bad as "the real unaligned AI was the corporations/capitalism", so I'm excited that you're defending it! Usually people tell me "yeah you're right, it's not a defensible frame haha".

Comment by Quinn (quinn-dougherty) on Dating Roundup #1: This is Why You’re Single · 2023-09-17T00:50:13.984Z · LW · GW

Height filter: I don't see anything about how many women use the height filter at all vs. don't [1]. People being really into 6'5" seems alarming until you realize that if you're trait xyz enough to use height filters at all, you might as well go all in and use silly height filters.

  1. As a man, filters on Bumble are a premium feature for me. That's likely price discrimination, though, with many premium features given to women for free.

Comment by Quinn (quinn-dougherty) on Defunding My Mistake · 2023-09-06T05:35:17.522Z · LW · GW

I've certainly wondered this! In spite of the ACX commenter I mentioned suggesting that we ought to reward people for being transparent about learning epistemics the hard way, I find myself not 100% sure if it's wise or savvy to trust that people won't just mark me down as like "oh, so quinn is probably prone to being gullible or sloppy" if I talk openly about what my life was like before math coursework and the sequences.

Comment by Quinn (quinn-dougherty) on Defunding My Mistake · 2023-09-05T00:48:46.168Z · LW · GW

Yes. So much love for this post, you're a better writer than me and you're an actual public defender but otherwise I feel super connected with you haha.

It's incredibly bizarre being at least a little bit of an "early adopter" of massive 2020 memes: my '09 Tumblr account activation and Brown/Garner/Gray-era BLM gave me a healthy dose of "before it was cool" hipster sneering at the people who only got into it once it was popular. This matters on LessWrong, because Musk's Fox News interview referenced the "isn't it speciesist to a priori assume humans are better than paperclips" family of thought experiments. If you're on LessWrong, you are not safe from becoming an early adopter of something that becomes very salient and popular!

Due to this (helping rats prep for their "go mainstream" moment), as well as other things (one paragraph further down), I meant to write something kinda similar to your piece actually, cuz Ben Pace pointed me at this ACX commenter:

I grew up surrounded by people who believed conspiracy theories, although none of those people were my parents. And I have to say that the fact that so few people know other people who believe conspiracy theories kind of bothers me. It's like their epistemic immune system has never really been at risk of infection. If your mind hasn't been very sick at least sometimes, how can you be sure you've developed decent priors this time?

Certain risks around groupthink, not knowing how to select for behaviors or memes that are "safe" to tolerate in whatever memetic/status gradient you find yourself in, even just defining terms like blindspot or bias: they all seem made a lot worse by young EAs/rats who didn't previously learn to navigate a niche ideology/subculture.

Super underrated topic of discussion on here! Thanks again for writing!

Comment by Quinn (quinn-dougherty) on Dating Roundup #1: This is Why You’re Single · 2023-08-29T22:02:51.036Z · LW · GW
Comment by Quinn (quinn-dougherty) on Open Thread - August 2023 · 2023-08-22T17:19:18.402Z · LW · GW

I'm wondering if we want "private notes about users within the app", like discord.

Use case: I don't always remember the loose "weight assignments" over time for different people. If someone seems to prefer making mistake class A over B in one comment, that'll be relevant a few months later if they're advocating a position on A vs. B tradeoffs (I'll know they practice what they preach). Or maybe they repeated a sloppy or naive view about something that I think should be easy enough to dodge, so I want to just take them less seriously in general going forward. Or maybe they mentioned once, in a place most people didn't see, that they worked for 10 years in a particularly niche technical subtopic, so they're guaranteed to be the only one any of us knows with a substantial inside view on it; I'll want to remember that later in case it comes up and I need their advice.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-08-21T16:29:46.777Z · LW · GW

Yeah, IQ-ish things or athletics are the most well-known examples, but I only generalized in the shortform cuz I was looking around at my friends and thinking about more Big Five oriented examples.

Certainly "conscientiousness seems good but I'm exposed to the mistake class of unhelpful navelgazing, so maybe I should be less conscientious" is so much harder to take seriously if you're in a pond that tends to struggle with low conscientiousness. Or being so low on neuroticism that your redteam/pentest muscles atrophy.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-08-18T07:50:07.149Z · LW · GW

How are people mistreated by bellcurves?

I think this is a crucial part of a lot of psychological maladaptation and social dysfunction, very salient to EAs. If you're way more trait xyz than anyone you know for most of your life, your behavior and mindset will be massively affected, and depending on when in life / how much inertia you've accumulated by the time you end up in a different room where suddenly you're average on xyz, you might lose out on a ton of opportunities for growth.

In other words, the concept of "big fish small pond" is deeply insightful and probably underrated.

Some IQ-adjacent idea is sorta the most salient to me, since my brother recently reminded me "quinn is the smartest person I know", to which I was like, you should meet smarter people? I kinda did feel unusually smart before I was an EA, but I can only reasonably claim to be average if you condition on EA or something similar. But this post is extremely important in terms of each of the Big 5, "grit"-adjacent things, etc.
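A toy illustration of the "average if you condition on EA" point, assuming (hypothetically) an IQ-like trait that is normally distributed within each pond. The pond means and standard deviations below are made-up numbers, not data; the sketch only shows that the same score lands at very different percentiles depending on the reference group.

```python
from statistics import NormalDist


def percentile_in_pond(score: float, pond_mean: float, pond_sd: float) -> float:
    """Fraction of a (hypothetically normal) pond scoring below `score`."""
    return NormalDist(mu=pond_mean, sigma=pond_sd).cdf(score)


# Hypothetical numbers: one person's score, evaluated against two different ponds.
score = 130.0
small_pond = percentile_in_pond(score, pond_mean=100.0, pond_sd=15.0)  # general population
big_pond = percentile_in_pond(score, pond_mean=130.0, pond_sd=10.0)    # heavily selected group

print(f"small pond percentile: {small_pond:.3f}")  # two SDs above the pond mean
print(f"big pond percentile:   {big_pond:.3f}")    # exactly at the pond mean
```

Nothing about the person changes between the two calls; only the reference distribution does, which is the whole "big fish small pond" effect in one function.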

For example, when you're way more trait xyz than anyone around you, you form habits around adjusting for people to underperform relative to you at trait xyz. Sometimes those habits run very deep in your behavior and worldview, and sometimes they can be super ill-tuned (or at least a bit suboptimal) to becoming average. Plus, you develop a lot of "I have to pave my own way" assumptions about growth and leadership. Related to growth, you may cultivate lower standards for yourself than you otherwise might have. Related to leadership, I expect many people in leader roles at small ponds would be more productive, impactful, and happy if they had access to averageness. Pond size means they don't get that luxury!

There's a tightly related topic about failure to abolish meatspace / how you might think the internet corrects for this but later realize how much it doesn't.

Comment by Quinn (quinn-dougherty) on CharlesRW's Shortform · 2023-08-17T14:02:02.998Z · LW · GW

(I was the one who asked Charles to write up his inside view, as reading the article is the only serious information I've ever gathered about debate culture.)

Comment by Quinn (quinn-dougherty) on Private notes on LW? · 2023-08-04T18:19:01.580Z · LW · GW

hm maybe you have a private note version of a post, but each inline comment can optionally be sent to a kind of granular permissions version of shortform, to gradually open it up to your inner circle before putting it on regular shortform.

Comment by Quinn (quinn-dougherty) on Recommending Understand, a Game about Discerning the Rules · 2023-08-03T20:01:58.786Z · LW · GW

I don't bother with Linux Steam; I just boot Steam within Lutris. Lutris just automates config / wraps Wine plus a GUI, so Lutris will make Steam think it's within Windows, and then everything that Steam launches will also think it's within Windows. Tho admittedly I don't use Steam a lot (Lutris takes excellent care of me for non-Steam things).

Comment by quinn-dougherty on [deleted post] 2023-08-01T01:15:15.593Z

Here's the full text, in case the page goes down (as I suspected it had when it hung trying to load for a few minutes just now).

Declaring yourself to be operating by "Crocker's Rules" means that other people are allowed to optimize their messages for information, not for being nice to you.  Crocker's Rules means that you have accepted full responsibility for the operation of your own mind - if you're offended, it's your fault.  Anyone is allowed to call you a moron and claim to be doing you a favor.  (Which, in point of fact, they would be.  One of the big problems with this culture is that everyone's afraid to tell you you're wrong, or they think they have to dance around it.)  Two people using Crocker's Rules should be able to communicate all relevant information in the minimum amount of time, without paraphrasing or social formatting.  Obviously, don't declare yourself to be operating by Crocker's Rules unless you have that kind of mental discipline.

Note that Crocker's Rules does not mean you can insult people; it means that other people don't have to worry about whether they are insulting you.  Crocker's Rules are a discipline, not a privilege.  Furthermore, taking advantage of Crocker's Rules does not imply reciprocity.  How could it?  Crocker's Rules are something you do for yourself, to maximize information received - not something you grit your teeth over and do as a favor.

"Crocker's Rules" are named after Lee Daniel Crocker.

Comment by Quinn (quinn-dougherty) on Open Thread - July 2023 · 2023-07-26T20:18:30.976Z · LW · GW

I'm curious and I've been thinking about some opportunities for cryptanalysis to contribute to QA for ML products, particularly in the interp area. But I've never looked at spectral methods or thought about them at all before! At a glance it seems promising. I'd love to see more from you on this.

Comment by Quinn (quinn-dougherty) on Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer · 2023-07-11T15:04:16.822Z · LW · GW

While I partially share your confusion about "implies an identification of agency with unprovoked assault", I thought Sinclair was talking mostly about "your risk of being seduced, being into it at the time, then regretting it later" and it would only relate to harassment or assault as a kind of tail case.

I think some high libido / high sexual agency people learn to consider it a morally relevant failure mode to seduce someone very effectively in ways that seem to go well but that the person would not endorse at CEV (say, 1% bad, setting some rape outcome at 100%). Others of course say this is an unhinged symptom of scrupulosity disease, and that anyone who blames you for not being able to CEV someone against their stated preferences needs to be more reasonable. But clearly this distinction is an attack surface when we talk about asymmetries like power, age, status, money. You can construct scenarios where it seems worse than 1% bad!

Regardless, I think the idea that people (especially women) are sometimes defensive not about their boundaries being violated, but about their consent not being endorsed later explains a lot of human behavior (or at least, like, the society/culture I know).

Comment by Quinn (quinn-dougherty) on Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer · 2023-07-11T14:52:38.083Z · LW · GW

"some person in the rationalist community who you have seen at like 3 meetups."

I think what's being gestured at is that Sinclair may or may not have been referring to

  1. the base rate of this being a bad idea
  2. the base rate of this bad idea conditioning on genders xyz

An example of the variety of ways of thinking about this: Many women (often cis) I've talked to, among those who have standing distrust or bad priors on cis men, are very liberal about extending woman-level trust to trans women. That doesn't mean they're maximally trusting, just that they're not more trusting of a cis woman they've seen at like three meetups than they are of a trans woman they've seen at like three meetups.

But really, one and two are quite different sorts of claims, and I don't think people agree about how the conditioning changes the game. However, I get the sense (not from this comment, but combining it with upstream Sinclair comments) that Sinclair thinks the person would be making less of a mistake if she had asked the same couchsurf favor of a cis woman.

Comment by quinn-dougherty on [deleted post] 2023-07-10T19:53:01.914Z

There was a bunker workshop last year. An attendee told me that in the first few hours of the first day everyone reached a consensus that it wasn't going to be a super valuable or accurate strategy, and then they goofed off the rest of the weekend.

Comment by Quinn (quinn-dougherty) on Announcing Manifund Regrants · 2023-07-05T20:48:54.768Z · LW · GW

It'd be great if y'all could add a regrantor from the Cooperative AI Foundation / FOCAL / CLR / Encultured region of the research/threatmodel space. (Epistemic status: conflict of interest, since if you do this I could make a more obvious argument for a project.)

Comment by Quinn (quinn-dougherty) on Infra-Bayesian Logic · 2023-07-05T19:47:40.552Z · LW · GW

 is the inclusion map.

what makes a coproduct an "inclusion mapping"? I haven't seen this convention/synonym anywhere before. 

Comment by Quinn (quinn-dougherty) on Democratic AI Constitution: Round-Robin Debate and Synthesis · 2023-06-27T14:20:14.677Z · LW · GW

ok, great! I'm down.

Incidentally, you caused me to google for voting theory under trees of alternatives (rather than lists), and there are a few prior directions (none very old, at a glance).

Comment by Quinn (quinn-dougherty) on Democratic AI Constitution: Round-Robin Debate and Synthesis · 2023-06-25T04:18:33.278Z · LW · GW

Seems like a particularly bitterlessony take (in that it kicks a lot to the magical all-powerful black box), while also being over-reliant on the perceptions of viewpoint diversity that have already been induced from the Common Crawl. I'd much prefer asking more of the user: a more continuous input stream at each deliberative stage.

Comment by Quinn (quinn-dougherty) on Riffing on the agent type · 2023-06-16T12:26:24.161Z · LW · GW
  1. I've been very distressed thinking that instrumental and epistemic parts are not cleanly separable, and that the entire is-ought gap or Humean facts-values distinction is a grade school story or pedagogically noble lie.
  2. I got severely burnt out from exhaustion not long after writing this, and one of the reasons was the open games literature lol. But good news! I was cleaning out old tabs on my browser and I landed on one of those papers, and it all made perfect sense instantly! I'm more convinced than I initially suspected that the open games community has a ton to offer.
Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-06-15T21:47:17.529Z · LW · GW

oh dear

Comment by Quinn (quinn-dougherty) on Introduction to Towards Causal Foundations of Safe AGI · 2023-06-15T17:32:33.664Z · LW · GW

Yeah, what I meant was that "goals" or "preferences" are often emphasized front and center, but here not so much, because it seems like you want to reframe that part under the banner of "intention".

A range of other other relevant concepts also build on causality:

It just felt a little odd to me that so much bubbled up from your decomposition except utility, and you only mention "goals" as this thing that "causes" behaviors without zeroing in on a particular formalism. So my guess was that VNM would be hiding behind this "intention" idea.

Comment by Quinn (quinn-dougherty) on ARC is hiring theoretical researchers · 2023-06-14T19:55:23.842Z · LW · GW

I can't say anything rigorous, sophisticated, or credible. I can just say that the paper was a very welcome spigot of energy and optimism in my own model of why "formal verification" -style assurances and QA demands are ill-suited to models (either behavioral evals or reasoning about the output of decompilers).

Comment by Quinn (quinn-dougherty) on Lightcone Infrastructure/LessWrong is looking for funding · 2023-06-14T18:22:53.835Z · LW · GW

I really want to create a more distinct and intentionally separate culture both on LessWrong and at the Rose Garden Inn, and I think owning a physical space hugely helps with that. FTX, various experiences I've had in the EA space over the past few years, as well as a lot of safetywashing in AI Alignment in more recent years, have made me much more hesitant to build a community that can as easily get swept up in respectability cascades and get exploited as easily by bad actors, and I really want to develop a more intentional culture in what we are building here. Hopefully this will enable the people I am supporting to work on things like AI Alignment without making the world overall worse, or displaying low-integrity behavior, or get taken advantage of.

I'm extremely excited by and supportive of this comment! An especially important related area I think is "solving the deference problem" or cascades of a sinking bar in forecasting and threatmodeling that I've felt over the last couple years.

Comment by Quinn (quinn-dougherty) on TASRA: A Taxonomy and Analysis of Societal-Scale Risks from AI · 2023-06-13T20:00:23.169Z · LW · GW

Indistinguishability obfuscation is compelling, but I wonder what blindspots would arise if we shrunk our understanding of criminality/misuse down to a perspective shaped like "upstanding compliant citizens only use \(f \circ g\); only an irresponsible criminal would use \(g\)" for some model \(g\) (like GPT, or in the paper \(D\)) and some sanitization layer/process \(f\) (like RLHF, or in the paper \(SD\)). That may reduce the legitimacy or legibility of grievances or threatmodels that emphasize weaknesses of sanitization (in a world where case law and regulators make it hard to effectively criticize or steer vendors who fulfill enough checklists before we've iterated enough on a satisfying CEVy/social-choice-theoretic update to RLHF-like processes; i.e. case law or regulators bake in a system prematurely, and anyone who wants to update the underlying definition of unsafe/toxic/harmful runs into inertia). It may also reduce the legitimacy or legibility of upsides of unfiltered models (in the current chatbot case, perhaps public auditability of a preference aggregator pays massive social cohesion dividends).

We may kind of get the feeling that a strict binary distinction is emerging between raw/pure models and sanitization layers/processes, because trusting SGD would be absurd and actually-existing RLHF is a reasonable guess from both amoral risk-assessment views (minimizing liability or PR risk) and moral views (product teams sincerely want to do the right thing). But if this distinction becomes paradigmatic, I would predict we become less resilient to diffusion-of-responsibility threat models (type 1, in the paper), because I think explicit case law and regulation give some actors an easy proxy for doing the right thing, so they don't actually try to manage outcomes (Zvi talked about this in the context of covid, calling it "social vs physical reality", and it all also relates to "trying to try vs. trying" from the sequences/methods). I'm not saying I have alternatives to the strict binary distinction; it seems reasonable, or at least like a decent bet with respect to the actual space of things we can choose to settle for if it's already "midgame".

Comment by Quinn (quinn-dougherty) on Introduction to Towards Causal Foundations of Safe AGI · 2023-06-12T20:38:48.729Z · LW · GW

So the contributions of VNM theory are shrunk down into "intention"? Will you recapitulate that sort of framing (such as involving the interplay between total orders and real numbers), or are you feeling more like it's totally wrong and should be thrown out?

Comment by Quinn (quinn-dougherty) on Manifold Predicted the AI Extinction Statement and CAIS Wanted it Deleted · 2023-06-12T16:16:14.031Z · LW · GW

Deleting a market is unprecedented

I thought there was a vibecamp market that was deleted, last year?

Comment by Quinn (quinn-dougherty) on Using Consensus Mechanisms as an approach to Alignment · 2023-06-12T14:44:10.670Z · LW · GW

Check out the section on computational social choice theory here

Also MDAS might have a framing and reading list you'd like, but there are many other ways of applying the mechanism design literature to your post.


Comment by Quinn (quinn-dougherty) on The Dictatorship Problem · 2023-06-12T13:48:46.186Z · LW · GW

In terms of the parties, DNC has a track record of handling their populists, and GOP does not.

This comment reads like it's coming from a world where Romney is running the GOP and AOC is running the DNC.

It simply is not viable to bothsides this.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-06-09T18:20:22.091Z · LW · GW

probability density

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-06-09T16:15:33.570Z · LW · GW

10^93 is a fun and underrated number

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-06-09T14:22:24.103Z · LW · GW

"EV is measure times value" is a sufficiently load-bearing part of my worldview that if measure and value were correlated or at least one was a function of the other I would be very distressed.

Like in a sense, is John threatening to second-guess hundreds of years of consensus on is-ought?
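One way to make "EV is measure times value" concrete, as a sketch: the measure (beliefs) and the valuation (preferences) are separate, independently typed inputs, and expected value only combines them at the very end. The outcome names and numbers are hypothetical.

```python
from typing import Mapping, NewType

# Distinct types for the epistemic and instrumental halves (checked statically, not at runtime).
Probability = NewType("Probability", float)
Utility = NewType("Utility", float)


def expected_value(measure: Mapping[str, Probability],
                   value: Mapping[str, Utility]) -> float:
    """Sum of probability times utility; the two maps are independent inputs."""
    if abs(sum(measure.values()) - 1.0) > 1e-9:
        raise ValueError("measure must sum to 1")
    return sum(p * value[outcome] for outcome, p in measure.items())


# Hypothetical beliefs (epistemic side) and preferences (instrumental side).
beliefs = {"rain": Probability(0.3), "sun": Probability(0.7)}
preferences = {"rain": Utility(-1.0), "sun": Utility(2.0)}

ev = expected_value(beliefs, preferences)
print(round(ev, 6))
```

The distress in the shortform is roughly that real agents might not respect this signature: if `value` were secretly computed from `measure`, the clean split between the two arguments would be a fiction.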

Comment by Quinn (quinn-dougherty) on We Are Less Wrong than E. T. Jaynes on Loss Functions in Human Society · 2023-06-09T13:45:41.443Z · LW · GW

Y'all, I'm actually sorta confused about the binary between epistemic and instrumental rationality. In my brain I have this labeling scheme like "PLOS is about epistemic rationality". I think of epistemic and instrumental as a fairly clean binary, because a typecheckerish view of expected value theory separates utilities/values and probabilities very explicitly. A measure forms a coefficient for a valuation, or the other way around.

But I've really had baked in that I shouldn't conflate believing true things ("epistemics": prediction, anticipation constraint, paying rent) with modifying the world ("instrumentals": valuing stuff, ordering states of the world, steering the future). This has seemed deeply important, because is and ought are perpendicular.

But what if that's just not how it is? What if there's a fuzzy boundary? I feel weird.

But in hindsight I should probably have been confused ever since "description length minimization = utility maximization".
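For reference, here is my understanding of the construction behind that equivalence, written out as a short derivation over a finite outcome space: pick a model \(M\) whose log-probabilities are an affine function of the utility \(u\), and the two objectives line up. The symbols \(M\), \(u\), \(Z\), and \(\pi\) (the distribution over outcomes that the agent can steer) are my notation, not necessarily the post's.

```latex
\begin{aligned}
P_M(x) &= \frac{2^{u(x)}}{Z}, \qquad Z = \sum_{x'} 2^{u(x')} \\
-\log_2 P_M(x) &= \log_2 Z - u(x) \\
\mathbb{E}_{x \sim \pi}\left[-\log_2 P_M(x)\right] &= \log_2 Z - \mathbb{E}_{x \sim \pi}\left[u(x)\right]
\end{aligned}
```

Since \(\log_2 Z\) does not depend on \(\pi\), minimizing expected description length under \(M\) is the same optimization problem as maximizing expected utility, which is exactly the kind of blurred epistemic/instrumental line the comment is uneasy about.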

Comment by Quinn (quinn-dougherty) on Measuring Optimization Power · 2023-06-09T13:09:50.333Z · LW · GW

Yes! I actually reread this post today, and realized I was kinda thinking sloppily about power vs effort for years.

Power is what you observe when you live in a society of minds: the minds affect the world somehow, but you don't care about the implementation of any mind's underlying search process.

Effort (which we sometimes call pressure, as in "applying optimization pressure") is closer to an asymptotic analysis / computational complexity framing, where the particular choice of searcher is extremely relevant (something like number of epochs times cost of each epoch, but more general).
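A minimal sketch of the distinction, under my own simplifying assumptions: "power" follows the bits-of-optimization measure from the post (minus log2 of the fraction of outcomes ranked at least as high as the one achieved), which only looks at the outcome; "effort" is modeled, hypothetically, as evaluations times cost per evaluation, which depends entirely on the searcher.

```python
import math
from typing import Callable, Sequence


def optimization_power_bits(achieved: int, outcomes: Sequence[int],
                            rank: Callable[[int], float]) -> float:
    """Bits of optimization: -log2 of the fraction of outcomes ranked at least as high as `achieved`."""
    at_least_as_good = sum(1 for o in outcomes if rank(o) >= rank(achieved))
    return -math.log2(at_least_as_good / len(outcomes))


def search_effort(evaluations: int, cost_per_evaluation: float) -> float:
    """Hypothetical effort model: number of steps the searcher took times the cost of each step."""
    return evaluations * cost_per_evaluation


outcomes = list(range(1024))  # toy outcome space; a higher number is a better outcome

# Two hypothetical searchers that both land on the best outcome, 1023.
power = optimization_power_bits(1023, outcomes, rank=float)
exhaustive_effort = search_effort(evaluations=1024, cost_per_evaluation=1.0)
clever_effort = search_effort(evaluations=10, cost_per_evaluation=3.0)

print(power, exhaustive_effort, clever_effort)
```

An outside observer sees the same 10 bits of power from both searchers; only the effort differs, and that is the implementation-sensitive quantity.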

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-06-08T17:41:23.964Z · LW · GW

messy, jotting down notes:

  • I saw this thread which my housemate had been warning me about for years.
  • failure mode can be understood as trying to Aristotle the problem: lack of experimentation
  • thinking about the nanotech ASI threat model, where it solves nanotech overnight and deploys adversarial proteins in the bloodstreams of all lifeforms.
  • These are sometimes justified by Drexler's inside view of boundary conditions and physical limits.
  • But to dodge the Aristotle problem, there would have to be some amount of bandwidth passing between sensors and actuators (which may roughly correspond to the number of do applications in Pearl)
  • Can you use something like communication complexity (between a system and an environment) to think about a "lower bound on the number of sensor-actuator actions", mixed with sample complexity (statistical learning theory)?
  • Like ok, if you're simulating all of physics you can Aristotle nanotech, but for a sufficient definition of "all" you would run up against realizability problems and cost way more than you actually need to spend.
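On the sample-complexity half of that question, one textbook reference point is the realizable PAC bound for a finite hypothesis class: a consistent learner needs at most \(m \ge \frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)\) i.i.d. samples to reach error at most \(\varepsilon\) with probability at least \(1-\delta\). A sketch of the arithmetic follows; the hypothesis-class size is a hypothetical stand-in, and note that this bound is about passive observation only and says nothing yet about interventions.

```python
import math


def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Realizable PAC bound for a finite class: ceil((ln|H| + ln(1/delta)) / epsilon)."""
    if hypothesis_count < 1 or not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("need |H| >= 1 and epsilon, delta in (0, 1)")
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)


# Hypothetical: 2**20 candidate "physics" hypotheses, 1% error, 95% confidence.
m = pac_sample_bound(hypothesis_count=2**20, epsilon=0.01, delta=0.05)
print(m)
```

The bound grows only logarithmically in \(|H|\), which is part of why "just observe harder" can look cheap on paper; whether that holds once realizability fails is exactly the open question in the notes.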

Like, I'm thinking if there's a kind of complexity theory of Pearl (number of do applications needed to acquire some kind of "loss"), then you could direct that at something like "nanotech projects" to Fermi-estimate the way AIs might trade off between applying Aristotelian effort (observation and induction with no experiment) and spending sensor-actuator interactions (with the world).
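To make "do applications" concrete, here's a toy structural causal model (all variable names and probabilities are hypothetical) in which a hidden common cause \(Z\) drives both \(X\) and \(Y\), and \(X\) has no effect on \(Y\). Passive observation suggests \(X\) matters; a single intervention \(do(X=1)\) reveals that it doesn't. Probabilities are computed exactly by enumeration rather than by sampling.

```python
from typing import Optional

Z_PRIOR = 0.5                  # P(Z = 1)
X_GIVEN_Z = {0: 0.1, 1: 0.9}   # P(X = 1 | Z = z)
Y_GIVEN_Z = {0: 0.2, 1: 0.8}   # P(Y = 1 | Z = z); X is not a parent of Y


def bernoulli(p_one: float, value: int) -> float:
    return p_one if value == 1 else 1.0 - p_one


def joint(z: int, x: int, y: int, do_x: Optional[int] = None) -> float:
    """P(Z=z, X=x, Y=y), optionally under the intervention do(X = do_x)."""
    # An intervention cuts the Z -> X edge and pins X to do_x.
    p_x = bernoulli(X_GIVEN_Z[z], x) if do_x is None else float(x == do_x)
    return bernoulli(Z_PRIOR, z) * p_x * bernoulli(Y_GIVEN_Z[z], y)


def p_y1_given_x1(intervene: bool) -> float:
    do_x = 1 if intervene else None
    numerator = sum(joint(z, 1, 1, do_x) for z in (0, 1))
    denominator = sum(joint(z, 1, y, do_x) for z in (0, 1) for y in (0, 1))
    return numerator / denominator


observed = p_y1_given_x1(intervene=False)   # conditioning: P(Y=1 | X=1)
intervened = p_y1_given_x1(intervene=True)  # intervening: P(Y=1 | do(X=1))
print(round(observed, 3), round(intervened, 3))
```

Counting how many such interventions an agent needs before its causal model stops fooling it is roughly the quantity the note gestures at; I'm not claiming an established complexity theory for it.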

There's a scenario in the sequences, if I recall correctly, about which physics an AI infers from 3 frames of a video of an apple falling, and something about how security mindset suggests you shouldn't expect your information-theoretic calculation (that Einsteinian physics is impossible to believe from the three frames) to actually apply to the AI. Which is a super dumbed-down way of opening up this sort of problem space.

Comment by Quinn (quinn-dougherty) on The Geometric Expectation · 2023-06-06T19:45:56.741Z · LW · GW

I was wondering if is anything. I don't recognize , though.

Comment by Quinn (quinn-dougherty) on Cosmopolitan values don't come free · 2023-05-31T21:56:38.949Z · LW · GW

Hard agree about death/takeover decoupling! I've lately been suspecting that P(doom) should actually just be taboo'd, because I'm worried it prevents people from constraining their anticipation or characterizing their subjective distribution over outcomes. It seems very thought-stopping!

Comment by Quinn (quinn-dougherty) on Cosmopolitan values don't come free · 2023-05-31T21:48:07.999Z

There's a kind of midgame, running-around-like-chickens-with-our-heads-cut-off vibe lately, like "you have to be logging hours in PyTorch, you can't afford idle contemplation." Hanging out with EAs and scanning a few different Twitter clusters about forecasting and threat modeling, I get a looming sense that these issues are not being confronted at all and that the sophistication level is lower than it used to be (obviously subject to sampling bias, or to my failing to factor the growth rate of "community building" and other outreach activities into my prediction). While I suspect that almost none of these questions can be expected to be decision-relevant for more than a few dozen altruists (i.e., logging the PyTorch hours and half-assing metaethics may in fact be correct), it still makes me sad and frustrated that I can't count on my colleagues to quickly shrug off a prioritarian standpoint theorist accusing us of a power fantasy. The correct answer seems obvious to me: "Yeah, we thought of that." Not because we're satisfied with any progress we've made, but because we're opinionated about which frames or paradigms we expect to lead to actual contributions.

Also, I never really figured out how to be sure that I wasn't attracted to transhumanism/cosmopolitanism/fun theory for cultural-baggage reasons, cuz so much of it feels like a Western/liberal/democratic vibe that's just been aggressively doubled down on and amplified (my Singaporean gf rolls her eyes and says "silly Americans" when I talk about this, because most of what I say about it tends to leak the fact that I think it's at least plausible that freedom is a terminal rather than instrumental value). It hasn't occurred to me how to outperform Western/liberal/democratic vibes (or whatever impulses I'm lumping under that label), but I don't know whether that failure is an actual signal that philosophical history sorta peaked; it probably isn't! Maybe if I were more introspective about cultural inputs, I would see how much of my mind's processes and priorities are time/place parochialisms. And maybe then I'd think more clearly about the possibility that the sadistic volume of the space of possible values is far too large for us to afford trading an increased risk that a powerful system lands in that volume for a marginal increase in freedom, or in any other prosperity measure.

And I can't seem to shake the feeling (low confidence) that taking even a modest crack at characterizing a good successor criterion is a great indicator that someone will be able to contribute to CEV / social choice theory / the current OpenAI contest. I see the irony: I'm coming down hard in favor of the superiority of my own parochial cult.

TL;DR: Going from "obviously it's needlessly and inexcusably racist to oppose paperclips; let people different from us pursue their joys by their own lights!" to being a bog-standard complexity/fragility-of-value believer was an extremely nutritious exercise, one I would not have cut in order to build skills/capital more efficiently, and it continues to pay dividends.

So yeah, I appreciate you going back to basics with these posts (fingerprints, sentience matters, this one, etc.).

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-05-31T13:54:58.330Z

Jargon is not due to status scarcity, but it sometimes makes unearned requests for attention

When you see a new, intricate discipline and you're reluctant to invest in navigating it, asking to be convinced that it has earned your attention is fine, but I don't recall ever seeing a valid or interesting complaint about jargon that goes beyond this.

Some elaboration here

Comment by Quinn (quinn-dougherty) on Sentience matters · 2023-05-30T13:00:50.293Z

Failure to identify a fun-theoretic maximum is definitely not as bad as allowing suffering, but the opposite of this statement is, I think, an unstated premise in a lot of the "alignment = slavery" sort of arguments that I see.

Comment by Quinn (quinn-dougherty) on Adumbrations on AGI from an outsider · 2023-05-25T15:45:46.508Z

I don't have an overarching theory of the Hard Problem of Jargon, but I have some guesses about the sorts of mistakes people love to make. My main point is just "things are hard."

Working in finance, you find a lot of unnecessary jargon designed to keep smart laymen out of the discussion. AI risk is many times worse than buyside finance on this front.

This is a deeply rare phenomenon. I do think there are a nonzero number of places with a peculiar mix of prestige and thin kayfabe that lead to this actually happening (e.g., if you're maintaining a polite fiction of meritocracy in the face of aggressive nepotism, you might rely on cheap superiority signals to nudge people into not calling BS). In a different way, I remember that when I worked at Home Depot, supervisors may have been protecting their $2/hr pay bump by hiding their responsibilities from their subordinates (to keep subordinates from figuring out that they could handle actually supervising if the hierarchy were disturbed). Generalizing from these scenarios to scientific disciplines is perfectly silly! Most people, and an even vaster majority in the sciences, are extremely excited about thinking clearly and communicating clearly to as many people as possible!

I also want to point out a distinction you may be missing in anti-finance populism. A synthetic CDO is sketchy because it is needlessly complex by its nature, not because the communication strategy was insufficiently optimized! But you wrote about "unnecessary jargon," implying that you think implementing and reasoning about synthetic CDOs is inherently easy, and that finance workers are misleading people into thinking it's hard (because of their scarcity mindset, to protect their job security, etc.). Jargon is an incredibly weak basis for anti-finance populism; a stronger form of it says that the instruments and processes themselves are overcomplicated (for shady reasons or whatever).

Moreover, an emphasis on jargon complaints implies a destructive worldview. The various degrees and flavors of "there are no hard open problems, people say there are hard open problems to protect their power, me and my friends have all the answers, which were surprisingly easy to find, we'll prove it to you as soon as you give us power" dynamics I've watched over the years seem tightly related to it.

I actually don’t think that many steps are involved, but the presentation in the articles I’ve read makes it seem as though there is.

I do get frustrated when people tell me that "clear writing" is a single thing that definitely exists, because I think they're ignoring tradeoffs. Questions like "How many predictable objections should I address? Is it 3? 6? Does the 'clear writing' protocol tell me to roll a d6?" get ignored. To be fair, Arbital was initially developed to be "Wikipedia with difficulty levels," which would've made this easier.

I think the way people should reason about facing down jargon is to first ask "Can I help them improve?" and, if not, then ask "Have they earned my attention?" Literally everywhere in the world, in every discipline, communication at the state of the art and communication with the public are separate questions. People calculate which fields they want to learn in detail, because effort is scarce. Saying "it's a problem that learning your field takes effort" makes zero sense.

Comment by Quinn (quinn-dougherty) on Why I'm Not (Yet) A Full-Time Technical Alignment Researcher · 2023-05-25T12:55:22.103Z

I would encourage a taboo on "independently wealthy": I think it's vague and obscurantist, and it doesn't actually capture real-life runway considerations. "How long can I sustain which burn rate, and which burn rate works with my lifestyle?" is the actual question!
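To make that question concrete: under the simplifying assumption of a constant monthly burn and constant side income (real budgets are lumpier), runway is just savings divided by net burn. A minimal sketch; the function name and all figures are hypothetical, chosen for illustration:

```python
def runway_months(savings: float, monthly_burn: float, monthly_income: float = 0.0) -> float:
    """Months until savings run out, assuming constant burn and constant income.

    Returns infinity when income covers the burn (savings never deplete).
    """
    net_burn = monthly_burn - monthly_income
    if net_burn <= 0:
        return float("inf")
    return savings / net_burn


# Hypothetical figures, for illustration only.
print(runway_months(60_000, 4_000))         # runway at full burn, no income
print(runway_months(60_000, 4_000, 1_500))  # same savings, with part-time income
```

The second call shows why the binary label hides the real question: the same savings buys very different runway depending on lifestyle and side income.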

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-05-24T20:37:01.158Z

There's a remarkable TNG episode about enfeeblement and Paul-based threat models, if I recall correctly.

There's a post-scarcity planet with some sort of Engine of Prosperity in the town square, and it goes without maintenance for so many generations that engineering itself becomes a lost oral tradition. Then it starts showing signs of wear and tear...

If Paul were writing this story, they would die. I think in the actual episode, there's a disagreeable autistic teenager who expresses curiosity about the Engine's mechanisms, and the grownups basically shame him, like "shut up and focus on painting and dancing." I think the Enterprise crew bails them out by fixing the Engine, and leaves the kid with a lesson about recultivating engineering as a discipline and as a sort of intergenerational cultural heritage and responsibility.

I probably saw it over 10 years ago and haven't looked it up yet. Man, this is a massive boon to the science-communication side of threat modeling, given that public discussion seems to offer little middle ground between unemployment and literally everyone literally dying. We can just point people to this episode! Any thoughts?

Comment by Quinn (quinn-dougherty) on My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI · 2023-05-24T16:16:13.588Z

Mad respect for the post. Disagree with your background free speech / society philosophy, re the protestors:

Magnanimously and enthusiastically embracing the distribution of views and tactics does not entail withholding criticism. It sorta reminds me of the mistake of forecasting conditional on your own inaction, forgetting that at every time step you (like other agents) will be there responding to new information and propagating your beliefs and adjusting your policies. You're a member of the public, too! You can't just recuse yourself.

I also think democracy hinges crucially on free speech, and I think the world will function better if people don't feel shut-down or clammed-up by people-like-me saying "the remainder of May 2023 probably isn't a great time for AI protests."

This sentence is bizarre! People who take notice and want to help, feeling the protest impulse in their heart, deserve peer review, in principle, period. Obviously there are tactical or strategic complications, like discursive aesthetics or information diets / priors and other sources of inferential distance, but the principle is still true!

Comment by Quinn (quinn-dougherty) on We are misaligned: the saddening idea that most of humanity doesn't intrinsically care about x-risk, even on a personal level · 2023-05-19T17:22:28.396Z

As a wee lad, I was compelled more by relative status/wealth than by absolute status/wealth. It simply was not salient to me that a bad Gini coefficient could in principle be paired with negligible rates of suffering! A healthy diet of Our World in Data, LessWrong, and the EA Forum set me straight, eventually; but we have to work with the actually existing distribution of ideologies and information diets.

I think people who reject moral circle expansion to the future (in the righteous sense: the idea that only oppressors would undervalue more obvious problems) are actually far more focused on this crux (relative vs. absolute) than on the content of their population ethics opinions.
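The relative-versus-absolute distinction can be made concrete with a toy calculation. This sketch computes the Gini coefficient (one standard inequality measure, here via the mean absolute difference) for two hypothetical populations, alongside whether everyone clears a hypothetical poverty line. The incomes and threshold are invented for illustration, not data.

```python
def gini(incomes: list[float]) -> float:
    """Gini coefficient via the mean absolute difference:
    G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean).
    0 is perfect equality; values near 1 mean income is concentrated in few hands.
    """
    n = len(incomes)
    if n == 0:
        raise ValueError("need at least one income")
    mean = sum(incomes) / n
    if mean == 0:
        return 0.0
    total_abs_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_abs_diff / (2 * n * n * mean)


POVERTY_LINE = 2_000  # hypothetical annual threshold

# Hypothetical populations: one very unequal with nobody near the line,
# one nearly equal with everybody below it.
unequal_but_comfortable = [30_000, 35_000, 40_000, 2_000_000]
equal_but_poor = [900, 1_000, 1_000, 1_100]

for name, xs in [("unequal_but_comfortable", unequal_but_comfortable),
                 ("equal_but_poor", equal_but_poor)]:
    print(name, round(gini(xs), 2), "all above line:", min(xs) > POVERTY_LINE)
```

Someone attending only to the first number would rank the first population as worse off; someone attending to the second would rank it as better off.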

Comment by Quinn (quinn-dougherty) on Retrospective on the 2022 Conjecture AI Discussions · 2023-02-25T01:47:22.131Z

Ideally there would be an exceedingly high bar for strategic withholding of worldviews. I'd love some mechanism for sending downvotes to the orgs that vetoed their staff's participation! I'd love some way of socially pressuring these orgs into at least trying to convince us that they had really good reasons.

I'm pretty cynical: I assume nervous, uncalibrated shuffling by HR or legal counsel is a more likely explanation than an actual defense against hazardous leakage of, say, capabilities hints.