Posts

Speedrunning 4 mistakes you make when your alignment strategy is based on formal proof 2023-02-16T01:13:44.847Z
Mid-Atlantic AI Alignment Alliance Unconference 2023-01-13T20:33:33.389Z
Riffing on the agent type 2022-12-08T00:19:38.054Z
Master plan spec: needs audit (logic and cooperative AI) 2022-11-30T06:10:38.775Z
What is estimational programming? Squiggle in context 2022-08-12T18:39:57.230Z
Abundance and scarcity; working forwards and working backwards 2022-02-18T19:05:38.974Z
AISC5 Retrospective: Mechanisms for Avoiding Tragedy of the Commons in Common Pool Resource Problems 2021-09-27T16:46:40.389Z
[linkpost] The Psychological Economy of Inaction by William Gillis 2021-09-12T22:04:43.991Z
What are some claims or opinions about multi-multi delegation you've seen in the memeplex that you think deserve scrutiny? 2021-06-27T17:44:52.389Z
Cliffnotes to Craft of Research parts I, II, and III 2021-06-26T14:00:53.560Z
[timeboxed exercise] write me your model of AI human-existential safety and the alignment problems in 15 minutes 2021-05-04T19:10:28.755Z
What am I fighting for? 2021-04-20T23:27:26.416Z
Could degoogling be a practice run for something more important? 2021-04-17T00:03:42.790Z
[Recruiting for a Discord Server] AI Forecasting & Threat Modeling Workshop 2021-04-10T16:15:07.420Z
Aging: A Surprisingly Tractable Problem 2021-04-08T19:25:42.287Z
Averting suffering with sentience throttlers (proposal) 2021-04-05T10:54:09.755Z
TASP Ep 3 - Optimal Policies Tend to Seek Power 2021-03-11T01:44:02.814Z
Takeaways from the Intelligence Rising RPG 2021-03-05T10:27:55.867Z
Reading recommendations on social technology: looking for the third way between technocracy and populism 2021-02-24T11:48:06.451Z
Quinn's Shortform 2021-01-16T17:52:33.020Z
Is it the case that when humans approximate backward induction they violate the markov property? 2021-01-16T16:22:21.561Z
Infodemics: with Jeremy Blackburn and Aviv Ovadya 2021-01-08T15:44:57.852Z
Chance that "AI safety basically [doesn't need] to be solved, we’ll just solve it by default unless we’re completely completely careless" 2020-12-08T21:08:47.575Z
Announcing the Technical AI Safety Podcast 2020-12-07T18:51:58.257Z
How ought I spend time? 2020-06-30T16:53:53.787Z
Have general decomposers been formalized? 2020-06-27T18:09:06.411Z
Do the best ideas float to the top? 2019-01-21T05:22:51.182Z
on wellunderstoodness 2018-12-16T07:22:19.250Z

Comments

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-04-20T01:01:07.934Z · LW · GW

Thinking about a top-level post on FOMO and research taste

  • Fear of missing out defined as inability to execute on a project cuz there's a cooler project if you pivot
  • but it also gestures at more of a strict negative, where you think your project sucks before you finish it, so you never execute
  • was discussing this with a friend: "yeah I mean lesswrong is pretty egregious cuz it sorta promotes this idea of research taste as the ability to tear things down, which can be done armchair"
  • I've developed strategies to beat this FOMO and gain more depth and detail with projects (too recent to see returns yet, but getting there) but I also suspect it was nutritious of me to develop discernment about what projects are valuable or not valuable for various threat models and theories of change (in such a way that being a phd student off of lesswrong wouldn't have been as good in crucial ways, tho way better in other ways).
    • but I think the point is you have to turn off this discernment sometimes, unless you want to specialize in telling people why their plans won't work, which I'm more dubious on the value of than I used to be

Idk maybe this shortform is most of the value of the top level post

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-04-17T23:09:16.358Z · LW · GW

He had become so caught up in building sentences that he had almost forgotten the barbaric days when thinking was like a splash of color landing on a page.

Edward St Aubyn

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-04-17T22:27:04.037Z · LW · GW

a $B$-valued quantifier is any function $(A \to B) \to B$, so when $B$ is bool, quantifiers are the functions that take predicates as input and return bool as output (same for prop). the standard max and min functions on arrays count as real-valued quantifiers $(I \to \mathbb{R}) \to \mathbb{R}$ for some index set $I$.
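A quick Python sketch of the shape (my own illustrative rendering, not from any particular source):

```python
from typing import Callable, Iterable, TypeVar

A = TypeVar("A")

# bool-valued quantifiers: they consume a predicate A -> bool and return a bool
def forall(domain: Iterable[A], pred: Callable[[A], bool]) -> bool:
    return all(pred(x) for x in domain)   # exhaustive: O(|A|) checks

def exists(domain: Iterable[A], pred: Callable[[A], bool]) -> bool:
    return any(pred(x) for x in domain)   # can stop at the first witness

# a real-valued "quantifier" over an index set I: (I -> R) -> R
def maximum(index_set: Iterable[A], f: Callable[[A], float]) -> float:
    return max(f(i) for i in index_set)
```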

I thought I had seen forall as the max of the Prop-valued quantifiers, and exists as the min somewhere, which has a nice mindfeel since forall has this "big" feeling (if you determined for $P : A \to \mathrm{Prop}$ that $\forall x : A,\ P\ x$ (of which $\forall y : A,\ P\ y$ is just syntax sugar since the variable name is irrelevant) by exhaustive checking, it would cost $O(|A|)$ whereas $\exists x : A,\ P\ x$ would cost $O(1)$ unless the derivation of the witness was dependent on size of domain somehow).

Incidentally however, in differentiable logic it seems forall is the "minimal expectation" and existential is the "maximal expectation". Page 10 of the LDL paper, where a universally quantified formula is interpreted as the limit as $\gamma$ goes to zero of an expectation $\mathbb{E}_\gamma$, i.e. the integral with respect to a $\gamma$-ball about the min of the predicate rather than about its entire domain. So in this sense, the interpretation of a universally quantified prop is a minimal expectation, dual where existentially quantified prop is a maximal expectation.

I didn't like the way this felt aesthetically, since as I said, forall feels "big" which mood-affiliates toward a max. But that's notes from barely-remembered category theory I saw once. Anyway, I asked a language model and it said that forall is minimal because it imposes the strictest or "most conservative" requirement. So "max" in the sense of "exists is interpreted to maximal expectation" refers to maximal freedom.
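A toy numerical sketch of that flavor (my own rendering with a smooth min/max, not the LDL paper's actual operator): as gamma goes to zero these collapse onto the worst-case and best-case points, and for large gamma they approach the plain expectation.

```python
import numpy as np
from scipy.special import logsumexp

def soft_forall(values, gamma):
    """Smooth "minimal expectation": tends to min(values) as gamma -> 0,
    and to the ordinary mean as gamma grows large."""
    v = np.asarray(values, dtype=float)
    return -gamma * (logsumexp(-v / gamma) - np.log(v.size))

def soft_exists(values, gamma):
    """Dual "maximal expectation": tends to max(values) as gamma -> 0."""
    v = np.asarray(values, dtype=float)
    return gamma * (logsumexp(v / gamma) - np.log(v.size))
```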

I suppose this is fine.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-04-16T18:38:40.133Z · LW · GW

I'm excited for language model interpretability to teach us about the difference between compilers and simulations of compilers. In the sense that chatgpt and I can both predict what a compiler of a suitably popular programming language will do on some input, what's going on there--- surely we're not reimplementing the compiler on our substrate, even in the limit of perfect prediction? This will be an opportunity for a programming language theorist in another year or two of interp progress.

Comment by Quinn (quinn-dougherty) on Announcing Atlas Computing · 2024-04-11T17:17:57.690Z · LW · GW

firefox is giving me a currently broken redirect on https://atlascomputing.org/safeguarded-ai-summary.pdf

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-04-06T18:56:50.646Z · LW · GW

Does anyone use vim / mouse-minimal browser? I like Tridactyl better than the other one I tried, but it's not great: when there's a vim mode in a browser window, everything starts to step on each other (like in jupyter, colab, leetcode, codesignal)

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-04-06T18:46:37.301Z · LW · GW

claude and chatgpt are pretty good at ingesting textbooks and papers and making org-drill cards.

here's my system prompt https://chat.openai.com/g/g-rgeaNP1lO-org-drill-card-creator though i usually tune it a little further per session.

Here are takes on the idea from the anki ecosystem

I tried a little ankigpt and it was fine, i haven't tried the direct plugin from ankiweb. I'm opting for org-drill here cuz I really like plaintext.

Comment by Quinn (quinn-dougherty) on Best in Class Life Improvement · 2024-04-04T23:46:30.473Z · LW · GW

one exercise remark: swimming! might have to pick up a little technique at first, but the way it combines a ton of muscle groups and cardio is unparalleled

(if anyone reading this is in the bay I will 100% teach you the 0->1 on freestyle and breaststroke technique, just DM)

Comment by Quinn (quinn-dougherty) on The Comcast Problem · 2024-03-21T18:41:26.367Z · LW · GW

is lower than it "should" be.

I think the lesson of social desirability bias is that valuable services having lower status than they "ought" to is the system working as intended.

(Though I'm sympathetic with your view: no reason to be extreme about social desirability bias)

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-03-19T18:43:02.034Z · LW · GW

Proof cert memo

In the Safeguarded AI programme thesis[1], proof certificates or certifying algorithms are relied upon in the theory of change. Let's discuss!

From the thesis:

Proof certificates are a quite broad concept, introduced by[33] : a certifying algorithm is defined as one that produces enough metadata about its answer that the answer's correctness can be checked by an algorithm which is so simple that it is easy to understand and to formally verify by hand.

The abstract of citation 33 (McConnell et al)[2]

A certifying algorithm is an algorithm that produces, with each output, a certificate or witness (easy-to-verify proof) that the particular output has not been compromised by a bug. A user of a certifying algorithm inputs x, receives the output y and the certificate w, and then checks, either manually or by use of a program, that w proves that y is a correct output for input x. In this way, he/she can be sure of the correctness of the output without having to trust the algorithm.

In this memo, I do an overview by distinguishing a proof cert from a constructive proof, and dive shallowly into the paper. Then I ruminate on how this might apply in AI contexts.

What's the difference between a proof cert and a constructive proof?

If a cert is just a witness, you might think that proof certs are just proofs from constructive math as distinct from classical math. For example, under some assumptions and caveats (in the calculus/topology setting) a function can be "known" to have a fixed point without us methodically discovering it with any guarantees, while under other assumptions and caveats (namely the lattice theoretic setting) we know there's a fixed point because we have a guaranteed procedure for computing it. However, they go further: a certifying algorithm produces with each output a cert/witness that the particular output has, in the case of McConnell et al, "not been compromised by a bug". This isn't a conceptual leap from constructive math, in which a proof is literally a witness-building algorithm; it looks to me a bit like a memo saying "btw, don't throw out the witness" along with notes about writing cheap verifiers that do not simply reimplement the input program logic.

One way of looking at a certificate is that it's dependent on metadata. This may simply be something read off of an execution trace, or some state that logs important parts of the execution trace to construct a witness, like in the bipartite test example. In the bipartite test example, all you need is an algorithm that searches for a two-coloring or an odd cycle, and crucially returns either the two-coloring or the odd cycle as a decoration on its boolean output (where "true" means "is bipartite"). Then, a cheap verifier (or checker) is none other than the pre-existing knowledge that two-colorable and bipartite are equivalent, or conversely that the existence of an odd cycle is equivalent to disproving bipartiteness.
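A minimal Python sketch of that shape (my own toy, not code from the paper): the search decorates its boolean answer with either a two-coloring or an odd cycle, and the checker leans only on the equivalences above, never re-running the search logic.

```python
from collections import deque

def certify_bipartite(adj):
    """Toy certifying bipartiteness test: returns (True, two_coloring) or
    (False, odd_cycle) instead of a bare boolean.  adj: vertex -> neighbors."""
    color, parent = {}, {}
    for s in adj:
        if s in color:
            continue
        color[s], parent[s] = 0, None
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v], parent[v] = 1 - color[u], u
                    queue.append(v)
                elif color[v] == color[u]:
                    # same color across an edge: stitch an odd cycle from the BFS tree
                    path_u, path_v = [u], [v]
                    while parent[path_u[-1]] is not None:
                        path_u.append(parent[path_u[-1]])
                    while parent[path_v[-1]] is not None:
                        path_v.append(parent[path_v[-1]])
                    on_path_v = set(path_v)
                    meet = next(x for x in path_u if x in on_path_v)
                    cycle = path_u[:path_u.index(meet) + 1] + path_v[:path_v.index(meet)][::-1]
                    return False, cycle
    return True, color

def check_witness(adj, answer, witness):
    """Cheap checker: only uses "two-colorable iff bipartite" and
    "odd cycle iff not bipartite", never the search logic above."""
    if answer:  # witness is a coloring; every edge must be bichromatic
        return all(witness[u] != witness[v] for u in adj for v in adj[u])
    cycle = witness  # witness is a cycle; it must be a real cycle of odd length
    consecutive = zip(cycle, cycle[1:] + cycle[:1])
    return len(cycle) % 2 == 1 and all(b in adj[a] for a, b in consecutive)
```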

Chapter 11 of [3] in fact discusses certification and verification supporting each other, which seems to relax the constraint that a cheap verifier doesn't simply reimplement the certifier logic.

The main difference between a standard proof as in an applied type theory (like lean or coq) and a proof cert is kinda cultural and not fundamental, in that proof certs prefer to emphasize single IO pairs and lean/coq proofs often like to emphasize entire input types. Just don't get confused on page 34 when it says "testing on all inputs"-- it means that a good certifying algorithm is a means to generate exhaustive unit tests, not that the witness predicate actually runs on all inputs at once. It seems like picking out only the input of interest, and not quantifying over the whole input type, will be the critical consideration when we discuss making this strategy viable for learned components or neural networks.

Formally:

Skip this if you're not super compelled by what you know so far about proof certificates, and jump down to the section on learned programs.

Consider algorithms $f : X \to Y$. A precondition $\varphi : X \to \mathbb{B}$ and a postcondition $\psi : X \times Y \to \mathbb{B}$ form an IO-spec or IO-behavior. An extension of the output set $Y$ (to $Y \cup \{\bot\}$) is needed to account for when the precondition is violated. We have a type $W$ of witness descriptions. Finally, we describe the witness predicate with $\mathcal{W} : X \times Y \times W \to \mathbb{B}$ that says IO pair $(x, y)$ is witnessed by $w$. So when $\psi(x, y)$ is true, $w$ must be a witness to the truth, else a witness to the falsehood. McConnell et al distinguish witness predicates from strong witness predicates, where the former can prove $\neg\varphi(x) \lor \psi(x, y)$ but the latter knows which disjunct.

A checker for $\mathcal{W}$ is an algorithm sending $(x, y, w) \mapsto \mathcal{W}(x, y, w)$. Its desiderata / nice-to-haves are correctness, runtime linear in input, and simplicity.
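Spelled out as (hypothetical) type signatures, just to fix the shapes in one place:

```python
from typing import Callable, Tuple, TypeVar

X = TypeVar("X")  # inputs
Y = TypeVar("Y")  # outputs (possibly extended with a "precondition violated" value)
W = TypeVar("W")  # witness descriptions

Precondition = Callable[[X], bool]             # phi
Postcondition = Callable[[X, Y], bool]         # psi
WitnessPredicate = Callable[[X, Y, W], bool]   # "w witnesses the IO pair (x, y)"

# a certifying algorithm hands back a witness alongside each output ...
CertifyingAlgorithm = Callable[[X], Tuple[Y, W]]
# ... and the checker decides the witness predicate; it should be correct,
# simple, and ideally linear-time, without re-running f's own logic
Checker = Callable[[X, Y, W], bool]
```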

I'm not that confident this isn't covered in the document somewhere, but McConnell et al don't seem paranoid about false functions. If Alice (sketchy) claims that a $y$ came from $f$ on $x$, but she lied to Bob and in fact ran $g$, there's no requirement for witness predicates to have any means of finding out. (I have a loose intuition that the proper adversarial adaptation of proof certs would be possible on a restricted algorithm set, but impose a lot of costs). Notice that $\mathcal{W}$ is with respect to a fixed $f$; a witness predicate that took $f$ itself as an argument would be a different approach! We should highlight that a witness predicate only tells you about one input.

Potential for certificates about learning, learned artefacts, and learned artefacts' artefacts

The programme thesis continues:

We are very interested in certificates, because we would like to rely on black-box advanced AI systems to do the hard work of searching for proofs of our desired properties, yet without compromising confidence that the proofs are correct. In this programme, we are specifically interested in certificates of behavioural properties of cyber-physical systems (ranging from simple deterministic functions, to complex stochastic hybrid systems incorporating nuances like nondeterminism and partiality).

If AIs are developing critical software (like airplanes, cars, and weaponry) and assuring us that it's legit, introducing proof certs removes the need to trust the system. The sense in which I think Safeguarded AI primarily wants to use proof certs certainly doesn't emphasize certs of SGD or architectures, nor does it even emphasize inference-time certs (though I think it'd separately be excited about progress there). Instead, an artefact that is second order removed from the training loop seems to be the target of proof certs, like a software artefact written by a model but whose runtime is not the inference time of the model. I see proof certs in this context as primarily an auditing/interpretability tool. There's no reason a model should write code that is human readable, and constraining it to write human readable code seems hard; letting the code be uninterpretable but forcing it to generate interpretable artefacts seems pretty good! The part I'm mostly not clear on is why proof certs would be better than anything else in applied type theory (where proofs are just lambdas, which are very decomposable--- you can't read them all at once but they're easy to pick apart) or model checking (specs are logical formulae, even if reading the proof that an implementation obeys a spec is a little hairy).

It's important that we're not required to quantify over a whole type to get some really valuable assurances, and this proof cert framework is amenable to that. On the other hand, you can accomplish the same relaxation in lean or coq, and we can really only debate over ergonomics. I think ultimately, the merits of this approach vs other approaches are going to be decided entirely on the relative ergonomics of tools that barely exist yet.

Overall impression

McConnell et al is 90 pages, and I only spent a few hours with it. But I don't find it super impressive or compelling. Conceptually, a constructive proof gets me everything I want out of proof certificates. A tremendous value proposition would be to make these certs easier to obtain for real world programs than what coq or lean could provide, but I haven't seen those codebases yet. There's more literature I may look at around proof-carrying code (PCC)[4], which develops similar ideas, but I'm skeptical that I'll be terribly compelled since the insight "just decorate the code with the proof" isn't very subtle or difficult.

Moreover, this literature just seems kinda old; I don't see obvious paths from it to current research.


  1. https://www.aria.org.uk/wp-content/uploads/2024/01/ARIA-Safeguarded-AI-Programme-Thesis-V1.pdf ↩︎

  2. http://alg.cs.uni-kl.de/en/team/schweitzer/publikationen/docs/CertifyingAlgorithms.pdf ↩︎

  3. http://alg.cs.uni-kl.de/en/team/schweitzer/publikationen/docs/CertifyingAlgorithms.pdf ↩︎

  4. https://www.cs.princeton.edu/~appel/papers/pccmodel.pdf ↩︎

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-03-16T03:32:29.254Z · LW · GW

consider how our nonconstructive existence proof of nash equilibria creates an algorithmic search problem, which we then study with computational complexity. For example, 2-player 0-sum games are in P, but for three or more players general sum games are PPAD-hard. I wonder if every nonconstructive existence proof is like this? In the sense of inducing a computational complexity exercise to find what class it's in, before coming up with greedy heuristics to accomplish an approximate example in practice.
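For concreteness, here's the constructive half in a few lines (a sketch I'm adding, not part of the original shortform): the row player's maximin strategy in a 2-player 0-sum game falls out of a linear program.

```python
import numpy as np
from scipy.optimize import linprog

def maximin_strategy(payoffs):
    """Row player's optimal mixed strategy and game value for a zero-sum game,
    via LP -- the sense in which the 2-player 0-sum case sits in P."""
    A = np.asarray(payoffs, dtype=float)
    m, n = A.shape
    # variables: x_1..x_m (mixed strategy) and v (game value); minimize -v
    c = np.concatenate([np.zeros(m), [-1.0]])
    # for every opponent column j: v - sum_i A[i, j] * x_i <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)  # probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], res.x[-1]

# matching pennies: value 0, uniform strategy
strategy, value = maximin_strategy([[1, -1], [-1, 1]])
```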

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-02-17T23:06:03.820Z · LW · GW

I like thinking about "what it feels like to write computer programs if you're a transformer".

Does anyone have a sense of how to benchmark or falsify Nostalgebraist's post on the subject?

Comment by Quinn (quinn-dougherty) on AI Views Snapshots · 2024-02-12T02:14:24.452Z · LW · GW

me:

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-01-10T13:14:18.457Z · LW · GW

any interest in a REMIX study group? in meatspace in berkeley, a few hours a week. https://github.com/redwoodresearch/remix_public/

Comment by Quinn (quinn-dougherty) on AI Timelines · 2023-12-10T19:49:36.207Z · LW · GW

How were the distributions near the top elicited from each participant?

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-12-05T21:38:21.584Z · LW · GW

Any tips for getting out of a "rise to the occasion mindset" and into a "sink to your training" mindset?

I'm usually optimizing for getting the most out of my A-game bursts. I want to start optimizing for my baseline habits, instead. I should cover the B-, C-, and Z-game; the A-game will cover itself.

Mathaphorically, "rising to the occasion" is taking a max of a max, whereas "sinking to the level of your habits" looks like a greatest lower bound.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-12-05T05:09:47.986Z · LW · GW

I used to think "community builder" was a personality trait I couldn't switch off, but once I moved to the bay I realized that I was just desperate for serendipity and knew how to take it from 0% to 1%. Since the bay is constantly humming at 70-90% serendipity, I simply lost the urge to contribute.

Benefactors are so over / beneficiaries are so back / etc.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-12-05T05:03:49.618Z · LW · GW

Cope isn't a very useful concept.

For every person who has a bad reason that they catch because you say "sounds like cope", there are 10x as many people who find their reason actually compelling. Saying "if that was my reason it would be a sign I was in denial of how hard I was coping" or "I don't think that reason is compelling" isn't really relevant to the person you're ostensibly talking to, who's trying to make the best decisions for the best reasons. Just say you don't understand why the reason is compelling.

Comment by Quinn (quinn-dougherty) on What is estimational programming? Squiggle in context · 2023-11-26T20:19:06.913Z · LW · GW

the author re-reading one year+ out:

  • My views on the value of infographics that actually look nice have changed; perhaps I should have had nicer looking figures.
  • "Unlike spreadsheets, users can describe in notebooks distributions, capturing uncertainty in their beliefs." seems overconfident. Monte carlo has been an excel feature for years. My dismissal of this (implicit) makes sense as a thing to say because "which usecases are easy?" is a more important question than "which usecases can you get away with if you squint?", but I could've done a way better job at actually reading and critiquing excel programs that feature monte carlo.
  • Viv told me to fix this and I ignored them because I liked the aesthetics of the sentence at the time: "which defines compositional systems as fluid de- and re-composition under the abolition of emergent properties" -- viv was right. I've changed my views on the importance of writing being boring / non-idiosyncratic, also even my prose aesthetic preferences change over time (I no longer enjoy this sentence).
  • "(this use case isn’t totally unlike what Ergo accomplishes)" I keep thinking about ergo, perhaps a paragraph in the "whiggish history" section would've been appropriate, since API access to some zoo of scoring rules / crowd wisdom is a pretty obvious feature that many platforms would foreseeably prioritize.
  • I underrated how strong the claims about compositionality (being precisely the sum of parts) were, since statistical noise is such a fundamental piece of the puzzle.

I haven't been working on this stuff except a little on the side for most of the last year, but still get excited here and there. I returned to this post because I might have another post about module systems in software design, package management, and estimational programming written up in the not too distant future.

Overall, this remains a super underrated area.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-10-27T22:01:04.176Z · LW · GW

Quick version of conversations I keep having, might be worth a top level effortpost.

A prediction market platform giving granular permission systems would open up many use cases for many people

whistleblower protections at large firms, dating, project management and internal company politics--- all userbases with underserved opinions about transparency. Manifold could pivot to this but have a lot of other stuff they could do instead.

Think about how Slack admins are confused about how to prevent some usergroups from using @channel while Discord admins aren't.

Comment by Quinn (quinn-dougherty) on Linkpost: A Post Mortem on the Gino Case · 2023-10-26T19:41:54.836Z · LW · GW

(sorry for pontificating when you asked for an actual envelope or napkin) The upside is an externality: Ziani incidentally benefits, but the signal to other young grad students that maybe career suicide is a slightly more viable risk seems like the source of impact. Agree that this subfield isn't super important, but we should look for related opportunities in subfields we care more about.

I don't know if designing a whistleblower prize is a good nerdsnipe / econ puzzle, in that it may be a really bad goosechase (since generating false positives through incentives imposes name-clearing costs on innocent people, and either you can design your way out of this problem or you can't).

Comment by Quinn (quinn-dougherty) on Linkpost: A Post Mortem on the Gino Case · 2023-10-26T04:23:19.068Z · LW · GW

Is this a retroactive grant situation?

Comment by Quinn (quinn-dougherty) on Open Thread – Autumn 2023 · 2023-10-25T15:02:37.260Z · LW · GW

I think the lesswrong/forummagnum takes on recsys are carrying the torch of RSS "you own your information diet" and so on -- I'm wondering if we can have something like "use lightcone/CEA software to ingest substack comments, translate activity or likes into karma, and arrange/prioritize them according to the user's moderation philosophy".

This does not cash out to more CCing/ingesting of substack RSS to lesswrong overall; the set of substack posts I would want to view in this way would be private from others, and I'm not necessarily interested in confabulating the cross-platform "karma" translations with more votes or trying to make it go both ways.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-10-23T23:33:38.853Z · LW · GW

In terms of the parts where the books overlap, I didn't notice anything substantially new. If anything the sequel covers less, cuz there wasn't enough detail to get into tricks like the equivalent bet test.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-10-23T16:45:46.656Z · LW · GW

I'm halfway through how to measure anything: cybersecurity, which doesn't have a lot of specifics to cybersecurity and mostly reviews the first book. I never finished the first one, and it was about four years ago that I read the parts that I did.

I think for top of the funnel EA recruiting it remains the best and most underrated book. Basically anyone worried about any kind of problem will do better if they read it, and most people in memetically adaptive / commonsensical activist or philanthropic mindsets probably aren't measuring enough.

However, the material is incredibly basic for someone who's been hanging out with EAs or on LessWrong for even a little bit. You've already absorbed so much of it from the water supply.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-10-18T19:50:58.677Z · LW · GW

one may be net better than the other; I just think the expected error washes out all of one's reasoning, so individuals shouldn't be confident they're right.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-10-18T17:50:49.715Z · LW · GW

what are your obnoxious price systems for tutoring?

There's a somewhat niche CS subtopic that a friend wants to learn, and I'm really well positioned to teach her. More discussion on the manifold bounty:

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-10-18T16:04:37.133Z · LW · GW

A trans woman told me

I get to have all these talkative blowhard traits and no one will punish me for it cuz I'm a girl. This is one major reason detrans would make my life worse. Society is so cruel to men, it sucks so much for them

And another trans woman had told me almost the exact same thing a couple months ago.

My take is that roles have upsides and downsides, and that you'll do a bad job if you try to say one role is better or worse than another on net or say that a role is more downside than upside. Also, there are versions of "women talk too much" as a stereotype in many subcultures, but I don't have a good inside view about it.

Comment by Quinn (quinn-dougherty) on Study Guide · 2023-10-16T15:02:27.334Z · LW · GW

oh haha two years later it's funny that 2 years ago quinn thought LF was a "very thorough foundation".

Comment by Quinn (quinn-dougherty) on Is the Wave non-disparagement thingy okay? · 2023-10-15T20:42:31.741Z · LW · GW

I've only been in IP / anticompetitive NDA situations at any point in my career. The idea that some NDAs want to oblige me to glomarize is rather horrifying. When you have IP reasons for NDA of course you say "I'm not getting in the weeds here for NDA reasons", it is a literal type error in my social compiler for me to even imagine being clever or savvy about this.

Comment by Quinn (quinn-dougherty) on Related Discussion from Thomas Kwa's MIRI Research Experience · 2023-10-11T17:51:45.959Z · LW · GW

https://www.lesswrong.com/posts/BGLu3iCGjjcSaeeBG/related-discussion-from-thomas-kwa-s-miri-research?commentId=fPz6jxjybp4Zmn2CK This brief subthread can be read as "giving nate points for trying" and is too credulous about whether "introspection" actually works--- my wild background guess is that roughly 60% of the time "introspection" is more "elaborate self-delusion" than working as intended, and there are times when someone saying "no but I'm trying really hard to be good at it" drives that probability up instead of down. I didn't think this was one of those times before reading Kurt's comment. A more charitable view is that this prickliness (understatement) is something that's getting triage'd out / deprioritized, not gymnastically dodged, but I think it's unreasonable to ask people to pay attention to the difference.

That's beside the point: the "it" was just the gdoc. "it would be a steep misfire" would mean "the gdoc tries to talk about the situation and totally does not address what matters". The subtraction of karma was metaphorical (I don't think I even officially voted on lesswrong!). I want to emphasize that I'm still holding this very weakly, cuz for instance I can expect people in that subthread to later tell me a detailed inside view about how giving Nate points for trying (by writing that doc) doesn't literally mean that they were drawn into this "if von neumann has to scream at me to be productive, then it would be selfish to set a personal boundary" take, but I think it's reasonable for me to be suspicious and cautious and look for more evidence that people would not fall for this class of "holding some people to different standards for for-the-greater-good reasons" again.

Comment by Quinn (quinn-dougherty) on NYT on the Manifest forecasting conference · 2023-10-09T22:47:20.879Z · LW · GW

Maybe he meant that at South by Southwest the chance is higher than 28%?

Comment by Quinn (quinn-dougherty) on Current State of Probabilistic Logic · 2023-10-07T20:01:40.420Z · LW · GW

Semanticists have been pretty productive https://www.cambridge.org/core/books/foundations-of-probabilistic-programming/819623B1B5B33836476618AC0621F0EE and may help you approach what matters to you; there are certainly adjacent questions and concerns.

A bunch of PL papers building on things like the Giry monad have floated around in stale tabs on my machine for a couple years. Open agency architecture actually provides bread crumbs to this sort of thing in a footnote about infrabayesianism https://www.lesswrong.com/posts/pKSmEkSQJsCSTK6nH/an-open-agency-architecture-for-safe-transformative-ai#fnnlogfgcev2e and a lot of the open games folks pushed on this cuz they wanted it for Bayesian Nash equilibria.

Comment by Quinn (quinn-dougherty) on "Win First" vs "Chill First" · 2023-10-07T19:40:24.303Z · LW · GW

Yes. Great post. See here https://www.lesswrong.com/posts/kq8CZzcPKQtCzbGxg/quinn-s-shortform?commentId=gFvzzBwdsoeRjGicA for related discussion.

Comment by Quinn (quinn-dougherty) on Translations Should Invert · 2023-10-07T19:34:40.618Z · LW · GW

My summary translation:

You have a classical string P. We like to cooperate across worldviews or whatever, so we provide sprinkleNegations to sprinkle double negations on classical strings and decorateWithBox to decorate provability on intuitionistic strings.

The definition of inverse would be cool, but you see that decorateWithBox(sprinkleNegations(P)) = P isn't really there, because now you have all these annoying symbols (which cancel out if we implemented translations correctly, but how would we know that?). Luckily, we know we can automate cancelation with an evaluator / normalization--- normalize(decorateWithBox(sprinkleNegations(P))) = P is a trivial syntactic string equality, which is close enough to the definition of inverse.

I think Alice and Bob, socially, care a lot simply about the costs of normalizing to the input after the translations are applied. If it's an expensive application, they're going to struggle to connect. If not, they'll enjoy the benefits of invertible translations.
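A toy model of the point (entirely my own rendering; the post doesn't ship code): the round trip isn't syntactically the identity, but a cheap normalizer that cancels the administrative decorations gets you back to P.

```python
# formulas as nested tuples, e.g. ("and", "A", "B")

def sprinkleNegations(p):
    # classical -> intuitionistic: wrap in a double negation
    return ("not", ("not", p))

def decorateWithBox(p):
    # intuitionistic -> classical: wrap in a provability box
    return ("box", p)

def normalize(p):
    # evaluator that cancels the decorations the translations introduced
    if isinstance(p, tuple):
        if p[0] == "box":
            return normalize(p[1])
        if p[0] == "not" and isinstance(p[1], tuple) and p[1][0] == "not":
            return normalize(p[1][1])
        return tuple(normalize(q) for q in p)
    return p

P = ("and", "A", "B")
assert decorateWithBox(sprinkleNegations(P)) != P                # annoying symbols in the way
assert normalize(decorateWithBox(sprinkleNegations(P))) == P     # but cheap to cancel
```

A real double-negation translation would recurse into the formula rather than wrap it once, but the shape of the invertibility claim is the same.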

Comment by Quinn (quinn-dougherty) on What is the optimal frontier for due diligence? · 2023-10-07T18:18:21.736Z · LW · GW

I don't want to make claims about the particular case, but I'm worried that if you infer a heuristic and apply it elsewhere it could fail

I think Nonlinear should be assigning enough credence to the possibility that they extremely harmed their employees to respond with a degree of horror and remorse, not laughter or dismissal.

Sometimes scrupulous/doormat people bend over backwards to twist the facts into some form to make an accuser reasonable or have a point. If you've done this, maybe eventually you talked to outside observers or noticed a set of three lies that you had previously been charitable about-- and you're like "hey wait a minute, I totally regret dwelling on feeling bad about this".

I talked about this with a friend who updated me toward thinking that the scrupulous doormat people are a small enough proportion of the population (or at least certainly they grow out of it) that this is true but a rounding error.

I've done this a few times, and I'd like to advocate for people of scrupulous doormat tendencies to have space to not be eternally signaling that they did due diligence before concluding that they don't have deep regrets or behavioral updates to do every time it comes up. People should feel allowed to express themselves aligned with very reasonable needs for catharsis or to reclaim their map of reality. Sometimes this will look like dismissiveness (especially if they're still bitter about concessions having been extracted).

So if we were applying my point to this case (which, to be clear, we're not), it would look like "we were horrified and remorseful, went through and did an audit, when we concluded that everything was fine all along we ended up pretty annoyed that we were pressured into doing this unpleasant timesink audit process, and are trying to laugh it off now"--- I don't think it's necessarily a signal if they don't prioritize conveying that this process had occurred to you in particular.

(Again re the rounding error thing: very few cases are sufficiently black and white for this comment to apply).

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-10-07T06:47:50.139Z · LW · GW

Yall, this is a rant. It will be sloppy.

I'm really tired of high functioning super smart "autism" like ok we all have made-up diagnoses--- anyone with an IQ slightly above 90 knows that they can learn the slogans to manipulate gatekeepers to get performance enhancement, and they decide not to if they think they're performing well enough already. That doesn't mean "ADHD" describes something in the world. Similarly, there's this drift of "autism" getting more and more popular. It's obnoxious because labels and identities are obnoxious, but I only find it repulsive because of the general trend of articulate and charismatic minorities setting agendas that affect the less talkative (and worse off!) fellow minorities https://open.substack.com/pub/jessesingal/p/why-disability-advocates-are-trying?utm_source=share&utm_medium=android&r=5hj2m (I only read up to free tier, but I've seen a bunch of this stuff).

Maybe neuroscientists or psychologists have good reasons for this, but "autism" is the most immensely deranged word in the history of categories--- what utility is a word that crosses absent minded professors and people who can't conceptually distinguish a week from a month insofar as you can wordlessly elicit conceptual understanding from them???? If you worked at the dictionary factory and you tried to slip that word in, you'd be fired immediately. So why do psychologists or neuroscientists get away with this???

I'm a computer guy. I'm bad at social cues sometimes. I feel at home in lots of sperg culture. But leave me out of your stolen valor campaign. We're fine, you guys--- we're smart enough to correct for most of the downsides.

https://www.thefp.com/p/the-autism-surge-lies-conspiracies

Comment by Quinn (quinn-dougherty) on Related Discussion from Thomas Kwa's MIRI Research Experience · 2023-10-06T16:11:28.467Z · LW · GW

This comment's updates for me personally:

  • The overall "EA is scary / criticizing leaders is scary" meme is very frequently something I roll my eyes at, I find it alien and sometimes laughable when people say they're worried about being bold and brave cuz all I ever see are people being rewarded for constructive criticism. But man, I feel like if I didn't know about some of this stuff then I'm missing a huge piece of the puzzle. Unclear yet what I'll think about, say, the anon meta on forums after this comment sinks in / propagates, but my guess is it'll be very different than what I thought before.
  • People are way too quick to reward themselves for trying (my update is my priority queue in doing a proper writeup): Nate & enablers saying that productivity / irreplaceability is an excuse to triage out fundamental interpersonal effort is equivalent (as far as I'm concerned) to a 2022 University Community Builder (TM) deciding that they're entitled to opulent retreats the moment they declare stated interest in saving the world. "For the greater good" thinking is fraught and dicey even when you're definitely valuable enough for the case to genuinely be made, but obviously there's pressure toward accepting a huge error rate if you simply want to believe you or a colleague is that productive/insightful. I honestly think Nate's position here is more excusable than the enablers': you basically need to see Nobel physicist level output before you consider giving someone this much benefit of the doubt, and even then you should decide not to after considering it. I'm kinda dumbfounded that it was this easy for MIRI's culture to be like this. (yes my epistemic position is going to be wrong about the stakes because "undisclosed by default", but there are a bajillion sources of my roll to disbelieve if anyone says "well actually undisclosed MIRI codebases are Nobel physicist level").
  • I feel very vindicated having written this comment, and I am subtracting karma from everyone who gave Nate points for writing a long introspective gdoc. You guys should've assumed that it would be a steep misfire.
  • Someone told me that some friends of theirs hated a talk or office hours with Nate, and I super devil's advocated the idea that lots of people have reasons for disliking the "blunt because if I suffer fools we'll all lower our standards" style that I'm not sympathetic with: I now need to apologize to them for being dismissive. I mean for chrissakes yall, in my first 1:1 with Eliezer he was not suffering fools, he helped me speedrun noticing how misled my optimism about my project at the time was and it was jovial and pleasant, so I felt like an idiot and I look back fondly on the interaction. So no, the comments about how comms style is downstream of trying to outperform those prestigious etiquette professional academics goodharting on useless but legible research that Nate retreats to elsewhere in the comments here do not hold.

Extremely from the heart warm comments about Nate from my PoV (not coming from a phonedin/trite/etiquette "soften the blow" place, but very glad that there's that upside):

  • I'm a huge replacing guilt fan
  • reading Nate on github and lesswrong has been very important to me in my CS education. The old intelligence.org/research-guide mattered so much to me at very important life/development pivot.
  • Nate's strategy / philosophy of alignment posts, particularly recently, have been phenomenal.
  • in a sibling comment, Nate wrote:

If you stay and try to express yourself despite experiencing strong feelings of frustration, you're "almost yelling". If you leave because you're feeling a bunch of frustration and people say they don't like talking to you while you're feeling a bunch of frustration, you're "storming out".

This is hard and unfair and I absolutely feel for him, I've been there[1].

  • I don't know if we've ever been in the same room. I'm going off of web presence, and very little comments or rumors others have said.

  1. on second thought: I've mostly only been there in say a soup kitchen run by trans commie lesbians, who are eagerly looking for the first excuse they can find to cancel the cis guy. I guess I don't at all relate to the possibility that someone would feel that way in bay area tech scene. ↩︎

Comment by Quinn (quinn-dougherty) on Graphical tensor notation for interpretability · 2023-10-05T04:03:46.100Z · LW · GW

Ahhhhh kick ass! Stephen Mell is getting into LLMs lately https://arxiv.org/abs/2303.15784 you guys gotta talk I just sent him this post.

Comment by Quinn (quinn-dougherty) on Related Discussion from Thomas Kwa's MIRI Research Experience · 2023-10-05T03:34:27.515Z · LW · GW

Ah yeah. I'm a bit of a believer in "introspection preys upon those smart enough to think they can do it well but not smart enough to know they'll be bad at it"[1], at least to a partial degree. So it wouldn't shock me if a long document wouldn't capture what matters.


  1. epistemic status: in that sweet spot myself ↩︎

Comment by Quinn (quinn-dougherty) on Related Discussion from Thomas Kwa's MIRI Research Experience · 2023-10-04T18:30:18.071Z · LW · GW

I second this--- I skimmed part of nate's comms doc, but it's unclear to me what turntrout is talking about unless he's talking about "being blunt"--- it sounds like overall there's something other than bluntness going on, cuz I feel like we already know about bluntness / we've thought a lot about upsides and downsides of blunt people before.

Comment by Quinn (quinn-dougherty) on OpenAI-Microsoft partnership · 2023-10-03T22:27:17.674Z · LW · GW

Microsoft has no control, but gets OpenAI's IP until AGI

This could be cashed out a few ways--- does it mean "we can't make decisions about how to utilize core tech in downstream product offerings or verticals, but we take a hefty cut of any such things that openai then ships"? If so, there may be multiple ways of interpreting it (like microsoft could accuse openai of very costly negligence for neglecting a particular vertical or product idea or application, or they couldn't).

Comment by Quinn (quinn-dougherty) on Linkpost: They Studied Dishonesty. Was Their Work a Lie? · 2023-10-03T17:30:27.397Z · LW · GW

I heard a pretty haunting take about how long it took to discover steroids in bike races. Apparently, there was a while where a "few bad apples" narrative remained popular even when an ostensibly "one of the good ones" guy was outperforming guys discovered to be using steroids.

I'm not sure how dire or cynical we should be about academic knowledge or incentives. I think it's more or less defensible to assume that no one with a successful career is doing anything real until proven otherwise, but it's still a very extreme view that I'd personally bet against. Of course also things vary so much field by field.

Comment by Quinn (quinn-dougherty) on Petrov Day Retrospective, 2023 (re: the most important virtue of Petrov Day & unilaterally promoting it) · 2023-09-28T19:57:29.059Z · LW · GW

Indeed - but maybe I was overly paranoid and I should've just clicked the buttons on the day.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-09-27T17:07:57.149Z · LW · GW

For the record, to mods: I waited till after petrov day to answer the poll because my first guess upon receiving a message on petrov day asking me to click something is that I'm being socially engineered. Clicking the next day felt pretty safe.

Comment by Quinn (quinn-dougherty) on Protest against Meta's irreversible proliferation (Sept 29, San Francisco) · 2023-09-21T12:10:41.173Z · LW · GW

This seems kinda fair, I'd like to clarify--- I largely trust the first few dozen people, I just expect that, depending on how growth/acquisition is done, if there are more than a couple instances of protests you'll have to deal with all the values diversity underlying the different reasons for joining in. This subject seems unusually fraught with potential to generate conflationary alliance https://www.lesswrong.com/s/6YHHWqmQ7x6vf4s5C sorta things.

Overall I didn't mean to other you-- in fact, I never said this publicly, but a couple months ago there was a related post of yours that got me saying "yeah we're lucky holly is on this / she seems better suited than most would be to navigate this" cuz I've been consuming your essays for years. I also did not mean to insinuate that you hadn't thought it through-- I meant to signal "here's a random guy who cares about this consideration" just as an outside vote of "hope this doesn't get triage'd out". I basically assumed you had threatmodeled interactions with different strains of populism.

Comment by Quinn (quinn-dougherty) on Protest against Meta's irreversible proliferation (Sept 29, San Francisco) · 2023-09-20T22:32:25.614Z · LW · GW

In my sordid past I did plenty of "finding the three people for nuanced logical mind-changing discussions amidst dozens of 'hey hey ho ho outgroup has got to go'", so I'll do the same here (if I'm in town), but selection effects seem deeply worrying (for example, you could go down to the soup kitchen or punk music venue and recruit all the young volunteers who are constantly sneering about how gentrifying techbros are evil and can't coordinate on whether their "unabomber is actually based" argument is ironic or unironic, but you oughtn't. The fact that this is even a question, that if you have a "mass movement" theory of change you're constantly tempted to lower your standards in this way, is so intrinsically risky that no one should be comfortable that ML safety or alignment is resorting to this sort of thing).

Comment by Quinn (quinn-dougherty) on [Link post] Michael Nielsen's "Notes on Existential Risk from Artificial Superintelligence" · 2023-09-19T20:52:43.809Z · LW · GW

Winning or losing a war is kinda binary.

Whether a pandemic gets to my country is a matter of degree, since in principle you can have a pandemic that killed 90% of counterfactual economic activity in one country break containment but only destroy 10% in your country.

"Alignment" or "transition to TAI" of any kind is way further from "coinflip" than either of these, so if you think doomcoin is salvageable or want to defend its virtues you need way different reference classes.

Think about the ways in which winning or losing a war isn't binary-- lots of ways for implementation details of an agreement to affect your life as a citizen of one of the countries. AI is like this but even further-- all the different kinds of outcomes, how central or unilateral are important moments, which values end up being imposed on the future and at what resolution, etc. People who think "we have a doomcoin toss coming up, now argue about the p(heads)" are not gonna think about this stuff!

To me, "p(doom)" is a memetic PITA as bad as "the real unaligned AI was the corporations/calitalism", so I'm excited that you're defending it! Usually people tell me "yeah you're right it's not a defensible frame haha"

Comment by Quinn (quinn-dougherty) on Dating Roundup #1: This is Why You’re Single · 2023-09-17T00:50:13.984Z · LW · GW

height filter: I don't see anything about how many women use the height filter at all vs don't [1]. People being really into 6'5" seems alarming until you realize that if you're trait xyz enough to use height filters at all, you might as well go all in and use silly height filters.


  1. as a man, filters on bumble are a premium feature. Likely for price discrimination to give many premium features to women for free, though. ↩︎

Comment by Quinn (quinn-dougherty) on Defunding My Mistake · 2023-09-06T05:35:17.522Z · LW · GW

I've certainly wondered this! In spite of the ACX commenter I mentioned suggesting that we ought to reward people for being transparent about learning epistemics the hard way, I find myself not 100% sure if it's wise or savvy to trust that people won't just mark me down as like "oh, so quinn is probably prone to being gullible or sloppy" if I talk openly about what my life was like before math coursework and the sequences.