Posts

Plausibly Factoring Conjectures 2024-11-22T20:11:56.479Z
October 2024 Progress in Guaranteed Safe AI 2024-10-28T23:34:51.689Z
Speedrunning 4 mistakes you make when your alignment strategy is based on formal proof 2023-02-16T01:13:44.847Z
Mid-Atlantic AI Alignment Alliance Unconference 2023-01-13T20:33:33.389Z
Riffing on the agent type 2022-12-08T00:19:38.054Z
Master plan spec: needs audit (logic and cooperative AI) 2022-11-30T06:10:38.775Z
What is estimational programming? Squiggle in context 2022-08-12T18:39:57.230Z
Abundance and scarcity; working forwards and working backwards 2022-02-18T19:05:38.974Z
AISC5 Retrospective: Mechanisms for Avoiding Tragedy of the Commons in Common Pool Resource Problems 2021-09-27T16:46:40.389Z
[linkpost] The Psychological Economy of Inaction by William Gillis 2021-09-12T22:04:43.991Z
What are some claims or opinions about multi-multi delegation you've seen in the memeplex that you think deserve scrutiny? 2021-06-27T17:44:52.389Z
Cliffnotes to Craft of Research parts I, II, and III 2021-06-26T14:00:53.560Z
[timeboxed exercise] write me your model of AI human-existential safety and the alignment problems in 15 minutes 2021-05-04T19:10:28.755Z
What am I fighting for? 2021-04-20T23:27:26.416Z
Could degoogling be a practice run for something more important? 2021-04-17T00:03:42.790Z
[Recruiting for a Discord Server] AI Forecasting & Threat Modeling Workshop 2021-04-10T16:15:07.420Z
Aging: A Surprisingly Tractable Problem 2021-04-08T19:25:42.287Z
Averting suffering with sentience throttlers (proposal) 2021-04-05T10:54:09.755Z
TASP Ep 3 - Optimal Policies Tend to Seek Power 2021-03-11T01:44:02.814Z
Takeaways from the Intelligence Rising RPG 2021-03-05T10:27:55.867Z
Reading recommendations on social technology: looking for the third way between technocracy and populism 2021-02-24T11:48:06.451Z
Quinn's Shortform 2021-01-16T17:52:33.020Z
Is it the case that when humans approximate backward induction they violate the markov property? 2021-01-16T16:22:21.561Z
Infodemics: with Jeremy Blackburn and Aviv Ovadya 2021-01-08T15:44:57.852Z
Chance that "AI safety basically [doesn't need] to be solved, we’ll just solve it by default unless we’re completely completely careless" 2020-12-08T21:08:47.575Z
Announcing the Technical AI Safety Podcast 2020-12-07T18:51:58.257Z
How ought I spend time? 2020-06-30T16:53:53.787Z
Have general decomposers been formalized? 2020-06-27T18:09:06.411Z
Do the best ideas float to the top? 2019-01-21T05:22:51.182Z
on wellunderstoodness 2018-12-16T07:22:19.250Z

Comments

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-12-16T00:55:42.351Z · LW · GW

talk to friends as a half measure

When it comes to your internal track record, it is often said that finding what you wrote at time t-k beats trying to remember what you thought at t-k. However, the activation energy to keep such a journal is kind of a hurdle (which is why products like https://fatebook.io are so good!).

I find that a nice midpoint between the full and correct internal track record practice (rigorous journaling) and completely winging it (leaving yourself open to mistakes and self-delusion) is talking to friends: I think my memory of conversations had out loud with other people is more detailed and honest than my memory of things I've thought or used to think, especially when it's a stressful and treacherous topic.[1]


  1. I may be more socially attuned than average around here(?) so this may not work for people less socially attuned than me ↩︎

Comment by Quinn (quinn-dougherty) on Alexander Gietelink Oldenziel's Shortform · 2024-12-07T05:20:23.615Z · LW · GW

I was at an ARIA meeting with a bunch of category theorists working on safeguarded AI and many of them didn't know what the work had to do with AI.

epistemic status: short version of post because I never got around to doing the proper effort post I wanted to make.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-12-03T00:13:48.623Z · LW · GW

A sketch I'm thinking of: asking people to consume information (a question, in this case) is asking them to do you a favor, so you should do your best to ease this burden. However, don't be paralyzed either: budget some leeway to be less than maximally considerate in this way when you really need it.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-12-03T00:11:59.202Z · LW · GW

what's the best essay on asking for advice?

One that goes over etiquette and the social contract; if it's software-specific, perhaps it talks about minimal reproducers, plus whatever else the author thinks is involved.

Comment by Quinn (quinn-dougherty) on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser · 2024-12-02T23:14:27.125Z · LW · GW

Rumors are that 2025 Lighthaven is jam-packed. If this is the case, and you need money, rudimentary economics suggests only the obvious: raise prices. I know many clients are mission aligned, and there's a reasonable ideological reason to run the joint at or below cost, but I think it's aligned with that spirit if profits from the campus fund the website.

I also want to say in print what I said in person a year ago: you can ask me to do chores on campus to save money, it'd be within my hufflepuff budget. There are good reasons not to go totally "by and for the community" DIY like, say, many community libraries or soup kitchens, but nudging a little in that direction seems right.

EDIT: I did a mostly symbolic $200 right now, may or may not do more as I do some more calculations and find out my salary at my new job

Comment by Quinn (quinn-dougherty) on What are the good rationality films? · 2024-11-20T16:52:54.223Z · LW · GW

ThingOfThings said that Story of Louis Pasteur is a very EA movie, but I think it also counts for rationality. Huge fan.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-11-18T17:36:30.834Z · LW · GW

Guaranteed Safe AI paper club meets again this Thursday

Event for the paper club: https://calendar.app.google/2a11YNXUFwzHbT3TA

blurb about the paper in last month's newsletter:

... If you’re wondering why you just read all that, here’s the juice: often in GSAI position papers there’ll be some reference to expectations that capture “harm” or “safety”. Preexpectations and postexpectations with respect to particular pairs of programs could be a great way to cash this out, cuz we could look at programs as interventions and simulate RCTs (labeling one program control and one treatment) in our world modeling stack. When it comes to harm and safety, Prop and bool are definitely not rich enough.
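
A minimal sketch of the flavor I have in mind, with made-up names throughout (control, treatment, harm, and post_expectation are illustrative, not anything from a real world-modeling stack): treat the two programs as interventions and read the difference in expected harm as a simulated treatment effect.

```python
# hedged sketch: expectation-valued "postconditions" instead of bool/Prop-valued ones
import random

def control(state: float) -> float:
    # toy stochastic program: the "control" intervention
    return state + random.gauss(0.0, 1.0)

def treatment(state: float) -> float:
    # toy stochastic program: the "treatment" intervention
    return 0.5 * state + random.gauss(0.0, 0.5)

def harm(state: float) -> float:
    # expectation-valued spec: how bad the outcome is, not just whether it is bad
    return max(0.0, state - 1.0)

def post_expectation(program, n: int = 100_000) -> float:
    # Monte Carlo estimate of E[harm(program(initial_state))]
    return sum(harm(program(random.gauss(0.0, 1.0))) for _ in range(n)) / n

print(post_expectation(treatment) - post_expectation(control))  # simulated treatment effect
```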

Comment by Quinn (quinn-dougherty) on Alexander Gietelink Oldenziel's Shortform · 2024-11-17T17:43:35.490Z · LW · GW

my dude, top level post -- this does not read like a shortform

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-11-13T18:35:28.649Z · LW · GW

Yoshua Bengio is giving a talk online tomorrow https://lu.ma/4ylbvs75

Comment by Quinn (quinn-dougherty) on Science advances one funeral at a time · 2024-11-04T19:34:25.319Z · LW · GW

by virtue of their technical chops, also care about their career capital.

I didn't understand this -- "their technical chops impose opportunity cost, since they're able to build very safe, successful careers if they toe the line" would make sense, or "they care about career capital independent of their technical chops" would make sense. But here, the relation between technical chops and caring about career capital doesn't come through clearly.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-10-16T20:03:46.998Z · LW · GW

did anyone draw up an estimate of how much the proportion of code written by LLMs will increase? or even what the proportion is today

Comment by Quinn (quinn-dougherty) on Yoav Ravid's Shortform · 2024-09-26T01:42:19.982Z · LW · GW

I was thinking the same thing this morning! My main thought was, "this is a trap. ain't no way I'm pressing a big red button, especially not so near to Petrov Day"

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-09-19T00:15:21.153Z · LW · GW

GSAI paper club is tomorrow (gcal ticket), summary (by me) and discussion of this paper

Comment by Quinn (quinn-dougherty) on First Lighthaven Sequences Reading Group · 2024-09-05T22:18:57.378Z · LW · GW

Alas, belief is easier than disbelief; we believe instinctively, but disbelief requires a conscious effort.

Yes, but this is one thing that I have felt mutating as I read the Sequences and continued to hang out with you lot (starting roughly 8 years ago, with some off and on)

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-08-27T05:47:43.667Z · LW · GW

By all means. Happy for that

Comment by Quinn (quinn-dougherty) on Provably Safe AI: Worldview and Projects · 2024-08-27T03:28:09.616Z · LW · GW

discussion of the bet in Aug 2024 Progress in GSAI newsletter

Comment by Quinn (quinn-dougherty) on Limitations on Formal Verification for AI Safety · 2024-08-27T03:20:47.549Z · LW · GW

Note in August 2024 GSAI newsletter

See Limitations on Formal Verification for AI Safety over on LessWrong. I have a lot of agreements, and my disagreements are more a matter of what deserves emphasis than the fundamentals. Overall, I think the Tegmark/Omohundro paper failed to convey a swisscheesey worldview, and sounded too much like “why not just capture alignment properties in ‘specs’ and prove the software ‘correct’?” (i.e. the vibe I was responding to in my very pithy post). However, I think my main reason I’m not using Dickson’s post as a reason to just pivot all my worldview and resulting research is captured in one of Steve’s comments:

I'm focused on making sure our infrastructure is safe against AI attacks.

Like, a very strong version I almost endorse is “GSAI isn’t about AI at all, it’s about systems coded by extremely powerful developers (which happen to be AIs)”, and ensuring safety, security, and reliability capabilities scale at similar speeds with other kinds of capabilities.

It looks like one can satisfy Dickson just by assuring him that GSAI is a part of a swiss cheese stack, and that no one is messianically promoting One Weird Trick To Solve Alignment. Of course, I do hope that no one is messianically promoting One Weird Trick…

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-08-27T02:49:44.219Z · LW · GW

august 2024 guaranteed safe ai newsletter

in case i forgot last month, here's a link to july


A wager you say

One proof of concept for the GSAI stack would be a well-understood mechanical engineering domain automated to the next level and certified to boot. How about locks? Needs a model of basic physics, terms in some logic for all the parts and how they compose, and some test harnesses that simulate an adversary. Can you design and manufacture a provably unpickable lock? 

Zac Hatfield-Dodds (of hypothesis/pytest and Anthropic; he was offered and declined authorship on the GSAI position paper) challenged Ben Goldhaber to a bet after Ben coauthored a post with Steve Omohundro. It seems to resolve in 2026 or 2027; the comment thread should get cleared up once Ben gets back from Burning Man. The arbiter is Raemon from LessWrong.

Zac says you can’t get a provably unpickable lock on this timeline. Zac gave (up to) 10:1 odds, so recall that the bet can have positive expected value for Ben even if he thinks the event is most likely not going to happen.
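
For illustration, assume Ben stakes $1k against Zac's $10k at those odds. Then Ben's expected value is p · $10,000 − (1 − p) · $1,000, which is positive whenever p > 1/11 ≈ 9.1% -- Ben can think the lock almost certainly won't get built and still like the bet.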

For funsies, let’s map out one path of what has to happen for Zac to pay Ben $10k. This is not the canonical path, but it is a path:

  1. Physics to the relevant granularity (question: can human lockpicks leverage sub-Newtonian issues?) is conceptually placed into type theory or some calculus. I tried a Riemann integral in Coq once (way once), so it occurs to me that you need to decide if you want just the functional models (perhaps without computation / with proof irrelevance) in your proof stack or if you want the actual numerical analysis support in there as well.
  2. Good tooling, library support, etc. around that conceptual work (call it mechlib) to provide mechanical engineering primitives
  3. A lock designing toolkit, depending on mechlib, is developed
  4. Someone (e.g. a large language model) is really good at programming in the lock designing toolkit. They come up with a spec L. 
  5. You state the problem “forall t : trajectories through our physics simulation, if L(t) == open(L) then t == key(L)” (a toy Lean-style rendering of this follows the list)
  6. Then you get to write a nasty gazillion line Lean proof
  7. Manufacture a lock (did I mention that the design toolkit has links to actual manufacturing stacks?)
  8. Bring a bunch to DefCon 2027 and send another to the lockpicking lawyer
  9. Everyone fails. Except Ben and the army of postdocs that $9,999 can buy.
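
A toy Lean-style rendering of the step-5 spec (every name here -- LockDesign, PhysicsTrajectory, opens, pickedWithKey -- is a placeholder I'm making up, not anything from an actual mechlib):

```lean
-- hypothetical sketch only: assumes a physics model and lock-design type that don't exist yet
axiom LockDesign : Type
axiom PhysicsTrajectory : LockDesign → Type
axiom opens : {L : LockDesign} → PhysicsTrajectory L → Prop
axiom pickedWithKey : {L : LockDesign} → PhysicsTrajectory L → Prop

-- "forall t : trajectories through our physics simulation,
--  if the trajectory opens the lock then it used the key"
def Unpickable (L : LockDesign) : Prop :=
  ∀ t : PhysicsTrajectory L, opens t → pickedWithKey t
```

Step 6 is then the nasty gazillion-line proof of Unpickable for the manufactured design.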

Looks like after the magnificent research engineering in steps 1 and 2, the rest is just showing off and justifying those two steps. Of course, in a world where we have steps 1 and 2 we have a great deal of transformative applications of formal modeling and verification just in reach, and we’ll need a PoC like locks to practice and concretize the workflow. 

Cryptography applications tend to have a curse of requiring a lot of work after the security context, permission set, and other requirements are frozen in stone, which means that when the requirements change you have to start over and throw out a bunch of work (epistemic status: why do you think so many defi projects have more whitepapers than users?). The provably unpickable lock has 2 to 10x that problem -- get the granularity wrong in step one, and most of your mechlib implementation won’t be salvageable. As the language model iterates on the spec L in step 5, the other language model has to iterate on the proof in step 6, because the new spec will break most of the proof.

Sorry I don’t know any mechanical engineering, Ben, otherwise I’d take some cracks at it. The idea of a logic such that its denotation is a bunch of mechanical engineering primitives seems interesting enough that my “if it was easy to do in less than a year someone would’ve, therefore there must be a moat” heuristic is tingling. Perhaps oddly, the quantum semantics folks (or with HoTT!) seem to have been productive, but I don’t know how much of that is translatable to mechanical engineering.

 

Reinforcement learning from proof assistant feedback, and yet more monte carlo tree search

DeepSeek’s paper 

The steps are pretraining, supervised finetuning, RLPAF (reinforcement learning from proof assistant feedback), and MCTS (monte carlo tree search). RLPAF is not very rich: it’s a zero reward for any bug at all and a one for a happy typechecker. Glad they got that far with just that. 
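
Spelled out as a sketch of how sparse that signal is (lean_check here is a made-up stand-in for whatever proof-checking harness they actually run, not a real API):

```python
# hedged sketch of the RLPAF reward as described above: binary, typechecker-only
def rlpaf_reward(proof_attempt: str, lean_check) -> float:
    """1.0 iff the candidate proof typechecks; 0.0 for any bug at all."""
    ok, _diagnostics = lean_check(proof_attempt)  # hypothetical harness returning (bool, messages)
    return 1.0 if ok else 0.0
```

No partial credit for a nearly-correct proof, which is what makes it surprising they got as far as they did.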

You can use the model at deepseek.com.

 

Harmonic ships their migration of miniF2F to Lean 4, gets 90% on it, is hiring

From their “one month in” newsletter. “Aristotle”, which has a mysterious methodology since I’ve only seen their marketing copy rather than an arxiv paper, gets 90% on the Lean 4 miniF2F when prompted with natural language proofs. It doesn’t look to me like the deepseek or LEGO papers do that? I could be wrong. It’s impressive just to autoformalize natural language proofs; I guess I’m still wrapping my head around how much harder it is (for an LLM) to implement coming up with the proof as well.

Jobs: research engineer and software engineer

 

Atlas ships their big google doc alluded to in the last newsletter

Worth a read! The GSAI stack is large and varied, and this maps out the different sub-sub-disciplines. From the executive summary:

 

You could start whole organizations for every row in this table, and I wouldn’t be a part of any org that targets more than a few at once for fear of being unfocused. See the doc for more navigation (see what I did there? Navigating like with an atlas, perhaps? Get it?) of the field’s opportunities.[1]

 

Efficient shield synthesis via state-space transformation

Shielding is an area of reactive systems and reinforcement learning that marks states as unsafe and synthesizes a kind of guarding layer between the agent and the environment that prevents unsafe actions from being executed in the environment. So in the rejection-sampling-flavored version, it literally intercepts the unsafe action and tells the agent “we’re not running that, try another action”. One of the limitations in this literature is computational cost: shields are, like environments, state machines plus some frills, and there may simply be too many states. This is the limitation that this paper focuses on.
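
A minimal sketch of the rejection-sampling flavor, assuming a gym-style env and a precomputed is_unsafe table (all names are illustrative; the interesting part of the literature is synthesizing is_unsafe, which this skips entirely):

```python
# hedged sketch: a shield intercepting unsafe actions before they reach the environment
def shielded_step(env, agent, state, is_unsafe, max_retries: int = 100):
    """Ask the agent for actions until one passes the shield, then execute only that one."""
    for _ in range(max_retries):
        action = agent.propose(state)        # hypothetical agent interface
        if not is_unsafe(state, action):     # the shield: a (state, action) -> bool lookup
            return env.step(action)          # only safe actions reach the environment
        # otherwise: "we're not running that, try another action"
    raise RuntimeError("shield exhausted retries without finding a safe action")
```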

We consider the problem of synthesizing safety strategies for control systems, also known as shields. Since the state space is infinite, shields are typically computed over a finite-state abstraction, with the most common abstraction being a rectangular grid. However, for many systems, such a grid does not align well with the safety property or the system dynamics. That is why a coarse grid is rarely sufficient, but a fine grid is typically computationally infeasible to obtain. In this paper, we show that appropriate state-space transformations can still allow to use a coarse grid at almost no computational overhead. We demonstrate in three case studies that our transformation-based synthesis outperforms a standard synthesis by several orders of magnitude. In the first two case studies, we use domain knowledge to select a suitable transformation. In the third case study, we instead report on results in engineering a transformation without domain knowledge.

Besides cost, demanding a lot of domain knowledge is another limitation of shields, so this is an especially welcome development.

 

Funding opportunities

ARIA jumped right to technical area three (TA3), prototyping the gatekeeper. Deadline October 2nd. Seems geared toward cyber-physical systems folks. In the document:

Note that verified software systems is an area which is highly suitable for a simplified gatekeeper workflow, in which the world-model is implicit in the specification logic. However, in the context of ARIA’s mission to “change the perception of what’s possible or valuable,” we consider that this application pathway is already perceived to be possible and valuable by the AI community. As such, this programme focuses on building capabilities to construct guaranteed-safe AI systems in cyber-physical domains. That being said, if you are an organisation which specialises in verified software, we would love to hear from you outside of this solicitation about the cyber-physical challenges that are just at the edge of the possible for your current techniques.

This is really cool stuff, I hope they find brave and adventurous teams. I had thought gatekeeper prototypes would be in minecraft or mujoco (and asked a funder if they’d support me in doing that), so it’s wild to see them going for actual cyberphysical systems so quickly. 

 

Paper club

Add to your calendar. On September 19th we will read a paper about assume-guarantee contracts with learned components. I’m liable to have made a summary slide deck to kick us off, but if I don’t, we’ll quietly read together for the first 20-30 minutes then discuss. We meet in the Google Meet room in the gcal event by default.

 

Andrew Dickson’s excellent post

See Limitations on Formal Verification for AI Safety over on LessWrong. I have a lot of agreements, and my disagreements are more a matter of what deserves emphasis than the fundamentals. Overall, I think the Tegmark/Omohundro paper failed to convey a swisscheesey worldview, and sounded too much like “why not just capture alignment properties in ‘specs’ and prove the software ‘correct’?” (i.e. the vibe I was responding to in my very pithy post). However, I think my main reason I’m not using Dickson’s post as a reason to just pivot all my worldview and resulting research is captured in one of Steve’s comments:

I'm focused on making sure our infrastructure is safe against AI attacks.

Like, a very strong version I almost endorse is “GSAI isn’t about AI at all, it’s about systems coded by extremely powerful developers (which happen to be AIs)”, and ensuring safety, security, and reliability capabilities scale at similar speeds with other kinds of capabilities.

It looks like one can satisfy Dickson just by assuring him that GSAI is a part of a swiss cheese stack, and that no one is messianically promoting One Weird Trick To Solve Alignment. Of course, I do hope that no one is messianically promoting One Weird Trick…

  1. ^

    One problem off the top of my head regarding the InterFramework section: Coq and Lean seem the most conceptually straightforward since they have the same underlying calculus, but even there just a little impredicativity or coinduction could lead to extreme headaches. Now you can have a model at some point in the future that steamrolls over these headaches, but then you have a social problem of the broader Lean community not wanting to upstream those changes -- various forks diverging fundamentally seems problematic to me, and would lead to a lot of duplicated work and missed opportunities for collaboration. I plan to prompt Opus 3.5 with “replicate flocq in lean4” as soon as I get access to the model, but how much more prompting effort will it be to ensure compliance with preexisting abstractions and design patterns, so that it can not only serve my purposes but be accepted by the community? At least there’s no coinduction in flocq, though some of the proofs may rely on set impredicativity for all I know (I haven’t looked at it in a while).

Comment by Quinn (quinn-dougherty) on Provably Safe AI: Worldview and Projects · 2024-08-26T17:53:25.244Z · LW · GW

I do think it's important to reach appropriately high safety assurances before developing or deploying future AI systems which would be capable of causing a catastrophe. However, I believe that the path there is to extend and complement current techniques, including empirical and experimental approaches alongside formal verification - whatever actually works in practice.

for what it's worth, I do see GSAI as largely a swiss cheesey worldview. Though I can see how you might read some of the authors involved to be implying otherwise! I should knock out a post on this.

Comment by Quinn (quinn-dougherty) on Please do not use AI to write for you · 2024-08-22T17:46:40.206Z · LW · GW

not just get it to sycophantically agree with you

i struggle with this, and need to attend a prompting bootcamp

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-07-31T18:22:13.918Z · LW · GW

i'm getting back into composing and arranging. send me rat poems to set to music!

Comment by Quinn (quinn-dougherty) on Eric Neyman's Shortform · 2024-04-26T14:42:51.433Z · LW · GW

sure -- i agree; that's why i said "something adjacent to", because it had enough overlap in properties. I think my comment completely stands with a different word choice; I'm just not sure what word choice would do a better job.

Comment by Quinn (quinn-dougherty) on Eric Neyman's Shortform · 2024-04-25T19:53:51.209Z · LW · GW

I eventually decided that human chauvinism approximately works most of the time because good successor criteria are very brittle. I'd prefer to avoid lock-in to my or anyone's values at t=2024, but such a lock-in might be "good enough" if I'm threatened with what I think are the counterfactual alternatives. If I did not think good successor criteria were very brittle, I'd accept something adjacent to E/Acc that focuses on designing minds which prosper more effectively than human minds. (the current comment will not address defining prosperity at different timesteps).

In other words, I can't beat the old fragility of value stuff (but I haven't tried in a while).

I wrote down my full thoughts on good successor criteria in 2021 https://www.lesswrong.com/posts/c4B45PGxCgY7CEMXr/what-am-i-fighting-for

AI welfare: matters, but when I started reading lesswrong I literally thought that disenfranchising AIs from the definition of prosperity was equivalent to subjecting them to suffering, and I don't think this anymore.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-04-20T01:01:07.934Z · LW · GW

Thinking about a top-level post on FOMO and research taste

  • Fear of missing out defined as inability to execute on a project cuz there's a cooler project if you pivot
  • but it also gestures at more of a strict negative, where you think your project sucks before you finish it, so you never execute
  • was discussing this with a friend: "yeah I mean lesswrong is pretty egregious cuz it sorta promotes this idea of research taste as the ability to tear things down, which can be done armchair"
  • I've developed strategies to beat this FOMO and gain more depth and detail with projects (too recent to see returns yet, but getting there) but I also suspect it was nutritious of me to develop discernment about what projects are valuable or not valuable for various threat models and theories of change (in such a way that being a phd student off of lesswrong wouldn't have been as good in crucial ways, tho way better in other ways).
    • but I think the point is you have to turn off this discernment sometimes, unless you want to specialize in telling people why their plans won't work, which I'm more dubious on the value of than I used to be

Idk maybe this shortform is most of the value of the top level post

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-04-17T23:09:16.358Z · LW · GW

He had become so caught up in building sentences that he had almost forgotten the barbaric days when thinking was like a splash of color landing on a page.

Edward St Aubyn

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-04-17T22:27:04.037Z · LW · GW

an R-valued quantifier is any function (X → R) → R, so when R is bool, quantifiers are the functions that take predicates as input and return bool as output (same for Prop). the standard max and min functions on arrays count as real-valued quantifiers for some index set X.
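
Concretely (a throwaway Python rendering of the same point; xs plays the role of the index set):

```python
# R-valued quantifiers as higher-order functions: (X -> R) -> R
from typing import Callable, Iterable, TypeVar

X = TypeVar("X")

def forall_bool(xs: Iterable[X], p: Callable[[X], bool]) -> bool:
    return all(p(x) for x in xs)   # bool-valued quantifier

def exists_bool(xs: Iterable[X], p: Callable[[X], bool]) -> bool:
    return any(p(x) for x in xs)   # bool-valued quantifier

def sup(xs: Iterable[X], f: Callable[[X], float]) -> float:
    return max(f(x) for x in xs)   # real-valued "quantifier" (max over the index set)

def inf(xs: Iterable[X], f: Callable[[X], float]) -> float:
    return min(f(x) for x in xs)   # real-valued "quantifier" (min over the index set)
```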

I thought I had seen forall as the max of the Prop-valued quantifiers, and exists as the min, somewhere, which has a nice mindfeel since forall has this "big" feeling (if you determined for some P : X → Prop that ∀x, P x holds (where the binder form is just syntax sugar, since the variable name is irrelevant) by exhaustive checking, it would cost O(|X|), whereas ∃x, P x would cost O(1) unless the derivation of the witness was dependent on size of domain somehow).

Incidentally however, in differentiable logic it seems forall is the "minimal expectation" and existential is the "maximal expectation". Page 10 of the LDL paper, where a universally quantified prop is the limit as gamma goes to zero of an expectation taken over a gamma-ball about the min of the predicate, rather than over its entire domain. So in this sense, the interpretation of a universally quantified prop is a minimal expectation, and dually an existentially quantified prop is a maximal expectation.

I didn't like the way this felt aesthetically, since as I said, forall feels "big", which mood-affiliates toward a max. But that's notes from barely-remembered category theory I saw once. Anyway, I asked a language model and it said that forall is minimal because it imposes the strictest or "most conservative" requirement. So "max" in the sense of "exists is interpreted as maximal expectation" refers to maximal freedom.

I suppose this is fine.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-04-16T18:38:40.133Z · LW · GW

I'm excited for language model interpretability to teach us about the difference between compilers and simulations of compilers. In the sense that chatgpt and I can both predict what a compiler of a suitably popular programming language will do on some input, what's going on there -- surely we're not reimplementing the compiler on our substrate, even in the limit of perfect prediction? This will be an opportunity for a programming language theorist in another year or two of interp progress.

Comment by Quinn (quinn-dougherty) on Announcing Atlas Computing · 2024-04-11T17:17:57.690Z · LW · GW

firefox is giving me a currently broken redirect on https://atlascomputing.org/safeguarded-ai-summary.pdf

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-04-06T18:56:50.646Z · LW · GW

Does anyone use a vim / mouse-minimal browser? I like Tridactyl better than the other one I tried, but it's not great: when there's a vim mode in a browser window, everything starts to step on each other (like in jupyter, colab, leetcode, codesignal)

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-04-06T18:46:37.301Z · LW · GW

claude and chatgpt are pretty good at ingesting textbooks and papers and making org-drill cards.

here's my system prompt https://chat.openai.com/g/g-rgeaNP1lO-org-drill-card-creator though i usually tune it a little further per session.

Here are takes on the idea from the anki ecosystem

I tried a little ankigpt and it was fine, i haven't tried the direct plugin from ankiweb. I'm opting for org-drill here cuz I really like plaintext.

Comment by Quinn (quinn-dougherty) on Best in Class Life Improvement · 2024-04-04T23:46:30.473Z · LW · GW

one exercise remark: swimming! might have to pick up a little technique at first, but the way it combines a ton of muscle groups and cardio is unparalleled

(if anyone reading this is in the bay I will 100% teach you the 0->1 on freestyle and breaststroke technique, just DM)

Comment by Quinn (quinn-dougherty) on The Comcast Problem · 2024-03-21T18:41:26.367Z · LW · GW

is lower than it "should" be.

I think the lesson of social desirability bias is that valuable services having lower status than they "ought" to is the system working as intended.

(Though I'm sympathetic with your view: no reason to be extreme about social desirability bias)

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-03-19T18:43:02.034Z · LW · GW

Proof cert memo

In the Safeguarded AI programme thesis[1], proof certificates or certifying algorithms are relied upon in the theory of change. Let's discuss!

From the thesis:

Proof certificates are a quite broad concept, introduced by [33]: a certifying algorithm is defined as one that produces enough metadata about its answer that the answer's correctness can be checked by an algorithm which is so simple that it is easy to understand and to formally verify by hand.

The abstract of citation 33 (McConnell et al)[2]

A certifying algorithm is an algorithm that produces, with each output, a certificate or witness (easy-to-verify proof) that the particular output has not been compromised by a bug. A user of a certifying algorithm inputs x, receives the output y and the certificate w, and then checks, either manually or by use of a program, that w proves that y is a correct output for input x. In this way, he/she can be sure of the correctness of the output without having to trust the algorithm.

In this memo, I do an overview by distinguishing a proof cert from a constructive proof, and dive shallowly into the paper. Then I ruminate on how this might apply in AI contexts.

What's the difference between a proof cert and a constructive proof?

If a cert is just a witness, you might think that proof certs are just proofs from constructive math as distinct from classical math. For example, under some assumptions and caveats (in the calculus/topology setting) a function can be "known" to have a fixed point without us methodically discovering it with any guarantees, while under other assumptions and caveats (namely the lattice-theoretic setting) we know there's a fixed point because we have a guaranteed procedure for computing it. However, they go further: a certifying algorithm produces with each output a cert/witness that the particular output has, in the case of McConnell et al, "not been compromised by a bug". This isn't a conceptual leap from constructive math, in which a proof is literally a witness-building algorithm; it looks to me a bit like a memo saying "btw, don't throw out the witness" along with notes about writing cheap verifiers that do not simply reimplement the input program logic.

One way of looking at a certificate is that it's dependent on metadata. This may simply be something read off of an execution trace, or some state that logs important parts of the execution trace to construct a witness, like in the bipartite test example. In the bipartite test example, all you need is an algorithm that searches for a two-coloring or an odd cycle and, crucially, returns either the two-coloring or the odd cycle as a decoration on its boolean output (where "true" means "is bipartite"). Then, a cheap verifier (or checker) is none other than the pre-existing knowledge that two-colorable and bipartite are equivalent, or conversely that the existence of an odd cycle is equivalent to disproving bipartiteness.
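
Here's a minimal sketch of that example in Python (my own illustration, not code from McConnell et al): the algorithm decorates its boolean verdict with either a two-coloring or an odd cycle, and the checker only validates the witness rather than redoing the search.

```python
# hedged sketch: a certifying bipartiteness test (verdict, witness) plus a cheap checker
from collections import deque

def certify_bipartite(adj):
    """adj: dict vertex -> list of neighbours (undirected, symmetric).
    Returns (True, two_coloring) or (False, odd_cycle)."""
    color, parent = {}, {}
    for src in adj:
        if src in color:
            continue
        color[src], parent[src] = 0, None
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v], parent[v] = 1 - color[u], u
                    queue.append(v)
                elif color[v] == color[u]:
                    # same-colored neighbours: splice their BFS-tree paths into an odd cycle
                    path_u, path_v = [u], [v]
                    while parent[path_u[-1]] is not None:
                        path_u.append(parent[path_u[-1]])
                    while parent[path_v[-1]] is not None:
                        path_v.append(parent[path_v[-1]])
                    seen = set(path_u)
                    lca = next(x for x in path_v if x in seen)
                    cycle = path_u[:path_u.index(lca) + 1] + list(reversed(path_v[:path_v.index(lca)]))
                    return False, cycle
    return True, color

def check_witness(adj, verdict, witness):
    """Cheap checker: relies only on 'two-colorable iff bipartite' / 'odd cycle iff not'."""
    if verdict:  # witness is a claimed two-coloring
        return all(witness[u] != witness[v] for u in adj for v in adj[u])
    cycle = witness  # witness is a claimed odd cycle
    return len(cycle) % 2 == 1 and all(
        cycle[(i + 1) % len(cycle)] in adj[cycle[i]] for i in range(len(cycle))
    )

# e.g. check_witness({0: [1], 1: [0]}, *certify_bipartite({0: [1], 1: [0]})) == True
```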

Chapter 11 of [3] will in fact discuss certification and verification supporting each other, which seems to relax the constraint that a cheap verifier doesn't simply reimplement the certifier logic.

The main difference between a standard proof as in an applied type theory (like Lean or Coq) and a proof cert is kinda cultural and not fundamental, in that proof certs prefer to emphasize single IO pairs and Lean/Coq proofs often like to emphasize entire input types. Just don't get confused on page 34 when it says "testing on all inputs" -- it means that a good certifying algorithm is a means to generate exhaustive unit tests, not that the witness predicate actually runs on all inputs at once. It seems like picking out only the input of interest, and not quantifying over the whole input type, will be the critical consideration when we discuss making this strategy viable for learned components or neural networks.

Formally:

Skip this if you're not super compelled by what you know so far about proof certificates, and jump down to the section on learned programs.

Consider algorithms f : X → Y. A precondition φ ⊆ X and a postcondition ψ ⊆ X × Y form an IO-spec or IO-behavior. An extension Y ∪ {⊥} of the output set is needed to account for when the precondition is violated. We have a type W of witness descriptions. Finally, we describe the witness predicate with 𝒲(x, y, w), which says the IO pair (x, y) is witnessed by w. So when φ(x) is true, w must be a witness to the truth of ψ(x, y), else a witness to the falsehood of φ(x). McConnell et al distinguish witness predicates from strong witness predicates, where the former can prove the disjunction but the latter knows which disjunct.

A checker for 𝒲 is an algorithm sending (x, y, w) to a boolean verdict on whether 𝒲(x, y, w) holds. Its desiderata / nice-to-haves are correctness, runtime linear in the input, and simplicity.

I'm not that confident this isn't covered in the document somewhere, but McConnell et al don't seem paranoid about false functions. If Alice (sketchy) claims that an output y came from f on x, but she lied to Bob and in fact ran g, there's no requirement for witness predicates to have any means of finding out. (I have a loose intuition that the proper adversarial adaptation of proof certs would be possible on a restricted algorithm set, but would impose a lot of costs.) Notice that 𝒲 is with respect to a fixed f; quantifying over programs would be a different approach! We should highlight that a witness predicate only tells you about one input.

Potential for certificates about learning, learned artefacts, and learned artefacts' artefacts

The programme thesis continues:

We are very interested in certificates, because we would like to rely on black-box advanced AI systems to do the hard work of searching for proofs of our desired properties, yet without compromising confidence that the proofs are correct. In this programme, we are specifically interested in certificates of behavioural properties of cyber-physical systems (ranging from simple deterministic functions, to complex stochastic hybrid systems incorporating nuances like nondeterminism and partiality).

If AIs are developing critical software (like airplanes, cars, and weaponry) and assuring us that it's legit, introducing proof certs removes trust from the system. The sense in which I think Safeguarded AI primarily wants to use proof certs certainly doesn't emphasize certs of SGD or architectures, nor does it even emphasize inference-time certs (though I think it'd separately be excited about progress there). Instead, an artefact that is second-order removed from the training loop seems to be the target of proof certs, like a software artefact that is written by a model but whose runtime is not the inference time of the model. I see proof certs in this context as primarily an auditing/interpretability tool. There's no reason a model should write code that is human readable; constraining it to write human readable code seems hard, but letting the code be uninterpretable while forcing it to generate interpretable artefacts seems pretty good! The part I'm mostly not clear on is why proof certs would be better than anything else in applied type theory (where proofs are just lambdas, which are very decomposable -- you can't read them all at once but they're easy to pick apart) or model checking (specs are logical formulae, even if reading the proof that an implementation obeys a spec is a little hairy).

It's important that we're not required to quantify over a whole type to get some really valuable assurances, and this proof cert framework is amenable to that. On the other hand, you can accomplish the same relaxation in lean or coq, and we can really only debate over ergonomics. I think ultimately, the merits of this approach vs other approaches are going to be decided entirely on the relative ergonomics of tools that barely exist yet.

Overall impression

McConnell et al is 90 pages, and I only spent a few hours with it. But I don't find it super impressive or compelling. Conceptually, a constructive proof gets me everything I want out of proof certificates. A tremendous value proposition would be to make these certs easier to obtain for real world programs than what coq or lean could provide, but I haven't seen those codebases yet. There's more literature I may look at around proof-carrying code (PCC)[4] which is similar ideas, but I'm skeptical that I'll be terribly compelled since the insight "just decorate the code with the proof" isn't very subtle or difficult.

Moreover, this literature just seems kinda old; I don't see obvious paths from it to current research.


  1. https://www.aria.org.uk/wp-content/uploads/2024/01/ARIA-Safeguarded-AI-Programme-Thesis-V1.pdf ↩︎

  2. http://alg.cs.uni-kl.de/en/team/schweitzer/publikationen/docs/CertifyingAlgorithms.pdf ↩︎

  3. http://alg.cs.uni-kl.de/en/team/schweitzer/publikationen/docs/CertifyingAlgorithms.pdf ↩︎

  4. https://www.cs.princeton.edu/~appel/papers/pccmodel.pdf ↩︎

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-03-16T03:32:29.254Z · LW · GW

consider how our nonconstructive existence proof of Nash equilibria creates an algorithmic search problem, which we then study with computational complexity. For example, 2-player 0-sum games are in P, but general-sum games with three or more players are PPAD-complete. I wonder if every nonconstructive existence proof is like this? In the sense of inducing a computational complexity exercise to find what class it's in, before coming up with greedy heuristics to accomplish an approximate example in practice.
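
As a concrete instance of the easy case (my own sketch using scipy's LP solver; nothing here is from a specific source): the row player's maximin strategy in a finite 2-player 0-sum game is just a linear program, hence polynomial time.

```python
# hedged sketch: 2-player 0-sum equilibrium via LP (row player maximizes guaranteed payoff)
import numpy as np
from scipy.optimize import linprog

def zero_sum_equilibrium(A):
    """A[i][j]: payoff to the row player. Returns (row_strategy, game_value)."""
    A = np.asarray(A, dtype=float)
    n, m = A.shape
    c = np.concatenate([np.zeros(n), [-1.0]])            # variables: x_1..x_n, v; maximize v
    A_ub = np.hstack([-A.T, np.ones((m, 1))])             # for each column j: v <= sum_i x_i A[i,j]
    b_ub = np.zeros(m)
    A_eq = np.concatenate([np.ones(n), [0.0]]).reshape(1, -1)   # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[n]

print(zero_sum_equilibrium([[1, -1], [-1, 1]]))  # matching pennies: uniform strategy, value ~0
```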

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-02-17T23:06:03.820Z · LW · GW

I like thinking about "what it feels like to write computer programs if you're a transformer".

Does anyone have a sense of how to benchmark or falsify Nostalgebraist's post on the subject?

Comment by Quinn (quinn-dougherty) on AI Views Snapshots · 2024-02-12T02:14:24.452Z · LW · GW

me:

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2024-01-10T13:14:18.457Z · LW · GW

any interest in a REMIX study group? in meatspace in berkeley, a few hours a week. https://github.com/redwoodresearch/remix_public/

Comment by Quinn (quinn-dougherty) on AI Timelines · 2023-12-10T19:49:36.207Z · LW · GW

How were the distributions near the top elicited from each participant?

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-12-05T21:38:21.584Z · LW · GW

Any tips for getting out of a "rise to the occasion mindset" and into a "sink to your training" mindset?

I'm usually optimizing for getting the most out of my A-game bursts. I want to start optimizing for my baseline habits, instead. I should cover the B-, C-, and Z-game; the A-game will cover itself.

Mathaphorically, "rising to the occasion" is taking a max of a max, whereas "sinking to the level of your habits" looks like a greatest lower bound.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-12-05T05:09:47.986Z · LW · GW

I used to think "community builder" was a personality trait I couldn't switch off, but once I moved to the bay I realized that I was just desperate for serendipity and knew how to take it from 0% to 1%. Since the bay is constantly humming at 70-90% serendipity, I simply lost the urge to contribute.

Benefactors are so over / beneficiaries are so back / etc.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-12-05T05:03:49.618Z · LW · GW

Cope isn't a very useful concept.

For every person who has a bad reason that they catch because you say "sounds like cope", there are 10x as many people who find their reason actually compelling. Saying "if that was my reason it would be a sign I was in denial of how hard I was coping" or "I don't think that reason is compelling" isn't really relevant to the person you're ostensibly talking to, who's trying to make the best decisions for the best reasons. Just say you don't understand why the reason is compelling.

Comment by Quinn (quinn-dougherty) on What is estimational programming? Squiggle in context · 2023-11-26T20:19:06.913Z · LW · GW

the author re-reading one year+ out:

  • My views on value of infographics that actually look nice have changed, perhaps I should have had nicer looking figures.
  • "Unlike spreadsheets, users can describe in notebooks distributions, capturing uncertainty in their beliefs." seems overconfident. Monte carlo has been an excel feature for years. My dismissal of this (implicit) makes sense as a thing to say because "which usecases are easy?" is a more important question than "which usecases can you get away with if you squint?", but I could've done a way better job at actually reading and critiquing excel programs that feature monte carlo.
  • Viv told me to fix this and I ignored them because I liked the aesthetics of the sentence at the time: "which defines compositional systems as fluid de- and re-composition under the abolition of emergent properties" -- viv was right. I've changed my views on the importance of writing being boring / non-idiosyncratic, also even my prose aesthetic preferences change over time (I no longer enjoy this sentence).
  • "(this use case isn’t totally unlike what Ergo accomplishes)" I keep thinking about ergo, perhaps a paragraph in the "whiggish history" section would've been appropriate, since API access to some zoo of scoring rules / crowd wisdom is a pretty obvious feature that many platforms would foreseeably prioritize.
  • I underrated how strong the claims about compositionality (being precisely the sum of parts) were, since statistical noise is such a fundamental piece of the puzzle.

I haven't been working on this stuff except a little on the side for most of the last year, but still get excited here and there. I returned to this post because I might have another post about module systems in software design, package management, and estimational programming written up in the not too distant future.

Overall, this remains a super underrated area.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-10-27T22:01:04.176Z · LW · GW

Quick version of conversations I keep having, might be worth a top level effortpost.

A prediction market platform giving granular permission systems would open up many use cases for many people

whistleblower protections at large firms, dating, project management and internal company politics--- all userbases with underserved opinions about transparency. Manifold could pivot to this but have a lot of other stuff they could do instead.

Think about how slack admins are confused about how to prevent some usergroups from using @channel, while discord admins aren't.

Comment by Quinn (quinn-dougherty) on Linkpost: A Post Mortem on the Gino Case · 2023-10-26T19:41:54.836Z · LW · GW

(sorry for pontificating when you asked for an actual envelope or napkin) upside is an externality, Ziani incidentally benefits but the signal to other young grad students that maybe career suicide is a slightly more viable risk seems like the source of impact. Agree that this subfield isn't super important, but we should look for related opportunities in subfields we care more about.

I don't know if designing a whistleblower prize is a good nerdsnipe / econ puzzle, in that it may be a really bad goosechase (since generating false positives through incentives imposes name-clearing costs on innocent people, and either you can design your way out of this problem or you can't).

Comment by Quinn (quinn-dougherty) on Linkpost: A Post Mortem on the Gino Case · 2023-10-26T04:23:19.068Z · LW · GW

Is this a retroactive grant situation?

Comment by Quinn (quinn-dougherty) on Open Thread – Autumn 2023 · 2023-10-25T15:02:37.260Z · LW · GW

I think the lesswrong/forummagnum takes on recsys are carrying the torch of RSS "you own your information diet" and so on -- I'm wondering if we can have something like "use lightcone/CEA software to ingest substack comments, translate activity or likes into karma, and arrange/prioritize them according to the user's moderation philosophy".

This does not cash out to more CCing/ingesting of substack RSS to lesswrong overall: the set of substack posts I would want to view in this way would be private from others, and I'm not necessarily interested in confabulating the cross-platform "karma" translations with more votes or trying to make it go both ways.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-10-23T23:33:38.853Z · LW · GW

In terms of the parts where the books overlap, I didn't notice anything substantial. If anything the sequel offers less, cuz there wasn't enough detail to get into tricks like the equivalent bet test.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-10-23T16:45:46.656Z · LW · GW

I'm halfway through How to Measure Anything: Cybersecurity, which doesn't have a lot of cybersecurity specifics and mostly reviews the first book. I never finished the first one, and it was about four years ago that I read the parts that I did.

I think for top of the funnel EA recruiting it remains the best and most underrated book. Basically anyone worried about any kind of problem will do better if they read it, and most people in memetically adaptive / commonsensical activist or philanthropic mindsets probably aren't measuring enough.

However, the material is incredibly basic for someone who's been hanging out with EAs or on LessWrong for even a little bit. You've already absorbed so much of it from the water supply.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-10-18T19:50:58.677Z · LW · GW

one may be net better than the other; I just think the expected error washes out all of one's reasoning, so individuals shouldn't be confident they're right.

Comment by Quinn (quinn-dougherty) on Quinn's Shortform · 2023-10-18T17:50:49.715Z · LW · GW

what are your obnoxious price systems for tutoring?

There's a somewhat niche CS subtopic that a friend wants to learn, and I'm really well positioned to teach her. More discussion on the manifold bounty: