Posts

Thinking About Propensity Evaluations 2024-08-19T09:23:55.091Z
A Taxonomy Of AI System Evaluations 2024-08-19T09:07:45.224Z
Distilled - AGI Safety from First Principles 2022-05-29T00:57:47.237Z

Comments

Comment by Harrison G (Hubarruby) on [Linkpost] Introducing Superalignment · 2023-07-05T22:01:52.802Z · LW · GW

The quote: "Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing)."

Comment by Harrison G (Hubarruby) on More ways to spot abysses · 2023-01-03T02:33:20.659Z · LW · GW

Super helpful; thanks for writing!

Comment by Harrison G (Hubarruby) on On sincerity · 2023-01-03T02:15:40.104Z · LW · GW

(read: The Athena-Parfit Long-Term Institute for Raising for Effectively Prioritizing Global Alignment Challenges)

I laughed about this for a while. Thank you for this thought-provoking post, and for incorporating occasional humor throughout.

Comment by Harrison G (Hubarruby) on Things I carry almost every day, as of late December 2022 · 2023-01-03T01:03:33.314Z · LW · GW

At the top right is a pocket constitution made by Legal Impact for Chickens. I received this at an Effective Altruism Global conference, during the career fair. What actually happened was that someone came up to the booth I was at holding the pocket constitution, I noted that it looked cool, and they were kind enough to offer it to me. Unfortunately, I have never knowingly met anybody from Legal Impact for Chickens. I have not actually used this pocket constitution, but I carry it anyway in my winter jacket’s inner breast pocket since (a) it fits very unobtrusively and (b) it seems cool to carry around a pocket constitution.

If this was EAG SF, I remember an experience very much like this, and I think I was that person! Ha

Comment by Harrison G (Hubarruby) on A Proof Against Oracle AI · 2022-08-03T04:04:46.747Z · LW · GW

" [...] since every string can be reconstructed by only answering yes or no to questions like 'is the first bit 1?' [...]"

Why would humans ever ask this question, and (furthermore) why would we ever ask it n times in a row? It seems unlikely, and easy to prevent. Is there something I'm not understanding about this step?
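
For concreteness, here is a minimal Python sketch of the step being quoted: n yes/no answers suffice to recover an n-bit string, so an oracle restricted to yes/no answers can still leak any string it knows. The `oracle` function and `SECRET` string here are hypothetical stand-ins, not anything from the original post.

```python
# Sketch of the quoted claim: any bit string can be reconstructed
# purely from yes/no answers to questions like "is bit i a 1?".

SECRET = "1011001"  # illustrative string the oracle "knows"

def oracle(bit_index: int) -> bool:
    """Hypothetical oracle that only answers the yes/no question:
    'is bit `bit_index` of the string a 1?'"""
    return SECRET[bit_index] == "1"

# Recover the full string one yes/no answer at a time.
reconstructed = "".join("1" if oracle(i) else "0" for i in range(len(SECRET)))

assert reconstructed == SECRET
print(reconstructed)  # 1011001
```

The mechanics of the reconstruction aren't in doubt; my question is about the premise that anyone would pose that sequence of questions in the first place.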