LessWrong 2.0 Reader
Thanks Logan!
2. Unlike local SAEs, our e2e SAEs aren't trained to reconstruct the current layer's activations, so my expectation, at least, was that they would get a worse reconstruction error at that layer.
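To make the distinction concrete, here's a minimal, hypothetical sketch (PyTorch, with a toy two-layer model; not the paper's actual training code, and all names and weights are illustrative). A local SAE minimizes reconstruction error at its own layer, while an e2e SAE is trained so that splicing its reconstruction back into the model preserves the model's output distribution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d_model, d_sae, d_vocab = 16, 64, 10

# Toy stand-in for the base model: layer1 -> layer2 -> unembed.
layer1 = nn.Linear(d_model, d_model)
layer2 = nn.Linear(d_model, d_model)
unembed = nn.Linear(d_model, d_vocab)

# A simple ReLU SAE attached to layer1's output.
enc = nn.Linear(d_model, d_sae)
dec = nn.Linear(d_sae, d_model)

x = torch.randn(32, d_model)
acts = layer1(x)                      # activations the SAE sits on
feats = F.relu(enc(acts))             # sparse feature activations
recon = dec(feats)                    # SAE reconstruction
sparsity = feats.abs().mean()         # stand-in sparsity penalty

# Local objective: reconstruct the current layer's activations directly.
local_loss = F.mse_loss(recon, acts) + 1e-3 * sparsity

# e2e objective (schematic): splice the reconstruction back into the model and
# match the original output distribution rather than the activations themselves.
with torch.no_grad():
    clean_logits = unembed(layer2(acts))
patched_logits = unembed(layer2(recon))
e2e_loss = F.kl_div(
    F.log_softmax(patched_logits, dim=-1),
    F.softmax(clean_logits, dim=-1),
    reduction="batchmean",
) + 1e-3 * sparsity

print(local_loss.item(), e2e_loss.item())
```

Nothing in the e2e objective directly rewards matching `acts`, which is why a worse reconstruction error at the current layer isn't surprising.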
Improving training times wasn't our focus for this paper, but I agree it would be interesting, and I expect there are big gains to be made by doing things like mixing training between local and e2e+downstream objectives and/or training multiple SAEs at once (depending on how you do this, you may need to be more careful about the SAEs taking different pathways of computation relative to the original network; a rough sketch of what a mixed objective could look like is below).
We didn't iterate on the e2e+downstream setup much. I think it's very likely that you could get similar performance by making tweaks like the ones you suggested.
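For illustration only, one hypothetical way such a mixed objective could look (a toy sketch in the same spirit as the one above, written to stand alone; the weights alpha, beta, gamma, lam are made up and not from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d_model, d_sae, d_vocab = 16, 64, 10
layer1 = nn.Linear(d_model, d_model)
layer2 = nn.Linear(d_model, d_model)
unembed = nn.Linear(d_model, d_vocab)
enc, dec = nn.Linear(d_model, d_sae), nn.Linear(d_sae, d_model)

# Only the SAE (enc/dec) is trained; the base model stays frozen.
for p in (*layer1.parameters(), *layer2.parameters(), *unembed.parameters()):
    p.requires_grad_(False)

acts = layer1(torch.randn(32, d_model))
feats = F.relu(enc(acts))
recon = dec(feats)

with torch.no_grad():
    down_clean = layer2(acts)          # downstream activations of the clean model
    clean_logits = unembed(down_clean)
down_patched = layer2(recon)           # same layer with the SAE reconstruction spliced in
patched_logits = unembed(down_patched)

# Illustrative weights only; how best to mix or schedule these terms is open.
alpha, beta, gamma, lam = 1.0, 1.0, 1.0, 1e-3
mixed_loss = (
    alpha * F.mse_loss(recon, acts)                         # local reconstruction term
    + beta * F.mse_loss(down_patched, down_clean)           # downstream reconstruction term
    + gamma * F.kl_div(F.log_softmax(patched_logits, dim=-1),
                       F.softmax(clean_logits, dim=-1),
                       reduction="batchmean")               # e2e output-matching term
    + lam * feats.abs().mean()                              # sparsity penalty
)
mixed_loss.backward()
```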
review-bot on AI #4: Introducing GPT-4
The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year. Will this post make the top fifty?
wolajacy on Advice for Activists from the History of Environmentalism
Agreed. Advocacy seems to me to be ~very frequently tied to bad epistemics, for a variety of reasons. So what's missing for me in this writeup (and indeed, in most discussions of the issue) is: why does it make sense to get laypeople interested in the issue?
The status quo is that the relevant people (ML researchers at large, AI investors, governments, and international bodies like the UN) are already well aware of the safety problem. Institutions are set up, work is being done. What is there to be gained from involving the public to an even greater extent, poisoning and inevitably simplifying the discourse, and adding more hard-to-control momentum? I can imagine a few answers (not enough being done at present, fear of market forces eventually overwhelming the governance, a "democratic mindset"), but none of those seem convincing in the face of the above.
To tie this to the environmental movement: wouldn't it have been much better for the world if it were an uninspiring issue? It seems to me that this would have prevented the anti-nuclear movement from being solidified by the momentum, Extinction Rebellion from promoting degrowth, etc., and instead semi-sensible policies would have been considered somewhere in the bureaucracies of states.
wassname on Ilya Sutskever and Jan Leike resign from OpenAI
Thanks, but this doesn't really give insight into whether this is normal or enforceable. So I wanted to point out that we don't know whether it's enforceable, and we haven't seen a single legal opinion.
zach-stein-perlman on DeepMind's "Frontier Safety Framework" is weak and unambitious
Yep
Two weeks ago I sent a senior DeepMind staff member some "Advice on RSPs, especially for avoiding ambiguities"; #1 on my list was "Clarify how your deployment commitments relate to internal deployment, not just external deployment" (since it's easy and the OpenAI PF also did a bad job of this)
:(
habryka4 on DeepMind's "Frontier Safety Framework" is weak and unambitious
The document doesn't specify whether "deployment" includes internal deployment. (This is important because maybe lots of risk comes from the lab using AIs internally to do AI development.)
This seems like such an obvious and crucial distinction that I felt very surprised when the framework didn't disambiguate between the two.
habryka4 on simeon_c's Shortform
Yeah, at the time I didn't know how shady some of the contracts here were. I do think funding a legal defense is a marginally better use of funds (though my guess is funding both is worth it).
review-bot on On the FLI Open Letter
The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year. Will this post make the top fifty?
bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform
Intuitively, I'm thinking of all this as something like a race between [capabilities enabling] safety and [capabilities enabling dangerous] capabilities (related: https://aligned.substack.com/i/139945470/targeting-ooms-superhuman-models); so from this perspective, maintaining as large a safety buffer as possible (especially if not x-risky) seems great. There could also be something like a natural endpoint to this 'race', corresponding to being able to automate all human-level AI safety R&D safely (and then using this to produce a scalable solution to aligning / controlling superintelligence).
W.r.t. measurement, I think it would be good regardless of whether automated AI safety R&D is already happening or not, similarly to how e.g. evals for automated ML R&D seem good even if automated ML R&D is already happening. In particular, information about how successful automated AI safety R&D would be (and e.g. what the scaling curves look like vs. those for dangerous capabilities) seems very strategically relevant to whether it might be feasible to deploy it at scale, when that might happen, with what risk tradeoffs, etc.
keltan on keltan's Shortform
This seems incredibly interesting to me. Googling “White-boarding techniques” only gives me results about digitally shared idea spaces. Is this what you’re referring to? I’d love to hear more on this topic.