LessWrong 2.0 Reader
Nice work! Since you cite our LEACE paper, I was wondering if you've tried burning LEACE into the weights of a model just like you burn an orthogonal projection into the weights here? It should work at least as well, if not better, since LEACE will perturb the activations less.
Nitpick: I wish you would use a word other than "orthogonalization" since it sounds like you're saying that you're making the weight matrix an orthogonal matrix. Why not LoRACS (Low Rank Adaptation Concept Erasure)?
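For concreteness, here's a minimal sketch of the kind of burn-in I mean, assuming a layer that writes to the residual stream as y = Wx + b. The function names are mine, and the LEACE eraser is treated abstractly as an affine map x ↦ Px + b (the fitting procedure from the paper is not shown):

```python
import torch

def burn_in_orthogonal_projection(W: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Fold the rank-1 orthogonal projection (I - vv^T) into W
    (d_out x d_in), so the layer's output never has a component
    along the concept direction v (d_out,)."""
    v = v / v.norm()
    # (I - vv^T) @ W, without materializing the d_out x d_out projector.
    return W - torch.outer(v, v @ W)

def burn_in_affine_eraser(W: torch.Tensor, b_layer: torch.Tensor,
                          P: torch.Tensor, b_eraser: torch.Tensor):
    """Fold an affine eraser x -> Px + b_eraser (the general form of a
    LEACE eraser) into a layer y = Wx + b_layer, giving
    y' = (PW)x + (P @ b_layer + b_eraser)."""
    return P @ W, P @ b_layer + b_eraser

# Sanity check: the edited weights really zero out the v-component.
d_out, d_in = 512, 512
W, v, x = torch.randn(d_out, d_in), torch.randn(d_out), torch.randn(d_in)
W_edited = burn_in_orthogonal_projection(W, v)
print(torch.dot(W_edited @ x, v / v.norm()))  # ~0, up to float error
```

The affine case is why I'd expect LEACE to perturb activations less: P is fit to be least-squares optimal rather than a hard projection, and the bias term recenters the activations instead of collapsing the erased component to zero.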
yair-halberstadt on Can stealth aircraft be detected optically?
But I discussed that in the post. All you need are enough cameras + processing power. Both are cheap.
zeshen on Why is AGI/ASI Inevitable?
Can't people decide simply not to build AGI/ASI?
Yeah, many people, like the majority of users on this forum, have decided not to build AGI. On the other hand, other people have decided to build AGI and are working hard towards it.
Side note: LessWrong has a feature for publishing posts as Questions; you might want to use it for questions in the future.
The claim is that most applications aren't internal usage of AI for AI development and thus can be made trivially safe.
Not that most applications of AI for AI development can be made trivially safe.
wei-dai on The formal goal is a pointer
What do you think of this post by Tammy?
It seems like someone could definitely be wrong about what they want (unless normative anti-realism [LW · GW] is true, in which case such a sentence has no meaning). For example, consider someone who thinks it's really important to be faithful to God, goes to church every Sunday to maintain their faith, and would use a superintelligent religious AI assistant to help keep the faith if they could. Or maybe they're just overconfident about their philosophical abilities and would fail to take various precautions that I think are important in a high-stakes reflective process.
Mostly that thing where we had a lying-vs-lie-detecting arms race, and the liars mostly won by believing their own lies; that's how we ended up with overconfidence bias, self-serving bias, and a whole bunch of other biases.
Are you imagining that the RL environment for AIs will be single-player, with no social interactions? If yes, how will they learn social skills? If no, why wouldn't the same thing happen to them?
Unless we do a very stupid thing like reading the AI's thoughts and RL-punishing wrongthink, this seems very unlikely to happen.
We already RL-punish AIs for saying things that we don't like (via RLHF), and in the future we will probably punish them for thinking things we don't like (via things like interpretability). Not sure how to avoid this (given current political realities), so safety plans have to somehow take it into account.
lawrencec on Please stop publishing ideas/insights/research about AI
While I've softened my position on this in the last year, I want to give a big +1 to this response, especially these two points:
- It's genuinely hard to come up with ideas that help capabilities a lot. I think you are severely underestimating how hard it is, and how much insight is required. One issue here is that most papers on arXiv are garbage and don't actually make any progress, but those papers are not the ones pushing AGI forward anyway.
- [..]
- High level ideas are generally not that valuable in and of themselves. People generally learn to ignore ideas unless they have strong empirical evidence of correctness (or endorsement of highly respected researchers) because there are simply too many ideas. The valuable thing is not the idea itself, but the knowledge of which ideas are actually correct.
(emphasis added)
I think it's often challenging just to understand where the frontier is, because it's so far away and so much of it is secret. If you're not at a scaling lab and also don't keep up with the frontier of the literature, it's natural to overestimate the novelty of your insights. And if you're too scared to investigate your insights, you might continue to think your ideas are better than they are. Meanwhile, as an AI safety researcher, not only is there a lot less distance to the frontier of whatever subfield you're in, but you'll probably spend most of your time doing work that keeps you on that frontier.
Random insights can be valuable, but the history of deep learning is full of insights that were right, arguably for the wrong reasons (batch/layer norm, Adam, arguably the algorithm later rebranded as PPO), as well as brilliant insights that turned out to be basically useless (e.g. a lot of the Bayesian neural network work; there are really too many examples to list) if not harmful in the long run (e.g. lots of "clever" or not-so-clever ways of adding inductive bias). Part of the reason is that people don't get taught the history of the field, so they never see all the oh-so-clever ideas that didn't work, or how many of the "insights" were invented post hoc. If you're new to deep learning, you might get the impression that insights were more causally responsible for capabilities advances than they actually were. Insofar as good alignment requires deconfusion and rationality to generate good insights, and capabilities does not, you should expect the insights you get from improving rationality or doing deconfusion to matter more for alignment than for capabilities.
I mean, if you actually do come up with a better initialization scheme, a trick that improves GPU utilization, or some other cheap algorithmic trick that improves training, AND you check that it's correct through small- or medium-scale empirical experiments, then sure, please reconsider publishing it. But it's hard to do that incidentally: even if you stumble on such an insight while doing, say, mech interp, going out of your way to test your capability ideas should be a really obvious "you're basically doing capabilities" sign. Maybe you should be doing your safety work instead?
wei-dai on Which skincare products are evidence-based?
Retinoids, which are a big family of compounds; I would go with adapalene, which has a better safety/side-effect profile than anything else in the family. It has less scientific evidence for anti-aging than other retinoids (and is not marketed for that purpose), but I've tried it myself (bought it for acne), and it has very obvious anti-wrinkle effects within about a week. You can get generic 0.1% adapalene gel on Amazon for $12 per 1.6 oz.
(I'm a little worried about long term effects, i.e. could the increased skin turnover mean faster aging in the long run, but can't seem to find any data or discussion about it.)
radford-neal-1 on AI #62: Too Soon to Tell
Regarding Cortez and the Aztecs, it is of interest to note that Cortez's indigenous allies (enemies of the Aztecs) actually ended up in a fairly good position afterwards.
From https://en.wikipedia.org/wiki/Tlaxcala:
For the most part, the Spanish kept their promise to the Tlaxcalans. Unlike Tenochtitlan and other cities, Tlaxcala was not destroyed after the Conquest. They also allowed many Tlaxcalans to retain their indigenous names. The Tlaxcalans were mostly able to keep their traditional form of government.
raemon on Buck's Shortform
It implies that AI control is organizationally simpler, because most applications can be made trivially controlled.
I didn't get this from the premises fwiw. Are you saying it's trivial because "just don't use your AI to help you design AI" (seems organizationally hard to me), or did you have particular tricks in mind?
romeostevensit on How would you navigate a severe financial emergency with no help or resources?
Finding a loan to move to somewhere with jobs is probably your best bet. This may devolve into begging among your social circles as well, which is a big pride hit; many probably won't believe you will wind up with the means to pay it back. Minimize the cost of the move by getting rid of non-essential belongings. It is probably somewhat easier these days to line up a far-away job via Zoom interviewing. Quantity over quality.
One thing that I think is non-obvious: if you lay out the case for the loan in detail, that demonstrates intelligence and conscientiousness, and will increase people's sense that you are doing something useful and thus their willingness to lend. Basically, treat it as a business case, where the business happens to be getting you and your partner employed. Show past cash flow, expected future cash flow given salaries at the sorts of places you are applying to, estimates of how many jobs you can reasonably apply to, how much of a monthly payment you could make, etc. Use ChatGPT for help with outlining this.
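To make the shape of that arithmetic concrete, here's a toy sketch. Every function name and number below is a hypothetical placeholder, not a figure from anyone's actual situation:

```python
import math

def max_affordable_payment(expected_monthly_income: float,
                           essential_monthly_expenses: float,
                           safety_margin: float = 0.8) -> float:
    """Rough ceiling on a monthly loan payment: the surplus left after
    essentials, discounted by a safety margin for income uncertainty."""
    surplus = expected_monthly_income - essential_monthly_expenses
    return max(0.0, surplus * safety_margin)

def months_to_repay(principal: float, payment: float, annual_rate: float) -> float:
    """Months to amortize `principal` at `annual_rate` APR with a fixed
    monthly `payment` (standard amortization formula). Assumes
    payment > principal * monthly_rate, or the loan never shrinks."""
    r = annual_rate / 12
    if r == 0:
        return principal / payment
    return math.log(payment / (payment - principal * r)) / math.log(1 + r)

# Hypothetical example: a $5,000 relocation loan at 10% APR, expecting
# $3,200/mo income against $2,700/mo essential expenses.
payment = max_affordable_payment(3200.0, 2700.0)        # 400.0
print(f"{months_to_repay(5000.0, payment, 0.10):.1f}")  # ~13.3 months
```

Showing a lender this kind of worked-out repayment timeline, rather than just asking for a number, is exactly the "business case" framing above.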