Here is another consideration: have something to give them in return. Ask yourself why very talented people would want to work for you. I had a very bad experience working for people who were less experienced than me, less resourceful, and lacking in leadership skills. Other than the salary, I didn't feel I was getting much from them. I eventually quit, and in hindsight, I shouldn't have accepted the job.
Cross layer superposition
Had a bit of time to think about this. Ultimately, because superposition as we know it is a property of the latent space rather than of the neurons in a layer, it's not clear to me that this is the right question to be asking. What do you imagine an experimental result would look like?
I want to generally encourage this kind of experiment-and-publish-quickly project. This might deserve a post of its own, but as someone with a background in both hacking and entrepreneurship, I see this kind of quick feedback loop as an incredible strength of both fields, and I hope it can be used to accelerate scientific progress, which is exactly what we need in alignment.
I actually don't think it has much impact on superintelligence. I shared this mostly because I thought it was a cool idea that we can implement now and that can later be turned into policy. Compared to existing policy proposals that don't limit training or usage, I think this can have a much larger impact.
It might also be interesting to look at this from a learned-helplessness point of view, especially with helicopter parenting: perhaps children aren't learning to solve their own problems independently. I wouldn't be surprised if this contributes to the mental health epidemic.
A factor in why children are becoming less independent in the US might be car-centric city design. With unsafe streets and no way to walk to school, to friends' houses, or to after-school activities, parents have no choice but to drive them around. Not Just Bikes has a great video on this.
I've seen the term "AI Explainability" floating around in the mainstream ML community. Is there a major difference between that and what we in the AI Safety community call Interpretability?