Comment by Manoj Acharya (manoj-acharya) on Some thoughts on why adversarial training might be useful · 2023-04-14T20:38:10.569Z · LW · GW
  • The model does not have a reasonable concept of injury that includes getting hit by a bullet, or doesn’t know that getting shot at would cause you to get hit by a bullet

     
  • The model has this concept of injury, and understands that it will occur, but ‘thinks’ you only care about injuries involving swords or knives because almost all injuries in your training data involved those

Could you clarify (with examples) what you mean by the second point? I was thinking that the second point occurs when the model does not robustly understand the concept of injury, which would make this a deficiency in capabilities?