Comment by Manoj Acharya (manoj-acharya) on Some thoughts on why adversarial training might be useful · 2023-04-14T20:38:10.569Z
- The model does not have a reasonable concept of injury that includes getting hit by a bullet, or doesn’t know that getting shot at would cause you to get hit by a bullet
- The model has this concept of injury, and understands that it will occur, but ‘thinks’ you only care about injuries involving swords or knives because almost all injuries in your training data involved those
Could you clarify (with examples) what you mean by the second point? I had thought the second case occurs when the model does not robustly understand the concept of injury, which would make it a deficiency in capabilities?