Comment by Manoj Acharya (manoj-acharya) on Some thoughts on why adversarial training might be useful · 2023-04-14T20:38:10.569Z · LW · GW
  • The model does not have a reasonable concept of injury that includes getting hit by a bullet, or doesn’t know that getting shot at would cause you to get hit by a bullet

     
  • The model has this concept of injury, and understands that it will occur, but ‘thinks’ you only care about injuries involving swords or knives because almost all injuries in your training data involved those

Could you clarify (with examples) what you mean by the second point? I was thinking that the second point occurs when the model does not robustly understand the concept of injury, which would make this a deficiency in capabilities?