Comment by KanHar on Models Don't "Get Reward" · 2023-01-19T12:06:40.781Z · LW · GW
Weight sharing between actor and critic networks is common practice, and the critic passes gradients (learned from the reward) to the actor. So for most practical purposes, the actor gets all of the information it needs about the reward.
This is the case for many common architectures.
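
To make the weight-sharing point concrete, here is a toy sketch in plain Python. Everything in it is an assumption for illustration and is not taken from any particular library or from the post: a hypothetical one-weight shared trunk `w`, an actor head `a`, a critic head `c`, and a squared-error critic loss with hand-written gradients. It only shows the critic-loss pathway: the reward never touches the actor head directly, yet one critic update changes the actor's logit through the shared weight.

```python
def critic_loss_grads(w, a, c, x, reward):
    """Gradients of the critic loss 0.5 * (v - reward)**2 in a toy shared-trunk model.

    h     = w * x   shared feature (trunk weight w, used by both heads)
    logit = a * h   actor head (does not appear in the critic loss)
    v     = c * h   critic head
    """
    h = w * x
    v = c * h
    err = v - reward
    return {
        "w": err * c * x,  # shared trunk: gradient depends on the reward
        "a": 0.0,          # actor head: critic loss does not depend on it
        "c": err * h,      # critic head
    }


def actor_logit(w, a, x):
    """Actor output computed on top of the shared trunk."""
    return a * (w * x)


# Hypothetical values, chosen only for illustration.
w, a, c, x = 0.5, 1.0, 2.0, 1.0
lr = 0.1

before = actor_logit(w, a, x)
grads = critic_loss_grads(w, a, c, x, reward=3.0)
w_new = w - lr * grads["w"]          # one gradient step on the critic loss only
after = actor_logit(w_new, a, x)

print("actor logit before:", before)
print("actor logit after: ", after)
```

The actor head `a` is never updated here, but the actor's output still moves, because the reward-dependent gradient lands on the shared weight. With separate (unshared) trunks this particular pathway would not exist.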