Posts
Comments
Dr. Fish, I apologize if my previous critique appeared incongruous or misplaced. I understand that the constitution is primarily used to guide the fine-tuning/RL process and not as a direct mechanism to justify runtime behavior. My concerns were more focused on potential pitfalls in specifying the constitution and the importance of ensuring that it remains aligned with human values during the refinement process.
You are correct that it is crucial to consider the update process and how it affects the AI being trained. My concerns about exploitability should be understood in that context, as potential issues that could arise during the fine-tuning/RL process and not as direct justification for an AI's behavior. I recognize that I may have overemphasized the exploitability aspect in my previous critique, and I appreciate your clarification on the matter.
Regarding my comment about the complex interplay of principles, I understand that Section 1.2 of the paper does not elicit evaluations on the basis of multiple principles at once. My point was intended to highlight the importance of considering how principles may interact or conflict, as a means of refining the constitution and ensuring robust alignment with human values. This may have been unclear in my previous response, and I apologize for any confusion that may have resulted.
I appreciate your engagement in this discussion, and I hope that we can continue to explore the nuances and challenges of AI alignment together. I will strive to be more precise in my critiques moving forward and remain open to constructive feedback.