fazl-barez

Posts

Best-of-N Jailbreaking 2024-12-14T04:58:48.974Z

Visualizing neural network planning 2024-05-09T06:40:46.582Z

Mechanistic Interpretability Workshop Happening at ICML 2024! 2024-05-03T01:18:26.936Z

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders 2023-10-03T07:45:15.228Z

Automated Sandwiching & Quantifying Human-LLM Cooperation: ScaleOversight hackathon results 2023-02-23T10:48:08.766Z

Comments