Comment by julius vidal (julius-vidal) on Daniel Tan's Shortform · 2025-01-23T23:37:05.521Z
>As originally conceived, this is sort of like a “dangerous capability” eval for steg.
I'm actually just about to start building something very similar to this for AISI's evals bounty program.
Comment by julius vidal (julius-vidal) on Open Thread Fall 2024 · 2024-11-11T02:15:30.338Z
Hi!
I think I'm in a similar position to where you were a few months to a year ago. I'm a CS grad (sadly without an ML specialisation) working in industry, and I recently started reading a lot of mechanistic interpretability research. I'm now seriously considering a PhD in that area, and I'm also looking at how I could get some initial research done in the meantime.
Could I DM you to maybe get some advice?