Comments

Comment by julius vidal (julius-vidal) on Daniel Tan's Shortform · 2025-01-23T23:37:05.521Z

> As originally conceived, this is sort of like a “dangerous capability” eval for steg.

I am actually just about to start building something very similar to this for the AISI's evals bounty program.

Comment by julius vidal (julius-vidal) on Open Thread Fall 2024 · 2024-11-11T02:15:30.338Z

Hi!

I think I'm probably in a pretty similar position to where you were a few months to a year ago: I'm a CS grad (though sadly with no ML specialisation) working in industry who recently started reading a lot of mechanistic interpretability research, and I'm starting to seriously consider pursuing a PhD in that area (I'm also looking at how I could get some initial research done in the meantime).
Could I DM you to maybe get some advice?