7vik

Posts

Among Us: A Sandbox for Agentic Deception 2025-04-05T06:24:49.000Z

Auditing language models for hidden objectives 2025-03-13T19:18:32.638Z

Some lessons from the OpenAI-FrontierMath debacle 2025-01-19T21:09:17.990Z

Intricacies of Feature Geometry in Large Language Models 2024-12-07T18:10:51.375Z

The Geometry of Feelings and Nonsense in Large Language Models 2024-09-27T17:49:27.420Z

Comments

Comment by 7vik (satvik-golechha) on Among Us: A Sandbox for Agentic Deception · 2025-04-07T11:31:23.955Z · LW · GW

Sorry - fixed! They should match now - I'd forgotten to update the figure in this post. Thanks for pointing it out.

Comment by 7vik (satvik-golechha) on Some lessons from the OpenAI-FrontierMath debacle · 2025-01-20T20:23:15.373Z · LW · GW

They say it was an advanced math benchmark to test the limits of AI, not a safety project. But a number of the people who contributed would have been safety-aligned and would not have wanted to contribute had they known OpenAI would have exclusive access.

Comment by 7vik (satvik-golechha) on Some lessons from the OpenAI-FrontierMath debacle · 2025-01-19T23:39:58.243Z · LW · GW

I don't think this info was about o3 (please correct me if I'm wrong). While this suggests not all of them were from the first tier, it would be much better to know what it actually was. In particular, since the most famous quotes about FrontierMath ("extremely challenging" and "resist AIs for several years at least") were about the top 25% hardest problems, the accuracy on that set seems more important to update on. (Not to say that 25% is a small feat in any case.)

Comment by 7vik (satvik-golechha) on Some lessons from the OpenAI-FrontierMath debacle · 2025-01-19T22:34:03.019Z · LW · GW

Re: "I definitely don't see a problem with taking lab funding as a safety org. (As long as you don't claim otherwise.)":

I definitely don't have a problem with this either - just that it needs to be much more transparent and carefully thought-out than how it happened here.

Re: "If you think they didn't train on FrontierMath answers, why do you think having the opportunity to validate on it is such a significant advantage for OpenAI?":

My concern is that "verbally agreeing to not use it for training" leaves plenty of room to still gain a significant advantage from it. For instance, do we know that they did not use it indirectly to validate a PRM (process reward model) that could in turn help a lot? I don't think making a validation set out of their training data would be as effective.

Re: "maybe it would have took OpenAI a bit more time to contract some mathematicians, but realistically, how much more time?": Not much; they might have done this independently as well (assuming the mathematicians they'd contact would be equally willing to contribute directly to OpenAI).

Comment by 7vik (satvik-golechha) on The Geometry of Feelings and Nonsense in Large Language Models · 2024-10-13T10:05:37.653Z · LW · GW

Thanks a lot! We had an email exchange with the authors and they shared some updated results with much better random shuffling controls on the WordNet hierarchy.

They also argue that some contexts should promote the likelihood of both "sad" and "joy" since they are causally separable, so they should not be expected to be anti-correlated under their causal inner product per se. We’re still concerned about what this means for semantic steering.

Comment by 7vik (satvik-golechha) on The Geometry of Feelings and Nonsense in Large Language Models · 2024-09-29T14:08:15.088Z · LW · GW

I agree. Yes - would be happy to chat and discuss more. Sending you a DM.

Comment by 7vik (satvik-golechha) on The Geometry of Feelings and Nonsense in Large Language Models · 2024-09-29T13:24:29.678Z · LW · GW

They use a WordNet hierarchy to verify their orthogonality results at scale, but it doesn't look like they run any other shuffle controls.

Comment by 7vik (satvik-golechha) on The Geometry of Feelings and Nonsense in Large Language Models · 2024-09-29T13:18:25.945Z · LW · GW

Thanks @TomasD, that's interesting! I agree - most words in my random list seem like random "objects/things/organisms" so there might be some conditioning going on there. Going over your code to see if there's something else that's different.
