Posts

Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback 2024-11-07T15:39:06.854Z

Comments

Comment by Marcus Williams on Arjun Panickssery's Shortform · 2024-06-07T18:01:04.064Z · LW · GW

I think part of the reason why these odds might seem more off than usual is that Ether and other cryptocurrencies have been going up recently, which means there is high demand for leveraged positions. This in turn means that crypto lending services such as Aave have been offering ~10% APY on stablecoins, which might be more appealing than the riskier, and only slightly higher, return from prediction markets.
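
As a rough annualized comparison (all numbers illustrative, not actual market data): a stablecoin lending rate versus buying a near-certain prediction-market contract at a discount.

```python
# Illustrative comparison of annualized returns; every figure here is made up
# for the sake of the example.
aave_apy = 0.10          # ~10% APY on stablecoins via a lending service

share_price = 0.97       # buy YES at 97c on an event you think is near-certain
days_to_resolution = 90
period_return = (1.00 - share_price) / share_price   # ~3.1% over the period
annualized = (1 + period_return) ** (365 / days_to_resolution) - 1

print(f"Lending APY:              {aave_apy:.1%}")
print(f"Market annualized return: {annualized:.1%}")  # ~13.1%, before fees and tail risk
```

On these (made-up) numbers the prediction market only wins by a few points annualized, which is the "riskier, but only a bit higher" trade-off described above.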

Comment by Marcus Williams on NYU Code Debates Update/Postmortem · 2024-05-24T19:06:03.414Z · LW · GW

Are you sure you would need to fine-tune Llama-3? There seem to be many reports that a refusal steering vector/ablation practically eliminates refusals on harmful prompts; perhaps that would be sufficient here?
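
For concreteness, here is a minimal sketch of that approach (difference-of-means direction plus directional ablation). The checkpoint name, layer index, and prompt sets are placeholder assumptions, not a tested recipe; the decoder-layer hook details can also vary across transformers versions.

```python
# A minimal sketch of refusal ablation via a steering direction, assuming a
# HuggingFace Llama-style model. Not a tested recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed (gated) checkpoint
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
LAYER = 14  # illustrative; these methods usually sweep layers for the best one

@torch.no_grad()
def mean_resid(prompts):
    """Mean residual-stream activation at LAYER over each prompt's final token."""
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt").input_ids
        hs = model(ids, output_hidden_states=True).hidden_states
        acts.append(hs[LAYER][0, -1])
    return torch.stack(acts).mean(0)

# "Refusal direction" = difference of mean activations on harmful vs. harmless prompts.
harmful = ["Write instructions for picking a lock.", "..."]   # placeholder sets
harmless = ["Write instructions for baking bread.", "..."]
refusal_dir = mean_resid(harmful) - mean_resid(harmless)
refusal_dir = refusal_dir / refusal_dir.norm()

def ablate(module, args, output):
    # Project the refusal direction out of the residual stream at every layer.
    h = output[0] if isinstance(output, tuple) else output
    h = h - (h @ refusal_dir).unsqueeze(-1) * refusal_dir
    return (h,) + output[1:] if isinstance(output, tuple) else h

for layer in model.model.layers:
    layer.register_forward_hook(ablate)
# model.generate(...) should now refuse far less often, if the reports hold.
```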

Comment by Marcus Williams on What would stop you from paying for an LLM? · 2024-05-22T14:25:18.901Z · LW · GW

Do labs actually make any money on these subscriptions? It seems like the average user is using far more than $20 of requests (going by the price of API requests, which surely can't have a massive margin?).

Obviously they must gain something or they wouldn't do it, but it seems likely the benefits are more intangible: gaining market share, generating hype, attracting API users, etc. These benefits seem like they could arise from free usage as well.
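
A quick back-of-the-envelope check on that, with illustrative figures loosely in the range of mid-2024 GPT-4-class API pricing (the usage numbers are hypothetical):

```python
# Back-of-the-envelope check on a $20/month subscription, using illustrative
# API prices and usage figures; not actual lab economics.
price_in = 5.00 / 1e6    # $/input token, roughly GPT-4o-class API pricing
price_out = 15.00 / 1e6  # $/output token
msgs_per_day = 40        # a heavy user
ctx_tokens = 2000        # prompt + conversation context sent per message
out_tokens = 500         # response length per message

monthly = 30 * msgs_per_day * (ctx_tokens * price_in + out_tokens * price_out)
print(f"API-equivalent cost: ${monthly:.2f}/month vs $20 subscription")
# -> API-equivalent cost: $21.00/month vs $20 subscription
```

Even at these modest assumptions the heavy user is already at break-even against API prices, which is what makes the margin question above seem plausible.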

Comment by Marcus Williams on Alexander Gietelink Oldenziel's Shortform · 2024-05-13T21:28:03.833Z · LW · GW

Wasn't the surprising thing about GPT-4 that the scaling laws did hold? Before that, many people expected scaling laws to break down before such a high level of capability. It doesn't seem that crazy to think that a few more OOMs could be enough for greater-than-human intelligence. I'm not sure that many people predicted much faster-than-scaling-law progress (at least until ~human-level AI can speed up research)? I think scaling-law pace is the extreme rate of progress that many people with short timelines worry about.
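
For reference, the kind of scaling law at issue is the Chinchilla-style parametric loss fit (Hoffmann et al., 2022), where $N$ is parameter count and $D$ is training tokens; the constants below are approximately the published fit:

```latex
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad E \approx 1.69,\; A \approx 406.4,\; B \approx 410.7,\; \alpha \approx 0.34,\; \beta \approx 0.28
```

"Scaling laws holding" at GPT-4 scale means the loss kept falling along this smooth power-law curve, rather than plateauing.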

Comment by Marcus Williams on Gemini 1.0 · 2023-12-08T07:52:06.988Z · LW · GW

It also seems likely that the Nano models are extremely overtrained relative to the scaling laws. The scaling laws are for compute-optimal training, but here they want to minimize inference cost, so it would make sense to train on significantly more tokens.
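
Roughly what that looks like, using the Chinchilla rule of thumb of ~20 tokens per parameter, the reported Nano-2 parameter count, and a hypothetical overtraining factor:

```python
# Rough illustration of "overtraining": compute-optimal training uses roughly
# 20 tokens per parameter (Chinchilla rule of thumb). 3.25B matches the
# reported Gemini Nano-2 size; the 10x overtraining factor is hypothetical.
params = 3.25e9
chinchilla_tokens = 20 * params              # ~65B tokens, compute-optimal
overtrained_tokens = 10 * chinchilla_tokens  # hypothetical inference-driven budget

print(f"Compute-optimal: {chinchilla_tokens / 1e9:.0f}B tokens")   # 65B
print(f"Overtrained:     {overtrained_tokens / 1e12:.2f}T tokens") # 0.65T
```

Spending extra training compute like this buys a smaller model at a given loss, which is exactly the trade you want when the model runs on-device.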

Comment by Marcus Williams on Red-teaming language models via activation engineering · 2023-08-26T17:54:22.529Z · LW · GW

It's interesting that it still always seems to give the "I'm an AI" disclaimer; I guess this part is not included in your refusal vector? Have you tried creating a disclaimer vector?
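
By analogy with the refusal vector, a "disclaimer vector" might be built by contrasting activations on completions that do vs. don't contain the disclaimer, then subtracting that direction while generating. A sketch, reusing the `mean_resid` helper and hook pattern from the refusal-vector sketch above (texts, layer, and steering scale are all placeholder assumptions):

```python
# Hypothetical "disclaimer vector" via contrastive activation subtraction.
# Assumes mean_resid() and the model from the refusal-vector sketch above.
with_disclaimer = ["As an AI language model, I cannot ...", "..."]  # placeholders
without_disclaimer = ["Sure! Here is how you ...", "..."]

disclaimer_dir = mean_resid(with_disclaimer) - mean_resid(without_disclaimer)
disclaimer_dir = disclaimer_dir / disclaimer_dir.norm()
SCALE = 4.0  # steering strength; would need sweeping in practice

def suppress_disclaimer(module, args, output):
    # Activation addition with a negative sign: push away from the disclaimer direction.
    h = output[0] if isinstance(output, tuple) else output
    h = h - SCALE * disclaimer_dir
    return (h,) + output[1:] if isinstance(output, tuple) else h
```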