Posts

Hopenope's Shortform 2024-12-22T10:52:39.610Z

Comments

Comment by Hopenope (baha-z) on Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs · 2025-02-26T20:46:26.709Z · LW · GW

Overrefusal issues were way more common 1-2 years ago; models like Gemini 1 and Claude 1-2 had severe overrefusal issues.

Comment by Hopenope (baha-z) on LWLW's Shortform · 2025-02-23T15:37:27.852Z · LW · GW

Your argument is possible, but what evidence do you have that makes it the likely outcome?

Comment by Hopenope (baha-z) on LWLW's Shortform · 2025-02-23T11:43:41.223Z · LW · GW

The difficulty of alignment is still unknown. It may be totally impossible, or maybe some changes to current methods (deliberative alignment or constitutional AI) plus some R&D automation can get us there.

Comment by Hopenope (baha-z) on Daniel Tan's Shortform · 2025-02-10T22:47:21.145Z · LW · GW

The recurrent paper is actually scary, but some of the results there are questionable. Are 8 layers enough for a 3.5B model? Qwen 0.5B has 24 layers. There is also almost no difference between the 180b and 800b models when r=1 (Table 4). Is this just a case of overcoming an insufficient number of layers?

Comment by Hopenope (baha-z) on nikola's Shortform · 2025-02-08T17:14:05.482Z · LW · GW

Would you update your timelines if he is telling the truth?

Comment by Hopenope (baha-z) on Hopenope's Shortform · 2025-01-22T10:34:27.828Z · LW · GW

Is CoT faithfulness already obsolete? How does it survive concepts like latent-space reasoning or RL-based manipulations (R1-Zero)? Is it realistic to think that these highly competitive companies will simply not use them and ignore the compute efficiency?

Comment by Hopenope (baha-z) on Hopenope's Shortform · 2025-01-09T19:29:50.065Z · LW · GW

I am not sure that longer timelines are always safer. For example, when comparing a two-year timeline to a five-year one, the shorter timeline has a lot of advantages. In both cases you need to outsource a lot of alignment research to AI anyway, and on the shorter timeline the amount of compute and the number of players with significant compute are lower, which reduces both the racing pressure and the takeoff speed.

Comment by Hopenope (baha-z) on Hopenope's Shortform · 2025-01-07T17:31:33.776Z · LW · GW

What happened to the Waluigi effect? It used to be a big topic, some people disputed it, and suddenly it is pretty much forgotten. Is there any related research, or are there recent demos, that examine it in more detail?

Comment by Hopenope (baha-z) on Hopenope's Shortform · 2024-12-28T20:20:24.366Z · LW · GW

If you have a very short timeline, and you don't think alignment is solvable in such a short time, what can you still do to reduce the chance of x-risk?

Comment by Hopenope (baha-z) on Hopenope's Shortform · 2024-12-22T10:52:39.708Z · LW · GW

Many expert-level benchmarks overestimate the range and diversity of their experts' knowledge. A person with a PhD in physics is probably at an undergraduate level in many parts of physics unrelated to their research area, and sometimes we even see this within an expert's own domain (neurologists usually forget about nerves that are not clinically relevant).