Posts

Poll on AI opinions. 2025-02-23T22:39:09.027Z
Poll Results on AGI 2022-12-10T21:25:35.004Z
LessWrong Poll on AGI 2022-11-10T13:13:57.387Z
Noisy environment regulate utility maximizers 2022-06-05T18:48:43.083Z

Comments

Comment by Niclas Kupper (niclas-kupper) on The Bell Curve of Bad Behavior · 2025-04-15T12:04:09.602Z · LW · GW

I think the dojo analogy is very good and useful. Some unstructured thoughts: It gets at a core feature of humans: being able to adjust our personalities based on context. I suspect there is a semi-stable-equilibrium thing here that is important. This is a big reason people underestimate company/community culture: it can give some amount of herd immunity to bad behavior, but if sufficiently many people "defect", the culture changes. This also becomes an issue as communities grow, of course: policing is harder and nuances of behavior get lost.

Comment by Niclas Kupper (niclas-kupper) on One-shot steering vectors cause emergent misalignment, too · 2025-04-14T08:44:49.693Z · LW · GW

Great work! I don't know a lot about steering vectors and have some questions. I am also happy for you to just send me to a different resource.

1) From my understanding, steering vectors are just vectors that you add to the activations (roughly as in the sketch after these questions). At what layer do you do this?

2) You write "we optimized four different “harmful code” vectors". How did you "combine" the resulting four different vectors?

3) I would also be interested in how similar these vectors are to each other, and how similar they are to the refusal vector.
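
To make sure I understand the setup, here is a minimal sketch of what I mean by "adding a steering vector to the activations" at one layer, using a plain PyTorch forward hook. The module path `model.transformer.h[k]`, the vector `v`, and the scale `alpha` are placeholders of mine, not your actual code:

```python
# Minimal sketch, assuming a decoder-only model that exposes its blocks as
# model.transformer.h (hypothetical path); not the authors' actual setup.
import torch

def make_steering_hook(v: torch.Tensor, alpha: float = 1.0):
    def hook(module, inputs, output):
        # Some blocks return a tuple (hidden_states, ...); handle both cases.
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * v  # add the vector at every token position
        return (steered,) + output[1:] if isinstance(output, tuple) else steered
    return hook

# handle = model.transformer.h[k].register_forward_hook(make_steering_hook(v))
# ... run generation ...
# handle.remove()
```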

Comment by Niclas Kupper (niclas-kupper) on Poll on AI opinions. · 2025-03-03T09:09:46.864Z · LW · GW

That is fair; I should probably have left some seed statements regarding the definition of AGI / ASI.
EDIT: I have added additional statements.

Comment by Niclas Kupper (niclas-kupper) on AI #99: Farewell to Biden · 2025-01-17T09:16:51.639Z · LW · GW

I just want to say that Amazon is fairly close to a universal recommendation app!

Comment by Niclas Kupper (niclas-kupper) on Transformers Represent Belief State Geometry in their Residual Stream · 2024-04-28T08:24:08.390Z · LW · GW

Where can I read about this 2-state HMM? By "learn" I just mean approximate via an algorithm. The UAT is not sufficient, as it is about approximating a known function rather than learning one. Baum-Welch is such an algorithm, but as far as I am aware it gives no guarantees on anything, really.
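
For concreteness, here is a minimal sketch of the kind of thing I mean by a 2-state HMM; the transition matrix, emission matrix, and initial distribution below are placeholder values, not the specific process from the post:

```python
# Minimal sketch: sample an observation sequence from a generic 2-state HMM.
# All probabilities here are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # state-transition probabilities
B = np.array([[0.7, 0.3],
              [0.1, 0.9]])   # emission probabilities over 2 symbols
pi = np.array([0.5, 0.5])    # initial state distribution

def sample(T: int):
    states, obs = [], []
    s = rng.choice(2, p=pi)
    for _ in range(T):
        obs.append(rng.choice(2, p=B[s]))
        states.append(s)
        s = rng.choice(2, p=A[s])
    return states, obs
```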

Comment by Niclas Kupper (niclas-kupper) on Transformers Represent Belief State Geometry in their Residual Stream · 2024-04-24T12:09:26.869Z · LW · GW

Is there some theoretical result along the lines of "A sufficiently large transformer can learn any HMM"?

Comment by Niclas Kupper (niclas-kupper) on Examples of Highly Counterfactual Discoveries? · 2024-04-24T11:19:19.241Z · LW · GW

It would be interesting for people to post current research that they think has some small chance of producing highly singular results!

Comment by Niclas Kupper (niclas-kupper) on Examples of Highly Counterfactual Discoveries? · 2024-04-24T11:18:02.527Z · LW · GW

Grothendieck seems to have been an extremely singular researcher; several of his discoveries would likely have been significantly delayed without him. His work on sheaves is mind-bending the first time you see it and was seemingly ahead of its time.

Comment by Niclas Kupper (niclas-kupper) on Feedbackloop-first Rationality · 2023-08-09T12:18:24.580Z · LW · GW

As someone who is currently getting a PhD in mathematics, I wish I could use Lean. The main problem for me is that the area I work in hasn't been formalized in Lean yet. I tried for like a week but didn't get very far... I only managed to implement the definition of a Poisson point process (kinda). I concluded that it wasn't worth spending my time to create this feedback loop and that I'd rather work based on vibes.
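
For context, the definition I was trying to formalize is roughly the standard one (the notation below is illustrative, not the Lean formalization itself):

```latex
% Standard definition of a Poisson point process; notation is illustrative.
Let $(E, \mathcal{E})$ be a measurable space and $\mu$ a $\sigma$-finite measure on it.
A random counting measure $N$ on $(E, \mathcal{E})$ is a \emph{Poisson point process}
with intensity measure $\mu$ if
\begin{enumerate}
  \item $N(A) \sim \mathrm{Poisson}(\mu(A))$ for every $A \in \mathcal{E}$ with $\mu(A) < \infty$, and
  \item $N(A_1), \dots, N(A_n)$ are independent whenever $A_1, \dots, A_n \in \mathcal{E}$
        are pairwise disjoint.
\end{enumerate}
```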

I am jealous of the next generation of mathematicians, who will be forced to write everything down using formal verification. They will be better than the current generation.

Comment by Niclas Kupper (niclas-kupper) on The salt in pasta water fallacy · 2023-03-28T19:59:14.777Z · LW · GW

I would call this "not thinking on the margins".

Comment by Niclas Kupper (niclas-kupper) on LessWrong Poll on AGI · 2022-11-11T14:56:18.231Z · LW · GW

Some early results: 

  • Most people disagreed with the following two statements: "I think the probability of AGI before 2030 is above 50%" and "AI-to-human safety is fundamentally the same kind of problem as any interspecies animal-to-animal cooperation problem".
  • Most people agreed with the statements: "Brain-computer interfaces (e.g. neuralink tech) that is strong and safe enough to be disruptive will not be developed before AGI." and "Corporate or academic labs are likely to build AGI before any state actor does."
  • There seem to be two large groups whose main disagreement is about the statement "I think the probability of AGI before 2040 is above 50%". We will call people agreeing Group A and people disagreeing Group B.
  • Group A agreed with "By 2035 it will be possible to train and run an AGI on fewer compute resources than required by PaLM today (if society survives that long)." and "I think establishing a norm of safety testing new SotA models in secure sandboxes is a near-term priority."
  • Group B agreed with "I think the chance of an AI takeover is less than 15%".
  • The most uncertainty was around the following two statements: "The 'Long Reflection' seems like a good idea, and I hope humanity manages to achieve that state." and "TurnTrout's 'training story' about a diamond-maximizer seemed fatally flawed, prone to catastrophic failure."