Posts

Death with Awesomeness · 2024-04-01T20:24:21.697Z

Comments

Comment by osmarks on What happens if you present 500 people with an argument that AI is risky? · 2024-09-07T08:23:09.400Z · LW · GW

There was some work I read about here years ago (https://www.lesswrong.com/posts/Zvu6ZP47dMLHXMiG3/optimized-propaganda-with-bayesian-networks-comment-on) on causal graph models of beliefs. Perhaps you could try something like that.

Comment by osmarks on Death with Awesomeness · 2024-06-17T08:46:07.224Z · LW · GW

I think we also need to teach AI researchers UI and graphic design. Most of the field's software prints boring things to the console, or at most has a slow and annoying web dashboard with a few graphs. The machine which kills us all should instead have a cool sci-fi interface with nice tabulation, colors, rectangles, ominous targeting reticles, and cryptic text in the corners.

Comment by osmarks on Refusal in LLMs is mediated by a single direction · 2024-05-06T12:06:58.012Z · LW · GW

I think the correct solution to models powerful enough to materially help with, say, bioweapon design is to not train them or, failing that, to destroy them as soon as you find they can do that, rather than to release them publicly with some mitigations and hope nobody works out a clever jailbreak.

Comment by osmarks on cyberpunk raccoons · 2023-04-29T20:39:42.558Z · LW · GW

As you say, you probably don't need it, but for output I'm pretty sure electromyography technology is fairly mature.

Comment by osmarks on Is "Recursive Self-Improvement" Relevant in the Deep Learning Paradigm? · 2023-04-06T10:05:27.829Z · LW · GW

A misaligned model might not want to do that, though, since it would be difficult for it to ensure that the output of the new training process is aligned to its goals.