Posts

hmys's Shortform 2024-12-08T21:37:32.955Z
Practical advice for secure virtual communication post easy AI voice-cloning? 2024-08-09T17:32:33.458Z
Plausibility of Getting Early Warning Shots because AIs can't coordinate? 2024-04-27T08:02:10.792Z

Comments

Comment by hmys (the-cactus) on hmys's Shortform · 2024-12-08T21:37:33.118Z · LW · GW

I think for the fundraiser, Lightcone should sell (overpriced) LW hoodies. LessWrong has a very nice aesthetic now, and while this is probably a byproduct of a part of my mind I shouldn't encourage, I find it quite appealing to buy a $450 LW hoodie, even though I don't have that much money. I'd probably not donate to the fundraiser otherwise, and if I did, I'd donate less than the margin on such a hoodie would be.

Comment by hmys (the-cactus) on Reducing x-risk might be actively harmful · 2024-11-20T22:57:05.975Z · LW · GW

People seem to disagree with this comment. There are two statements and one argument in it:

  1. Humanity's current existence and its history are net negatives.
  2. The future, assuming humans survive, will have massive positive utility.
    1. The argument for why this is the case, based on something something optimization.

What are people disagreeing with? Is it mostly the former? I think the latter is rather clear; I'm very confident it is true, both the argument and the conclusion. The former I'm quite confident is true as well (~90%?), but only for my set of values.

Comment by hmys (the-cactus) on Trying Bluesky · 2024-11-19T20:00:17.619Z · LW · GW

https://bsky.app/profile/hmys.bsky.social/post/3lbd7wacakn25

I made one. A lot of people are not there, but many are.

Comment by hmys (the-cactus) on Reducing x-risk might be actively harmful · 2024-11-18T18:25:50.211Z · LW · GW

Seems unlikely to me. I mean, I think, in large part due to factory farming, that humanity's current immediate existence, and also its history, are net negatives. The reason I'm not a full-blown antinatalist is that these issues are likely to be remedied in the future, and the goodness of the future will astronomically dwarf the negativity humanity has brought and is bringing about (assuming we survive and realize a non-negligible fraction of our cosmic endowment).

The reason I think this is that, the way I view it, it's an immediate corollary of the standard Yudkowsky/Bostrom AI arguments. Animals existing and suffering is an extremely specific state of affairs, just like humans existing and being happy is an extremely specific state of affairs. This means that if you optimize hard enough for anything that isn't exactly that (happy humans, or suffering animals), you're not going to get it.

And maybe this is me being too optimistic (but I really hope not, and I really don't think so), but I don't think many humans want animals to suffer for its own sake. They'd eat lab-grown meat if it were cheaper and better-tasting than animal-grown meat. Lab-grown meat is a good example of the general principle I'm talking about. Suffering in sentient minds is a complex thing. If you have a powerful optimizer going about its way optimizing the universe, you're virtually never going to get suffering sentient minds unless that is what the optimizer is deliberately aiming for.

Comment by hmys (the-cactus) on o1 is a bad idea · 2024-11-12T08:06:54.200Z · LW · GW

I agree with this analysis. I mean, I'm not certain further optimization will erode the interpretability of the generated CoT; it's possible that the fact it's pretrained to use human natural language pushes it into a stable equilibrium. But I don't think so: there are ways the CoT can become less interpretable in a step-wise fashion.

But this is the way it's going, and it seems inevitable to me. Just scaling up models and training them on English-language internet text is clearly less efficient (from a "build AGI" perspective, and from a profit perspective) than training them to do the specific tasks that the users of the technology want. So that's the way it's going.

And once you're training the models this way, the tether between human-understandable concepts and the CoT will be completely destroyed. If they stay together, it will just be because it happens to be a stable initial condition.

Comment by hmys (the-cactus) on Human Biodiversity (Part 4: Astral Codex Ten) · 2024-11-03T19:02:15.781Z · LW · GW

I just meant not primarily motivated by truth.

Comment by hmys (the-cactus) on Human Biodiversity (Part 4: Astral Codex Ten) · 2024-11-03T13:30:54.263Z · LW · GW

I think this is a really bad article. So bad that I can't see it not being written with ulterior motives.

1. Too many things are taken out of context, like the "feminists are literally Voldemort" quote.

2. Too many things are paraphrased in dishonest and ridiculously over-the-top ways, like saying Harris has "longstanding plans to sterilize people of color" before a quote that just says she wants to give birth control to people in Haiti.

3. Offering negative-infinity charity in every single area. In the HBD email, Scott says he thinks neoreactionaries create endless streams of garbage, but with some tiny nuggets of gold, and that he can take the nuggets of gold and just tune out the rest. The article then goes on to list everything bad about neoreactionaries, as if Scott's email is evidence he endorses all of neoreaction? What?

4. Overall, no clear direct argument. The article spends half its words justifying the connection between Scott and EA, which I don't think anyone would deny. Then it puts up the email and instantly infers the worst possible intent behind it with little justification. Then it lists every single racist person Scott has ever said anything even lightly good about.

Overall, the article updates me in the direction of thinking Scott is less racist and less sympathetic to neoreactionary thinking. The author has clearly put in effort and is clearly trying their very best to paint Scott in a bad light, and Scott has literally 20 years of constant blogging put out openly on the internet. But the article is not very convincing.

Comment by hmys (the-cactus) on BIG-Bench Canary Contamination in GPT-4 · 2024-10-23T12:52:44.712Z · LW · GW

But the probability? :O

Comment by hmys (the-cactus) on BIG-Bench Canary Contamination in GPT-4 · 2024-10-23T11:01:50.881Z · LW · GW

What is the probability that they intentionally fine-tuned to hide canary contamination?

Seems like an obviously very silly thing to do. But with things like the NDA, my prior on OpenAI being deceptive to its own detriment is not that low.

I'm pretty sure it wouldn't forget the string.

Comment by hmys (the-cactus) on Bitter lessons about lucid dreaming · 2024-10-17T13:51:13.599Z · LW · GW

In my experience, the results come quite quickly, and it's interesting to remember your dreams. The time it takes is ~10 minutes a day.

I'm not gonna say it doesn't take any effort. It can be hard to do it if you are tired in the morning, but I disagree with the characterization that it takes "a lot" of effort.

Outside of studying/work, I exercise every day, do Anki cards every day, and try to make a reasonably healthy dinner every day. Each of those activities individually takes ~10x the cognitive effort and willpower that dream journaling does (for me).

Comment by hmys (the-cactus) on Bitter lessons about lucid dreaming · 2024-10-17T08:11:35.663Z · LW · GW

Maybe I'm a unique example, but none of this matches my experience at all.

I was able to have lucid dreams relatively consistently just by dream journaling and doing reality checks. WILD was quite difficult to do, because you have to walk a fine line, keeping yourself in a half-asleep state while carrying out instructions that require a fair bit of metacognitive awareness. But once you get the hang of it, you can do that pretty consistently as well, without much time commitment.

That lucid dreams don't offer much more than traditional entertainment also seems (obviously?) false to me. People use VR to make traditional entertainment more immersive, and LDs are far more immersive than that, and less limited than video games are.

They're also just a really interesting psychological phenomenon. The process is fun. If you find yourself in a lucid dream, it's a strange situation. Testing things out, like checking how well your internal physics simulation engine works, is really fun. Or just walking around and seeing what your subconscious generates is very fun, and very different from just imagining random stuff. Trying to meditate, and observing how your mind works differently in a dream compared with waking reality, is interesting. Seeing how extreme/vivid a sensation you can generate in a dream is fun, like trying to see if you can get yourself to feel pain, or how loud a sound you can make.

Galantamine and various supplements all did nothing for me. 

The only thing I agree with is the habituation effect. But like, that's how many things work: you eventually get bored of stuff / feel you've exhausted all the low-hanging fruit.

Comment by hmys (the-cactus) on Bitter lessons about lucid dreaming · 2024-10-17T07:50:20.607Z · LW · GW

Can't you just keep a dream journal? I find if I do that consistently right upon waking up, I'm able to remember dreams quite well.

Comment by hmys (the-cactus) on My 10-year retrospective on trying SSRIs · 2024-09-23T06:57:50.847Z · LW · GW

I've used SSRIs for maybe 5 years, and I think they've been really useful, with more or less unwavering efficacy and essentially no negative effects. The only exception is that they've non-negligibly lowered my libido, but to be honest, I don't mind it that much.

Also, the few times I've had to go without them for a while (travelling and being very stupid about not bringing enough), the withdrawal effects were quite strange and somewhat scary.

I also feel they had some very strange positive effects. Like, I think they made my reaction time improve by quite a bit, although it could be something random coinciding with starting SSRIs, or just me being confused; I haven't tested it. On humanbenchmark I score around the same now as I did in high school, but I feel like I can catch falling things with much better regularity, and this was an almost immediate effect after starting.

Comment by hmys (the-cactus) on A Longlist of Theories of Impact for Interpretability · 2024-05-06T21:03:43.549Z · LW · GW

I feel like the biggest issue with aligning powerful AI systems is that nearly all the features we'd like these systems to have, like being corrigible, not being deceptive, having values aligned with ours, etc., are properties we are currently unable to state formally. They are clearly real properties: humans can agree on examples of non-corrigibility, misalignment, and dishonesty when shown examples of actions AIs could take. But we can't put them in code or a program specification, and consequently can't reason about them very precisely, test whether systems have them or not, etc.

One reason I'm very bullish on mechinterp is that it seems like the only natural pathway towards making progress on this. Transformers trained with RLHF do have "tendencies" and proto-values in a sense. Figuring out how those proto-desires are represented internally, really understanding it, will, I believe, shed a lot of light on how values form in transformers, will necessarily entail getting a solid formal framework for reasoning about these processes, and will put the notions of alignment on much firmer ground. The same goes for the other features. Models already show deceptive tendencies. In the process of developing a deep mechinterp understanding of that, I believe we'd gain a better understanding of how deception in a neural net can be modeled formally, which would allow us to reason about it infinitely better.
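
As a toy illustration of the kind of experiment I have in mind (everything here is made up: the "activations" are synthetic, and the layer width and labels are placeholders), the simplest version is a linear probe trained to test whether a deception-like property is linearly represented in a model's hidden states:

```python
# Minimal sketch, not a real experiment: assumes you already have hidden
# activations for prompts labeled deceptive vs. honest. Synthetic data stands
# in for them here, with a "deception" feature injected along one direction.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 512                                   # hypothetical residual-stream width
n_samples = 2000

true_direction = rng.normal(size=d_model)       # pretend feature direction
labels = rng.integers(0, 2, size=n_samples)     # 1 = "deceptive"
activations = rng.normal(size=(n_samples, d_model))
activations += np.outer(labels, true_direction)  # inject the feature

# A linear probe: if it cleanly separates the classes, the property is
# (at least roughly) linearly represented in these activations.
probe = LogisticRegression(max_iter=1000).fit(activations, labels)
print("probe accuracy:", probe.score(activations, labels))
```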

(I mean, someone with a 300 IQ might come along and just galaxy-brain all this from first principles, but quite galaxy-brained people have tried already. The point is that if mechinterp were developed to a sophisticated enough level, then in addition to all the good things listed already, it would shed a lot of conceptual clarity on many of the key notions, which we are currently stuck reasoning about on an informal level. And I think we will get there through incremental progress, without having to hope someone just figures it out by thinking really hard and having an Einstein-tier insight.)

Comment by hmys (the-cactus) on ACX Covid Origins Post convinced readers · 2024-05-03T11:49:55.916Z · LW · GW

https://www.richardhanania.com/p/if-scott-alexander-told-me-to-jump

Comment by hmys (the-cactus) on "Deep Learning" Is Function Approximation · 2024-03-23T17:22:37.219Z · LW · GW

Other people were commending your tabooing of words, but I feel using terms like "multi-layer parameterized graphical function approximator" fails to do that, and makes matters worse because it leads to non-central-fallacy-ing. It would have been more appropriate to use a term like "magic" or "blipblop". Calling something a function approximator leads readers to carry a lot of associations into their interpretation that probably don't apply to deep learning, since deep learning is a very specific example of function approximation that deviates from the prototypical examples in many respects. (I think when you say "function approximator", the image that pops into most people's heads is fitting a polynomial to a set of datapoints in R^2.)
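
For concreteness, here is a minimal sketch of that prototypical picture: fitting a low-degree polynomial to some made-up noisy points in R^2 with numpy.

```python
# Toy example of the "prototypical" function approximator: least-squares
# polynomial fitting on fabricated 2D data.
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(-1.0, 1.0, 20)
ys = np.sin(3 * xs) + 0.1 * rng.normal(size=xs.shape)  # noisy samples of a target function

coeffs = np.polyfit(xs, ys, deg=5)   # fit a degree-5 polynomial by least squares
approx = np.poly1d(coeffs)           # callable polynomial approximation

print(approx(0.5))                   # evaluate the fitted approximation at a new point
```

This is the sense of "function approximation" most readers default to, which is exactly why the term smuggles in associations that don't carry over to deep learning.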

Calling something a function approximator is only meaningful if you make a strong argument for why a function approximator can't (or at least is systematically unlikely to) give rise to specific dangerous behaviors or capabilities. But I don't see you giving such arguments in this post; maybe I did not understand it. In either case, you can read posts like Gwern's "Tools want to be agents" or Yudkowsky's writings explaining why goal-directed behavior is a reasonable thing to expect to arise from current ML, and you can replace every instance of "neural network" / "AI" with "multi-layer parameterized graphical function approximator", and I think you'll find that all the arguments make just as much sense as they did before (modulo some associations seeming strange, but like I said, I think that's because there is some non-central-fallacy-ing going on).