ruby

I think it's good for the soul to study, learn, grow, and the time current society gives you to do it at university is pretty great if you make use of it, but also it's possible to do that outside of uni. This is putting aside value for careers, because indeed, with AI it's hard to say.

But being 19 (or whatever age really), the frame I'd give is to think about where you'll develop most. From a practical standpoint, I'd spend a lot of time trying to do valuable things together with AI. Eventually AI won't need us, but in the meantime symbiosis seems like a reasonable guess as to how to still generate economic value.

Comment by Ruby on A Slow Guide to Confronting Doom · 2025-04-07T03:29:33.613Z · LW · GW

For me, S2 explicitly I can't justify being quite that confident, maybe 90-95%, but emotionally 9:1 odds feels very much like "that's what's happening".

Comment by Ruby on A Slow Guide to Confronting Doom · 2025-04-06T22:56:06.209Z · LW · GW

I'm just wondering if we were ever sufficiently positively justified to anticipate a good future, or if we were just uncertain about the future and then projected our hopes and dreams onto this uncertainty, regardless of how realistic that was.

I think that's a very reasonable question to be asking. My answer is I think it was justified, but not obvious.

My understanding is it wasn't taken for granted that we had a way to get more progress with simply more compute until the deep learning revolution, and even then people updated on specific additional data points for transformers, and even then people sometimes say "we've hit a wall!"

Maybe with more time we'd have time for the US system to collapse and be replaced with something fresh and equal to the challenges. To the extent the US was founded and set in motion by a small group of capable motivated people, it seems not crazy to think a small to large group of such people could enact effective plans within a few decades.

Comment by Ruby on A Slow Guide to Confronting Doom · 2025-04-06T22:02:41.065Z · LW · GW

So gotta keep in mind that probabilities are in your head (I flip a coin, it's already tails or heads in reality, but your credence should still be 50-50). I think it can be the case that we were always doomed even if we weren't yet justified in believing that.

Alternatively, it feels like this pushes up against philosophies of determinism and free will. The whole "well the algorithm is a written program and it'll choose what it chooses deterministically" but also from the inside there are choices.

I think a reason to have been uncertain before and update more now is just that timelines seem short. I used to have more hope because I thought we had a lot more time to solve both technical and coordination problems, and then there was the DL/transformers surprise. You make a good case and maybe 50 years more wouldn't make a difference, but I don't know, I wouldn't have as high p-doom if we had that long.

Comment by Ruby on A Slow Guide to Confronting Doom · 2025-04-06T10:59:20.803Z · LW · GW

But since the number is subjective living your life like you know you are right is certainly wrong

I don't think this makes sense. Suppose you have a subjective belief, at 90%, that a vial of tasty fluid is lethal poison; you're going to act in accordance with that belief. Now if other people think differently from you, and you think they might be right, maybe you adjust your final subjective probability to something else, but at the end of the day it's yours. That it's subjective doesn't rule out it being pretty extreme.

If what you mean is you can't be that confident given disagreement, I dunno, I wish I could have that much faith in people.
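The vial example can be made concrete with a small expected-utility sketch. The utilities below (+1 for a tasty drink, -1000 for dying, 0 for abstaining) are hypothetical numbers chosen only for illustration; the point is just that the decision is driven by the subjective probability, whatever its source.

```python
# Hypothetical utilities, chosen only for illustration.
U_TASTY = 1.0      # enjoying a harmless drink
U_DEATH = -1000.0  # drinking lethal poison
U_ABSTAIN = 0.0    # leaving the vial alone


def expected_utility_of_drinking(p_poison: float) -> float:
    """Expected utility of drinking given a subjective probability of poison."""
    return p_poison * U_DEATH + (1.0 - p_poison) * U_TASTY


def should_drink(p_poison: float) -> bool:
    """Drink only if doing so beats abstaining in expectation."""
    return expected_utility_of_drinking(p_poison) > U_ABSTAIN


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.01, 0.0005):
        print(f"p(poison)={p}: drink? {should_drink(p)}")
```

Under these assumed utilities the break-even credence is 1/1001, so adjusting a 90% belief somewhat toward other people's views would not change the action unless the adjustment were enormous.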

Comment by Ruby on You will crash your car in front of my house within the next week · 2025-04-02T01:55:56.968Z · LW · GW

Was a true trender-bender

Comment by Ruby on You will crash your car in front of my house within the next week · 2025-04-02T01:31:13.496Z · LW · GW

Frick. Happened to me already.

Comment by Ruby on Raemon's Shortform · 2025-03-23T03:07:38.279Z · LW · GW

"Serendipity" is a term I've seen used for this, possibly by Venkatesh Rao.

Comment by Ruby on Eliezer's Lost Alignment Articles / The Arbital Sequence · 2025-03-14T18:02:56.473Z · LW · GW

Curated. The wiki pages collected here, despite being written in 2015-2017, remain excellent resources on concepts and arguments for key AI alignment ideas (both still widely used and those lesser known). I found that even for concepts/arguments like the orthogonality thesis and corrigibility, I felt a gain in crispness from reading these pages. The concept of, e.g., epistemic and instrumental efficiency is one I didn't have, yet it feels useful in thinking about the rise of increasingly powerful AI.

Of course, there's also non-AI content that got imported. The Bayes guide likely remains the best resource for building Bayes intuition, and the same goes for the guide on logarithms, which is extremely thorough.

Comment by Ruby on It's been ten years. I propose HPMOR Anniversary Parties. · 2025-03-05T02:23:49.214Z · LW · GW

I think the guide should be 10x more prominent in this post.

Comment by Ruby on Arbital has been imported to LessWrong · 2025-02-24T16:51:47.195Z · LW · GW

You should see the option when you click on the triple dot menu (next to the Like button).

Comment by Ruby on Arbital has been imported to LessWrong · 2025-02-22T20:25:09.799Z · LW · GW

So the nice thing about karma is that if someone thinks a wikitag is worthy of attention for any reason (article, tagged posts, importance of concept), they're able to upvote it and make it appear higher.

Much of the current karma comes from Ben Pace and me, who did a pass. Rationality Quotes didn't strike me as a page I particularly wanted to boost up the list, but if you disagree with me you're able to Like it.

In general, I don't think having a lot of tagged posts should mean a wikitag should be ranked highly. It's a consideration, but I like it flowing via people's judgments about whether or not to upvote it.

The categorization is an interesting question. Indeed currently only admins can do it and that perhaps requires more thought.

Comment by Ruby on Eliezer's Lost Alignment Articles / The Arbital Sequence · 2025-02-20T21:13:42.131Z · LW · GW

Interesting. Doesn't replicate for me. What phone are you using?

Comment by Ruby on Why do we have the NATO logo? · 2025-02-20T00:51:02.016Z · LW · GW

It's a compass rose, thematic with the Map and Territory metaphor for rationality/truthseeking.

The real question is why does NATO have our logo.

Comment by Ruby on Some articles in “International Security” that I enjoyed · 2025-02-16T20:47:02.909Z · LW · GW

Curated! I like this post for the object-level interestingness of the cited papers, but also for pulling in some interesting models from elsewhere and generally reminding us that this is something we can do.

In times of yore, LessWrong venerated the neglected virtue of scholarship. And well, sometimes it feels like it's still neglected. It's tough because indeed many domains have a lot of low quality work, especially outside of hard sciences, but I'd wager on there being a fair amount worth reading, and appreciate Buck pointing at a domain where that seems to be the case.

Comment by Ruby on What is malevolence? On the nature, measurement, and distribution of dark traits · 2025-02-10T18:31:02.315Z · LW · GW

Was there the text of the post in the email or just a link to it?

Comment by Ruby on The "Think It Faster" Exercise · 2025-02-09T06:25:55.531Z · LW · GW

Curated. I was reluctant to curate this post because I found myself bouncing off it some due to length – I guess in pedagogy there's a tradeoff between explaining at length (you convey enough info but lose people) and keeping it brief (people read it but don't get enough). Based on private convo, Raemon thinks the length is warranted.

I'm curating because I do think this kind of project is valuable. Every day it feels easier to lose our minds entirely to AI, and I think it's important to remember we can think better or worse, and we should be trying to do the former.

I have mixed feelings about Raemon's project overall. Parts of it feel good, something feels missing (I think I'm partial to John Wentworth's claim elsewhere that you need a bunch of technical study in the recipe), but I expect the stuff Raemon is developing to be helpful to have engaged with for anyone who wants to get better at thinking.

Comment by Ruby on So You Want To Make Marginal Progress... · 2025-02-08T03:04:40.801Z · LW · GW

This doesn't seem right. Suppose there are two main candidates for how to get there, I-5 and J-6 (but who knows, maybe we'll be surprised by a K-7) and I don't know which Alice will choose. Suppose I know there's already a Very General Helper and Kinda Decent Generalizer, then I might say "I assign 65% chance that Alice is going to choose the I-5 and will try to contribute having conditioned on that". This seems like a reasonable thing to do. It might be for naught, but I'd guess in many cases the EV of something definitely helpful if we go down Route A is better than the EV of finding something that's helpful no matter the choice.

One should definitely track the major route they're betting on and make updates and maybe switch, but seems okay to say your plan is conditioning on some bigger plan.
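A minimal sketch of the EV comparison above. The 65% figure is from the comment; the payoff numbers are hypothetical, picked to reflect the assumption that general-purpose help has lower marginal value because a Very General Helper and Kinda Decent Generalizer already exist.

```python
P_I5 = 0.65  # credence that Alice picks the I-5 (from the comment above)

# Hypothetical payoffs, for illustration only.
VALUE_SPECIFIC_IF_RIGHT = 10.0  # route-specific work, and Alice takes that route
VALUE_SPECIFIC_IF_WRONG = 0.0   # route-specific work, and Alice goes another way
VALUE_GENERAL = 5.0             # route-agnostic work, helpful whatever she picks


def ev_route_specific(p_route: float) -> float:
    """Expected value of work that only pays off if the bet-on route is chosen."""
    return p_route * VALUE_SPECIFIC_IF_RIGHT + (1.0 - p_route) * VALUE_SPECIFIC_IF_WRONG


def ev_route_agnostic() -> float:
    """Expected value of work that pays off regardless of the route."""
    return VALUE_GENERAL


if __name__ == "__main__":
    print("route-specific EV:", ev_route_specific(P_I5))
    print("route-agnostic EV:", ev_route_agnostic())
```

With these assumed payoffs the conditional bet wins whenever the credence in the route exceeds 50%, which is also why it makes sense to keep tracking that credence and switch if it drops.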

Comment by Ruby on What is malevolence? On the nature, measurement, and distribution of dark traits · 2025-02-08T02:54:55.448Z · LW · GW

Edit: we are not going to technically curate this post since it's an EA Forum crosspost and for boring technical reasons that breaks the curation email. I will leave this notice up though.

Curated. This piece definitely got me thinking. If we grant that some people are unusually altruistic, empathetic, etc., it stands to reason that there are others on the other end of various distributions. And then we should also expect various selection effects on where they end up.

It was definitely a puzzle piece clicking for me that these traits can coexist with [genuine] moral conviction and that the traits are egodystonic. This rings true but somehow hasn't been an explicit model for me, but yes. Combine with this the difficulty of detecting these traits and resultant behaviors...and yeah, there's stuff here to think about.

I appreciate that the authors were thorough in their research but don't especially love the format. This was pretty dense and I think a post that pulled out the most key pieces of info and argued for some conclusions would be a better read, but I much prefer this to no post.

To the extent I should add my own opinions to curation notices, my thought is this makes me update against "benefit of the doubt" when witnessing concerning behaviors. I don't know that everyone beginning to scrutinize everyone else for having big D vibes would be good, but I do think scrutinizing behaviors for being high-integrity, cooperative, transparent, etc. might actually be a good direction – with the understanding that good norms around acceptable behaviors prevent abuses that anyone (however much D) is tempted towards. Something like we want to build "robust-to-malevolence" orgs and community that make it impractical or too costly to manipulate, etc.

Comment by Ruby on Open Thread Winter 2024/2025 · 2025-02-05T18:54:54.028Z · LW · GW

Welcome! Don't be too worried, you can try posting some stuff and see how it's received. Based on how you wrote this comment, I think you won't have much trouble. The New User Guide and other stuff gets worded a bit sternly because of the people who tend not to put in much effort at all and expect to be well received – which doesn't sound like you at all. It's hard to write one document that's stern to those who need it and more welcoming to those who need that, unfortunately.

Comment by Ruby on [deleted post] 2025-02-04T03:21:11.802Z

duplicate with Hyperstitions

Comment by Ruby on How will we update about scheming? · 2025-02-03T23:40:29.027Z · LW · GW

Curated! It strikes me that asking "how would I update in response to...?" is both a sensible and straightforward thing to be asking and yet not a form of question I'm seeing. I think we could be asking the same about slow vs fast takeoff and similar questions.

The value and necessity of this question also isn't just about not waiting for future evidence to come in, but realizing that "negative results" require interpretation too. I also think there's a nice degree of "preregistration" here as well that seems neat and maybe virtuous. Kudos and thank you.

Comment by Ruby on How Do You Interpret the Goal of LessWrong and Its Community? · 2025-01-16T19:15:40.171Z · LW · GW

I'm curious why the section on "Applying Rationality" in the About page you cited doesn't feel like an answer.

Applying Rationality
You might value Rationality for its own sake, however, many people want to be better reasoners so they can have more accurate beliefs about topics they care about, and make better decisions.
Using LessWrong-style reasoning, contributors to LessWrong have written essays on an immense variety of topics on LessWrong, each time approaching the topic with a desire to know what's actually true (not just what's convenient or pleasant to believe), being deliberate about processing the evidence, and avoiding common pitfalls of human reason.

Beyond that, The Twelve Virtues of Rationality includes "scholarship" as the 11th virtue, and I think that's a deep part of LessWrong's culture and aims:

The eleventh virtue is scholarship. Study many sciences and absorb their power as your own. Each field that you consume makes you larger. If you swallow enough sciences the gaps between them will diminish and your knowledge will become a unified whole. If you are gluttonous you will become vaster than mountains. It is especially important to eat math and science which impinge upon rationality: evolutionary psychology, heuristics and biases, social psychology, probability theory, decision theory. But these cannot be the only fields you study. The Art must have a purpose other than itself, or it collapses into infinite recursion.

I would think it strange though if one could get better about reasoning and believing true things without actually trying to do that on specific cases. Maybe you could sketch out what you expect LW content to look like more.

Comment by Ruby on Ruby's Quick Takes · 2025-01-07T18:58:28.090Z · LW · GW

Errors are my own

At first blush, I find this caveat amusing.

1. If there are errors, we can infer that those providing feedback were unable to identify them.
2. If the author was fallible enough to have made errors, perhaps they are fallible enough to miss errors in input sourced from others.

What purpose does it serve? Given it's often paired with "credit goes to <list of names>", it seems like an attempt to ensure that people providing feedback/input to a post are only exposed to upside from doing so, and the author takes all the downside reputation risk if the post is received poorly or exposed as flawed.

Maybe this works? It seems that as a capable reviewer/feedback haver, I might agree to offer feedback on a poor post written by a poor author, perhaps pointing out flaws, and my having given feedback on it might reflect poorly on my time allocation, but the bad output shouldn't be assigned to me. Whereas if my name is attached to something quite good, it's plausible that I contributed to that. I think because it's easier to help a good post be great than to save a bad post.

But these inferences seem like they're there to be made and aren't changed by what an author might caveat at the start. I suppose the author might want to remind the reader of them rather than make them true through an utterance.

Upon reflection, I think (1) doesn't hold. The reviewers/input makers might be aware of the errors but be unable to save the author from them. (2) That the reviewers made mistakes that have flowed into the piece seems all the more likely the worse the piece is overall, since we can update that the author wasn't likely to catch them.

On the whole, I think I buy the premise that we can't update too much negatively on reviewers and feedback givers from them having deigned to give feedback on something bad, though their time allocation is suspect. Maybe they're bad at saying no, maybe they're bad at dismissing people whose ideas aren't that good, maybe they have hope for this person. Unclear. Upside I'm more willing to attribute.

Perhaps I would replace the "errors are my own[, credit goes to]" with a reminder or pointer that these are the correct inferences to make. The words themselves don't change them? Not sure, just musing here.

Edited To Add: I do think "errors are my own" is a very weird kind of social move that's being performed in an epistemic context, and I don't like it.

Comment by Ruby on Reasons for and against working on technical AI safety at a frontier AI lab · 2025-01-05T17:05:10.991Z · LW · GW

This post is comprehensive but I think "safetywashing" and "AGI is inherently risky" are placed far too close to the end and get too little treatment, as I think they're the most significant reasons against.

This post also makes no mention of race dynamics and how contributing to them might outweigh the rest, and as RyanCarey says elsethread, doesn't talk about other temptations and biases that push people towards working at labs and would apply even if it was on net bad.

Comment by Ruby on When Is Insurance Worth It? · 2024-12-23T20:00:58.343Z · LW · GW

Curated. Insurance is a routine part of life, whether it be the car and home insurance we necessarily buy or the Amazon-offered protection one reflexively declines, the insurance we know doctors must have, businesses must have, and so on.

So it's pretty neat when someone comes along and (compellingly) says "hey guys, you (or at least most people) are wrong about when insurance makes sense to buy, the reasons you have are wrong, here's the formula".

While assumptions can be questioned, e.g. the infinite badness of going bankrupt, and other factors can be raised, this is just a neat technical treatment of a very practical, everyday question. I expect that I'll be thinking in terms of this myself when making various insurance choices. Kudos!

Comment by Ruby on AIs Will Increasingly Attempt Shenanigans · 2024-12-19T07:27:29.935Z · LW · GW

Curated. This is a good post and in some ways ambitious as it tries to make two different but related points. One point – that AIs are going to increasingly commit shenanigans – is in the title. The other is a point regarding the recurring patterns of discussion whenever AIs are reported to have committed shenanigans. I reckon those patterns are going to be tough to beat, as strong forces (e.g. strong pre-existing conviction) cause people to take up the stances they do, but if there's hope for doing better, I think it comes from understanding the patterns.

There's a good roundup of recent results in here that's valuable on its own, but the post goes further and sets out to do something pretty hard in advocating for the correct interpretation of the results. This is hard because I think the correct interpretation is legitimately subtle and nuanced, with the correct update depending on your starting position (as Zvi explains). It sets out and succeeds.

Lastly, I want to express my gratitude for Zvi's hyperlinks to lighter material, e.g. "Not great, Bob" and "Stop it!" It's a heavy world with these topics of AI, and the lightness makes the pill go down easier. Thanks

Comment by Ruby on LessWrong FAQ · 2024-12-13T18:40:42.881Z · LW · GW

Yes, true, fixed, thanks!

Comment by Ruby on avturchin's Shortform · 2024-10-31T17:35:36.795Z · LW · GW

Dog: "Oh ho ho, I've played imaginary fetch before, don't you worry."

Comment by Ruby on Occupational Licensing Roundup #1 · 2024-10-30T17:38:50.866Z · LW · GW

My regular policy is to not frontpage newsletters, however I frontpaged this one as it's the first in the series and I think it's neat for more people to know this is a series Zvi intends to write.

Comment by Ruby on A bird's eye view of ARC's research · 2024-10-27T17:24:03.213Z · LW · GW

Curated! I think it's generally great when people explain what they're doing and why in a way legible to those not working on it. Great because it lets others potentially get involved, build on it, expose flaws or omissions, etc. This one seems particularly clear and well written. While I haven't read all of the research, nor am I particularly qualified to comment on it, I like the idea of a principled/systematic approach behind it, in comparison to a lot of work that isn't coming from a deeper, bigger framework.

(While I'm here though, I'll add a link to Dmitry Vaintrob's comment that Jacob Hilton described as "best critique of ARC's research agenda that I have read since we started working on heuristic explanations". Eliciting such feedback is the kind of good thing that comes out of writing up agendas – it's possible or likely Dmitry was already tracking the work and already had these critiques, but a post like this seems like a good way to propagate them and have a public back and forth.)

Roughly speaking, if the scalability of an algorithm depends on unknown empirical contingencies (such as how advanced AI systems generalize), then we try to make worst-case assumptions instead of attempting to extrapolate from today's systems.

I like this attitude. The human standard, I think often in alignment work too, is to argue why one's plan will work and find stories for that, and adopting the methodology of the opposite, especially given the unknowns, is much needed in alignment work.

Overall, this is neat. Kudos to Jacob (and the rest of the team) for taking the time to put this all together. Doesn't seem all that quick to write, and I think it'd be easy to think they ought to not take time off from further object-level research to write it. Thanks!

Comment by Ruby on New User's Guide to LessWrong · 2024-10-26T18:37:38.606Z · LW · GW

Thanks! Fixed

Comment by Ruby on Why I’m not a Bayesian · 2024-10-19T17:43:48.961Z · LW · GW

Curated. I really like that even though LessWrong is 1.5 decades old now and has Bayesianism assumed as a background paradigm while people discuss everything else, nonetheless we can have good exploration of our fundamental epistemological beliefs.

The descriptions of unsolved problems, or at least incompleteness, of Bayesianism strike me as technically correct. Like others, I'm not convinced of Richard's favored approach, but it's interesting. In practice, I don't think these problems undermine the use of Bayesianism in typical LessWrong thought. For example, I never thought of credences being applied to "propositions" rigorously, and more like "hypotheses" or possibilities for how things are that could be framed as models already too. Context-dependent terms like "large" or quantities without explicit tolerances like "500ft" are the kind of things that you taboo or reduce if necessary, either for your own reasoning or a bet.

That said, I think the claims about mistakes and downstream consequences of the way people do Bayesianism are interesting. I'm reading a claim here I don't recall seeing. Although we already knew that bounded reasoners aren't logically omniscient, Richard is adding a claim (if I'm understanding correctly) that this means that no matter how much strong evidence we technically have, we shouldn't have really high confidence in any domain that requires heavy processing of that evidence, because we're not that good at processing. I do think that leaves us with a question of judging when there's enough evidence to be conclusive without complicated processing or not.

Something I might like a bit more factored out is the rigorous gold-standard epistemological framework and the manner in which we apply our epistemology day to day.

I fear this curation notice would be better if I'd read all the cited sources on critical rationalism, Knightian uncertainty, etc., and I've added them to my reading list. All in all, kudos for putting some attention on the fundamentals.

Comment by Ruby on Open Thread Fall 2024 · 2024-10-17T18:03:18.308Z · LW · GW

Welcome! Sounds like you're on the one hand at the start of a significant journey but also you've come a long distance already. I hope you find much helpful stuff on LessWrong.

I hadn't heard of Daniel Schmachtenberger, but I'm glad to have learned of him and his works. Thanks.

Comment by Ruby on 2024 Petrov Day Retrospective · 2024-09-29T03:22:28.082Z · LW · GW

The actual reason why we lied in the second message was "we were in a rush and forgot."

My recollection is we sent the same message to the majority group because:

Treating it different would require special-casing it and that would have taken more effort.
If selectors of different virtues had received different messages, we wouldn't have been able to properly compare their behavior.
[At least in my mind], this was a game/test and when playing games you lie to people in the context of the game to make things work. Alternatively, it's like how scientific experimenters mislead subjects for the sake of the study.

Comment by Ruby on Ruby's Quick Takes · 2024-09-29T00:53:18.142Z · LW · GW

Added!

Comment by Ruby on Ruby's Quick Takes · 2024-09-29T00:52:40.556Z · LW · GW

Added!

Comment by Ruby on Ruby's Quick Takes · 2024-09-29T00:51:09.251Z · LW · GW

Money helps. I could probably buy a lot of dignity points for a billion dollars. With a trillion, variance definitely goes up because you could try crazy stuff that could backfire. (I mean, true for a billion too.) But EV of such a world is better.

I don't think there's anything that's as simple as writing a check though.

US Congress gives money to specific things. I do not have a specific plan for a trillion dollars.

I'd bet against Terence Tao being some kind of amazing breakthrough researcher who changes the playing field.

Comment by Ruby on Ruby's Quick Takes · 2024-09-28T17:14:37.045Z · LW · GW

Your access should be activated within 5-10 minutes. Look for the button in the bottom right of the screen.

Comment by Ruby on Ruby's Quick Takes · 2024-09-28T15:57:33.524Z · LW · GW

Not an original observation but yeah, separate from whether it's desirable, I think we need to be planning for it.

Comment by Ruby on Ruby's Quick Takes · 2024-09-28T00:22:24.305Z · LW · GW

Just thinking through simple stuff for myself, very rough, posting in the spirit of quick takes

At present, we are making progress on the Technical Alignment Problem^[2] and like probably could solve it within 50 years.
Humanity is on track to build ~lethal superpowerful AI in more like 5-15 years.
Working on technical alignment (direct or meta) only matters if we can speed up overall progress by 10x (or some lesser factor if AI capabilities is delayed from its current trajectory). Improvements of 2x are not likely to get us to an adequate technical solution in time.
Working on slowing things down is only helpful if it results in delays of decades.
1. Shorter delays are good in so far as they give you time to buy further delays.
There is technical research that is useful for persuading people to slow down (and maybe also solving alignment, maybe not). This includes anything that demonstrates scary capabilities or harmful proclivities, e.g. a bunch of mech interp stuff, all the evals stuff.
AI is in fact super powerful and people who perceive there being value to be had aren’t entirely wrong^[3]. This results in a very strong motivation to pursue AI and resist efforts to stop it.
1. These motivations apply to both businesses and governments.
People are also developing stances on AI along ideological, political, and tribal lines, e.g. being anti-regulation. This generates strong motivations for AI topics even separate from immediate power/value to be gained.
Efforts to agentically slow down the development of AI capabilities are going to be matched by agentic efforts to resist those efforts and push in the opposite direction.
1. Efforts to convince people that we ought to slow down will be matched by people arguing that we must speed up.
2. Efforts to regulate will be matched by efforts to block regulation. There will be efforts to repeal or circumvent any passed regulation.
3. If there are chip controls or whatever, there will be efforts to get around that. If there are international agreements, there will be efforts to clandestinely evade them.
4. If there are successful limitations on compute, people will compensate and focus on algorithmic progress.
Many people are going to be extremely resistant to being swayed on topics of AI, no matter what evidence is coming in. Much rationalization will be furnished to justify proceeding no matter the warning signs.
By and large, our civilization has a pretty low standard of reasoning.
People who want to speed up AI will use falsehoods and bad logic to muddy the waters, and many people won’t be able to see through it^[4]. No matter the evals or other warning signs, there will be people arguing it can be fixed without too much trouble and we must proceed.
In other words, there’s going to be an epistemic war and the other side is going to fight dirty^[5]. I think even a lot of clear evidence will have a hard time against people’s motivations/incentives and bad arguments.
When there are two strongly motivated sides, it seems likely we end up in a compromise state, e.g. regulation passes but it’s not the regulation originally designed, which even in its original form was only maybe actually enough.
It’s unclear to me whether “compromise regulation” will be adequate. Or whether any regulation severe enough to cost people billions in anticipated profit will end with them giving up.
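The timeline arithmetic behind the points above can be sketched as follows. The 50-year and 5-15-year figures come from the list; the model itself is a crude assumption for illustration (progress scales linearly with the speedup factor, and any slowdown simply adds years to the deadline).

```python
ALIGNMENT_YEARS_AT_CURRENT_PACE = 50  # rough figure from the list above


def required_speedup(years_until_lethal_ai: float) -> float:
    """Speedup factor needed to finish alignment before the deadline."""
    return ALIGNMENT_YEARS_AT_CURRENT_PACE / years_until_lethal_ai


def solved_in_time(speedup: float, years_until_lethal_ai: float,
                   delay_years: float = 0.0) -> bool:
    """True if sped-up alignment work finishes before (delayed) lethal AI.

    Assumes linear scaling of progress with `speedup` (a simplification).
    """
    years_needed = ALIGNMENT_YEARS_AT_CURRENT_PACE / speedup
    return years_needed <= years_until_lethal_ai + delay_years


if __name__ == "__main__":
    for deadline in (5, 15):
        print(f"deadline {deadline}y -> need {required_speedup(deadline):.1f}x")
    print("2x, 15y deadline, no delay:", solved_in_time(2, 15))
    print("2x, 15y deadline, 10y delay:", solved_in_time(2, 15, delay_years=10))
```

On this toy model a 2x improvement only suffices when paired with roughly a decade of delay on a 15-year deadline, and with much more than that on shorter deadlines, which is the sense in which speedups and slowdowns trade off against each other.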

Further Thoughts

People aren’t thinking or talking enough about nationalization.
1. I think it’s interesting because I expect that a lot of regulation about what you can and can’t do stops being enforceable once the development is happening in the context of the government performing it.

What I Feel Motivated To Work On

Thinking through the above, I feel less motivated to work on things that feel like they’ll only speed up technical alignment problem research by amounts < 5x. In contrast, maybe there’s more promise in:

Cyborgism or AI-assisted research that gets us 5x speedups but applies differentially to technical alignment research
Things that convince people that we need to radically slow down
- good writing
- getting in front of people
- technical demonstrations
- research that shows the danger
  - why the whole paradigm isn’t safe
  - evidence of deception, etc.
Development of good (enforceable) “if-then” policy that will actually result in people stopping in response to various triggers, and not just result in rationalization for why actually it’s okay to continue (ignore signs) or just a bandaid solution
Figuring out how to overcome people’s rationalization
Developing robust policy stuff that’s set up to withstand lots of optimization pressure to overcome it
Things that cut through the bad arguments of people who wish to say there’s no risk and discredit the concerns
Stuff that prevents national arms races / gets into national agreements
Thinking about how to get 30 year slowdowns

^[1]
By “slowing down”, I mean all activities and goals which are about preventing people from building lethal superpowerful AI, be it via getting them to stop, getting them to go slower because they’re being more cautious, limiting what resources they can use, setting up conditions for stopping, etc.
^[2]
How to build a superpowerful AI that does what we want.
^[3]
They’re wrong about their ability to safely harness the power, but not wrong that if you could harness it, you’d have a lot of very valuable stuff.
^[4]
My understanding is a lot of falsehoods were used to argue against SB1047 by e.g. a16z
^[5]
Also some people arguing for AI slowdown will fight dirty too, eroding trust in AI slowdown people, because some people think that when the stakes are high you just have to do anything to win, and are bad at consequentialist reasoning.

Comment by Ruby on Ruby's Quick Takes · 2024-09-27T23:41:01.479Z · LW · GW

The “Deferred and Temporary Stopping” Paradigm

Quickly written. Probably missed where people are already saying the same thing.

I actually feel like there’s a lot of policy and research effort aimed at slowing down the development of powerful AI–basically all the evals and responsible scaling policy stuff.

A story for why this is the AI safety paradigm we’ve ended up in is that it’s palatable. It’s palatable because it doesn’t actually require that you stop. Certainly, it doesn’t right now. To the extent companies (or governments) are on board, it’s because those companies are at best promising “I’ll stop later when it’s justified”. They’re probably betting that they’ll be able to keep arguing it’s not yet justified. At the least, it doesn’t require a change of course now and they’ll go along with it to placate you.

Even if people anticipate they will trigger evals and maybe have to delay or stop releases, I would bet they’re not imagining they have to delay or stop for all that long (if they’re even thinking it through that much). Just long enough to patch or fix the issue, then get back to training the next iteration. I'm curious how many people imagine that once certain evaluations are triggered, the correct update is that deep learning and transformers are too shaky a foundation. We might then need to stop large AI training runs until we have much more advanced alignment science, and maybe a new paradigm.

I'd wager that if certain evaluations are triggered, there will be people vying for the smallest possible argument to get back to business as usual. Arguments about not letting others get ahead will abound. Claims that it's better for us to proceed (even though it's risky) than the Other who is truly reckless. Better us with our values than them with their threatening values.

People genuinely concerned about AI are pursuing these approaches because they seem feasible compared to an outright moratorium. You can get companies and governments to make agreements that are “we’ll stop later” and “you only have to stop while some hypothetical condition is met”. If the bid was “stop now”, it’d be a non-starter.

And so the bet is that people will actually be willing to stop later to a much greater extent than they’re willing to stop now. As I write this, I’m unsure of what probabilities to place on this. If various evals are getting triggered in labs:

What probability is there that the lab listens to this vs ignores the warning sign and it doesn’t even make it out of the lab?
If it gets reported to the government, how strongly does the government insist on stopping? How quickly is it appeased before training is allowed to resume?
If a released model causes harm, how many people skeptical of AI doom concerns does it convince to change their mind and say “oh, actually this shouldn’t be allowed”? How many people, how much harm?
How much do people update that AI in general is unsafe vs that particular AI from that particular company is unsafe, and only they alone should be blocked?
How much do people argue that even though there are signs of risk here, it’d be more dangerous to let others pull ahead?
And if you get people to pause for a while and focus on safety, how long will they agree to a pause for before the shock of the damaged/triggered eval gets normalized and explained away and adequate justifications are assembled to keep going?

There are going to be people who fight tooth and nail, weight and bias, to keep the development going. If we assume that they are roughly equally motivated and agentic as us, who wins? Ultimately we have the harder challenge in that we want to stop others from doing something. I think the default is people get to do things.

I think there's a chance that various evals and regulations do meaningfully slow things down, but I write this to express the fear that they're false reassurance–there's traction only because people who want to build AI are betting this won't actually require them to stop.

Related:

Comment by Ruby on Skills from a year of Purposeful Rationality Practice · 2024-09-26T07:13:34.220Z · LW · GW

Curated. I think Raemon's been doing a lot of work in the last year pushing this stuff, and this post pulls together in one place a lot of good ideas/advice/approach.

I would guess that because of the slow or absent feedback loops, people don't realize how bad human reasoning and decision-making is when operating outside of the familiar and quick feedback. That's many domains, but certainly the whole AI situation. Ray is going after the hard stuff here.

At the same time, this stuff ends up feeling like the "eat your vegetables" of reasoning and decision-making. It's not sexy, or at least it's not that fun to sit down and e.g. try to brainstorm further plans when you already have one that's appealing, or backchain from your ostensible goal. I think we'd be in a better place if these skills and practices were normalized, in the sense that there's a norm that you do these things and if you don't, then you're probably screwing up.

Comment by Ruby on Ruby's Quick Takes · 2024-09-22T19:55:31.590Z · LW · GW

Yeah, I think a question is whether I want to say "that kind of wireheading isn't myopic" vs "that isn't wireheading". Probably fine either way if you're consistent / taboo adequately.

Comment by Ruby on Lighthaven Sequences Reading Group #3 (Tuesday 09/24) · 2024-09-22T16:19:35.230Z · LW · GW

My guess is Ben created the event while on the East Coast and 6pm got timezone converted for West Coast. I've fixed it.

Comment by Ruby on Ruby's Quick Takes · 2024-09-22T16:16:31.304Z · LW · GW

Since I'm already rambling, I'll note another thought I've been mulling over:

My notion of value is not the same as the value that my mind was optimized to pursue. Meaning that I ought to be wary that typical human thought patterns might not be serving me maximally.

That's of course on top of the fact that evolution's design is flawed even by its own goals; humans rationalize left, right, and center, are awfully myopic, and we'll likely all die because of it.

Comment by Ruby on Ruby's Quick Takes · 2024-09-22T16:13:16.380Z · LW · GW

There's an age-old tension between ~"contentment" and ~"striving" with no universally accepted compelling resolution, even if many people feel they have figured it out. Related:

In my own thinking, I've been trying to ground things out in a raw consequentialism that one's cognition (including emotions) is just supposed to take you towards more value (boring, but reality is allowed to be)^[1].

I fear that a lot of what people do is ~"wireheading". The problem with wireheading is it's myopic. You feel good now (small amount of value) at the expense of greater value later. Historically, this has made me instinctively wary of various attempts to experience more contentment such as gratitude journaling. Do such things curb the pursuit of value in exchange for feeling better (less unpleasant discontent) in the moment?

Clarity might come from further reduction of what "value" is. The primary notion of value I operate with is preference satisfaction: the world is how you want it to be. But also a lot of value seems to flow through experience (and the preferred state of the world is one where certain experiences happen).

A model whereby gratitude journaling (or general "attend to what is good" motions) maximizes value, as opposed to the opposite, is that it's about turning 'potential value' into 'experienced actual value'. The sunset on its own is merely potential value; it becomes experienced actual value when you stop and take it in. The same goes for many good things in one's life that you might have just gotten used to, but which could be enjoyed and savored (harvested) again by attending to them.

Relatedly, I've thought about a distinction between actions that "sow value" vs "reap value", roughly mapping onto actions that are instrumental vs terminal to value, roughly mapping to "things you do to get enjoyment later" vs "things you actually enjoy^[2] now".

My guess is that to maximize value over one's lifetime (the "return" in RL terms), one shouldn't defer reaping/harvesting value until the final timestep. Instead you want to be doing a lot of sowing but also reaping/harvesting as you go too, and gratitude-journaling-esque, focus-on-what-you-got-already stuff facilitates that, and is part of value maximization, not simply wireheading.

It's a bit weird in our world, because the future value you can be sowing for (i.e. the entire cosmic endowment not going to waste) is so overwhelming, it kinda feels like maybe it should outweigh any value you might reap now. My handwavy answer is something something human psychology: it doesn't work to do that.

I'm somewhat rederiving standard "obvious" advice, but I don't think it actually is obvious, and figuring out better models and frameworks might ultimately solve the contentment/striving tension (/ focus on what you've got vs focus on what you don't tension).

^[1] And as usual, that doesn't mean one tries to determine the EV of every individual mental act. It means that when setting up policies, habits, principles, etc., what ultimately determines whether those are good is the underlying value consequentialism.
^[2] To momentarily speak in terms of experiential value vs preference satisfaction value.

Comment by Ruby on Which LessWrong/Alignment topics would you like to be tutored in? [Poll] · 2024-09-19T02:17:34.965Z · LW · GW

Applied Game Theory

Comment by Ruby on Which LessWrong/Alignment topics would you like to be tutored in? [Poll] · 2024-09-19T02:16:32.645Z · LW · GW

CFAR-style Rationality Techniques

Comment by Ruby on Which LessWrong/Alignment topics would you like to be tutored in? [Poll] · 2024-09-19T01:48:33.963Z · LW · GW

Anthropics
