All the smart trans girls I know were also smart prior to HRT.
I feel like Project Lawful, as well as many of Lintamande's other glowfic since then, has given me a much deeper understanding of... a collection of virtues including honor, honesty, trustworthiness, etc., which I now mostly think of collectively as "Law".
I think this has been pretty valuable for me on an intellectual level—I think, if you show me some sort of deontological rule, I'm going to give a better account of why/whether it's a good idea to follow it than I would have before I read any glowfic.
It's difficult for me to separate how much of that is due to Project Lawful in particular, because ultimately I've just read a large body of work which all had some amount of training data showing a particular sort of thought pattern which I've since learned. But I think this particular fragment of the rationalist community has given me some valuable new ideas, and it'd be great to figure out a good way of acknowledging that.
i think they presented a pretty good argument that it is actually rather minor
While the idea that it's important to look at the truth even when it hurts isn't revolutionary in the community, I think this post gave me a much more concrete model of the benefits. Sure, I knew the abstract arguments that facing the truth is valuable, but I don't know if I'd have identified it as an essential skill for starting a company, or identified a failure to face the truth as a critical component of staying in a bad relationship. (I think my model of bad relationships was that people knew leaving was a good idea but were unable to act on that information; in retrospect, though, an inability to even consider leaving might totally be what's going on some of the time.)
So if a UFO lands in your backyard and aliens ask you if you want to go on a magical (but not particularly instrumental) space adventure with them, I think it's reasonable to very politely decline, and get back to work solving alignment.
I think I'd probably go for that, actually, if there isn't some specific reason to very strongly doubt it could possibly help? It seems somewhat more likely that I'll end up decisive via space adventure than by mundane means, even if there's no obvious way the space adventure will contribute.
This is different if you're already in a position where you're making substantial progress though.
nonetheless, i think the analogy is still suggestive that an AI selectively shaped for one thing might end up deliberately maximizing something else
i think, in retrospect, this feature was a really great addition to the website.
This post introduced me to a bunch of neat things, thanks!
There are several comments "suggesting that maybe the cause is mental illness".
But personally, I think having such a standard is both unreasonable and inconsistent with the implicit standard set by essays from Yudkowsky and other MIRI people.
I think this is largely coming from an attempt to use approachable examples? I could believe that there were times when MIRI thought that even getting something as good as ChatGPT might be hard, in which case they should update, but I don't think they ever believed that something as good as ChatGPT is clearly sufficient. I certainly never believed that, at least.
Yes, we told everyone they were in the minority. It's a "game".
I think this is bad. I mean, it's not that big a deal, but, generally speaking, I expect messages I receive from The LessWrong Team not to tell falsehoods.
Hmm.
I don't think "avoiding actions that noticeably increase the chance civilization is destroyed" is necessarily the most practically-relevant virtue for most people, but it does seem to me like it's the point of Petrov Day in particular. If we're recognizing Petrov as a person, I'd say that was Petrov's key virtue.
Or maybe I'd say something like "not doing very harmful acts despite incentives to do so". I think "resisting social pressure" isn't quite on the mark, but I think it is important to Petrov Day that there were strong incentives against what Petrov did.
I think other virtues are worth celebrating, but I think I'd want to recognize them on different holidays.
I mean, that's a thing you might hope to be true. I'm not sure if it actually is true.
I think, if you had several UDT agents with the same source code, and then one UDT agent with slightly different source code, you might see the unique agent defect.
I think the CDT agent has an advantage here because it is capable of making distinct decisions from the rest of the population—not because it is CDT.
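A toy sketch of that point, under assumptions entirely of my own (the agent names and the "cooperate iff same source" rule are illustrative, not from the comments above):

```python
# Toy model (my own construction, not from the original discussion): each agent
# plays a one-shot prisoner's dilemma and cooperates exactly when its
# counterpart runs identical source code, standing in for UDT agents that
# recognize copies of themselves.
def move(own_source, opponent_source):
    """Return this agent's move ("C" or "D") against the given opponent."""
    return "C" if own_source == opponent_source else "D"

population = ["udt-v1", "udt-v1", "udt-v1", "udt-v2"]  # one agent differs slightly
for i, a in enumerate(population):
    for j, b in enumerate(population):
        if i < j:
            print(f"{a} vs {b}: {move(a, b)}{move(b, a)}")
# The three identical agents play C with each other, while every pairing that
# includes the variant agent ends in mutual defection.
```

The variant agent's "advantage" (or here, disadvantage) comes purely from its source code being distinguishable, not from which decision theory it runs.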
I'm not sure "original instantiation" is always well-defined
Personally, I'd say it's a clear advancement: it opens up a lot of puzzles, but the naïve intuition corresponding to it still seems more satisfying than CDT or EDT, even if a full formalization is difficult.
(Not to comment on whether there might be a better communications strategy for getting the academic community interested.)
Provided that you make sure you don't publish some massive capabilities advance (which I think is pretty unlikely for most undergrads), I think the benefits of having an additional alignment-conscious person with relevant skills probably outweigh the very marginal costs of tiny incremental capabilities ideas.
I think a lot of travel expenses?
I was confused by the disagree votes on this comment, so I looked—the comment in question is highest on the default "new and upvoted" sorting, but it isn't highest on the "top" sorting.
I'm more confident that we should generally have norms prohibiting using threats of legal action to prevent exchange of information than I am of the exact form those norms should take. But to give my immediate thoughts:
I think the best thing for Alice to do if Bob is lying about her is to just refute the lies. In an ideal world, this is sufficient. In practice, I guess maybe it's insufficient, or maybe refuting the lies would require sharing private information, so if necessary I would next escalate to informing forum moderators, presenting evidence privately, and requesting a ban.
Only once those avenues are exhausted might I consider threatening a libel suit acceptable.
I do notice now that the Nonlinear situation in particular is affected by Ben Pace being a LessWrong admin: if step 1 doesn't work, step 2 might have issues, so escalating to step 3 might be acceptable sooner than usual.
Concerns have been raised that there might be some sort of large first-mover advantage. I'm not sure I buy this—my instinct is that the Nonlinear cofounders are just bad-faith actors making any arguments that seem advantageous to them (though out of principle I'm trying to withhold final judgement). That said, I could definitely imagine deciding in the future that this is a large enough concern to justify weaker norms against rapid escalation.
Kudos for doing the exercise!
I think a comment "just asking for people to withhold judgement" would not be especially downvoted. I think the comments in which you've asked people to withhold judgement include other incredibly toxic behavior.
I think we should have a community norm that threatening libel suits (or actually suing) is incredibly unacceptable in almost all cases—I'm not sure what the exact exceptions should be, but maybe it should require "they were knowingly making false claims."
I feel unsure whether it would be good to enforce such a norm regarding the current Nonlinear situation because there wasn't common knowledge beforehand and because I feel too strongly about this norm to not be afraid that I'm biased (and because hearing them out is the principled thing to do). But I think building common knowledge of such a norm would be good.
While I guess I will be trying to withhold some judgment out of principle, I legitimately cannot imagine any plausible context which will make this any different.
I don't think having a beauty-detector that works the same way humans' beauty-detectors do implies that you care about beauty?
hmm. i think you're missing eliezer's point. the idea was never that AI would be unable to identify actions which humans consider good, but that the AI would not have any particular preference to take those actions.
There are definitely also many misspellings in the training data that go uncorrected, which it nonetheless needs to make sense of.
Does anyone know if there is a PDF version of the Sequence Highlights anywhere? (Or any ebook format is fine probably.)
...are they trading with, like, a vending machine, rather than with each other?
I'm confused by the pens and mugs example. Sure, if only 10 of the people who got mugs would prefer a pen, then at most ten trades should happen: once the ten mug-receiving pen-likers trade, there won't be any other mug-owners willing to trade. So don't you get 20 people trading, i.e. 20%, not 50%?
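Spelling out the count I have in mind (a sketch; the 100-person total and the symmetric number of pen-holders wanting mugs are my assumptions, only the "10 mug-receivers prefer pens" figure is from the example):

```python
# Sketch of the arithmetic as I read it, assuming 100 participants, half
# handed pens and half handed mugs at random.
n_participants = 100
mug_holders_wanting_pens = 10  # the figure given in the example
pen_holders_wanting_mugs = 10  # assumed symmetric under a random handout

# Each trade pairs one mug-holder who wants a pen with one pen-holder who
# wants a mug, so the number of trades is capped by the smaller group.
trades = min(mug_holders_wanting_pens, pen_holders_wanting_mugs)
people_who_traded = 2 * trades  # each trade involves one person from each side
fraction = people_who_traded / n_participants
print(trades, people_who_traded, fraction)  # 10 trades, 20 people, 0.2 (20%)
```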
"AI systems, such as Large Language Models (LLMs), are trained on human data and designed by human engineers. It's impossible for them to exceed the bounds of human knowledge and expertise, as they're inherently limited by the information they've been exposed to."
Maybe, on current algorithms, LLMs run into a plateau around the level of human expertise. That does seem plausible. But it is not because being trained on human data necessarily limits you to human level!
Accurately predicting human text is much harder than just writing stuff on the internet. If GPT were to perfect the skill it is being trained on, it would have to be much smarter than a human!
Even if shut down in particular isn't something we want it to be indifferent to, I think being able to make an agent indifferent to something is very plausibly useful for designing it to be corrigible?
I don't think you can have particularly high confidence one way or the other without just thinking about AI in enough detail to understand the different ways AI development could end up shaking out. There isn't a royal road.
Both the "doom is disjunctive" and "AI is just like other technologies" arguments really need a lot more elaboration to be convincing, but—personally I find the argument that AI is different from other technologies pretty obvious and I have a hard time imagining what the counterargument would be.
I can imagine a world where LLMs tend to fall into local maxima where they get really good at imitation or simulation, and then they plateau (perhaps only until their developers figure out what adjustments need to be made). But I don't have a good enough model of LLMs to be very sure whether that will happen or not.
I really like the "when you don't have a good detailed model you need to figure out what space you should have the maximum entropy distribution over" framing
I think abuse issues in rationalist communities are worth discussing, but I don't think people who have been excluded from the community for years are a very productive place to begin such a discussion.
This feels worth trying to me
I do know that I want my own children to stay off social media, and minimize their ownership and use of smart phones, for as long as they possibly can. And that I intend to spend quite a lot of my available points, if needed, to fight for this. And that if I was running a school I’d do my best to shut the phones down during school hours.
(...)
We can also help this along by improving alternatives to phone use. If children aren’t allowed to go places without adults knowing, or worse adults driving them and coming along and watching them, what do you think they are going to do all day? What choices do they have?
I'm not certain whether my intuition should be trusted here, since this is definitely the kind of thing my brain would form a habit of rationalizing about. But my guess is that I would've been way worse off without phones/social media/stuff. I didn't really have any great alternatives to socializing on the internet—the only people I ever interacted with in person were devout Christians.
So I tentatively think it might be better to really focus on the improving-alternatives part first? I'm sure I would've been much better off if I had good in-person friends, but I don't think losing access to social media would have helped with that; it'd just have meant I didn't have any good friends at all.
(I would expect Zvi in particular to have good enough parenting skills not to run into that. But I know a lot of people with terrible parents who think they can fix the problem just by monitoring their children's access to technology, which seems to me like it would be terrible for the kids. So I worry about how good this is as general advice.)
I don't think you understand what mathematicians mean by the word "complete." It means (roughly) that every statement which can be expressed in the system can also be either proven or disproven within the system.
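For reference, the standard (negation-)completeness condition can be written as:

```latex
% A theory T is complete when it decides every sentence of its language:
\[
  \text{for every sentence } \varphi \text{ of } T\text{'s language:}\quad
  T \vdash \varphi \quad\text{or}\quad T \vdash \neg\varphi .
\]
```

Gödel's incompleteness theorems are about this notion: any sufficiently strong consistent theory leaves some sentence undecided.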
A (late) section of Project Lawful argues that there would likely be acausal coordination to avoid pessimizing the utility function (of anyone you are coordinating with), as well as perhaps to actively prevent utility function pessimization.
this already seems pretty good imo
I don't think security mindset means "look for flaws." That's ordinary paranoia. Security mindset is something closer to "you better have a really good reason to believe that there aren't any flaws whatsoever." My model is something like "A hard part of developing an alignment plan is figuring out how to ensure there aren't any flaws, and coming up with flawed clever schemes isn't very useful for that. Once we know how to make robust systems, it'll be more clear to us whether we should go for melting GPUs or simulating researchers or whatnot."
That said, I have a lot of respect for the idea that coming up with clever schemes is potentially more dignified than shooting everything down, even if clever schemes are unlikely to help much. I respect carado a lot for doing the brainstorming.
I mostly expect by the time we know how to make a seed superintelligence and give it a particular utility function... well, first of all the world has probably already ended, but second of all I would expect progress on corrigibility and such to have been made and probably to present better avenues.
If Omega handed me aligned-AI-part-2.exe, I'm not quite sure how I would use it to save the world? I think probably trying to just work on the utility function outside of a simulation is better, but if you are really running out of time then sure, I guess you could try to get it to simulate humans until they figure it out. I'm not very convinced that referring to a thing a person would have done in a hypothetical scenario is a robust method of getting that to happen, though?
I have a pretty strong heuristic that clever schemes like this one are pretty doomed. The proposal seems to lack security mindset, as Eliezer would put it.
The most immediate/simple concrete objection I have is that no one has any idea how to create aligned-AI-part-2.exe? I don't think figuring out what we'd do if we knew how to make a program like that is really the difficult part here.
CDT gives in to blackmail (such as the basilisk), whereas timeless decision theories do not.
My personal suspicion is that an AI being indifferent between a large class of outcomes matters little; it's still going to absolutely ensure that it hits the Pareto frontier of its competing preferences.
Have you read / are you interested in reading Project Lawful? It eventually explores this topic in some depth—though mostly after a million words of other stuff.
I think "existential risk" is a bad name for a category of things that isn't "risks of our existence ending."
I mostly think the phrase "psychologically addictive" is too vague to communicate much to me.
I think I would write the paragraph as something vaguely like:
"The physiological withdrawal symptoms of benzodiazepines can be avoided, but people often have a bad time coming off them because they start relying on them instead of other coping mechanisms. So doctors try to avoid prescribing them."
It seems possible to come up with something that is both succinct and actually communicates the gears.