Comments
Would the ability to deceive humans when specifically prompted to do so be considered an example? I would think that large LMs are getting better at devising false stories about the real world that people cannot distinguish from true ones.
On the idea of "we can't just choose not to build AGI": it seems like much of the concern here is predicated on the idea that so many actors are not taking safety seriously that someone will inevitably build AGI once the technology has advanced sufficiently.
I wonder if struggles with AIs that are strong enough to cause a disaster but not strong enough to win instantly might change this perception. I can imagine there being very little gap, if any, between those two types of AI if there is a hard takeoff, but it seems quite possible to me that there will be some time spent at that stage. Some sort of small or moderate disaster with a less powerful AI might get all the relevant players to realize the danger. At that point, humans have done reasonably well at not doing things that seem very likely to destroy the world immediately (e.g. nuclear war).
Though we've been less successful at putting safeguards in place to prevent it from happening. And even if all groups that could create AI agree to stop, eventually someone will think they know how to do it. And we still only get the one chance.
All that is to say, I don't think it's implausible that we'll be able to coordinate well enough to buy more time, though it's unclear whether that will do much to avoid eventual doom.
I feel like a lot of the angst about free will boils down to conflicting intuitions.
1. It seems like we live in a universe of cause and effect, thus all my actions/choices are caused by past events.
2. It feels like I get to actually make choices, so 1 obviously can't be right.
The way to reconcile these intuitions is to recognize that yes, all the decisions you make are in a sense predetermined, but a lot of what is determining those decisions is who you are and what sort of thing you would do in a particular circumstance. You are making decisions; that experience is not invalidated by a fully deterministic universe. It's just that you are who you are, and you'll make the decision that you would make.
That's true; however, there was a huge amount of outrage even before those details came out.
I of course don't have insider information. My stance is something close to Buffett's advice to "be fearful when others are greedy, and greedy when others are fearful". I interpret that as meaning that markets tend to overreact, and that if you value a stock by its fundamentals you can potentially outperform the market in the long run. To your questions: yes, disaster may really occur, but my opinion is that these risks are not sufficient to pass up the value here. I'll also note that Charlie Munger has been acquiring a substantial stake in BABA, which makes me more confident in its value at its current price.
Alibaba (BABA) - the stock price has been pulled down by fear about regulation, delisting, and most recently instability in China as its zero-COVID policy fails. However, as far as I can tell, the price is insanely low for the amount of revenue Alibaba generates and the market share it holds in China.
Current bioethics norms strongly condemn this sort of research, which may make it challenging to pursue in the nearish term. The consensus is strongly against it, which will make acquiring funding difficult, and any human CRISPR editing is completely off the table for now. For example, He Jiankui CRISPR-edited some babies in China to make them less susceptible to HIV and went to prison for it.
Do I understand you correctly as endorsing something like: it doesn't matter how narrow an optimization process is, if it becomes powerful enough and is not well aligned, it still ends in disaster?
I'm not sure the problem in biology is decoding, at least not in the same sense it is with neural networks. I see the main difficulty in biology as more one of mechanistic inference, where a major roadblock may be getting better measurements of what is going on in cells over time, rather than some algorithm that's just going to be able to overcome the fact that you're getting both very high levels of molecular noise in biological data and single snapshots in time that are difficult to place in context. With a neural network you have the parameters, and it seems reasonable to say you just need some math to make it more interpretable.
Whereas in biology I think we likely need both better measurements and better tools. I'm not sure the same tools would be particularly applicable to the AI interpretability problem either.
If, for example, I managed to create mathematical tools to reliably learn mechanistic dependencies between proteins and/or genes from high-dimensional biological data sets, it's not clear to me that those would be easily applicable to extracting Bayes nets from large neural networks.
I’m coming at this from a comp bio angle so it’s possible I’m just not seeing the connections well, having not worked in both fields.
In general the observation from working in the field is that if you have a simple metric, people will figure out how to game it. So you need to build in a lot of safeguards, and you need to evolve all the time as the spammers/abusers evolve. There's no end point, no place where you think you're done, just an ever-changing competition.
That's what I was trying to point at in regards to the problem not being patchable. It doesn't seem like there is some simple patch you can write, and then be done. A solution that would work more permanently seems to have some of the "impossible" character of AGI alignment and trying to solve it on that level seems like it could be valuable for AGI alignment researchers.
Another is leisure. People would still need breaks and want to use the work they had done in the past to purchase the ability to stay at a beach resort for a while.
In your opinion, would a resurrection/afterlife change this equation at all?
Yes, an afterlife transforms death (at least relatively low-pain deaths) into something that's really not that bad. It's sad in the sense that you won't see a person for a while, but that's not remotely on the level of a person being totally obliterated, which is my current interpretation of death, on the basis that I see no compelling evidence for an afterlife. Since one's mental processes continuing after the brain ceases to function would rely on some mechanism unknown to our current understanding of reality, I would want considerable evidence before considering an afterlife plausible.
To answer your thought experiment - it depends. For myself, almost certainly. Some friends and family I have discussed cryonics with have expressed little to no interest in living beyond the "normal" biological amount of time. I think they are misguided, but I would not presume to choose this for them. For those who have expressed interest in cryonics, I would probably sign them up. However, I think your analogy may break down in that it seems an omnipotent god should not need immense suffering to bring people to an afterlife. I don't think a god need prevent all suffering to be good or benevolent, but I think there is a level of unjust suffering a good god would not allow.
I had a really hard time double cruxing this, because I don't actually feel at all uncertain about the existence of a benevolent and omnipotent god. I realized partway through that I wasn't doing a good job arguing both sides and stopped there. I'm posting this comment anyway, in case it makes for useful discussion.
You attribute to god both benevolence and omnipotence, which I think is extremely difficult to square with the world we inhabit, in which natural disasters kill and injure thousands, children are born with debilitating diseases, and good people die young in accidents that were no fault of their own. One can of course come up with a myriad of possible explanations for these observations, but I think they are a sharp departure from what a naive mind would expect to see in a world created by an all-powerful benevolent being. I'm trying to come up with explanations of these that don't feel completely forced. The best I can come up with is the idea of a divine plan in which these events allow people to fulfill their destiny and become who they are meant to be. Yet, while such a plan might make for good storytelling, I don't think it actually improves the lives of the people these things happen to.
Relatedly, you have the problem of human evil. I think the standard reply is that God gives humans the free will to choose good or evil so that He may judge them. I would contend that free will only produces evil in beings created by God insofar as their creator designed them in such a way that they would often choose to do evil things given the world in which they are placed (e.g. why does God make pedophiles?).
Another consideration is that the intense competition that exists in the world between all life forms for limited resources does not approximate what I think one would naively expect of a singular God, but looks much more like multiple opposed forces acting against one another. This would at least seem to favor polytheism over monotheism, but given the simpler explanation of evolution, I think neither is necessary. I think Eliezer made this point better here: https://www.lesswrong.com/posts/pLRogvJLPPg6Mrvg4/an-alien-god.
Agree, I think the problem definitely gets amplified by power or status differentials.
I do think that people often forget to think critically about all kinds of things because their brain just decides to accept them on the 5-second level and doesn't promote the issue as needing thorough consideration. I find all kinds of poorly justified "facts" and advice in my mind because of something I read or something someone said that I failed to properly consider.
Even when someone does take the time to think about advice, though, I think it's easy for things to go wrong. The reason someone is asking for advice may be that they simply do not have the expertise to evaluate claims about challenge X on their own merits. Another possibility is that someone can realize the advice is good for them but overcorrect, essentially trading one problem for another.
The main thing people fail to consider when giving advice is that advice isn't what's wanted.
I fully agree; this post was trying to get at what happens when people do want advice and thus may take bad advice.
Advice comes with no warranty. If some twit injures themselves doing what I told them to (wrongly) then that's 100% on them.
I think in some cases this is generally a fair stance (though I think I would still like to prevent people from misapplying my advice if possible), but if you are in a position of power or influence over someone I'm not sure it applies (e.g. sports coaches telling all their players to work harder and not taking the time to make sure that some of them aren't being pushed to overtraining by this advice).
Failing all of that, say "What choice would you make if I wasn't here?" and then barring them saying something outlandish you just say "Then do that". One way or another they'll get better at thinking for themselves.
That sounds like a very reasonable approach.
I think the metaphor of "fast-forwarding" is a very useful way to view a lot of my behavior. Having thought about this for a while though, I'm not sure fast-forwarding is always a bad thing. I find it can be mentally rejuvenating in a way that introspection is not (e.g. if I've been working for a long period and my brain is getting tired I can often quickly replenish my mental resources by watching a short video or reading a chapter of a fantasy novel after which I'm able to begin working again, whereas I find sitting and reflecting to still require some mental energy).
Of course, this is an important habit to keep an eye on. I sometimes find myself almost unconsciously opening YouTube when I don't actually need a break, which I've been trying to get myself to stop doing.
Favorite technique: Argue with yourself about your conclusions.
By which I mean: if I have any reasonable doubt about some idea, belief, or plan, I split my mind into two debaters who take opposite sides of the issue, each of whom wants to win, and I use my natural competitiveness to drive insight into the issue.
I think the most common use of this would be investigating my deeply held beliefs and trying to get to their real weak points, but it is also useful for:
- Examining my favored explanation of a set of data
- Figuring out whether I need to change the way I'm presenting a set of data after I have already sunk costs into making the visualizations
- Understanding my partner's perspective after an argument
- Preparing for expected real-life arguments
- Forcing myself to understand an issue better, even when I don't expect I will change my mind about it
- Questioning whether the way I acted in a situation was acceptable
- Practicing analytic thinking
- Evaluating two plausible arguments I've heard but don't have any particularly strong feelings on
- Deciding whether to make a purchase
- Comparing two alternative plans
I think the idea is that you can learn rationality techniques that can be applied to politics much more easily by using examples that are not political.
So to clarify, I think there is merit in his approach of trying to engineer solutions to age-related pathology. However, I do not think it will work for all aspects of aging right now. Aubrey believes that all the types of damage caused by aging are problems that we can begin solving right now. I would suspect that some are hard problems that will require a better understanding of the biological mechanisms involved before we can treat them.
So my position is that aging, like many fields, should be investigated both at the basic biology level and from the perspective of trying to design therapeutics, because you don't know if you can fix problems with current knowledge unless you try. However, if you fail to adequately treat the condition, you want basic research to be ongoing.
As someone who works in biological science, I give the claim very little credence. I am very interested in Aubrey's anti-aging ideas, and when I bring up aging with colleagues, it is considered to be a problem that will not be solved for a long time. Public opinion usually takes 3 to 5 years to catch up to scientific consensus, and there is no kind of scientific consensus about this. That said, the idea of not having to get old does excite people a lot more than many other scientific discoveries, so it might percolate into the mainstream much faster than other ideas. Still, my sense is that the overwhelming majority of scientists are not on board, which makes this shift in public perception very unlikely to happen.
Further, I do not know why he would expect the public to care so much about the issue that it would be impossible to be elected without supporting it. It's not like there's huge electoral pressure to increase spending on research into cancer or heart disease, which are diseases that essentially everyone is impacted by (directly or indirectly). The idea that there will be huge pressure for aging research seems absurdly over-optimistic.
So I would give this claim very little credence personally despite the fact that I do think we can at least make major strides into treating age-related pathology within the coming decades if it receives sufficient funding.
I think a very interesting aspect of this idea is that it explains why it can be so hard to come up with truly original ideas, while it is much easier to copy or slightly tweak the ideas of other people. Slight tweaks were probably less likely to get you killed, whereas doing something completely novel could be very dangerous. And while it might have a huge payoff, everyone else in the group could then copy you (due to imitation being our greatest strength as a species) so the original idea creator would not have gained much of a comparative advantage in most cases.
I think a number of the example answers are mystifying meaning. In my view, meaning is simply the answer to the question "why is life worth living?" It is thus a very personal thing: what is meaningful for one mind may be utterly meaningless to another.
Yet as we are all humans, some significant overlap in the sorts of things that provide a sense of reason or gladness to being alive exists.
I will quote my favorite song, "The Riddle" by Five for Fighting, which gives two answers: "there's a reason for the world, you and I" and "there's a reason for the world, who am I?"
I think these capture the two most common sources of meaning for people. Our interactions with, love for, and care for others are one major aspect of what, for many, makes life worth living. And the other is looking inside oneself, finding the things you cherish for their own sake and the moments of flow and joy you are able to find in the world.
This was very interesting. There seems to be a trade-off for these people between their increased happiness and the ability to analyze their mistakes and improve, so I am not sure I find it entirely attractive. I think there is a balance there, with some of the people studied being too happy to be maximally effective (assuming they have goals more important to them than their own happiness).
I think these are very important points. I have noticed some issues with having the right responses for social situations (especially laughing when it's not entirely appropriate), which is something I've been working on remedying by paying closer attention to when people expect a serious reaction.
The issue of ignoring problems also seems like something to look out for. Just because something does not make you feel bad does not mean you should fail to learn from it. I think there is a fine balance between learning from mistakes and dwelling on them, which is another, related skill.
Losing risk aversion and motivation seem unlikely to be problems for me personally, as what you're calling the stoic mindset seems to push those towards a more ideal spot from my natural inclinations. However, I suspect this advice may be critical for others, though they would never have occurred to me as associated problems. This is why I always feel hesitant to give self-help advice.
I think the example with the lightbulbs and SAD is very important because it illustrates well that in areas humanity is not especially prioritizing, one is much more justified in expecting civilizational inadequacy.
I think a large portion of the judgment of whether one should expect that inadequacy should be a function of how much work and money is being spent on a particular subject.
Great sequence, I've really enjoyed it.
And I definitely agree with this view of rationality. I think the idea of incremental successes emphasizes the need to track successes and failures over time, so that you can see where you did well and where you did poorly and plan to make the coin come up heads more often in the future.
You don't build strength while you're lifting weight. You build strength while you're resting.
I think this phrase is particularly helpful as something to repeat to yourself when feeling the impulse to push through exhaustion when you know that you really ought to rest. I'll almost certainly be using it for that purpose when I'm feeling tempted to forget what I've learned.
Yeah, I think the biggest problem for me was that I felt deficient for failing to live up to the standard I set for myself. I sort of shunted those emotions aside, and I really fell out of a lot of habits of self-improvement and hard work for a time. So I would say the emotional fallout led to the most damaging part (losing good habits in the aftermath).
Thinking about tradeoffs in terms of tasks completed is a good idea as well, I'll try doing that more explicitly.
I definitely have had the experience of trying to live up to a standard and it feeling awful, which then inhibits the desire to make future attempts. I think that feeling indicates a need to investigate whether you're making things difficult for yourself. For example, I would often attempt to learn too many things at once and do too much work because I thought the ideal person would basically be learning and working all the time. Then, when I felt myself breaking down, it sent my stress levels through the roof, because if I couldn't keep going, that meant I was just unable to be the ideal I wanted to be. Instead of asking, "okay, the way I'm doing this is clearly unsustainable, but the standard is still worthwhile, so how do I change my way of thinking or going about this thing?", I would just try to force myself to continue on, feeling constantly stressed about the possibility of failing.
But when I began to ask that question, I saw that I could decrease the work I was putting on myself to something I could actually manage all the time, and that this would actually be most productive in the long run. And I recognized that sometimes I'll just be exhausted and unable to do something, and that doesn't make my whole attempt to live up to the standard a failure. Thus, it became easier to live up to the standard, or rather, my ideal standard shifted organically to the standard of how I think I can become the best me, instead of the standard I ascribed to an unspecified ideal person who I am just not capable of being.
The general idea for me is using the heuristics to form the goals, which in turn suggest concrete actions. The concrete actions are what go on your schedule/to-do list. I'd also advocate constantly updating/refining your goals and concrete methods of achieving goals, both for updating on new information and testing out new methods.
It's possible that a daily schedule just doesn't work for you, but I will say that I had to try a number of different tweaks before it felt okay to me. If schedules are still something you're interested in, I found it very helpful to examine the negative feelings the schedule gives you and then look for ways the problem might be ameliorated.
Yeah, I do think that I can become aware of that implicit condescension of not criticizing and update more frequently on whether someone might be worth trying to help in that way. I'm still going to avoid criticizing as a general heuristic, especially after just meeting people.
I find myself doing this a great deal when deciding whether to criticize somebody. I model most people I know as not being able to productively use direct criticism. The criticism, however well meant it may be, will hurt their pride, and they will not change. Indeed, the attempt will probably create some bad feeling towards me. It is just better not to try to help them in such a direct way. There are more tactful ways of getting across the same point, but they are often more difficult and not always practical in every situation.
The people I do directly criticize are generally the people I respect the most, because I expect the criticism will actually be useful to them: they will be able to overcome the impulse to become defensive and actually consider the critique.
I suppose your question indicates that I should try criticizing people more often, as I have gotten into the habit of presuming that people will be unable to productively receive criticism. But, at the same time, criticism is quite socially risky and I am quite confident that the vast majority of people will not handle it well.
I think optimizing based on the preferences of people may be problematic in that the AI may, in such a system, modify people to prefer things that are very cheaply/easily obtained so that it can better optimize their preferences. Or rather, it would do that as part of optimizing: it would make people want things that can be more easily obtained.
In the book Gendlin says that the steps are really just to help people learn, they aren't at all necessary to the process, so I think Gendlin would himself agree with that.
I'm not Raemon, but elaborating on using Gendlin's Focusing to find catalysts might be helpful. Shifting emotional states is very natural to me (I used to find it strange that other people couldn't cry on demand), and when I read Focusing I realized that his notion of a "handle" to a feeling is basically what I use to get myself to shift into a different emotional state. Finding the whole "bodily" sense of the emotion lets you get back there easily, I find.
This seems largely correct to me, although I think hyperbolic discounting of rewards/punishments over time may be less pronounced in human conditioning as compared to animals being conditioned by humans. Humans can think "I'm now rewarding myself for Action A I took earlier" or "I'm being punished for Action B", which seems, at least in my experience, to decrease the effect of the temporal distance, whereas animals seem less able to conceptualize the connection over time. Because of this difference, I think the temporal distance of the reward/punishment is less important for conditioning in people, as long as the individual is mentally associating the stimulus with the action, although it is still significant.
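For concreteness, here is a minimal sketch of the standard hyperbolic form this kind of discounting is usually modeled with (the specific equation, Mazur's formulation, is my addition rather than anything stated in the original comment):

$$V = \frac{A}{1 + kD}$$

where $V$ is the subjective value of a reward of size $A$ delivered after a delay $D$, and $k$ is an individually fitted discount rate. The suggestion above would then amount to saying that explicitly linking the stimulus to the earlier action effectively lowers $k$ in humans.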
Also, what's the name of the paper for the monkeys and juice study? I'd like to look at it because the result did surprise me.
This reminds me of Donald Kinder's research that shows people do not vote primarily on self-interest as one might naively expect. It seems that people tend to ask instead "What would someone like me do?" when they vote, with this question likely occurring implicitly.