The 'Bitter Lesson' is Wrong 2022-08-20T16:15:04.340Z


Comment by deepthoughtlife on The 'Bitter Lesson' is Wrong · 2022-09-17T01:50:46.702Z · LW · GW

No. That's a foolish interpretation of domain insight. We have a massive number of highly general strategies that nonetheless work better for some things than others. A domain insight is simply some kind of understanding involving the domain being put to use. Something as simple as whether to use a linked list or an array can use a minor domain insight. Whether to use a Monte Carlo search or a depth-limited search, and so on, are definitely insights. Most advances in AI to this point have in fact been based on domain insights, and only a small amount on scaling within an approach (though more so recently). Even the 'bitter lesson' is an attempted insight into the domain (one that is wrong due to being a severe overreaction to previous failure).

Also, most domain insights are in fact an understanding of constraints. 'This path will never have a reward' is both an insight and a constraint. 'Dying doesn't allow me to get the reward later' is both a constraint and a domain insight. So is 'the lists I sort will never have numbers that aren't between 143 and 987' (which is useful for an O(n) type of sorting). We are, in fact, trying to automate the process of getting domain insights via machine with this whole enterprise in AI, especially in whatever we have trained them for.
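To make the sorting example concrete, here is a minimal sketch of a counting sort that exploits the 'between 143 and 987' constraint. It assumes the values are integers, and the function name and parameters are my own illustration rather than anything from the original discussion.

```python
def counting_sort_bounded(values, lo=143, hi=987):
    """Sort integers known to lie in [lo, hi].

    Runs in O(n + k) time, where k = hi - lo + 1 is a fixed constant here,
    so the cost is linear in the length of the input.
    """
    counts = [0] * (hi - lo + 1)
    for v in values:
        # The domain insight is only useful if the constraint really holds.
        if not lo <= v <= hi:
            raise ValueError(f"{v} is outside [{lo}, {hi}]")
        counts[v - lo] += 1

    result = []
    for offset, count in enumerate(counts):
        # Emit each value as many times as it was seen, in increasing order.
        result.extend([lo + offset] * count)
    return result
```

A general comparison sort cannot beat O(n log n), so the speedup comes entirely from knowing the range in advance, which is the sense in which the constraint is an insight.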

Even, 'should we scale via parameters or data' is a domain insight. They recently found out they had gotten that wrong (Chinchilla) too because they focused too much on just scaling.

Alphazero was given some minor domain insights (how to search and how to play the game), years later, and ended up slightly beating a much earlier approach, because they were trying to do that specifically. I specifically said that sort of thing happens. It's just not as good as it could have been (probably).

Comment by deepthoughtlife on Thoughts on AGI consciousness / sentience · 2022-09-17T01:34:52.357Z · LW · GW

I do agree with your rephrasing. That is exactly what I mean (though with a different emphasis).

Comment by deepthoughtlife on Why Do People Think Humans Are Stupid? · 2022-09-15T01:35:17.931Z · LW · GW

I agree with you. The biggest leap was going to human generality level for intelligence. Humanity already is a number of superintelligences working in cooperation and conflict with each other; that's what a culture is. See also corporations and governments. Science too. This is a subculture of science worrying that it is superintelligent enough to create a 'God' superintelligence.

To be slightly uncharitable, the reason to assume otherwise is fear, either their own or to play on that of others. Throughout history people have looked for reasons why civilization would be destroyed, and this is just the latest. Ancient prophesiers of doom were exactly the same as modern ones. People haven't changed that much.

That doesn't mean we can't be destroyed, of course. A small but nontrivial percentage of doomsayers were right about the complete destruction of their civilization. They just happened to be right by chance most of the time.

I also agree that quantitative differences could possibly end up being very large, since we already have immense proof of that in one direction given that we have superintelligences massively larger than we are already, and computers have already made them immensely faster than they used to be.

I even agree that the key quantitative advantages would likely be in supra-polynomial arenas that would be hard to improve too quickly even for a massive superintelligence. See the exponential resources we are already pouring into chip design for continued smooth but decreasing progress, and the even higher exponential resources being poured into dumb tool AIs for noticeable but not game-changing increases. While I am extremely impressed by some of them, like Stable Diffusion (an image generation AI that has been my recent obsession), there is such a long way to go that resources will be a huge problem before we even get to human level, much less superhuman.

Comment by deepthoughtlife on Thoughts on AGI consciousness / sentience · 2022-09-15T01:11:49.780Z · LW · GW

Honestly Illusionism is just really hard to take seriously. Whatever consciousness is, I have better evidence it exists than anything else since it is the only thing I actually experience directly. I should pretend it isn't real...why exactly? Am I talking to slightly defective P-zombies?

'If the computer emitted it for the same...' is a clear example of a begging the question fallacy. If a computer claimed to be conscious because it was conscious, then it logically has to be conscious, but that is the possible dispute in the first place. If you claim consciousness isn't real, then obviously computers can't be conscious. Note that you aren't talking about real illusionism if you don't think we are p-zombies. Only the first of the two possibilities you mentioned is Illusionism, if I recall correctly.

You seem like one of the many people trying to systematize things they don't really understand. It's an understandable impulse, but it leads to an illusion of understanding (which is the only thing that leads to a systemization like Illusionism; it seems like frustrated people claiming there is nothing to see here).
If you want a systemization of consciousness that doesn't claim things it doesn't know, then assume consciousness is the self-reflective and experiential part of the mind that controls and directs large parts of the overall mind. There is no need to state what causes it.

If a machine fails to be self-reflective or experiential then it clearly isn't conscious. It seems pretty clear that modern AI is neither. It probably fails the test of even being a mind in any way, but that's debatable.

Is it possible for a machine to be conscious? Who knows. I'm not going to bet against it, but current techniques seem incredibly unlikely to do it.

Comment by deepthoughtlife on Can someone explain to me why most researchers think alignment is probably something that is humanly tractable? · 2022-09-03T13:46:17.563Z · LW · GW

As individuals, humans routinely and successfully do things much too hard for them to fully understand. This is partly due to innately hardcoded stuff (mostly for things we think are simple, like vision and controlling our bodies' automatic systems), and somewhat due to innate personality, but mostly due to the training process our culture puts us through (for everything else).

For their part, cultures can take the inputs of millions to hundreds of millions of people (or even more when stealing from other cultures), and distill them into both insights and practices that absolutely no one would have ever come up with on their own. The cultures themselves are, in fact, massively superintelligent compared to us, and people are effectively putting their faith either in AI being no big deal because it is too limited, or in the fact that we can literally ask a superintelligence for help in designing things much stupider than culture is to not turn on us too much.

AI is currently a small sub-culture within the greater cultures, and struggling a bit with the task, but as AI grows more impressive, much more of culture will be about how to align and improve AI for our purposes. If the full might of even a midsized culture ever sees this as important enough, alignment will probably become quite rapid, not because it is an easy question, but because cultures are terrifyingly capable.

At a guess, alignment researchers have seen countless impossible tasks fall to the midsized 'Science' culture of which they are a part, and many think this is much the same. 'Human achievable' means anything a human-based culture could ever do. This is just about anything that doesn't violate the substrates it is based on too much (and you could even see AI as a way around that). Can human cultures tame a new substrate? It seems quite likely.

Comment by deepthoughtlife on What's the Most Impressive Thing That GPT-4 Could Plausibly Do? · 2022-08-31T19:44:08.680Z · LW · GW

I'm hardly missing the point. It isn't impressive to have it be exactly 75%, not more or less, so the fact that it can't always be that is irrelevant. His point isn't that that particular exact number matters, it's that the number eventually becomes very small. But since the number being very small compared to what it should be does not prevent it from being made smaller by the same ratio, his point is meaningless. It isn't impressive to fulfill an obvious bias toward updating in a certain direction.

Comment by deepthoughtlife on Any Utilitarianism Makes Sense As Policy · 2022-08-31T15:51:45.361Z · LW · GW

It doesn't take many people to cause these effects. If we make them 'the way', following them doesn't take an extremist, just someone trying to make the world better, or some maximizer. Both these types are plenty common, and don't have to make it fanatical at all. The maximizer could just be a small band of petty bureaucrats who happen to have power over the area in question. Each one of them just does their role, with a knowledge that it is to prevent overall suffering. These aren't even the kind of bureaucrats we usually dislike! They are also monsters, because the system has terrible (and knowable) side effects.

Comment by deepthoughtlife on A gentle primer on caring, including in strange senses, with applications · 2022-08-30T17:13:38.036Z · LW · GW

I don't have much time, so:

While footnote 17 can be read as applying, it isn't very specific.

For all that you are doing math, this isn't mathematics, so base needs to be specified.

I am convinced that people really do give occasional others a negative weight.

And here are some notes I wrote while finishing the piece (which I would have edited and tightened up a lot; they're a bit all over the place):

This model obviously assumes utilitarianism.
Honestly, their math does seem reasonable to account for people caring about other people (as long as they care about themselves at all on the same scale, which could even be negative, just not exactly 0).
They do add an extraneous claim that the numbers for the weight of a person can't be negative (because they don't understand actual hate? At least officially.) If someone hates themselves, then you can't do the numbers under these constraints, nor if they hate anyone else. But this constraint seems completely unnecessary, since you can sum negatives with positives easily enough.
I can't see the point of using an adjacency matrix (of a weighted directed graph).
Being completely altruistic doesn't seem like everyone gets a 1, but that everyone gets at least that much.
I don't see a reason to privilege mental similarity to myself, since there are people unlike me that should be valued more highly. (Reaction to footnote 13) Why should I care about similarities to pCEV when valuing people?
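The note above about negative weights can be illustrated with a tiny sketch. It assumes, as my own simplification rather than the post's exact formalism, that a person's overall utility is a weighted sum of everyone's individual welfare; the names `weights` and `welfare` are hypothetical.

```python
def caring_utility(weights, welfare):
    """Weighted sum of individual welfare values.

    weights[j] is how much this person cares about person j. Nothing in
    the arithmetic requires weights to be non-negative: a negative weight
    simply means that person j's welfare counts against the total.
    """
    if len(weights) != len(welfare):
        raise ValueError("weights and welfare must have the same length")
    return sum(w * x for w, x in zip(weights, welfare))
```

For example, with weights `[1.0, 0.5, -0.25]` (self, a friend, someone hated) and welfare `[10.0, 4.0, 8.0]`, the hated person's welfare subtracts 2.0 from the total, and the sum is still perfectly well defined.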

Thus, they care less about taking richer people's money. Why is the first example explaining why someone could support taking money from people you value less to give to other people, while not supporting doing so with your own money? It's obviously true under utilitarianism (which I don't subscribe to), but it also obscures things by framing 'caring' as 'taking things from others by force'.

In 'Pareto improvements and total welfare' should a social planner care about the sum of U, or the sum of X? I don't see how it is clear that it should be X. Why shouldn't they value the sum of U, which seems more obvious?

'But it's okay for different things to spark joy'. Yes, if I care about someone I want their preferences fulfilled, not just mine, but I would like to point out that I want them to get what they want, not just for them to be happy.
Talking about caring about yourself though, if you care about yourself at different times, then you will care about what your current self does, past self did, and future self will, want. I'm not sure that my current preferences need to take into account those things though.
Thus I see two different categories of thing mattering as regards preferences. Contingent or instrumental preferences are changeable in accounting, while you should evaluate things as if your terminal preferences are unchanging, even though humans can have them change, such as when they have a child.
Even if you already love your child automatically when you have one, you don't necessarily care beforehand who that child turns out to be, but you care quite a bit afterwards. See any time travel scenario: the parent will care very much that Sally no longer exists even though they now have Sammy. They will likely now also terminally value Sammy. Take into account that you will love your child, but not who they are unless you will have an effect on it (such as learning how to care for them in advance making them a more trusting child).

In practice, subsidies and taxes end up not being about externalities at all, or to a very small degree. Often, one kind of externality (often positive) will be ignored even when it is larger than the other (often negative) externality.
This is especially true in modern countries where people ignore the positive externalities of people's preferences being satisfied making them a better and more useful person in society, while they are obsessed with the idea of the negatives of any exchange.
I have an intuition that the maximum people would pay to avoid an externality is not really that close to its actual effects, and that people would generally lie if you asked them even if they knew.

In the real world, most people (though far from all) seem to have the intuition that the government uses the money they get from a tax less well than the individuals they take it from do.
Command economies are known to be much less efficient than free markets, so the best thing the government could do with a new tax is to lower less efficient taxes, but taxes only rarely go down, so this encourages wasted resources. Even when they do lower taxes, it isn't by eliminating the worst taxes. When they put it out in subsidies, they aren't well targeted subsidies either, but rather, distortionary.
Even a well targeted tax on negative externalities would thus have to handle the fact that it is, in itself, something with significant negative externalities even beyond the administrative cost (of making inefficient use of resources).

It's weird to bring up having kids vs. abortion and then not take a position on the latter. (Of course, people will be pissed at you for taking a position too.)

There are definitely future versions of myself whose utility is much more or less valuable to me than others despite being equally distant.
If in ten years I am a good man, who has started a nice family, that I take good care of, then my current self cares a lot more about their utility than an equally (morally) good version of myself that just takes care of my mother's cats, and has no wife or children (and this is separate from the fact that I would care about the effects my future self would have on that wife and children or that I care about them coming to exist).

Democracy might be less short-sighted on average because future people are more similar to average other people that currently exist than you happen to be right now. But then, they might be much more short-sighted because you plan for the future, while democracy plans for right now (and getting votes.) I would posit that sometimes one will dominate, and sometimes the other.
As to your framing, the difference between you-now and you-future is mathematically bigger than the difference between others-now and others-future if you use a ratio for the number of links to get to them.
Suppose people change half as much in a year as your sibling is different from you, and you care about similarity for what value you place on someone. Thus, two years equals one link.
After 4 years, you are two links away from yourself-now and your sibling is 3 links from you-now. They are 50% more different than future you (assuming no convergence). After eight years, you are 4 links away, while they are only 5, which makes them 25% more different from you-now than future you is.
Alternately, between year 4 and year 8 they have moved 67% further away (3 links to 5), while you have moved 100% further away (2 links to 4).
It thus seems like they have changed far less than you have, and are more similar to who they were, so why should you treat them as having the same rate?
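The link arithmetic above can be checked with a short sketch. It uses only the assumptions stated in the text (one link per two years of change, a sibling starting one link away, no convergence); the function and parameter names are my own.

```python
def link_distances(years, years_per_link=2, sibling_links_now=1):
    """Return (self_links, sibling_links) measured from you-now.

    self_links: how far future-you has drifted from you-now.
    sibling_links: the sibling's starting distance plus the same drift,
    assuming the sibling changes at the same rate and never converges.
    """
    self_links = years / years_per_link
    sibling_links = sibling_links_now + self_links
    return self_links, sibling_links


def relative_excess(years):
    """How much more distant the sibling is than future-you, as a fraction."""
    self_links, sibling_links = link_distances(years)
    return sibling_links / self_links - 1
```

Under these assumptions `relative_excess(4)` is 0.5 and `relative_excess(8)` is 0.25, so the ratio between the two distances shrinks toward zero as the horizon lengthens even though the absolute gap stays at one link.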

Comment by deepthoughtlife on A gentle primer on caring, including in strange senses, with applications · 2022-08-30T15:27:59.382Z · LW · GW

I'm only a bit of the way in, and it is interesting so far, but it already shows signs of needing serious editing, and there are other ways it is clearly wrong too.

In 'The inequivalence of society-level and individual charity' they list the scenarios as 1, 1, and 2 instead of A, B, and C, as they later use. Later, it refers incorrectly to preferring C to A with different necessary weights when the second reference is to preferring C to B.

The claim that money becomes utility as a log of the amount of money isn't true, but is probably close enough for this kind of use. You should add a note to the effect. (The effects of money are discrete at the very least).

The claim that the derivative of log y is 1/y is also incorrect as written. In general, log means either log base 10 or something specific to the area of study. If written generally, you must specify the base. (For instance, in Computer Science it is base 2, but I would have to explain that if I were doing external math with that.) The derivative of the natural log of y is 1/y, but that isn't true of any other log. You should fix that statement by specifying you are using ln instead of log (or just prepending the word natural).
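For reference, the general rule follows from the change-of-base identity; the symbol b for the base is my notation, not the post's:

```latex
\frac{d}{dy}\log_b y
  = \frac{d}{dy}\,\frac{\ln y}{\ln b}
  = \frac{1}{y \ln b},
\qquad\text{so}\qquad
\frac{d}{dy}\log_b y = \frac{1}{y}
\;\text{ only when } b = e .
```

The extra factor is a constant, so the qualitative argument in the post survives a change of base; only the stated formula needs the word 'natural'.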

Some claims are just plain wrong in my opinion. For instance, claiming that a weight can't be negative assumes away the existence of hate, but people do hate either themselves or others on occasion in non-instrumental ways, wanting them to suffer, which renders this claim invalid (unless they hate literally everyone).

I also don't see how being perfectly altruistic necessitates valuing everyone else exactly the same as you. I could still value others different amounts without being any less altruistic, especially if the difference is between a lower value for me and a higher one for the others. Relatedly, it is possible to not care about yourself at all, but this math can't handle that.

I'll leave aside other comments because I've only read a little.

Comment by deepthoughtlife on Any Utilitarianism Makes Sense As Policy · 2022-08-30T14:22:35.558Z · LW · GW

I strongly disagree. It would be very easy for a non-omnipotent, unpopular government that has limited knowledge of the future, and that will be overthrown in twenty years, to do a hell of a lot of damage with negative utilitarianism, or any other imperfect utilitarianism. On a smaller scale, even individuals could do it alone.

A negative utilitarian could easily judge that something that had the side effect of making people infertile would cause far less suffering than not doing it, causing immense real world suffering amongst the people who wanted to have kids, and ending civilizations. If they were competent enough, or the problem slightly easier than expected, they could use a disease that did that without obvious symptoms, and end humanity.

Alternately, a utilitarian that valued the far future too much might continually cause the life of those around them to be hell for the sake of imaginary effects on said far future. They might even know those effects are incredibly unlikely, and that they are more likely to be wrong than right due to the distance, but it's what the math says, so...they cause a civil war. The government equivalent would be to conquer Africa (success not necessary for the negative effects, of course), or something like that, because your country is obviously better at ruling, and that would make the future brighter. (This could also be something done by a negative utilitarian to alleviate the long-term suffering of Africans).

Being in a limited situation does not automatically make Utilitarianism safe. (Nor any other general framework.) The specifics are always important.

Comment by deepthoughtlife on [deleted post] 2022-08-30T13:56:51.492Z

A lot of this depends on your definition of doomsday/apocalypse. I took it to mean the end of humanity, and a state of the world we consider worse than our continued existence. If we valued the actual end state of the world more than continuing to exist, it would be easy to argue it was a good thing, and not a doom at all. (I don't think the second condition is likely to come up for a very long time as a reason for something to not be doomsday.) For instance, if each person created a sapient race of progeny that weren't human, but they valued as their own children, and who had good lives/civilizations, then the fact humanity ceased to exist due to a simple lack of biological children would not be that bad. This could in some cases be caused by AGI, but wouldn't be a problem. (It would also be in the far future.)

AI doomsday never (though it is far from impossible). Not doomsday never; it's just unlikely to be AGI. I believe both that we aren't that close and that 'takeoff' would be best described as glacial, so we'll have plenty of time to get it right. I am unsure of the risk level of unaligned moderately superhuman AI, but I believe (very confidently) that the tech level for minimal AGI is much lower than the tech level for doomsday AGI. If I were wrong about that, I would obviously change my mind about the likelihood of AGI doomsday. (I think I put something like 1 in 10 million in the next fifty years. [Though in percentages.] Everything else was 0, though in the case of 25 years, I just didn't know how many 0s to give it.)

'Tragic AGI disasters' are fairly likely though. For example, an AGI that alters traffic light timing to make crashes occur, or intentionally sabotages things it is supposed to repair. Or even an AGI that is well aligned to the wrong people or moral framework doing things like refusing to allow necessary medical procedures due to expense even when people are willing to use their own money to pay (since it thinks the person is worth less than the cost of the procedure, and thus has negative utility, perhaps.). Alternately, it could predict that the people wanting the procedure were being incoherent, and actually would value their kids getting the money more, but feel like they have to try. Whether this is correct or not, it would still be AGI killing people.

I would actually rate the risk of Tool AI as higher, because humans will be using those to try to defeat other humans, and those could very well be strong enough to notably enhance the things humans are bad at. (And most of the things moderately superhuman AGI could do would be doable sooner with tool AI and an unaligned human.) An AI could help humans design a better virus that is like 'Simian Hemorrhagic Fever', but that affects humans, and doesn't apply to people with certain genetic markers (that denote the ethnicity or other traits of the people making it). Humans would then test, manufacture, distribute, and use it to destroy their enemies. Then oops, it mutates, and hits everyone. This is still a very unlikely doom though.

Comment by deepthoughtlife on An Introduction to Current Theories of Consciousness · 2022-08-30T13:24:14.661Z · LW · GW

Interactionism would simply require an extension of physics to include the interaction between the two, which would not defy physics any more than adding the strong nuclear force did. You can hold against it that we do not know how it works, but that's a weak point because there are many things where we still don't know how they work.

Epiphenomenalism seems irrelevant to me since it is simply a way you could posit things to be. A normal dualist ignores the idea because there is no reason to posit it. We can obviously see how consciousness has effects on the body, so there simply isn't a reason to believe it only goes the other way. Additionally, to me, Epiphenomenalism seems clearly false. Dualism as a whole has never said the body can't have effects on consciousness either.

Causal closure seems unrelated to the actuality of physics. It is simply a statement of philosophical belief. It is one dualists obviously disagree with in the strong version, but that is hardly incompatibility with actual physics. Causal closure is not used to any real effect, and is hard to reconcile with how things seem to actually be. You could argue that causal closure is even denying things like the idea of math, and the idea of physics being things that can meaningfully affect behavior.

Comment by deepthoughtlife on An Introduction to Current Theories of Consciousness · 2022-08-30T13:04:22.979Z · LW · GW

If they didn't accept physical stuff as being (at least potentially) equal to consciousness they actually wouldn't be a dualist. Both are considered real things, and though many have less confidence in the physical world, they still believe in it as a separate thing. (Cartesian dualists do have the least faith in the real world, but even they believe you can make real statements about it as a separate thing.) Otherwise, they would be a 'monist'. The 'dual' is in the name for a reason. 

Comment by deepthoughtlife on An Introduction to Current Theories of Consciousness · 2022-08-29T16:58:23.857Z · LW · GW

This is clearly correct. We know the world through our observations, which clearly occur within our consciousness, and are thus at least equally proving our consciousness. When something is being observed, you can assume that the something else doing the observations must exist. If my consciousness observes the world, my consciousness exists. If my consciousness observes itself, my consciousness exists. If my consciousness is viewing only hallucinations, it still exists for that reason. I disagree with Descartes, but 'I think therefore I am' is true of logical necessity.

I do not like immaterialism personally, but it is more logically defensible than illusionism.

Comment by deepthoughtlife on An Introduction to Current Theories of Consciousness · 2022-08-29T16:49:00.387Z · LW · GW

The description and rejection given of dualism are both very weak. Also, dualism is a much broader group of models than is admitted here.

The fact is, we only have direct evidence of the mind, and everything else is just an attempt to explain certain regularities. An inability to imagine that the mind could be all that exists is clearly just a willful denial, and not evidence, but notably, dualism does not require nor even suggest that the mind is all there is, just that it is all we have proof of (even in the cartesian variant). Thus, dualism.

Your personal refusal to imagine that physicalism is false and dualism is true seems completely irrelevant to whether or not dualism is true. Also, dualism hardly 'defies' physics. In dualism, physics is simply 'under' a meta-physics that includes consciousness as another category, without even changing physics. (If it did defy physics, that would be strong proof against physics since it is literally all of the evidence we actually have, but there is no incompatibility at all.)

Description wise, there are forms of dualism for which you give an incorrect analysis of the 'teletransporter' paradox. Obviously, the consciousness interacts with reality in some way, and there is no proof nor reason in dualism to assume that the consciousness could not simply follow the created version in order to keep interacting with the world.

Mind-body wise, the consciousness certainly attaches to the body through the brain to alter the world, assuming the brain and body are real (which the vast majority of dualists believe). Consciousness would certainly alter brain states if brain states are a real thing.

We also don't know that a consciousness would not attach itself to a 'Chinese Room'.

Your attempts at reasoning have led you astray in other areas too, but I'm more familiar with the ways in which these critiques of dualism are wrong. You seem extremely confident of this incorrect reasoning as well. This seems more like a motivated defense of illusionism than actually laying out the theories correctly.

Comment by deepthoughtlife on What would you expect a massive multimodal online federated learner to be capable of? · 2022-08-29T14:01:01.049Z · LW · GW

I was replying to someone asking why it isn't 2-5 years. I wasn't making an actual timeline. In another post elsewhere on the site, I mention that they could give memory to a system now and it would be able to write a novel.

Without doing so, we obviously can't tell how much planning they would be capable of if we did, but current models don't make choices, and thus can only be scary for whatever people use them for, and their capabilities are quite limited.

I do believe that there is nothing inherently stopping the capabilities researchers from switching over to more agentic approaches with memory and the ability to plan, but it would be much harder than the current plan of just throwing money at the problem (increasing compute and data).

It will require paradigm shifts (I do have some ideas as to ones that might work) to get to particularly capable and/or worrisome levels, and those are hard to predict in advance, but they tend to take a while. Thus, I am a short term skeptic of AI capabilities and danger.

Comment by deepthoughtlife on What would you expect a massive multimodal online federated learner to be capable of? · 2022-08-28T00:31:44.711Z · LW · GW

You're assuming that it would make sense to have a globally learning model, one constantly still training, when that drastically increases the cost of running the model over present approaches. Cost is already prohibitive, and reaching that many parameters any time soon would be exorbitant (but that will probably happen eventually). Plus, the sheer amount of data necessary for such a large one is crazy, and you aren't getting much data per interaction. Note that Chinchilla recently showed that lack of data is a much bigger issue right now for models than lack of parameters, so they probably won't focus on parameter counts for a while.

Additionally, there are many fundamental issues we haven't yet solved for DL-based AI. Even if it was a huge advancement over present models, which I don't believe it would be at that size, it would still have massive weaknesses around remembering, or planning, and would largely lack any agency. That's not scary. It could be used for ill purposes, but not at human (or above) levels.

I'm skeptical of AI in the near term because we are not close. (And the results of scaling are sublinear in many ways. I believe that mathematically, it's a log, though how that transfers to actual results can be hard to guess in advance.)

Comment by deepthoughtlife on What's the Most Impressive Thing That GPT-4 Could Plausibly Do? · 2022-08-28T00:15:22.137Z · LW · GW

You're assuming that the updates are mathematical and unbiased, which is the opposite of how people actually work. If your updates are highly biased, it is very easy to just make large updates in that direction any time new evidence shows up. As you get more sure of yourself, these updates start getting larger and larger rather than smaller as they should.

Comment by deepthoughtlife on Adversarial epistemology · 2022-08-28T00:11:58.018Z · LW · GW

That sort of strategy only works if you can get everyone to coordinate around it, and if you can do that, you could probably just get them to coordinate on doing the right things. I don't know if HR would listen to you if you brought your concerns directly to them, but they probably aren't harder to persuade on that sort of thing than convincing the rest of your fellows to defy HR. (Which is just a guess.) In cases where you can't get others to coordinate on it, you are just defecting against the group, to your own personal loss. This doesn't seem like a good strategy.

In more limited settings, you might be able to convince your friends to debate things in your preferred style, though this depends on them in particular. As a boss, you might be able to set up a culture where people are expected to make strong arguments in formal settings. Beyond these, I don't really think it is practical. (They don't generalize; for instance, as a parent, your child will be incapable of making strong arguments for an extremely long time.)

Comment by deepthoughtlife on Help Understanding Preferences And Evil · 2022-08-28T00:00:48.672Z · LW · GW

That does sound problematic for his views if he actually holds these positions. I am not really familiar with him, even though he did write the textbook for my class on AI (third edition) back when I was in college. At that point, there wasn't much on the now current techniques and I don't remember him talking about this sort of thing (though we might simply have skipped such a section).

You could consider it that we have preferences on our preferences too. It's a bit too self-referential, but that's actually a key part of being a person. You could determine those things that we consider to be 'right' directly from how we act when knowingly pursuing those objectives, though this requires much more insight.

You're right, the debate will keep going on in philosophical style, but whether or not it works as an approach for something other than humans could change that.

Comment by deepthoughtlife on Help Understanding Preferences And Evil · 2022-08-27T14:10:36.794Z · LW · GW

If something is capable of fulfilling human preferences in its actions, and you can convince it to do so, you're already most of the way to getting it to do things humans will judge as positive. Then you only need to specify which preferences are to be considered good in an equally compelling manner. This is obviously a matter of much debate, but it's an arena we know a lot about operating in. We teach children these things all the time.

Comment by deepthoughtlife on What's the Most Impressive Thing That GPT-4 Could Plausibly Do? · 2022-08-26T19:41:34.313Z · LW · GW

It matches his pattern of behavior to freak out about AI every time there is an advance, and I'm basically accusing him of being susceptible to confirmation bias, perhaps the most common human failing even when trying to be rational.

He claims to think AI is bound to destroy us, and literally wrote about how everyone should just give up. (Which I originally thought was for April Fool's Day, but turned out not to be.) He can't be expected to carefully scrutinize the evidence to only give it the weight it deserves, or even necessarily the right sign. If you were to ask the same thing in reverse about a massive skeptic who thought there was no point even caring for the next fifty years, you wouldn't have to have had them quadruple the length of time before to be unimpressed with them doing so next time AI failed to be what people claimed it was.

Comment by deepthoughtlife on What's the Most Impressive Thing That GPT-4 Could Plausibly Do? · 2022-08-26T16:28:30.660Z · LW · GW

If they chose to design it with effective long term memory, and a focus on novels (especially prompting via summary), maybe it could write some? They wouldn't be human level, but people would be interested enough in novels on a whim to match some exact scenario that it could be valuable. It would also be good evidence of advancement, since that is a huge current weakness (the losing track of things).

Comment by deepthoughtlife on What's the Most Impressive Thing That GPT-4 Could Plausibly Do? · 2022-08-26T16:21:04.354Z · LW · GW

But wouldn't that be easy? He seems to take every little advancement as a big deal.

Comment by deepthoughtlife on The Shard Theory Alignment Scheme · 2022-08-25T22:25:00.758Z · LW · GW

I would like to point out that what johnswentworth said about being able to turn off an internal monologue is completely true for me as well. My internal monologue turns itself on and off several (possibly many) times a day when I don't control it, and it is also quite easy to tell it which way to go on that. I don't seem to be particularly more or less capable with it on or off, except on a very limited number of tasks. Simple tasks are easier without it, while explicit reasoning and storytelling are easier with it. I think my default is off when I'm not worried (but I do an awful lot of intentional verbal daydreaming and reasoning about how I'm thinking too).

Comment by deepthoughtlife on [Review] The Problem of Political Authority by Michael Huemer · 2022-08-25T20:13:48.503Z · LW · GW

So the example given to decry a hypothetical, obviously bad situation applies even better to what they're proposing. It's every bit the same coercion as they're decrying, but with less personal benefit and choice (you get nothing out of this deal). And they admit this? This is self-refuting.

Security agencies don't have any more reason to compete on quality than countries do. They actually have less, because they have every bit as much force, and you don't really have any say. What, you're in the middle of a million people with company A security, and you think you can pick B and they'll be able to do anything?

Comment by deepthoughtlife on [Review] The Problem of Political Authority by Michael Huemer · 2022-08-25T17:05:33.133Z · LW · GW

Except that is clearly not real anarchy. It is a balance of power between the states. The states themselves ARE the security forces in this proposal. I'm saying that they would conquer everyone who doesn't belong to one.

Comment by deepthoughtlife on [Review] The Problem of Political Authority by Michael Huemer · 2022-08-25T15:28:05.421Z · LW · GW

Anarchists always miss the argument from logical necessity, which I won't actually make because it is too much effort, but in summary, politics abhors a vacuum. If there is not a formal power you must consent to, there will be an informal one. If there isn't an informal one, you will shortly be conquered.

In these proposals, what is to stop these security forces from simply conquering anyone and everyone that isn't under the protection of one? Nothing. Security forces have no reason to fight each other to protect your right not to belong to one. And they will conquer, since the ones that don't won't grow to keep pace. It is thus the same as the example given of a job offer you can't refuse, except that here the deal offered likely is terrible (since they have no reason to give you a good one).

Why give up a modern, functional government, where you have an actual say, for an ad-hoc, notably violent one where you have no say? I have a lot of problems with the way governments operate, but this isn't better. You can always just support getting rid of or reforming nonfunctional and bad governments, and not be an anarchist.

Comment by deepthoughtlife on Adversarial epistemology · 2022-08-25T13:56:21.421Z · LW · GW

The examples used don't really seem to fit with that though. Blind signatures are things many/most people haven't heard of, and not how things are done; I freely admit I had never heard of them before the example. Your HR department probably shouldn't be expected to be aware of all the various things they could do, as they are ordinary people. Even if they knew what blind signatures were, that doesn't mean it is obvious they should use them, or how to do so even if they thought they should (which you admit). After reading the Wikipedia article, that doesn't seem like an ordinary level of precaution for surveys. (Maybe it should be, but then you need to make that argument, so it isn't a good example for this purpose, in my opinion.)

I also don't blame you for not just trusting the word of the HR department that it is anonymous. But fundamentally speaking, wouldn't you (probably) only have their word that they were using Chaumian blind signatures anyway? You probably wouldn't be implementing the solution personally, so you'd have to trust someone on that score. Even if you did, then the others would probably just have to trust you then. The HR department could be much sneakier about connecting your session to your identity (which they would obviously claim is necessary to prevent multiple voting), but would that be better? It wouldn't make their claim that you will be kept anonymous any more trustworthy.

Technically, you have to trust that the math is as people say it is even if you do it yourself. And the operating system. And the compiler. Even with physical processes, you have to count on things like them not having strategically placed cameras (and that they won't violate the physical integrity of the process).

Math is not a true replacement for trust. There is no method to avoid having to trust people (that's even vaguely worth considering). You just have to hope to pick well, and hopefully sway things to be a bit more trustworthy.

Interestingly, you admit that in your reply, but it doesn't seem to have the effect it seems like it should.

A better example to match your points could be fans of a sports team. They pay a lot of attention to their team, and should be experts in a sense. When asked how good their team is, they will usually say the best (or the worst). When asked why, they usually have arguments that technically should be considered noticeably significant evidence in that direction, but are vastly weaker than they should be able to come up with if it were true. Which is obvious, since there are far more teams said to be the best (or worst) than could actually be the case. In that circumstance, you should be fairly demanding of the evidence.

In other situations though, it seems like a standard that is really easy to have be much stronger against positions you don't like than ones you do, and you likely wouldn't even notice. It is hard to hold arguments you disdain to the same standards as ones you like, even if you are putting in a lot of effort to do so, though in some people it is actually reversed in direction, as they worry too much.

Comment by deepthoughtlife on Adversarial epistemology · 2022-08-25T13:25:42.559Z · LW · GW

I am aware of the excuses used to define it as not hearsay, even though it is clearly the same as all other cases of such. Society simply believes it is a valuable enough scenario that it should be included, even though it is still weak evidence.

Comment by deepthoughtlife on The 'Bitter Lesson' is Wrong · 2022-08-25T13:22:10.119Z · LW · GW

I was pretty explicit that scale improves things and eventually surpasses any particular level that you get to earlier with the help of domain insights. My point is that you can keep helping it, and it will still be better than it would be with just scale. MuZero is just evidence that scale eventually gets you to the place you already were, because they were trying very hard to get there and it eventually worked.

AlphaZero did use domain insights. Just like AlphaGo. It wasn't self-directed. It was told the rules. It was given a direct way to play games, and told to. It was told how to search. Domain insights in the real world are often simply being told which general strategies will work best. Domain insights aren't just things like 'a knight is worth this many points' in chess, or whatever the human-score equivalent is in Go (which I haven't played). Humans tweaked and altered things until they got the results they wanted from training. If they understood that they were doing so, and accepted it, they could get better results sooner, and much more cheaply.

Also, state of the art isn't the best that can be done.

Comment by deepthoughtlife on Adversarial epistemology · 2022-08-24T21:51:57.405Z · LW · GW

The thing is, no one ever presents the actual strongest version of an argument. Their actions are never the best possible, except briefly, accidentally, and in extremely limited circumstances. I can probably remember how to play an ideal version of the tic-tac-toe strategy that's the reason only children play it, but any game more complicated than that and my play will be subpar. Games are much smaller and simpler things than arguments. Simply noticing that an argument isn't the best it could be is a you thing, because it is always true. Basically no one is a specialist in whatever the perfect argument turns out to be (and people who are will often be wrong). Saying that a correct argument that significantly changes likelihoods isn't real evidence because it could be stronger allows you to always stick with your current beliefs.

Also, if I was a juror, I would like to hear that the accused was overheard telling his buddy that he was out of town the night before, having taken a trip to the city where the murder he is accused of happened. Even though just being one person in that city is an incredibly weak piece of evidence, and it is even weaker for being someone uninvolved in the conversation saying it, it is still valuable to include. (And indeed, such admissions are not considered hearsay in court, even though they clearly are.) There are often cases where the strongest argument alone is not enough, but the weight of all arguments clearly is.

Comment by deepthoughtlife on Nate Soares' Life Advice · 2022-08-23T15:11:05.690Z · LW · GW

Even if they were somehow extremely beneficial normally (which is fairly unlikely), any significant risk of going insane seems much too high. I would posit they have such a risk for exactly the same reason: when using them, you are deliberately routing around very fundamental safety features of your mind.

Comment by deepthoughtlife on What if we solve AI Safety but no one cares · 2022-08-22T22:16:36.186Z · LW · GW

Donald Hobson has a good point about goodharting, but it's simpler than that. While some people want alignment so that everyone doesn't die, the rest of us still want it for what it can do for us. If I'm prompting a language model with "A small cat went out to explore the world" I want it to come back with a nice children's story about a small cat that went out to explore the world that I can show to just about any child. If I prompt a robot that I want it to "bring me a nice flower" I do not want it to steal my neighbor's rosebushes. And so on, I want it to be safe to give some random AI helper whatever lazy prompt is on my mind and have it improve things by my preferences.

Comment by deepthoughtlife on If you know you are unlikely to change your mind, should you be lazier when researching? · 2022-08-22T13:13:43.243Z · LW · GW

Stop actively looking (though keep your ears open) when you have thoroughly researched two things:

First, the core issues that could change your mind about who you'd think you should vote for. This is not about the candidates themselves.

Then, the candidates or questions on the ballot themselves. For candidates: Are they trustworthy? Where do they fall on the issues important to you? Do they implement these issues properly? Will they get things done? Are there issues they bring up that you would have included? If so, look at those issues without referencing the candidates themselves, then go over the candidates with the new issue too.

You will not know everything at the end of this. You may very well run out of time. If you do, honestly query your own state of knowledge. Even if you don't know everything, is it enough to make an informed decision that would be better than letting it be decided by a slim majority of your fellows? (Over time, you'll get a better sense of when this is, and not need to approach it formally.)

If it is, do it. If it isn't, then don't vote; there's nothing shameful about deciding you don't know enough yet (and start researching further in advance in the future). Voting is an important duty of citizenship, but better to not do it than do it in ways where you are likely to contribute wrongly.

Comment by deepthoughtlife on If you know you are unlikely to change your mind, should you be lazier when researching? · 2022-08-21T18:31:33.447Z · LW · GW

That seems like the wrong take away to me. Why do we change our minds so little?

1.) Some of our positions are just right.

2.) We are wrong, but we don't take the time and effort to understand why we should change our minds.

We don't know which situation we are in beforehand, but if you change your mind less than you think you do, doesn't that mean you think you are often wrong? And that you are wrong about how you respond to it?

You could try to figure out what possible things would get you to change your mind, and look directly at those things, trying to fully understand whether they are or are not the way that would change things.

Comment by deepthoughtlife on How evolution succeeds and fails at value alignment · 2022-08-21T16:02:20.657Z · LW · GW

This is well written, easy to understand, and I largely agree that instilling a value like love for humans in general (as individuals) could deal with an awful lot of failure modes. It does so amongst humans already (though far from perfectly).

When there is a dispute, rather than optimizing over smiles in a lifetime (a proxy of long-term happiness), something more difficult is obviously preferable. For instance, if the versions of the person in both the world where it did happen and the world where it did not would end up agreeing that it is better to have happened, and that it would have been better to force the issue, then it might make sense to override the human's current preference. Since the future is not known, such determinations are obviously probabilistic, but the thresholds should be set quite high. The vast majority of adults agree that as a child, they should have gotten vaccinations for many diseases, so the probability that the child would later agree is quite high.

Smiles in a lifetime is a great proxy for what an aligned intelligence, artificial or not, should value getting for those it loves when multiple actions are within acceptable bounds, either because of the above or because of the current preferences of that person and their approval in the world where it happens.

Two out of three versions of the person approving is only complicated in worlds where it is the one where it happens that is the one that would disapprove.

Comment by deepthoughtlife on Embracing the Opposition's Point · 2022-08-21T15:41:58.802Z · LW · GW

Understanding what parts of an argument you dislike are actually something you can agree with seems like a valuable thing to keep in mind. The post is well written and easy to understand too. I probably won't do this any more than I already do though.

What I try to do isn't so different, just less formal. I usually simply agree or disagree directly on individual points that come up through trying to understand things in general. I do not usually keep in mind what the current score of agreement or disagreement is, and that seems to help not skew things too much. I do feel no pressure to just ignore the parts I disagree with for a while though.

I think your descriptions of reasons in favor of social norms are very well reasoned. How much does the original sound like your version of the argument? Is this their argument in your words, or your related argument?

Whether or not you should have included your bit on 'weirdos' based on your rules, I think it was good analysis on them too.

I'm not personally a conservative (independent, relatively centrist but unusual politics, and I would actually self identify as a weirdo), but I think that one of the biggest problems in my country (America) is that the people trying to change society don't put in the effort to figure out the reasons why things should stay the same, thus completely destroying any cost-benefit analysis of the policies they propose. Often the policies could be made much better and much more practical with just a little understanding of it. Thus I usually have little choice in what politicians I support.

There are an awful lot of reforms that could make things better, but instead we focus entirely on ones that do not, because the reformers don't bother to know they're doing so, and wild changes are more interesting to think about (to me as well). We should reform the whole not knowing the reasons to avoid reforms thing first.

To the extent the reformers actually listen to actual conservatives and understand their reasons, that does help.

Comment by deepthoughtlife on The 'Bitter Lesson' is Wrong · 2022-08-21T13:21:16.159Z · LW · GW

The entire thing I wrote is that marrying human insights, tools, etc. with the scale increases leads to higher performance, and shouldn't be discarded, not that you can't do better with a crazy amount of resources than a small amount of resources and human insight.

Much later, with much more advancement, things improve. Two years after the things AlphaGo was famous for happened, they used scale to surpass it, without changing any of the insights. Generating games against itself is not a change to the fundamental approach in a well defined game like Go. Simple games like Go are very well suited to the brute-force approach. It isn't in the post, but this is more akin to using programs to generate math data for a network you want to know math. We could train a network on an infinite amount of symbolic math because we have easy generative programs, limited only by the cost we wanted. We could also just give those programs to an AI, and train it to use them. This is identical to what they did for AlphaZero. AlphaZero still uses the approach humans decided upon, not reinventing things on its own.

Massive scale increases surpassing the achievements of earlier things is not something I argue against in the above post. Not using human data is hardly the same thing as not using human domain insights.

It isn't until 2 years after AlphaZero that they managed to make a version that actually learned how to play it on its own with MuZero. Given the scale rate increases in the field during that time, it's hardly interesting that eventually it happened, but the scaling required an immense increase in money in the field in addition to algorithmic improvements.

Comment by deepthoughtlife on The 'Bitter Lesson' is Wrong · 2022-08-20T20:42:03.491Z · LW · GW

I probably should have included that or an explicit definition in what I wrote.

Comment by deepthoughtlife on The 'Bitter Lesson' is Wrong · 2022-08-20T20:40:22.761Z · LW · GW

Yes, though I'm obviously arguing against what I think it means in practice, and how it is used, not necessarily how it was originally formulated. I've always thought it was the wrong take on AI history, tacking much too hard toward scale based approaches, and forgetting that the successes of other methods could be useful too, as an over-correction from when people made the other mistake.

Comment by deepthoughtlife on The 'Bitter Lesson' is Wrong · 2022-08-20T20:31:37.222Z · LW · GW

I think your response shows I understood it pretty well. I used an example that you directly admit is against what the bitter lesson tries to teach as my primary example. I also never said anything about being able to program something directly better.

I pointed out that I used the things people decided to let go of so that I could improve the results massively over the current state of machine translation for my own uses, and then implied we should do things like give language models dictionaries and information about parts of speech that they can use as a reference or starting point. We can still use things as an improvement over pure deep learning, by simply letting the machine use them as a reference. It would have to be trained to do so, of course, but that seems relatively easy.

The bitter lesson is about 'scale is everything,' but AlphaGo and its follow-ups use massively less compute to get up to those levels! Their search is not an exhaustive one, but a heuristic one that requires very little compute comparatively. Heuristic searches are less general, not more. It should be noted that I only mentioned AlphaGo to show that even it wasn't a victory of scale like some people commonly seem to believe. It involved taking advantage of the fact that we know the structure of the game to give it a leg up.
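
To make the exhaustive-versus-heuristic contrast concrete, here is a toy sketch. It is not AlphaGo's method and the game is not from this thread: it uses a simple subtraction game (take 1 to 3 stones, whoever takes the last stone wins) as an assumed stand-in, and the function names are made up for illustration. The depth-limited search stops early and falls back on a piece of knowledge about the game's structure (piles that are multiples of 4 lose for the player to move).

```python
def exhaustive(stones, counter):
    """True if the player to move wins; searches every line of play to the end."""
    counter[0] += 1
    if stones == 0:
        return False  # the previous player took the last stone and won
    return any(not exhaustive(stones - k, counter) for k in (1, 2, 3) if k <= stones)


def depth_limited(stones, depth, counter):
    """Same search, but cut off at `depth` and replaced by a heuristic evaluation."""
    counter[0] += 1
    if stones == 0:
        return False
    if depth == 0:
        # Domain insight about this toy game: multiples of 4 lose for the mover.
        return stones % 4 != 0
    return any(
        not depth_limited(stones - k, depth - 1, counter)
        for k in (1, 2, 3)
        if k <= stones
    )


full_nodes, limited_nodes = [0], [0]
full_answer = exhaustive(20, full_nodes)
limited_answer = depth_limited(20, 2, limited_nodes)
print(full_answer, limited_answer, full_nodes[0], limited_nodes[0])
```

Both searches agree on who wins, but the one that is told something about the structure of the game visits far fewer positions. The trade-off is that the heuristic only works for this game, which is the sense in which heuristic searches are less general.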

Comment by deepthoughtlife on No One-Size-Fit-All Epistemic Strategy · 2022-08-20T13:56:41.170Z · LW · GW

In a lot of ways, this is similar to the 'one weird trick' style of marketing so many lampoon. Assuming that you summarized Kuhn and Feyerabend correctly, it looks like: one weird trick to solve all of science, Popper: 'just falsify things'; Kuhn: 'just find Kantian paradigms for the field'; Feyerabend: 'just realize the only patterns are that there aren't any patterns.'

People like this sort of thing because it's easy to understand (and I just made the same sort of simplification for this sentence). Science is hard to get right, and people can only keep in mind a few factors at a time, literally, since our working memory is quite small. Splitting things up usefully and then grinding away at all of those sub-problems is quite tricky too, and not especially motivating.

Long-term memory isn't the most reliable either. I'd say write it down, but then, there's nothing that prevents thought like checklists...and other forms of it aren't clear, so...?

Nothing can be done besides continually striving to become better, knowing that it is a temptation to stop and just use the shiniest tool that seems to work.

Comment by deepthoughtlife on David Udell's Shortform · 2022-08-20T13:36:04.144Z · LW · GW

My memories of childhood aren't that precise. I don't really know what my childhood state was? Before certain extremely negative things happened to my psyche, that is. There are only a few scattered pieces I recall, like self-sufficiency and honesty being important, but these are the parts that already survived into my present political and moral beliefs.

The only thing I could actually use is that I was a much more orderly person when I was 4 or 5, but I don't see how it would work to use just that.

Comment by deepthoughtlife on Against population ethics · 2022-08-20T10:43:32.527Z · LW · GW

I can't say I'm surprised a utilitarian doesn't realize how vague it sounds? It is jargon taken from a word that simply means ability to be used widely? Utility is an extreme abstraction, literally unassignable, and entirely based on guessing. You've straightforwardly admitted that it doesn't have an agreed upon basis. Is it happiness? Avoidance of suffering? Fulfillment of the values of agents? Etc.

Utilitarians constantly talk about monetary situations, because that is one place they can actually use it and get results? But there, it's hardly different than ordinary statistics. Utility there is often treated as a simple function of money but with diminishing returns. Looking up the term for the kind of utility you mentioned, it seems to once again only use monetary situations as examples, and sources claimed it was meant for lotteries and gambling.
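
For readers who haven't seen the monetary version, here is a minimal sketch of what "a simple function of money with diminishing returns" usually looks like. The choice of a logarithmic utility function and the specific dollar amounts are assumptions picked for illustration, not something taken from the sources mentioned above.

```python
import math


def utility(wealth):
    """Assumed utility function: logarithmic, so each extra dollar matters less."""
    return math.log(wealth)


def expected_utility(lottery):
    """lottery: list of (probability, final wealth) pairs."""
    return sum(p * utility(w) for p, w in lottery)


# A coin flip between 50,000 and 150,000 versus a guaranteed 100,000.
gamble = [(0.5, 50_000), (0.5, 150_000)]
sure_thing = [(1.0, 100_000)]

eu_gamble = expected_utility(gamble)
eu_sure = expected_utility(sure_thing)

# The guaranteed amount that this utility function rates as equal to the gamble.
certainty_equivalent = math.exp(eu_gamble)

print(eu_sure > eu_gamble, round(certainty_equivalent))
```

Both options have the same expected amount of money, yet the guaranteed one scores higher. That is all "utility" amounts to in this setting: ordinary expected-value statistics run through a curved function.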

Utility as a term makes sense there, but is the only place where your list has general agreement on what utility means? That doesn't mean it is a useless term, but it is a very vague one.

Since you claim there isn't agreement on the other aspects of the theories, that makes them more of an artificial category where the adherents don't really agree on anything. The only real connection seems to be wanting to do math on how good things are?

Comment by deepthoughtlife on Against population ethics · 2022-08-20T02:15:14.200Z · LW · GW

You probably don't agree with this, but if I understand what you're saying, utilitarians don't really agree on anything or really have shared categories? Since utility is a nearly meaningless word outside of context due to broadness and vagueness, and they don't agree on anything about it, Utilitarianism shouldn't really be considered a thing itself? Just a collection of people who don't really fit into the other paradigms but don't rely on pure intuitions. Or in other words, pre-paradigmatic?

Comment by deepthoughtlife on The Core of the Alignment Problem is... · 2022-08-19T01:18:26.394Z · LW · GW

The easy first step is a simple bias toward inaction, which you can provide with a large punishment per output of any kind. For instance, a language model with this bias would write out something extremely likely, and then stop quickly thereafter. This is only a partial measure, of course, but it is a significant first step.
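
A minimal sketch of that first step, under the assumption that there is already some task reward to subtract from. The function name, the penalty size, and the candidate numbers are all made up for illustration.

```python
def penalized_score(task_reward, num_outputs, penalty_per_output=0.5):
    """Task reward minus a flat charge for every output produced."""
    return task_reward - penalty_per_output * num_outputs


# Hypothetical candidates: (task reward, number of outputs produced).
candidates = {
    "short answer": (3.0, 2),
    "long answer": (4.0, 10),
}

best = max(candidates, key=lambda name: penalized_score(*candidates[name]))
print(best)
```

The longer candidate earns more task reward here, but the per-output charge more than cancels it, so the shorter one is preferred.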

The second through n-th steps are harder. I really don't even know how you figure out what values to try to train it with to reduce impact. The immediate things I can think of might also train deceit, so it would take some thought.

Also, across the time period of training, ask a panel (many separate panels) of judges to determine whether the actions it is promoting for use in hypothetical situations or games were the minimal actions it could have taken for the level of positive impact. Obviously, if the impact is negative, it wasn't the minimal action. Perhaps also train a network explicitly on the decisions of similar panels on such actions humans have taken, and use those same criteria.

Somewhere in there, best place unknown, penalize heavy use of computation in coming up with plans (though perhaps not with evaluating them).

Final step (and perhaps at other stages too), penalize any actions taken that humans don't like. This can be done in a variety of ways. For instance, have 3 random humans vote on each action it takes, and for each person that dislikes the action, give it a penalty.
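
The voting example could be sketched like this. It assumes a pool of judges represented as functions that return True when they approve of an action; the names, the pool, and the penalty size are hypothetical.

```python
import random


def vote_penalty(action, judges, rng, panel_size=3, penalty_per_dislike=1.0):
    """Draw a random panel and charge a penalty for each judge who dislikes the action."""
    panel = rng.sample(judges, panel_size)
    dislikes = sum(1 for approves in panel if not approves(action))
    return penalty_per_dislike * dislikes


def ordinary_judge(action):
    """Hypothetical stand-in for a human judge: approves of anything but theft."""
    return "steal" not in action


judges = [ordinary_judge] * 5
rng = random.Random(0)  # seeded only so the sketch is repeatable

print(vote_penalty("steal the neighbor's rosebushes", judges, rng))
print(vote_penalty("buy a flower", judges, rng))
```

In practice the judges would be real people rather than a fixed rule, and their disagreements are what would make the penalty informative.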

Comment by deepthoughtlife on Conservatism is a rational response to epistemic uncertainty · 2022-08-19T00:59:20.267Z · LW · GW

You seem to have straw-manned your ideological opponents. Your claims are neither factually accurate, nor charitable. They don't point you in a useful direction either. Obviously, conservatives can be very wrong, but your assumptions seem unjustified.

And what are you going to claim your opponents do? In the US, rightists claim they will lower taxes, which they do. They claim they will reduce regulation, which they do. They claim they will turn back whatever they deem the latest outrage...which has mixed results. They explain why all of this is good by referencing the lessons of history, and by simply pointing out their opponents' positions whenever those are unpopular. Politicians are politicians, and hardly trustworthy on such things, but (American) rightist ones pay more attention to how ideas have failed in the past, at least in how they talk and the occasional policy.

Their politicians only put in moderate effort to be conservative...but their opponents won't admit conservatism can ever be a good thing. Much like you're doing here. These days, pretty much all centrism is considered too conservative to be considered on a national level by leftists, one of just two major parties. So obviously, conservatives will vote for rightists.

Why then, do you think you can simply take one person from a long time ago, from one country, in this case the UK (when it has previously been pointed out that the manifestations clearly vary based on time and place), claim they are immodest, and that means much of anything against conservatism? Especially when they turned out to be clearly right by massively improving all those things with their policies. You never examined her evidence, reasoning, or results in any way, just threw stones.

Having strong beliefs is not immodest if the evidence is strong enough. Plus, you know, she was a politician. Sound-bites are a big thing for them.

Conservatism isn't actually about politics. Conservatives vote for people whose policies and/or character are, in their personal belief, likely to do things supported by the weight of history and caution, not simply the people who are themselves the best incarnation of those things. In many cases, they use heuristics that lead to assuming untested things don't work (which is usually right). Leftists often want proof their latest thing won't work before discarding it, and don't provide significant evidence to these conservatives that they will work. Also, conservatives are just people, and they care about many things and have many inclinations that are unrelated to their conservatism.

Comment by deepthoughtlife on The Core of the Alignment Problem is... · 2022-08-18T13:47:39.374Z · LW · GW

I had a thought while reading the section on Goodharting. You could fix a lot of the potential issues with an agentic AI by training it to want its impact on the world to be within small bounds. Give it a strong and ongoing bias toward 'leave the world the way I found it.' This could only be overcome by a very clear and large benefit toward its other goals per small amount of change. It should not be just part of the decision making process though, but part of the goal state. This wouldn't solve every possible issue, but it would solve a lot of them. In other words, make it unambitious and conservative, and then its interventions will be limited and precise if it has a good model of the world.
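
As a toy sketch of the idea (with a made-up scoring rule and made-up numbers, and ignoring the hard problem of actually measuring impact), the agent's objective itself can charge heavily for change, so that doing nothing wins unless the benefit per unit of change is large:

```python
def choose_action(actions, impact_weight=10.0):
    """Pick the action with the best benefit after a heavy charge for changing the world.

    actions: list of (name, benefit toward other goals, size of change to the world).
    """
    def score(action):
        _, benefit, impact = action
        return benefit - impact_weight * impact

    return max(actions, key=score)[0]


# Hypothetical options; "do nothing" has zero benefit and zero impact.
options = [
    ("do nothing", 0.0, 0.0),
    ("small precise fix", 5.0, 0.2),
    ("sweeping change", 20.0, 5.0),
]

print(choose_action(options))
print(choose_action([options[0], options[2]]))
```

The sweeping change has the largest raw benefit but loses to both alternatives, while the small fix clears the bar because its benefit is large relative to how little it changes.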

Comment by deepthoughtlife on Why are politicians polarized? · 2022-08-17T19:23:00.204Z · LW · GW

The obvious thing to keep in mind is that people dislike 'inauthentic' politicians but there are many things the people want in politicians. If a politician wants to portray a certain image to match what people like, they have to pick the policy positions that go along with it to avoid said label. Said images are not so evenly or finely distributed as policy positions, so that tends to lead to significant differences along these axes.

Then, over time, the parties accrete their own new images too, and people have to position themselves in relation to those too...