Posts

cubefox's Shortform 2024-05-26T19:07:19.386Z
Is LLM Translation Without Rosetta Stone possible? 2024-04-11T00:36:46.568Z
Are Intelligence and Generality Orthogonal? 2022-07-18T20:07:44.694Z

Comments

Comment by cubefox on "AI achieves silver-medal standard solving International Mathematical Olympiad problems" · 2024-07-26T02:58:47.209Z · LW · GW

I don't understand what you mean here. How would merely seeing conjectures help with proving them? You arguably need to see many example proofs of example conjectures. Otherwise it would be like expecting a child who has never seen a proof to learn to prove conjectures merely by being shown a lot of conjectures.

Comment by cubefox on Linch's Shortform · 2024-07-26T02:45:08.323Z · LW · GW

Conceivability is not invoked for logical statements, or mathematical statements about abstract objects. But zombies seem to be concrete rather than abstract objects. Similar to pink elephants. It would be absurd to conjecture that pink elephants are mathematically impossible. (More specifically, both physical and mental objects are typically counted as concrete.) It would also seem strange to assume that elephants being pink is logically impossible. Or things being faster than light. These don't seem like statements that could hide a logical contradiction.

Comment by cubefox on Alexander Gietelink Oldenziel's Shortform · 2024-07-26T02:20:28.249Z · LW · GW

See also the Past Hypothesis. If we instead take a non-speculative starting point, namely now, we could no longer trust our memories, including any evidence we believe to have about the entropy of the past being low, or about physical laws stating that entropy increases with distance from that starting point. David Albert therefore says doubting the Past Hypothesis would be "epistemically unstable".

Comment by cubefox on "AI achieves silver-medal standard solving International Mathematical Olympiad problems" · 2024-07-26T01:43:12.748Z · LW · GW

The diagram actually says it uses the AlphaZero algorithm, which obviously doesn't involve an LLM.

Comment by cubefox on "AI achieves silver-medal standard solving International Mathematical Olympiad problems" · 2024-07-26T01:40:01.802Z · LW · GW

Well, what kind of problem do you think will help with learning how to prove things, if not proofs? AlphaGo & co learned to play games by being trained with games. And AlphaProof uses the AlphaZero algorithm.

Comment by cubefox on "AI achieves silver-medal standard solving International Mathematical Olympiad problems" · 2024-07-25T23:17:03.848Z · LW · GW

AlphaProof works in the "obvious" way: an LLM generates candidate next steps which are checked using a formal proof-checking system, in this case Lean.

I don't think this is true, actually. Look at the diagram of AlphaProof’s reinforcement learning training loop. The proof part ("solver network") seems to be purely RL based; it even uses the same algorithm as AlphaZero. (The article actually describes AlphaProof as "a new reinforcement-learning based system".)

The contribution of the LLM to AlphaProof seems to be only in providing the left part of the diagram (the "formalizer network"). Namely translating, and somehow expanding, one million human-written proofs into 100 million formal Lean proofs. I'm not sure how the LLM increases the number of proofs by 100x, but I assume for each successfully formalized human-written proof it also generates 100 (simple?) synthetic variations. This interpretation is also corroborated by this statement:

We established a bridge between these two complementary spheres by fine-tuning a Gemini model to automatically translate natural language problem statements into formal statements, creating a large library of formal problems of varying difficulty.

So AlphaProof is actually a rather narrow RL system, not unlike the original AlphaGo. The latter was also bootstrapped on human data (expert Go games), similar to how AlphaProof uses (formalized) human proofs for initial training. (Unlike AlphaGo and AlphaProof, AlphaGo Zero and the original AlphaZero did not rely on any non-synthetic / human training data.)
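To make this division of labor concrete, here is a purely schematic sketch of how I understand the loop (all names are made up; this is not DeepMind's actual code):

```python
# Purely schematic sketch of AlphaProof's setup as I read the diagram;
# every name here is hypothetical, none of this is DeepMind's implementation.

def formalize(informal_problems, formalizer_llm):
    """LLM part (left side of the diagram): translate human-written
    problems into formal Lean statements."""
    return [formalizer_llm.translate(p) for p in informal_problems]

def rl_training_loop(formal_problems, solver_network, lean):
    """RL part (right side): AlphaZero-style self-improvement; no LLM here."""
    for problem in formal_problems:
        candidates = solver_network.search(problem)   # propose candidate proofs
        proofs = [c for c in candidates if lean.verify(problem, c)]
        solver_network.reinforce(problem, proofs)     # verified proofs are the
                                                      # training signal, like won
                                                      # self-play games in AlphaZero
```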

So, if LLMs like GPT-4 and Gemini are considered to be fairly general AI systems (some even consider them AGI), while systems like AlphaGo or AlphaZero are regarded as narrow AI -- then AlphaProof counts as narrow AI. Which contradicts views like that of Dwarkesh Patel, who thought solving IMO math problems may be AGI-complete, and confirms the opinion of people like Grant Sanderson (3Blue1Brown), who thought that training on and solving formal math proofs is formally very similar to AlphaGo solving board games.

That being said, unlike AlphaProof, AlphaGeometry 2, which solved the one geometry problem, is not RL based. It does indeed use an LLM (a Gemini-like model trained on large amounts of synthetic training data) when coming up with proofs, though it works in tandem with a non-ML symbolic deduction engine. Such systems are largely based on brute-force search, similar to how Deep Blue played chess. So AlphaGeometry is simultaneously more general (LLM) and less general (symbolic deduction engine) than AlphaProof (an RL system).

Comment by cubefox on Trying to understand Hanson's Cultural Drift argument · 2024-07-23T03:08:45.111Z · LW · GW

There are objective measures on which cultures are "better" from an outside perspective. For example, poverty refugees migrate to richer countries, while bringing with them the traits that made their countries of origin poor and high in fertility, and gradually replacing the gene pool and culture of the host country. Here is a recent post on these issues. By the way, I believe Hanson is often deliberately general and unspecific when he doesn't want to step outside the Overton window. The linked post has fewer such qualms. Unfortunately, being blunt about this issue is upsetting and is often socially punished.

Comment by cubefox on Open Thread Summer 2024 · 2024-07-22T08:25:09.363Z · LW · GW

I assume it wasn't this old post?

Comment by cubefox on eggsyntax's Shortform · 2024-07-21T01:18:53.872Z · LW · GW

This seems to be inspired by the library/framework distinction in software engineering:

Inversion of Control is a key part of what makes a framework different to a library. A library is essentially a set of functions that you can call, these days usually organized into classes. Each call does some work and returns control to the client.

A framework embodies some abstract design, with more behavior built in. In order to use it you need to insert your behavior into various places in the framework either by subclassing or by plugging in your own classes. The framework's code then calls your code at these points. (source)

Your code calls the library; the framework calls your code ≈ The LLM calls the tool; the scaffolding calls the LLM.
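A minimal code sketch of the analogy (hypothetical example):

```python
# Library pattern: my code keeps control and calls the library,
# analogous to the LLM deciding to call a tool.
import json

def my_code(data):
    return json.dumps(data)  # I call the library; control returns to me

# Framework pattern: the framework keeps control and calls my code at
# predefined points, analogous to scaffolding calling the LLM.
class Scaffold:
    def __init__(self, plugin):
        self.plugin = plugin          # my code, plugged into the framework

    def run(self, inputs):
        return [self.plugin(x) for x in inputs]  # the framework calls my code

print(my_code({"a": 1}))                     # library style
print(Scaffold(str.upper).run(["a", "b"]))   # framework style
```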

Comment by cubefox on quetzal_rainbow's Shortform · 2024-07-19T11:48:35.328Z · LW · GW

I actually think it's still an inner alignment failure -- even if the preference data was biased, drawing such extreme conclusions is hardly an appropriate way to generalize from it. Especially because the base model has a large amount of common sense, which should have helped with giving a sensible response, but apparently it didn't.

Though it isn't clear which part is misaligned when RLHF is inner misaligned -- RLHF is a two-step training process. Preference data are used to train a reward model, and the reward model in turn creates synthetic preference data which is used to fine-tune the base LLM. There can be misalignment if the reward model misgeneralizes the human preference data, or if the base model fine-tuning method misgeneralizes the data provided by the reward model.
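A toy sketch of this two-step structure (all names and numbers made up; nothing here resembles a real RLHF implementation, it only shows where misgeneralization can enter):

```python
# Toy illustration of the two training steps described above.

def train_reward_model(preference_pairs):
    """Step 1: learn a scoring rule from human comparisons.
    Misgeneralization of the human data can happen here."""
    scores = {}
    for preferred, rejected in preference_pairs:
        scores[preferred] = scores.get(preferred, 0) + 1
        scores[rejected] = scores.get(rejected, 0) - 1
    return lambda response: scores.get(response, 0)

def finetune_policy(candidate_responses, reward_model):
    """Step 2: the model is tuned against the reward model's judgments,
    not against the raw human data. Misgeneralization can also enter here."""
    return max(candidate_responses, key=reward_model)

reward = train_reward_model([("polite", "rude"), ("polite", "evasive")])
print(finetune_policy(["rude", "polite", "evasive"], reward))  # -> "polite"
```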

Regarding the scissor statements -- that seems more like a failure to refuse a request to produce such statements, similar to how the model should have refused to answer the past tense meth question above. Giving the wrong answer to an ethical question is different.

Comment by cubefox on quetzal_rainbow's Shortform · 2024-07-19T09:56:10.785Z · LW · GW

For me, the most concerning example is still this (I assume it got downvoted for mind-killed reasons.)

There is a difference between RLHF failures in ethical judgement and jailbreak failures, but I'm not sure whether the underlying "cause" is the same.

Comment by cubefox on Alignment: "Do what I would have wanted you to do" · 2024-07-19T02:34:06.597Z · LW · GW

For moral realism to be true in the sense which most people mean when they talk about it, "good" would have to have an observer-independent meaning. That is, it would have to not only be the case that you personally feel that it means some particular thing, but also that people who feel it to mean some other thing are objectively mistaken, for reasons that exist outside of your personal judgement of what is or isn't good.

That would only be a case of ambiguity (one word used with two different meanings). If by "good" you mean the same thing people usually mean by "chair", this doesn't imply anti-realism, just likely misunderstandings.

Assume you are a realist about rocks, but call them trees. That wouldn't be a contradiction. Realism has nothing to do with "observer-independent meaning".

For a belief to pay rent, it should not only predict some set of sensory experiences but predict a different set of sensory experiences than would a model not including it.

This doesn't make sense. A model doesn't have beliefs, and if there is no belief, there is nothing it (the belief) predicts. Instead, for a belief to "pay rent" it is necessary and sufficient that it makes different predictions than believing its negation.

If you call increasing-welfare "good" and I call honoring-ancestors "good", our models do not make different predictions about what will happen, only about which things should be assigned the label "good". That is what it means for a belief to not pay rent.

Compare:

If you call a boulder a "tree" and I call a plant with a woody trunk a "tree", our models do not make different predictions about what will happen, only about which things should be assigned the label "tree". That is what it means for a belief to not pay rent.

Of course our beliefs pay rent here, they just pay different rent. If we both express our beliefs with "There is a tree behind the house" then we have just two different beliefs, because we expect different experiences. Which has nothing to do with anti-realism about trees.

Comment by cubefox on Hiding in plain sight: the questions we don't ask · 2024-07-16T00:51:05.121Z · LW · GW

Unfortunately this post seems too long and unorthodox (convoluted?) to attract readers. Writing something shorter and simpler could get more engagement.

Comment by cubefox on Alignment: "Do what I would have wanted you to do" · 2024-07-14T21:06:19.323Z · LW · GW

But why do you care about the concept of a bachelor? What makes you pick it out of the space of ideas and concepts as worthy of discussion and consideration?

Well, "bachelor" was just an example of a word for which you don't know the meaning, but want to know the meaning. The important thing here is that it has a meaning, not how useful the concept is.

But I think you actually want to talk about the meaning of terms like "good". Apparently you now concede that they are meaningful (are associated with anticipated experiences) and instead claim that the concept of "good" is useless. That is surprising. There is arguably nothing more important than ethics; than the world being in a good state or trajectory. So it is obvious that the term "good" is useful. Especially because it is exactly what an aligned superintelligence should be targeted at. After all, it's not an accident that EY came up with extrapolated volition as an ethical theory for solving the problem of what a superintelligence should be aligned to. An ASI shouldn't do bad things and should do good things, and the problem is making the ASI care for being good rather than for something else, like making paperclips.

Regarding the SEP quote: It doesn't argue that moral internalism is part of moral realism, which was what you originally were objecting to. But we need not even use the term "moral realism", we only need the claim that statements on what is good or bad have non-trivial truth values, i.e. aren't purely subjective, or mere expressions of applause, or meaningless, or the like. This is a semantic question about what terms like "good" mean.

Comment by cubefox on Alignment: "Do what I would have wanted you to do" · 2024-07-14T19:13:44.075Z · LW · GW

Yes, knowing that something is (in the moral-cognitivist, moral-realist, observer-independent sense) "good" allows you to anticipate that it... fulfills the preconditions of being "good" (one of which is "increased welfare", in this particular conception of it). At a conceptual level, that doesn't provide you relevant anticipated experiences that go beyond the category of "good and everything it contains"; it doesn't constrain the territory beyond statements that ultimately refer back to goodness itself. It holds the power of anticipated experience only in so much as it is self-referential in the end, which doesn't provide meaningful evidence that it's a concept which carves reality at the joints.

I disagree with that. When we expect something to be good, we have some particular set of anticipated experiences (e.g. about increased welfare, extrapolated desires) that are consistent with our expectation, and some other set that is inconsistent with it. We do not merely "expect" a tautology, like "expecting" that good things are good (or that chairs are chairs etc). We can see this by the fact that we may very well see evidence that is inconsistent with our expectation, e.g. evidence that something instead leads to suffering and thus doesn't increase welfare, and hence isn't good. Believing something to be good therefore pays rent in anticipated experiences.

Moreover, we can wonder (ask ourselves) whether some particular thing is good or not (like e.g., recycling plastic), and this is not like "wondering" whether chairs are chairs. We are asking a genuine question, not a tautological one.

When Seth Herd questioned what you meant by good and "moral claims", you said that you "don't think anyone needs to define what words used in ordinary language mean."

To be clear, what I said was this: "I don't think anyone needs to define what words used in ordinary language mean because the validity of any attempt of such a definition would itself have to be checked against the intuitive meaning of the word in common usage."

But if the only way it does that is because it then allows you to claim that "X fulfills the conditions of membership", then this is not a useful category.

I think I have identified the confusion here. Assume you don't know what "bachelor" means, and you ask me which evidence I associate with that term. And I reply: If I believe something is a bachelor, I anticipate evidence that confirms that it is an unmarried man. Now you could reply that this is simply saying "'bachelor' fulfills the conditions of membership". But no, I have given you a non-trivial definition of the term, and if you already knew what "unmarried" and "man" meant (what evidence to expect if those terms apply), you now also know what to anticipate for "bachelor" -- what the term "bachelor" means. Giving a definition for X is not the same as merely saying "X fulfills the conditions of membership".

If moral realism is viewed through the lens mentioned by Roko, which does imply specific factual anticipated experiences about the world (which go beyond the definition of "moral realism instead"), namely that "All (or perhaps just almost all) beings, human, alien or AI, when given sufficient computing power and the ability to learn science and get an accurate map-territory morphism, will agree on what physical state the universe ought to be transformed into, and therefore they will assist you in transforming it into this state," then it's no longer arbitrary.

Roko relies here on the assumption that moral beliefs are inherently motivating ("moral internalism", as discussed by EY here), which is not a requirement for moral realism.

But you specifically disavowed this interpretation, even going so far as to say that "I can believe that I shouldn't eat meat, or that eating meat is bad, without being motivated to stop eating meat." So your version of "moral realism"

It is not just my interpretation, that is how the term "moral realism" is commonly defined in philosophy, e.g. in the SEP.

is just choosing a specific set of things you define to be "moral"

Well, I specifically don't need to propose any definition. What matters for any proposal for a definition (such as EY's "good ≈ maximizes extrapolated volition") is that it captures the natural language meaning of the term.

without requiring anyone who agrees that this is moral to act in accordance with it (which would indeed be an anticipated experience about the outside world)

I say that's confused. If I believe, for example, that raising taxes is bad, then I do have anticipated experiences associated with this belief. I may expect that raising taxes is followed by a weaker economy, more unemployment, less overall wealth, in short: decreased welfare. This expectation does not at all require that anyone agrees with me, nor that anyone is motivated to not raise taxes.

I really don't know if what I've written here is going to be helpful for this conversation.

The central question here is whether (something like) EY's ethical theory is sound. If it is, CEV could make sense as an alignment target, even if it is not clear how we get there.

Comment by cubefox on Alignment: "Do what I would have wanted you to do" · 2024-07-13T17:20:46.227Z · LW · GW

Your central claim seemed to be that words like "good" have no associated anticipated experience, with which I disagreed in the comment linked above. You haven't yet replied to that.

Comment by cubefox on Alignment: "Do what I would have wanted you to do" · 2024-07-13T17:00:56.900Z · LW · GW

By chance, did you, in the meantime, have any more thoughts on our debate on moral (anti-)realism, on the definability of terms like "good"?

Comment by cubefox on benwr's unpolished thoughts · 2024-07-08T20:22:55.080Z · LW · GW

Bing also uses inner monologue:

https://x.com/MParakhin/status/1632087709060825088

https://x.com/MParakhin/status/1728890277249916933

https://www.reddit.com/r/bing/comments/11ironc/bing_reveals_its_data_structure_for_conversations/

Comment by cubefox on Datasets that change the odds you exist · 2024-07-07T06:58:54.261Z · LW · GW

Abram Demski has written about this here: 2. "Yes requires the possibility of no."

evidence is balanced between making the observation and not making the observation, not between the observation and the observation of the negation

Comment by cubefox on Decaeneus's Shortform · 2024-07-06T15:22:22.161Z · LW · GW

I recognize a very similar failure mode of instrumental rationality: I sometimes include in the decision process for an action not just the utility of that action itself, but also its probability. That is, I act on the expected utility of the action, not on its utility. Example:

  • I should hurry up enough to catch my train (hurrying up enough has high utility)

  • Based on experience, I probably won't hurry up enough (hurrying up enough has low probability)

  • So the expected utility (utility*probability) of hurrying up enough is not very high

  • So I don't hurry up enough

  • So I miss my train.

The mistake is to pay any attention to the expected utility (utility*probability) of an action, rather than just to its utility. The probability of what I will do is irrelevant to what I should do. The probability of an action should be the output, never the input of my decision. If one action has the highest utility, it should go to 100% probability (that is, I should do it) and all the alternative actions should go to 0 probability.
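In code, the mistake looks something like this (toy numbers, purely illustrative):

```python
# Toy illustration of the mistake; utilities and probabilities are made up.
actions = {
    "hurry up enough": {"utility": 10, "prob_i_actually_do_it": 0.1},
    "don't hurry":     {"utility": 3,  "prob_i_actually_do_it": 0.9},
}

# Mistaken rule: weight each action's utility by how likely I think I am to do it.
mistaken = max(actions, key=lambda a: actions[a]["utility"] * actions[a]["prob_i_actually_do_it"])

# Correct rule: just pick the action with the highest utility.
correct = max(actions, key=lambda a: actions[a]["utility"])

print(mistaken)  # "don't hurry"      (10 * 0.1 = 1.0  <  3 * 0.9 = 2.7)
print(correct)   # "hurry up enough"  (10 > 3)
```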

The scary thing is that recognizing this mistake doesn't help with avoiding it.

Comment by cubefox on Nathan Young's Shortform · 2024-07-06T11:54:38.156Z · LW · GW

A lot of problems arise from inaccurate beliefs instead of bad goals. E.g. suppose both the capitalists and the communists are in favor of flourishing, but they have different beliefs on how best to achieve this. Now if we pick a bad policy to optimize for a noble goal, bad things will likely still follow.

Comment by cubefox on Rocket's Shortform · 2024-07-06T11:29:27.220Z · LW · GW

This point was recently elaborated on here: Pascal's Mugging and the Order of Quantification

Comment by cubefox on Andrew Burns's Shortform · 2024-06-27T11:43:58.938Z · LW · GW

I find counterarguments more convincing than challenges to bet.

Comment by cubefox on LLM Generality is a Timeline Crux · 2024-06-27T10:14:12.249Z · LW · GW

How general are LLMs? It's important to clarify that this is very much a matter of degree.

I would like to see attempts to come up with a definition of "generality". Animals seem to be very general, despite not being very intelligent compared to us.

Certainly state-of-the-art LLMs do an enormous number of tasks that, from a user perspective, count as general reasoning. They can handle plenty of mathematical and scientific problems; they can write decent code; they can certainly hold coherent conversations; they can answer many counterfactual questions; they even predict Supreme Court decisions pretty well. What are we even talking about when we question how general they are?

We are clearly talking about something very different from this when we say animals are general. Animals can do none of those things. So are animals, except for humans, really narrow systems, not general ones? Or are we improperly mixing generality with intelligence when we talk about AI generality?

Comment by cubefox on Mistakes people make when thinking about units · 2024-06-26T23:23:03.957Z · LW · GW

Thanks for the effort, though unfortunately I'm not familiar with linear regression.

Comment by cubefox on Mistakes people make when thinking about units · 2024-06-26T23:01:45.933Z · LW · GW

Log odds, measured in something like "bits of evidence" or "decibels of evidence", is the natural thing to think of yourself as "counting". A probability of 100% would be like having infinite positive evidence for a claim and a probability of 0% is like having infinite negative evidence for a claim. Arbital has some math and Eliezer has a good old essay on this.

Odds (and log odds) solve some problems but they unfortunately create others.

For addition and multiplication they at least seem to make things worse. We know that we can add probabilities if they are "mutually exclusive" to get the probability of their disjunction, and we know we can multiply them if they are "independent" to get the probability of their conjunction. But when can we add two odds, or multiply two odds (or add two log odds)? And what would be the interpretation of the result?

On the other hand, unlike for probabilities, multiplication with constants does indeed seem unproblematic for odds (or addition of constants for log odds). E.g. "doubling" some odds always makes sense, because odds are unbounded from above, while doubling probabilities is not always possible. And when it is, it is questionable whether it has any sensible interpretation.

But the way Arbital and Eliezer handle it doesn't actually make use of this fact. They instead treat the likelihood ratio (or its logarithm) as evidence strength. But, as I said, the likelihood ratio is actually a ratio of probabilities, not of odds, in which case the interpretation as evidence strength is shaky. The likelihood ratio assumes that doubling a small probability of the evidence constitutes the same evidence strength as doubling a relatively large one, which seems wrong.

As a formal example, assume the hypothesis H doubles the probability of the evidence E compared to ¬H. That is, we have the likelihood ratio P(E|H) / P(E|¬H) = 2. Since log₂(2) = 1, E is interpreted to constitute 1 bit of evidence in favor of H.

Then assume we also have some evidence E′ whose probability is likewise doubled by H compared to ¬H. So E′ is interpreted to also be 1 bit of evidence in favor of H.

Does this mean both cases involve equal evidence strength? Arguably no. For example, the probability of E may be quite small while the probability of E′ may be quite large. This would mean H hardly decreases the probability of ¬E compared to ¬H, while H strongly decreases the probability of ¬E′ compared to ¬H. So P(¬E|H) / P(¬E|¬H) is close to 1, while P(¬E′|H) / P(¬E′|¬H) is much smaller than 1.

So according to the likelihood ratio theory, E would be moderate (1 bit) evidence for H, and E′ would be equally moderate evidence for H, but ¬E would be very weak evidence against H while ¬E′ would be very strong evidence against H.

That seems implausible. Arguably E′ is here much stronger evidence for H than E.

Here is a more concrete example:

H = The patient actually suffers from Diseasitis.

E₁ = The patient suffers from Diseasitis according to test 1.

E₂ = The patient suffers from Diseasitis according to test 2.

Log likelihood ratio of test 1: log₂[P(E₁|H) / P(E₁|¬H)]

Log likelihood ratio of test 2: log₂[P(E₂|H) / P(E₂|¬H)]

These come out equal, so this says both tests represent equally strong evidence.

What if we instead take the ratio of conditional odds, instead of the ratio of conditional probabilities (as in the likelihood ratio)?

Log odds ratio of test 1: log₂[O(E₁|H) / O(E₁|¬H)], where O(X|Y) = P(X|Y) / P(¬X|Y)

Log odds ratio of test 2: log₂[O(E₂|H) / O(E₂|¬H)]

So the odds ratios are actually pretty different. Unlike the likelihood ratio, the odds ratio agrees with my argument that one of the two tests is significantly stronger evidence than the other.
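For concreteness, here is a small Python sketch with made-up numbers (not the ones from my calculation above) showing how two pieces of evidence can have the same log likelihood ratio while their conditional log odds ratios differ a lot:

```python
import math

# Made-up numbers: H doubles the probability of both E1 and E2, so both
# likelihood ratios are 2 (1 bit), but E1 is a priori unlikely and E2 is
# a priori likely.
p_e1_h, p_e1_not_h = 0.02, 0.01
p_e2_h, p_e2_not_h = 0.80, 0.40

def log2_likelihood_ratio(p_h, p_not_h):
    """log2 of the ratio of conditional probabilities."""
    return math.log2(p_h / p_not_h)

def log2_odds_ratio(p_h, p_not_h):
    """log2 of the ratio of conditional odds p/(1-p)."""
    return math.log2((p_h / (1 - p_h)) / (p_not_h / (1 - p_not_h)))

print(log2_likelihood_ratio(p_e1_h, p_e1_not_h))  # 1.0 bit
print(log2_likelihood_ratio(p_e2_h, p_e2_not_h))  # 1.0 bit
print(log2_odds_ratio(p_e1_h, p_e1_not_h))        # ~1.01 bits
print(log2_odds_ratio(p_e2_h, p_e2_not_h))        # ~2.58 bits
```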

Comment by cubefox on Mistakes people make when thinking about units · 2024-06-25T15:52:02.643Z · LW · GW

I liked this post. All this sounds obvious enough when you read it like that, but it can't be too obvious when even someone like Matt Parker got it wrong.

A further question: What type of units are probabilities? They seem different from e.g. meters or sheep.

For example, probabilities are bounded below and above (0 and 1) while sheep (and meters etc.) are only bounded below (0). (And "million" isn't bounded at all.) So the expression "twice as many" does make sense for sheep or meters, but arguably not for probabilities, or at least not always. Because a 0.8 probability times two would yield a "1.6 probability", which is not a probability. And then it seems also questionable whether doubling the probability 0.01 is "morally the same" as doubling the probability 0.5. Yet e.g. in medical testing and treatment trials, ratios of probabilities are often used, e.g. the likelihood ratio P(E|H) / P(E|¬H) for some hypothesis H and some evidence E, or the risk ratio between treatment and control groups. These say one probability is (for example) "double" the other probability, while implicitly assuming doubling a small probability is comparable to doubling a relatively large one. Case in point: Doubling the probability of survival doesn't imply anything about the factor with which the probability of death changes, except that it is between 0 and 1. (For instance, doubling survival from 0.4 to 0.8 cuts the probability of death from 0.6 to 0.2, a factor of 1/3, while doubling survival from 0.01 to 0.02 leaves the probability of death almost unchanged.)

Another thing I found interesting is that probabilities can only be added if they are "mutually exclusive". But that's actually the same as for meters and sheep. If there are two sheep on the yard and three sheep on the property, they can only be added if "sheep on the yard" and "sheep on the property" are mutually exclusive. Otherwise we would be double counting sheep. And when adding length measurements, we also have to avoid double counting (double measuring).

Moreover, two probabilities can be validly multiplied when they are "independent". Does this also have an analogy for sheep? I can't think of one, as multiplying sheep seems generally nonsensical. But multiplying meters does indeed make sense in certain cases. It yields an area if the multiplied lengths were measured in a right angle to each other. I'm not sure whether there is any further connection to probability, but both this and being independent are sometimes called "orthogonal".

Comment by cubefox on Eric Neyman's Shortform · 2024-06-24T19:21:13.165Z · LW · GW

"I find soaps disfusing, I'm straight up afused by soaps"

Comment by cubefox on Population ethics and the value of variety · 2024-06-24T11:20:46.842Z · LW · GW

Imagine 10 equally happy people, all exactly alike. Now imagine 10 equally happy people, all different. The latter feels more valuable to me, and I think to many others as well.

But their lives do seem equally valuable to themselves, so it isn't clear why external evaluations should matter here. Imagine we can either rescue 11 equal people or 10 different people. The equal people would argue that they should be rescued because there is one more of them.

Are the 11 equal people like a single person who lives 11 times as long as normal and repeats the same experience 11 times while each time forgetting everything that happened before? Arguably no, because the single person would want neither the amnesia nor the repeated experiences, similar to how we wouldn't want to unknowingly die during sleep. But the 11 people are probably not much bothered by the existence of their copies.

Comment by cubefox on Appraising aggregativism and utilitarianism · 2024-06-24T10:49:20.609Z · LW · GW

Is it right to say that aggregativism is, similar to total and average utilitarianism, incompatible with the procreation asymmetry, unlike some forms of person affecting utilitarianism?

Comment by cubefox on Suffering Is Not Pain · 2024-06-23T20:07:17.045Z · LW · GW

I'm not sure I understand this distinction. Say I have a strong desire to eat pizza, but only a weak craving. I have a hard time imagining what that would be like. Or a strong craving but a weak desire. Or even this: I have a strong desire not to eat pizza, but also a strong craving to eat pizza. Are perhaps desires, in this picture, more intellectual somehow, or purely instrumental, while cravings are ... animalistic urges? One example I can think of in these terms would be addiction, where someone has a strong desire not to smoke and a strong craving to smoke. Or, another example, someone has a strong craving to laugh and a strong desire to instead keep composure.

Does craving frustration (rather than desire frustration), or aversion realization, then constitute suffering? This is perhaps more plausible. But still, it seems to make sense to say I have an aversion to pain because I suffer from it, which wouldn't make sense if suffering were the same as an aversion being realized.

Comment by cubefox on Suffering Is Not Pain · 2024-06-19T13:42:06.239Z · LW · GW

I find the description "The unsatisfactoriness that arises from craving, aversion, and clinging/attachment to sensations and experiences" a bit hard to understand. But it seems similar to "preference frustration" which occurs when one can't satisfy a strong desire. Is this the intended meaning?

It seems somewhat plausible that preference frustration is indeed identical to suffering, but I'm not convinced. When I'm suffering from pain it also is the case that I don't want to be in pain, so a preference is frustrated. However, it also seems that I don't want to be in pain because I suffer from pain. So in this case, preference frustration would be explained with suffering. This in turn would mean they can't be identical, since explanation is irreflexive: "x because x" is false for any x.

I would object more directly to your proposal that pain is "an unpleasant physical sensation or emotional experience". People with depression have an unpleasant emotional experience, but they clearly aren't thereby in pain. The IASP definition you referred to seems more appropriate: "An unpleasant sensory and emotional experience associated with, or resembling that associated with, actual or potential tissue damage". Depression feels nothing like tissue damage, so it's not pain. Since both pain and depression (and other emotional experiences like anxiety) can cause suffering, this is another point for your argument that pain is not the same as suffering.

Comment by cubefox on [Linkpost] Guardian article covering Lightcone Infrastructure, Manifest and CFAR ties to FTX · 2024-06-17T18:27:38.428Z · LW · GW

One problem with discussing this is that we here arguably have an asymmetric discourse situation.

Comment by cubefox on My AI Model Delta Compared To Yudkowsky · 2024-06-15T21:05:48.159Z · LW · GW

If there was no fact of the matter of what you want overall, there would be no fact of the matter of whether an AI is aligned with you or not. Which would mean there is no alignment problem.

The referenced post seems to apply specifically to IRL, which is purely based on behaviorism and doesn't take information about the nature of the agent into account. (E.g. the fact that humans evolved from natural selection tells us a lot of what they probably want, and information about their brain could tell us how intelligent they are.) It's also only an epistemic point about the problem of externally inferring values, not about those values not existing.

Comment by cubefox on My AI Model Delta Compared To Yudkowsky · 2024-06-15T18:00:07.732Z · LW · GW

More complicated yes, but I assume the question is whether superintelligent AIs can understand what you want "overall" at least as well as other humans. And here, I would agree with ozziegooen, the answer seems to be yes -- even if they otherwise tend to reason about things differently than we do. Because there seems to be a fact of the matter about what you want overall, even if it is not easy to predict. But predicting it is not obviously inhibited by a tendency to think in different terms ("ontology"). Is the worry perhaps that the AI finds the concept of "what the human wants overall" unnatural, so is unlikely to optimize for it?

Comment by cubefox on My AI Model Delta Compared To Yudkowsky · 2024-06-15T15:30:07.461Z · LW · GW

Your thermostat example seems to rather highlight a disanalogy: The concept of a goal doesn't apply to the thermostat because there is apparently no fact of the matter about which counterfactual situations would satisfy such a "goal". I think part of the reason is that the concept of a goal requires the ability to apply it to counterfactual situations. But for humans there is such a fact of the matter; there are things that would be incompatible with or required by our goals. Even though some/many other things may be neutral (neither incompatible nor necessary).

So I don't think there are any "extra assumptions" needed. In fact, even if there were such extra assumptions, it's hard to see how they could be relevant. (This is analogous to the ancient philosophical argument that God declaring murder to be good obviously wouldn't make it good, so God declaring murder to be bad must be irrelevant to murder being bad.)

Comment by cubefox on When is "unfalsifiable implies false" incorrect? · 2024-06-15T10:08:02.511Z · LW · GW

In an interview, Elon Musk said that if gravity on Earth had been only slightly stronger, it would have been impossible to build orbital rockets. If this is true, presumably certain astronomical observations which are filtered out by the atmosphere or the magnetic field of the earth couldn't have been made. Though I don't know any specifics.

Comment by cubefox on quetzal_rainbow's Shortform · 2024-06-15T00:37:27.404Z · LW · GW

Animals were optimized for agency and generality first, AIs last.

Comment by cubefox on My AI Model Delta Compared To Yudkowsky · 2024-06-14T23:13:25.221Z · LW · GW

If it is merely "not clear" then this doesn't seem to be enough for an optimistic inductive inference. I also disagree that this looks good from a PR perspective. It looks even worse than Kant's infamous example where you allegedly aren't allowed to lie when hiding someone from a murderer.

Comment by cubefox on My AI Model Delta Compared To Yudkowsky · 2024-06-14T21:41:12.535Z · LW · GW

Ignored or downvoted. Perhaps someone could make a postmortem analysis of those comment threads today.

Comment by cubefox on My AI Model Delta Compared To Yudkowsky · 2024-06-14T19:32:36.790Z · LW · GW

At least RLHF is observably generalizing in "catastrophic ways":

You may argue that this will change in the future, but that isn't supported by an inductive argument (ChatGPT-3.5 had the same problem).

Comment by cubefox on Open Thread Summer 2024 · 2024-06-14T14:03:02.340Z · LW · GW

Bug report: When opening unread posts in a background tab, the rendering is broken in Firefox:

It should look like this:

The rendering in comments is also affected.

My current fix is to manually reload every broken page, though this is obviously not optimal.

Comment by cubefox on DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking · 2024-06-11T08:30:39.985Z · LW · GW

As far as I understand, in RLHF, PPO/DPO doesn't directly use preferences from human raters, but instead synthetic preference data generated by a reward model. The reward model in turn is trained on preference data given by actual human raters. The reward model may be misgeneralizing this data, in which case the DPO input may include preferences that humans wouldn't give. Which might change your conclusion.

Comment by cubefox on Summary of Situational Awareness - The Decade Ahead · 2024-06-10T13:15:06.572Z · LW · GW

(I recommend not using the acronym "USG". I never heard it before, and I assume many other people don't know what it means either.)

Comment by cubefox on What if a tech company forced you to move to NYC? · 2024-06-10T12:43:48.695Z · LW · GW

Things staying mostly the same doesn't seem to count as a "large effect". For example, we wouldn't say taking a placebo pill has a large effect.

Comment by cubefox on The Perils of Popularity: A Critical Examination of LessWrong's Rational Discourse · 2024-06-09T20:16:31.187Z · LW · GW

It seems clear there is some degree of truth to this. It may help to introduce agree/disagree voting not just for comments, but also for posts. This way people can express their (dis)agreement without also expressing their motivation for the post to (not) appear on the frontpage.

Comment by cubefox on Response to Aschenbrenner's "Situational Awareness" · 2024-06-08T19:21:59.644Z · LW · GW

Why? 95% risk of doom isn't certainty, but seems obviously more than sufficient.

If AI itself leads to doom, it likely doesn't matter whether it was developed by US Americans or by the Chinese. But if it doesn't lead to doom (the remaining 5%) it matters a lot which country is first, because that country is likely to achieve world domination.

The USG could choose the coinflip, or it could choose to try to prevent China from putting the world at risk without creating that risk itself.

Short of choosing a nuclear war with China, the US can't do much to deter the country from developing superintelligence. Except of course for seeking international coordination, as Akash proposed. But that's what cousin_it was arguing against.

The whole problem seems like a prisoner's dilemma. Either you defect (try to develop ASI before the other country, for cases where AI doom doesn't happen), or you try to both cooperate (international coordination). I don't see a rational third option.

Comment by cubefox on Response to Aschenbrenner's "Situational Awareness" · 2024-06-08T12:01:33.563Z · LW · GW

If the Chinese are working on a technology that will end humanity, that doesn't mean the US needs to work on the same technology. There's no point working on such technology.

Only if it was certain that the technology will end humanity. Since it clearly is less than certain, it makes sense to try to beat the other country.

Comment by cubefox on MichaelDickens's Shortform · 2024-06-07T19:04:54.974Z · LW · GW

Low confidence: Given that our ancestors had to deal with mold for millions of years, I would expect that animals are quite well adapted to its toxicity. This is different from (evolutionarily speaking) new potentially toxic substances, like e.g. trans fats or microplastics.

Comment by cubefox on Andrew Burns's Shortform · 2024-06-06T20:50:09.616Z · LW · GW

For what it's worth, Yann LeCun argues that video diffusion models like Sora, or any models which predict pixels, are useless for creating an AGI world model. So this might be a dead end. The reason, according to LeCun, is that pixel data is very high-dimensional and redundant compared to text (LLMs only use something like 65,000 tokens), which makes exact prediction less useful. In his 2022 outline of his proposed AGI framework, JEPA, he instead proposes an architecture which predicts embeddings rather than exact pixels.