Smart people are often too arrogant and proud, and know too much.
I thought that might be the case. With GPT-3 or 3.5, the higher the quality of your own work, the less helpful (and, potentially, the more destructive and disruptive) it was to substitute in the LLM's work; so, in these early years of LLMs, higher IQ may correlate with dismissing them and having little experience using them.
But this is a temporary effect. Those who initially dismissed LLMs will eventually come round; and, among younger people, especially as LLMs get better, higher-IQ people who try LLMs for the first time will find them worthwhile and use them just as much as their peers. And if you have two people who have both spent N hours using the same LLM for the same purposes, higher IQ will help, all else being equal.
Of course, if you're simply reporting a correlation you observe, then all else is likely not equal. Please think about selection effects, such as those described here.
Using LLMs is an intellectual skill. I would be astonished if IQ was not pretty helpful for that.
For editing adults, it is a good point that lots of them might find a personality tweak very useful, and e.g. if it gave them a big bump in motivation, that would likely be worth more than, say, 5-10 IQ points. An adult is in a good position to tell what's the delta between their current personality and what might be ideal for their situation.
Deliberately tweaking personality does raise some "dual use" issues. Is there a set of genes that makes someone very unlikely to leave their abusive cult, or makes them loyal, obedient citizens of their tyrannical government, or makes them never join the hated outgroup political party? I would be pretty on board with a norm of not doing research into that. Basic "Are there genes that cause personality disorders that ~everyone agrees are bad?" is fine; "motivation" as one undifferentiated category seems fine; Big 5 traits ... have some known correlations with political alignment, which brings it into territory I'm not very comfortable with, but if it goes no farther than that, it might be fine.
On a quick skim, an element that seems to be missing is that having emotions which cause you to behave 'irrationally' can in fact be beneficial from a rational perspective.
For example, if everyone knows that, when someone does you a favor, you'll feel obligated to find some way to repay them, and when someone injures you, you'll feel driven to inflict vengeance upon them even at great cost to yourself—if everyone knows this about you, then they'll be more likely to do you favors and less likely to injure you, and your expected payoffs are probably higher than if you were 100% "rational" and everyone knew it. I believe this is in fact why we have the emotions of gratitude and anger, and I think various animals have something resembling them. Put it this way: carrying out threats and promises is "irrational" by definition, but making your brain into a thing that will carry out threats and promises may be very rational.
So you could call these emotions "irrational" or the thoughts they lead to "biased", but I think that (a) likely pushes your thinking in the wrong direction in general, and (b) gives you no guidance on what "irrational" emotions are likely to exist.
What is categorized as "peer pressure" here? Explicit threats to report you to authorities if you don't conform? I'm guessing not. But how about implicit threats? What if you've heard (or read in the news) stories about people who don't conform—in ways moderately but not hugely more extreme than you—having their careers ruined? In any situation that you could call "peer pressure", I imagine there's always at least the possibility of some level of social exclusion.
The defining questions for that aspect would appear to be "Do you believe that you would face serious risk of punishment for not conforming?" and "Would a reasonable person in your situation believe the same?". Which don't necessarily have the same answer. It might, indeed, be that people whom you observe to be "conformist" are the ones who are oversensitive to the risk of social exclusion.
The thing that comes to mind, when I think of "formidable master of rationality", is a highly experienced engineer trying to debug problems, especially high-urgency problems that the normal customer support teams haven't been able to handle. You have a fresh phenomenon, which the creators of the existing product apparently didn't anticipate (or if they did, they didn't think it worth adding functionality to handle it), which casts doubt on existing diagnostic systems. You have priors on which tools are likely to still work, priors on which underlying problems are likely to cause which symptoms; tests you can try, each of which has its own cost and range of likely outcomes, and some of which you might invent on the spot; all of these lead to updating your probability distribution over what the underlying problem might be.
Medical diagnostics, as illustrated by Dr. House, can be similar, although I suspect the frequency of "inventing new tests to diagnose a never-before-seen problem" is lower there.
One argument I've encountered is that sentient creatures are precisely those creatures that we can form cooperative agreements with. (Counter-argument: one might think that e.g. the relationship with a pet is also a cooperative one [perhaps more obviously if you train them to do something important, and you feed them], while also thinking that pets aren't sentient.)
Another is that some people's approach to the Prisoner's Dilemma is to decide "Anyone who's sufficiently similar to me can be expected to make the same choice as me, and it's best for all of us if we cooperate, so I'll cooperate when encountering them"; and some of them may figure that sentience alone is sufficient similarity.
So, the arithmetic and geometric mean agree when the inputs are equal, and, the more unequal they are, the lower the geometric mean is.
I note that the subtests have ceilings, which puts a limit on how much any one can skew the result. Like, if you have 10 subtests, and the max score is something like 150, then presumably each test has a max score of 15 points. If we imagine someone gets five 7s and five 13s (a moderately unbalanced set of abilities), then the geometric mean is 9.54, while the arithmetic mean is 10. So, even if someone were confused about whether the IQ test was using a geometric or an arithmetic mean, does it make a large difference in practice?
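For concreteness, here's that comparison as a quick sketch (the subtest scores are just the illustrative ones above, not real data):

```python
from math import prod

scores = [7] * 5 + [13] * 5                     # ten subtests: five 7s and five 13s
arithmetic = sum(scores) / len(scores)          # 10.0
geometric = prod(scores) ** (1 / len(scores))   # ~9.54
print(f"arithmetic mean = {arithmetic:.2f}, geometric mean = {geometric:.2f}")
```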
The people you're arguing against, is it actually a crux for them? Do they think IQ tests are totally invalid because they're using an arithmetic mean, but actually they should realize it's more like a geometric mean and then they'd agree IQ tests are great?
1. IQ scores do not measure even close to all cognitive abilities and realistically could never do that.
Well, the original statement was "sums together cognitive abilities" and didn't use the word "all", and I, at least, saw no reason to assume it. If you're going to say something along the lines of "Well, I've tried to have reasonable discussions with these people, but they have these insane views", that seems like a good time to be careful about how you represent those views.
2. Many of the abilities that IQ scores weight highly are practically unimportant.
Are you talking about direct measurement, or what they correlate with? Because, certainly, things like anagramming a word have almost no practical application, but I think it's intended to (and does) correlate with language ability. But in any case, the truth value of the statement that IQ is "an index that sums together cognitive abilities" is unaffected by whether those abilities are useful ones.
Perhaps you have some idea of a holistic view, of which that statement is only a part, and maybe that holistic view contains other statements which are in fact insane, and you're attacking that view, but... in the spirit of this post, I would recommend confining your attacks to specific statements rather than to other claims that you think correlate with those statements.
3. Differential-psychology tests are in practice more like log scales than like linear scales, so "sums" are more like products than like actual sums; even if you are absurdly good at one thing, you're going to have a hard time competing with someone in IQ if they are moderately better at many things.
I wonder how large a difference this makes in practice. So if we run with your claim here, it seems like your conclusion would be... that IQ tests combine the subtest scores in the wrong way, and are less accurate than they should be for people with very uneven abilities? Is that your position? At any rate, even if the numbers are logarithms, it's still correct to say that the test is adding them up, and I don't consider that good grounds for calling it "insane" for people to consider it addition.
thinks of IQ as an index that sums together cognitive abilities
Is this part not technically true? IQ tests tend to have a bunch of subtests intended to measure different cognitive abilities, and you add up—or average, which is adding up and dividing by a constant—your scores on each subtest. For example (bold added):
The current version of the test, the WAIS-IV, which was released in 2008, is composed of 10 core subtests and five supplemental subtests, with the 10 core subtests yielding scaled scores that sum to derive the Full Scale IQ.
Interesting. The natural approach is to imagine that you just have a 3-sided die with 2, 4, 6 on the sides, and if you do that, then I compute A = 12 and B = 6[1]. But, as the top Reddit comment's edit points out, the difference between that problem and the one you posed is that your version heavily weights the probability towards short sequences—that weighting being 1/2^n for a sequence of length n. (Note that the numbers I got, A=12 and B=6, are so much higher than the A≈2.7 and B=3 you get.) It's an interesting selection effect.
The thing is that, if you roll a 6 and then a non-6, in an "A" sequence you're likely to just die due to rolling an odd number before you succeed in getting the double 6, and thus exclude the sequence from the surviving set; whereas in a "B" sequence there's a much higher chance you'll roll a 6 before dying, and thus include this longer "sequence of 3+ rolls" in the set.
To illustrate with an extreme version, consider:
A: The expected number of rolls of a fair die until you roll two 6s in a row, given that you succeed in doing this. You ragequit if it takes more than two rolls.
Obviously that's one way to reduce A to 2.
[1]
Excluding odd rolls completely, so the die has a 1/3 chance of rolling 6 and a 2/3 chance of rolling an even number that's not 6, we have:
A = 1 + 1/3 * A2 + 2/3 * A
Where A2 represents "the expected number of die rolls until you get two 6's in a row, given that the last roll was a 6". Subtraction and multiplication then yield:
A = 3 + A2
And if we consider rolling a die from the A2 state, we get:
A2 = 1 + 1/3 * 0 + 2/3 * A
= 1 + 2/3 * A

Substituting:
A = 3 + 1 + 2/3 * A
=> (subtract)
1/3 * A = 4
=> (multiply)
A = 12

For B, a similar approach yields the equations:
B = 1 + 1/3 * B2 + 2/3 * B
B2 = 1 + 1/3 * 0 + 2/3 * B2

And the reader may solve for B = 6.
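If anyone wants to check these numbers empirically, here's a quick Monte Carlo sketch (the helper names are mine, nothing from the original post). It estimates both the plain 3-sided-die expectations (12 and 6) and the conditioned fair-die expectations (≈2.7 and 3), which shows the selection effect directly:

```python
import random

def trial(sides, consecutive):
    """Roll until two 6s (in a row if `consecutive`, else in total).
    An odd roll ends the sequence as a failure, excluding it from the
    conditional average. Returns (number of rolls, succeeded)."""
    rolls, sixes, prev_six = 0, 0, False
    while True:
        r = random.choice(sides)
        rolls += 1
        if r % 2 == 1:
            return rolls, False
        if r == 6:
            sixes += 1
            if (consecutive and prev_six) or (not consecutive and sixes == 2):
                return rolls, True
            prev_six = True
        else:
            prev_six = False

def conditional_mean(sides, consecutive, n=200_000):
    lengths = [k for k, ok in (trial(sides, consecutive) for _ in range(n)) if ok]
    return sum(lengths) / len(lengths)

evens, fair = [2, 4, 6], [1, 2, 3, 4, 5, 6]
print(conditional_mean(evens, True))    # ~12  : A for the 3-sided {2,4,6} die
print(conditional_mean(evens, False))   # ~6   : B for the 3-sided {2,4,6} die
print(conditional_mean(fair, True))     # ~2.7 : A, conditioned on no odd rolls
print(conditional_mean(fair, False))    # ~3   : B, conditioned on no odd rolls
```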
I can also come up with a story where obviously it's cheaper and more effective to disable all of the nuclear weapons than it is to take over the world, so why would the AI do the second thing?
Erm... For preventing nuclear war on the scale of decades... I don't know what you have in mind for how it would disable all the nukes, but a one-off breaking of all the firing mechanisms isn't going to work. They could just repair/replace that once they discovered the problem. You could imagine some more drastic thing like blowing up the conventional explosives on the missiles so as to utterly ruin them, but in a way that doesn't trigger the big chain reaction. But my impression is that, if you have a pile of weapons-grade uranium, then it's reasonably simple to make a bomb out of it, and since uranium is an element, no conventional explosion can eliminate that from the debris. Maybe you can melt it, mix it with other stuff, and make it super-impure?
But even then, the U.S. and Russia probably have stockpiles of weapons-grade uranium. I suspect they could make nukes out of that within a few months. You would have to ruin all the stockpiles too.
And then there's the possibility of mining more uranium and enriching it; I feel like this would take a few years at most, possibly much less if one threw a bunch of resources into rushing it. Would you ruin all uranium mines in the world somehow?
No, it seems to me that the only ways to reliably rule out nuclear war involve either using overwhelming physical force to prevent people from using or making nukes (like a drone army watching all the uranium stockpiles), or being able to reliably persuade the governments of all nuclear powers in the world to disarm and never make any new nukes. The power to do either of these things seems tantamount to the power to take over the world.
why most perfect algorithms that recreate a strawberry on the molecular level destroy the planet as well.
Phrased like this, the answer that comes to mind is "Well, this requires at least a few decades' worth of advances in materials science and nanotechnology and such, plus a lot of expensive equipment that doesn't exist today, and e.g. if you want this to happen with high probability, you need to be sure that civilization isn't wrecked by nuclear war or other threats in upcoming decades, so if you come up with a way of taking over the world that has higher certainty than leaving humanity to its own devices, then that becomes the best plan." Classic instrumental convergence, in other words.
The political version of the question isn't functionally the same as the skin cream version, because the former isn't a randomized intervention—cities that decided to add gun control laws seem likely to have other crime-related events and law changes at the same time, which could produce a spurious result in either direction. So it's quite reasonable to say "My opinion is determined by my priors and the evidence didn't appreciably affect my position."
90% awful idea: "Genetic diversity" in computer programs for resistance to large-scale cyberattacks.
The problem: Once someone has figured out the right security hole in Tesla's software (and, say, broken into a server used to deliver software updates), they can use this to install their malicious code into all 5 million Teslas in the field (or maybe just one model, so perhaps 1 million cars), and probably make them all crash simultaneously and cause a catastrophe.
The solution: There will probably come a point where we can go through the codebase and pick random functions and say, "Claude, write a specification of what this function does", and then "Claude, take this specification and write a new function implementing it", and end up with different functions that accomplish the same task, which are likely to have different bugs. Have every Tesla do this to its own software. Then the virus or program that breaks into some Teslas will likely fail on others.
One reason this is horrible is that you would need an exceptionally high success rate for writing those replacement functions—else this process would introduce lots of mundane bugs, which might well cause crashes of their own. That, or you'd need a very extensive set of unit tests to catch all such bugs—so extensive as to probably eat up most of your engineers' time writing them. Though perhaps AIs could do that part.
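To make the shape of the idea concrete, here's a minimal sketch of the per-function rewrite loop. `ask_llm` and `run_unit_tests` are hypothetical stand-ins, not real APIs, and the whole thing assumes a far better per-function test suite than most codebases actually have:

```python
import random

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a code-writing model."""
    raise NotImplementedError

def run_unit_tests(function_name: str, new_source: str) -> bool:
    """Hypothetical stand-in for running that function's tests against the rewrite."""
    raise NotImplementedError

def diversify(codebase: dict[str, str], fraction: float = 0.05) -> dict[str, str]:
    """Regenerate a random sample of functions from LLM-written specs,
    keeping a rewrite only if its unit tests still pass."""
    diversified = dict(codebase)
    chosen = random.sample(sorted(codebase), k=max(1, int(fraction * len(codebase))))
    for name in chosen:
        spec = ask_llm(f"Write a precise specification of this function:\n{codebase[name]}")
        rewrite = ask_llm(f"Implement a function satisfying this specification:\n{spec}")
        if run_unit_tests(name, rewrite):
            diversified[name] = rewrite
    return diversified
```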
To me, that will lead to an environment where people think that they are engaging with criticism without having to really engage with the criticism that actually matters.
This is a possible outcome, especially if the above tactic were the only tactic to be employed. That tactic helps reduce ignorance of the "other side" on the issues that get the steelmanning discussion, and hopefully also pushes away low-curiosity tribalistic partisans while retaining members who value deepening understanding and intellectual integrity. There are lots of different ways for things to go wrong, and any complete strategy probably needs to use lots of tactics. Perhaps the most important tactic would be to notice when things are going wrong (ideally early) and adjust what you're doing, possibly designing new tactics in the process.
Also, in judging a strategy, we should know what resources we assume we have (e.g. "the meetup leader is following the practice we've specified and is willing to follow 'reasonable' requests or suggestions from us"), and know what threats we're modeling. In principle, we might sort the dangers by [impact if it happens] x [probability of it happening], enumerate tactics to handle the top several, do some cost-benefit analysis, decide on some practices, and repeat.
If you frame the criticism as having to be about the mission of psychiatry, it's easy for people to see "Is it ethical to charge poor patients three-digit fees for no-shows?" as off-topic.
My understanding/guess is that "Is it ethical to charge poor patients three-digit fees for no-shows?" is an issue where the psychiatrists know the options and the impacts of the options, and the "likelihood of people actually coming to blows" comes from social signaling things like "If I say I don't charge them, this shows I'm in a comfortable financial position and that I'm compassionate for poor patients"/"If I say I do charge them, this opens me up to accusations (tinged with social justice advocacy) of heartlessness and greed". I would guess that many psychiatrists do charge the fees, but would hate being forced to admit it in public. Anyway, the problem here is not that psychiatrists are unaware of information on the issue, so there'd be little point in doing a steelmanning exercise about it.
That said, as you suggest, it is possible that people would spend their time steelmanning unimportant issues (and making 'criticism' of the "We need fifty Stalins" type). But if we assume that we have one person who notices there's an important unaddressed issue, who has at least decent rapport with the meetup leader, then it seems they could ask for that issue to get steelmanned soon. That could cover it. (If we try to address the scenario where no one notices the unaddressed issue, that's a pretty different problem.)
I want to register high appreciation of Elizabeth for her efforts and intentions described here. <3
The remainder of this post is speculations about solutions. "If one were to try to fix the problem", or perhaps "If one were to try to preempt this problem in a fresh community". I'm agnostic about whether one should try.
Notes on the general problem:
- I suspect lots of our kind of people are not enthusiastic about kicking people out. I think several people have commented, on some cases of seriously bad actors, that it took way too long to actually expel them.
- Therefore, the idea of confronting someone like Jacy and saying "Your arguments are bad, and you seem to be discouraging critical thinking, so we demand you stop it or we'll kick you out" seems like a non-starter in a few ways.
- I guess one could have lighter policing of the form "When you do somewhat-bad things like that, someone will criticize you for it." Sort of like Elizabeth arguing against Jacy. In theory, if one threw enough resources at this, one could create an environment where Jacy-types faced consistent mild pushback, which might work to get them to either reform or leave. However, I think this would take a lot more of the required resources (time, emotional effort) than the right people are inclined to give.
- Those who enjoy winning internet fights... might be more likely to be Jacy-types in the first place. The intersection of "happy to spend lots of time policing others' behavior" and "not having what seem like more important things to work on" and "embodies the principles we hope to uphold" might be pretty small. The example that comes to mind is Reddit moderators, who have a reputation for being power-trippers. If the position is unpaid, then it seems logical to expect that result. So I conclude that, to a first approximation, good moderators must be paid.
- Could LLMs help with this today? (Obviously this would work specifically for online written stuff, not in-person.) Identifying bad comments is one possibility; helping write the criticism is another.
- Beyond that, one could have "passive" practices, things that everyone was in the habit of doing, which would tend to annoy the bad actors while being neutral (or, hopefully, positive) to the good actors.
- (I've heard that the human immune system, in certain circumstances, does basically that: search for antibodies that (a) bind to the bad things and (b) don't bind to your own healthy cells. Of course, one could say that this is obviously the only sensible thing to do.)
Reading the transcript, my brain generated the idea of having a norm that pushes people to do exercises of the form "Keep your emotions in check as you enumerate the reasons against your favored position, or poke holes in naive arguments for your favored position" (and possibly alternate with arguing for your side, just for balance). In this case, it would be "If you're advocating that everyone do a thing always, then enumerate exceptions to it".
Fleshing it out a bit more... If a group has an explicit mission, then it seems like one could periodically have a session where everyone "steelmans" the case against the mission. People sit in a circle, raising their hands (or just speaking up) and volunteering counterarguments, as one person types them down into a document being projected onto a big screen. If someone makes a mockery of a counterargument ("We shouldn't do this because we enjoy torturing the innocent/are really dumb/subscribe to logical fallacy Y"), then other people gain status by correcting them ("Actually, those who say X more realistically justify it by ..."): this demonstrates their intelligence, knowledge, and moral and epistemic strength. Same thing when someone submits a good counterargument: they gain status ("Ooh, that's a good one") because it demonstrates those same qualities.
Do this for at least five minutes. After that, pause, and then let people formulate the argument for the mission and attack the counterarguments.
Issues in transcript labeling (I'm curious how much of it was done by machine):
- After 00:07:55, a line is unattributed to either speaker; looks like it should be Timothy.
- 00:09:43 is attributed to Timothy but I think must be Elizabeth.
- Then the next line is unattributed (should be Timothy).
- After 00:14:00, unattributed (should be Timothy).
- After 00:23:38, unattributed (should be Timothy).
- After 00:32:34, unattributed (probably Elizabeth).
Grammatically, the most obvious interpretation is a universal quantification (i.e. "All men are taller than all women"), which I think is a major reason why such statements so often lead to objections of "But here's an exception!" Maybe you can tell the audience that they should figure out when to mentally insert "... on average" or "tend to be". Though there are also circumstances where one might validly believe that the speaker really means all. I think it's best to put such qualified language into your statements from the start.
Are you not familiar with the term "vacuously true"? I find this very surprising. People who study math tend to make jokes with it.
The idea is that, if we were to render a statement like "Colorless green ideas sleep furiously" into formal logic, we'd probably take it to mean the universal statement "For all X such that X is a colorless green idea, X sleeps furiously". A universal statement is logically equivalent to "There don't exist any counterexamples", i.e. "There does not exist X such that X is a colorless green idea and X does not sleep furiously". Which is clearly true, and therefore the universal is equally true.
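Rendered symbolically (the predicate names are just labels for the sentence's parts):

```latex
\forall x\,\bigl(\mathrm{ColorlessGreenIdea}(x) \rightarrow \mathrm{SleepsFuriously}(x)\bigr)
\;\equiv\;
\neg\exists x\,\bigl(\mathrm{ColorlessGreenIdea}(x) \land \neg\,\mathrm{SleepsFuriously}(x)\bigr)
```

Since nothing satisfies ColorlessGreenIdea(x), the right-hand side holds, so the left-hand side is (vacuously) true.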
There is, of course, some ambiguity when rendering English into formal logic. It's not rare for English speakers to say "if" when they mean "if and only if", or "or" when they mean "exclusive or". (And sometimes "Tell me which one", as in "Did you do A, or B?" "Yes." "Goddammit.") Often this doesn't cause problems, but sometimes it does. (In which case, as I've said, the solution is not to give their statement an ambiguous truth value, but rather to ask them to restate it less ambiguously.)
"Dragons are attacking Paris" seems most naturally interpreted as the definite statement "There's some unspecified number—but since I used the plural, it's at least 2—of dragons that are attacking Paris", which would be false. One could also imagine interpreting it as a universal statement "All dragons are currently attacking Paris", which, as you say, would be vacuously true since there are no dragons. However, in English, the preferred way to say that would be "Dragons attack Paris", as CBiddulph says. "Dragons are attacking Paris" uses the present progressive tense, while "Dragons attack Paris" uses what is called the "simple present"/"present indefinite" tense. Wiki says:
The simple present is used to refer to an action or event that takes place habitually, to remark habits, facts and general realities, repeated actions or unchanging situations, emotions, and wishes.[3] Such uses are often accompanied by frequency adverbs and adverbial phrases such as always, sometimes, often, usually, from time to time, rarely, and never.
Examples:
- I always take a shower.
- I never go to the cinema.
- I walk to the pool.
- He writes for a living.
- She understands English.
This contrasts with the present progressive (present continuous), which is used to refer to something taking place at the present moment: I am walking now; He is writing a letter at the moment.
English grammar rules aren't necessarily universal and unchanging, but they do give at least medium-strength priors on how to interpret a sentence.
to the point where you can't really eliminate the context-dependence and vagueness via taboo (because the new words you use will still be somewhat context-dependent and vague)
You don't need to "eliminate" the vagueness, just reduce it enough that it isn't affecting any important decisions. (And context-dependence isn't necessarily a problem if you establish the context with your interlocutor.) I think this is generally achievable, and have cited the Eggplant essay on this. And if it is generally achievable, then:
Richard is arguing against foundational pictures which assume these problems away, and in favor of foundational pictures which recognize them.
I think you should handle the problems separately. In which case, when reasoning about truth, you should indeed assume away communication difficulties. If our communication technology was so bad that 30% of our words got dropped from every message, the solution would not be to change our concept of meanings; the solution would be to get better at error correction, ideally at a lower level, but if necessary by repeating ourselves and asking for clarification a lot.
Elsewhere there's discussion of concepts themselves being ambiguous. That is a deeper issue. But I think it's fundamentally resolved in the same way: always be alert for the possibility that the concept you're using is the wrong one, is incoherent or inapplicable to the current situation; and when it is, take corrective action, and then proceed with reasoning about truth. Be like a digital circuit, where at each stage your confidence in the applicability of a concept is either >90% or <10%, and if you encounter anything in between, then you pause and figure out a better concept, or find another path in which this ambiguity is irrelevant.
Presumably anything which is above 50% eggplant is rounded to 100%, and anything below is rounded to 0%.
No, it's more like what you encounter in digital circuitry. Anything above 90% eggplant is rounded to 100%, anything below 10% eggplant is rounded to 0%, and anything between 10% and 90% is unexpected, out of spec, and triggers a "Wait, what?" and the sort of rethinking I've outlined above, which should dissolve the question of "Is it really eggplant?" in favor of "Is it food my roommate is likely to eat?" or whatever new question my underlying purpose suggests, which generally will register as >90% or <10%.
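As a toy sketch of that policy (the thresholds and names are mine, purely illustrative):

```python
class AmbiguousConcept(Exception):
    """Raised when a case falls in the 'Wait, what?' band and the question
    should be re-posed in terms of the underlying purpose."""

def judge_eggplant(p_eggplant: float, hi: float = 0.9, lo: float = 0.1) -> bool:
    """Treat confident cases digitally; flag the in-between band for rethinking."""
    if p_eggplant >= hi:
        return True
    if p_eggplant <= lo:
        return False
    raise AmbiguousConcept("Re-pose the question, e.g. 'is it food my roommate will eat?'")
```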
And you appear to be saying in 99% of cases, the vagueness isn't close to 50% anyway, but closer to 99% or 1%. That may be the case of eggplants, or many nouns (though not all), but certainly not for many adjectives, like "large" or "wet" or "dusty". (Or "red", "rational", "risky" etc.)
Do note that the difficulty around vagueness isn't whether objects in general vary on a particular dimension in a continuous way; rather, it's whether the objects I'm encountering in practice, and needing to judge on that dimension, yield a bunch of values that are close enough to my cutoff point that it's difficult for me to decide. Are my clothes dry enough to put away? I don't need to concern myself with whether they're "dry" in an abstract general sense. (If I had to communicate with others about it, "dry" = "I touch them and don't feel any moisture"; "sufficiently dry" = "I would put them away".)
And, in practice, people often engineer things such that there's a big margin of error and there usually aren't any difficult decisions to make whose impact is important. One may pick one's decision point of "dry enough" to be significantly drier than it "needs" to be, because erring in that direction is less of a problem than the opposite (so that, when I encounter cases in the range of 40-60% "dry enough", either answer is fine and therefore I pick at random / based on my mood or whatever); and one might follow practices like always leaving clothes hanging up overnight or putting them on a dryer setting that's reliably more than long enough, so that by the time one checks them, they're pretty much always on the "dry" side of even that conservative boundary.
Occasionally, the decision is difficult, and the impact matters. That situation sucks, for humans and machines:
https://en.wikipedia.org/wiki/Buridan's_ass#Buridan's_principle
Which is why we tend to engineer things to avoid that.
The edges of perhaps most real-world concepts are vague, but there are lots of central cases where the item clearly fits into the concept, on the dimensions that matter. Probably 99% of the time, when my roommate goes and buys a fruit or vegetable, I am not confounded by it not belonging to a known species, or by it being half rotten or having its insides replaced or being several fruits stitched together. The eggplant may be unusually large, or wet, or dusty, or bruised, perhaps more than I realized an eggplant could be. But, for many purposes, I don't care about most of those dimensions.
Thus, 99% of the time I can glance into the kitchen and make a "known unknown" type of update on the type of fruit-object there or lack thereof; and 1% of the time I see something bizarre, discard my original model, and pick a new question and make a different type of update on that.
I don't think you mentioned "nootropic drugs" (unless "signaling molecules" is meant to cover that, though it seems more specific). I don't think there's anything known to give a significant enhancement beyond alertness, but in a list of speculative technologies I think it belongs.
I would be surprised if grocery stores sold edge cases... But perhaps it was a farmer's market or something, perhaps a seller who often liked to sell weird things, perhaps grew hybridized plants. I'll take the case where it's a fresh vegetable/fruit/whatever thing that looks kind of eggplant-ish.
Anyway, that would generally be determined by: Why do I care whether he bought an eggplant? If I just want to make sure he has food, then that thing looks like it counts and that's good enough for me. If I was going to make a recipe that called for eggplant, and he was supposed to buy one for me, then I'd want to know if its flesh, its taste, etc., were similar enough to an eggplant to work with the recipe (and depending on how picky the target audience was). If I were studying plants for its own sake, I might want to interrogate him about its genetics (or the contact info of the seller if he didn't know). If I wanted to be able to tell someone else what it was, then... default description is "it's an edge case of an eggplant", and ideally I'd be able to call it a "half-eggplant, half-X" and know what X was; and how much I care about that information is determined by the context.
I think, in all of these cases, I would decide "Well, it's kind of an eggplant and kind of not", and lose interest in the question of whether I would call it an "eggplant" (except in that last case, though personally I'm with Feynman's dad on not caring too much about the official name of such things) in favor of the underlying question that I cared about. My initial idea, that there would be either a classical eggplant or nothing in the kitchen, turned out to be incoherent in the face of reality, and I dropped the idea in favor of some new approximation to reality that was true and was relevant to my purpose.
What do you know, there's an Eliezer essay on "dissolving the question". Though working through an example is done in another post (on the question "If a tree falls in a forest...?").
Are you familiar with the term "bounded distrust"? Scott and Zvi have written about it; Zvi's article gives a nice list of rules. You seem to have arrived at some similar ideas.
I've heard that, for sleep in particular, some people have Received Wisdom that everyone needs 8 hours of sleep per day, and if they're not getting it, that's a problem, and this has led some people who naturally sleep 7 hours to worry and stress about it, causing insomnia and whatnot.
Of course, ideally such people would have better ability to manage such worries (and better "I should do some more googling before stressing about it too hard" reflexes), but in practice many do not.
You may be inspired by, or have independently invented, a certain ad for English-language courses.
My take on this stuff is that, when a person's words are ambiguous in a way that matters, then what should happen is you ask them for clarification (often by asking them to define or taboo a word), and they give it to you, and (possibly after multiple cycles of this) you end up knowing what they meant. (It's also possible that their idea was incoherent and the clarification process makes them realize this.)
What they said was an approximation to the idea in their mind. Don't concern yourself with the truth of an ambiguous statement. Concern yourself with the truth of the idea. It makes little sense to talk of the truth of statements where the last word is missing, or other errors inserted, or where important words are replaced with "thing" and "whatever"; and I would say the same about ambiguous statements in general.
If the person is unavailable and you're stuck having to make a decision based on their ambiguous words, then you make the best guess you can. As you say, you have a probability distribution over the ideas they could have meant. Perhaps combine that with your prior probabilities of the truth values to help compute the expected value of each of your possible choices.
It's a decent exploration of stuff, and ultimately says that it does work:
Language is not the problem, but it is the solution. How much trouble does the imprecision of language cause, in practice? Rarely enough to notice—so how come? We have many true beliefs about eggplant-sized phenomena, and we successfully express them in language—how?
These are aspects of reasonableness that we’ll explore in Part Two. The function of language is not to express absolute truths. Usually, it is to get practical work done in a particular context. Statements are interpreted in specific situations, relative to specific purposes. Rather than trying to specify the exact boundaries of all the variants of a category for all time, we deal with particular cases as they come up.
If the statement you're dealing with has no problematic ambiguities, then proceed. If it does have problematic ambiguities, then demand further specification (and highlighting and tabooing the ambiguous words is the classic way to do this) until you have what you need, and then proceed.
I'm not claiming that it's practical to pick terms that you can guarantee in advance will be unambiguous for all possible readers and all possible purposes for all time. I'm just claiming that important ambiguities can and should be resolved by something like the above strategy; and, therefore, such ambiguities shouldn't be taken to debase the idea of truth itself.
Edit: I would say that the words you receive are an approximation to the idea in your interlocutor's mind—which may be ambiguous due to terminology issues, transmission errors, mistakes, etc.—and we should concern ourselves with the truth of the idea. To speak of truth of the statement is somewhat loose; it only works to the extent that there's a clear one-to-one mapping of the words to the idea, and beyond that we get into trouble.
I am dismayed by the general direction of this conversation. The subject is vague and ambiguous words causing problems, there's a back-and-forth between several high-karma users, and I'm the first person to bring up "taboo the vague words and explain more precisely what you mean"?
Solution: Taboo the vague predicates and demand that the user explain more precisely what they mean.
Statements do often have ambiguities: there are a few different more-precise statements they could be interpreted to mean, and sometimes those more-precise statements have different truth values. But the solution is not to say that the ambiguous statement has an ambiguous truth value and therefore discard the idea of truth. The solution is to do your reasoning about the more-precise statements, and, if someone ever hands you ambiguous statements whose truth value is important, to say "Hey, please explain more precisely what you meant." Why would one do otherwise?
By the way:
colorless green ideas sleep furiously
There is a straightforward truth value here: there are no colorless green ideas, and therefore it is vacuously true that all of them sleep furiously.
Using the pronoun people ask you to use has become a proxy for all sorts of other tolerant/benevolent attitudes towards that person and the way they want to live their life, and to an even greater extent, refusing to do that is a proxy for thinking they should be ignored, or possibly reviled, or possibly killed.
There's an interesting mechanic here, a hyperstitious cascade. In certain educational environments, people are taught to use approved language with protected-class members. In that environment, anyone who uses forbidden language is, therefore, some kind of troublemaker. That then makes it somewhat less illegitimate for the most sensitive of those protected-class members to say they feel threatened when someone uses forbidden language. Which then makes it all the more important to teach people to use approved language, and have harsher enforcement on it. If this goes far enough, then we get to where one can make the case that unpunished usage of forbidden language constitutes a hostile environment, which would therefore drive out the protected classes and hence violate civil rights law.
I would expect that some amount of good safety research is of the form, "We tried several ways of persuading several leading AI models how to give accurate instructions for breeding antibiotic-resistant bacteria. Here are the ways that succeeded, here are some first-level workarounds, here's how we beat those workarounds...": in other words, stuff that would be dangerous to publish. In the most extreme cases, a mere title ("Telling the AI it's writing a play defeats all existing safety RLHF" or "Claude + Coverity finds zero-day RCE exploits in many codebases") could be dangerous.
That said, some large amount should be publishable, and 5 papers does seem low.
Though maybe they're not making an effort to distinguish what's safe to publish from what's not, and erring towards assuming the latter? (Maybe someone set a policy of "Before publishing any safety research, you have to get Important Person X to look through it and/or go through some big process to ensure publishing it is safe", and the individual researchers are consistently choosing "Meh, I have other work to do, I won't bother with that" and therefore not publishing?)
Any specific knowledge about colostrum? (Mildly surprised it hasn't been mentioned in the thread.) Do breastmilk banks usually supply that, and is it worthwhile?
The idea that ethical statements are anything more than "just expressions of emotion" is, to paraphrase Lucretius, "regarded by the common people as true, by the wise[1] as false, and by rulers as useful."
I figure you think the wise are correct. Well, then. Consider randomly selected paragraphs from Supreme Court justices' opinions. Or consider someone saying "I'd like to throw this guy in jail, but unfortunately, the evidence we have is not admissible in court, and the judicial precedent on rules of evidence is there for a reason—it limits the potential abusiveness of the police, and that's more important than occasionally letting a criminal off—so we have to let him go." Is that an ethical statement? And is it "just an expression of emotion"?
For the record, in an ethical context, when I say a behavior is bad, I mean that (a) an ethical person shouldn't do it (or at least should have an aversion to doing it—extreme circumstances might make it the best option) and (b) ethical people have license to punish it in some way, which, depending on the specifics, might range from "social disapproval" to "the force of the law".
Alarming and dangerous as this view may be, I'd be really surprised if literally everyone who had power ("in charge of anything important") also lacked the self-awareness to see it.
I think there are lots of people in power who are amoral, and this is indeed dangerous, and does indeed frequently lead to them harming people they rule over.
However, I don't think most of them become amoral by reading emotivist philosophy or by independently coming to the conclusion that ethical statements are "just expressions of emotion". What makes rulers frequently immoral? Some have hypothesized that there's an evolved response to higher social status, to become more psychopathic. Some have said that being psychopathic makes people more likely to succeed at the fight to become a ruler. It's also possible that they notice that, in their powerful position, they're unlikely to face consequences for bad things they do, and... they either motivatedly find reasons to drop their ethical principles, or never held them in the first place.
There's a philosophy called "emotivism" that seems to be along these lines. "Emotivism is a meta-ethical view that claims that ethical sentences do not express propositions but emotional attitudes."
I can see a couple of ways to read it (not having looked too closely). The first is "Everyone's ethical statements are actually just expressions of emotion. And, as we all know, emotions are frequently illogical and inappropriate to the situation. Therefore, everything anyone has ever said or will say about ethics is untrustworthy, and can reasonably be dismissed." This strikes me as alarming, and dangerous if any adherents were in charge of anything important.
The second reading is something like, "When humans implement ethical judgments—e.g. deciding that the thief deserves punishment—we make our emotions into whatever is appropriate to carry out the actions we've decided upon (e.g. anger towards the thief). Emotions are an output of the final judgment, and are always a necessary component of applying the judgment. However, the entire process leading up to the final judgment isn't necessarily emotional; we can try, and expect the best of us to usually succeed, at making that process conform to principles like logical consistency." That I would be on board with. But... that seems like a "well, duh" which I expect most people would agree with, and if that was what the emotivists meant, I don't see why they would express themselves the way they seem to.
I think a proper human morality somehow accounts for disgust having actually been an important part of how it was birthed.
I'm not sure if people maintain consistent distinctions between legal philosophy, ethics, and morality. But for whatever it is that governs our response to crimes, I think anger / desire-for-revenge is a more important part of it. Also the impulse to respond to threats ("Criminal on the streets! Who's he coming for next?"), which I guess is fear and/or anger.
Come to think of it, if I try to think of things that people declare "immoral" that seem to come from disgust rather than fear or anger, I think of restrictions on sexual behavior (e.g. homosexuality, promiscuity) and drugs, which I think the law shouldn't touch (except in forms where someone was injured nonconsensually, in which case revenge-anger comes into play). As emotions go, I think I'd distrust disgust more than the others.
The problem that you seem to be reaching for is a real one. You may find enlightening Leslie Lamport's "Buridan's principle":
A discrete decision based upon an input having a continuous range of values cannot be made within a bounded length of time.
The full paper discusses a similar situation:
Buridan’s Principle has appeared as a fundamental problem in computer design. In computing, a device that makes a discrete (usually binary) decision based upon a continuous input value is called an arbiter, and Buridan’s Principle is usually known as the Arbiter Problem [1].
...
If, as is usually the case, the peripheral device’s setting of the flag is not synchronized with the computer’s execution, then the computer’s binary decision is based upon an input having a continuous range of values. Buridan’s Principle asserts that the decision cannot be made in a bounded length of time. However, the computer must make that decision before beginning its next instruction, which generally happens in a fixed length of time.
The computer is thus trying to do something that is impossible. Just as the driver at the railroad crossing has a finite probability of being hit by the train, the computer has a finite probability of not making its decision in time. The physical manifestation of the computer’s indecision is that bad voltage levels are propagated. For example, if a 0 is represented by a zero voltage and a 1 is represented by +5 volts, then some wire might have a level of 2.5 volts. This leads to errors, because a 2.5 volt level could be interpreted as a 0 by some circuits and a 1 by others. The computer stops acting like a digital device and starts acting like a continuous (analog) one, with unpredictable results.
The Arbiter Problem is a classic example of Buridan’s Principle. The problem is not one of making the “right” decision, since it makes little difference if the interrupt is handled after the current instruction or after the following one; the problem is simply making a decision. The Arbiter Problem went unrecognized for a number of years because engineers did not believe that their binary circuit elements could ever produce “1/2’s”. The problem is solved in modern computers by allowing enough time for deciding so the probability of not reaching a decision soon enough is much smaller than the probability of other types of failure. For example, rather than deciding whether to interrupt execution after the current instruction, the computer can decide whether to interrupt it after the third succeeding instruction. With proper circuit design, the probability of not having reached a decision by time t is an exponentially decreasing function of t, so allowing a little extra time for the decision can make the probability of failure negligible.
...
Buridan’s Principle might lead one to suspect that a digital computer is an impossibility, since every step in its execution requires making discrete decisions within a fixed length of time. However, those decisions are normally based upon a discontinuous set of inputs. Whenever the value of a memory register is tested, each bit will be represented by a voltage whose value lies within two separate ranges—the range of values representing a zero or the range representing a one. Intermediate voltages are not possible because the register is never examined while it is in an intermediate state—for example, while a bit is changing from zero to one. The Arbiter Problem arises when the computer must interact asynchronously with an external device, since synchronization is required to prevent the computer from seeing an intermediate voltage level by reading a bit while the device is changing it. A similar problem occurs in interactions between the computer and its environment that require analog to digital conversion, such as video input.
The full paper is probably worth reading.
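For what it's worth, the standard way that last point gets modeled (my summary, not the paper's notation) is that an arbiter's chance of still being undecided falls off exponentially with the time you give it:

```latex
P(\text{undecided at time } t) \;\approx\; P_0 \, e^{-t/\tau}
```

where tau is a time constant of the circuit, so waiting a few extra multiples of tau makes the failure probability negligible compared to other failure modes.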
The paradox arises for people who lack a concept of "known unknowns" as distinct from "unknown unknowns". If our knowledge of x can only be in the state of "we know what x is and everything about it" or "we don't know anything about x and aren't even aware that anything like x exists", then the reasoning is all correct. However, for many things, that's a false binary: there are a lot of intermediate states between "zero knowledge of the concept of x" and "100% knowledge of x".
Yeah, learning by reading at home definitely has a huge effect in many cases. In Terence Tao's education, he was allowed to progress through multiple years of a subject per year (and to do so at different rates in different subjects), and since the classes he attended were normal ones, I think his academic progression must have been essentially determined by his ability to teach himself at home via textbooks. Unless perhaps they let him e.g. attend 7th grade science 2 days a week and 6th grade science the rest? I should learn more about his life.
The educational setup can also feed into the reading aspect. During my childhood, on a few occasions, I did explicitly think, "Well, I would like to read more of this math stuff (at home), but on the other hand, each thing I learn by reading at home is another thing I'll have to sit through the teacher telling me, being bored because I already know it", and actually decided to not read certain advanced math stuff because of that. (Years later, I changed my mind and chose to learn calculus from my sister's textbook around 8th grade—which did, in fact, cause me to be bored sitting through BC Calculus eventually.) This could, of course, be solved by letting kids easily skip past stuff by taking a test to prove they've already learned it.
Maybe with the objection that the time coefficient can be different for different school subjects, because some of them are more focused on understanding things, and others are more focused on memorizing things
Possibly. It's also the case that IQ is an aggregated measure of a set of cognitive subtests, and the underlying capabilities they measure can probably be factored out into things like working memory, spatial reasoning, etc., which are probably all correlated but imperfectly so; then if some of those are more useful for some subjects than others, you'll expect some variance in progression between subjects. And you certainly observe that the ultra-gifted kids, while generally above average at everything, are often significantly more ahead in math than in language, or vice versa (some of this is probably due to where they choose to spend their time, but I think a nonzero amount is innate advantage).
Among the various ways to take up the extra time of the rapid learners, probably the best one is "don't go faster, go wider".
The term of art, for doing this within a single subject, is "enrichment". And yeah, if you can do it, it fits nicely into schedules. "Taking more classes" is a more general approach. There are administrative obstacles to the latter: K-12 schools seem unlikely to permit a kid to skip half the sessions of one class so he can attend half the sessions of another class (and make up any gaps by reading the textbooks). Colleges are more likely to permit this by default, due to often not having attendance requirements, though one must beware of double-booking exams.
(Note: I am not saying that this is optimal for the rapid learner. The optimal thing for the rapid learner would be to... learn faster, obviously.)
I think the best setup—can't find the citation—is believed to be "taking a class with equally gifted children of the same age, paced for them". If you don't have that, then skipping grades (ideally per-subject) would address knowledge gaps; taking a class paced for at least somewhat gifted kids (possibly called an "advanced" class, or a class at a high-tier college) would partly address the learning speed gap, and enrichment would also address the learning speed gap, to a variable extent depending on the details.
A more realistic example would be a math textbook, where each chapter is followed by exercises, some of them marked as "optional, too difficult"
A specific way of doing this, which I think would be good for education to move towards, is to have a programming component: have some of those optional exercises be "Write programs to implement the concepts from this chapter".
But if I had to guess, I would guess that the gifted kids who stay within the confines of school will probably lose most of their advantage, and the ones who focus on something else (competitions, books, online courses, personal projects) will probably keep it.
Subjects Not Permitted Acceleration. [...] With few exceptions, they have very jaded views of their education. Two dropped out of high school and a number have dropped out of university. Several more have had ongoing difficulties at university, not because of lack of ability but because they have found it difficult to commit to undergraduate study that is less than stimulating. These young people had consoled themselves through the wilderness years of undemanding and repetitive school curriculum with the promise that university would be different—exciting, intellectually rigorous, vibrant—and when it was not, as the first year of university often is not, it seemed to be the last straw.
Some have begun to seriously doubt that they are, indeed, highly gifted. The impostor syndrome is readily validated with gifted students if they are given only work that does not require them to strive for success. It is difficult to maintain the belief that one can meet and overcome challenges if one never has the opportunity to test oneself.
Versus:
Young People Who Have [skipped 3 or more grades by the end of high school]. [...] In every case, these young people have experienced positive short-term and long-term academic and socioaffective outcomes. The pressure to underachieve for peer acceptance lessened significantly or disappeared after the first acceleration. Despite being some years younger than their classmates, the majority topped their state in specific academic subjects, won prestigious academic prizes, or represented their country or state in Math, Physics, or Chemistry Olympiads. The majority entered college between ages 11 and 15. Several won scholarships to attend prestigious universities in Australia or overseas. All have graduated with extremely high grades and, in most cases, university prizes for exemplary achievement. All 17 are characterized by a passionate love of learning and almost all have gone on to obtain their Ph.D.s.
Though one could say this is more of an attitude and habit and "ever bothered to figure out study skills" thing, than a "you've permanently lost your advantage" thing. If you took one of those jaded dropouts (of 160+ IQ) and, at age 30, threw them into a job where they had to do some serious and challenging scientific work... There's a chance that their attitude and habits would make them fail and get fired within the first few months, that chance depending on how severe and how ingrained they are. But if they did ok enough to not get fired, then I expect that, within a year, they would be pulling ahead of a hypothetical 120 IQ counterpart for whom everything had gone great and who started with slightly more knowledge.
I'll give a citation on learning speed to show the extent of the problem, at least in early years (bold added):
Observation and investigation prove that in the matter of their intellectual work these children are customarily wasting much time in the elementary schools. We know from measurements made over a three-year period that a child of 140 IQ can master all the mental work provided in the elementary school, as established, in half the time allowed him. Therefore, one-half the time which he spends at school could be utilized in doing something more than the curriculum calls for. A child of 170 IQ can do all the studies that are at present required of him, with top "marks," in about one-fourth the time he is compelled to spend at school. What, then, are these pupils doing in the ordinary school setup while the teacher teaches the other children who need the lessons?
No exhaustive discussion of time-wasting can be undertaken here, except to say briefly that these exceptional pupils are running errands, idling, engaging in "busy work," or devising childish tasks of their own, such as learning to read backward—since they can already read forward very fluently. Many are the devices invented by busy teachers to "take up" the extra time of these rapid learners, but few of these devices have the appropriate character that can be built only on psychological insight into the nature and the needs of gifted children.
Note that this is above what the mere "intelligence quotient" seems to predict—i.e. at a given chronological age, IQ 140 children have 1.4x the mental age of those at IQ 100, and IQ 170 have 1.7x that mental age. So why would you get 2x and 4x learning speeds respectively?
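Spelling out the arithmetic behind that question (using the old ratio-IQ definition, IQ = mental age / chronological age × 100; this is just illustrative arithmetic, not new data):

```python
# Ratio-IQ arithmetic: predicted mental-age advantage vs. reported learning speed.
chronological_age = 10  # any age gives the same ratios

for iq, reported_speed in [(140, 2.0), (170, 4.0)]:
    mental_age = iq / 100 * chronological_age         # e.g. 14 or 17 at age 10
    predicted_ratio = mental_age / chronological_age  # 1.4x or 1.7x
    print(f"IQ {iq}: mental-age ratio {predicted_ratio:.1f}x, "
          f"reported mastery speed ~{reported_speed:.0f}x the standard pace")
```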
One guess is that, at least within this elementary-school age range, higher mental age also means they've figured out better strategies for paying attention, noticing when they've understood something versus when they need to go back and re-read, etc.—strategies other kids may eventually figure out too. Another guess is that, for at least some kids, one of the things that manifests as higher intelligence is an increased need for intellectual stimulation; so the higher-IQ kids may be more naturally inclined to pay attention when the teacher is saying new things, and more inclined to read the textbook and think back to it, meaning there's less need for self-control, and they're less handicapped by the lack of it in early years.
I don't know how far up in age the above extends. I do expect the lower-IQ learning rate to catch up somewhat in later years, perhaps bringing the difference down to the IQ ratio. (The learning-rate difference certainly doesn't disappear, though; I believe it's common that if an exceptionally gifted kid gets accelerated into college classes with adults of equal knowledge—several of whom are probably moderately gifted—she'll still finish at the top of her class.)
There's also a failure mode of focusing on "which arguments are the best" instead of "what is actually true". I don't understand this failure mode very well, except that I've seen myself and others fall into it. Falling into it looks like focusing a lot on specific arguments, and spending a lot of time working out what was meant by the words, rather than feeling comfortable adjusting arguments to fit better into your own ontology and to fit better with your own beliefs.
The most obvious way of addressing this, "just feel more comfortable adjusting arguments to fit better into your own ontology and to fit better with your own beliefs", has its own failure mode, where you end up attacking a strawman that you think is a better argument than what they made, defeating it, and thinking you've solved the issue when you haven't. People have complained about this failure mode of steelmanning a couple of times. At a fixed level of knowledge and thought about the subject, it seems one can only trade off one danger against the other.
However, if you're planning to increase your knowledge and time-spent-thinking about the subject, then during that time it's better to focus on the ideas than on who-said-or-meant-what; the latter is instrumentally useful as a source of ideas.
So, there’s no need to worry about the dictionary police enforcing how you should use a word, but understanding how “acceptance” is commonly used, and comparing that usage to the definitions found in common advice about “acceptance”, might help us.
I see you practice acceptance. ;-)
There was also a character, Kotomine Kirei, who was brought up with good ethics and tried to follow them, but ultimately realized that the only thing that pleased him was causing other people pain... and there's an alternate universe work in which he runs a shop that sells insanely spicy mapo tofu. I suppose he could have gotten into the BDSM business as well. Drill sergeant? Interrogator? (That might not work, but there probably would be people who thought it did.)
I dropped out after 10th grade. I messed around at home, doing some math and programming, for ~6 years, then started working programming jobs at age 22 (nearly 10 years ago). I'd say results were decent.
A friend of mine dropped out after 11th grade. He has gone back and forth between messing around (to some extent with math and programming; in later years, with meditation) and working programming jobs, and I think is currently doing well with such a job. Probably also decent.
(And neither of us went to college, although I think my friend may have audited some classes.)
they use a relative ELO system
Elo itself is a relative system, defined by "If [your rating] - [their rating] is X, then we can compute your expected score [where win=1, draw=0.5, loss=0] as a function of X (specifically 1 / (1 + 10^(-X/400)))."
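A minimal sketch of that formula in code:

```python
def expected_score(rating_diff: float) -> float:
    """Expected score (win=1, draw=0.5, loss=0) for a player whose rating
    exceeds the opponent's by rating_diff, under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

print(expected_score(0))    # 0.5  -- equal ratings
print(expected_score(100))  # ~0.64
print(expected_score(400))  # ~0.91
```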
that is detached from the FIDE ELO
Looking at the Wiki, one of the complaints is actually that, as the population of rated human players changes, the meaning of a given rating may change. If you could time-teleport an Elo 2400 player from 1950 into today, they might be significantly different from today's Elo 2400 players. Whereas if you have a copy of Version N of a given chess engine, and you're consistent about the time (or, I guess, machine cycles or instructions executed or something) that you allow it, then it will perform at the same level eternally. Now, that being the case, if you want to keep the predictions of "how do these fare against humans" up to date, you do want to periodically take a certain chess engine (or maybe several) and have a bunch of humans play against it to reestablish the correspondence.
Also, I'm sure that the underlying Elo model isn't exactly correct. It asserts that, if player A beats player B 64% of the time, and player B beats player C 64% of the time, then player A must beat player C 76% of the time; and if we throw D into the mix, whom C beats 64% of the time, then A and B must beat D 85% and 76% of the time, respectively. It would be a miracle if that turned out to be exactly and always true in practice. So it's more of a kludge that's meant to work "well enough".
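Those percentages are just the Elo curve applied to chained rating differences; a quick check, reusing the same formula:

```python
import math

def expected_score(diff: float) -> float:
    """Standard Elo expectation for a rating difference of diff."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def diff_for_score(p: float) -> float:
    """Invert the Elo curve: the rating difference implied by expected score p."""
    return -400.0 * math.log10(1.0 / p - 1.0)

step = diff_for_score(0.64)                 # ~100 rating points per 64% matchup
print(round(expected_score(2 * step), 2))   # A vs C (or B vs D): ~0.76
print(round(expected_score(3 * step), 2))   # A vs D: ~0.85
```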
... Actually, as I read more, the underlying validity of the Elo model does seem like a serious problem. Apparently FIDE rules say that any rating difference exceeding 400 (91% chance of victory) is to be treated as a difference of 400. So even among humans in practice, the model is acknowledged to break down.
This is expensive to calculate
Far less expensive to make computers play 100 games than to make humans play 100 games. Unless you're using a supercomputer. Which is a valid choice, but it probably makes more sense in most cases to focus on chess engines that run on your laptop, and maybe do a few tests against supercomputers at the end if you feel like it.
and the error bar likely increases as you use more intermediate engines.
It does, though to what degree depends on what the errors are like. If you're talking about uncorrelated errors due to measurement noise, then adding up N errors of the same size (i.e. standard deviation) would give you an error of √N times that size. And if you want to lower the error, you can always run more games.
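A minimal simulation of the uncorrelated case (the per-comparison noise level here is made up purely for illustration):

```python
import random
import statistics

def chained_estimate_error(n_links: int, noise_sd: float = 20.0) -> float:
    """Total error of a rating difference estimated through n_links intermediate
    comparisons, each off by independent Gaussian noise of std dev noise_sd."""
    return sum(random.gauss(0.0, noise_sd) for _ in range(n_links))

for n in (1, 4, 9):
    errors = [chained_estimate_error(n) for _ in range(100_000)]
    print(n, round(statistics.stdev(errors), 1))  # ~20, ~40, ~60: grows like sqrt(n)
```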
However, if there are correlated errors, due to substantial underlying wrongness of the Elo model (or of its application to this scenario), then the total error may get pretty big. ... I found a thread talking about FIDE ratings vs human online chess ratings, wherein it seems that 1 online chess Elo point (from a weighted average of online classical and blitz ratings) = 0.86 FIDE Elo points, which would imply that e.g. if you beat someone 64% of the time in FIDE tournaments, then you'd beat them 66% of the time in online chess. I think tournaments tend to give players more time to think, which tends to lead to more draws, so that makes some sense...
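The 66% figure is just that 0.86 scaling plugged back into the Elo curve (a sketch; the 0.86 factor is the one reported in that thread):

```python
def expected_score(diff: float) -> float:
    """Standard Elo expectation for a rating difference of diff."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

fide_gap = 100.0               # roughly a 64% expected score under FIDE ratings
online_gap = fide_gap / 0.86   # the same two players, measured in online points
print(round(expected_score(fide_gap), 2))    # 0.64
print(round(expected_score(online_gap), 2))  # ~0.66
```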
But it also raises possibilities like, "Perhaps computers make mistakes in different ways"—actually, this is certainly true; a paper (which attempted to establish a correspondence between FIDE and CCRL ratings by analyzing the frequency and severity of mistakes, one dimension of chess expertise) indicates that the mistakes humans make are, on average, about 2x as severe as those chess engines make at similar rating levels. Anyway, it seems plausible that that would lead to different ... mechanics.
Here are the problems with computer chess ELO ratings that Wiki talks about. Some come from the drawishness of high-level play, which is also felt at high-level human play:
Human–computer chess matches between 1997 (Deep Blue versus Garry Kasparov) and 2006 demonstrated that chess computers are capable of defeating even the strongest human players. However, chess engine ratings are difficult to quantify, due to variable factors such as the time control and the hardware the program runs on, and also the fact that chess is not a fair game. The existence and magnitude of the first-move advantage in chess becomes very important at the computer level. Beyond some skill threshold, an engine with White should be able to force a draw on demand from the starting position even against perfect play, simply because White begins with too big an advantage to lose compared to the small magnitude of the errors it is likely to make. Consequently, such an engine is more or less guaranteed to score at least 25% even against perfect play. Differences in skill beyond a certain point could only be picked up if one does not begin from the usual starting position, but instead chooses a starting position that is only barely not lost for one side. Because of these factors, ratings depend on pairings and the openings selected.[48] Published engine rating lists such as CCRL are based on engine-only games on standard hardware configurations and are not directly comparable to FIDE ratings.
It probably takes years before a tumor grows big enough for normal methods to detect it.
There exist fast-growing cancers. I figure that if the fungi theory is correct, then much of the variation in growth speed probably comes from the specific fungus (and perhaps from which part of the body that fungus targets), with most of the rest coming from the host's immune system (I'm not sure what else would contribute significantly). If transmission and mild infections are common, and if, say, 1% of cancers are fast-growing, I feel like there should be lots of cases where an immunocompromised person picks up a fast-growing cancer fungus at a hospital or something and, within a few years, gets diagnosable cancer—enough cases that it should have been noticed. I don't have numbers for this, but that's my suspicion.
Or, for example... How often do couples get the same type of cancer? I found this:
METHODS
The authors identified 25,670 cancer-free married couples in northern California who were followed for up to 31 years for the development of cancer. In Cox proportional hazards analysis, the development of cancer in a spouse was treated as a time-dependent, independent variable, and spouse-with/spouse-without risk ratios were determined, controlling for age and gender. For selected concordant espoused pairs, additional explanatory information was sought in their medical records.
RESULTS
There was no excess concordance for all cancers combined; the spouse-with/spouse-without risk ratio was 0.97 (95% confidence interval, 0.90–1.05). Statistically significant husband-wife associations were found only for cancer of the tongue and stomach and for non-Hodgkin lymphoma. Except for cancer of the penis/endometrium and testis/vulva, based on one couple with each combination, gender specific cancers did not aggregate within married couples. Established and suspected risk factors, not necessarily related to the marriage, were found for some individuals who had concordance with their spouses.
CONCLUSIONS
Little spousal concordance for cancer occurrence was found. The study of spousal aggregation does not appear useful in identifying unsuspected environmental causes of cancer in heterogeneous populations in urban areas of affluent Western countries. A cohort study would have to be much larger than this one to detect weak spousal concordance reliably.
Also, for whatever Claude's opinion is worth:
Q: How often do couples get the same type of cancer?
While it's not extremely common for couples to get the same type of cancer, it does happen occasionally. This phenomenon has been studied, and there are several factors to consider:
- Shared environmental factors: Couples often share the same living environment, diet, and lifestyle habits, which can expose them to similar cancer risk factors.
- Similar behaviors: Shared behaviors like smoking, alcohol consumption, or sun exposure can increase risk for certain cancers in both partners.
- Infectious agents: Some cancers are caused by infectious agents (like HPV for cervical cancer), which can be transmitted between partners.
- Age-related risks: As couples age together, they may face similar age-related cancer risks.
- Genetic factors: While not directly shared between couples, people might choose partners with similar genetic backgrounds, potentially influencing cancer risk.
- Coincidence: Given the prevalence of cancer, some couples will develop the same cancer by chance.
- Screening effect: When one partner is diagnosed, the other may be more likely to get screened, potentially leading to a diagnosis of a cancer that might have otherwise gone undetected.
Studies on this topic have shown:
- A slight increase in cancer risk for partners of cancer patients, but this varies by cancer type.
- Higher correlations for smoking-related cancers, suggesting shared lifestyle factors play a role.
- Increased risk for cancers with infectious causes, like stomach cancer (H. pylori) or liver cancer (hepatitis viruses).
It's important to note that while interesting, these occurrences are not common enough to be considered a significant public health concern.
My first significant thought (which came up a bit in the AIs' output) is that it would seem that, if fungi cause cancer, then the fungi would at least sometimes be transmitted from one person to another, and if you weren't aware of the fungi, then this would look like cancer being transmitted from one to the other. Yet I think this has basically never been observed.[1]
One could try supposing that each fungus is only rarely able to infect people—only the few individuals that are unusually vulnerable to it. But, well. I imagine that would generally include anyone whose immune system is crippled. Surely there have been enough cases of people with cancer next to old, immunocompromised people in a hospital, with sufficient mistakes in hygiene that the one would have infected the other. Maybe there are additional requirements for an individual to be infected (the fungus has a favorite temperature? Acidity? Salt level?)... but even taking that into account, I think there should have been enough cases that we would have noticed. (If the chance of an individual being infectable by a given fungus is so low that we never see transmission, then how is it that, er, 1/6th of all deaths are caused by cancer? There would have to be zillions of different fungi, each of which is able to infect only a tiny number of people... which surely would have led to natural selection for much better infectivity by now?)
Incidentally, I think it is known that there are some viruses (like HPV) that cause (or greatly heighten the risk of) cancer. It's plausible that fungi play a significantly larger role of this type than people realize. But for it to be the primary cause seems implausible.
The strongest evidence is that they found cancers that seem to have no mutations.
This seems worth digging into.
[1] There are a few cases of cancers where it's known that the actual cancer cells themselves go from organism to organism: https://en.wikipedia.org/wiki/Clonally_transmissible_cancer
I note that the date on the tweet is June 28, the day after the Trump-Biden debate. It mentions the office of President and concludes with arguments in favor of having an 80-year-old president with no serious arguments against.
I give 95% odds the tweet was causally downstream of the debate: either directly inspired, or a result of arguing with people about old age and cognitive decline, for whom the subject came up because of the debate.
I'm not entirely sure if, or how, it was meant to be a comment on the debate. It could be that he wanted to downplay any perceived downsides to having his favored candidate as president. It could be that the topic was in the air and he had a take and wrote it up. (It is in-character for him to double down on his contrarian take, expressing doubts about an entire field of research in the process—for all that he's said about "used to be extremely confident". [To be sure, his criticisms are often valid, but going from "here are valid criticisms of his opponents" to "the opposite of what his opponents believe is the truth", or "you should throw out everything you thought you believed and just listen to the arguments he's making", is a risky leap.]) Who knows.
On the subject itself, he mentions slow memory recall, but not faulty memory, nor dementia, Alzheimer's, Parkinson's, and so on. Those are real and substantial problems. Perhaps a more rigorous thesis would be "Old people who are not affected by dementia, Alzheimer's, etc., including any subclinical levels thereof, can be intellectually very strong", and one can think of examples bearing this out. However, it's also true that the background probability of an 80-year-old having one or more of these issues is fairly high.
In context, I think "average people" means "the individual consumers paying $20/month to OpenAI". Who do you mean by "many"? I doubt that posters at /r/localllama are representative of "average people" by the above definition.