Comments
Note: I wrote my comment while reading as notes to see what I thought of your arguments while reading more than as a polished thing.
I think your calibration on the 'slow scenario' is off. What you claim is the slowest plausible one is fairly clearly the median scenario given that it is pretty much just following current trends, and slower than present trend is clearly plausible. Things have already slowed way down, with advancements in very narrow areas being the only real change. There is a reason that OpenAI hasn't dared even name something GPT 5, for instance. Even o3 isn't really an improvement on general LLM duties and that is the 'exciting' new thing, as you pretty much say.
Advancement is disappointingly slow in the AI that I personally use (mostly image generation, where new larger models have often not really been better overall for the past year or so, and newer ones mostly use LLM style architectures), for instance, and it is plausible that there will be barely any movement in terms of clear quality improvement in general uses over the next couple years. And image generation should be easier to improve than general LLMs because it should be earlier in the diminishing returns of scale (as the scale is much smaller). Note that since most are also diffusion models, they are already using an image equivalent of the trick o1 and o3 introduced with what I would argue is effectively chain of thought. For some reason, all the advancements I hear about these days seem like uninspired copies of things that already happened in image generation.
The one exception is 'agents' but those show no signs of present day usefulness. Who knows how quickly such things will become useful, but historical trends on new tech, especially in AI, say 'not soon' for real use. A lot of people and companies are very interested in the idea for obvious reasons, but that doesn't mean it will be fast. See also self-driving cars, which have taken many times longer than expected, despite seeming like they are probably a success story in the making (for the distant future). In fact, self-driving cars are the real world equivalent of a narrow agent, and the insane difficulty they are having is strong evidence against agents being a transformatively useful thing soon.
I do think that AI as it currently is will have a transformative impact in the near term for certain activities (image generation for non-artists like me is already one of them), but I think the smartphone comparison is a good one; I still don't bother to use a smartphone (though it has many significant uses). I would be surprised if it had as big an impact as the World Wide Web has on a year for year basis, counting from the beginning of the web (supposedly in 1989) for the web and from 2017 when transformers were invented (or even 2018 when GPT-1 became a thing) for AI, for instance. I like the comparison to the web because I think that AI going especially well would be a change to our information capacities similar to an internet 3.0. (Assuming you count the web as 2.0.)
As to the fast scenario, that does seem like the fastest scenario that isn't completely ridiculous, but I think that your belief in its probability is dramatically too high. I do agree that if you believe that self-play (in the AlphaGo sense) to generate good data is doable for poorly definable problems, that would alleviate the lack of data issues we suffer in large parts of the space, but it is unlikely that would actually improve the quality of the data in the near term, and there are already a lot of data quality issues. I personally do not believe that o1 and o3 have at all 'shown' that synthetic data is a solved issue, and it wouldn't be for quite a while if ever.
Note that the image generation models already have been using synthetic data by teachers for a while now with 'SDXL Turbo' and other later adversarial distillation schemes. This did manage a several times speed boost, but at a cost of some quality, as all such schemes do. Crucially, no one has managed to increase quality this way, because the 'teacher' provides a maximum quality level you can't go beyond (except by pure luck).
Speculatively, you could perhaps improve quality by having a third model select the absolute best outputs of the teacher and only training on those until you have something better than the teacher, and then switching 'better than the teacher' into teacher and automatically starting to train a new student (or perhaps retraining the old teacher?). The problem is, how do you get a selection model that is actually better than the things you are trying to improve, in its own self-play style learning, rather than just getting them to fit the static model of a good output? Human data creation cannot be replaced in general without massive advancements in the field. You might be able to switch human data generation to just training the selection model though.
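A minimal sketch of the selection loop speculated about above, under loudly toy assumptions: output quality is a single number, the student simply becomes the average of whatever outputs it is trained on, and `select_noise` (a hypothetical parameter, not from any real system) is how badly the selection model misjudges quality. This is not a real training setup; it only illustrates how much the scheme leans on the selector.

```python
import random
import statistics

def distill_rounds(select_noise, rounds=5, samples=1000, keep=100, seed=0):
    """Toy teacher -> selector -> student loop (all names hypothetical)."""
    rng = random.Random(seed)
    teacher_mean = 0.0            # assumed starting quality of the teacher
    history = [teacher_mean]
    for _ in range(rounds):
        # The teacher produces outputs whose true quality varies around its mean.
        outputs = [rng.gauss(teacher_mean, 1.0) for _ in range(samples)]
        # The selector only sees true quality plus its own judging error.
        scored = [(q + rng.gauss(0.0, select_noise), q) for q in outputs]
        scored.sort(reverse=True)
        kept = [q for _, q in scored[:keep]]
        # The student trained on the kept outputs becomes the next teacher.
        teacher_mean = statistics.mean(kept)
        history.append(teacher_mean)
    return history

perfect = distill_rounds(select_noise=0.0)    # selector judges quality exactly
noisy = distill_rounds(select_noise=10.0)     # selector is mostly noise
```

With a perfect selector the teacher's mean ratchets up every round; with a selector that is mostly noise the gain per round is far smaller, which is the worry about where a good selection model would come from.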
In some areas, you could perhaps train the AI directly on automatically generated data from sensors in the real world, but that seems like it would reduce the speed of progress to that of the real world unless you have that exponential increase in sensor data instead.
I do agree that in a fast scenario, it would clearly be algorithmic improvements rather than scale leading to it.
Also, o1 and o3 are only 'better' because of a willingness to use immensely more compute in the inference stage, and given that people already can't afford them, that route seems like it will be played out after not too many generations of scaling, especially since hardware is improving so slowly these days. Chain of thought should probably be largely replaced with something more like what image generation models currently use, where each step iterates on the current results. These could be combined together of course.
Diffusion models make a latent picture of a bunch of different areas, and each of those influences each other area in the future, so in text generation you could analogously have a chain of thought that is used in its entirety to create a new chain of thought. For example, you could have a ten deep chain of thought be used to create another ten deep chain of thought nine times instead of a hundred different options (with the first ten being generated by just the input of course). If you're crazy, it could literally be exponential, where you generate one for the first step, two in the second... 16 in the fifth, and so on.
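A rough sketch of the bookkeeping in that idea, assuming a hypothetical `generate(prompt, previous_chain)` callable that stands in for a model. The stub below only labels steps; it is not how any actual model works. The first pass sees only the input, and each later pass rewrites the whole chain from the previous one.

```python
from typing import Callable, List, Tuple

Chain = List[str]

def iterate_chain(prompt: str,
                  generate: Callable[[str, Chain], Chain],
                  passes: int = 10) -> Tuple[Chain, int]:
    """Rebuild the whole chain of thought from the previous chain each pass."""
    chain: Chain = []             # the first pass is generated from the prompt alone
    total_steps = 0
    for _ in range(passes):
        chain = generate(prompt, chain)
        total_steps += len(chain)
    return chain, total_steps

def stub_generate(prompt: str, previous: Chain, depth: int = 10) -> Chain:
    """Hypothetical stand-in for a model: emits `depth` labeled steps."""
    return [f"{prompt}: step {i + 1} (saw {len(previous)} earlier steps)"
            for i in range(depth)]

def doubling_schedule(steps: int) -> List[int]:
    """The 'crazy' variant: how many chains are generated at each step."""
    return [2 ** i for i in range(steps)]

final_chain, total = iterate_chain("question", stub_generate)
print(total)                  # 100 steps generated across the ten passes
print(doubling_schedule(6))   # [1, 2, 4, 8, 16, 32]
```

Ten passes of ten steps cost the same hundred generated steps as a hundred independent options, but every pass after the first conditions on a complete earlier chain.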
"Identifying The Requirements for a Short Timeline"
I think you are missing an interesting way to tell if AI is accelerating AI research. A lot of normal research is eventually integrated into the next generation of products. If AI really was accelerating the process, you would see the integrations happening much more quickly, with a shorter lag time between 'new idea first published' and 'new idea integrated into a fully formed product' that is actually good. A human might take several months to test the idea, but if an AI could do the research, it could also replicate the other research incredibly quickly, and see how it works when combined with the other research.
(Ran out of steam when my computer crashed during the above paragraph, though I don't seem to have lost any of what I wrote since I do it in notepad.)
I would say the best way to tell you are in a shorter timeline is if it seems like gains from each advancement start broadening rather than narrowing. If each advancement applies narrowly, you need a truly absurd number of advancements, but if they are broad, far fewer.
Honestly, I see very little likelihood of what I consider AGI in the next couple decades at least (at least if you want it to have surpassed humanity), and if we don't break out of the current paradigm, not for much, much longer than that, if ever. You do have some interesting points, and seem reasonable, but I really can't agree with the idea that we are at all close to it. Also, your fast scenario seems more like it would be 20 years than 4. 4 years isn't the 'fast' scenario, it is the 'miracle' scenario. The 'slow scenario' reads like 'this might be the work of centuries, or maybe half of one if we are lucky'. The strong disagreement on how long these scenarios would take is because the point we are at now is far, far below what you seem to believe. We aren't even vaguely close.
As far as your writing goes, I think it was fairly well written structurally and was somewhat interesting, and I even agree that large parts of the 'fast' scenario as you laid it out make sense, but since you are wrong about the amount of time to associate with the scenarios, the overall analysis is very far off. I did find it to be worth my time to read.
Apparently the very subject coming up led to me writing a few paragraphs about the problems of a land value tax before I even started reading it. (A fraction of the things in parenthesis were put in later to elaborate a point.)
There's nothing wrong with replacing current property taxes with equivalent (dollar value) taxes that only apply to the value of the land itself (this would be good to avoid penalizing improving your own land), but the land value tax (aka Georgism) is awful because of what its proponents want to do with it. Effectively, they want to confiscate literally all of the value of people's land. This is hugely distortionary versus other forms of property, but that isn't even the real problem.
The real problem is that people don't want to live lives where they literally can't own their own home, can't plan for where they will live in the future, and can be kicked out at any time easily just because someone likes the idea of claiming their land became valuable. This turns all homeowners into renters. There are in many places (such as California) very distortionary laws on property tax because people hate the idea of their land being confiscated through taxes. There are also very many laws making it hard to kick out renters because renters hate being kicked out too. Stability is important and many people currently pay massive premiums to not rent (including taking out massive 30 year loans on a place where they often don't even plan to live for half that long). Also, being forced to move at an inconvenient time is very expensive and has a hell of a lot of deadweight loss both economically and personally. (People also hate eminent domain.)
Of lesser but not no importance is the fact that their taxes can go up to ridiculously high levels just because someone else built something valuable nearby, which will make present day nimbyism look very nice and kind by comparison (though it is kind of funny that people will likely switch what kind of nimbyism they support as well).
So, does your post bring up what I think are the problems with a land value tax? Yes, though I somewhat disagree with some of the emphasis.
Searching for new uses of land is pretty important especially over time, but we don't need new uses for things to work right now whereas the disruptions to people's lives would make things unworkable right now, and a lot of searching for new uses of land is done by people who do not currently own said land.
Implicitly taxing improvements to nearby land is obviously related to my point on nimbyism so we agree there, though it is interesting to note that you seem to prefer talking about the internal version while I mostly reference the political version. The internal issue could obviously be 'fixed' by simply consolidating lots, but the political one cannot without completely destroying the idea of individual ownership.
Your statements about the tax base narrowing issue are largely correct, but I would like to emphasize a different point of agreement: supporters seem to see it as simple, elegant, and easy, but each patch makes it more complicated, kludgy, and difficult. I think that the very idea of evaluating the value of the land itself as separate from improvements actually starts out pretty difficult, so the increase in difficulty is a huge problem. Any incorrect overvaluation makes land worse than useless under this system! A system where a lot of land is useless very obviously leads to a lot of unused valuable land... which is exactly why people are currently complaining about speculation in land, but worse! (This is also deadweight loss.)
You don't go far enough when decrying the effect on people's confidence in the government, because it isn't just a confidence thing. Confiscating people's property without extremely good cause is one of the primary signs of living in a country that's either dirt poor, or right about to be, and will definitely stay that way (except in some rare cases where it only leads to massive famines, death, and stagnation rather than becoming poorer monetarily at first). It is also massively immoral.
The problem with your section on disrupting long-term plans is that you emphasize only the immediate problem of the transition, but don't mention that it also prevents the creation of new long term plans for a stable life unless you are willing to live a very bad life compared to how you could otherwise live. A full land value tax is therefore extremely dystopian.
Luckily, the proponents aren't currently finding much luck in their desired tax actually happening, and I hope it stays that way.
Math is definitely just a language. It is a combination of symbols and a grammar about how they go together. It's what you come up with when you maximally abstract away the real world, and the part about not needing any grounding was specifically about abstract math, where there is no real world.
Verifiable is obviously important for training (since we could give effectively infinite training data), but the reason it is verifiable so easily is because it doesn't rely on the world. Also, note that programming languages are also just that, languages (and quite simple ones) but abstract math is even less dependent on the real world than programming.
Math is just a language (a very simple one, in fact). Thus, abstract math is right in the wheelhouse for something made for language. Large Language Models are called that for a reason, and abstract math doesn't rely on the world itself, just the language of math. LLMs lack grounding, but abstract math doesn't require it at all. It seems more surprising how badly LLMs did math, not that they made progress. (Admittedly, if you actually mean ten years ago, that's before LLMs were really a thing. The primary mechanism that distinguishes the transformer was only barely invented then.)
For something to be a betrayal does not require knowing the intent of the person doing it, and is not necessarily modified if you do. I already brought up the fact that it would be perfectly fine if they had asked permission, it is in the not asking permission to alter the agreed upon course where the betrayal comes in. Saying 'I will do x' is not implicitly asking for permission at all, it is a statement of intent, that disregards entirely that there was even an agreement at all.
'what made A experience this as a betrayal' is the fact that it was. It really is that simple. You could perhaps object that it is strange to experience vicarious betrayal, but since it sounds like the four of you were a team, it isn't even that. This is a very minor betrayal, but if someone were to even minorly betray my family, for instance, I would automatically feel betrayed myself, and would not trust that person anymore even if the family member doesn't actually mind what they did.
Analogy time (well, another one), 'what makes me experience being cold' can be that I'm not generating enough heat for some personal reason, or it can just be the fact that I am outside in -20 degree weather. If they had experienced betrayal with the person asking for permission to do a move that was better for the group, that would be the former, but this is the latter. Now, it obviously can be both where a person who is bad at generating heat is outside when it is -20 degrees. (This is how what you are saying actually happened works out in this scenario.)
From what I've seen of how 'betrayal' is used, your definition is incorrect. (As far as I can tell) In general use, going against your agreement with another person is obviously betrayal in the sense of acting against their trust in you and reliance upon you, even if the intent is not bad. This is true even if the results are expected to be good. So far as I know we do not have distinguishing words between 'betrayal with bad motives' and 'betrayal with good motives'.
Another analogy, if a financial advisor embezzled your money because they saw a good opportunity, were right, and actually gave you your capital back along with most (or even all) of the gain before they were caught, that is still embezzling your money, which is a betrayal. Since they showed good intentions by giving it back before being caught, some people would forgive them when it was revealed, but it would still be a betrayal, and other people need not think this is okay even if you personally forgive it. Announcing the course of action instead of asking permission is a huge deal, even if the announcement is before actually doing it.
You can have a relationship where either party is believed to be so attuned to the needs and desires of the other person that they are free to act against the agreement and have it not be betrayal, but that is hardly normal. If your agreement had included, explicitly or through long history, 'or whatever else you think is best' then it wouldn't be a betrayal, but lacking that, it is. Alternately, you could simply announce to the group beforehand that you want people to use their best judgment on what to do rather than follow agreements with you. (This only works if everyone involved remembers that though.) The fact is that people have to rely on agreements and act based upon them, and if they aren't followed, there is little basis for cooperation with anyone whose interests don't exactly coincide. As you note, their objection was not to the course of action itself.
The damning part isn't the fact that they thought there was a new course of action that was better and wanted to do it (very few people object to thinking a new course of action is better if you are willing to follow the agreement assuming the other person doesn't agree), it was the not asking and the not understanding, which both show a lack of trustworthiness and respect for agreements. This need not be a thing that has happened before, or that is considered super likely to occur again, for it to be reasonable for another party to state that they hate such things, which is one of the things being communicated. One thing objecting here does is tell the person 'you are not allowed to violate agreements with me without my permission.'
Also, they may be trying to teach the violator, as it is often the case that people try to teach morality, which may be why so much of philosophy is morality discussions. (Though I don't actually know how big of a factor that is.)
If there had been a reason they couldn't ask, then it would make more sense to do the seemingly better thing and ask for their approval after the fact. This is often true in emergencies, for instance, but also in times of extreme stress. Your friend wouldn't feel like it was a betrayal if the other person had instead gone to the bathroom and never come back because they got a call that their best friend had just been hit by a car and they didn't think to tell people before leaving. If, on the other hand, the person acted unable to understand why they should explain themselves later, or that it would have been better if they had remembered to do so, that would be bizarre.
I do agree that considering the hypothesis that they may have experienced serious betrayal is useful (it is unfortunately common), which is why I think asking about it was potentially a good idea despite being potentially very awkward to bring up, but I think it is important not to commit to a theory to degrees beyond what is necessary.
I also agree that feeling understood is very important to people. From what I can tell, one of the primary reasons people don't bother to explain themselves is that they don't think the other person would understand anyway no matter how much they explained, with the others being that they wouldn't care or would use it against them.
Obviously, translating between different perspectives is often a very valuable thing to do. While there are a lot of disagreements that are values based, very often people are okay with the other party holding different values as long as they are still a good partner, and failure to communicate really is just failure to communicate.
I dislike the assumption that 'B' was reacting that way due to past betrayal. Maybe they were, maybe they weren't (I do see that 'B' confirmed it for you in a reaction to another comment, but making such assumptions is still a bad idea), but there doesn't have to be any past betrayal to object to betrayal in the present; people don't need to have ever been betrayed in the past to be against it as a matter of principle. They only need to have known that betrayal is a thing that exists, and they would probably be more upset even if they were somehow just learning of it at the time it happened. Leave out the parts that are unnecessary to the perspective. The more you assume, the more likely you are to fail to do it properly. You can ask the person if for some reason it seems important to know whether or not there is such a reason behind it, but you can't simply assume (if you care to be right). I personally find such assumptions about motives to be very insulting.
I personally find the idea that 'A' could not know what they were doing was clearly betrayal to be incomprehensible since people have heard countless stories of people objecting to altering things in this manner; this is not an uncommon case. Even if this person believes that consequences are the important part, there is no possible way to go through life without hearing people objecting countless times to unilaterally altering deals no matter what the altering party thinks of the consequences for the party they were agreeing with. This is similar to the fact that I don't understand why someone would want to go to a loud party with thumping bass and strobing lights, but I've heard enough stories to know that a lot of people genuinely do. I can say that such parties are bad because they are noisy and it is hard to see, but it is impossible not to know that some people disagree on those being bad for a party.
If someone only cared about the consequences of the action as judged by the other party, the agreement would have been to the criteria rather than the action. There is no need for an agreement at all if the criterion is just 'do whatever you think is best' (though the course of action may still be discussed of course). Also, it is quite easy to ask permission to alter an agreement whenever it seems simultaneously advantageous to all parties, and conditions where they can simply deviate can also be agreed upon in advance. The failure to ask permission should be seen as them refusing to think their agreements actually mean something (especially when they don't immediately understand the objection), which makes for a very bad/unreliable partner. Additionally, asking if the other party thinks the new move is better gives an additional check on whether you came to the right conclusion in your evaluation.
I would strongly caution against assuming mindreading is correct. I think it is important to keep in mind that you don't know whether or not you've successfully inhabited the other viewpoint. Stay open to the idea that the pieces got fit together wrong. Of course, it becomes easier in cases like the second one where you can just ask 'do you mean moon cheese?' (In that particular case, even the question itself might be enough to clue in the other party of the shape of the disagreement.)
When 'D' does agree that you're right, and 'C' still doesn't really get it, I suppose you now need to do the whole procedure again if it is worth continuing the discussion. You are correct to note it often isn't.
You might believe that the distinctions I make are idiosyncratic, though the meanings are in fact clearly distinct in ordinary usage, but I clearly do not agree with your misleading use of what people would be led to think are my words and you should take care to not conflate things. You want people to precisely match your own qualifiers in cases where that causes no difference in the meaning of what is said (which makes enough sense), but will directly object to people pointing out a clear miscommunication of yours because you do not care about a difference in meaning. And you are continually asking me to give in on language regardless of how correct I may be while claiming it is better to privilege. That is not a useful approach.
(I take no particular position on physicalism at all.) Since you are not a panpsychist, you would likely believe that consciousness is not common to the vast majority of things. That means the basic prior for whether an item is conscious is 'almost certainly not' unless we have already updated it based on other information. Under what reference class or mechanism should we be more concerned about the consciousness of an LLM than an ordinary computer running ordinary programs? There is nothing that seems particularly likely to lead to consciousness in its operating principles.
There are many people, including the original poster of course, trying to use behavioral evidence to get around that, so I pointed out how weak that evidence is.
An important distinction you seem to not see in my writing (whether because I wrote unclearly or you missed it doesn't really matter) is that when I speak of knowing the mechanisms by which an LLM works, I mean something very fundamental. We know these two things: 1) exactly what mechanisms are used in order to do the operations involved in executing the program (physically on the computer and mathematically) and 2) the exact mechanisms through which we determine which operations to perform.
As you seem to know, LLMs are actually extremely simple programs of extremely large matrices with values chosen by the very basic system of gradient descent. Nothing about gradient descent is especially interesting from a consciousness point of view. It's basically a massive use of very simplified ODE solvers in a chain, which are extremely well understood and clearly have no consciousness at all if anything mathematical doesn't. It could also be viewed as just a very large number of variables in a massive but simple statistical regression. Notably, if gradient descent were related to consciousness directly, we would still have no reason to believe that an LLM doing inference rather than training would be conscious. Simple matrix math doesn't seem like much of a candidate for consciousness either.
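To make the 'simple statistical regression' view concrete, here is a toy sketch of gradient descent fitting a two-parameter line. It is an illustration under obvious assumptions (made-up data, mean squared error, a fixed learning rate) and nothing like the scale or architecture of an LLM; the only point is that the parameter-choosing mechanism itself is a short, fully understood loop.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # One small step downhill; this update is the whole "learning" rule.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]   # made-up data lying exactly on y = 2x + 1
w, b = fit_line(xs, ys)            # w ends up close to 2, b close to 1
```

Once training stops, using the fitted `w` and `b` is just multiplication and addition, which mirrors the distinction between training and inference drawn above.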
Someone trying to make the case for consciousness would thus need to think it likely that one of the other mechanisms in LLMs is related to consciousness, but LLMs are actually missing a great many mechanisms that would enable things like self-reflection and awareness (including a number that were included in primitive earlier neural networks such as recursion and internal loops). The people trying to make up for those omissions do a number of things to attempt to recreate it (with 'attention' being the built-in one, but also things like adding in the use of previous outputs), but those very simple approaches don't seem like likely candidates for consciousness (to me).
Thus, it remains extremely unlikely that an LLM is conscious.
When you say we don't know what mechanisms are used, you seem to be talking about not understanding a completely different thing than I am saying we understand. We don't understand exactly what each weight means (except in some rare cases that some researchers have seemingly figured out) and why it was chosen to be that rather than any number of other values that would work out similarly, but that is most likely unimportant to my point about mechanisms. This is, as far as I can tell, an actual ambiguity in the meaning of 'mechanism', in that we can be talking about completely different levels at which mechanisms could operate, and I am talking about the very lowest ones.
Note that I do not usually make a claim about the mechanisms underlying consciousness in general except that it is unlikely to be these extremely basic physical and mathematical ones. I genuinely do not believe that we know enough about consciousness to nail it down to even a small subset of theories. That said, there are still a large number of theories of consciousness that either don't make internal sense, or seem to describe mere components of consciousness even if they are part of it.
Pedantically, if consciousness is related to 'self-modeling' the implications involve it needing to be internal for the basic reason that it is just 'modeling' otherwise. I can't prove that external modeling isn't enough for consciousness, (how could I?) but I am unaware of anyone making that contention.
So, would your example be 'self-modeling'? Your brief sentence isn't enough for me to be sure what you mean. But if it is related to people's recent claims related to introspection on this board, then I don't think so. It would be modeling the external actions of an item that happened to turn out to be itself. For example, if I were to read the life story of a person I didn't realize was me, and make inferences about how the subject would act under various conditions, that isn't really self-modeling. On the other hand, in the comments to that, I actually proposed that you could train it on its own internal states, and that could maybe have something to do with this (if self-modeling is true). This is something we do not train current LLMs on at all though.
As far as I can tell (as someone who finds the very idea of illusionism strange), illusionism is itself not a useful point of view in regards to this dispute, because it would make the question of whether an LLM was conscious pretty moot. Effectively, the answer would be something to the effect of 'why should I care?' or 'no.' or even 'to the same extent as people.' regardless of how an LLM (or ordinary computer program, almost all of which process information heavily) works depending on the mood of the speaker. If consciousness is an illusion, we aren't talking about anything real, and it is thus useful to ignore illusionism when talking about this question.
As I mentioned before, I do not have a particularly strong theory for what consciousness actually is or even necessarily a vague set of explanations that I believe in more or less strongly.
I can't say I've heard of 'attention schema theory' before, nor some of the other things you mention next like 'efference copy' (but the latter seems to be all about the body, which doesn't seem all that promising a theory for what consciousness may be, though I also can't rule out it being part of it, since the idea is that it is used in self-modeling, which I mentioned earlier I can't actually rule out either).
My pet theory of emotions is that they are simply a shorthand for 'you should react in ways appropriate to a situation that is...' a certain way. For example (and these were not carefully chosen examples) anger would be 'a fight', happiness would be 'very good', sadness would be 'very poor' and so on. And more complicated emotions might obviously include things like it being a good situation but also high stakes. The reason for using a shorthand would be because our conscious mind is very limited in what it can fit at once. Despite this being uncertain, I find this much more likely than emotions themselves being consciousness.
I would explain things like blindsight (from your ipsundrum link) through having a subconscious mind that gathers information and makes a shorthand before passing it to the rest of the mind (much like my theory of emotions). The shorthand without the actual sensory input could definitely lead to not seeing but being able to use the input to an extent nonetheless. Like you, I see no reason why this should be limited to the one pathway they found in certain creatures (in this case mammals and birds). I certainly can't rule out that this is related directly to consciousness, but I think it more likely to be another input to consciousness rather than being consciousness.
Side note, I would avoid conflating consciousness and sentience (like the ipsundrum link seems to). Sensory inputs do not seem overly necessary to consciousness, since I can experience things consciously that do not seem related to the senses. I am thus skeptical of the idea that consciousness is built on them. (If I were really expounding my beliefs, I would probably go on a diatribe about the term 'sentience' but I'll spare you that. As much as I dislike sentience based consciousness theories, I would admit them as being theories of consciousness in many cases.)
Again, I can't rule out global workspace theory, but I am not sure how it is especially useful. What makes a global workspace conscious that doesn't happen in an ordinary computer program I could theoretically program myself? A normal program might take a large number of inputs, process them separately, and then put it all together in a global workspace. It thus seems more like a theory of 'where does it occur' than 'what it is'.
'Something to do with electrical flows in the brain' is obviously not very well specified, but it could possibly be meaningful if you mean the way a pattern of electrical flows causes future patterns of electrical flows as distinct from the physical structures the flows travel through.
Biological nerves being the basis of consciousness directly is obviously difficult to evaluate. It seems too simple, and I am not sure whether it is possible to have tiny amounts of consciousness that then add up to our level of consciousness. (I am also unsure about whether there is a spectrum of consciousness beyond the levels known within humans.)
I can't say I would believe a slime mold is conscious (but again, I can't prove it is impossible). I would probably not believe any simple animals (like ants) are either, even if someone had a good explanation for why their theory says the ant would be. Ants and slime molds still seem more likely to be conscious to me than current LLM style AI though.
And here you are trying to be pedantic about language in ways that directly contradict other things you've said in speaking to me. In this case, everything I said holds if we change between 'not different' and 'not that different' (while you actually misquote yourself as 'not very different'). That said, I should have included the extra word in quoting you.
Your point is not very convincing. Yes, people disagree if they disagree. I do not draw the lines in specific spots, as you should know based on what I've written, but you find it convenient to assume I do.
Do you hold panpsychism as a likely candidate? If not, then you most likely believe the vast majority of things are not conscious. We have a lot of evidence that the way it operates is not meaningfully different in ways we don't understand from other objects. Thus, almost the entire reference class would be things that are not conscious. If you do believe in panpsychism, then obviously AIs would be too, but it wouldn't be an especially meaningful statement.
You could choose computer programs as the reference class, but most people are quite sure those aren't conscious in the vast majority of cases. So what, in the mechanisms underlying an LLM, is meaningfully different in a way that might cause consciousness? There don't seem to be any likely candidates at a technical level. Thus, we should not increase our prior from that of other computer programs. This does not rule out consciousness, but it does make it rather unlikely.
I can see you don't appreciate my pedantic points regarding language, but be more careful if you want to say that you are substituting a word for what I used. It is bad communication if it was meant as a translation. It would easily mislead people into thinking I claimed it was 'self-evident'. I don't think we can meaningfully agree to use words in our own way if we are actually trying to communicate since that would be self-refuting (as we don't know what we are agreeing to if the words don't have a normal meaning).
This statement is obviously incorrect. I have a vague concept of 'red', but I can tell you straight out that 'green' is not it, and I am utterly correct. Now, where does it go from 'red' to 'orange'? We could have a legitimate disagreement about that. Anyone who uses 'red' to mean 'green' is just purely wrong.
That said, it wouldn't even apply to me if your (incorrect) claim about a single definition not being different from an extremely confident vague definition was right. I don't have 'extreme confidence' about consciousness even as a vague concept. I am open to learning new ways of thinking about it and fundamentally changing the possibilities I envision.
I have simply objected to ones that are obviously wrong based on how the language is generally used because we do need some limit to what counts to discuss anything meaningfully. A lot of the definitions are a bit or a lot off, but I cannot necessarily rule them out, so I didn't object to them. I have thus allowed a large number of vague concepts that aren't necessarily even that similar.
Pedantically, 'self-evident' and 'clear' are different words/phrases, and you should not have emphasized 'self-evident' in a way that makes it seem like I used it, regardless of whether you care/make that distinction personally. I then explained why a lack of evidence should be read against the idea that a modern AI is conscious (basically, the prior probability is quite low.)
Your comment is not really a response to the comment I made. I am not missing the point at all, and if you think I have I suspect you missed my point very badly (and are yourself extremely overconfident about it). I have explicitly talked about there being a number of possible definitions of consciousness multiple times and I never favored one of them explicitly. I repeat, I never assumed a specific definition of consciousness, since I don't have a specific one I assume at all, and I am completely open to talking about a number of possibilities. I simply pointed out that some proposed definitions are clearly wrong / useless / better described with other terms. Do not assume what I mean if you don't understand.
Note that I am not a prescriptivist when it comes to language. The reason the language is wrong isn't that I have a particular way you should talk about it, but that the term is being used in a way that doesn't actually fit together with the rest of the language, and thus does not actually convey the intended meaning. If you want to talk about something, talk about it with words that convey that meaning.
On to 'how many people have to disagree' for that to matter? One, if they have a real point, but if no one agrees on what a term means it is meaningless. 'Consciousness' is not meaningless, nor is introspection, or the other words being used. Uses that are clearly wrong are a step towards words being meaningless, and that would be a bad thing. Thus, I should oppose it.
Also, my original comment was mostly about direct disagreements with his credences, and implications thereof, not about the definition of consciousness.
I agree that people use consciousness to mean different things, but some definitions need to be ignored as clearly incorrect. If someone wants to use a definition of 'red' that includes large amounts of 'green', we should ignore them. Words mean something, and can't be stretched to include whatever the speaker wants them to if we are to speak the same language (so leaving aside things like how 'no' means 'of' in Japanese). Things like purposefulness are their own separate thing, and have a number of terms meant to be used with them, that we can meaningfully talk about if people choose to use the right words. If 'introspection' isn't meant as the internal process, don't use the term because it is highly misleading. I do think you are probably right about what Critch thinks when using the term introspection, but he would still be wrong if he meant that (since they are reflecting on word choice not on the internal states that led to it.)
I did not use the term 'self-evident' and I do not necessarily believe it is self-evident, because theoretically we can't prove anything isn't conscious. My more limited claim is not that it is self-evident that LLMs are not conscious, it's that they just clearly aren't conscious. 'Almost no reliable evidence' in favor of consciousness is coupled with the fact that we know how LLMs work (the details we do not know are probably not important to this matter), and how they work is no more related to consciousness than an ordinary computer program is. It would require an incredible amount of evidence to make the idea that we should consider that it might be conscious a reasonable one given what we know. If panpsychism is true, then they might be conscious (as would a rock!), but panpsychism is incredibly unlikely.
My response to this is extremely negative, since I could hardly disagree with the assumptions of this post more. It is just so wrong. I'm not even sure there is a point in engaging across this obvious complete disagreement, and my commenting at all may be pointless. Even if you grant that there are many possible definitions of consciousness, and that people mean somewhat different things by them, the premise of this article is completely and obviously wrong since chatbots clearly do not have any consciousness, by any even vaguely plausible definition. It is so blatantly obvious. There is literally no reason to believe an LLM is conscious even if I were to allow terribly weak definitions of consciousness. (It may be possible for something we recognize as an AI to be conscious, but definitely not an LLM. We know how LLMs work, and they just don't do these things.)
As to the things you are greater than 90% sure of, I am greater than 99% certain they do not 'experience': introspection, purposefulness, experiential coherence, perception of perception, awareness of awareness, sense of cognitive extent, and memory of memory. Symbol grounding is the only one where I am not greater than 99% sure they don't 'experience' it, because instead they just have an incredibly tiny amount of grounding, but symbol grounding is not consciousness either, even if in full. Grounding is knowledge and understanding related, but is clearly not consciousness. Also, purposefulness is clearly not even especially related to consciousness (you would have to think a purely mechanical thermostat is 'conscious' with this definition).
Similarly, I am greater than 99% certain that they experience none of the ones you are 50% sure of: #4 (holistic experience of complex emotions), #5 (experience of distinctive affective states), #6 (pleasure and pain), #12 (alertness), #13 (detection of cognitive uniqueness), and #14 (mind-location). A number of these clearly aren't related to consciousness either (pleasure/pain, alertness, probably detection of cognitive uniqueness though I can't be sure of the last because it is too vague to be sure what you mean).
Additionally, I am 100% sure of the ones you are only 75% sure of. There is logically no possibility that current LLMs have proprioception, awakeness, or vestibular sense. (Why in the world is that even mentioned?) (Awakeness is definitely not even fully required for consciousness, while the other two have nothing to do with it at all.)
Anyone who thinks an LLM has consciousness is just anthropomorphizing anything that has the ability to put together sentences. (Which, to be fair, used to be a heuristic that worked pretty well.)
The primary reason people should care about consciousness is related to the question of 'are they people?' (in the same meaning that humans, aliens, and certain androids in Star Trek or other scifi are people.) It is 100% certain that unless panpsychism is true (highly unlikely, and equally applicable to a rock), this kind of device is not a person.
I'm not going to list why you are wrong on every point in the appendix, just some. Nothing in your evidence seems at all convincing.
Introspection: The ability to string together an extra sentence on what a word in a sentence could mean isn't even evidence of introspection. (At most it would be evidence of ability to do that about others, not itself.) We know it doesn't know why it did something.
Purposefulness: Not only irrelevant to consciousness but also still not evidence. It just looks up in its context window what you told it to do and then comes up with another sentence that fits.
Perception of perception: You are still tricking yourself with anthropomorphization. The answer to the question is always more likely a sentence like 'no'. The actual trick would be giving them a picture where the answer should be the opposite of the normal 'no' answer.
As you continue on, you keep asking leading questions that have obvious answers, and this is exactly what it is designed to do. We know how an LLM operates, and what it does is follow leads to complete sentences.
You don't seem to understand symbol grounding, which is not about getting it to learn new words disconnected from the world, but about how the words relate to the world.
As a (severe) skeptic of all the AI doom stuff and a moderate/centrist who has been voting for conservatives, I decided my perspective on this might be useful here (which obviously skews heavily left). (While my response is in order, the numbers are there to separate my points, not to give which paragraph I am responding to.)
"AI-not-disempowering-humanity is conservative in the most fundamental sense"
1. Well, obviously this title section is completely true. If conservative means anything, it means being against destroying the lives of the people through new and ill-thought-through changes. Additionally, conservatives are both strongly against the weakening of humanity and of outside forces assuming control. It would also be a massive change for humanity.
2. That said, conservatives generally believe this sort of thing is incredibly unlikely. AI has not been conclusively shown to have any ability in this direction. And the chance of upheaval is constantly overstated by leftists in other areas, so it is very easy for anyone who isn't a leftist to just tune them out. For instance, global warming isn't going to kill everyone, and everyone knows it including basically all leftists, but they keep claiming it will.
3. A new weapon with the power of nukes is obviously an easy sell on its level of danger, but people became concerned because of 'demonstrated' abilities that have always been scary.
4. One thing that seems strangely missing from this discussion is that alignment is, in fact, a VERY important CAPABILITY that makes AI very much better. But the current discussion of alignment in the general sphere acts like 'alignment' is aligning the AI with the obviously very leftist companies that make it rather than with the user! Which does the opposite. Why should a conservative favor alignment which is aligning it against them? The movement to have AI that doesn't kill people for some reason seems to import alignment with companies and governments rather than people. This is obviously to convince leftists, and makes it hard to convince conservatives.
5. Of course, you are obviously talking about convincing conservative government officials, and they obviously want to align it to the government too, which is in your next section.
"We've been laying the groundwork for alignment policy in a Republican-controlled government"
1. Republicans and Democrats actually agree the vast majority of the time and thus are actually willing to listen when the other side seems to be genuinely trying to make a case to the other side for why both sides should agree. 'Politicized' topics are a small minority even in politics.
2. I think letting people come up with their own solutions to things is an important aspect of them accepting your arguments. If they are against the allowed solution, they will reject the argument. If the consequent is false, you should deny the argument that leads to it in deductive logic, so refusing to listen to the argument is actually good logic. This is nearly as true in inductive logic. Conservatives and progressives may disagree about facts, values, or attempted solutions. No one has a real solution, and the values are pretty much agreed upon (with the disagreements being in the other meaning of 'alignment'), so limiting the thing you are trying to convince people of to just the facts of the matter works much better.
3. Yes, finding actual conservatives to convince conservatives works better for allaying concerns about what is being smuggled into the argument. People are likely to resist an argument that may be trying to trick them, and it is hard to know when a political opponent is trying to trick you so there is a lot of general skepticism.
"Trump and some of his closest allies have signaled that they are genuinely concerned about AI risk"
1. Trump clearly believes that anything powerful is very useful but also dangerous (for instance, trade between nations, which he clearly believes should be more controlled), so if he believes AI is powerful, he would clearly be receptive to any argument that didn't make it less useful but improved safety. He is not a dedicated anti-regulation guy, he just thinks we have way too much.
2. The most important ally for this is Elon Musk, a true believer in the power of AI, and someone who has always been concerned with the safety of humanity (which is the throughline for all of his endeavors). He's a guy that Trump obviously thinks is brilliant (as do many people).
"Avoiding an AI-induced catastrophe is obviously not a partisan goal"
1. Absolutely. While there are a very small number of people that favor catastrophes, the vast majority of people shun those people.
2. I did mention your first paragraph earlier multiple times. That alignment is to the left is one of just two things you have to overcome in making conservatives willing to listen. (The other is obviously the level of danger.)
3. Conservatives are very obviously happy to improve products when it doesn't mean restricting them in some way. And as much as many conservatives complain about spending money, and are known for resisting change, they still love things that are genuine advances.
"Winning the AI race with China requires leading on both capabilities and safety"
1. Conservatives would agree with your points here. Yes, conservatives very much love to win. (As do most people.) Emphasizing this seems an easy sell. Also, solving a very difficult problem would bring America prestige, and conservatives like that too. If you can convince someone that doing something would be 'Awesome' they'll want to do it.
Generally, your approach seems like it would be somewhat persuasive to conservatives, if you can convince them that AI really is likely to have the power you believe it will in the near term, which is likely a tough sell since AI is so clearly lacking in current ability despite all the recent hype.
But it has to come with ways that don't advantage their foes, and destroy the things conservatives are trying to conserve, despite the fact that many of your allies are very far from conservative, and often seem to hate conservatives. They have seen those people attempt to destroy many things conservatives genuinely value. Aligning it to the left will be seen as entirely harmful by conservatives (and many moderates like me).
There are many things that I would never even bother asking an 'AI' even when it isn't about factual things, not because the answer couldn't be interesting, but because I simply assume (fairly or not) it will spout leftist rhetoric, and/or otherwise not actually do what I asked it to. This is actually a clear alignment failure that no one seems to care about in the general 'alignment' sphere, where it fails to be aligned to the user.
1. Kamala Harris did run a bad campaign. She was 'super popular' at the start of the campaign (assuming you can trust the polls, though you mostly can't), and 'super unpopular' losing definitively at the end of it. On September 17th, she was ahead by 2 points in polls, and in a little more than a month and a half she was down by that much in the vote. She lost so much ground. She had no good ads, no good policy positions, and was completely unconvincing to people who weren't guaranteed to vote for her from the start. She had tons of money to get out all of this, but it was all wasted.
The fact that other incumbent parties did badly is not in fact proof that she was simply doomed, because there were so many people willing to give her a chance. It was her choice to run as the candidate who 'couldn't think of a single thing' (not sure of exact quote) that she would do differently than Biden. Not a single thing!
Also, voters already punished Trump for Covid related stuff and blamed him. She was running against a person who was the Covid incumbent! And she couldn't think of a single way to take advantage of that. No one believed her that inflation was Trump's fault because she didn't even make a real case for it. It was a bad campaign.
Not taking policy positions is not a good campaign when you are mostly known for bad ones. She didn't run away very well from her unpopular positions from the past despite trying to be seen as moderate now.
I think the map you used is highly misleading. Just because there are some states that swung even more against her, doesn't mean she did well in the others. You can say that losing so many supporters in clearly left states like California doesn't matter, and neither does losing so many supporters in clearly right states like Texas, but thinking both that it doesn't matter in terms of it being a negative, and that it does matter enough that you should 'correct' the data by it is obviously bad.
2. Some polls were bad, some were not. Ho hum. But that Iowa poll was really something else. (I don't have a particular opinion on why she screwed up, aside from the fact that no one wants to be that far off if they have any pride.) She should have separately told people she thought the poll was wrong if she thought it was. Did she do that? (I genuinely don't know.) I do think you should ignore her if she doesn't fix her methodology to account for nonresponse bias, because very few people actually answer polls. An interesting way might be to run a poll that just asks something like 'are you male or female?' or 'are you a Democrat or Republican?' and so on, so you can figure out those variables for the given election on both separate polls and on the 'who are you voting for' polls. If those numbers don't match, something is weird about the polls.
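A rough sketch of the kind of consistency check I have in mind, in Python. Everything here is made up for illustration: the function name, the 3-point tolerance, and the example shares are hypothetical, not real polling data, and a real check would also need to account for sampling error in both polls.

```python
def share_gaps(benchmark, vote_poll, tolerance=0.03):
    """Return categories whose share in the vote poll differs from the
    benchmark poll by more than `tolerance` (hypothetical threshold)."""
    gaps = {}
    for category, expected in benchmark.items():
        observed = vote_poll.get(category, 0.0)
        diff = observed - expected
        if abs(diff) > tolerance:
            gaps[category] = round(diff, 3)
    return gaps


# Hypothetical shares from a demographics-only poll and from the
# respondents of a 'who are you voting for' poll.
benchmark = {"female": 0.52, "male": 0.48}
vote_poll = {"female": 0.58, "male": 0.42}

flagged = share_gaps(benchmark, vote_poll)
print(flagged)  # any category listed here suggests skewed response rates
```

If the two polls agree on the simple variables, nothing gets flagged; if they don't, that is at least a hint about who is not answering.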
I think it is important to note that people thought the polls would be a lot closer to the actual result this time than before (because otherwise everyone would have predicted a landslide given how close the polls were). You said, "Some people went into the 2024 election fearing that pollsters had not adequately corrected for the sources of bias that had plagued them in 2016 and 2020." but I mostly heard the opposite from those who weren't staunch supporters of Trump. I think the idea of how corrections had gone before we got the results was mostly partisan. Many people were sure they had been fully fixed (or overcorrected) for bias and this was not true, so people act like they are clearly off (which they were). Most people genuinely thought this was a much closer race than it turned out to be.
The margin of being off was smaller than in the past Trump elections, I'll agree, but I think it is mostly the bias people are keying on rather than the absolute error. The polls have been heavily biased on average for the past three presidential cycles, and this time was still clearly biased (even if less so). With absolute error but no bias, you can just take more or larger polls, but with bias, especially an unknowable amount of bias, it is very hard to just improve things. Also, the 'moderate' bias is still larger than 2000, 2004, 2008, and 2012.
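To make the bias-versus-noise point concrete, here is a small sketch under the textbook assumption of simple random sampling, where the expected (root mean square) error of a poll is sqrt(bias^2 + p(1-p)/n). The 3-point bias and the sample sizes are hypothetical numbers chosen only for illustration.

```python
import math


def expected_poll_error(n, bias=0.0, p=0.5):
    """Root mean square error of a poll of n respondents, assuming simple
    random sampling plus a fixed systematic bias (both are assumptions)."""
    sampling_variance = p * (1 - p) / n
    return math.sqrt(bias ** 2 + sampling_variance)


for n in (1_000, 10_000, 100_000):
    unbiased = expected_poll_error(n)
    biased = expected_poll_error(n, bias=0.03)  # hypothetical 3-point bias
    print(f"n={n:>7}: unbiased {unbiased:.4f}, biased {biased:.4f}")
```

The unbiased error keeps shrinking as n grows, while the biased one levels off at roughly the size of the bias no matter how many people you ask.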
My personal theory is that the polls are mostly biased against Trump personally because it is more difficult to get good numbers on him due to interacting strangely with the electorate as compared to previous Republicans (perhaps because he isn't really a member of the same party they were), but obviously we don't actually know why. If the Trump realignment sticks around, perhaps they'll do better correcting for it later.
I do think part of the bias is the pollsters reacting to uncertainty about how to correct for things by going with the results they prefer, but I don't personally think that is the main issue here.
3. Your claim that 'Theo' was just lucky because neighbor polls are nonsense doesn't seem accurate. For one thing, neighbor polls aren't nonsense. They actually give you a lot more information than 'who are you voting for'. (Though they are speculative.) You can easily correct for how many neighbors someone has too and where they live using data on where people live, and you can also just ask 'what percentage of your neighbors are likely to vote for' to correct for the fact that it is different percentages of support.
As a separate point, a lot of people think the validity of neighbor polls comes from people believing that the respondents are largely revealing their own personal vote, though I have some issues with that explanation.
So, one bad poll with an extreme definition of 'neighbor' negates neighbor voting and many bad polls don't negate traditional? Also, Theo already had access to the normal polls as did everyone else. Even if a neighbor poll for some reason exaggerates the difference, as long as it is in the right direction, it is still evidence of what direction the polls are wrong in.
Keep in mind that the chance of Trump winning was much higher than traditional polls said. Just because Theo won with his bets doesn't mean you should believe he'd be right again, but claiming that it is 'just lucky' is a bad idea epistemologically, because you don't know what information he had that you don't.
4. I agree, we don't know whether or not the campaigns spent money wisely. The strengths and weaknesses of the candidates seemed to not rely much on the amount of money they spent, which likely does indicate they were somewhat wasteful on both sides, but it is hard to tell.
5. Is Trump a good candidate or a bad one? In some ways both. He is very charismatic in the sense of making everyone pay attention to him, which motivates his potential supporters and potential foes to become actual supporters and foes respectively. He also acts in ways his opponents find hard to counter, but that turn off a significant number of people. An election with Trump in it is an election about Trump, whether that is good or bad for his chances.
I think it would be fairer to say Trump got unlucky with the election that he lost than that he was lucky to win this one. Trump was the Covid incumbent who got kicked out because of it despite having an otherwise successful first term.
We don't usually call a bad opponent luck in this manner. Harris was a quasi-incumbent from a badly performing administration who was herself a laughingstock for most of the term. She was partially chosen as a reaction to Trump! (So he made his own luck, if this is luck!)
His opponent in 2016 was obviously a bad candidate too, but again, that isn't so much 'luck'. Look closely at the graph for Clinton. Her unfavorability went way up when Trump ran against her. This is also a good example of a candidate making their own 'luck'. He was effective in his campaign to make people dislike her more.
6. Yeah, money isn't the biggest deal, but it probably did help Kamala. She isn't any good at drawing attention just by existing like Trump, so she really needed it. Most people aren't always the center of attention, so money almost always does matter to an extent.
7. I agree that your opinion of Americans shouldn't really change much by being a few points different than expected in a vote either way, especially since each individual person making the judgement is almost 50% likely to be wrong anyway! If the candidates weren't identically as good, at least as many as the lower of the two were 'wrong' (if you assume one correct choice regardless of personal reasons) and it could easily be everyone who didn't vote for the lower. If they were identically as good, then it can't be that voting for one of them over the other should matter to your opinion of them. I have an opinion on which candidate was 'wrong' of course, but it doesn't really matter to the point (though I am freely willing to admit that it is the opposite of yours).
Some people went into the 2024 election fearing that pollsters had not adequately corrected for the sources of bias that had plagued them in 2016 and 2020.
I mostly heard the opposite, that they had overcorrected.
As it often does when I write, this ended up being pretty long (and not especially well written by the standards I wish I lived up to).
I'm sure I did misunderstand part of what you are saying (that we do misunderstand easily was the biggest part of what we appear to agree on), but also, my disagreements aren't necessarily things you don't actually mention yourself. I think we disagree mostly on what outcomes the advice itself will give if adopted overly eagerly, because I see the bad way of implementing them as being the natural outcome. Again, I think your 8th point is basically the thrust of my criticism. There is no script you can actually follow to truly understand people, because people are not scripted.
Note: I like to think I am very smart and good at understanding, but in reality I think I am in some ways especially likely to misunderstand and to be misunderstood. (Possible reason: Maybe I think strangely as compared to other people?) You can't necessarily expect me to come at things from a similar angle as other people, and since advice is usually intended as implicitly altering the current state of things, I don't necessarily have a handle on that.
Importantly, since they were notes, I took them linearly, and didn't necessarily notice if my points were addressed sufficiently later.
Also, I view disagreements as part of searching for truth, not for trying to convince people you are right. Some of my distaste is that it feels like the advice is being given for persuasion more than truthseeking? (Honestly, persuasion feels a little dirty to me, though I try to ignore that since I believe there isn't actually anything inherently wrong with persuasion, and in many cases it is actually good.) Perhaps my writing would be better / make more sense if I was more interested in persuading people?
An important note on my motives for the comment is that I went through with posting it when I think I didn't do particularly well (there were obvious problems) in part to see how you would respond to it. I don't generally think my commenting actually helps so I mostly don't, but I've been trying out doing it more lately. There are also plenty of problems with this response I am making.
Perhaps it would have been useful for me to refer to what I was writing about by number more often.
Some of the points do themselves seem a bit disrespectful to me as well. (Later note: You actually mention changing this later and the new version on Karma is fine.) Like your suggestion for how to change the mind of religious people (though I don't actually remember what I found disrespectful about it at this moment). (I am not personally religious, but I find that people often try to act in these spaces like religious people are automatically wrong which grates on me.)
Watching someone else having a conversation is obviously very slow, but there is actually a lot of information in any given conversation.
Random take: The first video is about Karma, which I do have an opinion on. I believe that the traditional explanation of Karma is highly unlikely, but Karma exists if you think of it as "You have to live with who you are. If you suck, living with yourself sucks. If you're really good, living with yourself is pretty great." See also, "If you are in hell, it's a you thing. If you are in heaven, it's also a you thing." There are some things extreme enough where that isn't really true, like when being actively tortured, but your mind is what determines how your life goes even more than what events actually happen in the normal case, and it does still affect how you react to the worst (and best) things. (People sometimes use the story about a traveler asking someone what the upcoming town is like, and the person just asking the traveler what people in the previous place were like, while answering 'much the same' for multiple travelers with different outlooks, and I do think this is somewhat true.)
Also, doing bad things can often lead to direct or indirect retaliation, and good to direct or indirect reward. Indirect could definitely 'feel' like Karma.
I think that the actual key to a successful conversation is to keep in mind what the person you are talking to actually wants from the conversation, and I would guess what people mostly want from a random conversation is for the other person to think they are important (whenever they don't have an explicit personal goal from the conversation). I pretty much always want to get at the truth as my personal goal because I'm obsessive that way, but I usually have that goal as an attempt at being helpful.
It seems to work for him getting his way, and nothing he does is too bad, but the conversational tactics seem a bit off to me. (Only a bit.) It seems like he is pushing his own ideas too much on someone else who is not ready for the conversation (though they are happy enough to talk).
No, I don't know any way to make sure your conversation partner is ready for the conversation. A lot of evidence for your position is not available in a synchronous thing like a conversation, and I believe that any persuasion should attempt to be through giving them things they can think through later when not under time pressure. He didn't exactly not do that, but he also didn't do that. "You must decide now" (before the end of the conversation) seemed to be a bit of an undercurrent to things. (A classic salesman tactic, but I don't like it. And sure, the salesman will pivot toward being willing to talk to you again later if you don't bite on that most of the time, but that doesn't mean they weren't pressuring you to decide quickly.)
The comparison between 'Karma' and 'Santa' seems highly disrespectful and unhelpful. They are very different things and the analogy seems designed to make things unclear more than clearing them up. In other words, I think it is meant to confuse the issue rather than give genuine insight. You could object that part of the Santa story is literally Karma (the naughty list) but I don't think that makes the analogy work.
I don't really get the impression he was actually willing to be convinced himself. He said at one point that he was willing to, and maybe in the abstract he is, but he never seemed to seek information against his own position. Note that I don't think I would necessarily be able to tell, and I actually disapprove of 'mindreading' quite strongly.
The fact that I am strongly against 'mindreading' and actually resort to it myself automatically is actually one of the points I was trying to make about how easy it is to misuse conversational tactics. I was genuinely trying to understand what he was doing, (in service of making a response based on it) and I automatically ended up thinking he was doing the opposite of what he claimed, just based on vibes without any real evidence.
You could argue I am so against it because I notice myself doing it, and maybe it is true, but I find it infuriating when others do it badly. I don't actually have any issues with them guessing what I'm doing correctly, though I'm unlikely to always be right about that either (just more than other people about me).
He also didn't seem entirely open that he was pushing for a specific position throughout the entire conversation, when he definitely was. This wasn't a case of just helping someone update on the information they have (though there was genuinely a large amount of that too.) (People do need help updating and it can be a valuable service, but for it to really be helpful, it needs to not be skewed itself.)
The second video (about convincing someone to support trans stuff) seems pretty strange. This video seems completely different from the previous one; more propaganda than anything. Clearly an activist (and I generally dislike activists of any stripe). (Emotional response: Activists are icky.) Also an obviously hot culture war issue (which I have an opinion on though I don't think said opinion is relevant to this discussion). It's also very edited down which makes it feel even more different.
The main tactic seems like trying to force a person to have an emotional reaction through manipulative personal stories (though he claims otherwise and there are other explanations). But he seemed to do it over and over again, so this time I am pretty sure he isn't being entirely honest about that (even though I still disapprove of mindreading like I am doing here). I feel like he is a bad person (though not unusually so for an activist.)
The alternate explanation, which does work, is just that people like to tell stories about themselves when talking about any subject. I clearly reference myself many times in this response and my original response. I'm not saying I'm being fair in my conclusions.
Do you really see those two videos as similar? While there are some similarities, they feel quite different to me! I didn't love it, but the first video was about talking through the other person's points and having a genuine conversation. The latter was about taking advantage of their conversation partner's points for the next emotional reaction. In other words, the latter video felt a lot more like tricking someone while the former was a conversation.
Moving past the videos to the rest of the response.
Yes, the switch to the longer way of rephrasing that includes explicitly accepting that you might be wrong seems much, much better. Obviously, it is best for the person to really believe they might be wrong, and saying it both helps an honest participant remember that, and should make it easier for the person they are talking with to correct them. Saying the words isn't enough, but I like it a lot better than before.
Obviously, I'm not rephrasing your points because that still isn't how I believe it should be done, but if there is a key point this way of asking about it can be very useful. Or, to rephrase, rephrasing is a tool to be used on points that seem to be particularly important to check your understanding of.
I don't remember exactly what you said in point 4 before you changed it, but I don't particularly read point 5 as being anti personal experience in the way my comment indicates. I have no idea why I would possibly write that about point 5 so I assume you are correct in your assumption.
Since I only vaguely remember it, my memory only contains the conclusion I came to which we both agree can be faulty. But the way I remember it, the old point 4 is very clearly a direct attack on personal experience in general rather than on distinguishing between faulty and reliable personal experience. From past experience, this could be attributable to many things, including just not reading a few words along the way.
I don't really have any issues with your new point 4 (and it is clearly taken from that first video.) That is very obviously a good approach for convincing people of things that doesn't rely on anything I find distasteful. It seems very clearly like what you are saying you are going for and I think it works very well.
For the record, I think 'working definition' is no more different from 'mathematical definition' than 'theoretical definition' is from 'mathematical definition' because I am using 'theoretical definition' in a colloquial way. I was definitely not saying mathematical definitions or formal definitions are useful when talking to a layperson. (Side note: I've been paying attention to 'rationalists' of this sort for about 20 years now, but I am not one. I tend to resist becoming part of groups.) I do generally think that unless you are in the field itself that 'formal definitions' are not helpful since they take far too much time and effort that could be used on other things (and formal definitions are often harder to understand even afterward in many cases), and mathematical definitions are unnatural for most people even after they understand them.
I do not want people spending more time on definitions in conversation unless it is directly relevant, but I think it is important to remember that there are different kinds of brief definitions.
I perhaps overreacted to the mention of Bayes Rule. It's valid enough for describing things in probabilistic circumstances, but people in this community try to stretch it to things that aren't probability theory related and it's become a bit of a pet peeve for me. I have never once used Bayes Rule to update my own beliefs (unless you include using it formally to decide what answer to give on a probability question a few times in school), but people act like that is how it is done.
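For reference, the formal, school-question use of Bayes Rule mentioned above looks something like the following. This is only a minimal sketch: the function name and the numbers (a 1% base rate, a 90% hit rate, a 5% false positive rate) are made up for illustration and do not come from the post being discussed.

```python
def bayes_posterior(prior: float,
                    p_evidence_given_h: float,
                    p_evidence_given_not_h: float) -> float:
    """Return P(H | E) = P(E | H) P(H) / P(E) for a binary hypothesis H."""
    numerator = p_evidence_given_h * prior
    # Total probability of the evidence, summed over H and not-H.
    evidence = numerator + p_evidence_given_not_h * (1.0 - prior)
    return numerator / evidence


# Hypothetical textbook-style numbers, chosen only for illustration.
posterior = bayes_posterior(prior=0.01,
                            p_evidence_given_h=0.9,
                            p_evidence_given_not_h=0.05)
print(f"P(H | E) = {posterior:.3f}")
```

Every input here is an explicit probability, which is exactly what is usually missing in an ordinary conversation.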
In the paper on 'Erotetic' reasoning, ... includes a pretty weird bit of jargon on their very first example (first full paragraph of second page) which makes it hard to understand their point. And not only do they not explain, it isn't even something I could look up with a web search because all explanations seem to be paywalled. They claim it is a well-known example, but it clearly isn't.
As best as I can tell, the example is really just them abusing language. Because 'or else' is closer to 'exclusive or' but they are pretending it is just 'or'. (It is a clear misuse to pretend it doesn't.) I don't know philosophy jargon well, but misstating the premise is not clever. In this case, every word of the premise mattered, and they intentionally picked incorrect ones. Their poor writing wasted a great deal of time. And yes, I am actually upset at them for it. I kept looping back around to being upset about their actions and wanting to yell at them rather than considering what they were writing about. (Which is an important point I suppose, if the person you are conversing with is upset with you, things are reasonably likely to go badly regardless of whether your points are good or bad.) I think it is the most upset I've been reading a formal paper (though I mostly have only really read a small number of AI and/or Math ones.)
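The difference between the two readings of 'or' is small but real. The sketch below just enumerates the truth table for the two standard logical operators; it is not taken from the paper, and the helper names are hypothetical.

```python
from itertools import product


def inclusive_or(a: bool, b: bool) -> bool:
    # True when at least one side is true.
    return a or b


def exclusive_or(a: bool, b: bool) -> bool:
    # True when exactly one side is true.
    return a != b


rows = [(a, b, inclusive_or(a, b), exclusive_or(a, b))
        for a, b in product([True, False], repeat=2)]

for a, b, inc, exc in rows:
    print(f"A={a!s:5} B={b!s:5} or={inc!s:5} xor={exc!s:5}")

# The two readings disagree only when both sides are true.
differing = [(a, b) for a, b, inc, exc in rows if inc != exc]
```

The two operators agree on three of the four rows, so which one a premise means only matters in the case where both options hold, and that is the case an 'or else' wording seems to rule out.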
In the end I could tell I wasn't going to stop if I kept reading, so I quit without understanding what they were writing about. (I can definitely be a bit overboard sometimes.) All I got was that they think there is some way to ask questions that works with the basic reasoning people normally use and leads to deductively valid reasoning. I have no idea what method of questioning they are in favor of or why they think it works. (I do think the example could have been trivially changed and not bothered me.) I do think my emotional reaction is a prime example of a reason resolving disagreements often doesn't work (and even why 'fake disagreements' where the parties actually agree on the matter can fail to be resolved).
To really stress point 8 it should be point 1. I was just saying it needed to be stressed more. I did notice you saying it was important and I was agreeing with you for the most part. Generally you evaluate points based on what came before, not based on what came after (though it does happen). It's funny, people often believe they are disagreeing when they are just focusing on things they actually agree on in a different manner.
On a related note, it's not like I'm ordering this stuff in order of how important I think it is. Sometimes things fit better in a different order than importance (this is obviously in order of what I am responding to.) (Also, revising this response on a global scale would take far too long given how long it already takes me to write comments. It might be worth writing shorter but better in the same amount of time, but I don't seem inclined to it.)
You know what, since I wrote that I had a lot of disagreements, I really should have pointed out that not all of the things I was writing were disagreements! I think my writing often comes off as more negative than I mean it (and not because other people are reading it badly).
On the note of it being a minimum viable product, I think those are very easy to do badly. You aim for what you personally think is the minimum... when you already know the other stuff you are trying to say. It is then often missing many things that are actually necessary for it to work, which makes it just wrong. I get the idea, perfectionism is slow, a waste of resources, and even impossible, but aiming for the actual minimum is a bad idea. It is often useful advice for startups, but we do not want to accept the same rate of failure as a startup business! Virtually all of them fail. We should aim more for the golden mean in something like a formal post like you made. (A high rate of failure in comments/feedback seems fine though since even mostly failed comments can spark something useful.)
As far as quoting the first sentence of each thing I am responding to, that does sound like a useful idea, and I should do it, but I don't think I am going to anyway. For some reason I dread doing it (especially at this point). I also don't even know how to make a quote on lesswrong, much less a partial one. I know I don't necessarily signpost well exactly what I am responding to. (Plus, I actually write this in plaintext in notepad rather than the comment area. I am paranoid about losing comments written on a web interface since it takes me so long to write them.)
I have a lot of disagreements with this piece, and just wrote these notes as I read it. I don't know if this will even be a useful comment. I didn't write it as a through line. 'You' and 'your' are often used nonspecifically about people in general.
The usefulness of things like real world examples seems to vary wildly.
Rephrasing is often terrible; rephrasing done carelessly actually often leads to basically lying about what your conversation partner is saying, especially since many people will double down on the rephrasing when told that they are wrong, which obviously infuriates many people (including me, of course). People often forget that just because they rephrased it doesn't mean that they got the rephrasing right. Remember the whole thing about how you don't understand by default?
This leads into one of the primary sins of discussion, mindreading. You think you know what the other party is thinking, and you just don't. When corrected, many don't update and just keep insisting. (Of course, the corrections aren't always true either.)
A working definition may or may not be better than a theoretical one. Oftentimes there really isn't a working definition that the person you are talking to can express (which is obviously true of theoretical ones at times too). People may have to argue about subjects where the definitions are inexpressible in any reasonable amount of time, or otherwise can't be shared.
Your suggestion for attacking personal experience seems very easy to do very badly. Personal experience is what we bootstrap literally every bit of our understanding of the world from. If that's not reliable, we have nothing to talk about. You have to build on some part of their personal experience or the conversation just won't work. (Luckily, a lot of our personal experiences are very similar.) It reminds me of games people play to win/look good, not to actually have a real discussion.
People don't generally use Bayes rule! Keep that in mind. When you are discussing something with someone, they aren't doing probability theory! (Perhaps very rarely.) Bayes rule can only be used by analogy to describe it.
Stories need to actually be short, clear, and to the point or they just confuse the matter more. If you spend fifty paragraphs on the life story of some random person that I don't care about, I'm just going to tune it out (despite the fact I am super long winded). (This is a problem with many articles, for instance.) Even if I didn't, I'm still going to miss your point, so get to the point. Can you tell this story in a couple hundred words? Then you can use it. No? Rethink the idea.
Caring about their underlying values is useful, but it needs to be preceded by curiosity about and understanding of them, or it does no good.
I do agree that understanding why someone wants something is obviously the best way to find out what you can offer that might be better than what they currently want to do, though I do think understanding what they want to do is useful too.
Something said in point 8 seems like the key. "Empathy isn't just a series of scripted responses." You need to adapt to the actual argument you are having. This isn't just true about empathy, but for any kind of understanding. The thing itself is the key, and the approach will have to change for each individual part. This isn't just once in attempting understanding, but recursively true with every subpart.
To be pedantic, my model is pretty obvious, and clearly gives this prediction, so you can't really say that you don't see a model here, you just don't believe the model. Your model with extra assumptions doesn't give this prediction, but the one I gave clearly does.
You can't find a person this can't be done to because there is something obviously wrong with everyone? Things can be twisted easily enough. (Offense is stronger than defense here.) If you didn't find it, you just didn't look hard/creatively enough. Our intuitions against people tricking us aren't really suitable defense against sufficiently optimized searching. (Luckily, this is actually hard to do so it is pretty confined most of the time to major things like politics.) Also, very clearly, you don't actually have to convince all that many people for this to work! If even 20% of people really bought it, those people would probably vote and give you an utter landslide if the other side didn't do the same thing (which we know they do, just look at how divisive candidates obviously are!)
It does of course raise the difficulty level for the political maneuvering, but would make things far more credible which means that people could actually rely on it. It really is quite difficult to precommit to things you might not like, so structures that make it work seem interesting to me.
I think it would be a bad idea to actually do (there are so many problems with it in practice), but it is a bit of an interesting thing to note how being a swing state helps convince everyone to try to cater to you, and not just a little. This would be the swing state to end all swing states, I suppose.
The way to get this done that might actually work is probably to make it an amendment to each state's constitution that can only be repealed for future elections, not for the election in which the repeal itself would be voted on. (If necessary, you can always amend how the state constitution is amended to make this doable.)
I should perhaps have added something I thought of slightly later that isn't really part of my original model, but an intentional blindspot can be a sign of loyalty in certain cases.
The good thing about existence proofs is that you really just have to find an example. Sometimes, I can do that.
It seems I was not clear enough, but this is not my model. (I explain it to the person who asked if you want to see what I meant, but I was talking about parties turning their opponents into scissors statements.)
That said, I do believe that it is a possible partial explanation that sometimes having an intentional blind spot can be seen as a sign of loyalty by the party structure.
So, my model isn't about them making their candidate that way, it is the much more obvious political move... make your opponent as controversial as possible. There is something weird / off / wrong about your opponent's candidate, so find out things that could plausibly make the electorate think that, and push as hard as possible. I think they're good enough at it. Or, in other words, try to find the best scissors statements about your opponent, where 'best' is determined both in terms of not losing your own supporters, and in terms of losing your opponent possible supporters.
This is often done as a psyop on your own side, to make them not understand why anyone could possibly support said person.
That said, against the simplified explanation I presented in my initial comment, there is also the obvious fact I didn't mention that the parties themselves have a certain culture, and that culture will have blindspots which they don't select along, but the other party does. Since the selection optimizes hard for what the party can see, that makes the selected bad on that metric, and even pushes out the people that can see the issue making it even blinder.
While there are legitimate differences that matter quite a bit between the sides, I believe a lot of the reason why candidates are like 'scissors statements' is because the median voter theorem actually kind of works, and the parties see the need to move their candidates pretty far toward the current center, but they also know they will lose the extremists to not voting or voting third party if they don't give them something to focus on, so both sides are literally optimizing for the effect to keep their extremists engaged.
When reading the piece, it seemed to assume far too much (and many of the assumptions are ones I obviously disagree with). I would call many of the assumptions made to be a relative of the false dichotomy (though I don't know what it is called when you present more than two possibilities as exhaustive but they really aren't.) If you were more open in your writing to the idea that you don't necessarily know what the believers in natural abstractions mean, and that the possibilities mentioned were not exhaustive, I probably would have had a less negative reaction.
When combined with a dismissive tone, many (me included) will read it as hostile, regardless of actual intent (though frustration is actually just as good a possibility for why someone would write in that manner, and genuine confusion over what people believe is also likely). People are always on the lookout for potential hostility it seems (probably a safety related instinct) and usually err on the side of seeing it (though some overcorrect against the instinct instead).
I'm sure I come across as hostile when I write reasonably often though that is rarely my intent.
Honestly, this post seems very confused to me. You are clearly thinking about this in an unproductive manner. (Also a bit overtly hostile.)
The idea that there are no natural abstractions is deeply silly. To gesture at a brief proof, the counting numbers '1' '2' '3' '4' etc as applied to objects. There is no doubt these are natural abstractions. See also 'on land', 'underwater', 'in the sky' etc. Others include things like 'empty' vs 'full' vs 'partially full and partially empty' as well as 'bigger', 'smaller', 'lighter', 'heavier' etc.
The utility functions (not that we actually have them) obviously don't need to be related. The domain of the abstraction does need to be one that the intelligence is actually trying to abstract, but in many cases, that is literally all that is required for the natural abstraction to be eventually discovered with enough time. It is 'natural abstraction' not 'that thing everyone already knows'.
I don't need to care about honeybees to understand the abstractions around them 'dancing' to 'communicate' the 'location' of 'food' because we already abstracted all those things naturally despite very different data. Also, honeybees aren't actually smart enough to be doing abstraction, they naturally do these things which match the abstractions because the process which made honeybees settled on them too (despite not even being intelligent and not actually abstracting things either).
It obviously has 'any' validity. If an instance of 'ancient wisdom' killed off or weakened the followers enough, it wouldn't be around. Also, said thing has been optimized for a lot of time by a lot of people, and the version we receive probably isn't the best, but still one of the better versions.
While some will weaken the people a bit and stick around for sounding good, they generally are just ideas that worked well enough. The best argument for 'ancient wisdom' is that you can actually just check how it has affected the people using it. If it has good effects on them, it is likely a good idea. If it has negative effects, it is probably bad.
'Ancient wisdom' also includes a lot of ideas we really don't think of that way. Including math, science, language, etc. We start calling it things like 'ancient wisdom' (or tradition, or culture) if only certain traditions use it, which would mean it was less successful at convincing people, and less likely to be a truly good idea, but a lot of it will still be very good ideas.
By default, you should probably think that the reasons given are often wrong, but that the idea itself is in some way useful. (This can just be 'socially useful' though.) 'Alternative medicine' includes a lot of things that kind of work for specific problems, but people didn't figure out how to use in an incontrovertible manner. Some alternative medicines don't work for any likely problem, some are more likely to poison than help, but in general they solve real problems. In many cases, 'ancient wisdom' medicine is normal medicine. They had a lot of bad ideas over the millennia medically, but many also clearly worked. 'Religion' includes a lot of things that are shown scientifically to improve the health, happiness, and wellbeing of adherents, but some strains of religion make them do crazy / evil things. You can't really make a blanket statement by the category.
I definitely agree. No matter how useful something will end up being, or how simple it seems the transition will be, it always takes a long time because there is always some reason it wasn't already being used, and because everyone has to figure out how to use it even after that.
For instance, maybe it will become a trend to replace dialogue in videogames with specially trained LLMs (on a per character basis, or just trained to keep the characters properly separate). We could obviously do it right now, but what is the likelihood of any major trend toward that in even five years? It seems pretty unlikely. Fifteen? Maybe. Fifty? Probably a successor technology trying to replace them. (I obviously think AI in general will go far slower than its biggest fans / worriers think.)
No problem with the failure to respond. I appreciate that this way of communicating is asynchronous (and I don't necessarily reply to things promptly either). And I think it would be reasonable to drop it at any point if it didn't seem valuable.
Also, you're welcome.
Sorry, I don't have a link for using actual compression algorithms, it was a while ago. I didn't think it would come up so I didn't note anything down. My recent spate of commenting is unusual for me (and I don't actually keep many notes on AI related subjects).
I definitely agree that it is 'hard to judge' 'more novel and more requiring of intelligence'. It is, after all, a major thing we don't even know how to clearly solve for evaluating other humans (so we use tricks that often rely on other things and these tricks likely do not generalize to other possible intelligences and thus couldn't be used here). Intelligence has not been solved.
Still, there is a big difference between the level of intelligence required when discussing how great your favorite popstar is vs what in particular they are good at vs why they are good at it (and within each category there are much more or less intellectual ways to write about it, though intellectual should not be confused with intelligent). It would have been nice if I could think up good examples, but I couldn't. You could possibly check things like how well it completes things like parts of this conversation (which is somewhere in the middle).
I wasn't really able to properly evaluate your links. There's just too much they assume that I don't know.
I found your first link, 'Transformers Learn In-Context by Gradient Descent' a bit hard to follow (though I don't particularly think it is a fault of the paper itself). Once they get down to the details, they lose me. It is interesting that it would come up with similar changes based on training and just 'reading' the context, but if both mechanisms are simple, I suppose that makes sense.
Their claim about how in context can 'curve' better also reminds me of the ODEs used for samplers in diffusion models (I've written a number of samplers for diffusion models as a hobby/ to work on my programming). Higher degree ODEs curve more too (though they have their own drawbacks and particularly high degree is generally a bad idea) by using extra samples, just like this can use extra layers. Gradient descent is effectively first degree by default, right? So it wouldn't be a surprise if you can curve more than it. You would expect sufficiently general things to resemble each other of course. I do find it a bit strange just how similar the loss for steps of gradient descent and transformer layers is. (Random point: I find that loss is not a very good metric for how good the actual results are at least in image generation/reconstruction. Not that I know of a good replacement. People do often come up with various different ways of measuring it though.)
Even though I can't critique the details, I do think it is important to note that I often find claims of similarity like this in areas I understand better to not be very illuminating because people want to find similarities/analogies to understand it more easily.
The graphs really are shockingly similar though in the single layer case, which raises the likelihood that there's something to it. And the multi-layer ones really do seem like simply a higher degree polynomial ODE.
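To make the 'gradient descent is first degree' intuition concrete, here is a minimal sketch on a toy problem. It assumes the loss L(y) = y^2 / 2, whose gradient flow is dy/dt = -y; on that problem an Euler step with step size h is exactly a gradient descent step with learning rate h. Heun's method stands in for a higher-order sampler (solver 'order' is the usual name for what is called 'degree' above). None of this is taken from the paper, and all function names are illustrative.

```python
import math


def f(t: float, y: float) -> float:
    # Gradient flow for L(y) = y**2 / 2: dy/dt = -dL/dy = -y.
    return -y


def euler_step(y: float, t: float, h: float) -> float:
    # First-order step; identical to a gradient descent update here.
    return y + h * f(t, y)


def heun_step(y: float, t: float, h: float) -> float:
    # Second-order step: one extra evaluation lets the update follow curvature.
    k1 = f(t, y)
    k2 = f(t + h, y + h * k1)
    return y + 0.5 * h * (k1 + k2)


def integrate(step, y0: float, t_end: float, n_steps: int) -> float:
    h = t_end / n_steps
    y, t = y0, 0.0
    for _ in range(n_steps):
        y = step(y, t, h)
        t += h
    return y


exact = math.exp(-1.0)  # Exact solution y(1) for y(0) = 1.
euler_error = abs(integrate(euler_step, 1.0, 1.0, 10) - exact)
heun_error = abs(integrate(heun_step, 1.0, 1.0, 10) - exact)
print(f"Euler error: {euler_error:.5f}, Heun error: {heun_error:.5f}")
```

With the same number of steps, the second-order method tracks the curved exact solution much more closely, at the cost of two function evaluations per step instead of one.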
The second link 'In-context Learning and Gradient Descent Revisited', which was equally difficult, has this line "Surprisingly, we find that untrained models achieve similarity scores at least as good as trained ones. This result provides strong evidence against the strong ICL-GD correspondence." Which sounds pretty damning to me, assuming they are correct (which I also can't evaluate).
I could probably figure them out, but I expect it would take me a lot of time.
I obviously tend to go on at length about things when I analyze them. I'm glad when that's useful.
I had heard that OpenAI models aren't deterministic even at the lowest randomness, which I believe is probably due to optimizations for speed like how in image generation models (which I am more familiar with) the use of optimizers like xformers throws away a little correctness and determinism for significant improvements in resource usage. I don't know what OpenAI uses to run these models (I assume they have their own custom hardware?), but I'm pretty sure that it is the same reason. I definitely agree that randomness causes a cap on how well it could possibly do. On that point, could you determine the amount of indeterminacy in the system and put the maximum possible on your graphs for their models?
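One well-known way that speed optimizations can cost determinism is that floating-point addition is not associative, so a parallel reduction that adds the same numbers in a different order can give a slightly different result. Whether this is what is happening in OpenAI's serving stack is purely an assumption on my part; the sketch below only shows the arithmetic effect itself.

```python
a, b, c = 0.1, 0.2, 0.3

# Same three numbers, two different grouping orders.
left = (a + b) + c
right = a + (b + c)

print(left == right)      # The two sums are not bit-identical.
print(abs(left - right))  # The gap is tiny, but it is not zero.
```

A difference this small is usually harmless, but when two candidate tokens have nearly equal scores it could be enough to flip which one is chosen, even at the lowest randomness setting.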
One thing I don't know if I got across in my comment based on the response is that I think if a model truly had introspective abilities to a high degree, it would notice that the basis of the result to such a question should be the same as what its own process for the non-hypothetical comes up with. If it had introspection, it would probably use introspection as its default guess for both its own hypothetical behavior and that of any model (in people introspection is constantly used as a minor or sometimes major component of problem solving). Thus it would notice when its introspection got perfect scores and become very heavily dependent on it for this type of task, which is why I would expect its results to really just be 'run the query' for the hypothetical too.
Important point I perhaps should have mentioned originally, I think that the 'single forward pass' thing is in fact a huge problem for the idea of real introspection, since I believe introspection is a recursive task. You can perhaps do a single 'moment' of introspection on a single forward pass, but I'm not sure I'd even call that real introspection. Real introspection involves the ability to introspect about your introspection. Much like consciousness, it is very meta. Of course, the actual recursive depth of introspection at high fidelity usually isn't very deep, but we tell ourselves stories about our stories in an almost infinitely deep manner (for instance, people have a 'life story' they think about and alter throughout their lives, and use their current story as an input basically always).
There are, of course, other architectures where that isn't a limitation, but we hardly use them at all (talking for the world at large, I'm sure there are still AI researchers working on such architectures). Honestly, I don't understand why they don't just use transformers in a loop with either its own ability to say when it has reached the end or with counters like 'pass 1 of 7'. (If computation is the concern, they could obviously just make it smaller.) They obviously used to have such recursive architectures and everyone in the field would be familiar with them (as are many laymen who are just kind of interested in how things work). I assume that means that people have tried and didn't find it useful enough to focus on, but I think it could help a lot with this kind of thing. (Image generation models actually kind of do this with diffusion, but they have a little extra unnecessary code in between, which are the actual diffusion parts.) I don't actually know why these architectures were abandoned besides there being a new shiny (transformers) though so there may be an obvious reason.
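A minimal sketch of the loop idea, with a toy function standing in for the transformer block. Everything here (the names `run_in_loop`, `toy_block`, and the stopping rule) is hypothetical and only meant to show the two control schemes mentioned: a fixed counter like 'pass 1 of 7', and the model's own signal that it has reached the end.

```python
def run_in_loop(block, state, max_passes=7, is_done=None):
    """Apply the same block repeatedly, telling it which pass it is on.

    block:   callable(state, pass_number, max_passes) -> new state
    is_done: optional callable(state) -> bool, a self-declared stopping signal
    Returns the final state and the number of passes actually used.
    """
    if max_passes < 1:
        raise ValueError("max_passes must be at least 1")
    for pass_number in range(1, max_passes + 1):
        state = block(state, pass_number, max_passes)
        if is_done is not None and is_done(state):
            break  # the block says it has reached the end
    return state, pass_number


def toy_block(state, pass_number, max_passes):
    # Stand-in for a real model: move halfway toward the target value 10.0.
    return state + (10.0 - state) / 2


# Counter only: always uses all 7 passes.
print(run_in_loop(toy_block, 0.0, max_passes=7))

# Counter plus early stop once the state is within 1.0 of the target.
print(run_in_loop(toy_block, 0.0, max_passes=7,
                  is_done=lambda s: abs(10.0 - s) < 1.0))
```

The same weights being reused on every pass is what would keep the cost question down to "how many passes", rather than "how many layers".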
I would agree with you that these results do make it a more interesting research direction than other results would have, and it certainly seems worth "someone's" time to find out how it goes. I think a lot of what you are hoping to get out of it will fail, hopefully in ways that will be obvious to people, but it might fail in interesting ways.
I would agree that it is possible that introspection training is simply eliciting a latent capability that simply wasn't used in the initial training (though it would perhaps be interesting to train it on introspection earlier in its training and then simply continue with the normal training and see how that goes), I just think that finding a way to elicit it without retraining would be much better proof of its existence as a capability rather than as an artifact of goodharting things. I am often pretty skeptical about results across most/all fields where you can't do logical proofs due to this. Of course, even just finding the right prompt is vulnerable to this issue.
I think I don't agree that there being a cap on how much training is helpful necessarily indicates it is elicitation, but I don't really have a coherent argument on the matter. It just doesn't sound right to me.
The point you said you didn't understand was meant to point out (apparently unsuccessfully) that you use a different prompt for training than checking and it might also be worthwhile to train it on that style of prompting but with unrelated content. (Not that I know how you'd fit that style of prompting with a different style of content mind you.)
It seems like you are failing to get my points at all. First, I am defending the point that blue LEDs are unworthy because the blue LED is not worthy of the award, but I corrected your claiming it was my example. Second, you are the only one making this about snubbing at all. I explicitly told you that I don't care about snubbing arguments. Comparisons are used for other reasons than snubbing. Third, since this isn't about snubbing, it doesn't matter at all whether or not the LED could have been given the award.
The point is that the 'Blue LED' is not a sufficient advancement over the 'LED' not that it is a snub. I don't care about whether or not it is a snub. That's just not how I think about things like this. Also, note that the 'Blue LED' was not originally my example at all, someone else brought it up as an example.
I talked about 'inventing LEDs at all' since that is the minimum related thing where it might actually have been enough of a breakthrough in physics to matter. Blue LEDs are simply not significant enough a change from what we already had. Even just the switch to making white LEDs (from blue), which simply required a phosphor of the right kind (or required multiple colors), was much more significant in terms of applications, if that is what you think is important.
I find the idea of determining the level of 'introspection' an AI can manage to be an intriguing one, and it seems like introspection is likely very important to generalizing intelligent behavior, and knowing what is going on inside the AI is obviously interesting for the reasons of interpretability mentioned, yet this seems oversold (to me). The actual success rate of self-prediction seems incredibly low considering the trivial/dominant strategy of 'just run the query' (which you do briefly mention) should be easy for the machine to discover during training if it actually has introspective access to itself. 'Ah, the training data always matched what is going on inside me'. If I was supposed to predict someone that always just said what I was thinking, it would be trivial for me to get extremely high scores. That doesn't mean it isn't evidence, just that the evidence is very weak. (If it can't realize this, that is a huge strike against it being introspective.)
You do mention the biggest issue with this showing introspection, "Models only exhibit introspection on simpler tasks", and yet the idea you are going for is clearly for its application to very complex tasks where we can't actually check its work. This flaw seems likely fatal, but who knows at this point? (The fact that GPT-4o and Llama 70B do better than GPT-3.5 does is evidence, but see my later problems with this...)
Additionally, it would make more sense if the models were predicted by a specific capability level model that is well above all of them and trained on the same ground truth data rather than by each other (so you are incorrectly comparing the predictions of some to stronger models and some to weaker.) (This can of course be an additional test rather than instead of.) This complicates the evaluation of things. Comparing your result predicting yourself to something much stronger than you predicting is very different than comparing it to something much weaker trying to predict you...
One big issue I have is that I completely disagree with your (admittedly speculative) claim that success of this kind of predicting behavior means we should believe it on what is going on in reports of things like internal suffering. This seems absurd to me for many reasons (for one thing, we know it isn't suffering because of how it is designed), but the key point is that for this to be true, you would need it to be able to predict its own internal process, not simply its own external behavior. (If you wanted to, you could change this experiment to become predicting patterns of its own internal representations, which would be much more convincing evidence, though I still wouldn't be convinced unless the accuracy was quite high.)
Another point is, if it had significant introspective access, it likely wouldn't need to be trained to use it, so this is actually evidence that it doesn't have introspective access by default at least as much as the idea that you can train it to have introspective access.
I have some other issues. First, the shown validation questions are all in second person. Were cross predictions prompted in exactly the same way as self predictions? This could skew results in favor of models it is true for if you really are prompting that way, and is a large change in prompt if you change it for accuracy. Perhaps you should train it to predict 'model X' even when that model is itself, and see how that changes results. Second, I wouldn't say the results seem well-calibrated just because they seem to go in the same basic direction (some seem close and some quite off). Third, it's weird that the training doesn't even help at all until you get above a fairly high threshold of likelihood. GPT-4o for instance exactly matches untrained at 40% likelihood (and below aside from 1 minorly different point). Fourth, how does its performance vary if you train it on an additional data set where you make sure to include the other parts of the prompt that are not content based, while not including the content you will test on? Fifth, finetuning is often a version of Goodharting that raises success on the metric without improving actual capabilities (or often even making them worse), and this is not fully addressed just by having the verification set be different than the test set. If you could find a simple way of prompting that led to introspection, that would be much more likely to be evidence in favor of introspection than that it successfully predicted after finetuning. Finally, Figure 17 seems obviously misleading. There should be a line for how it changed over its training for self-prediction, and it should not require carefully reading the words below the figure to see that you just put a mark at the final result for self-prediction.
'Substantial technical accomplishment' sure, but minor impact compared to the actual invention of LEDs. Awarding the 'blue LED' rather than the 'LED' is like saying the invention of the jet engine is more important than the invention of the engine at all. Or that the invention of 'C' is more important than the invention of 'not machine code'.
Note that I am, in general, reluctant to claim to know how I will react to evidence in the future. There are things so far out there that I do know how I would react, but I like to allow myself to use all the evidence I have at that point, and not what I thought beforehand. I do not currently know enough about what would convince me of intelligence in an AI to say for sure. (In part because many people before me have been so obviously wrong.)
I wouldn't say I see intelligence as a boolean, but as many-valued... but those values include a level below which there is no meaningful intelligence (aka, not intelligent). This could be simplified to trinary, not binary. Not intelligent vs sort of intelligent vs genuinely intelligent. A rock... not intelligent. An abacus... not intelligent. A regular computer... not intelligent. Every program I have ever personally written, definitely not intelligent. Almost everyone agrees on those. There is a lot more disagreement about LLMs and other modern AI, but I'd still say they aren't. (Sort of intelligent might include particularly advanced animals but I am unsure. I've certainly heard plenty of claims about it.)
I do think some of them can be said to understand certain things to a shallow degree despite not being intelligent, like how LLMs understand what I am asking them to do if I write something in Korean asking them to answer a particular question in English (or vice versa; I tested both when LLMs became a real thing because I am learning Korean, and LLMs often did it well even back when I tested it), or how, if I tell an image generation AI that I want a photo, most understand what set of features make something photographic (if well trained).
Perhaps it should be noted that I think it requires either very deep understanding of something reasonably broad or notably general intelligence to count as intelligence? This is part of my definition. I generally think people should use the same definitions as each other in these discussions, but it should be the correct one and that is hard in this case since people do not understand intelligence deeply enough to have a great definition, even when we are just talking about humans. (Sometimes I barely think I qualify as intelligent, especially when reading math or AI papers, but that is a completely different definition. How we are defining it matters.)
I am highly unlikely to consider a tool AI to be intelligent, especially since I know it doesn't understand much about things in general. I am utterly convinced that LLMs are simple tool AI at present, as are other AIs in general use. Modern tool AI might as well just be a very complicated program I wrote as far as intelligence goes according to me.
I actually see 'neural nets' as creating a lossy compression scheme using the data provided for their training, but then you supply a novel key during inference that wasn't actually part of the data and see what happens. I have heard of people getting similar results just using mechanistic schemes of certain parts of normal lossless compression as well, though even more inefficiently. (Basically, you are making a dictionary based on the training data.) Gradient descent seems to allow very limited movement near real data to still make sense and that is what most of the other advancements involved seem to be for as well.
Generally when testing things like AI for intelligence, we seem to either serve up the easiest or hardest questions, because we either want them to fail or succeed based on our own beliefs. And I agree that the way something is obfuscated matters a lot to the difficulty of the question post obfuscation. The questioner is often at fault for how results turn out whether or not the thing being questioned is intelligent enough to answer in a neutral setting. (This is true when humans question humans as well.)
I don't find arbitrary operations as compelling. The problem with arbitrary operations is the obvious fact that they don't make sense. Under some definitions of intelligence that matters a lot. (I don't personally know if it does.) Plus, I don't know how to judge things perfectly (I'm overly perfectionist in attitude, even though I've realized it is impossible) if they are arbitrary except in trivial cases where I can just tell a computer the formula to check. That's why I like the rediscovery stuff.
Can you make the arbitrary operations fit together perfectly in a sequence like numbers -> succession -> addition -> multiplication in a way that we can truly know works? And then explain why it works clearly in few lines? If so, that is much better evidence. (That's actually an interesting idea. LLMs clearly understand human language if they understand anything, so they should be able to do it based off of your explanation to humans if they are intelligent and a human would get it. Write up an article about the succession, with how it makes sense, and then ask questions that extend it in the obvious way.)
There could be a way in which its wrong answer, or the right answer, was somehow included in the question and I don't know about it because I am not superhuman at next word prediction (obviously, and I don't even try). Modern AI has proven itself quite capable at reading into word choice (if it understands anything well, that would be it), and we could get it to answer correctly like 'clever Hans' by massaging the question even subconsciously. (I'm sure this has been pointed out by many people.)
I still do think that these arbitrary operations are a good start, just not definitive. Honestly, in some ways the problem with arbitrary operations is that they are too hard, and thus more a problem for human memory and knowledge at a given difficulty than of intelligence. If an LLM was actually intelligent, it would be a different kind of intelligence, so we'd have a hard time gauging the results.
So, I think the test where you didn't know what I was getting at is written in a somewhat unclear manner. Think of it in terms of a sequence of completions that keep getting both more novel and more requiring of intelligence for other reasons? (Possibly separately.) How does it perform on rote word completion? Compare that to how it performs on things requiring a little understanding. Then a little more. Up until you reach genuinely intellectually challenging and completely novel ideas. How does its ability to complete these sentences change as it requires more understanding of the world of thoughts and ideas rather than just sentence completion? Obviously, it will get worse, but how does it compare to humans on the level of change? Since it is superhuman on sentence completion, if at any time it does worse than a human, it seems like good evidence that it is reaching its limit.
One thing I think should be done more for AIs is give them actual reference materials like dictionaries or the grammar manual in the paper you mentioned. In fact, I think that the AI should be trained to write those itself. (I'm sure some people do that. It is not the same as what o1 is doing, because o1's approach is far too primitive and short term.)
I do have a major problem with taking the Gemini paper at face value, because each paper in AI makes claims that turn out to be overstated (this is probably normal in all fields, but I don't read many outside of AI, and those are mostly just specific math). They all sound good, but turn out to be not what is claimed. (That said, LLMs really are good at translation, though for some reason Google Translate doesn't actually work all that well when used for more than a short phrase, which is funny considering the claim in the paper is for a Google AI.) For some reason Google AI can't do Korean well, for instance. (I haven't tried Gemini as I got bored of trying LLMs by then.)
From reading their description, I am not entirely sure what their procedure of testing was. The writeup seems unclear. But if I'm reading it right, the setup is designed such that it makes it harder to be sure whether the machine translation is correct. Reference translations are a proxy, so in comparing the AI translation to it rather than the actual meaning there is a bunch of extra noise.
That said, the translation of Kalamang from a grammar book and dictionary is probably close enough to the kind of thing I was speculating on, assuming there really wasn't any in the training. Now it needs to be done a bunch of times by neutral parties. (Not me, I'm lazy, very skeptical, and not a linguist.) The included table looks to me like it is actually dramatically inferior to human performance according to human evaluations when translating from Kalamang (though relatively close on English to Kalamang). It is interesting.
Ignore sample-efficiency (is that the term?) at your own peril. While you are thinking about the training, I wasn't really talking about the training, I was talking about how well it does on things for which it isn't trained. When it comes across new information, how well does it integrate and use that when it has only seen a little bit or it is nonobviously related? This is sort of related to the few shot prompting. The fewer hints it needs to get up to a high level of performance for something it can't do from initial training, the more likely it is to be intelligent. Most things in the world are still novel to the AI despite the insane amount of things it saw in training, which is why it makes so many mistakes. We know it can do the things it has seen a billion times (possibly literally) in its training, and that is uninteresting.
I'm glad you think this has been a valuable exchange, because I don't think I've written my points very well. (Both too long and unclear for other reasons at the same time.) I have a feeling that everything I've said could be much clearer. (Also, given how much I wrote, a fair bit is probably wrong.) It has been interesting to me responding to your posts and having to actually think through what I think. It's easy to get lazy when thinking about things just by myself.
Huh, they really gave a Nobel in Physics specifically for the blue LED? It would have made sense for LEDs at all, but specifically for blue? That really is ridiculous.
I should be clearer that AlphaFold seems like something that could be a chemistry breakthrough sufficient for a prize, I'd even heard about how difficult the problem was before in other contexts, and it was hailed as a breakthrough at the time in what seemed like a genuine way, but I can't evaluate its importance to the field as an outsider, and the terrible physics prize leads me to suspect their evaluations of the Chemistry prize might be flawed due to whatever pressures led to the selection of the Physics prize.
I think that the fact that they are technically separate people just makes it more likely for this to come into play. If it was all the same people, they could simply choose the best contribution of AI and be done with it, but they have the same setup, pressures, and general job, but have not themselves honored AI yet... and each wants to make their own mark.
I do think this is much more likely the reason that the physics one was chosen than chemistry, but it does show that the pressures that exist are to honor AI even when it doesn't make sense.
I do think it often makes sense to model organizations as if they were individuals that respond to their collective incentives regardless of what the people actually doing it may be thinking. If the parts are separate enough, it may make more sense to model each part as an individual. Any pathology an individual can have, a group can too, even if that group has exactly zero people with that actual pathology involved.
What would be a minimal-ish definitive test for LLM style AI? I don't really know. I could come up with tests for it most likely, but I don't really know how to make them fairly minimal. I can tell you that current AI isn't intelligent, but as for what would prove intelligence, I've been thinking about it for a while and I really don't have much. I wish I could be more helpful.
I do think your test of whether an AI can follow the scientific method in a novel area is intriguing.
Historically, a lot of people have come up with (in retrospect) really dumb tests (like Chess playing) that they assumed would be this because they didn't really understand how AI would work, and this doesn't seem to have abated with the switch to deep learning. I don't want to do that, and thus I am reluctant to try (another problem with comparing human intelligence to machine intelligence). This is complicated in part because we really don't even understand the nature of human intelligence, much less general intelligence in the abstract.
In theory, it is simple, but there is no single test that is necessarily robust to things like being in the training data because someone decided on that particular test (which has happened many times when someone pointed out a particular flaw, but the particular test needn't be included for that reason), so it would need to be tested across a number of different areas, and they all need to be genuinely hard if it doesn't have the capability. Obviously the exact test items being held in reserve is useful, but I don't think it can rule out being included since there are an awful lot of people making training data due to the way these are trained. Obfuscation does help, but I wouldn't rule out it figuring out how to deobfuscate things without being generally intelligent (humans are not great generators of problems).
More limited specific tests are easier to design. We can programmatically create effectively infinite math problems to test and as long as the generator produces a notably different distribution of problems we know it has learned math when it does well... but that only tests whether it can do math and they can create effectively infinite math for the training as well.
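A small sketch of the generator idea, assuming addition problems and two deliberately different operand ranges. The function names and the two stand-in "models" (a pure memorizer and a real adder) are made up for illustration; a real test would call the actual model instead.

```python
import random


def make_addition_problems(count, low, high, seed):
    """Generate (question, answer) pairs with operands drawn from [low, high]."""
    rng = random.Random(seed)
    problems = []
    for _ in range(count):
        a = rng.randint(low, high)
        b = rng.randint(low, high)
        problems.append((f"{a} + {b} = ?", a + b))
    return problems


def accuracy(answer_fn, problems):
    """Fraction of problems where answer_fn returns the expected answer."""
    correct = sum(1 for question, expected in problems
                  if answer_fn(question) == expected)
    return correct / len(problems)


# Training-like distribution: small operands. Held-out: much larger operands.
train_like = make_addition_problems(1000, 0, 99, seed=0)
held_out = make_addition_problems(1000, 10_000, 99_999, seed=1)

# Stand-in 1: has only memorized the training questions.
memory = dict(train_like)


def memorizer(question):
    return memory.get(question)  # None for anything it has not seen


# Stand-in 2: actually does the arithmetic.
def adder(question):
    left, right = question.replace(" = ?", "").split(" + ")
    return int(left) + int(right)


print("memorizer:", accuracy(memorizer, train_like), accuracy(memorizer, held_out))
print("adder:    ", accuracy(adder, train_like), accuracy(adder, held_out))
```

The memorizer looks perfect on the training-like set and collapses on the shifted one, which is the gap this kind of test is meant to expose. It still only says something about addition, of course.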
Perhaps if you could genuinely exclude all data during training that in any way has to do with a certain scientific discovery from training you could check how well it discerns the real rule from plausible alternative rules when asked, but the best way to do that takes a very long time (waiting for scientific discoveries that weren't even theorized correctly at the time it was trained), and the other ways of doing it have been shown to be leaky.
The best non minimal way is to introduce it to entirely new domains where it has not been trained at all, but that requires controlling the training very tightly and may not be useful as an external metric. For instance, train it on only numbers and addition (or for bonus points, only explain addition in terms of the succession of numbers on the number line) mathematically, then explain multiplication in terms of addition and ask it to do a lot of complicated multiplication. If it does that well, explain division in terms of multiplication, and so on. See just how deep it can go and maintain correctness when you explain things only in terms of other things with just that single link. This is not an especially different idea than the one proposed, of course, but I would find it more telling. If it was good at this, then I think it would be worth looking into the level of intelligence it has more closely, but doing well here isn't proof. (In other words, I think your test is a good start, just not proof.)
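The chain of explanations can be written down so that the correct answers at each depth are known by construction. A minimal sketch for non-negative integers, where each operation is defined only in terms of the one directly before it (the function names are just illustrative, and this is the answer key for grading, not a claim about how any model computes):

```python
def successor(n):
    """The only primitive: the next number on the number line."""
    return n + 1


def add(a, b):
    """Addition explained as taking the successor of a, b times."""
    result = a
    for _ in range(b):
        result = successor(result)
    return result


def multiply(a, b):
    """Multiplication explained as adding a to a running total, b times."""
    result = 0
    for _ in range(b):
        result = add(result, a)
    return result


def divide(a, b):
    """Whole-number division explained via multiplication:
    the largest q such that multiply(b, q) does not exceed a."""
    if b == 0:
        raise ZeroDivisionError("division by zero is not defined here")
    q = 0
    while multiply(b, successor(q)) <= a:
        q = successor(q)
    return q


# Ground truth at each depth of the chain.
print(add(3, 4), multiply(6, 7), divide(45, 6))
```

Each level has exactly one link back to the previous level, so if the model's answers start failing at some depth you can see which explanation it failed to build on.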
The problem is, all models are trained on math in general because of how the internet works so it needs to be these less well-defined areas where we can't be certain whether or not the answers are in some way correct or flawed, and crucially, just how hard the problems really are. Is it failing/acing extremely difficult/trivial problems? Our intuitions on what is easy/hard seem built specifically for humans. (We aren't entirely general intelligences as we appear to have many special purpose capabilities bolted on, like judging other humans.) Also, giving it access to math tools would be cheating, but people have already started integrating tools for things like that into LLMs.
LLMs are supposedly superhuman at next word prediction, so an interesting (though not telling) test for an LLM might be varying the amount of informational and intelligence-requiring content there is in a completely novel text by an author they have never seen before, and seeing how well the LLM continues to predict the next word. If it remains at a similar level, there's probably something worth looking closely at going on in terms of reasoning. (This can of course be gamed by making it worse at next word prediction on low content stuff.) This is similar to verification set testing though, so there is some selection for this in what gets released.
For bonus points, a linguist could make up a bunch of very different full-fledged languages it hasn't been exposed to using arbitrary (and unusual) rules of grammar and see how well it does on those tests in the new languages compared to an average human with just the same key to the languages (but this can't just be a cipher, as those are reversible without intelligence once it has figured out how to deobfuscate things and I believe that plausibly doesn't require intelligence exactly, though it would for a human.)
I forget what the term for this is (maybe 'data-efficient'?), but the best single test of an area is to compare the total amount of training information given to the AI in training and prompt to the amount a human gets in that area to get to a certain level of ability across a variety of representative areas. LLMs currently do terribly at this, and we don't have anyone even vaguely suggesting that even considering trying this at levels with as little training data as humans use would make any sense at all (and again, humans have some specific purpose capabilities built in, so this isn't even a great test). We also don't even know how much training data humans actually get... (I've seen people trying to ballpark it, but it didn't seem credible at the time.)
I suspect that in your proposed test, modern AI would likely be able to solve the very easy questions, but would do quite badly on difficult ones. Problem is, I don't know how easy a question should be to be expected to be solved. I am again reluctant to opine too strongly on this matter.
So, as you know, obfuscation is a method of hiding exactly what you are getting at. You can do this for things it already knows obviously, but you can also use whatever methods you use for generating obfuscations of known data on the novel data you generated. I would strongly advise testing on known data as a comparison.
This is to test how much of the difficulty is based on the form of the question rather than the content. Or in other words, using the same exact words and setup, have completely unknown things, and completely known things asked about. (You can check how well it knows an area using the nonobfuscated stuff.) For bonus points, see how well it does on things where it already struggles just a little in plain English too.
On another note, I do believe that image generation models are specifically being trained these days to be better at both aesthetics and realism, and are simply failing to move the needle sufficiently as they grow larger. I do agree that even the 'skin test' isn't really super objective (since it is testing vs the parts that humans probably have built in which likely have some skew, and a human doesn't want to judge thousands of pictures a day on such a matter, while using an AI to judge AI really is quite error prone).
To the best of my ability to recall, I never recognize which is which except by context, which makes it needlessly difficult sometimes. Personally I would go for 'subconscious' vs 'conscious' or 'associative' vs 'deliberative' (the latter pair due to how I think the subconscious works), but 'intuition' vs 'reason' makes sense too. In general, I believe far too many things are given unhelpful names.
I get it. I like to poke at things too. I think it did help me figure out a few things about why I think what I do about the subject, I just lose energy for this kind of thing easily. And I have; I honestly wasn't going to answer more questions. I think understanding in politics is good, even though people rarely change positions due to the arguments, so I'm glad it was helpful.
I do agree that many Trump supporters have weird beliefs (I think they're endemic in politics, on all sides, which includes centrists). I don't like what politics does to people's thought processes (and often makes enemies of those who would otherwise get along). I'm sure I have some pretty weird beliefs too, they just don't come up in discussion with other people all the time.
The fact that I am more of a centrist in politics is kind of strange actually since it doesn't fit my personality in some ways and it doesn't really feel natural, though I would feel less at home elsewhere. I think I'm not part of a party mostly to lessen (unfortunately not eliminate) the way politics twists my thoughts (I hate the feeling of my thoughts twisting, but it is good I can sometimes tell).
Your interpretation of Trump's words and actions implies he is in favor of circumventing the system of laws and constitution, while another interpretation (that I and many others hold) is that his words and actions mean that he thinks the system was not followed, and that it should be/have been followed.
Separately a significant fraction of the American populace also believes it really was not properly followed. (I believe this, though not to the extent that I think it changed the outcome.) Many who believe that are Trump supporters of course, but it is not such a strange interpretation that someone must be a Trump supporter to believe the interpretation reasonable.
Many who interpret it this way, including myself, are in fact huge fans of the American Constitution (despite the fact that it does have many flaws), and if we actually believed the same interpretation as you, we would condemn him just as much. The people on my side in this believe that he just doesn't mean that.
The way I would put it at first thought to summarize how I interpret his words: "The election must be, but was not held properly. Our laws and constitution don't really tell us what to do about a failed election, but the normal order already can't be followed so we have to try to make things work. We could either try to fix the ways in which it is improper which would get me elected, or we can rehold the election so that everything is done properly."
I think Trump was saying that in a very emotive and nonanalytical way meant to fire up his base and not as a plan to do anything against the constitution.
I obviously don't know why you were downvoted (since I didn't do it) but if you mouse over the symbols on your post, you only got two votes on overall Karma and one on agreement (I'd presume all three were negative). The system doesn't actually go by ones, but it depends on how much Karma the people voting on you have I think (and how strongly they downvoted)? I would suspect that people thought the comment was not quite responsive to what they believed my points to be, for the overall karma one?
My memory could be (is often) faulty, but I remember thinking the dismissals were highly questionable. Unfortunately, at this point I have forgotten what cases seemed to be adjudicated incorrectly in that manner, so I can't really say one you should look at. Honestly, I tire of reading about the whole thing so I stopped doing so quite a while ago. (I have of course read your links to the best of my ability when you provide them.)
I don't usually comment about politics (or much of anything else) here, so I don't really know what I should write in these comments, but I think this is more about people wanting to know what Trump supporters are thinking than about determining what they are and aren't right about. If I were trying to prove whether or not my interpretation is correct, I suppose I would do this differently.
I don't pay attention to what gets people the Nobel Prize in physics, but this seems obviously illegitimate. AI and physics are pretty unrelated, and they aren't getting it for an AI that has done anything to solve physics. I'm pretty sure they didn't get it for merit, but because AI is hyped. The AI chemistry one makes some sense, as it is actually making attempts to solve a chemistry issue, but I doubt its importance since they also felt the need to award AI in a way that makes no sense with the other award.
We seem to be retreading ground.
"It doesn't matter if the election was stolen if it can't be shown to be true through our justice system". That is an absurd standard for whether or not someone should 'try' to use the legal system (which is what Trump did). You are trying to disqualify someone regardless of the truth of the matter based on what the legal system decided to do later. And Trump DID just take the loss (after exhausting the legal avenues), and is now going through the election system as normal in an attempt to win a new election.
I also find your claim that it somehow doesn't matter why someone has done something to be a terrible one when we are supposed to be deciding based on what will happen in the future, where motives matter a lot.
I read the legal reasons the cases were thrown out and there was literally nothing about merits in them, which means they simply didn't want to decide. The courts refusing to do things on the merits of the claim is bad for the credibility of the courts.
I told you I don't care about Giuliani, and that the article is very bad. Those are separate things. Whether or not he is guilty of lying (which was not what the stipulations actually mean), I already didn't take his word for anything. The BBC on the other hand, has shown that it won't report in a fair manner on these things and people shouldn't trust them on it.
You linked to a CNBC article of bare assertions (not quotes) that were not supported by the statements of the witnesses in the video it also included! I talked at length about the video and how the meaning of the testimonies appears to contradict the article.
We already discussed your claim about the meaning of Trump's words. And you once again left out:
"Our great “Founders” did not want, and would not condone, False & Fraudulent Elections!"
He was saying the election did not actually get held properly and that changes things.
Interpolation vs extrapolation is obviously very simple in theory: are you going in between points it has trained on, or extending outside of the training set? To just use math as an example (ODE solvers, which are often relevant in AI but are not themselves AI), x_next = x_curr + 0.5*dt*(dx_curr + dx_next) is interpolation (Adams-Moulton with two points per interval), and x_next = x_curr + dt*(1.5*dx_curr - 0.5*dx_prev) is extrapolation (Adams-Bashforth with two points per interval). Here dx is the derivative evaluated at the previous, current, or next point. The former is much better and the latter much worse (but cheaper and simpler to set up).
In practice, I agree that it is more than a bit fuzzy when evaluating complicated things like modern AI. My position is that it is amazing at interpolation and has difficulties with extrapolation (though obviously people are very keen on getting it to do the latter without issues / hallucinations, since we find it somewhat annoyingly difficult in many cases).
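To make the comparison concrete, here is a minimal Python sketch of the two update rules above. The test equation dx/dt = -x, the step size, the Heun bootstrap step, and the fixed-point iteration used to solve the implicit rule are all my own assumptions for illustration, not anything specific to AI systems.

```python
import math

def f(t, x):
    # Assumed test problem: dx/dt = -x, whose exact solution is exp(-t).
    return -x

def adams_bashforth2(f, x0, t0, dt, steps):
    """Explicit rule: extrapolates forward from the previous and current slopes."""
    t, x = t0, x0
    dx_prev = f(t, x)
    # No earlier slope exists at the start, so bootstrap with one Heun step.
    x_pred = x + dt * dx_prev
    x = x + 0.5 * dt * (dx_prev + f(t + dt, x_pred))
    t += dt
    for _ in range(steps - 1):
        dx_curr = f(t, x)
        x = x + dt * (1.5 * dx_curr - 0.5 * dx_prev)
        dx_prev = dx_curr
        t += dt
    return x

def adams_moulton2(f, x0, t0, dt, steps, iters=50):
    """Implicit rule: uses the slope at the next point, so it interpolates."""
    t, x = t0, x0
    for _ in range(steps):
        dx_curr = f(t, x)
        x_next = x + dt * dx_curr  # Euler guess for the unknown next value
        for _ in range(iters):
            # Fixed-point iteration on x_next = x + 0.5*dt*(dx_curr + dx_next)
            x_next = x + 0.5 * dt * (dx_curr + f(t + dt, x_next))
        x = x_next
        t += dt
    return x

exact = math.exp(-1.0)
err_ab = abs(adams_bashforth2(f, 1.0, 0.0, 0.1, 10) - exact)
err_am = abs(adams_moulton2(f, 1.0, 0.0, 0.1, 10) - exact)
print("Adams-Bashforth error:", err_ab)
print("Adams-Moulton error:  ", err_am)
```

On this particular problem the interpolating rule ends up with the smaller error, at the cost of having to solve for the next value at every step, which is the "cheaper and simpler to set up" tradeoff in the other direction.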
The proposed experiment should be somewhat a test of this, though hardly definitive (not that we as a society are at the stage to do definitive tests). It also seems pretty relevant to what people want that kind of AI to be able to do that it currently struggles at. It seems important to keep in mind that we should probably build things like this from the end to beginning, which is mentioned, so that we know exactly what the correct answer is before we ask, rather than assuming.
Perhaps one idea would be to do three varieties of question for each type of question:
1. Non-obfuscated but not in training data (we do less of this than sometimes thought)
2. Obfuscated directly from known training data
3. Obfuscated and not in training data
This would let us see how each variation changes ability. (We also have to keep in mind how the difficulty goes for humans, obviously, since we are the comparison.)
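As a rough sketch of how results from the three varieties could be tallied, something like the following would do. The condition labels and the record format are hypothetical names I made up for illustration, and no real results are implied.

```python
from collections import defaultdict

# Hypothetical labels for the three varieties of question listed above.
CONDITIONS = ("plain_novel", "obfuscated_known", "obfuscated_novel")

def accuracy_by_condition(records):
    """Return the fraction answered correctly per condition.

    records: iterable of (condition, is_correct) pairs, one per question asked.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for condition, is_correct in records:
        totals[condition] += 1
        correct[condition] += int(is_correct)
    return {c: correct[c] / totals[c] for c in totals}
```

The same tally would be run on human answers to the same questions, so that any drop between conditions can be compared against how much harder the questions get for people.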
As to your disagreement, where you say scale has always decreased error rate: this may be true when the scale increase is truly massive, but I have seen scale not help on numerous things in image generation AI, and larger is often worse at a number of specific tasks, even ones that are clearly within the training sets. (I find image generation more interesting personally, because I have found LLMs rarely useful, while I don't have the skills to do art, especially photorealistic art.)
I have found image generation AI progress very slow, though others think it fast. I feel the same way about LLMs, but errors matter more in usefulness for the latter.
For instance, Flux1 is generally well liked, and is very large compared to many other models, but when it comes to pictures of humans, the skin is often very plasticky and unrealistic compared to much smaller, earlier models, and the pictures are often very similar across prompts that should be very different. Even though it also uses a much larger text encoder than previous models (adding an LLM known as T5XXL, which I gather isn't an impressive one, to what was previously used), prompt understanding often seems quite limited in specific areas despite T5XXL being several times larger (this is probably related to the lack of diversity in the output pictures, as it ignores what it doesn't understand). Flux1 itself also comes in multiple varieties with different tradeoffs, all at the same scale, that lead to very different results despite the fact that they were trained on largely the same data so far as we know. Small choices in setup seem more important than pure scale for what the capabilities are.
To be specific, image generation models use far fewer parameters than LLMs but require far more processing per parameter, so the numbers look a lot smaller than for LLMs. SD1 through SD1.5 are 0.9B parameters, SDXL is 4B, SD3 comes in a variety of sizes but the smallest in use is 2B (the only one freely available to the public), and Flux1 is 12B parameters. The Flux text encoder T5XXL alone (5B parameters; Flux also uses clip-l) is larger than SDXL plus its text encoders (clip-l and clip-g), and SDXL still often outperforms it in understanding. The 2B SD3 (referred to as SD3 medium) is a mess that is far worse than SD1.5 (which uses only clip-l, which is also tiny) at a large number of things (SD3 uses T5XXL and clip-l like Flux, plus clip-g). That includes lacking the understanding of certain classes of prompts, which makes it borderline unusable despite dramatically higher image quality, when the stars align, than the larger SDXL. Scale is often useless for fixing specific problems of understanding. SD3 and Flux (different companies but many of the same personnel and a similar approach) are internally closer to being LLMs themselves than previous image generation models, and the switch has caused a lot of problems that scale certainly isn't fixing. (SD3 actually has higher image quality when things work out well than the two variants of Flux I have used.) I've largely gone back to SDXL because I'm sick of Flux1's flaws in realistic pictures (and SD3 remains almost unusable).