If you want to understand why a model, any model, did something, you presumably want a verbal explanation of its reasoning, a chain of thought. E.g. why AlphaGo made its famous unexpected move 37. That's not just true for language models.
Actually the paper doesn't have any more on this topic than the paragraph above.
Yeah, I also guess that something in this direction is plausibly right.
perhaps nothingness actually contains a superposition of all logically possible states, models, and systems, with their probability weighted by the inverse of their complexity.
I think the relevant question here is why we should expect their probability to be weighted by the inverse of their complexity. Is there any abstract theoretical argument for this? In other words, we need to find an a priori justification for this type of Ockham's razor.
Here is one such attempt: Any possible world can be described as a long logical conjunction of "basic" facts. By the principle of indifference, assume any basic fact has the same a priori probability (perhaps even probability 0.5, equal to its own negation), and that they are a priori independent. Then longer conjunctions will have lower probability. But longer conjunctions also describe more complex possible worlds. So simpler possible worlds are more likely.
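To make the weighting explicit (my own rendering of the above argument, under its idealized assumptions): a possible world whose complete description is a conjunction of $n$ independent basic facts, each with prior probability $1/2$, gets prior probability

$$P(f_1 \land f_2 \land \dots \land f_n) = \prod_{i=1}^{n} P(f_i) = 2^{-n},$$

so the prior falls off exponentially with the length (complexity) of the description.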
Though it's not clear whether this really works. Any conjunction completely describing a possible world would also need to include a statement "... and no other basic facts are true", which is itself a quantified statement, not a basic fact. Otherwise all conjunctive descriptions of possible worlds would be equally long.
Regular LLMs can use chain-of-thought reasoning. He is speaking about generating chains of thought for systems that don't use them. E.g. AlphaGo, or diffusion models, or even an LLM in cases where it didn't use CoT but produced the answer immediately.
As an example, you ask an LLM a question, and it answers it without using CoT. Then you ask it to explain how it arrived at its answer. It will generate something for you that looks like a chain of thought. But since it wasn't literally using it while producing its original answer, this is just an after-the-fact rationalization. It is questionable whether such a post-hoc "chain of thought" reflects anything the model was actually doing internally when it originally came up with the answer. It could be pure confabulation.
Clearly ANNs are able to represent propositional content, but I haven't seen anything that makes me think that's the natural unit of analysis.
Well, we (humans) categorize our epistemic state largely in propositional terms, e.g. in beliefs and suppositions. We even routinely communicate by uttering "statements" -- which express propositions. So propositions are natural to us, which is why they are important for ANN interpretability.
It seems they are already doing this with R1, in a secondary reinforcement learning step. From the paper:
2.3.4. Reinforcement Learning for all Scenarios
To further align the model with human preferences, we implement a secondary reinforcement learning stage aimed at improving the model’s helpfulness and harmlessness while simultaneously refining its reasoning capabilities. Specifically, we train the model using a combination of reward signals and diverse prompt distributions. For reasoning data, we adhere to the methodology outlined in DeepSeek-R1-Zero, which utilizes rule-based rewards to guide the learning process in math, code, and logical reasoning domains. For general data, we resort to reward models to capture human preferences in complex and nuanced scenarios. We build upon the DeepSeek-V3 pipeline and adopt a similar distribution of preference pairs and training prompts. For helpfulness, we focus exclusively on the final summary, ensuring that the assessment emphasizes the utility and relevance of the response to the user while minimizing interference with the underlying reasoning process. For harmlessness, we evaluate the entire response of the model, including both the reasoning process and the summary, to identify and mitigate any potential risks, biases, or harmful content that may arise during the generation process. Ultimately, the integration of reward signals and diverse data distributions enables us to train a model that excels in reasoning while prioritizing helpfulness and harmlessness.
Because I don’t care about “humanity in general” nearly as much as I care about my society. Yes, sure, the descendants of the Amish and the Taliban will cover the earth. That’s not a future I strive for. I’d be willing to give up large chunks of the planet to an ASI to prevent that.
I don't know how you would prevent that. Absent an AI catastrophe, fertility will recover, in the sense that "we" (rationalists etc) will mostly be replaced with people of low IQ and impulse control, exactly those populations that have the highest fertility now. And "banishing aging and death" would not prevent them from having high fertility and dominating the future. Moloch is relentless. The problem is more serious than you think.
I'm almost certainly somewhat of an outlier, but I am very excited about having 3+ children. My ideal number is 5 (or maybe more if I become reasonably wealthy). My girlfriend is also on board.
It's quite a different question whether you would really follow through with this, or whether either of you would change their preference and stop at a much lower number.
Therefore, it's very likely that OpenAI is sampling the best ones from multiple CoTs (or CoT steps with a tree search algorithm), which are the ones shown in the screenshots in the post.
I find this unlikely. It would mean that R1 is actually more efficient and therefore more advanced than o1, which is possible but not very plausible given its simple RL approach. I think it's more likely that o1 is similar to R1-Zero (rather than R1), that is, it may mix languages, which doesn't result in reasoning steps that can be straightforwardly read by humans. A quick inference-time fix for this is to do another model call which translates the gibberish into readable English, which would explain the increased CoT time. The "quick fix" may be due to OpenAI being caught off guard by the R1 release.
As you say, the addition of logits is equivalent to the multiplication of odds. And odds $o$ are related to probability $p$ by $o = \frac{p}{1-p}$. (One can view them as different scalings of the same quantity. Probabilities have the range $[0, 1]$, odds have the range $[0, \infty)$, and logits have the range $(-\infty, \infty)$.)
Now a well-known fact about multiplication of probabilities is this:
- $P(A \land B) = P(A) \cdot P(B)$, when $A$ and $B$ are independent.
But there is a similar fact about the multiplication of odds $O(A) \cdot O(B)$, though not at all well-known:
- $O(A) \cdot O(B) = O(A \land B \mid A \leftrightarrow B)$, when $A$ and $B$ are independent.
That is, multiplying the odds of two independent events/propositions gives you the odds of their conjunction, given that their biconditional is true, i.e. given that they have the same truth values / that they either both happen or both don't happen.
Perhaps this yields some more insight in how to interpret practical logit addition.
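As a quick numeric sanity check of this identity (my own sketch, with arbitrary example probabilities):

```python
# Check that O(A) * O(B) equals the odds of (A and B) given (A <-> B)
# for independent A and B. The probabilities below are arbitrary.
pA, pB = 0.3, 0.8

def odds(p):
    return p / (1 - p)

p_both = pA * pB                 # P(A and B), by independence
p_neither = (1 - pA) * (1 - pB)  # P(not-A and not-B)

# Given A <-> B, the only way for (A and B) to be false is the "neither" case,
# so the conditional odds are simply P(A and B) / P(not-A and not-B).
lhs = p_both / p_neither
rhs = odds(pA) * odds(pB)
print(lhs, rhs)  # both ~1.714
```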
Yes. Current reasoning models like DeepSeek-R1 rely on verified math and coding data sets to calculate the reward signal for RL. It's only a side effect if they also get better at other reasoning tasks, outside math and programming puzzles. But in theory we don't actually need strict verifiability for a reward signal, only your much weaker probabilistic condition. In the future, a model could check the goodness of its own answers. At that point we would have a self-improving learning process which doesn't need any external training data for RL.
And it is likely that such a probabilistic condition works on many informal tasks. We know that checking a result is usually easier than coming up with the result, even outside exact domains. E.g. it's much easier to recognize a good piece of art than to create one. This seems to be a fundamental fact about computation. It is perhaps a generalization of the apparent fact that NP problems (with quickly verifiable solutions) cannot in general be reduced to P problems (which are quickly solvable).
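As a toy illustration that a merely probabilistic check can already provide a usable signal (my own sketch with made-up numbers, not anything from the R1 paper):

```python
# Toy model: answers are correct with probability 0.4. Toy verifier: agrees
# with the ground truth only 70% of the time. Selecting an answer the noisy
# verifier approves of still beats the base accuracy.
import random

random.seed(0)

def sample_answer():
    return random.random() < 0.4  # True = correct answer

def noisy_verify(correct):
    return correct if random.random() < 0.7 else not correct

def best_of_n(n=8):
    answers = [sample_answer() for _ in range(n)]
    approved = [a for a in answers if noisy_verify(a)]
    return random.choice(approved) if approved else random.choice(answers)

trials = 20000
base = sum(sample_answer() for _ in range(trials)) / trials
selected = sum(best_of_n() for _ in range(trials)) / trials
print(f"base accuracy ~{base:.2f}, verifier-selected accuracy ~{selected:.2f}")
# Expected: roughly 0.40 vs 0.60 -- the weak verifier is enough to improve the policy.
```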
A problem with subjective probability is that it ignores any objective fact which would make one probability "better" or "more accurate" than the other. Someone could believe a perfectly symmetrical coin has a 10% chance of coming up heads, even though such a coin is physically impossible.
The concept of subjective probability is independent of any fact about objectively existing symmetries and laws. It also ignores physical dispositions, called propensities, which is like denying that a vase is breakable because this would, allegedly, be like positing a "mysterious force" which makes it true that the vase would break if it were dropped.
Subjective probability is only a measure of degree of belief, not of what a "rational" degree of belief would be, and neither is it a measure of ignorance, of how much evidence someone has about something being true or false.
It is also semantically implausible. It is perfectly valid to say "I thought the probability was low, but it was actually high. I engaged in wishful thinking and ignored the evidence I had." But with subjective probability this would be a contradiction: it would be equivalent to saying "My degree of belief was low, but it was actually high". That's not what the first sentence actually expresses.
Related: In an equation like $y = ax + b$, the values of all four variables are unknown, but $x$ and $y$ seem to be more unknown (more variable?) than $a$ and $b$. It's not clear what the difference is exactly.
Explaining the Shapley value in terms of the "synergies" (and the helpful split in the Venn diagram) makes much more intuitive sense than the more complex normal formula without synergies, which is usually just given without motivation. That being said, it requires first computing the synergies, which seems somewhat confusing for more than three players. The article itself doesn't mention the formula for the synergy function, but Wikipedia has it.
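Here is a minimal sketch of that synergy-based computation (my own code; `v` is a hypothetical coalition value function given as a dict, and the synergy formula is the inclusion-exclusion one from Wikipedia):

```python
from itertools import combinations

def synergies(v, players):
    """Synergy w(S) = sum over subsets T of S of (-1)^(|S|-|T|) * v(T)."""
    w = {}
    for r in range(1, len(players) + 1):
        for S in combinations(players, r):
            S = frozenset(S)
            w[S] = sum((-1) ** (len(S) - k) * v.get(frozenset(T), 0.0)
                       for k in range(len(S) + 1)
                       for T in combinations(sorted(S), k))
    return w

def shapley(v, players):
    """Shapley value of each player: sum of w(S) / |S| over coalitions S containing them."""
    w = synergies(v, players)
    return {i: sum(val / len(S) for S, val in w.items() if i in S) for i in players}

# Example: v({1}) = v({2}) = 1 and v({1,2}) = 3, so the pair's synergy is 1
# and each player gets 1 + 1/2 = 1.5.
v = {frozenset({1}): 1.0, frozenset({2}): 1.0, frozenset({1, 2}): 3.0}
print(shapley(v, [1, 2]))  # {1: 1.5, 2: 1.5}
```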
Let me politely disagree with this post. Yes, often desires ("wants") are neither rational nor irrational, but that's far from always the case. Let's begin with this:
But the fundamental preferences you have are not about rationality. Inconsistent actions can be irrational if they’re self-defeating, but “inconsistent preferences” only makes sense if you presume you’re a monolithic entity, or believe your "parts" need to all be in full agreement all the time… which I think very badly misunderstands how human brains work.
In the above quote you could simply replace "preferences" with "beliefs". The form of argument wouldn't change, except that you now say (absurdly) that beliefs, like preferences, can't be irrational. I disagree with both.
One example of irrational desires is akrasia (weakness of will). This phenomenon occurs when you want something (to eat unhealthily, to procrastinate, etc.) but do not want to want it. In this case the former desire is clearly instrumentally irrational. This is a frequent and often serious problem, and it is aptly labeled "irrational".
Note that this is perfectly compatible with the brain having different parts. For example, the (rather stupid) cerebellum wants to procrastinate, while the (smart) cortex wants to not procrastinate. When they conflict, you should listen to your cortex rather than to your cerebellum. Or something like that. (Freud called the stupid part of the motivation system the "id" and the smart part the "super-ego".)
Such irrational desires are not reducible to actions. An action can fail to come about for many reasons (perhaps it presupposed false beliefs), but that doesn't mean the underlying desire wasn't irrational.
Wants are not beliefs. They are things you feel.
Feelings and desires/"wants" are not the same. It's the difference between hedonic and preference utilitarianism. Desires are actually more similar to beliefs, as both are necessarily about something (the thing which we believe or desire), whereas feelings can often just be had, without them being about anything. E.g. you can simply feel happy without being happy about something specific. (Philosophers call mental states that are about something "intentional states" or "propositional attitudes".)
Moreover, sets of desires, just like sets of beliefs, can be irrational ("inconsistent"). For example, if you want x to be true and also want not-x to be true. That's irrational, just like believing x while also believing not-x. A more complex example from utility theory: If $P$ describes your degrees of belief in various propositions, and $U$ describes your degrees of desire that various propositions are true, and $P(X \land Y) = 0$, then $U(X \lor Y) = \frac{P(X)\,U(X) + P(Y)\,U(Y)}{P(X) + P(Y)}$. In other words, if you believe two propositions to be mutually exclusive, your desire for their disjunction should be the probability-weighted average of your desires for the individual propositions.
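A quick numeric check of this constraint (my own toy example with three worlds and made-up numbers):

```python
# Three worlds with probabilities and utilities; X = {w1}, Y = {w2} are
# mutually exclusive propositions.
P = {"w1": 0.2, "w2": 0.3, "w3": 0.5}
u = {"w1": 10.0, "w2": 4.0, "w3": 0.0}

def prob(A):
    return sum(P[w] for w in A)

def U(A):
    """Degree of desire for a proposition = expected utility given the proposition."""
    return sum(P[w] * u[w] for w in A) / prob(A)

X, Y = {"w1"}, {"w2"}
lhs = U(X | Y)
rhs = (prob(X) * U(X) + prob(Y) * U(Y)) / (prob(X) + prob(Y))
print(lhs, rhs)  # both 6.4
```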
More specifically, for a Jeffrey utility function $U$ defined over a Boolean algebra of propositions, and some propositions $A$ and $B$, "the sum is greater than its parts" would be expressed as the condition $U(A \land B) > U(A) + U(B)$ (which is, of course, not a theorem). The respective general theorem only states that $U(A \land B) = U(A) + U(B \mid A)$, which follows from the definition of conditional utility $U(B \mid A) = U(A \land B) - U(A)$.
Yeah definitional. I think "I should do x" means about the same as "It's ethical to do x". In the latter the indexical "I" has disappeared, indicating that it's a global statement, not a local one, objective rather than subjective. But "I care about doing x" is local/subjective because it doesn't contain words like "should", "ethical", or "moral patienthood".
Ethics is a global concept, not many local ones. That I care more about myself than about people far away from me doesn't mean that this makes an ethical difference.
This seems to just repeat the repugnant conclusion paradox in more graphic detail. Any paradox is such that one can make highly compelling arguments for either side. That's why it's called a paradox. But doing this won't solve the problem. A quote from Robert Nozick:
Given two such compelling opposing arguments, it will not do to rest content with one's belief that one knows what to do. Nor will it do to just repeat one of the arguments, loudly and slowly. One must also disarm the opposing argument; explain away its force while showing it due respect.
Tailcalled talked about this two years ago. A model which predicts text does a form of imitation learning. So it is bounded by the text it imitates, and by the intelligence of humans who have written the text. Models which predict future sensory inputs (called "predictive coding" in neuroscience, or "the dark matter of intelligence" by LeCun) don't have such a limitation, as they predict reality more directly.
This still included other algorithmically determined tweets -- tweets that people you followed had liked, and later more generally "recommended" tweets. These are no longer present in the following tab.
I'm pretty sure there were no tabs at all before the acquisition.
Twitter did use an algorithmic timeline before (e.g. tweets you might be interested in, tweets people you followed liked), it was just less algorithmic than the "for you" tab currently. The time when it was completely like the current "following" tab was many years ago.
The algorithm has been horrific for a while
After Musk took over, they implemented a mode which doesn't use an algorithm on the timeline at all. It's the "following" tab.
In the past we already had examples ("logical AI", "Bayesian AI") where galaxy-brained mathematical approaches lost out against less theory-based software engineering.
Cities are very heavily Democratic, while rural areas are only moderately Republican.
I think this isn't compatible with both getting about equally many votes, because many more Americans live in cities than in rural areas:
In 2020, about 82.66 percent of the total population in the United States lived in cities and urban areas.
https://www.statista.com/statistics/269967/urbanization-in-the-united-states/
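To illustrate the arithmetic (the splits below are made up for illustration, not real election data):

```python
# If ~83% of voters were urban and cities were "heavily Democratic", an even
# national split would require implausibly lopsided rural margins.
urban_share, rural_share = 0.83, 0.17
urban_dem = 0.58  # hypothetical "heavily Democratic" urban vote share
# Rural Republican share needed for a 50/50 national total:
rural_rep_needed = (urban_share * urban_dem - 0.5 + rural_share) / rural_share
print(rural_rep_needed)  # ~0.89, far beyond "moderately Republican"
```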
It's not that "they" should be more precise, but that "we" would like to have more precise information.
We now know pretty conclusively from The Information and Bloomberg that, for OpenAI, Google, and Anthropic, new frontier base LLMs have yielded disappointing performance gains. The question is which of your possibilities caused this.
They do mention that the availability of high quality training data (text) is an issue, which suggests it's probably not your first bullet point.
Ah yes, the fork asymmetry. I think Pearl believes that correlations reduce to causations, so this is probably why he wouldn't particularly try to, conversely, reduce causal structure to a set of (in)dependencies. I'm not sure whether the latter reduction is ultimately possible in the universe. Are the correlations present in the universe, e.g. defined via the Albert/Loewer Mentaculus probability distribution, sufficient to recover the familiar causal structure of the universe?
This approach goes back to Hans Reichenbach's book The Direction of Time. I think the problem is that the set of independencies alone is not sufficient to determine a causal and temporal order. For example, the same independencies between three variables could be interpreted as the chains $A \rightarrow B \rightarrow C$ and $A \leftarrow B \leftarrow C$. I think Pearl talks about this issue in the last chapter.
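To spell this out (my own notation): the two chain factorizations

$$P(a,b,c) = P(a)\,P(b \mid a)\,P(c \mid b) \qquad \text{and} \qquad P(a,b,c) = P(c)\,P(b \mid c)\,P(a \mid b)$$

impose exactly the same constraint on the joint distribution, namely $A \perp C \mid B$, so the independencies alone cannot distinguish the two causal directions.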
If base model scaling has indeed broken down, I wonder how this manifests. Does the Chinchilla scaling law no longer hold beyond a certain size? Or does it still hold, but a reduction in prediction loss no longer goes along with a proportional increase in benchmark performance? The latter could mean the quality of the (largely human-generated) training data is the bottleneck.
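For reference, the functional form fitted in the Chinchilla paper (the constants are empirical fits which I'm not reproducing here):

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},$$

where $N$ is the parameter count and $D$ the number of training tokens. A "breakdown" could then mean either that this fit stops extrapolating beyond some scale, or that loss keeps falling as predicted while downstream benchmark performance stops improving in step.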
"Misinterpretation" is somewhat ambiguous. It either means not correctly interpreting the intent of an instruction (and therefore also not acting on that intent) or correctly understanding the intent of the instruction while still acting on a different interpretation. The latter is presumably what the outcome pump was assumed to do. LLMs can apparently both understand and act on instructions pretty well. The latter was not at all clear in the past.
Interesting. Question: Why does the prediction confidence start at 0.5? And how is the "actual accuracy" calculated?
I think I get what you mean, though making more assumptions is perhaps not the best way to think about it. Logic is monotonic (classical logic at least), meaning that a valid proof remains valid even when adding more assumptions. The "taking advantage of some structure" seems to be something different.
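To state the monotonicity property explicitly (standard notation): for classical consequence,

$$\Gamma \vdash \varphi \;\implies\; \Gamma \cup \Delta \vdash \varphi,$$

i.e. adding extra premises $\Delta$ never invalidates a derivation, which is why "more assumptions" by itself can't be what does the work.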
Note, the quantity you refer to is called entropy by Wikipedia, not Shannon information.
Is this a reaction to OpenAI Shifts Strategy as Rate of ‘GPT’ AI Improvements Slows?
We arguably have already colonized Antarctica. See Wikipedia.
A similar point would be: There is no permanent deep sea settlement (an underwater habitat), although this would be much easier to achieve than a settlement on Mars.
Yudkowsky has written about it:
(...) In standard metaethical terms, we have managed to rescue 'moral cognitivism' (statements about rightness have truth-values) and 'moral realism' (there is a fact of the matter out there about how right something is). We have not however managed to rescue the pretheoretic intuition underlying 'moral internalism' (...)
Replace in the post "morality" with "rationality" and you get a reductio ad absurdum.
I made basically the same proposal here, but phrased as a task of translating between a long alien message and human languages: https://www.lesswrong.com/posts/J3zA3T9RTLkKYNgjw/is-llm-translation-without-rosetta-stone-possible See also the comments, which contain a reference to a paper with a related approach on unsupervised machine translation. Also this comment echoes your post:
I think this is a really interesting question since it seems like it should neatly split the "LLMs are just next token predictors" crowd from the "LLMs actually display understanding" crowd.
Arguably, "basic logical principles" are those that are true in natural language. Otherwise nothing stops us from considering absurd logical systems where "true and true" is false, or the like. Likewise, "one plus one is two" seems to be a "basic mathematical principle" in natural language. Any axiomatization which produces "one plus one is three" can be dismissed on grounds of contradicting the meanings of terms like "one" or "plus" in natural language.
The trouble with set theory is that, unlike logic or arithmetic, it often doesn't involve strong intuitions from natural language. Sets are a fairly artificial concept compared to natural language collections (empty sets, for example, can produce arbitrary nestings), especially when it comes to infinite sets.
Interesting, I really hope TMS gains more acceptance. By the way, according to studies, ECT (the precursor of TMS) is even more effective, though it does have more side effects, due to the required anesthesia, and it is gatekept even more strongly. In my youth I suffered from depression for several years, and all of this likely would have been avoidable with a few ECT sessions (TMS wasn't a thing yet), if it wasn't for the medical system's irrational bias in favor of exclusively using SSRIs and CBT. I think this happens because most medical staff have no idea how terrible depression can be, so they don't get the sense of urgency they'd get from more visible diseases.
Guys, for this specific case you really have to say what OS you are using. Otherwise you might be totally talking past each other.
(Font-size didn't change on any OS, but the font itself changed from Calibri to Gill Sans on Windows. Gill Sans has a slightly smaller x-height so probably looks a bit smaller.)
I tested it on Android, it's the same for both Firefox and Chrome. The font looks significantly smaller than the old font, likely due to the smaller x-height you mentioned. Could the font size of the comments be increased a bit so that it appears visually about as large as the old one? Currently I find it too small to read comfortably. (Subjective font size is often different from the standard font size measure. E.g. Verdana appears a lot larger than Arial at the same standard "size".)
(A general note: some people are short-sighted and wear glasses, and the more short-sighted you are, the more strongly the glasses compress your field of view into a smaller area. So things that appear to be an acceptable size to people who aren't particularly short-sighted may appear too small to those who are more short-sighted.)
Did the font size in comments change? It does seem quite small now...
Of course, for a "real" prisoner's dilemma, any form of coordination is ruled out from the start. But in real-world instances, coordination can sometimes be introduced into systems that previously were prisoner's dilemmas. That's what I mean by "solving" a prisoner's dilemma: making the dilemma go away.
The thing I'm pointing out here is that "coordination" is a very unspecific term, and one concrete form of coordination is being able to vote for cooperation. (Example: voting for a climate change bill instead of trying to minimize your personal carbon footprint. The latter would make you personally significantly worse off with hardly any benefit on the whole, which is why you would defect individually but vote for "cooperate".) I think voting is usually not appreciated as a method of coordination, only as a method of choosing the most popular policy/party, which doesn't need to involve solving a prisoner's dilemma.
Some issues that seem to be controversial are really taboo, or arise due to an underlying taboo. For such cases I have two general recommendations here.
Related to this: Some opinions may be often expressed because of virtue signalling; e.g. because the opposite is taboo, or for other reasons. Hearing such opinions doesn't provide significant testimonial evidence for their truth, since people don't hold them because of evidence they encountered, but because they feel virtuous. Though it is not easy to recognize why particular opinions are being expressed, whether they are motivated by signalling or not.
Solutions to a prisoner's dilemma are typically assumed to involve "coordination" in some sense. But what kinds of mechanism are appropriate examples for coordination? For an N-person prisoner's dilemma, one form of coordination is implementing voting. Say, everyone is forced to cooperate when the majority votes "cooperate". Nobody has a selfish interest to cooperate, but everyone has a selfish interest to vote for "cooperate".
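A minimal numeric sketch of this mechanism (my own toy payoffs, a public-goods version of the N-person dilemma):

```python
# Each cooperator pays cost c and produces benefit b, which is shared equally
# by all N players.
N, b, c = 10, 2.0, 1.0

def payoff(i_cooperates, n_other_cooperators):
    n_coop = n_other_cooperators + (1 if i_cooperates else 0)
    return n_coop * b / N - (c if i_cooperates else 0.0)

# Defection is individually dominant ...
print(payoff(False, N - 1), ">", payoff(True, N - 1))  # 1.8 > 1.0
print(payoff(False, 0), ">", payoff(True, 0))          # 0.0 > -0.8
# ... but under a binding majority vote, a pivotal voter compares
# "everyone cooperates" with "everyone defects", and cooperation wins:
print(payoff(True, N - 1), ">", payoff(False, 0))      # 1.0 > 0.0
```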
This is interesting because economists often see voting as irrational for decision theoretic reasons. But from the game theoretic perspective above, it appears to be rational. This is probably not a new insight, but I haven't seen voting being portrayed as a type of solution to N-person prisoner's dilemmas.
One bit could also encode "probably true" and "probably false". It doesn't have to be "certainly true" and "certainly false". And this is of course what we observe: we aren't perfectly certain of everything we can barely remember to be true.
Thanks, this was an interesting article. The irony of course being that I, not knowing Russian, read it using Google Translate.
To push back a little:
In fact, I’d go further and argue that explanatory modeling is just a mistaken approach to predictive modeling. Why do we want to understand how things work? To make better decisions. But we don’t really need to understand how things work to make better decisions, we just need to know how things will react to what we do.
The word "react" here is a causal term. To predict how things will "react" we need some sort of causal model.
What makes predictive modeling a better idea is that it also allows us to find factors that are not causal, but still useful.
Usefulness is also a causal notion. X is useful if it causes a good outcome. If X doesn't cause a good outcome, but is merely correlated with it, it isn't useful.