Comments
Yeah, these are mysteries, I don't know why. TSMC I think did get hit pretty hard though.
Politicians announce all sorts of things on the campaign trail, that usually is not much indication of what post-election policy will be.
Seems more likely the drop was from Trump tariff leaks than deepseek’s app.
I also note that 30x seems like an under-estimate to me, but also too simplified. AIs will make some tasks vastly easier, but won't help too much with other tasks. We will have a new set of bottlenecks once we reach the "AIs vastly helping with your work" phase. The question to ask is "what will the new bottlenecks be, and who do we have to hire to be prepared for them?"
If you are uncertain, this consideration should lean you much more towards adaptive generalists than the standard academic crop.
There's the standard software engineer response of "You cannot make a baby in 1 month with 9 pregnant women". If you don't have a term in this calculation for the amount of research hours that must be done serially vs the amount of research hours that can be done in parallel, then it will always seem like we have too few people, and should invest vastly more in growth growth growth!
If you find that actually your constraint is serial research output, then you still may conclude you need a lot of people, but you will sacrifice a reasonable amount of growth speed for attracting better serial researchers.
(Possibly this shakes out to mathematicians and physicists, but I don't want to bring that conversation into here)
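A toy model of the serial-vs-parallel point above, with entirely made-up numbers, just to show the shape of the calculation (an Amdahl's-law-style sketch, not anything from the original discussion):

```python
# Amdahl's-law-style toy model: if some fraction of research hours must
# happen serially, adding researchers (parallel capacity) hits a hard ceiling.

def effective_speedup(serial_fraction: float, n_researchers: int) -> float:
    """Speedup over a single researcher, assuming the non-serial portion
    parallelizes perfectly (an optimistic assumption)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_researchers)

# Illustrative, made-up numbers: with 30% of the work serial, 100 hires
# get you ~3.3x, and unbounded hiring caps out at 1/0.3 ≈ 3.33x.
for n in (1, 10, 100, 1000):
    print(n, round(effective_speedup(0.3, n), 2))
```

If the serial fraction is what binds, the marginal hire is worth much less than the headcount math suggests, which is the sense in which you'd trade growth speed for better serial researchers.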
The most obvious one imo is the immune system & the signals it sends.
Others:
- Circadian rhythm
- Age is perhaps a candidate here, though it may be more or less a candidate depending on if you're talking about someone before or after 30
- Hospice workers sometimes talk about the body "knowing how to die", maybe there's something to that
If that’s the situation, then why the “if and only if”? If we magically make them all believe they will die if they make ASI, then they would all individually be incentivized to stop it from happening, independent of China’s actions.
I think that China and the US would definitely agree to pause if and only if they can confirm the other also committing to a pause. Unfortunately, this is a really hard thing to confirm, much harder than with nuclear.
This seems false to me. Eg Trump for one seems likely to do what the person who pays him the most & is the most loyal to him tells him to do, and AI risk worriers do not have the money or the politics for either of those criteria compared to, for example, Elon Musk.
It's on his LinkedIn at least. Apparently since the start of the year.
I will note this sounds a lot like Turntrout's old Attainable Utility Preservation scheme. Not exactly, but enough that I wouldn't be surprised if a bunch of the math here has already been worked out by him (and possibly, in the comments, a bunch of the failure-modes identified).
Engineers: It's impossible.
Meta management: ~~Tony Stark~~ DeepSeek was able to build this in a cave! With a box of scraps!
Although I don't think the first example is great, seems more like a capability/observation-bandwidth issue.
I think you can have multiple failures at the same time. The reason I think this was also goodhart was because I think the failure-mode could have been averted if sonnet was told “collect wood WITHOUT BREAKING MY HOUSE” ahead of time.
If you put current language models in weird situations & give them a goal, I’d say they do do edge instantiation, without the missing “creativity” ingredient. Eg see claude sonnet in minecraft repurposing someone’s house for wood after being asked to collect wood.
Edit: There are other instances of this too, where you can tell claude to protect you in minecraft, and it will constantly tp to your position, and build walls around you when monsters are around. Protecting you, but also preventing any movement or fun you may have wanted to have.
I don't understand why Remmelt going "off the deep end" should affect AI safety camp's funding. That seems reasonable for speculative bets, but not when there's a strong track-record available.
It is, we’ve been limiting ourselves to readings from the sequence highlights. I’ll ask around to see if other organizers would like to broaden our horizons.
I mean, one of them’s math built bombs and computers & directly influenced pretty much every part of applied math today, and the other one’s math built math. Not saying he wasn’t smart, but no question bombs & computers are more flashy.
Fixed!
The paper you're thinking of is probably The Developmental Landscape of In-Context Learning.
@abramdemski I think I'm the biggest agree vote for alexander (without me alexander would have -2 agree), and I do see this because I follow both of you on my subscribe tab.
I basically endorse Alexander's elaboration.
On the "prep for the model that is coming tomorrow not the model of today" front, I will say that LLMs are not always going to be as dumb as they are today. Even if you can't get them to understand or help with your work now, their rate of learning still makes them in some sense your most promising mentee, and that means trying to get as much of the tacit knowledge you have into their training data as possible (if you want them to be able to more easily & sooner build on your work). Or (if you don't want to do that for whatever reason) just generally not being caught flat-footed once they are smart enough to help you, as all your ideas are in videos or otherwise in high context understandable-only-to-abram notes.
Should you write text online now in places that can be scraped? You are exposing yourself to 'truesight' and also to stylometric deanonymization or other analysis, and you may simply have some sort of moral objection to LLM training on your text.
This seems like a bad move to me on net: you are erasing yourself (facts, values, preferences, goals, identity) from the future, by which I mean, LLMs. Much of the value of writing done recently or now is simply to get stuff into LLMs. I would, in fact, pay money to ensure Gwern.net is in training corpuses, and I upload source code to Github, heavy with documentation, rationale, and examples, in order to make LLMs more customized to my use-cases. For the trifling cost of some writing, all the world's LLM providers are competing to make their LLMs ever more like, and useful to, me.
in some sense that’s just like being hired for any other job, and of course if an AGI lab wants you, you end up with greater negotiating leverage at your old place, and could get a raise (depending on how tight capital constraints are, which, to be clear, in AI alignment they are).
Over the past few days I've been doing a lit review of the different types of attention heads people have found and/or the metrics one can use to detect the presence of those types of heads.
Here is a rough list from my notes; sorry for the poor formatting, but I did say it's rough! (A quick sketch of how a couple of the simpler scores are computed follows the list.)
- Bigram entropy
- positional embedding ablation
- prev token attention
- prefix token attention
- ICL score
- comp scores
- multigram analysis
- duplicate token score
- induction head score
- succession score
- copy suppression heads
- long vs short prefix induction head differentiation
- induction head specializations
- literal copying head
- translation
- pattern matching
- copying score
- anti-induction heads
- S-inhibition heads
- Name mover heads
- Negative name mover heads
- Backup name mover heads
- (I don't entirely trust this paper) Letter mover heads
- (possibly too specific to be useful) year identification heads
- also MLPs which id which years are greater than the selected year
- (I don't entirely trust this paper) queried rule locating head
- (I don't entirely trust this paper) queried rule mover head
- (I don't entirely trust this paper) "fact processing" head
- (I don't entirely trust this paper) "decision" head
- (possibly too specific) subject heads
- (possibly too specific) relation heads
- (possibly too specific) mixed subject and relation heads
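For concreteness, here is a rough sketch (my own paraphrase of how two of the simpler scores above are typically computed, not code from any of the cited papers), where `attn` is assumed to be the [seq, seq] attention pattern of a single head and `tokens` the corresponding token ids:

```python
import numpy as np

def prev_token_score(attn: np.ndarray) -> float:
    """Average attention each position pays to the immediately preceding
    position; high values suggest a previous-token head."""
    seq_len = attn.shape[0]
    return float(np.mean([attn[i, i - 1] for i in range(1, seq_len)]))

def induction_score(attn: np.ndarray, tokens: np.ndarray) -> float:
    """Average attention from each token to the position right after that
    token's previous occurrence; high values (on e.g. a random sequence
    repeated twice) suggest an induction head."""
    scores = []
    for i in range(1, len(tokens)):
        prev_occurrences = np.where(tokens[:i] == tokens[i])[0]
        if len(prev_occurrences) > 0:
            scores.append(attn[i, prev_occurrences[-1] + 1])
    return float(np.mean(scores)) if scores else 0.0
```

Many of the other scores in the list follow a similar pattern: pick a positional or semantic relation, then measure how much of a head's attention (or its effect on the logits) lines up with it.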
And yes, I do think that interp work today should mostly focus on image nets, for the same reasons we focus on image nets. The field’s current focus on LLMs is a mistake.
A note that word on the street in mech-interp land is that you often get more signal, and a greater number of techniques work, on bigger & smarter language models than on smaller & dumber possibly-not-language-models. Presumably this is due to smarter & more complex models having more structured representations.
Can you show how a repeated version of this game results in overall better deals for the company? I agree this can happen, but I disagree for this particular circumstance.
Then the company is just being stupid, and the previous definition of exploitation doesn't apply. The company is imposing large costs on you at a large cost to itself. If the company does refuse the deal, it's likely because it doesn't have the right kinds of internal communication channels to do negotiations like this, and so this is indeed a kind of stupidity.
Why the distinction between exploitation and stupidity? Well they require different solutions. Maybe we solve exploitation (if indeed it is a problem) via collective action outside of the company. But we would have to solve stupidity via better information channels & flexibility inside the company. There is also a competitive pressure to solve such stupidity problems where there may not be in an exploitation problem. Eg if a different company or a different department allowed that sort of deal, then the problem would be solved.
If conversations are heavy tailed then we should in fact expect people to have singular & likely memorable high-value conversations.
otoh I also don't think cutting off contact with anyone "impure", or refusing to read stuff you disapprove of, is either practical or necessary. we can engage with people and things without being mechanically "nudged" by them.
I think the reason not to do this is because of peer pressure. Ideally you should have the bad pressures from your peers cancel out, and in order to accomplish this you need your peers to be somewhat decorrelated from each other, and you can't really do this if all your peers and everyone you listen to is in the same social group.
there is no neurotype or culture that is immune to peer pressure
Seems like the sort of thing that would correlate pretty robustly to big-5 agreeableness, and in that sense there are neurotypes immune to peer pressure.
Edit: One may also suspect a combination of agreeableness and non-openness
Some assorted polymarket and metaculus forecasts on the subject:
They are not exactly low.
Those invited to the Foresight workshop (also the 2023 one) are probably a good start, as well as Foresight’s 2023 and 2024 lectures on the subject.
I will take Zvi's takeaways from his experience in this round of SFF grants as significant outside-view evidence for my inside view of the field.
I think you are possibly better than most others at selecting (and optimizing for) conferences & events you actually want to attend. Even with work, I think many get value out of having those spontaneous conversations because it often shifts what they're going to do--the number one spontaneous conversation is "what are you working on" or "what have you done so far", which forces you to re-explain what you're doing & the reasons for doing it to a skeptical & ignorant audience. My understanding is you and David already do this very often with each other.
I think it's reasonable for the conversion to be at the original author's discretion rather than an automatic process.
Back in July, when the CrowdStrike bug happened, people were posting wild takes on Twitter and in my Signal group chats about how CrowdStrike is only used everywhere because the government regulators subject you to copious extra red tape if you try to switch to something else.
Here’s the original claim:
Microsoft blamed a 2009 antitrust agreement with the European Union that they said forced them to sustain low-level kernel access to third-party developers.[286][287][288] The document does not explicitly state that Microsoft has to provide kernel-level access, but says Microsoft must provide access to the same APIs used by its own security products.[287]
This seems consistent with your understanding of regulatory practices (“they do not give a rats ass what particular software vendor you use for anything”), and is consistent with the EU’s antitrust regulations being at fault—or at least Microsoft’s cautious interpretation of the regulations, which indeed is the approach you want to take here.
I believed “bear spray” was a metaphor for a gun. Eg if you were posting online about camping and concerned about the algorithm disliking your use of the word gun, were going into a state park which has guns banned, or didn’t want to mention “gun” for some other reason, then you’d say “bear spray”, since bear spray is such an absurd & silly concept that people will certainly understand what you really mean.
Turns out, bear spray is real. It's pepper spray on steroids, and is actually more effective than a gun, since it's easier to aim and is optimized to blind & actually cause pain rather than just damage. [EDIT:] Though see Jimmy's comment below for a counter-point.
[Bug report]: The Popular Comments section's comment preview ignores spoiler tags
As seen on Windows/Chrome
Film: The Martian
Rationality Tie-in: The virtue of scholarship is threaded throughout, but Watney is generally an intelligent person tackling a seemingly impossible-to-solve problem.
Moneyball
The Martian
A Boy and His Dog -- a weird one, but good for talking through & a heavy inspiration for Fallout
Ex Machina
300
I have found that they mirror you. If you talk to them like a real person, they will act like a real person. Call them (at least Claude) out on their corporate-speak and cheesy stereotypes in the same way you would a person scared to say what they really think.
@Nick_Tarleton How much do you want to bet, and what resolution method do you have in mind?
I note you didn't mention the info-sec aspects of the war. I have heard China is better at this than the US, but that doesn't mean much, because you would expect to hear that even if China were really terrible.
The mistake you are making is assuming that "ZFC is consistent" = Consistent(ZFC), where the latter is the Gödel encoding for "ZFC is consistent" specified within the language of ZFC.
If your logic were valid, it would just as well break the entirety of the second incompleteness theorem. That is, you would say "well of course ZFC can prove Consistent(ZFC) if it is consistent, for either ZFC is consistent, and we're done, or ZFC is not consistent, but that is a contradiction since 'ZFC is consistent' => Consistent(ZFC)".
The fact is that ZFC itself cannot recognize that Consistent(ZFC) is equivalent to "ZFC is consistent".
@Morpheus you too seem confused by this, so tagging you as well.
Why do some mathematicians feel like mathematical objects are "really out there" in some metaphysically fundamental sense? For example, if you ask mathematicians whether ZFC + not Consistent(ZFC) is consistent, they will say "no, of course not!" But given ZFC is consistent, the statement is in fact consistent by Gödel's second incompleteness theorem[1]. Similarly, if we have the Peano axioms without induction, mathematicians will say that induction should be there, but in fact you cannot prove it from within Peano; and given induction, mathematicians will say transfinite induction should be there.
I argue that an explanation could come from logical induction. In logical induction, fast but possibly wrong sub-processes bet with each other over whether different mathematical facts will be proven true or false by a slow but ground-truth formal-system prover. Another example of backstops in learning. But one result of this is that the successful sub-processes are not selected very hard to give null results on unprovable statements, producing spurious generalization and the subjective feeling--as expressed by probabilities for propositions--that some unprovable statements are true.
Of course, the platonist can still claim that this logical induction stuff is very similar to Bayesian updating, in the sense that both tell you something about the world even when you can't directly observe the relevant facts. If a photon exits your lightcone, there is no reason to stop believing the photon exists, even though there is no chance for you to ever encounter it again. Similarly, just because a statement is unprovable doesn't mean it's right for you to have no opinion on the subject; insofar as the simplest & best internal logical-induction market traders have strong beliefs on the subject, they may very well be picking up on something metaphysically fundamental. It's simply the simplest explanation consistent with the facts.
The argument here is that there are two ways of proving ZFC + not Consistent(ZFC) is inconsistent. Either you prove not Consistent(ZFC) from axioms in ZFC, or you contradict an axiom of ZFC from not Consistent(ZFC). The former is impossible by Gödel's second incompleteness theorem. The latter is equivalent to proving Consistent(ZFC) from an axiom of ZFC (its contrapositive), which is also impossible by Gödel. ↩︎
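(My attempt at putting the footnote's argument in symbols, writing $\mathrm{Con}(ZFC)$ for Consistent(ZFC): $ZFC + \neg\mathrm{Con}(ZFC)$ is inconsistent iff $ZFC \vdash \neg\neg\mathrm{Con}(ZFC)$, i.e. iff $ZFC \vdash \mathrm{Con}(ZFC)$; and Gödel's second incompleteness theorem says that if $ZFC$ is consistent then $ZFC \nvdash \mathrm{Con}(ZFC)$. So, assuming $\mathrm{Con}(ZFC)$, the theory $ZFC + \neg\mathrm{Con}(ZFC)$ is consistent.)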
If you trust both them and Metaculus, then you ought to update downwards on your estimate of the PRC's strategic ability.
I note that the PRC doesn't have a single "strategic ability" in terms of war. They can be better or worse at choosing which wars to fight, and this seems likely to have little influence on how good they are at winning such wars or scaling weaponry.
Eg in the US often "which war" is much more political than "exactly what strategy should we use to win this war" is much more political than "how much fuel should our jets be able to carry", since more people can talk & speculate about the higher level questions. China's politics are much more closed than the US's, but you can bet similar dynamics are at play.
Are we indeed (as I suspect) in a massive overhang of compute and data for powerful agentic AGI? (If so, then at any moment someone could stumble across an algorithmic improvement which would change everything overnight.)
Why is this relevant for technical AI alignment (coming at this as someone skeptical about how relevant timeline considerations are more generally)?