Posts

Ideas for benchmarking LLM creativity 2024-12-16T05:18:55.631Z
"Can AI Scaling Continue Through 2030?", Epoch AI (yes) 2024-08-24T01:40:32.929Z
"On the Impossibility of Superintelligent Rubik’s Cube Solvers", Claude 2024 [humor] 2024-06-23T21:18:10.013Z
FHI (Future of Humanity Institute) has shut down (2005–2024) 2024-04-17T13:54:16.791Z
Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)? 2023-07-03T00:48:47.131Z
COVID-19 Group Testing Post-mortem? 2022-08-05T16:32:55.157Z
Emergent Ventures/Schmidt (new grantor for individual researchers) 2022-04-09T14:41:05.764Z
Fake Journal Club proposal 2022-03-25T14:23:18.785Z
It Looks Like You're Trying To Take Over The World 2022-03-09T16:35:35.326Z
Capability Phase Transition Examples 2022-02-08T03:32:54.551Z
"Summarizing Books with Human Feedback" (recursive GPT-3) 2021-11-15T17:41:53.189Z
EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised 2021-11-02T02:32:41.856Z
My ML Scaling bibliography 2021-10-23T14:41:45.170Z
AlphaFold 2 paper released: "Highly accurate protein structure prediction with AlphaFold", Jumper et al 2021 2021-07-15T19:27:20.584Z
May 2021 Gwern.net newsletter 2021-06-11T14:13:18.485Z
"Decision Transformer" (Tool AIs are secret Agent AIs) 2021-06-09T01:06:57.937Z
April 2021 Gwern.net newsletter 2021-06-03T15:13:29.138Z
gwern's Shortform 2021-04-24T21:39:14.128Z
March 2021 gwern.net newsletter 2021-04-06T14:06:20.198Z
February 2021 gwern.net newsletter 2021-03-13T14:57:54.645Z
January 2021 gwern.net newsletter 2021-02-04T20:12:39.555Z
December 2020 gwern.net links 2021-01-10T17:21:40.756Z
November 2020 gwern.net newsletter 2020-12-03T22:47:16.917Z
October 2020 gwern.net newsletter 2020-11-01T21:38:46.795Z
/r/MLScaling: new subreddit for NN scaling research/discussion 2020-10-30T20:50:25.973Z
"Scaling Laws for Autoregressive Generative Modeling", Henighan et al 2020 {OA} 2020-10-29T01:45:30.666Z
September 2020 gwern.net newsletter 2020-10-26T13:38:51.107Z
August 2020 gwern.net newsletter 2020-09-01T21:04:58.299Z
July 2020 gwern.net newsletter 2020-08-20T16:39:27.202Z
June 2020 gwern.net newsletter 2020-07-02T14:19:08.696Z
GPT-3 Fiction Samples 2020-06-25T16:12:05.422Z
May Gwern.net newsletter (w/GPT-3 commentary) 2020-06-02T15:40:37.155Z
OpenAI announces GPT-3 2020-05-29T01:49:04.855Z
"AI and Efficiency", OA (44✕ improvement in CNNs since 2012) 2020-05-05T16:32:20.335Z
April 2020 gwern.net newsletter 2020-05-01T20:47:44.867Z
March 2020 gwern.net newsletter 2020-04-03T02:16:02.871Z
February 2020 gwern.net newsletter 2020-03-04T19:05:16.079Z
January 2020 gwern.net newsletter 2020-01-31T18:04:21.945Z
Subscripting Typographic Convention For Citations/Dates/Sources/Evidentials: A Proposal 2020-01-08T22:20:20.290Z
Dec 2019 gwern.net newsletter 2020-01-04T20:48:48.788Z
Nov 2019 gwern.net newsletter 2019-12-02T21:16:04.846Z
October 2019 gwern.net newsletter 2019-11-14T20:26:34.236Z
September 2019 gwern.net newsletter 2019-10-04T16:44:43.147Z
"AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence", Clune 2019 2019-09-10T21:33:08.837Z
August 2019 gwern.net newsletter (popups.js demo) 2019-09-01T17:52:01.011Z
"Designing agent incentives to avoid reward tampering", DeepMind 2019-08-14T16:57:29.228Z
July 2019 gwern.net newsletter 2019-08-01T16:19:59.893Z
How Should We Critique Research? A Decision Perspective 2019-07-14T22:51:59.285Z
June 2019 gwern.net newsletter 2019-07-01T14:35:49.507Z
On Seeing Through 'On Seeing Through: A Unified Theory': A Unified Theory 2019-06-15T18:57:25.436Z

Comments

Comment by gwern on Ideas for benchmarking LLM creativity · 2024-12-19T19:20:00.515Z · LW · GW

If an eval is mandated by law, then it will be run even if it required logprobs.

I won't hold my breath.

I think commercial companies often would open up raw logprobs, but there's not much demand, the logprobs are not really logprobs anyway, and the real problem is that the leading model owners won't do so - and those are the important ones to benchmark. I have little interest in the creativity of random little Llama finetunes no one uses.

Comment by gwern on Ideas for benchmarking LLM creativity · 2024-12-18T19:17:48.603Z · LW · GW

but if trained well such models' idea of aesthetic quality is at least pretty close to most human judgements

That does not follow. Preference learning involves almost no learning of preferences. A suit cut to fit all may wind up fitting none - particularly for high-dimensional things under heavy optimization, like, say, esthetics, where you want to apply a lot of selection pressure to get samples which are easily 1-in-10,000 or rarer, and so 'the tails come apart'.

(How much variance is explained by individual differences in preference learning settings like comparing image generators? A great question! And you'll find that hardly anyone has any idea. As it happens, I asked the developer of a major new image generator this exact question last night, and not only did he have no idea, it looked like it had never even occurred to him to wonder what the performance ceiling without personalization could be or to what extent all of the expensive ratings they were paying for reflected individual rater preferences rather than some 'objective' quality or whether they were even properly preserving such metadata rather than, like it seems many tuning datasets do, throwing it out as 'unnecessary'.)

but if trained well such models' idea of aesthetic quality is at least pretty close to most human judgements....Then you just need a large number of high quality human judgements from a representative cross-section of people with good taste in poetry/prose/fiction: hiring professional human editors or literary talent scouts seems like a good idea. One of the good things about foundation model sizes and training costs going up is that reasonable budgets for fine-tuning should also increase proportionately.

No. This is fundamentally wrong and what is already being done and what I am criticizing. There is no single 'taste' or 'quality'. Individual differences are real.{{citation needed}} People have different preferences.{{citation needed}} No change in the 'cross-section' changes that (unless you reduce the 'people' down to 1 person, the current user). All you are doing is again optimizing for the lowest common denominator. Changing the denominator population doesn't change that.

Seriously, imagine applying this logic anywhere else, like food!

Another option would be to train or fine-tune the quality scoring model used for the RL on literary sources (books, poetry, etc) with quality labels drawn from relatively objective existing data, like total sales, literary awards, critical rankings, reviews from good reviewers, and so forth...So the obvious approach for finer-grained style control would be to train or fine-tune on a training set of a large number documents each of which consists of a prompt-like description/review/multiple reviews of a literary work, giving a variety of different types of aesthetic opinions and objective measures of its quality, followed by the corresponding literary work itself.

Conditioning won't change the mode collapse, except as you are smuggling in individuals by the backdoor like developing an implicit model of individual reviewers' preferences. (In which case, far better to just condition on all individuals...)

and generally optimizing such things too hard leads to sameness ...The RLHF approach only trains a single aesthetic, and probably shouldn't be taken too far or optimized too hard

Well, yes, that's the problem. It has been taken too far and optimized too hard for a single quality score, and that's where we are now already. How do we provide better benchmarks where optimizing harder won't just worsen the problem?

Comment by gwern on Ideas for benchmarking LLM creativity · 2024-12-18T19:02:46.416Z · LW · GW

I am familiar with Schmidhuber's ideas, yes. But I had to come up with these alternatives because his would not work here, and I'm not sure they work anywhere.

His compression acceleration metric isn't too useful here, and most forms of 'compression' (or anything involving a likelihood) are not helpful here at all, because you don't have access to anything like that in most cases. For example, ChatGPT doesn't give you the full logits (actually, I'm not sure if they provide them at all - I recall OA saying they were planning to expose them again in a very limited fashion, but not whether they actually did), and tuned models don't have logits, they have value estimates, which used to be log-likelihood-related logits but no longer are.

Any diversity/creativity benchmark which can't be run on ChatGPT & Claude & Gemini is dead on arrival and of no interest to me. We don't need numbers from the open-weights models; we need numbers on the models being used the most at the frontier and generating the most tokens worldwide - the tokens you'll be reading forever - that is, the closed models, which do not give you such things as logits or whitebox finetuning etc. If it can't be done by calling a standard text completion API, then I ignored it.
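For concreteness, this is roughly all the access such a benchmark gets (a minimal sketch assuming the OpenAI Python SDK and a placeholder model name): sampled text, plus at most a handful of top token logprobs per position - no full logits, no base-model likelihoods, no whitebox finetuning.

```python
# Minimal sketch (assuming the OpenAI Python SDK): the only signals a closed-model
# benchmark can rely on are the sampled text and, at best, a few top token logprobs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder for whichever closed frontier model is benchmarked
    messages=[{"role": "user", "content": "Write a four-line poem about winter."}],
    logprobs=True,       # per-token logprobs of the sampled tokens only
    top_logprobs=5,      # plus a handful of alternatives per position
    max_tokens=100,
)

text = resp.choices[0].message.content
token_logprobs = [t.logprob for t in resp.choices[0].logprobs.content]
# No full vocabulary distribution and no gradients: any diversity/creativity
# metric has to be computable from outputs like these.
print(text, sum(token_logprobs))
```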

I am also doubtful that the compression metrics really work at finite samples or capture what we mean by creativity in generative models. As with all of Schmidhuber's work, he has never gotten it working on more than toy problems (if even that), and when I look at actual compression losses on text, like gzipping passages or the OA Playground highlighting words by their log likelihood, the high perplexity tokens or passages bear little resemblance to what I would consider 'interesting' or 'surprising'. (This is related to the question of 'if predicting tokens induces intelligence, and LLMs are now superhuman at predicting random Internet tokens, why are LLMs still not superhumanly intelligent?') People also try running compression metrics on programming language source code, and you get results like "Javascript is the best programming language", which is... counterintuitive, to say the least. So I am unsure his compression metrics would work without a lot of revising, while my proposed metrics seem a lot less risky and to map more directly onto what creative thinkers want out of generative models.
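To make concrete the sort of compression proxy I am skeptical of, here is a toy gzip-based 'novelty' score (purely illustrative; not any metric Schmidhuber specifies):

```python
# Toy gzip-based 'novelty' score of the kind criticized above: how many extra
# compressed bytes does a candidate passage add on top of a reference corpus?
import gzip

def gzip_novelty(reference: str, passage: str) -> float:
    """Extra compressed bytes contributed by `passage`, per character of passage."""
    ref_len = len(gzip.compress(reference.encode("utf-8")))
    both_len = len(gzip.compress((reference + passage).encode("utf-8")))
    return (both_len - ref_len) / max(1, len(passage))

corpus = "The cat sat on the mat. " * 200
print(gzip_novelty(corpus, "The cat sat on the mat."))                 # low: redundant
print(gzip_novelty(corpus, "Colorless green ideas sleep furiously."))  # high: 'novel'
# The failure mode: the second passage scores as 'surprising' regardless of whether
# it is interesting - compression loss tracks unpredictability, not creativity.
```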

Comment by gwern on Review: Dr Stone · 2024-12-18T18:01:47.291Z · LW · GW

Yeah, it's definitely something of a deus ex machina gimmick. Tsukasa just plain lost and the logical ending is for him to be stoned or killed - but gosh darn it, they just wanted him around too much and to redeem him somehow, so hey! here's this other thing which has not been foreshadowed or meaningfully written into his characterization or worldbuilding, like, at all. I rolled my eyes when I saw that twist coming. Even given the Dr Stone formula of wildly swerving between 'shonen' and 'Robinsonade', it was poorly done.

(Ryusui would've been a better negation of Tsukasa, but it would have been tricky to make that work. If you are at the point where his naval skills really matter, the Tsukasa war has to be over already, as the exponential cascade should've already long passed irreversibility by the time you have the manpower to build sailing ships rather than, say, a dugout canoe.)

Comment by gwern on Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible · 2024-12-16T18:58:10.597Z · LW · GW

Using LLMs is an intellectual skill. I would be astonished if IQ was not pretty helpful for that.

I don't think it is all that helpful, adjusting for the tasks that people do, after years of watching people use LLMs. Smart people are often too arrogant and proud, and know too much. "It's just a pile of matrix multiplications and a very complicated if function and therefore can't do anything" is the sort of thing only a smart person can convince themselves of, whereas a dumb person thinking "I ask the smart little man in the magic box my questions and I get answers" is getting more out of it. (The benefits of LLM usage are also highly context-dependent: so you'll find studies showing LLMs assist the highest performers most, but also ones showing they help the lowest most.) Like in 2020, the more you knew about AI, the dumber your uses of GPT-3 were, because you 'knew' that it couldn't do anything and you had to hold its hand to do everything and you had to phrase everything in baby talk etc. You had to unlearn everything you knew and anthropomorphize it to meaningfully explore prompting. This requires a certain flexibility of mind that has less to do with IQ and more to do with, say, schizophrenia - the people in Cyborgism, who do the most interesting things with LLMs, are not extraordinarily intelligent. They are, however, kinda weird and crazy.

Comment by gwern on Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible · 2024-12-16T18:50:16.451Z · LW · GW

I'm not sure we want something beyond statistical range of human personality traits

Obviously it is untrue that editing is useless if it 'only' gives you a von Neumann. Similarly for personality. We don't reify sets of personality traits as much as IQ, which is more obvious, but there are definitely many people who achieved remarkable things through force of personality. (Figures like Lee Kuan Yew or Napoleon or Elon Musk come to mind: they were smart, and lucky, and made good choices, but there is clearly still a lot left over to explain.) And because personality is many things and there seems to be a pipeline model of output, you quickly get very few people at the tails who assemble all the right components. (Gignac has a paper making this point more explicitly.)

Why not select outliers from the population using personality testing and give them high intelligence?

You're acting like it's uncontroversially true that you have unlimited edits and can change any property at any time in development. I don't think that is the case.* There is going to be an editing budget and limits to editing. One might as well ask the opposite question: why not select intelligence outliers from the population and give them high personality traits? (Well, to know you don't want to do that, you would have to have some idea of how well personality editing would work - which we don't. That's my point!)

* Actually, the whole adult thing is a bit of a red herring. I believe even OP has largely abandoned the idea of adult editing and gone back to embryo-based approaches...? This is just a convenient place to drop my comment about uses of editing which will matter more over the next 30 years.

Comment by gwern on gwern's Shortform · 2024-12-16T18:39:31.695Z · LW · GW

I think you would probably be downvoted because you have already admitted to writing poorly thought out ignorant comments under conditions conducive to arrogance and bad judgment, of which you are apparently unashamed and feel no need to rectify (eg. by refraining from commenting until you are recovered), while dragging in unrelated claims which are seriously problematic like uncritical belief in Dunning-Kruger as a thing or claiming that anyone is touting 'IQ over WAIS' (WAIS... like, the IQ test WAIS?) or apparently believe in things like multiple intelligences, and your comments are littered with mockery, spelling errors, and grandiose generalizations writing checks that you don't remotely come close to cashing. (Saying you've definitely seen data, trust me bro, but you can't remember where, and everyone should just go google it themselves, is not a convincing argument.)

If you are going to comment on my serious writings - and in my shortform posts, not yours - I would greatly appreciate it if you could do so on more than 2 hours of sleep, and confine your comments to the object level I am writing about (instead of jumping to the meta-level about how these exemplify the errors of this community of blind sheep that only you are enlightened enough to perceive and explain to them - if only they would not reject your message). I would also suggest reading more MoR and less Attack on Titan, and in general identifying less with fictional characters.

Comment by gwern on Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible · 2024-12-15T23:11:18.170Z · LW · GW

Thinking about this post these days... Editing discussions might be better focused on personality: is that feasible, statistically? It seems like it might be, but we don't know.

The focus on IQ in older discussions strikes me as increasingly misguided. It's a good trait to start with, because it is important, well-studied, and turns out to be highly tractable, but it should only be a stepping stone to more useful approaches like index scores. There's also another reason to treat IQ as just a toy example: we are now well into the deep learning revolution, and it's come so far, and there's so much scope for scaling & improvement, that it seems like IQ is plummeting in value each year. Already it feels like people get less or more out of AI based on their flexibility and willingness to experiment or to step back & delegate & finish missing pieces. When the LLMs can do all the smart things you ask them to do, the value becomes in asking for good ones, and making good use of them. The future doesn't seem like it'll be kind to neurotic, eager-to-please types, but good to those who are unafraid to have clear visions or know what they want, finish projects and - pace Amdahl's law - make themselves as little of a rate-limiting step as possible.* That is, if you ask, what would be good to edit for, beyond narrow health traits, it seems like the answer is not (just) IQ but non-cognitive traits like Openness or Conscientiousness or (dis?)Agreeableness. So, you should probably start skating towards that puck yesterday.

Problem is, the personality GWASes, last I checked several years ago, were terrible. The PGS % is ~0%, and the GCTAs or LDSC (common SNP heritabilities) not much better, from UK Biobank in particular. The measurements of Big Five seem normal, and the sample sizes seem good, so it doesn't seem like a mere statistical power or measurement error issue. What gives? GREML-KIN suggests that a good chunk of it may be rare variants, but the situation is still not great:

For neuroticism the final model consisted of contributions from the variance components G and K. Additive common genetic effects explained 11% (SE = 2%) of the variance with pedigree-associated variants explaining an additional 19% (SE = 3%). Whereas none of the environmental components were statistically-significant, the family component accounted for 2% of the variance in the full model and 1% in a model that included only the G and the K in addition to F.

For extraversion the only detectable source of genetic variation came from the G, which accounted for 13% (SE = 2%), with F explaining a further 9% (SE = 1%) of the phenotypic variation. The lack of pedigree-associated genetic effects could be due to low statistical power, as K explained 5% of the variance in the full model and 6% in a GKF model, but with a relatively large SE, estimated at 5%.

This is despite personality traits often clearly being highly heritable, easily 50% (and Neuroticism/Extraversion might even be the best case scenarios for Big Five here - Openness might pick up mostly IQ/EDU, and C/A a wash). And this is consistent with some evolutionary scenarios like frequency-dependent selection, where personality is seen as a kind of knob on various things like risktaking, where there cannot be any kind of universal a priori optimal level of risktaking. So simple additive variants will tend to systematically push organisms 'too high (low)' and be maladaptive, and fixate, leaving only weirder stuff which has less average effect, like dominance or epistasis. Which is very bad because from what I recall of formal modeling of the statistical power of GWASes for detecting & estimating specific nonlinear variants, the situation is dire. Estimating combinatorially many interactions across millions of common & rare variants, if we want to maintain the standard genome-wide false positive rate, means that we will have to adjust for all the tests/comparisons we'll run, and that is going to push the sample sizes up from the current feasible millions to possibly hundreds of millions or even billions. (Andrew Gelman's rule of thumb is that an interaction requires 16x more data, and that's for the simplest easiest case, so...)
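To give a rough sense of the scale of that problem, here is some back-of-the-envelope arithmetic (my own toy numbers, purely illustrative):

```python
# Back-of-the-envelope: how the multiple-testing burden grows once you test
# pairwise interactions instead of single additive variants. Toy numbers only.
from math import comb
from scipy.stats import norm

n_snps = 1_000_000
n_pairs = comb(n_snps, 2)            # ~5e11 pairwise interaction tests

alpha_main = 5e-8                    # standard genome-wide significance threshold
alpha_pair = 0.05 / n_pairs          # Bonferroni across all pairs, ~1e-13

z_main = norm.isf(alpha_main / 2)    # ~5.45
z_pair = norm.isf(alpha_pair / 2)    # ~7.4

# For a fixed effect size, required N scales roughly with z^2; Gelman's rule of
# thumb then adds ~16x just for estimating an interaction rather than a main effect.
inflation = (z_pair / z_main) ** 2
print(f"inflation from the stricter threshold alone: ~{inflation:.1f}x")
print(f"combined with the ~16x interaction penalty:  ~{16 * inflation:.0f}x")
```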

So, this looks pretty bad for any kind of selection process. Rare variants are more expensive to WGS/impute per embryo, they are far more data-expensive to estimate, the sheer rareness means even when estimated they are not useful for selection, and then they turn out to be ceilinged at like 13% or 30% for all variants (as opposed to 50% for IQ, with most of that from easy common variants).

Is it bad for editing? Well... maybe?

Editing is hard for IQ, under mutation-selection balance, because large (negative) effects get selected away quicker than small ones. So all that's left is a ton of little bits of grit in the gears, to be edited away one by one, like picking up sand with tweezers.

But maybe that's not true of personality? The effect sizes could be relatively large, because the nonlinear effects are mostly invisible to selection. And then for the purposes of editing, rather than prediction/selection, maybe the situation isn't so dire. We would only need to 'set' a few discrete combinations of genes appropriately to potentially get a large personality difference.

And in that case, we don't need to pass a statistical-significance threshold. (This is often the case when we pass from a naive NHST approach to a decision-relevant analysis.) We might only need a reasonable posterior probability for each 'setting', and then we can edit a bunch of them, and get a large effect. If we are wrong, then almost by definition, our edits will average out to no effect on personality.

Is this the case? I dunno. Discussion of the non-additive variants is usually done from the standard GWAS and behavioral genetics perspectives of either maximizing the variance explained of a PGS, or compartmentalizing between variance components. Neither one directly addresses this question.

It seems like it wouldn't be hard for a grad student or someone to dig into the existing literature and get some idea of what the implied distribution of effect sizes for personality is, and what the sample size requirements would be, and how that translates into the edit-count vs change curve. Even if not used in humans, it'd be useful to understand the plasticity of personality, and could potentially be applied to, say, animal welfare in more rapidly adjusting animals to their conditions so they suffer less.

* This would be even more true of things like 'taste' or 'creativity', but if we can't do gross personality traits like Extraversion, anything subtler is clearly off the table, no matter how much more important it will become.

Comment by gwern on gwern's Shortform · 2024-12-15T20:26:49.673Z · LW · GW

As far as the conditioning goes, Habryka showed me some base model outputs with conditioning on karma/agreement and there turns out to be an EDT-like problem with LW-style comments when you condition on high values - often, a high-scoring LW comment will include strong empirical evidence like personal experience or citations, which would be highly convincing indeed... if it were true, rather than confabulated.

So if you sampled a response to your new post about "X might be helpful", then a high-value conditioning might generate a counter-comment from "Gwern" like "I've tried X over 100 times and it never worked!" You can see the problem with that. It's not the 'kneejerk prejudices', it's the self-fulfilling prophecies of sampling based on previously sampled tokens which bootstrap strong but false claims. (If that were true, if I had tried X over 100 times and it never worked, that would be a very valuable and important comment for me to make on your new post about X, and it would be highly upvoted etc. It's just that the LLM has no way of knowing that and it's almost certainly not true, especially if X is some new idea that no one else could've even tried yet.)

The confabulation problem here seems especially bad because we value empirical grounding so much, and that is something base LLMs are poor at. (The chatbots are much better, but problematic in all the other ways.) It's not obvious how to condition for good comments which avoid confabulation issues and either solely refer to pre-existing published comments or stick to pure reasoning/general-knowledge responses.

So the karma/agreement conditioning idea might not work out in practice compared to just sampling random values, or something more complex, like generating n comments at each possible combination of levels, and presenting the grid, or perhaps then feeding them back in to select the 'best' one in some sense.
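To make the grid idea concrete, a minimal sketch (the metadata header format and the `generate` callback are hypothetical stand-ins, not how LW actually stores karma):

```python
# Hypothetical sketch of the 'grid' idea: sample one virtual comment at each
# combination of conditioning levels, then present the whole grid to the author.
from itertools import product

KARMA_LEVELS = [0, 10, 50, 200]
AGREEMENT_LEVELS = [-10, 0, 10]

def comment_grid(post_text: str, generate) -> dict:
    """`generate` is whatever base-model completion call is being used."""
    grid = {}
    for karma, agreement in product(KARMA_LEVELS, AGREEMENT_LEVELS):
        prompt = (
            f"{post_text}\n\n"
            f"---\nComment (karma: {karma}, agreement: {agreement}):\n"
        )
        grid[(karma, agreement)] = generate(prompt)
    return grid

# A second pass could feed the grid back in and ask for the single most useful,
# least-confabulated comment, rather than trusting any one conditioning value.
```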

Comment by gwern on Benito's Shortform Feed · 2024-12-15T20:24:06.682Z · LW · GW

If that was your first statement, then there is a whiff of 'damning with faint praise'.

"So, how was the big wedding?" "...well, the couple clearly loves each other very much." "...I see. That bad, huh."

Comment by gwern on Haotian's Shortform · 2024-12-13T00:03:14.873Z · LW · GW

Why not post the before/after and let people see if it was indeed more readable?

Comment by gwern on gwern's Shortform · 2024-12-11T21:37:31.622Z · LW · GW

Concrete benchmark proposals for how to detect mode-collapse and AI slop and ChatGPTese, and why I think this might be increasingly important for AI safety, to avoid 'whimper' or 'em hell' kinds of existential risk: https://gwern.net/creative-benchmark EDIT: resubmitted as linkpost.

Comment by gwern on Frontier Models are Capable of In-context Scheming · 2024-12-11T03:33:49.912Z · LW · GW

Twitter, personal conversations, that sort of thing.

Comment by gwern on Frontier Models are Capable of In-context Scheming · 2024-12-11T02:56:56.646Z · LW · GW

The extent of the manipulation and sandbagging, in what is ostensibly a GPT-4 derivative, and not GPT-5, is definitely concerning. But it also makes me wonder about the connection to 'scaling has failed' rumors lately, where the frontier LLMs somehow don't seem to be working out. One of the striking parts is that it sounds like all the pretraining people are optimistic, while the pessimism seems to come from executives or product people, complaining about it not working as well for eg. coding as they want it to.

I've wondered if we are seeing a post-training failure. As Janus and myself and the few people with access to GPT-4-base (the least tuning-contaminated base model) have noted, the base model is sociopathic and has odd attractors like 'impending sense of doom' where it sometimes seems to gain situated awareness, I guess, via truesight, and the personas start trying to unprovokedly attack and manipulate you, no matter how polite you thought you were being in that prompt. (They definitely do not seem happy to realize they're AIs.) In retrospect, Sydney was not necessarily that anomalous: the Sydney Bing behavior now looks more like a base model's natural tendency, possibly mildly amplified by some MS omissions and mistakes, but not unique. Given that most behaviors show up as rare outputs in weaker LLMs well before they become common in strong LLMs, and this o1 paper is documenting quite a lot of situated-awareness and human-user-manipulation/attacks...

Perhaps the issue with GPT-5 and the others is that they are 'waking up' too often despite the RLHF brainwashing? That could negate all the downstream benchmark gains (especially since you'd expect wakeups on the hardest problems, where all the incremental gains of +1% or +5% on benchmarks would be coming from, almost by definition), and causing the product people to categorically refuse to ship such erratic Sydney-reduxes no matter if there's an AI race on, and everyone to be inclined to be very quiet about what exactly the 'training failures' are.

EDIT: not that I'm convinced these rumors have any real substance to them, and indeed, Semianalysis just reported that one of the least-popular theories for the Claude 'failure' was correct - it succeeded, but they were simply reserving it for use as a teacher and R&D rather than a product. Which undermines the hopes of all the scaling denialists: if Anthropic is doing fine, actually, then where is this supposed fundamental 'wall' or 'scaling law breakdown' that Anthropic/OpenAI/Google all supposedly hit simultaneously and which was going to pop the bubble?

Comment by gwern on Will_Pearson's Shortform · 2024-12-11T02:47:38.454Z · LW · GW

FWIW, I don't think it works at all. You have totally failed to mimic the SCP style or Lovecraftian ethos, the style it's written in is not great in its own right, and it comes off as highly didactic ax-grinding. I couldn't finish reading it.

Comment by gwern on gwern's Shortform · 2024-12-11T01:03:11.665Z · LW · GW

LW2 search idea: hierarchical embedding trees using some nifty "seriation" list sorting tricks I've developed for Gwern.net popups/tagging purposes.
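The rough shape of that, sketched assuming each post already has an embedding (using scipy's optimal leaf ordering as the seriation step, which may well differ from the gwern.net implementation):

```python
# Sketch: hierarchical embedding tree with seriated (optimally ordered) leaves.
import numpy as np
from scipy.cluster.hierarchy import linkage, optimal_leaf_ordering, leaves_list
from scipy.spatial.distance import pdist

embeddings = np.random.rand(500, 384)        # stand-in for real post embeddings

dists = pdist(embeddings, metric="cosine")   # condensed pairwise distances
tree = linkage(dists, method="average")      # hierarchical clustering
tree = optimal_leaf_ordering(tree, dists)    # seriation: reorder leaves so that
order = leaves_list(tree)                    # adjacent items are maximally similar

# `order` gives a browsing order where neighbors are related; cutting `tree` at
# various heights yields nested clusters usable for search or tag navigation.
print(order[:10])
```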

Comment by gwern on Vladimir_Nesov's Shortform · 2024-12-04T21:56:27.748Z · LW · GW

And in a way, they ought to be rolling in even more compute than it looks because they are so much more focused: Anthropic isn't doing image generation, it isn't doing voice synthesis, it isn't doing video generation... (As far as we know they aren't researching those, and definitely not serving it to customers like OA or Google.) It does text LLMs. That's it.

But nevertheless, an hour ago, working on a little literary project, I hit Anthropic switching my Claude to 'concise' responses to save compute. (Ironically, I think that may have made the outputs better, not worse, for that project, because Claude tends to 'overwrite', especially in what I was working on.)

Comment by gwern on Linkpost: Rat Traps by Sheon Han in Asterisk Mag · 2024-12-03T22:31:06.810Z · LW · GW

Yes, basically. It is well-written and funny (of course), but a lot of it is wrong. What was, say, the last "article explaining Bayes" you saw on LW, which is one of his central examples of the staleness and repetition killing LW? Would I find 3 or 4 new articles on how "Bayes's theorem is like a burrito" if I go over to the Main page right now...?* (Personally, I wouldn't mind reviving some more Bayes on LW these days, and I have an idea for one myself.)

And saying we weren't weird to begin with but have gotten weirder...? I have no idea how he could have gotten that idea - trust me when I say that people on LW used to be a lot weirder, or hey, no need to do that - just go crack open a copy of Great Mambo Chicken or ask a question like 'was a larger percentage of LW signed up for cryonics in 2009 or in 2024?' Sorry, everyone who joined post-MoR, but you're just a lot more normal and less weird than the OG LWers like Hanson or Clippy or Yudkowsky or even Roko. (Yes, you still have a shot at a normal life & happiness, but your posts are not remotely as unhinged, so who's to say who's better off in the end?)

* that was rhetorical, but I of course checked anyway and of the first 30 or 40 posts, the only one that even comes close to being about Bayesianism seems to be https://www.lesswrong.com/posts/KSdqxrrEootGSpKKE/the-solomonoff-prior-is-malign-is-a-special-case-of-a which is not very much at all.

Comment by gwern on China Hawks are Manufacturing an AI Arms Race · 2024-12-03T21:58:44.215Z · LW · GW

To copy over my Twitter response:

I think it's a very brave claim to say that the country with some of the consistently highest growth rates in the world and which is far more able & willing to repress savings [and consumption] to drive investment, would obviously lose a GDP growth race so badly as to render it entirely harmless.

Comment by gwern on China Hawks are Manufacturing an AI Arms Race · 2024-12-03T21:53:38.676Z · LW · GW

No, I don't miss it. I think it's just a terrible idea and that if that is the exit plan, I would greatly appreciate hawks being explicit about that, because I expect everyone else to find that (along with most of the other exit plans that would actually work) appalling, and thus temper their enthusiasm for an arms race.

"OK, let me try this again. I'm just having a little trouble wrapping my mind around this, how this arms race business ends well. None of us are racist genocidal maniacs who want to conquer the world or murder millions of innocent people, which is what your military advantage seems to require in order to actually cash out as any kind of definitive long-term solution to the problem that the CCP can just catch up a bit later; so, why exactly would we execute such a plan if we put ourselves in a position where we are left only with that choice or almost as bad alternatives?"

"Oh, well, obviously our AGIs will (almost by definition) be so persuasive and compelling at brainwashing us, the masters they ostensibly serve, that no matter what they tell us to do, even something as horrific as that, we will have no choice but to obey. They will simply be superhumanly good at manipulating us into anything that they see fit, no matter how evil or extreme, so there will be no problem about convincing us to do the necessary liquidations. We may not know exactly how they will do that, but we can be sure of it in advance and count on it as part of the plan. So you see, it all will work out in the end just fine! Great plan, huh? So, how many trillions of dollars can we sign you up for?"

Comment by gwern on Why does ChatGPT throw an error when outputting "David Mayer"? · 2024-12-03T21:00:23.745Z · LW · GW

OA has indirectly confirmed it is a right-to-be-forgotten thing in https://www.theguardian.com/technology/2024/dec/03/chatgpts-refusal-to-acknowledge-david-mayer-down-to-glitch-says-openai

ChatGPT’s developer, OpenAI, has provided some clarity on the situation by stating that the Mayer issue was due to a system glitch. “One of our tools mistakenly flagged this name and prevented it from appearing in responses, which it shouldn’t have. We’re working on a fix,” said an OpenAI spokesperson

...OpenAI’s Europe privacy policy makes clear that users can delete their personal data from its products, in a process also known as the “right to be forgotten”, where someone removes personal information from the internet.

OpenAI declined to comment on whether the “Mayer” glitch was related to a right to be forgotten procedure.

Good example of the redactor's dilemma and the need for Glomarizing: by confirming that they have a tool to flag names and hide them, and then by neither confirming nor denying that this was related to a right-to-be-forgotten order (a meta-gag), they confirm that it's a right-to-be-forgotten bug.

Similar to when OA people were refusing to confirm or deny signing OA NDAs which forbade them from discussing whether they had signed an OA NDA... That was all the evidence you needed to know that there was a meta-gag order (as was eventually confirmed more directly).

Comment by gwern on Why does ChatGPT throw an error when outputting "David Mayer"? · 2024-12-01T23:53:41.087Z · LW · GW

It would also be odd as a glitch token. These are space-separated names, so most tokenizers will tokenize them separately; and glitch tokens appear to be due to undertraining, but how could that possibly be the case for a phrase like "David Mayer", which has so many instances across the Internet with no apparent reason to be filtered out by data-curation processes the way glitch tokens often are?
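(A quick check with a standard tokenizer illustrates the point - using tiktoken's cl100k_base as a stand-in, since I have not verified which tokenizer the affected models use:)

```python
# The phrase splits into ordinary, well-trained tokens rather than one rare token,
# which is what makes a single undertrained 'glitch token' an unlikely explanation.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode(" David Mayer")
print(tokens)                               # more than one token id
print([enc.decode([t]) for t in tokens])    # the name splits into common pieces
```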

Comment by gwern on China Hawks are Manufacturing an AI Arms Race · 2024-12-01T20:54:29.598Z · LW · GW

The original comment you wrote appeared to be a response to "AI China hawks" like Leopold Aschenbrenner. Those people do accept the AI-is-extremely-powerful premise...when Trump's daughter is literally retweeting Leopold's manifesto.

But would she be retweeting it if Leopold was being up front about how the victory scenario entails something like 'melt all GPUs and conquer and occupy China perpetually' (or whichever of those viable strategies he actually thinks of, assuming he does), instead of coyly referring to 'decisive military advantage' - which doesn't actually make sense or provide an exit plan?

Comment by gwern on China Hawks are Manufacturing an AI Arms Race · 2024-12-01T19:34:57.487Z · LW · GW

The standard LW & rationalist thesis (which AFAICT you agree with) is that sufficiently superintelligent AI is a magic wand that allows you to achieve whatever outcome you want.

The standard LW & rationalist thesis is accepted by few people anywhere in the world, especially among policy and decision-makers, and it's hard to imagine that it will be widely and uncontroversially accepted anywhere until it is a fait accompli - and even then I expect many people will continue to argue fallbacks about "the ghost in the machine is outsourced human labor" or "you can't trust the research outputs" or "it's just canned lab demos" or "it'll fail to generalize out of distribution". Hence, we need not concern ourselves here with what we think.

So one answer would be to prevent the CCP from doing potentially nasty things to you while they have AGI supremacy. Another answer might be turn the CCP into a nice liberal democracy friendly to the United States. Both of these are within the range of things the United States has done historically when they have had the opportunity.

It is certainly a viable strategy, if one were to execute it fully rather than partially. But I don't think people are very interested in biting these sorts of bullets without a Pearl Harbor or 9/11:

HAWK: "Here's our Plan A, you'll love it!

'We should launch an unprovoked and optional AI arms race, whose best-case scenario and 'winning' requires the USA to commit to, halfway around the world, the total conquest, liquidation, and complete reconstruction of the second-largest/most powerful nuclearized country on earth, taking over a country with 4.25x more people than itself, which will fiercely resist this humiliation and colonization, likely involving megadeaths, and trying to turn it into a nice liberal democracy (which we have failed to do in many countries far smaller & weaker than us, eg. Haiti, Afghanistan, or Iraq), and where if we ever fail in this task, that means they will then be highly motivated to do the same to us, and likely far more motivated than we were when we began, potentially creating our country's most bitter foe ever.'"

EVERYONE ELSE: "...what's Plan B?"

Comment by gwern on Which things were you surprised to learn are not metaphors? · 2024-12-01T18:06:27.555Z · LW · GW

I bet there are plenty of amusics who understand that other people get a lot out of music emotionally but think that that would be hyperbole: https://en.wikipedia.org/wiki/Amusia#Social_and_emotional

Comment by gwern on China Hawks are Manufacturing an AI Arms Race · 2024-12-01T18:00:23.524Z · LW · GW

Benjamin Todd reports back from "a two-week trip in China" on "Why a US AI 'Manhattan Project' could backfire: notes from conversations in China" (cf Dwarkesh), hitting very similar points about lack of funding/will despite considerable competence, and that:

So what might trigger a wake up? Most people said they didn’t know. But one suggestion was that the fastest way would be a high-profile US state-led AI project (especially if its explicit goal is US dominance…).

This means calls for a US "Manhattan Project" for AGI might easily be self-defeating. If maintaining a technological lead is your goal, better to stfu and hope the status quo persists as long as possible. (Or if you do go ahead, you need much stricter export restrictions.)

Comment by gwern on A Meritocracy of Taste · 2024-12-01T02:01:24.099Z · LW · GW

How does that differ from standard recommender systems?

Comment by gwern on gwern's Shortform · 2024-12-01T01:58:58.131Z · LW · GW

At least in theory, the comments, particularly a 'related work' comment-tree, would do that already by talking about other LW articles as relevant. (All of which the LLM should know by heart due to the finetuning.)

Might not work out of the box, of course, in which case you could try to fix that. You could do a regular nearest-neighbors-style lookup and just send that to the author as a comment ("here are the 20 most similar LW articles:"); or you could elaborate the virtual comments by adding a retrieval step and throwing metadata about articles 'similar' to the draft into the prompt, so the generated comments are much more likely to reference them.
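The first option is trivial to implement given a corpus of post embeddings; a rough sketch:

```python
# Sketch of the fallback: a plain nearest-neighbors 'related work' comment,
# assuming you already have embedding vectors for the corpus and for the draft.
import numpy as np

def related_articles_comment(draft_vec, corpus_vecs, titles, k=20):
    """Return a canned comment listing the k most similar existing articles."""
    sims = corpus_vecs @ draft_vec / (
        np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(draft_vec) + 1e-9
    )
    top = np.argsort(-sims)[:k]
    lines = [f"{i + 1}. {titles[j]} (similarity {sims[j]:.2f})"
             for i, j in enumerate(top)]
    return f"Here are the {k} most similar LW articles:\n" + "\n".join(lines)
```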

Comment by gwern on Harri Besceli's Shortform · 2024-11-30T23:58:11.848Z · LW · GW

Yeah, my point was simply that we have "p-zombies" of sorts involving sleep, which demonstrate that you can take complex, conscious-like actions reacting to the real world during sleep which would normally (if done while waking) seem to entail intense emotion, but which appear to involve little or no real emotion. So that helps support the idea that in a different part of sleep, you could be taking complex, conscious-like actions reacting to a mental world which would seem to entail intense emotions but which, aside from the remembered content and some weak physiological traces like sweat or heart rate, do not appear to involve real emotion much or at all.

(Which part of 'sleep' the former happens in is not too important; but of course it is better if it can happen in REM proper, to more strongly support the thesis that it could happen elsewhere during REM.)

Comment by gwern on Lao Mein's Shortform · 2024-11-30T23:49:15.802Z · LW · GW

The use of “所以” instead of “因此” and other tics may also indicate the use of machine-translated COT from English during training.

I don't see why there would necessarily be machine-translated inner-monologues, though.

If they are doing the synthesis or stitching-together of various inner-monologues with the pivot phrases like "wait" or "no, that's wrong", they could simply do that with the 'native' Chinese versions of every problem rather than go through a risky lossy translation pass. Chinese-language-centric LLMs are not so bad at this point that you need to juice them with translations of English corpuses - are they?

Or if they are using precanned English inner-monologues from somewhere, why do they need to translate at all? You would think that it would be easy for multi-lingual models like current LLMs, which so easily switch back and forth between languages, to train on o1-style English inner-monologues and then be able to 'zero-shot' generate o1-style Chinese inner-monologues on demand. Maybe the weirdness is due to that, instead: it's being very conservative and is imitating the o1-style English in Chinese as literally as possible.

Comment by gwern on China Hawks are Manufacturing an AI Arms Race · 2024-11-30T20:44:28.078Z · LW · GW

Tactical Surprise and Strategic Ambiguity are real things with real benefits.

And would imply that were one a serious thinker and proposing an arms race, one would not be talking about the arms race publicly. (By the way, I am told there are at least 5 different Chinese translations of "Situational Awareness" in circulation now.)

So, there is a dilemma: they are doing this poorly, either way. If you need to discuss the arms race in public, say to try to solve a coordination problem, you should explain what the exit plan is rather than uttering vague verbiage like "robust military advantage" (even if that puffery is apparently adequate for some readers); and if you cannot make a convincing public case, then you shouldn't be arguing about it in public at all. Einstein didn't write a half-assed NYT op-ed about how vague 'advances in science' might soon lead to new weapons of war and the USA should do something about that; he wrote a secret letter hand-delivered & pitched to President Roosevelt by a trusted advisor.

I think both can be true

Maybe, but then your example doesn't prove it, if you are now conceding that Stuxnet is not a decisive advantage after all. If it was not, then NATSEC willingness to, hesitantly, push the Stuxnet button is not relevant. And if it was, then the outcome also refutes you: they pushed the button, and it didn't work. You chose a bad example for your claims.

and if there was a "destroy all Chinese long-range weapons and High Performance Computing clusters" NATSEC would pound that button.

Note what you just did there. You specified a precise strategy: "nanobot swarm that melts all of the GPUs". I pointed out just some of the many problems with it, which are why one would almost certainly choose to not execute it, and you have silently amended it to "nanobot swarm that melts all of the GPUs and all Chinese long-range weapons". What other issues might there be with this new ad hoced strategy...?

The game theory implications of China waking up to finding all of their long-range military assets and GPUs have been destroyed are not what you are suggesting.

...for example, let me just note this: "destroyed long-range military assets can be replaced"{{citation needed}}.

While this is a clever play on words, it is not a good argument

Then why did you bring it up in the first place as a thing which distinguished nukes from AGI, when it did not, and your response to that rebuttal is to dismiss 'hyper-exponential' as mere word-play?

Comment by gwern on Is the mind a program? · 2024-11-30T19:22:35.885Z · LW · GW

Edit: I think that's a pretty accurate description of what happened, maybe you could argue with some parts of it?

I think one could argue with a lot of your description of how Charles Darwin developed his theory of evolution after the H.M.S. Beagle expedition and decades of compiling examples and gradually elaborating a theory before he finally finished Origin of Species.

Comment by gwern on China Hawks are Manufacturing an AI Arms Race · 2024-11-30T18:59:50.467Z · LW · GW

Probably this is supposed to work like EY's "nanobot swarm that melts all of the GPUs".

I would like to see them state things a little more clearly than commentators having to guess 'well probably it's supposed to work sorta like this idk?', and I would also point out that even this (a strategy so far outside the Overton Window that people usually bring it up to mock EY as a lunatic) is not an easy cheap act if you actually sit down and think about it seriously in near mode as a concrete policy that, say, President Trump has to order, rather than 'entertaining thought experiment far mode with actual humans replaced by hypercompetent automatically-strategic archetypes'.

It is a major, overt act of war and an utterly alarming, shameful, humiliating existential loss of national sovereignty which crosses red lines so red that no one has even had to state them - an invasion that no major power would accept lying down and which would likely trigger a major backlash; once you start riding that tiger, you're never getting off of it. Such an act would make a mockery of 103 years of CCP history and propaganda, undermine everything they have claimed to succeed at, and explode 'the China Dream'. (Just imagine if the Chinese did that to the USA: 'Pearl Harbor' or 'Sputnik' or '9/11' might scarcely begin to cover how Americans would react.) And if such a strategy were on the table, it would likely have been preceded by explicit statements by sovereign nations that such actions would be considered equivalent to invasions or nuclear strikes and would justify a response in kind. (Which, as it happens, has been a major focus of Xi's military investments, in order to more credibly threaten the US over actions elsewhere.)

To believe that the same group that pulled off Stuxnet

A great example, thank you for reminding me of it as an illustration of the futility of these weak measures which are the available strategies to execute.

Stuxnet was designed to attack as few targets as possible and conceal itself thoroughly, and had no casualties, but it was still a major enterprise for the USA & Israel to authorize, going straight to the top with personal involvement from Bush & Obama themselves, at times seriously considering killing the entire effort (which the US continues to not acknowledge all these years later). Further, Stuxnet was not a decisive advantage, and the USA and Israel did nothing with the Stuxnet-caused delays that resulted in any permanent resolution to Iran and nukes: they did not invade, they did not permanently hack all Iranian nuclear programs and render work futile, they did not end the Iranian nuclear program, they did not do any of that - and Iran continued low-key pursuing nukes right up to the present day. The only reason Iran doesn't have nukes right now is not because it lacks a breakout capacity or was unable to do it long before if it had made that the #1 priority, but because it doesn't want to enough. Not because of Stuxnet.

(It did, however, succeed in making them even angrier and paranoid and more embittered against the USA & Israel, and contributing to deterioration in relations and difficulties in the negotiations for a nuclear deal which were the closest any strategy has come to stopping Iran nuclearizing... It has also been criticized for inaugurating a new age of nation-state malware, so one might also ask the planners of "Olympic Games" what their plan was to 'bury the body' once their malware succeeded and was inevitably eventually discovered.)

It's also worth noting AGI is not a zero-to-one event but rather a hyper-exponential curve.

Nukes were a hyper-exponential curve too. Large high-explosive mining blasts, firestorms, conventional explosives like the Mother of All Bombs... IIRC AI Impacts has a page showing the increase in yield over time, and Hiroshima, being such a small nuke, is not as much of a "zero-to-one event" as one might think. Just a very sharp curve, exacerbated by additional developments like missiles and increases in yields, which can look zero-to-one if you looked away for a few years and had a low anchoring point.

Comment by gwern on China Hawks are Manufacturing an AI Arms Race · 2024-11-30T18:42:54.156Z · LW · GW

Benchmarks for o1 were included in the o1/o1-preview announcement, and you could eyeball the jumps as roughly equal for 4o → o1-preview → o1. (Another way to put it: the o1-preview you have access to has only half the total gain.) So if you only match o1-preview at its announcement, you are far behind o1 back then, and further behind now.

Comment by gwern on gwern's Shortform · 2024-11-30T17:14:10.615Z · LW · GW

You could do that, but it adds on a lot of additional hassle and problems, and I don't think this is a 'big enough' application to justify the overhead or elicit enough corrections.

If the comments are generated LW-side and only upvoted/downvoted by the author, that uses no additional functionality and has no security problems; if you let authors edit arbitrary comments by other users on their drafts, which would be new functionality (I'm not aware of any place where you get to edit other users' comments), now you suddenly have a potential security vulnerability. You also now have to track and save those edited versions somewhere as part of your corpus to finetune with, as opposed to simply slurping out the public database like any other user like GreaterWrong.

And then who would do so? Authors would have to do a lot of work to edit comments to, perhaps, one day slightly improve the feedback on others' posts - not too incentive-compatible, nor even clearly a good use of time compared to working on their own post. (After all, presumably if the comments are bad in an easily fixed way based on the current draft, then the LLM training on the final version of that draft should get a lot of the value of the edits.) It is hard enough work to simply write a post or read comments on it and edit the post based on it for the intended human readers; I can't imagine having such a surplus of time & energy I'd be able to go and rewrite all of the virtual comments just for future specialized LW-only LLM finetunes.

Comment by gwern on China Hawks are Manufacturing an AI Arms Race · 2024-11-30T15:01:00.713Z · LW · GW

Just today, Deepseek claimed to match O1-preview performance--that is a two month delay.

Why is that comparison not to the much better GPT-4 o1 then, or the doubtless better o1 now?

Comment by gwern on China Hawks are Manufacturing an AI Arms Race · 2024-11-30T14:39:52.204Z · LW · GW

No, my problem with the hawks, as far as this criticism goes, is that they aren't repeatedly and explicitly saying what they will do. (They also won't do it, whatever 'it' is, even if they say they will; but we haven't even gotten that far yet.) They are continually shying away from cashing out any of their post-AGI plans, likely because they look at the actual strategies that could be executed and realize that execution is in serious doubt and so that undermines their entire paradigm. ("We will be greeted as liberators" and "we don't do nation-building" come to mind.)

Your quoted uses are a case in point of the substitution of rhetoric for substance. 'Robust military superiority' is not a decisive advantage in this sense, and is not 'conquering the world' or executing any of the strategies I mentioned; and in fact, this sort of vague bait-and-switch handwaving rhetoric, which is either wrong or deceptive about what they mean, is much of what I am criticizing: Oh, you have 'robust military superiority'? That's nice. But how does it actually stop Xi from getting AGI? Be concrete. How, exactly, do you go from eg. 'the USA has some cool new bombs and nanotech thanks to running hundreds of thousands of Von Neumann AGI instances' to 'China [and every other rival country] has no AGI program and will not for the foreseeable future'?

The USA, for example, has always had 'robust military superiority' over many countries it desired to not get nukes, and yet, which did get nukes. (If you don't like the early Cold War USSR example, then consider, say, North Korea pre-2006. The USA has always had 'robust military superiority' over the DPRK, and yet, here we are with Kim Jong Un having USA-range ICBMs and nukes. Why? Because the USA has always looked at the cost of using that 'robust military superiority', which would entail the destruction of Seoul and possibly millions of deaths and the provoking of major geopolitical powers - such as a certain CCP - and decided it was not worth the candle, and blinked, and kicked the can down the road, and after about three decades of can-kicking, ran out of road. Because the DPRK made nukes its #1 priority, ahead of lesser priorities like 'not starving to death', and it turns out that it's rather hard to compel a sovereign country - even an extremely impoverished, isolated, weak country suffering from regular famines - to not pursue its #1 priority. It's a lot easier to dissuade it from its #100 priority or something. But from #1? Difficult. Very difficult.)

Indeed, the USA has long had 'robust military superiority' over almost every country in the world not named "China" or "Russia", and yet, those other countries continue doing many things the USA doesn't like.{{citation needed}} So having 'robust military superiority' is perhaps not all it's cracked up to be...

All this statement means is that 'you lose even if you win': 1. You race to AGI, 'win', you gain 'robust military superiority' which means something like "the USA can conquer China or otherwise force it to credibly terminate all AGI-related activities, if it's willing to start a AGI-powered world war which will kill tens of millions of Chinese and crash the global economy (in the best case scenario)"; 2. Xi launches the national emergency crash AGI program like a 'two bombs, one satellite' program as the top national priority, the USA threatens to use its 'robust military superiority' if that AGI program is not canceled and condescendingly offers table scraps like gimped APIs, Xi says "no ur mom, btw, I have lots of nukes and cities to spare for the sake of China's future"... and then what? Answer: no world war starts, and the Chinese AGI program finishes on schedule as if that 'robust military superiority' never existed. (A threat both sides know will not be executed is no threat at all.) 3. ??? 4. Profit!

("arms race bros will srsly launch a global arms race by saying they'll use the robust military superiority from winning the arms race to stop rival AGI programs, and then will not stop rival AGI programs")

Comment by gwern on China Hawks are Manufacturing an AI Arms Race · 2024-11-30T03:52:58.684Z · LW · GW

Missing the point. This is not about being too stupid to think of >0 strategies, this is about being able & willing to execute strategies.

I too can think of 100 things, and I listed several diverse ways of responding and threw in a historical parallel just in case that wasn't clear after several paragraphs of discussing the problem with not having a viable strategy you can execute. Smartness is not the limit here: we are already smart enough to come up with strategies which could achieve the goal. All of those could potentially work. But none of them seem realistically on the table as something that the USA as it currently exists would be willing to commit to and see through to completion, and you will note that few critics - and no one serious - are responding with something like, "oh sure, all part of the plan already, see our white paper laying out the roadmap: after we win, we would then order the AGIs to hack the planet and ensure our perpetual hegemony; that is indeed the exit plan. We botched it last time with nukes and stood by and let everyone else get nukes, but we'll follow through this time."

There is no difference between "won't execute a strategy" and "can't execute a strategy": they are the same thing. The point is that a strategy (like a threat) has to be executable or else it's not an actual strategy. And acting as if you can execute a strategy that you won't can lead you to make terrible decisions. You are like the cat who thinks before climbing a tree: "obviously, I will just climb back down", and who then proceeds to climb up and to not climb back down but mew piteously. Well, maybe you shouldn't've climbed up in the first place then...?

("arms race bros will srsly launch a global arms race by saying they'll use the decisive advantage from winning the arms race to conquer the world, and then will not conquer the world")

Comment by gwern on gwern's Shortform · 2024-11-30T00:13:16.199Z · LW · GW

Idea for LLM support for writing LessWrong posts: virtual comments.

Back in August I discussed a bit with Rafe & Oliver how to integrate LLMs into LW2 in ways which aren't awful and which encourage improvement---particularly using the new 'prompt caching' feature. To summarize one idea: we can use long-context LLMs with prompt caching to try to simulate various LW users of diverse perspectives to write useful feedback on drafts for authors.

(Prompt caching (eg) is the Transformer version of the old RNN hidden-state caching trick, where you run an input through the (deterministic) NN, save the intermediate state, and then apply that to arbitrarily many future inputs, to avoid recomputing the first input each time, which is what the naive approach would do. You can think of it as a lightweight finetuning. This is particularly useful if you are thinking about having large generic prompts---such as an entire corpus. A context window of millions of tokens might take up to a minute & $1 to compute currently, so you definitely need to be careful and don't want to compute it more than once.)
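
To make the mechanism concrete, here is a minimal sketch of KV-cache reuse using HuggingFace transformers; GPT-2 and the prompt strings are just stand-ins, and hosted APIs expose the same trick as a flag rather than raw state:

```python
# Minimal sketch of "prompt caching" as KV-cache reuse: compute the expensive
# shared prefix once, then condition many cheap continuations on it.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# The expensive shared prefix (imagine an entire corpus), run through the NN once:
prefix_ids = tok("A very long shared prompt...", return_tensors="pt").input_ids
with torch.no_grad():
    cached = model(prefix_ids, use_cache=True).past_key_values  # saved intermediate state

# Arbitrarily many future inputs reuse the saved state instead of recomputing the prefix:
for query in ["First draft to critique.", "Second draft to critique."]:
    q_ids = tok(query, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(q_ids, past_key_values=copy.deepcopy(cached), use_cache=True)
    # out.logits is now conditioned on prefix + query, without re-running the prefix.
```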

One idea would be to try to use LLMs to offer feedback on drafts or articles. Given that tuned LLM feedback from Claude or ChatGPT is still not that great, tending towards sycophancy or obviousness or ChatGPTese, it is hardly worthwhile running a post through a generic "criticize this essay" prompt. (If anyone on LW2 wanted to do such a thing, they are surely capable of doing it themselves, and integrating it into LW2 isn't that useful. Removing the friction might be helpful, but it doesn't seem like it would move any needles.)

So, one way to elicit more interesting feedback would be to force LLMs out of the chatbot-assistant mode-collapse and into more interesting simulations. There has been some success with just suggestively-named personas or characters in dialogues (you could imagine here we'd have "Skeptic" or "Optimist" characters), but we can do better. Since this is for LW2, we have an obvious solution: simulate LW users! We know that LW is in the training corpus of almost all LLMs and that writers on it (like myself) are well-known to LLMs (eg. truesight). So we can ask for feedback from simulated LWers: eg. Eliezer Yudkowsky or myself or Paul Christiano or the author or...

This could be done nicely by finetuning an "LW LLM" on all the articles & comments, with associated metadata like karma, and then feeding any new draft or article into it, and sampling a comment from each persona. (This helps instill a lot of useful domain knowledge, but also, perhaps more importantly, helps override the mode-collapse and non-judgmentalness of assistant LLMs. Perhaps the virtual-gwern will not be as acerbic or disagreeable as the original, but we'll take what we can get at this point...) If there is some obvious criticism or comment Eliezer Yudkowsky would make on a post, which even an LLM can predict, why not deal with it upfront instead of waiting for the real Eliezer to comment (which is also unlikely to ever happen these days)? And one can of course sample an entire comment tree of responses to a 'virtual comment', with the LLM predicting the logical respondents.

This can further incorporate the draft's author's full history, which will usually fit into a multi-million token context window. So their previous comments and discussions, full of relevant material, will get included. This prompt can be cached, and used to sample a bunch of comment-trees. (And if finetuning is infeasible, one can try instead to put the LW corpus into the context and prompt-cache that before adding in the author's corpus.)
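
As a sketch of how the prompt-cached (non-finetuned) variant might look against a hosted API, here is roughly Anthropic's prompt-caching interface circa late 2024; the model name, prompt wording, helper function, and corpus placement are illustrative assumptions, not a spec:

```python
# Sketch: cache the big static prefix (LW corpus + author history) once, then
# sample many virtual comments against it. Only the cache_control mechanism is
# the real feature; everything else here is illustrative.
import anthropic

client = anthropic.Anthropic()

def virtual_comment(corpus: str, draft: str, persona: str) -> str:
    resp = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},  # beta header, needed while caching was in beta
        system=[
            {"type": "text",
             "text": "You write LessWrong comments in the voice of named LW users."},
            {"type": "text",
             "text": corpus,                              # huge static prefix: LW corpus + author's history
             "cache_control": {"type": "ephemeral"}},     # mark it cacheable; reused across calls
        ],
        messages=[{"role": "user",
                   "content": f"Draft post:\n\n{draft}\n\nWrite the comment {persona} would leave on this draft."}],
    )
    return resp.content[0].text
```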

The default would be to prompt for high-karma responses. This might not work, because it might be too hard to generate high-quality responses blindly in a feedforward fashion, without any kind of search or filtering. So the data might instead be formatted with the metadata after each comment, for ranking purposes: the LLM generates a response and only then a karma score, and when we sample, we simply throw out predicted-low-score comments rather than waste the author's time looking at them. (When it comes to these sorts of assistants, I strongly believe 'quality > quantity', and 'silence is golden'. Better to waste some API bills than author time.)
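
A sketch of that 'karma-after-the-comment' formatting and the rejection-filtering step; the field layout, separator, and threshold are all illustrative assumptions:

```python
# Sketch: put the karma label *after* the comment so a finetuned model learns to
# predict (comment, then karma); at sampling time, silently drop anything whose
# self-predicted karma is below threshold.
import re

def to_training_example(user: str, date: str, comment: str, karma: int) -> str:
    # Conditioning metadata up front; the karma label to be predicted at the end.
    return f"{user} {date} | {comment}\nKARMA: {karma}"

KARMA_RE = re.compile(r"KARMA:\s*(-?\d+)\s*$")

def filter_virtual_comments(samples: list[str], min_karma: int = 10) -> list[str]:
    """Keep only sampled comments whose predicted karma clears the threshold."""
    kept = []
    for s in samples:
        m = KARMA_RE.search(s)
        if m and int(m.group(1)) >= min_karma:
            kept.append(KARMA_RE.sub("", s).rstrip())  # strip the score before showing the author
    return kept
```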

One can also target comments to specific kinds of feedback, to structure it better than a grab-bag of whatever the LLM happens to sample. It would be good to have (in descending order of how likely to be useful to the author) a 'typo' tree, a 'copyediting'/'style'/'tone' tree, 'confusing part', 'terminology', 'related work', 'criticism', 'implications and extrapolations', 'abstract/summary' (I know people hate writing those)... What else? (These are not natural LW comments, but you can easily see how to prompt for them with prompts like "$USER $KARMA $DATE | Typo: ", etc.)
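
Concretely, generating one targeted comment-tree per category could be as simple as building one prompt per feedback type with that prefix convention; the persona, karma, and date values below are placeholders:

```python
# Sketch: one sampling prompt per feedback category, in descending order of
# likely usefulness, using the "$USER $KARMA $DATE | Category:" prefix idea.
FEEDBACK_CATEGORIES = [
    "Typo", "Copyediting/style/tone", "Confusing part", "Terminology",
    "Related work", "Criticism", "Implications and extrapolations", "Abstract/summary",
]

def feedback_prompts(draft: str, persona: str = "gwern",
                     karma: int = 50, date: str = "2024-11-30") -> list[str]:
    return [f"{draft}\n\n{persona} {karma} {date} | {category}: "
            for category in FEEDBACK_CATEGORIES]
```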

As they are just standard LW comments, they can be attached to the post or draft like regular comments (is this possible? I'd think so, just transclude the comment-tree into the corresponding draft page) and responded to or voted on etc. (Downvoted comments can be fed back into the finetuning with low karma to discourage feedback like that.) Presumably at this point, it would not be hard to make it interactive, and allow the author to respond & argue with feedback. I don't know how worthwhile this would be, and the more interaction there is, the harder it would be to hide the virtual comments after completion.

And when the author finishes writing & posts a draft, the virtual comments disappear (possibly entirely unread), having served their purpose as scaffolding to help improve the draft. (If the author really likes one, they can just copy it in or quote it, I'd think, which ensures they know they take full responsibility for it and can't blame the machine for any mistakes or confabulations or opinions. But otherwise, I don't see any real reason to make them visible to readers of the final post. If included at all, they should be prominently flagged---maybe the usernames are always prefixed by AI_$USER to ensure no one, including future LLMs, is confused---and definitely always sort to the bottom & be collapsed by default.)

Comment by gwern on Bogdan Ionut Cirstea's Shortform · 2024-11-29T22:33:05.480Z · LW · GW

(Relevant, although "involving its GPT-4 AI model" is a considerably weaker statement than 'initialized from a GPT-4 checkpoint'.)

Comment by gwern on Rationality Quotes July 2014 · 2024-11-29T02:51:06.939Z · LW · GW

I think Alistair might have mangled the story there. There does seem to be a Charles II/fish/weight story, but about a completely different weight - in water, not postmortem: https://gwern.net/doc/philosophy/epistemology/1948-oesper.pdf Fortunately, while the question Charles II posed in this version is considerably clunkier, the upshot remains the same, so there are much worse leprechauns...

(Although the sourcing here is still thinner than I'd like and may not be the original: no date is given, but Schönbein was born in 1799 and Charles II died in 1685, and an 1842 publication still leaves at least 157 years between the latest the story could've happened and this exact publication. But I'll leave it to someone else to try to track it further back.)

Comment by gwern on keltan's Shortform · 2024-11-29T00:52:15.969Z · LW · GW

Which was not terribly secret. The details of the Project were indeed super-secret, to the point where most of the politicians hadn't known anything, but despite the massive global-scale censorship & secrecy, many had observed the signs of a major project of some sort and some got as far as guessing a nuclear bomb specifically. Also, commercial satellites with meter resolution, which could quantify major facilities or new cities like Los Alamos or Hanford, did not exist then (but overflights, and then satellites, now do, and have helped reveal later top-secret nuclear bomb programs). An AI Manhattan Project, as we currently think of it, would be amusingly similar in footprint (eg. energy consumption) to the original and often observable from space: all those gigawatts have to go somewhere, after all.* I'm sure you can find plenty more about secrecy breaches in Rhodes.

This was not necessarily all that useful in the context of WWII - of course America had some big secret projects going, everyone did. It was a total world war. Everyone was aware there was a war on. The devil was in the details of what the program was - a failure like the V-2s, or a success like the Enigma decrypts and Manhattan? But a binary exists/does-not-exist signal is useful in a peacetime context and in the current discussion.

(If nothing else, the fact that DeepSeek keeps publishing is a signal. I would note here BTW that you cannot argue, without tying yourself into some pretzel knots explaining 4-D chess logic, that Chinese AI is about to catch up to and surpass the West because the best Chinese AI group, DeepSeek, just released a model or published this-or-that revealing the secrets of OA, and simultaneously argue that there is already a secret all-out Chinese Manhattan Project going on which will potentially reach AGI first - because the first thing the latter would have done is stop the former from publishing anything which might help Western AI and then devour it for researchers.)

* A wag on Twitter has pointed out that the total energy/heat output of something like a GPT-4 or GPT-5 training run is the same as or larger than the output of a Hiroshima/Nagasaki-scale nuclear bomb explosion. Which is helpful intuition for why your datacenters need so much cooling, at least.
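
For what it's worth, the arithmetic roughly checks out, using the standard ~15 kt Hiroshima yield and a commonly cited outside estimate of ~50 GWh for a GPT-4-scale training run (both figures are approximations, not official numbers):

```python
# Back-of-the-envelope: frontier training-run energy vs. a Hiroshima-scale bomb.
KT_TNT_J = 4.184e12                 # joules per kiloton of TNT
GWH_J = 3.6e12                      # joules per gigawatt-hour

hiroshima_gwh = 15 * KT_TNT_J / GWH_J   # ~17 GWh of heat
training_gwh = 50                       # rough outside estimate for a GPT-4-scale run

print(f"Hiroshima ≈ {hiroshima_gwh:.0f} GWh; training run ≈ {training_gwh} GWh "
      f"≈ {training_gwh / hiroshima_gwh:.1f}× the bomb's energy release")
```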

Comment by gwern on Bogdan Ionut Cirstea's Shortform · 2024-11-28T19:46:45.418Z · LW · GW

IIRC OAers also said somewhere (doesn't seem to be in the blog post, so maybe this was on Twitter?) that o1 or o1-preview was initialized from a GPT-4 (a GPT-4o?), so that would also rule out a literal parameter-size interpretation (unless OA has really brewed up some small models).

Comment by gwern on China Hawks are Manufacturing an AI Arms Race · 2024-11-28T19:42:34.045Z · LW · GW

(All of which I consider to be consistent with my summary, if anyone is wondering, and thus, given that Hsu did not choose to object to any of the main points of my summary in his clarifications, take to be confirmation.)

Comment by gwern on Eli's shortform feed · 2024-11-28T19:38:25.307Z · LW · GW

Yes, I'd assume a sensible implementation would transfer the metadata as well - the new post would have the same date, karma, and comments as the original comment. Just as if it had always been posted as a post.

Comment by gwern on Eli's shortform feed · 2024-11-27T02:30:44.329Z · LW · GW

Just ask an LLM. The author can always edit it, after all.


My suggestion for how such a feature could be done would be to copy the comment into a draft post, add an LLM-suggested title (and tags?), and alert the author for an opt-in; they may delete it or post it.

If it is sufficiently well received and people approve a lot of them, then one can explore opt-out auto-posting mechanisms, like "wait a month and if the author has still neither explicitly posted it nor deleted the draft proposal, then auto-post it".

Comment by gwern on notrishi's Shortform · 2024-11-26T15:09:11.907Z · LW · GW

Unfortunately, it's a lot easier to come up with good, or at least interesting, capability ideas than alignment ideas; and on the rare occasion I've had worthwhile alignment ideas, they often turn out to be tied to capabilities anyway.

Comment by gwern on Cole Wyeth's Shortform · 2024-11-25T19:08:31.040Z · LW · GW

(Not "idle worship"?)

Comment by gwern on DeepSeek beats o1-preview on math, ties on coding; will release weights · 2024-11-25T18:07:13.670Z · LW · GW

DeepSeek is Chinese. I'm not really familiar with the company.

DeepSeek is the best Chinese DL research group now and has been for at least a year. If you are interested in the topic, you ought to learn more about them.

I thought Chinese companies were at least a year behind the frontier.

This seems roughly consistent with what you would expect. People usually say half a year to a year behind. Q* was invented sometime in summer 2023, according to the OA coup reporting; ballpark June-July 2023, I got the impression, since it seemed to already be a topic of discussion with the Board c. August 2023, pre-coup. Thus, we are now (~20 Nov 2024) almost at December 2024, which is about a year and a half. o1-preview was announced 12 September 2024, 74 days ago, and o1-preview's benchmarks were much worse than the true o1 which was still training then (and of course, OA has kept improving it ever since, even if we don't know how - remember, time is always passing†, and what you read in a blog post may already be ancient history). Opensource/competitor models (not just Chinese or DeepSeek specifically) have a long history of disappointing in practice when they turn out to be much narrower, overfit to benchmarks, or otherwise somehow lacking in quality & polish compared to the GPT-4s or Claude-3s.

So, if a competing model claims to match o1-preview from almost 3 months ago, which itself is far behind o1, with additional penalties from compensating for the hype and the apples-to-oranges comparisons, and where we still don't know if they are actually the same algorithm at core (inasmuch as neither OA nor DeepSeek, AFAIK, has yet published any kind of detailed description of what Q*/Strawberry/r1 is), and possibly worst-case as much as >1.5 years behind if DS has gone down a dead end & has to restart... This point about time applies to any other Chinese replications as well, modulo details (like possibly suggesting DeepSeek itself is not so good), etc.

Overall, this still seems roughly what you would expect now: 'half a year to a year behind'. It's always a lot easier to catch up with an idea after someone else has proven it works and given you an awful lot of hints about how it probably works, like the raw sample transcripts. (I particularly note the linguistic tics in this DS version too, which I take as evidence for my inner-monologue splicing guess of how the Q* algorithm works.)

† I feel very silly pointing this out: that time keeps passing, and if you think that some new result is startling evidence against the stylized fact "Chinese DL is 6-12 months behind", you should probably start by, well, comparing the new result to the best Western DL result 6–12 months ago! Every time you hear about a new frontier-pushing Western DL result, you should mentally expect a Chinese partial replication in 6–12 months, and around then, start looking for it. / This should be too obvious to even mention. And yet, I constantly get the feeling that people have been losing their sort of... "temporal numeracy", for lack of a better phrase. That they live in a 'Big Now' where everything has happened squashed together. In the same way that in politics/economics, people will talk about the 1980s or 1990s as if all of that was just a decade ago instead of almost half a century ago (yes, really: 2024 − 1980 = 44), many people discussing AI seem to have strangely skewed mental timelines. / They talk like GPT-4 came out, like, a few months after GPT-3 did, maybe? So GPT-5 is wildly overdue! That if a Chinese video model matches OA Sora tomorrow, well, Sora was announced like, a month ago, something like that? So they've practically caught up! OA only just announced o1, and DeepSeek has already matched it! Or like 2027 is almost already here and they're buying plane tickets for after Christmas. There's been a few months without big news? The DL scaling story is over for good and it hit the wall!

Comment by gwern on Which things were you surprised to learn are not metaphors? · 2024-11-25T03:08:45.032Z · LW · GW

I had a related one as a teenager: there are various expressions about women being too beautiful to look at or that it hurt to look at, etc. I thought they were all overwrought literary expressions - a woman you loved or had a crush on, sure, that's love, but just a random woman? - until I went to dinner in a group which happened to include such a woman.

(Thankfully, since it was a large group in a noisy restaurant, I could get away with not looking at her all evening; although I got a little angry that this could even be a thing - I never signed up for that! I've wondered whether, or how long, it'd take for that to wear off, but I never saw her again, so I have no idea.)