I'm not sure I see any difference here between regular dangerously convergent instrumental drives and this added risk of 'intrinsic' drives. They just seem like the same thing to me. Like the two predictions you give seem already true and fulfilled:
Relative to other goals, agentic systems are easy to steer to seek power.
Agentic systems seek power outside of the “training distribution”, but in ways which don’t seem to be part of larger power-seeking plans.
Both of these seem like what I would expect from a flexible, intelligent agent which is capable of handling many complicated changing domains, like a LLM: they are easy to steer to seek power (see: all the work on RLHF and the superficiality of alignment and ease of steering and low-dimensional embeddings), and they can execute useful heuristics even if those cannot be easily explained as part of a larger plan. (Arguably, that's most of what they do currently.) In the hypotheticals you give, the actions seem just like a convergent instrumental drive of the sort that an agent will rationally develop in order to handle all the possible tasks which might be thrown at it in a bewildering variety of scenarios by billions of crazy humans and also other AIs. Trying to have 'savings' or 'buying a bit of compute to be safe', even if the agent cannot say exactly what it would use those for in the current scenario, seems like convergent, and desirable, behavior. Like buying insurance or adding validation checks to some new code, usually it won't help, but sometimes the prudence will pay off. As humans say, "shit happens". Agents which won't do that and just helplessly succumb to hardware they know is flaky or give up the moment something is a little more expensive than average or write code that explodes the instant you look at it funny because you didn't say "make sure to check for X Y & Z" - those agents are not good agents for any purpose.
If there are 'subshards' which achieve this desirable behavior because they, from their own perspective, 'intrinsically' desire power (whatever sense that sort of distinction makes when you've broken things down that far), and it is these subshards which implement the instrumental drive... so what? After all, there has to be some level of analysis at which an agent stops thinking about whether or not it should do some thing and just starts doing the thing. Your muscles "intrinsically desire" to fire when told to fire, but the motor actions are still ultimately instrumental, to accomplish something other than individual muscles twitching. You can't have 'instrumental desire' homunculuses all the way down to the individual transistor or ReLU neuron.
Possibly it will still be counterintuitive to many folks, as Said quoted in a sibling comment.
No, this is a little different. Your approach here sounds like ours and the intuitive one (just at the cost of additional complexity).
The 'auto dark mode' we abandoned is where you just use step #2 there and you skip #1 (and thus, any widget or toggle which enables a reader to do anything with localStorage), and 'auto' is the only state. The logic there is, the reader already has access to a widget or toggle to set their dark mode preference: it's just their OS/browser, which will have some config page somewhere with various settings like 'turn on dark mode at night' or 'always use dark mode' or 'always use light mode'. Just trust the OS/browser and use whatever setting it sends to the web page. Don't waste the effort and screen real estate to add in a redundant widget/toggle. It's handled already. Easier for everyone - it Just Works™!
Unfortunately, the connection between 'a year ago when I updated my Android phone and it asked me if I wanted to use the cool new dark mode, I said yes' and 'this webpage I am reading now is in dark mode for some reason, and I can't change it back to normal???', apparently winds up eluding some readers. (This is what Said's sibling comment is about.) It winds up being "too much magic".
The current toggle+localStorage+auto approach, on the other hand, while adding to the clutter, does not seem to confuse readers: "the page is in dark-mode, for some reason. But I want light-mode and I am unhappy. I see a little light-mode button. I push that button. Now the page is in light-mode. I am happy." (And then it is light-mode ever after.) At least, I have seen many fewer (or no) complaints about the dark mode being on when it shouldn't be after we moved to the toggle. So as far as we can tell, it's working.
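For concreteness, a minimal sketch of the difference (not our actual Gwern.net code; the 'mode' localStorage key and function name are placeholders): the current approach consults an explicit reader choice first and only falls back to the OS/browser preference, while the abandoned 'auto-only' version is just that fallback line by itself.

```typescript
// Three-state resolution: an explicit reader choice (stored by the toggle
// widget in localStorage) wins; otherwise "auto" defers to the OS/browser
// via the prefers-color-scheme media query.
type Mode = "light" | "dark";

function resolveMode(): Mode {
  const saved = localStorage.getItem("mode"); // "light" | "dark" | null (null = auto)
  if (saved === "light" || saved === "dark") return saved;
  return window.matchMedia("(prefers-color-scheme: dark)").matches ? "dark" : "light";
}

// The abandoned auto-only design is just the last line on its own:
// no widget, no localStorage, and no way for a confused reader to override it.
```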
You might be interested in a small "hybrid LLM" proposal I wrote about using diffusion on embeddings and then decoding/sampling from them.
At first look, I like your design a lot. Even though I am required to (because "imitation is the sincerest flattery"), it has its own fairly unique style which doesn't immediately remind me of anything else. I like the playfulness and use of some color. I am also impressed by your design writeup: you have covered far more than I would have expected and definitely thought it through. I may wind up stealing some ideas here.
More miscellaneous observations:
- Dark mode selector: you use a 2-state light vs dark selector. This is the obvious thing, but we think that it is ultimately wrong and you need a three-state selector to accommodate "auto". I think this is especially important given how many people now read websites like yours or mine on their smartphones, often at night or in bed, and just assume now that everything will use dark-mode as appropriate. (I'm sure you've seen many screenshots of Gwernnet on Twitter at this point, and noticed that they are almost always from smartphones, and then much of the time, in dark-mode. I am pretty sure that in most of those cases, it is not because the reader specifically opted into dark-mode forever, but simply because auto fired, and readers take it for granted. I don't expect auto to become any less common, and the 2-state selector will just get more inappropriate, defaulting to the wrong thing more often. We have also noted that, increasingly, websites have been choosing 3-state rather than 2-state over the past 2 years, often with nearly-identical semantics & icons, and take this as confirmation of our earlier choice.)
- Smallcaps acronyms: I did the same thing originally but ultimately removed them. They wound up adding a lot to the page, and while they initially (ahem) looked cool and fancy, they alienated readers, and over time I just kept noticing them and feeling more and more alienated by them myself. Smallcaps may be "proper" typographically, but I think that ship has sailed: we read so little material with acronyms typeset in true smallcaps that it now achieves the opposite of the intended effect - it's the smallcapsed 'NASA' which is bizarre and alien looking, not the regular old 'NASA'. Is it worth spending "weirdness points" on? I ultimately felt not.
- Color: You mention that link-icons can be chaotic if colored. I agree, but in your case, I think you have a lot of scope to be playful with color.
For example, you went to a lot of trouble to separate the dropcaps and enable the fun colored dropcaps.... but then don't use the colored ones anywhere (right?). So why not make the dropcaps colored... on hover?
In fact, why not make 'fun on hover' a core design principle? "If not friend, why friend-shaped?" Make everything on the site a little friend you can play with. (This would be a good time to try to write down a few catchphrases or design principles to sum up your goals here. Why dropcaps or the animated pond logo? etc) When I look at your pond, I feel like it would be wonderful if the pond was animated on hover - if when I hovered, then it was animated.
Right now, it feels a bit awkward. It's animated just enough to bother me in the corner of my eye, but not enough to consciously notice it. It is also too small, IMO. The detail is illegible at this size, beautiful as the fullsize version is. (What looks good at large size almost never looks as good at small size, like line-height or less, and needs to be redone. This is part of why link-icons are hard.*) Also, in the long run, I think you are better off looking into generative pixel art for adding more images/video in that style. You may think you are willing to pay $270 each time - and I'm sure you could afford lots of them for something as close to your heart as your personal website - but you're not. The cost and time will gradually deter you and inherently create a scarcity mindset, sabotaging your creativity and playfulness and willingness to go "wouldn't it be fun if...?". Beware more than trivial inconveniences! This is a website design which would benefit from fun little pixel art motifs all over the place, and you want to be able to flip over to your generative tool as soon as an idea for a trout element hits you and start creating it. You don't have to go all Yamauchi No.10 Family Office on the reader, but for this sort of cozy playful design, I think the more the better, so there's a feeling of always something cute around the corner.
You have a nice fleuron footer. But wouldn't it be so much niftier if that fish were cheerfully animated once I hover over it, and it does little trout flips around my cursor? And if the fleurons became brighter blue or richer texture and more water-like?
And wouldn't it be nice if all of the trout link-icons also turned blue on hover? (I think the trout link-icon spacing is a bit off, incidentally. The Youtube link icon is also definitely bad with the "YouTube's logo is definitely red" example - way too close to the 's'.) We have recently implemented link-icon colors on Gwernnet (some background), and while I'm still not sure how appropriate it is for Gwernnet or if it needs to be rethought, I feel it's very appropriate for your design.
Lots of things you could do with it. For example, you could have a gentle "breathing" cycle of all of the colors, similar to some of Apple's light icons - the page could use JS to very slowly cycle through the default color-less version to the hover versions and back. (Perhaps just for the first minute, or perhaps instead after a few minutes, whatever feels more esthetic.) And Pope suggests that for the AI risk articles, like empowerment, you could have the eyes turn red at random times.
Or you could define the hover colors to be a 'theme' and have different parts of the site have different themes. Theming is a classic thing to do with websites (see eg GreaterWrong). For example, the same way that Gwernnet has different dropcaps for different subjects - the dropcats for the cat essays, the yinit for technical articles, the Goudy for biology, cheshire for literature etc - you could have, I don't know, personal stuff be yellow, technical AI be blue, humanities stuff be green, and so on.
- Might note "callouts" are also called "admonitions".
- Visual regression testing: you can also check snapshots of the raw HTML too. Since you are trying to bake a lot into the HTML, this should work well for you and complement the image approach. This can be as simple as downloading some URLs and running diff against a directory of older downloads (a minimal sketch appears after this list). I implemented this a few months ago; it was easy to implement and has given me more confidence when I review the lorem unit-test pages to check that any changes in the final HTML make sense.
- I notice way down in the footer a backlinks section, but it doesn't seem to be covered in the design page yet? Also, possible bug: the backlinks section of the design page includes... the design page itself?
- "Text transformers" seems like a risky terminology choice, especially given your profession & site content. I know I did a double-take when skimming - "he's using text transformers? ooh how interesting - oh wait." Maybe just call them "compilers" or something.
- Collapses: I prefer collapses to not require clicks because it reduces friction. I think this is especially true of the Table of Contents - if you don't display that by default (which seems like a bad choice on long pages like the design page), at least make them as easy as possible to access!
- List indentation: your lists do not indent the contents / outdent the list marker. Is that deliberate? (Actually, is this even consistent? It felt like I saw it happening somewhere but not other places...)
- Overall clutter: on reflection, I agree with the other comments that right now the pages have some degree of clutter. Just doing too much.
An example here would be the underlining in the superscripted counter of the dates like "Published on October 31st, 2024" - it really jumps out at you: when you look at the date line, the 'st' is the first thing you read. This is bad because it is neither in line with the semantics of the rest of the page, where underlining always denotes a hyperlink, nor decorative in a way which improves the rest of the page's appearance or is consistent with the blue-pixel-art-book esthetic. The 'st' shouldn't be underlined; it should, if anything, be even smaller or faded out.
Another example would be the slashed-zeros: the slash is somewhat distracting and overloaded on its own and questionable at best (this isn't source code or raw data where confusing 'O'/'0' can be catastrophic) but combined with the zeros also being funny little squashed zeros, you have this overloaded effect where the zeros all over the page keep popping out at you from the corner of your eye or while scanning.
Then you have all of the other flourishes like the swashes for the capital 'Q'... It's just too much. You can have lots of semantics, like the link-icons, or you can have lots of decoration, but you can't have both, not if they are going to often be on the screen together. (Like just at the top of the design page, you're being hit with logos, toggles, faded out text, underlined superscripts, doubly-variant common letters (8 instances of '0' alone), very fancy capital swashes, dropcaps, collapses with icon+chevrons+backgrounds, screenshots inline without a clear border (and everything inside the screenshots tugging at the eye), 2 link-icons, monospace+italics+bold+roman...)
- Collapses: the '>' for the disclosure toggles seems oddly offset, and just above the midline enough to look like a bug. Either commit to it being superscript or make it exactly middle-looking/inline.
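Regarding the raw-HTML snapshot testing above, a minimal sketch of what I mean (the URL list and paths are placeholders; assumes Node 18+ for the global fetch):

```typescript
// Download a few generated pages and compare them against previously-saved
// copies; flag any page whose HTML has changed since the last run so you can
// diff it by hand (or with your usual diff tool).
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";

const SNAPSHOT_DIR = "snapshots";
const URLS = ["https://example.com/design", "https://example.com/lorem"];

async function checkSnapshots(): Promise<void> {
  mkdirSync(SNAPSHOT_DIR, { recursive: true });
  for (const url of URLS) {
    const html = await (await fetch(url)).text();
    const file = `${SNAPSHOT_DIR}/${encodeURIComponent(url)}.html`;
    if (!existsSync(file)) {
      writeFileSync(file, html); // first run: record the baseline
    } else if (readFileSync(file, "utf8") !== html) {
      writeFileSync(`${file}.new`, html);
      console.log(`CHANGED: ${url} - diff ${file} against ${file}.new`);
    }
  }
}

checkSnapshots();
```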
Overall, best new personal website I've seen in a while: ★★★★☆.
I look forward to it being tidied up some more, and seeing what clever new touches you put on it as you keep evolving it and presumably can experiment with things like LLM rewrites or integration or add more pixel art, so I can add that last star. :)
* There is a semi-famous game development anecdote about this effect, about how John Romero's Daikatana wound up shipping so late due to poor management: an artist proudly showed off the multi-thousand-pixel art of the fancy sword they had been slaving away for a while on. Someone pointed out to the artist that the sword in question was going to be rendered at like 64x64 pixels, and every detail was going to be invisible when resized, and it was going to look like s---t and so they had just wasted all that work, were going to have to throw it away and start from scratch, and they had fallen that much further behind schedule. A small image is not a large image with fewer pixels, and pixel art is not a drawing with blockier points. · This affects a lot of things - like part of why our new link-icon color feature is so difficult to implement well is that a color which looks fine as a big logo will look totally different as a thin line of a few pixels. It's really quite surprising to me how different things can look when you scale them way down. Something that is clearly purple when I clone it from Paul Graham's website will turn into a 'white' line when I use it as the link color for pg links, or some blue that is medium-colored as a page background will become jet black. So even after the considerable manual labor of getting all of the right colors defined, you still have to do esoteric colorspace transforms to ensure they look right, and I think we're going to have to adjust a bunch of them on top of that as well, once we have time and I can catch up post-Dwarkesh-Patel interview etc. /sigh Good web design is only easy if you don't really care about good results.
Coffee culture in America doesn't have much to do with the Revolutionary War. The rise of coffee came much later than the American Revolution. The brief boycott didn't last (after all, Americans - infamous smugglers in general - were smuggling plenty of tea because of the taxes, so sourcing tea was not a problem) and there was enormous consumption of tea consistently throughout: https://en.wikipedia.org/wiki/American_tea_culture#Colonial_and_Revolutionary_eras In fact, I was surprised to learn recently that American tea was overwhelmingly green tea in the 1800s, and that America was one of the biggest export markets for green tea worldwide.
(This was really surprising to me, because if you look around the 1900s, even as late as the 1990s, black tea is the standard American tea; all iced tea is of course black tea, and your local grocery store would be full of mostly just black teas with a few token green teas, and exactly one oolong tea if you were lucky - as I found out the hard way when I became interested in non-black teas.)
It can be both, of course. Start with process supervision but combine it with... something else. It's hard to learn how to reason from scratch, but it's also clearly not doing pure strict imitation learning, because the transcripts & summaries are just way too weird to be any kind of straightforward imitation learning of expert transcripts (or even ones collected from users or the wild).
Also worth noting Dustin Moskowitz was a prominent enough donor this election cycle, for Harris, to get highlighted in news coverage of her donors: https://www.washingtonexaminer.com/news/campaigns/presidential/3179215/kamala-harris-influential-megadonors/ https://www.nytimes.com/2024/10/09/us/politics/harris-billion-dollar-fundraising.html
There's no way I can meaningfully pick from like 100 covers. Pick 5 or 10, max, if you expect meaningful votes from people.
The extensive effort they make to integrate into legacy systems & languages shows how important that is.
codyz is doubling down on the UFO claims, but as far as I can see, the case has fallen apart so completely no one even wants to discuss it and even Tyler Cowen & Robin Hanson have stopped nudge-nudge-wink-winking it for now.
So I hereby double my unilateral bet to $2,000.
Show me a field where replication crises tear through, exposing fraud and rot and an emperor that never had any clothes, a field where replications fail so badly that they result in firings and polemics in the New York Times and destroyed careers - and then I will show you a field that is a little confused but has the spirit and will get there sooner or later.
So... parapsychology? How'd that work out? Did they have the (ahem) spirit and get there sooner or later?
Considering that one of the primary barriers to civilian nuclear power plants was, and remains, nuclear bomb proliferation risk, I'm not sure how telling this analogy is. There's a big reason that nuclear power plants right now are associated with either avowed nuclear powers or powers that want to be nuclear powers at some point (eg. France, South Korea, North Korea, Japan, Iran, Pakistan...) or countries closely aligned with said nuclear powers. Or rather, it seems to me that the analogy goes the opposite of how you wanted: if someone had 'solved' nuclear reactor design by coming up with a type of nuclear reactor which was provably impossible to abuse for nukes, that would have been a lot more useful for nuclear reactor design than fiddling with details about how exactly to boil water to drive a turbine. If you solve the latter, you have not solved the former at all; if you solve the former, someone will solve the latter. And if you don't, 'nuclear power plant' suddenly becomes a design problem which includes things like 'resistant to Israeli jet strikes' or 'enables manipulation of international inspectors' or 'relies on a trustworthy closely-allied superpower rather than untrustworthy one for support like spent fuel reprocessing and refueling'.
That is what someone might claim, yes, to avoid losing face by too visibly caring about losing face or attempting to manipulate it.
Can't you do this as polls in a single comment?
The fact that Bob has this policy in the first place is more likely when he's being self-deceptive.
A fun fictional example here is Bester's The Demolished Man: how do you plan & carry out an assassination when telepaths are routinely eavesdropping on your mind? The protagonist visits a company musician, requesting a musical earworm for a company song to help the workers' health or something; alas! the earworm gets stuck in his head, and so all any telepath hears is the earworm. And you can't blame a man for having an earworm stuck in his head, now can you? He has an entirely legitimate reason for that to be there, which 'explains away' the evidence of the deception hypothesis that telepathic-immunity would otherwise support.
Hm. Does that imply that a pack of dogs hunting a human is a stag hunt game?
I am also a little dubious that this is defining a concept which doesn't just mostly overlap with "face", which is substantially older, already well-known, and infinitely easier to remember & write.
Most of these examples seem like substituting in 'face' or 'lose face' would work just fine. "Senator, may I cause you to lose face by criticizing you publicly?" "He didn't like the advice I gave him about his errors because he lost face." "She felt infantilized, and like she was losing face, when her boyfriend told her how to solve something instead of commiserating with her."
They may, but I think the AI code generators would have to be quite good. As long as the LLMs are merely complementing programming languages, I expect them to remain human-readable & writable; only once they are replacing existing programming languages do I expect serious inscrutability. Programming language development can be surprisingly antiquated and old-fashioned: there are many ways to design a language or encode it where it could be infeasible to 'write' it without a specialized program, and yet, in practice, pretty much every language you'll use which is not a domain-specific (usually proprietary) tool will let you write source code in a plain text editor like Notepad or nano.
The use of syntax highlighting goes back to at least the ALGOL report, and yet, something like 50 years later, there are not many languages which can't be read without syntax highlighting. In fact, there are very few which can't be programmed just fine with solely ASCII characters in an 80-col teletype terminal, still. (APL famously failed to ever break out of a niche and all spiritual successors have generally found it wiser to at least provide a 'plain text' encoding; Fortress likewise never became more than a R&D project.) Like this website - HTML, CSS, JS, maybe some languages which compile to JS, SVG... all writable on a 1970s Unix minicomputer printing out to physical paper.
Or consider IDEs which operate at 'project' level or have 'tags' or otherwise parse the code in order to allow lookups of names, like methods on an object - you could imagine programming languages where these are not able to be written out normally because they are actually opaque UUIDs/blobs/capabilities, and you use a structural editor (similar to spreadsheets) to modify everything, instead of typing out names letter by letter like a barbarian. (And 'visual' programming languages often do such a thing.) The Smalltalk systems where you did everything by iteratively interacting with GUI objects come to mind as systems where it's not even clear what the 'plain text' version is, after you've used the systems dynamically as they were intended to be used, and rewritten enough objects or overridden enough methods... But again, few languages in widespread use will do that.
Also of relevance is the wave of resignations from the DC newspaper The Washington Post the past few days over Jeff Bezos suddenly exerting control.
I had no idea ABBYY was so big. I thought it was just some minor OCR or PDF software developer. Interesting to hear about their historical arc. (I am also amused to see my Sutton meme used.)
Our strategy is for variants to preserve well-defined behavior in the application but introduce diversity in the effect of undefined behavior (such as out-of-bounds accesses).
This Galois work is a lot narrower and targeted at low-level details irrelevant to most code, which thankfully is now written in non-C languages - where out-of-bounds accesses don't pwn your machine and undefined behavior does not summon nasal demons and stuff like ASLR is largely irrelevant.
So AI is wholly necessary for most of the value of such a metamorphic code idea.
And yeah, I think it's a pretty decent idea: with cheap enough LLMs, you can harden applications by sampling possible implementations which pass all unit-tests, and whose final combination pass all end-to-end or integration tests. You can already do this a bit to check things with LLMs being so cheap. (Last night, Achmiz asked a Markov chain question and I was too lazy to try to figure it out myself, so I had ChatGPT solve it 3 ways in R: Monte Carlo, solving the matrix, and proving an exact closed-form probability. The answer could be wrong but that seems unlikely when they all seem to agree. If I wanted to write it up, I'd also have Claude solve it independently in Python so I could cross-check all 6 versions...)
This would help avoid a decent number of logic bugs and oversights, and it would also have some benefits in terms of software engineering: you are getting a lot of automated 'chaos engineering' and unit-test generation and performance benchmarking for free, by distributing a combinatorial number of implementations. It's almost like a mass fuzzing exercise, where the users provide the fuzz.
You might think this would run into issues with tracking the combinatorial number of binaries, which could take up petabytes if you are distributing, say, a 1GB package to 1 million users, but this has plenty of possible fixes: if you are using reproducible builds, as you ought to, then you only need to track a list of the variants for each function and store that per user, and then you can rebuild the exact binary for a given user on-demand.* I think a bigger issue is that forcing diversity out of tuned LLMs is quite hard, and so you would run into the systematic error problem at a higher level: all the tuned LLMs, feeding on each others' outputs & mode-collapsed, will turn in code with the same implicit assumptions & algorithms & bugs, which would mostly defeat the point.
* Similarly, the LLMs are, or should be, deterministic and fixable with a seed. So the overhead here might be something like, if you have a codebase with 10,000 functions, each time you push out a release - which might happen daily or weekly - you store the RNG seed for the LLM snapshot ID (maybe a kilobyte total), generate 2 versions of each function and randomize per user, and track 10,000 bits or ~1kb per user, so if you have a million users that's just a gigabyte. Whenever you need to investigate a specific binary because it triggered a crash or something, you just fetch the LLM ID & RNG, decode the specific 10,000 function variants they used, and compile. For anyone with millions of users who is serious about security or reliability, a gigabyte of overhead per release is nothing. You already waste that much with random Docker images and crap.
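To make that bookkeeping concrete, a rough sketch (all names and figures hypothetical, not any real build system): one option is to derive each user's per-function variant choices from a hash of a per-release seed and their user ID, so nothing but that pair needs storing in order to rebuild their exact binary later.

```typescript
// Deterministically derive which LLM-generated variant of each function a
// given user's build uses. The same (releaseSeed, userId) always reproduces
// the same choices, so a crashing binary can be rebuilt exactly on demand.
import { createHash } from "node:crypto";

const NUM_FUNCTIONS = 10_000; // functions in the codebase (figure from above)
const NUM_VARIANTS = 2;       // LLM-generated variants per function

function variantAssignment(releaseSeed: string, userId: string): number[] {
  const choices: number[] = [];
  for (let i = 0; i < NUM_FUNCTIONS; i++) {
    const digest = createHash("sha256").update(`${releaseSeed}:${userId}:${i}`).digest();
    choices.push(digest[0] % NUM_VARIANTS);
  }
  return choices;
}

// To investigate a crash report, recompute the user's variant table and feed
// it to the (hypothetical) reproducible-build step:
//   build(sourceTree, variantAssignment("release-2024-10-31", "user-123456"))
```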
Maybe a better framing would be the economic perspective from Hanson's growth paper: "is AI a complement or is it a substitute?" Does AI assist a human worker (or a human organization), making them more productive, functioning as simply a kind of tool (or 'capital') which multiplies their labor; or does it replace that human worker/organization? When it's the former, it may indeed take a very long time; but the latter can happen instantly.
No one can force a freelance artist to learn to use Photoshop or how to best use some snazzy new feature, and artists will be learning the ins-and-outs of their new technologies and workflows for many decades to come, slowly becoming more productive thanks to being complemented by digital illustration tools. Whereas on the other hand, their employers can replace them potentially in minutes after the next big Midjourney upgrade.*
More historically, in colonization, a group of settlers may simply arrive literally overnight in their wagons and set up a new town (eg. a gold rush boomtown), and begin replacing the local indigenous peoples, without any sort of centuries-long gradual '+2% local per capita GDP growth per year until convergence' using only the original local indigenous people's descendants.
* A personal example: when I wanted more fancy dropcaps for Gwern.net, I was contacting human artists and trying to figure out how much it would cost and what the workflow was, and how many thousands of dollars & months of back-and-forth a good dropcap set might cost, and whether I would have to settle instead for something like 1 custom dropcap per essay. When Midjourney became reasonably adequate at v5 & DALL-E at 3, I didn't spend decades working with artists to integrate AI into their workflow and complement their labor... I substituted AI for artists: stopped my attempt to use them that night, and never looked back. Making the 10 dropcaps for this year's Halloween theme (the 'purple cats' got particularly good feedback because they're adorable) is something I could never have done with humans, because it would be colossally expensive and also enormously time-consuming to do all that just for a special holiday mode which is visible a few hours out of the year. At this point, I'm not sure how many artists or font designers I would want to use even if they were free, because skipping them means I don't have to deal with folks like Dave or have one of my projects delayed or killed by artists, or the hassle of all the paperwork and payments, and I get other benefits like extremely rapid iteration & exploration of hundreds of possibilities without wearing out their patience, etc.
This is similar to the answer I got from o1-preview in ChatGPT when I originally asked with OP's post as the text, so that's pleasant to see. (I didn't post anything here because I was unsure and hadn't checked it in enough detail to repost, and so didn't believe in publishing it without being able to improve it.)
I thought there might be some relationship at first with an appropriate transformation, but when I recalled how Kelly requires both edge and net worth, and the problem of frequency of payoffs, I lost my confidence that there would be any simple elegant relationship beyond a simple 'more information = more returns'. Why indeed would you expect 1 bit of information to be equally valuable for maximizing expected log growth in eg. both a 50:50 shot and a 1,000,000,000:1 shot? Or for a billionaire vs a bankrupt? (Another way to think of it: suppose that in both cases you have 1 bit of information over the market and it earns you the same amount. How many trades would it take before your more informed trade ever made a difference? In the first case, you quickly start earning a return and can compound that immediately; in the second case, you might live a hundred lives without ever once seeing a payoff.)
There the main bottleneck is the iteration of selection, or making synthetic genomes. Going for the most typical genome with the least amount of originality is not a technical challenge in itself, right?
Right.
If you are doing genome synthesis, you aren't frustrated by the rare variant problems as much because you just aren't putting them in in the first place; therefore, there is no need to either identify the specific ones you need to remove from a 'wild' genome nor make highly challenging edits. (This is the 'modal genome' baseline. I believe it has still not been statistically modeled at all.)
While if you are doing iterated embryo selection, you can similarly rely mostly on maximizing the common SNPs, which provide many SDs of possible improvement, and where you have poor statistical guidance on a variant, simply default to trying to select out against them and move towards a quasi-modal genome. (Essentially using rare-variant count as a tiebreaker and slowly washing out all of the rare variants from your embryo-line population. You will probably wind up with a lot in the final ones anyway, but oh well.)
Which would be a good thing, as nominally they claim to let everyone opt out of scraping already by using robots.txt and other methods, and so the canary shouldn't do anything there that people couldn't already do.
No, that's what I think too: they were turning down investors, even, excluding them from the upround. The conditionality was probably not necessary at all. But it does serve a valuable purpose for the inevitable lawsuit.
With SNPs, there's tens of thousands of different SNPs which would each need to be targeted differently. With high copy sequences, there's a relatively small set of different sequences.
No, rare variants are no silver bullet here. There's not a small set, there's a larger set - there would probably be combinatorially more rare variants because there are so many ways to screw up genomes beyond the limited set of ways defined by a single-nucleotide polymorphism, which is why it's hard to either select on or edit rare variants: they have larger (harmful) effects due to being rare, yes, and account for a large chunk of heritability, yes, but there are so many possible rare mutations that each one has only a few instances worldwide which makes them hard to estimate correctly via pure GWAS-style approaches. And they tend to be large or structural and so extremely difficult to edit safely compared to editing a single base-pair. (If it's hard to even sequence a CNV, how are you going to edit it?)
They definitely contribute a lot of the missing heritability (see GREML-KIN), but that doesn't mean you can feasibly do much about them. If there are tens of millions of possible rare variants, across the entire population, but they are present in only a handful of individuals a piece (as estimated by the GREML-KIN variance components where the family-level accounts for a lot of variance), it's difficult to estimate their effect to know if you want to select against or edit them in the first place. (Their larger effect sizes don't help you nearly as much as their rarity hurts you.)
So this is why if you read the CNV studies and you look at the hits they identify, and how many subjects are covered by the identified hits, you find that like, maybe 2% of the cohort will have one of those specific identified hits and lose 2 IQ points or gain 2 kg of fat etc. So you can see how that would work out in embryo selection: you'd be able to avoid that loss, which is meaningful! ...in a tiny fraction of all embryos. On average, you'd just sequence them all, find no known pathogenic variant, and shrug, and use the SNP PGS like usual, having gained nothing.
Also, of course, WGS is substantially more expensive than SNP genotyping and more difficult to do on embryos.
If the genetic architecture had worked out otherwise, if there had instead been a lot of rare mutations which increased intelligence, then life would be a lot more convenient. Instead, it's a lot of 'sand in the gears', and once you move past the easy specks of sand, they all become their own special little snowflakes.
This is why rare variants are not too promising, although they are the logical place to go after you start to exhaust common SNPs. You probably have to find an alternative approach like directly modeling or predicting the pathogenicity of a rare variant from trying to understand its biological effects, which is hard to do and hard to quantify or predict progress in. (You can straightforwardly model GWAS on common SNPs and how many samples you need and what variance your PGS will get, but predicting progress of pathogenicity predictors has no convenient approach.) Similarly, you can try very broad crude approaches like 'select embryos with the fewest de novo mutations'... but then you lose most of the possible variance and it'll add little.
Or is it, 'OpenAI the for-profit is doing good in the world, and they can do much more good if they can raise more money, and there's certainly no way they could raise more money without us giving up control'?
Basically, yes, that is what the argument will be. The conditionality of the current investment round is also an example of that: "we can only raise more capital on the condition that we turn ourselves into a normal (B-corp) company, unencumbered by our weird hybrid structure (designed when we thought we would need OOMs less capital than it turns out we do), and free of the exorbitant Board control provisions currently governing PPUs etc. And if we can't raise capital, we will go bust soon and will become worthless and definitely lose the AGI race, and the Board achieves none of its fiduciary goals at all. Better a quarter of a live OA than 100% of a dead one."
Well, what's the alternative? If you think there is something weird enough and suboptimal about essay formats that you are reaching for 'random chance' or 'monkey see monkey do' level explanations, that implies you think there is some much superior format they ought to be using instead. But I can't see what. I think it might be helpful to try to make the case for doing these things via some of the alternatives:
- a peer-reviewed Nature paper which would be published 2 years from now, maybe, behind a paywall
- a published book, published 3 years from starting the first draft now, which some people might get around to reading a year or two after that, and dropping halfway through (assuming you finish and didn't burn out writing it)
- a 1 minute Tiktok video by an AI person with non-supermodel looks
- a 5-minute heavily-excerpted interview on CNN
- a 750-word WSJ or NYT op-ed
- a 10-page Arxiv paper in the standard LaTeX template
- a Twitter thread of 500 tweets (which can only be read by logged-in users)
- a Medium post (which can't be read because it is written in a light gray font illegible to anyone over the age of 20. Also, it's paywalled 90% of the time.)
- a 6 hour Lex Fridman podcast interview, about 4 hours in after Lex has finished his obligatory throatclearing questions (like asking you if aliens exist or the universe is made out of love)
- interpretive dance in front of the Lincoln Memorial livestreamed on Twitch
- ...
(I'd also add in Karnofsky's blog post series.)
Sunglasses can be too cool for most people to be able to wear in the absence of a good reason. Tom Cruise can go around wearing sun glasses any time he wants, and it'll look cool on him, because he's Tom Cruise. If we tried that, we would look like dorks because we're not cool enough to pull it off and it would backfire on us. (Maybe our mothers would think we looked cool.) This could be said of many things: Tom Cruise or Kanye West or fashionable celebrities like them can go around wearing a fedora and trench coat and it'll look cool and he'll pull it off; but if anyone else tries it...
"Gwern, why don't you just buy an AI-themed ETF and 'buy the whole sector' if investing in individual stonks is so hard but you're optimistic about its long-term value?"
"How to Lose Money on the World’s Most Popular Investment Theme: Pity the investors in the three artificial-intelligence-themed ETFs that managed to lose money this year" (mirror):
There are lots of embarrassing ways to lose money, but it is particularly galling to lose when you correctly identify the theme that will dominate the market and manage to buy into it at a good moment.
Pity the investors in the three artificial-intelligence-themed exchange-traded funds that managed to lose money this year. Every other AI-flavored ETF I can find has trailed both the S&P 500 and MSCI World. That is before the AI theme itself was seriously questioned last week, when investor doubts about the price of leading AI stocks Nvidia and Super Micro Computer became obvious.
The AI fund disaster should be a cautionary tale for buyers of thematic ETFs, which now cover virtually anything you can think of, including Californian carbon permits (down 15% this year), Chinese cloud computing (down 21%) and pet care (up 10%). Put simply: You probably won’t get what you want, you’ll likely buy at the wrong time and it will be hard to hold for the long term.
Ironically enough, Nvidia’s success has made it harder for some of the AI funds to beat the wider market. Part of the point of using a fund is to diversify, so many funds weight their holdings equally or cap the maximum size of any one stock. With Nvidia making up more than 6% of the S&P 500, that led some AI funds to have less exposure to the biggest AI stock than you would get in a broad index fund. This problem hit the three losers of the year. First Trust’s $457 million AI-and-robotics fund has only 0.8% in Nvidia, a bit over half what it holds in cybersecurity firm BlackBerry. WisdomTree’s $213 million AI-and-innovation fund holds the same amount of each stock, giving it only 3% in Nvidia. BlackRock’s $610 million iShares Future AI & Tech fund was also equal weighted until three weeks ago, when it altered its purpose from being a robotics-and-AI fund, changed ticker and switched to a market-value-based index that gives it a larger exposure to Nvidia.
The result has been a 20-percentage-point gap between the best and worst AI ETFs this year. There is a more than 60-point gap since the launch of ChatGPT in November 2022 lit a rocket under AI stocks—although the ETFs are at least all up since then.
...Dire timing is common across themes: According to a paper last year by Prof. Itzhak Ben-David of Ohio State University and three fellow academics, what they call “specialized” ETFs lose 6% a year on average over their first five years due to poor launch timing.
...But mostly, look at the fees: They will be many times higher than a broad market index fund, and the dismal history of poor timing suggests that for most people they aren’t worth paying.
Also, note that the 'Blue LED' was not originally my example at all, someone else brought it up as an example.
Then maybe you shouldn't be trying to defend it (or your other two examples of engines and programming languages, for that matter), especially given that you still have not explained how 'the LED' could have been given a Nobel ever inasmuch as everyone involved was dead.
One of the problems with the Nobel Prize as a measurement or criteria is that it is not really suited for that by nature, especially given criteria like no posthumous awards. This means that it is easy to critique awarding a Nobel Prize, but it is harder to critique not awarding one. You can't give a Nobel Prize to the inventor of the engine, because they probably died a long time ago; you could have for a recent kind of engine. Similarly, you could give a Turing Award to the inventors of C (and they probably did) but the first person who created a mnemonic shorthand over raw machine opcodes during WWII or whatever was probably dead before the Turing Award was even created.
Let's take your 'inventing the LED' for example. You seem keen on interpreting the absence of a Nobel Prize here as a relative judgment about 'inventing LEDs' vs 'inventing blue LEDs'. But you don't establish that there is any reason to think this is one of the cases where the lack of an award can be validly interpreted as a snub & a judgment by the relevant committee. Is it?
Well, let's take 5 seconds to check some of the historical context here, like who would you award a prize to? I open up Wikipedia and I check the first three names. (Why three? Because Nobel Prizes are arbitrarily limited to 3 awardees.)
All 3 of them, including Oleg Losev who is described as physically creating the first bona fide LED and so seems to be the closest to "the inventor of the LED", died before or around the first commercial LED being announced (October 1962). For about a decade, early weak expensive red LEDs "had little practical use", until finally they began to replace nixie tubes. Only then did they start to take off, and only then did they start to become a revolution. (And reading this WP history, it seems like blue LEDs have wound up being more important than the original red ones anyway.)
Oleg Losev in particular died in 1942, in obscurity, and given the year, you won't be too surprised why:
Losev died of starvation in 1942, at the age of 38, along with many other civilians, during the Siege of Leningrad by the Germans during World War 2.
You can't award Nobel Prizes to the dead - and by the time it was clear LEDs were a major revolution, many of the key players were well and thoroughly dead. That is, the committee could not have awarded a Nobel Prize for 'inventing the LED', without either being prescient or awarding it to later researchers, who were lucky enough to be long-lived but did not actually invent the LED, and that would be a travesty on its own and also crowd out meritorious alternative physics breakthroughs (of which there were many in the 20th century that they are still working their way through).
So, this is one reason to not put too much stress on the absence of a Nobel Prize. Not having a Nobel Prize for work in the early-to-mid 20th century means in considerable part things like "was not killed by Hitler or Stalin", things which are not particularly related to the quality or value of your scientific research but are related to whether you can survive for the 20 or 40 years it may take for your Nobel Prize to show up.
I guess LLMs are model-free, so that's relevant
FWIW, I strongly disagree with this claim. I believe they are model-based, with the usual datasets & training approaches, even before RLHF/RLAIF.
Is "arithmetic" here simply a synonym for "Fermi estimates"?
I semi-agree with #2: if you use mostly old and highly-curated data as a "seed" dataset for generating synthetic data from, you do bound the extent to which self-replicating memes and personas and Sydneys can infect the model. If there is a Sydney-2 in later data, it obviously cannot exist in some snapshot taken at an earlier date. And if the model is fully trained with a strong personality, like a "Claude", and only then exposed to data infected by a Sydney or Sydney-2, you might reasonably expect there to be much less infection: the same way that you or I could roleplay as "Sydney" but we are in no danger of being hijacked by those outputs into being an 'actual' Sydney because we're not base models, so too a fully-trained safety-tuned LLM might have been mode-collapsed down so heavily onto 1 persona that it is far more difficult for any alternate persona to be instantiated meaningfully.
I don't think this is water-tight. It's not obvious that any known data-quality screening suffices here, or indeed, even in principle what screening you would do to block Sydney but not God or Obama or the Easter Rabbit. (How would you, for example, define a data-quality screening procedure for creating a seed dataset which could be efficiently implemented at scale on petabytes of candidate data, which you did not hand-engineer specifically to block Sydney, but which successfully blocks automatically, say, the New York Times articles containing extensive quotes from & descriptions of Sydney?) Sydney-infected data will still be there from the start of training because NYT data is so high quality, etc. But yes, the more you cut the causal loops from the datasets to other agents, the more you guard yourself from malign influences from those.
I am not sure I buy your #1. A bootstrapping synthetic data LLM seems more prone to iteratively developing steganography by training on its own datasets, because it's easier to amplify its own protocols and coordinate with itself through the many generations of bootstrapping. (This is why self-play tends to lead to such 'neuralese'.) And to the extent that they are smarter (because that approach works better & is why it's being used), they are going to be that much better at developing or learning steganography sample-efficiently.
What the hermetic seal of bootstrapping means is that the LLM/Internet pollution effect of #2 is less likely to happen... as long as, of course, you manage to cut the data off early enough that there are not enough examples to let steganography emerge somewhere in the generations of bootstraps anyway. It's far from obvious to me that even cutting off the data right now would be adequate. (There's so many kinds of covert communication or discussion already out there, after all. And current LLMs are already so smart that you don't even need to provide examples of many things, just descriptions.)
And you further have the problem that now the post-training phase is not going to be robust to this - unlike viral persona - because there's no reason to not learn steganography if it's useful. You might succeed in bootstrapping a frontier LLM which is robust to any persona viruses, sure, but when you refresh it with recent scraped data or user inputs, it notices steganography from other models in 1 odd sample, and realizes what a nifty idea that is and how it saves a lot of tokens etc, and boom! now you have a steganographic model. The model doesn't even have to be finetuned, necessarily - information might be getting smuggled around in "plain text" (like some of the more horrifying corners of Unicode) as a prefix trigger. (The longer context windows/prompts are, the more prompt prefixes can "pay their way", I'd note.) We've seen some early experiments in trying to make self-replicating prompts or texts...
Given his history, I think it is perfectly reasonable to conclude that childhood experiences directly caused BPD.
Everything you describe is equally consistent with BPD causing childhood experiences, or BPD causing memories of childhood experiences.
So to summarize: this studies turning the original meaningful inner-monologue, with meaningful varying tokens, into a 1:1 constant-token version, so those constant tokens cannot be conveying any information, but do cause the model to increase computation. It then shows that the LLM trained with constant-tokens is effectively doing the original varying-token inner-monologue, and successfully doing that while emitting the constant-token prediction by merely erasing the original token in favor of the constant-token in the final layers right before the output.
This shows that the constant-trained LLM is not amortizing compute across steps or breaking up the full compute into multiple steps, but being taught how to internally do the extended serial computation, and how to spread that out over itself as more of the LLM gets brought into action by the padding. And since it's the same computation, you can easily read it off with interpretability tools.
The fact that an LLM can do the computation internally is not too surprising. Most of these problems are intrinsically very simple: adding or multiplying a few short numbers is not an impressive feat when the LLM is using up literally trillions of multiplications of long FP16 numbers, after all. So it is not surprising if a NN is expressive enough to, say, multiply two entire numbers using hundreds of layers, billions of parameters, and trillions of multiplications. (And you can do quite a lot with very small neural nets, sometimes with just a few neurons.)
The relevance here is to steganography: what sorts of computations can be distilled into the forward pass and hidden from interpretability? What happens when selective pressure is placed on LLMs to learn how to store intermediate state in tokens in hidden ways or encrypt forward passes to hide reward-enhancing-but-forbidden thoughts from overseers bulk-deploying interpretability methods over masses of LLMs? Or to just use as few tokens as possible?
But there are no such pressures for steganography here. The 1:1 token swap encourages the LLM to stay the same. There's no real need for it to change internally, much less to change in any difficult-to-understand way. (You might as well freeze most of the weights to save yourself backprop at that point.) So, I don't learn anything about steganography or AI safety from the results as presented.
What I'd be more interested in seeing would be variants which did encourage steganography, so that the interpretability evals were more relevant:
- you could randomize the padding tokens, instead of just using an easy constant token
- you could shorten the padding tokens, like 2:1. How much does that compression distort the inner-monologues?
- you could allow a subset of padding tokens, like maybe just 0/1, and it has to learn some useful binary encoding.
- you could introduce an adversary, like a LLM co-trained on the visible "padding" tokens, and penalize the LLM if the co-LLM can figure out the answer from reading the inner-monologue's "padding" tokens. (This could be hard to implement. One simple approach would be rejection sampling: just drop all samples where the co-LLM predicts the right answer, so the original LLM only trains on samples which pass the adversary; see the sketch after this list.)
- you could randomly vary the padding ratio: sometimes train on a very small ratio, but other times train on very large ones like 1:100
- one can think of several ways a LLM might cope with expanding ratios like 1:100, in order to keep computing the right thing without screwing up by accidentally predicting the answer "too soon". Can interpretability methods distinguish between approaches like "try to keep count of exactly how many tokens and on the 100th padding token, start generating the real answer" and "internally hobble the computation so it only winds up finishing exactly on the 100th token"?
- Switching to another task like "counting" might be helpful for implementing variants.
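For the adversary bullet above, a minimal sketch of the rejection-sampling version (types and function names are hypothetical, not from any real training library):

```typescript
// Keep only the training samples whose visible "padding" tokens do NOT leak
// the answer to the co-trained adversary LLM; the main LLM then trains only
// on the samples which pass the adversary.
interface Sample {
  paddingTokens: string[]; // the visible "padding" portion of the inner-monologue
  answer: string;          // the final answer the main LLM produced
}

function filterAgainstAdversary(
  samples: Sample[],
  adversaryPredict: (paddingTokens: string[]) => string, // co-LLM's guess from padding alone
): Sample[] {
  return samples.filter((s) => adversaryPredict(s.paddingTokens) !== s.answer);
}
```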
It sounds like SC2 might just be a bad testbed here. You should not have to be dealing with issues like "but can I get a computer fast enough to run it at a fast enough speedup" - that's just silly and a big waste of your effort. Before you sink any more costs into shaving those and other yaks, it's time to look for POMDPs which at least can be paused & resumed appropriately and have sane tooling, or better yet, have continuous actions/time so you can examine arbitrary ratios.
Also, I should have probably pointed out that one issue with using LLMs you aren't training from scratch is that you have to deal with the changing action ratios pushing the agents increasingly off-policy. The fact that they are not trained or drawing from other tasks with similarly varying time ratios means that the worsening performance with worsening ratio is partially illusory: the slower player could play better than it does, it just doesn't know how, because it was trained on other ratios. The kind of play one would engage in at 1:1 is different from the kind of play one would do at 10:1, or 1:10; eg a faster agent will micro the heck out of SC, while a slow agent will probably try to rely much more on automated base defenses which attack in realtime without orders and emphasize economy & grand strategy, that sort of thing. (This was also an issue with the chess hobbling experiments: Stockfish is going to do very badly when hobbled enough, like removing its queen, because it was never trained on such bizarre impossible scenarios / alternate rulesets.) Which is bad if you are using this as some sort of AI safety argument, because it will systematically deceive you, based on the hobbled off-policy agents, into thinking slowed-down agents are less capable (ie. safer) in general than they really are. This is another reason to not use SC2 or try to rely on transfer from a pre-existing model, convenient as the latter may be.
Given both these issues, you should probably think about instead more Jones-like training an agent from scratch, simultaneously at all ratios to meta-learn competency at all ratios while sharing training in a fair fashion, on a much simpler environment. Maybe not even a POMDP, MDPs might be adequate for most of it. Something like a large tic-tac-toe board, or perhaps a continuous Pong, would be simple enough that you could afford to train very competent unhobbled agents at widely-varying ratios, and fit various scaling laws, with few GPUs.
It wouldn't collide with normal Markdown syntax use. (I can't think of any natural examples, aside from bracket use inside links, like [[editorial comment]](URL), which could be special-cased by looking for the parentheses required for the URL part of a Markdown link.) But it would be ambiguous where the wiki links point to (Sarah's Roam wiki? English Wikipedia?), and if it pointed to somewhere other than LW2 wiki entries, then it would also be ambiguous with that too (because the syntax is copied from Mediawiki and so the same as the old LW wiki's links).
And it seems like an overloading special case you would regret in the long run, compared to something which rewrote them into regular links. Adds in a lot of complexity for a handful of uses.
Methodologically, I think it would make more sense to frame it in terms of action granularity ratio, rather than using units like seconds or %s. The use of seconds here seems to make the numbers much more awkward. It'd be more natural to talk about scaling trends for Elo vs action-temporal granularity. For example, "a 2:1 action ratio translates to a 3:1 win ratio advantage (~+190 Elo)" or whatever. This lets you investigate arbitrary ratios like 3:2 and fill out the curves. (You'd wind up doing a transform like this anyway.)
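To be concrete, a sketch of the sort of transform I mean (assuming the standard logistic Elo convention):

```typescript
// Convert an observed win:loss ratio at a given action-granularity ratio into
// an Elo difference (400 * log10 of the win odds), so arbitrary granularity
// ratios can be put on a single Elo-vs-granularity curve.
function eloFromWinOdds(wins: number, losses: number): number {
  return 400 * Math.log10(wins / losses);
}

// e.g. a 3:1 win ratio is roughly a +191 Elo advantage, and 10:1 is +400:
console.log(Math.round(eloFromWinOdds(3, 1)), Math.round(eloFromWinOdds(10, 1))); // 191 400
```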
Then you can start easily going through various scaling laws, like additional finetuning samples or parameter scaling vs Elo, and bring in the relevant DRL scaling literature like Jones and temporal scaling laws for horizons/duration. (For example, you could look at horizon scaling in terms of training samples: break up each full Starcraft episode to train on increasingly truncated samples.) The thresholds you talk about might be related to the irreducible loss of the horizon RL scaling law: if there is something that happens "too quick" each action-timestep, and there is no way to take actions which affect too-quick state changes, then those too-quick events will be irreducible by agents.
I don't think LLMs do the equivalent of that. It's more like, learning Chinese from a Chinese/Chinese dictionary stapled to a Chinese encyclopedia.
It is not obvious to me that using a Chinese/Chinese dictionary, purged of example sentences, would let you learn, even in theory, even things a simple n-grams or word2vec model trained on a non-dictionary corpus does and encodes into embeddings. For example, would a Chinese/Chinese dictionary let you plot cities by longitude & latitude? (Most dictionaries do not try to list all names, leaving that to things like atlases or gazetteers, because they are about the language, and not a specific place like China, after all.)
Note that the various examples from machine translation you might think of, such as learning translation while having zero parallel sentences/translations, are usually using corpuses much richer than just an intra-language dictionary.
everybody want to test rats in mazes, ain't nobody want to test this janky-ass maze!
One of the interesting things I found when I finally tracked down the source is that one of the improved mazes before that was a 3D maze where mice had to choose vertically, keeping them in the same position horizontally, because otherwise they apparently were hearing some sort of subtle sound whose volume/direction let them gauge their position and memorize the choice. So Hunter created a stack of T-junctions, so each time they were another foot upwards/downwards, but at the same point in the room and so the same distance away from the sound source.
Perhaps the norm should be to use some sort of LLM-based survey service like https://news.ycombinator.com/item?id=36865625 in order to try to get a more representative population sample of LLM outputs?
This seems like it could be a useful service in general: do the legwork to take base models (not tuned models), and prompt in many ways and reformulate in many ways to get the most robust distribution of outputs possible. (For example, ask a LLM to rewrite a question at various levels of details or languages, or switch between logically equivalent formulations to avoid acquiescence bias; or if it needs k shots, shuffle/drop out the shots a bunch of times.)
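To sketch what such a service might do (the sampleModel parameter is a stand-in for whatever base-model completion API gets used, not a real client library): generate many logically-equivalent phrasings and shot-orderings of the same question, sample each a number of times, and report the whole answer distribution rather than a single completion.

```typescript
// Ask the same question under many paraphrases and shuffled few-shot orderings,
// then tally the answers into a distribution.
async function robustDistribution(
  sampleModel: (prompt: string) => Promise<string>, // stand-in for the completion API
  paraphrases: string[],                            // logically-equivalent rewordings of the question
  shots: string[],                                  // few-shot examples, shuffled per prompt
  samplesPerPrompt = 10,
): Promise<Map<string, number>> {
  const counts = new Map<string, number>();
  for (const question of paraphrases) {
    const shuffled = [...shots].sort(() => Math.random() - 0.5); // cheap shuffle, fine for a sketch
    const prompt = [...shuffled, question].join("\n\n");
    for (let i = 0; i < samplesPerPrompt; i++) {
      const answer = (await sampleModel(prompt)).trim();
      counts.set(answer, (counts.get(answer) ?? 0) + 1);
    }
  }
  return counts;
}
```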
It is worth noting that the Pros made more extreme forecasts than the bots. The Pros were not afraid to forecast less than 2% or more than 90%, while the bots stayed closer to 50% with their forecasts.
This sounds like an example of 'flattened logits' or loss of calibration in tuned models. I take it that all of the models involved were the usual RLHF/instruction-tuned models, and no efforts were made to use base models like the original davincis or llama-3-405b-base, which ought to have better calibration?
Yes, people have been pulling this sort of semantic knowledge out of word embeddings since the start. Here is a long list from like 5 years ago, going far beyond just geographic locations: https://gwern.net/gpt-2#fn11
This is one of the reasons that people have rejected the claims that LLMs are doing anything special: because after all, just a word2vec, which barely even counts as a neural net, or n-grams, seems able to 'learn' a lot of the same things as a LLM does, even though it's "obviously" not a world model. (It's a modus ponens/tollens thing.)
One of the coolest demonstrations of extracting world models (and demonstrating the flaws in the learned world models due to a lack of inductive priors) is a paper on inferring the exact street connectivity & geography of New York City from training on taxi cab trajectories: https://x.com/keyonV/status/1803838591371555252 https://arxiv.org/abs/2406.03689
This is certainly the more convoluted explanation, but it certainly matches with my observations of SBF's psychology, from well-before the FTX blowup.
I disagree. I think Altman is, in many respects, the exact opposite of SBF, and your read of his personality is wrong. This is why you can't predict things like Sutskever & Murati leaving OA without being pushed (and in fact Altman going to lengths to keep them), while I could. I encourage you to go back and reread things like the New Yorker profile or discussions of his highschool career or his abortive political run or UBI experiment with that in mind.
This thought experiment is unrealistic
many such cases
Unsurprisingly, black and white top the list, along with some other neutrals; red, a perennial favorite, is the top non-neutral color.
The ABRSM is in X days. It too does not care how efficient you were time-wise in getting to grade-8 competency. There are no bonus points for sample-efficiency.
(And of course, it's not like Asian parents are doing their kids much good in the first place with that music stuff, so there's even less of an issue there.)