LessWrong 2.0 Reader
If anyone wants to work on this or knows people who might, I'd be interested in funding work on this (or helping secure funding - I expect that to be pretty easy to do).
faul_sname on We are headed into an extreme compute overhang
AGIs derived from the same model are likely to collaborate more effectively than humans because their weights are identical. Any fine-tune can be applied to all members, and text produced by one can be understood by all members.
I think this only holds if fine-tunes are composable, which as far as I can tell they aren't (fine-tuning on one task subtly degrades performance on a bunch of other tasks, which isn't a big deal if you fine-tune a little for performance on a few tasks, but does mean you probably can't take a million independently fine-tuned models and merge them into a single super-model of the same size with the same performance on all million tasks).
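For concreteness, the kind of merge being argued against might look like the naive delta-averaging below. This is only a minimal sketch under simple assumptions (weights as plain dictionaries of arrays); the function and layer names are hypothetical, and nothing here guarantees per-task performance survives, which is the commenter's point.

    import numpy as np

    def merge_finetunes(base_weights, finetuned_weights_list):
        """Naive merge: average each fine-tune's weight delta onto the base."""
        merged = {}
        for name, base in base_weights.items():
            deltas = [ft[name] - base for ft in finetuned_weights_list]
            merged[name] = base + np.mean(deltas, axis=0)
        return merged

    # Toy usage: two pretend "fine-tunes" of a single 4x4 weight matrix.
    base = {"layer0.weight": np.zeros((4, 4))}
    ft_a = {"layer0.weight": base["layer0.weight"] + 0.10}  # pretend task-A fine-tune
    ft_b = {"layer0.weight": base["layer0.weight"] - 0.05}  # pretend task-B fine-tune
    merged = merge_finetunes(base, [ft_a, ft_b])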
Also, there are sometimes mornings when I can't understand code I wrote the previous night, when all of the necessary context was fresh to me, despite being the same person. I expect that LLMs will exhibit the same behavior: some things will be hard to understand when examined outside the context that generated them.
That's not to say a world in which there are a billion copies of GPT-5 running concurrently will have no major changes, but I don't think a single coherent ASI falls out of that world.
snewman on We are headed into an extreme compute overhang
Assuming we require a performance of 40 tokens/s, the training cluster can run (2000 / 30) × 24,000 = 1,600,000 concurrent instances of the resulting 70B model
Nit: you mixed up 30 and 40 here (should both be 30 or both be 40).
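To make the quoted arithmetic explicit, here is a minimal sketch; the 2000 tokens/s per-GPU throughput and the 24,000-GPU cluster size are read off the quoted calculation rather than independently sourced.

    # Concurrent instances the training cluster could serve, per the quoted calculation.
    gpu_throughput_tok_s = 2000   # assumed per-GPU generation throughput for a 70B model
    per_instance_tok_s = 30       # required speed per instance (the prose says 40; see the nit above)
    num_gpus = 24_000             # assumed size of the training cluster

    instances = gpu_throughput_tok_s / per_instance_tok_s * num_gpus
    print(f"{instances:,.0f}")    # 1,600,000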
I will assume that the above ratios hold for an AGI level model.
If you train a model with 10x as many parameters, but use the same training data, then it will cost 10x as much to train and 10x as much to operate, so the ratios will hold.
In practice, I believe it is universal to use more training data when training larger models? That would imply the ratio actually increases (which further supports your thesis).
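As a rough check on both claims, here is a minimal sketch using the standard approximations of ~6·N·D FLOPs to train and ~2·N FLOPs per generated token; the token counts and the one-month training time are illustrative assumptions, not figures from the post.

    def concurrent_instances(n_params, n_train_tokens, train_seconds, tokens_per_s=30):
        """Instances the training cluster can serve at inference, assuming it sustains
        the same FLOP/s it needed to finish training in train_seconds."""
        train_flops = 6 * n_params * n_train_tokens          # ~6*N*D rule of thumb
        cluster_flops_per_s = train_flops / train_seconds    # implied cluster size
        instance_flops_per_s = 2 * n_params * tokens_per_s   # ~2*N FLOPs per token
        return cluster_flops_per_s / instance_flops_per_s

    month = 30 * 86_400
    base      = concurrent_instances(70e9,  2e12, month)   # 70B model on 2T tokens
    same_data = concurrent_instances(700e9, 2e12, month)   # 10x params, same data
    more_data = concurrent_instances(700e9, 2e13, month)   # 10x params, 10x data
    # base == same_data (ratio holds); more_data is 10x larger (ratio increases).
    print(f"{base:,.0f} {same_data:,.0f} {more_data:,.0f}")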
On the other hand, the world already contains over 8 billion human intelligences. So I think you are assuming that a few million AGIs, possibly running at several times human speed (and able to work 24/7, exchange information electronically, etc.), will be able to significantly "outcompete" (in some fashion) 8 billion humans? This seems worth further exploration / justification.
sdm on Uncontrollable Super-Powerful Explosives
In the late 1940s and early 1950s, nuclear weapons did not provide an overwhelming advantage against conventional forces. Being able to drop dozens of ~kiloton-range fission bombs on eastern European battlefields would have been devastating, but not enough by itself to win a war. Only when you got to hundreds of silo-launched ICBMs with hydrogen bombs could you have gotten a true decisive strategic advantage.
jsd on Scaling of AI training runs will slow down after GPT-5
Amazon recently bought a 960MW nuclear-powered datacenter.
I think this doesn't contradict your claim that "The largest seems to consume 150 MW" because the 960MW datacenter hasn't been built (or there is already a datacenter there but it doesn't draw that much power for now)?
steve-french on ACX Atlanta Meetups Everywhere Spring 2024
Great!
nina-rimsky on Reducing sycophancy and improving honesty via activation steering
I am contrasting generating an output by:
E.g., for common misconceptions, maybe most humans would hold a certain misconception (like that South America is west of Florida), but we want the LLM to realize that we want it to actually say how things are (given that it likely does represent this fact somewhere).
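For readers unfamiliar with the technique under discussion, here is a minimal sketch of the general activation-steering idea: derive a direction from contrasting prompts and add it to a transformer block's residual-stream output during generation. The layer index, scale, and helper names are illustrative assumptions, not the post's actual implementation.

    import torch

    def steering_vector_from_contrast(pos_acts, neg_acts):
        """Mean difference between activations on contrasting (e.g. honest vs. sycophantic) prompts."""
        return pos_acts.mean(dim=0) - neg_acts.mean(dim=0)

    def steering_hook(steering_vector, scale=1.0):
        """Forward hook that adds a fixed direction to a block's hidden-state output."""
        def hook(module, inputs, output):
            if isinstance(output, tuple):
                hidden = output[0]
                vec = steering_vector.to(device=hidden.device, dtype=hidden.dtype)
                return (hidden + scale * vec,) + output[1:]
            vec = steering_vector.to(device=output.device, dtype=output.dtype)
            return output + scale * vec
        return hook

    # Hypothetical usage on a HuggingFace-style decoder (layer index chosen arbitrarily):
    # vec = steering_vector_from_contrast(honest_acts, sycophantic_acts)
    # handle = model.model.layers[13].register_forward_hook(steering_hook(vec, scale=-1.0))
    # model.generate(...)
    # handle.remove()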
johannes-c-mayer on [Concept Dependency] Concept Dependency Posts
Adopted.
nathan-young on Nathan Young's Shortform
Nevertheless, lots of people were hassled. That has real costs, both to them and to you.
quila on [Concept Dependency] Concept Dependency Posts
i like the idea. it looks useful and it fits my reading style well. i wish something like this were more common - i have seen it on personal blogs before, like carado's.
i would use [Concept Dependency] or [Concept Reference] instead so the reader understands just from seeing the title on the front page. it also avoids acronym collision.