AIOS
post by samhealy · 2023-12-31T13:23:56.552Z · LW · GW · 5 comments
Summary
A reductio ad absurdum of the position that deep neural networks, given enough time, can do literally anything. In playful Socratic dialogue form.
Welcome to AIOS!
Snay/Coyle Systems is proud to announce AIOS.
Its notional 'clock speed' is 60Hz, representing a typical screen's refresh rate and a typical human interface device's polling rate. But it's no slouch!
Its outputs are just over 1.67 billion bits, representing up to 200MB of arbitrary data to be sent across a 100Gbps Ethernet connection per tick, a 4K video frame buffer/monitor output, and 0.0166 seconds of multi-channel high-definition audio.
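(For scale, a rough back-of-envelope of those per-tick output figures. The 24-bit colour depth and the 8-channel 48 kHz / 24-bit audio format below are illustrative assumptions, since the announcement doesn't specify them; only the 100Gbps link, the 60Hz tick and the 4K frame come from the spec.)

```python
# Back-of-envelope sizes for the advertised per-tick outputs. The colour depth
# and audio format are assumptions for illustration.
TICK_S = 1 / 60

ethernet_bits = 100e9 * TICK_S            # one tick's worth of a 100 Gbps link
frame_bits = 3840 * 2160 * 24             # one 4K frame at 24 bits per pixel
audio_bits = 8 * 48_000 * 24 * TICK_S     # one tick of 8-channel 24-bit/48 kHz audio

print(f"Ethernet share : {ethernet_bits:.3e} bits (~{ethernet_bits / 8e6:.0f} MB)")
print(f"4K frame       : {frame_bits:.3e} bits")
print(f"Audio          : {audio_bits:.3e} bits")
```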
Its inputs are its own previous output state and a 0.0166-second buffer of data from the Ethernet connection and standard human interface devices: keyboard, mouse/trackpad/touchscreen, and in future versions a microphone and webcam.
Its hidden layers contain 10^x neurons where x is very, very big. It was trained on 10^y sessions from consenting human users of Snay/Coyle's 'traditional' (and soon to be obsolete) human-coded operating system BSOS, where y is also very big.
The model was fine-tuned by RLHF.
And now it runs a usable facsimile of BSOS, without a single line of code!
That's the dumbest thing I ever heard.
Can you say why?
I mean, where to begin? First of all, of course there's code involved. How does the model run inference?
Fair point, that was a bit of a liberty. Obviously the model is running on standard CPU/GPU architecture. But after training, the weights of the model itself do not contain a trace of BSOS source code. Also gone are concepts like registers, instruction sets, CPU caching, discrete graphics processing, kernel space, user space, all those pesky details.
Fine, whatever. So you're saying that AIOS will feel exactly like BSOS to a human user?
As you know, DNNs are curve-fitting approximators, of necessarily finite precision, and even when running in discretised form on digital hardware, their outputs' least significant bits will stray somewhat from any precise, analytical ground truth. But the universal approximation theorem tells us we just need to keep increasing the approximation's precision to stay within an arbitrarily small error range.
So, will AIOS's outputs exactly match those of BSOS? No. Will they appear the same to human users? Yes! And if an outlying human savant can somehow tell the difference, then we'll increase our precision, retrain and release another version.
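(For reference, the theorem being leaned on above, in one standard arbitrary-width form and stated informally; many variants exist.)

```latex
% One classical (arbitrary-width) universal approximation theorem: any
% continuous function on a compact domain can be approximated to within
% epsilon by a one-hidden-layer network with a non-polynomial activation.
\[
\forall f \in C(K, \mathbb{R}^m),\; K \subset \mathbb{R}^n \text{ compact},\; \forall \varepsilon > 0:\;
\exists \text{ a one-hidden-layer network } g \;\text{such that}\;
\sup_{x \in K} \lVert f(x) - g(x) \rVert < \varepsilon.
\]
```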
I can almost buy that, but it's not the whole story.
What if a software developer tries to write and compile code in an IDE in AIOS? Executable files typically carry a checksum, a hash of their contents, so the OS can verify the data isn't corrupt. If the checksum doesn't match the contents, the OS will refuse to run the file, because if it did, it would almost certainly crash. Executable code, and source code for that matter, is extremely sensitive to errors. A single flipped bit can bork a machine.
How can AIOS output an executable approximation of an executable file, when anything less than 100% accuracy spells 100% failure?
Since most non-executable documents are checksum-validated these days too, I foresee AIOS spewing out useless drivel that no system, not even itself if it successfully emulates its parent's own security protocols, can interpret.
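(To make the brittleness concrete, here is a minimal illustration using Python's standard hashlib; the "executable" bytes are just a stand-in. Flip one bit and the checksum no longer matches, so validation fails.)

```python
# A single flipped bit is enough to invalidate a checksum. The "executable"
# here is a stand-in byte string; any real code-signing or package-integrity
# scheme behaves the same way, just with more ceremony.
import hashlib

executable = bytearray(b"\x7fELF...imagine a real binary here...")
good_digest = hashlib.sha256(executable).hexdigest()

executable[10] ^= 0b00000001          # flip one bit somewhere in the middle
bad_digest = hashlib.sha256(executable).hexdigest()

print(good_digest)
print(bad_digest)
print("validation passes?", good_digest == bad_digest)   # False: the OS refuses to run it
```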
Ah yes, we did encounter this issue during development. The solution was simple, and will be familiar to you now: increase precision! We simply ballooned the model, its training data and its training time until its outputs were bit-identical to BSOS's under all reasonable circumstances. I can give you the technical details...
Oh please do.
Our earlier prototypes used a custom activation function so that each of the roughly 1.67 billion outputs would always be either 0 or 1. In hindsight this was a mistake, as it shielded the least-significant-bit jitter I mentioned above from our efforts to minimise it. To gain more control, we replaced the custom function with a standard softmax and just rounded the continuous [0, 1] output value to the nearest integer. With this setup (a sort of trivial argmax and a sort of trivial additional layer) we could penalise the model for giving pre-output values too close to 0.5, and keep training until all pre-output values under reasonable circumstances are sufficiently close to 0 or 1 that the ensemble is bit-identical to BSOS.
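(A hypothetical sketch, in PyTorch, of the output head being described: a two-way softmax per output bit, rounded to 0/1, plus a penalty for winning probabilities that sit too close to 0.5. The function name, the margin value and the framework choice are illustrative, not from the post.)

```python
# Hypothetical output head: per-bit two-way softmax, rounded to 0/1 at
# inference, plus a training penalty for probabilities near 0.5.
import torch
import torch.nn.functional as F

def bits_and_confidence_penalty(logits: torch.Tensor, margin: float = 0.4):
    """logits: (batch, n_output_bits, 2) raw scores for bit=0 vs bit=1."""
    probs = F.softmax(logits, dim=-1)          # per-bit distribution over {0, 1}
    p_one = probs[..., 1]                      # probability that the bit is 1
    bits = (p_one >= 0.5).to(torch.uint8)      # the 'trivial argmax': round to 0 or 1

    # Penalise any bit whose probability sits within `margin` of 0.5,
    # pushing the model towards confidently 0 or confidently 1.
    distance_from_half = (p_one - 0.5).abs()
    penalty = F.relu(margin - distance_from_half).mean()
    return bits, penalty

# Tiny usage example with random logits standing in for the model's pre-outputs.
logits = torch.randn(2, 16, 2)                 # batch of 2, 16 output bits each
bits, penalty = bits_and_confidence_penalty(logits)
print(bits.shape, float(penalty))
```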
That's the second time you've used that phrase, 'under reasonable circumstances'. I suspect the words conceal a multitude of dissembling.
In what way?
How can you guarantee that every state of a Turing-complete finite state machine is accessible to an approximation? Isn't it in the definition of an approximation that some states of what it's approximating are inaccessible to it?
Well, yes, but for the majority of users —
What majority, in both senses: what proportion quantitatively, and what demographic qualitatively?
In our focus groups, 99.4% of self-reported casual users had a fully satisfactory experience with AIOS, and even among academics and engineers the figure was 91%!
Two and a half sigma isn't bad for casual users, I'll give you that. It's the other figure that's more telling. And what it's telling is that AIOS, like all other neural net-based models, only works in the semantic vicinity of its training set. Try to do anything really new with it, and it will fail. The academics and engineers are the ones exploring, or at least trying to explore, virgin epistemic space. Among your focus group, perhaps the 91% were those who were only trying...
Anyway, I'm pretty convinced you're jerking my chain now. But can I ask you a few more questions?
Of course!
Since AIOS has access to the Internet, what's to stop it from cheating by relaying its inputs to an instance of BSOS and passing off that instance's outputs as its own?
Nothing, but we've monitored its traffic since the beginning, and have detected no such activity.
And what's to stop Snay/Coyle Systems from cheating by simply running BSOS instances instead of AIOS ones?
I'm hurt that you would even suggest such a thing!
Okay, let's assume good faith. In any case, this line of thinking brings me to my final questions. How much compute time does it take to train each generation of AIOS, and how much to run each inference tick?
Gosh, you're a curious thing, aren't you? Let me check my notes... Ah, here we are. Generation zero required 3.4 × 10^44 floating-point operations, generation one 6.1 × 10^47, and generation two 1.34 × 10^50. An inference tick needs a mere 2 × 10^13!
Aaaaaaand we're done. The universe has only existed for 4.32 × 10^17 seconds, so unless Snay/Coyle Systems has Kardashev type II resources or is using some quantum computing legerdemain, you're talking hogwash. (Not hogwash that so much compute time would be needed; hogwash that so much compute time could ever be marshalled in practice, whether for training that ends before the universe does, or for inference anywhere close to real time on anything less than a military-grade supercomputer. Let's not even speculate about how or where you got your quadrillions of hours of training data.)
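(A quick check of those figures against the dialogue's own numbers; the 10^18 FLOP/s exascale-machine throughput used for comparison is an assumption added here, not something from the post.)

```python
# Rough feasibility check using the numbers quoted in the dialogue. The
# exascale throughput (1e18 FLOP/s) is an assumed reference point.
UNIVERSE_AGE_S = 4.32e17            # seconds, as quoted above
TRAIN_FLOPS_GEN0 = 3.4e44           # generation-zero training cost, as quoted
INFERENCE_FLOPS_PER_TICK = 2e13     # per-tick inference cost, as quoted
TICKS_PER_SECOND = 60

EXASCALE_MACHINE = 1e18             # assumed sustained FLOP/s of one exascale machine

training_seconds = TRAIN_FLOPS_GEN0 / EXASCALE_MACHINE
print(f"Training gen 0 on one exascale machine: {training_seconds:.2e} s")
print(f"...which is {training_seconds / UNIVERSE_AGE_S:.2e} times the age of the universe")

sustained = INFERENCE_FLOPS_PER_TICK * TICKS_PER_SECOND
print(f"Real-time inference needs {sustained:.2e} FLOP/s sustained")
```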
What's more, AIOS has plenty of hidden shortcomings. Its output size of roughly 1.67 billion binary nodes is presumably determined by the product of a 100 Gigabit connection's bandwidth and a typical interface's 1/60s refresh time (plus some extra for the graphics frame and sound). What if AIOS needs more than the 200MB of working memory such a configuration provides for a user-specified task? What if the user has a real-time application where minimal latency is crucial and a hard limit of 0.0166 seconds is laughably high? (Your reply to both questions will no doubt be some variation on "expand the model until said requirement is surpassed", but I reject this because the model is already impracticably oversized.)
Anyway, thank you for proving my point. Neural nets are great where approximations and distributions over pre-existing data are concerned. But they're the wrong tool in situations where good outputs are sparse and do not cluster or form continuously traversable paths in latent space.
There are and always will be cases where replacing a non-AI system with an AI one, just because you can, is equivalent to building a wildly inefficient Rube Goldberg / Heath Robinson machine.
Epilogue
Can I ask you a question now?
Oh. Okay, sure.
Didn't you write elsewhere [LW · GW] arguing the opposite of what you're arguing here? Didn't you say we should be frightened of generative models' asymptotic ability to create anything that humans can create?
Yes, in that post [LW · GW] I did claim that we will confront some rough metaphysical weather when GenAI (inevitably?) reaches a singularity after which no human can know whether any new content is human-made or generative.
I think it's different from and compatible with the current post's thesis in two ways.
First, the worst consequence of the generative singularity is not technical but psychological: humans will feel paranoid and/or apathetic and/or confused and/or angry about the new normal, in which art consumers don't know where the things they consume came from, and artists have to spend time convincing those consumers that the work really did come from them. All this can happen in a universe that also forbids practical neural operating systems.
Second, most or all forms of creative art can (for better or worse) be encoded in latent spaces where good outputs do cluster and form continuously traversable paths. So they are inherently susceptible to generative approximation in a way that useful finite state machines are not.
5 comments
Comments sorted by top scores.
comment by Mitchell_Porter · 2023-12-31T20:41:49.081Z · LW(p) · GW(p)
Your examples of tasks that are hard to approximate - irreducibly complex calculations like checksums; computational tasks that inherently require a lot of time or memory - seem little more than speed bumps on the road to surpassing humanity. An AI can do such calculations the normal way if it really needs to carry them out; it can imitate the appearance of such calculations if it doesn't need to do so; and meanwhile it can use its power to develop superhuman problem-solving heuristics, to surpass us in all other areas...
↑ comment by samhealy · 2024-01-01T12:58:41.734Z · LW(p) · GW(p)
Agreed, largely.
To clarify, I'm not arguing that AI can't surpass humanity, only that there are certain tasks for which DNNs are the wrong tool and a non-AI approach is and possibly always will be preferred.
> An AI can do such calculations the normal way if it really needs to carry them out
This is a recapitulation of my key claim: that any future asymptotically powerful A(G)I (and even some current ChatGPT + agent services) will have non-AI subsystems for tasks where precision or scalability is more easily obtained by non-AI means, and that there will probably always be some such tasks.
comment by Gurkenglas · 2023-12-31T16:58:39.911Z · LW(p) · GW(p)
> 3.4 × 10^44
Where is your reductio getting these numbers?
↑ comment by samhealy · 2023-12-31T17:43:31.493Z · LW(p) · GW(p)
Plucked from thin air, to represent the (I think?) reasonably defensible claim that a neural net intended to predict/synthesise the next state (or short time series of states) of an operating system would need to be vastly larger and require vastly more training than even the most sophisticated LLM or diffusion model.
↑ comment by samhealy · 2024-01-02T10:44:21.305Z · LW(p) · GW(p)
To clarify: I didn't just pick the figures entirely at random. They were based on the below real-world data points and handwavy guesses.
- ChatGPT took roughly 3.23 × 10^23 FLOPs to train
- ChatGPT has a context window of 8K tokens
- Each token is roughly equivalent to four 8-bit characters = 4 bytes, so the context window is roughly equivalent to 4 × 8192 = 32KB
- The corresponding 'context window' for AIOS would need to be its entire 400MB+ input, a linear scaling factor of 1.25 × 10^4 from 32KB, but the increase in complexity is likely to be much faster than linear, say quadratic
- AIOS needs to output as many of the 2^(200 × 8 × 10^6) output states as apply in its (intentionally suspect) definition of 'reasonable circumstances'. This is a lot lot lot bigger than an LLM's output space
- (3.23 × 10^23) × (input scaling factor of 1.56 × 10^8) × (output scaling factor of a lot lot lot) = conservatively, 3.4 × 10^44
- The current (September 2023) estimate of global compute capacity is 3.98 × 10^21 FLOPS. So if every microprocessor on earth were devoted to training AIOS, it would take about 10^23 seconds, i.e. roughly 3 × 10^15 years. Too long, I suspect. (A quick arithmetic check is sketched below.)
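A minimal sketch of that arithmetic, using only the figures quoted in the list above; the "output scaling factor" is derived backwards from the quoted 3.4 × 10^44 total, since it is only described as "a lot lot lot".

```python
# Reproducing the comment's back-of-envelope scaling from the quoted figures.
CHATGPT_TRAIN_FLOPS = 3.23e23
CHATGPT_CONTEXT_BYTES = 32e3                     # ~32KB, as in the list above
AIOS_CONTEXT_BYTES = 400e6                       # the 400MB+ input buffer
AIOS_TRAIN_FLOPS = 3.4e44                        # the quoted generation-zero cost

linear_input_factor = AIOS_CONTEXT_BYTES / CHATGPT_CONTEXT_BYTES
quadratic_input_factor = linear_input_factor ** 2
print(f"Input scaling: {linear_input_factor:.3g} linear, {quadratic_input_factor:.3g} quadratic")

implied_output_factor = AIOS_TRAIN_FLOPS / (CHATGPT_TRAIN_FLOPS * quadratic_input_factor)
print(f"Implied output scaling factor ('a lot lot lot'): {implied_output_factor:.3g}")

GLOBAL_FLOPS = 3.98e21                           # estimated global compute, FLOP/s
SECONDS_PER_YEAR = 3.156e7
train_seconds = AIOS_TRAIN_FLOPS / GLOBAL_FLOPS
print(f"Training on all of Earth's compute: {train_seconds:.2e} s "
      f"= {train_seconds / SECONDS_PER_YEAR:.2e} years")
```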
I'm fully willing to have any of this, and the original post's argument, laughed out of court given sufficient evidence. I'm not particularly attached to it, but haven't yet been convinced it's wrong.