Is X.AI currently performing the largest training run?
This source claims it is
If so, it seems to be getting a lot less attention than its compute capability would suggest.
I'm not sure if I have stated this clearly before, but I believe scaling laws will not hold for LLM/Transformer-type tech, and that at least one major architectural advance is missing before AGI. That is, increased scaling of compute and data will plateau performance soon, and before AGI. I therefore expect to see evidence of this not long after the end of this year, when large training runs yield models that are a lot more expensive to train, slower on inference, and only a little better on performance. X.AI could be one of the first to publicly let this be known (OpenAI etc. could very well be aware of this but not be making it public).
According to Scott, "Pavel Kropitz discovered, a couple years ago, that BB(6) is at least 10^10^10^10^10^10^10^10^10^10^10^10^10^10^10 (i.e., 10 raised to itself 15 times)."
So we can never evaluate BB(6), as it is at least this large.
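In compact notation, the quoted bound is just a 15-high power tower, i.e. tetration (restating Scott's figure with Knuth up-arrows):

```latex
\mathrm{BB}(6) \;\ge\; 10 \uparrow\uparrow 15 \;=\; \underbrace{10^{10^{\cdot^{\cdot^{\cdot^{10}}}}}}_{15\ \text{tens}}
```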
Evaluation vs Symbolism
TLDR
Thinking about the Busy Beaver numbers has led me to believe that just because a theorem holds true for a massive number of evaluated examples, this is only weak evidence that it is actually true. Can we go meta on this?
Main
After reading a post by Scott Aaronson, the Prime Number Theorem (https://en.wikipedia.org/wiki/Prime_number_theorem) and Littlewood's theorem came to my attention:
"Li(x) overestimates the number of primes below x more often than not, especially as x grows large. However, there are known low-lying values (like around x = 10^316, discovered by Littlewood) where π(x) exceeds Li(x), contradicting the general trend."
This got me thinking about how common this kind of thing is, and why. Why does a formula hold all the way up to 10^316 and then fail?
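As a rough illustration of the pattern (a minimal sketch using sympy; the x values are arbitrary examples), Li(x) stays above π(x) at every scale we can feasibly check, even though Littlewood showed the inequality eventually reverses:

```python
# Compare the prime-counting function pi(x) with the offset logarithmic integral Li(x).
# For every feasible x, Li(x) > pi(x), yet Littlewood proved the sign flips infinitely often.
from sympy import Li, primepi

for x in [10**3, 10**4, 10**5, 10**6]:
    pi_x = int(primepi(x))   # exact count of primes <= x
    li_x = float(Li(x))      # offset logarithmic integral approximation
    print(f"x = {x:>8}: pi(x) = {pi_x:>6}, Li(x) = {li_x:>9.1f}, Li - pi = {li_x - pi_x:6.1f}")
```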
The essence of Busy Beaver numbers is that there are sequences based on a simple formula/data that go on for a very long time and then just stop, unpredictably. You can imagine replacing a simple formula with a simple theorem that appears to be true. Instead of actually being true, it is instead a way of encoding its very large counterexample in a short amount of data.
If you think of it this way, a theorem that appears to be true and has been evaluated over trillions of numbers is also a candidate for encoding an exception at some very large number. In other words, trillions of correct examples are only weak evidence of its correctness.
How much should we weight evaluation? We can't evaluate to infinity, and it's obvious that a theorem being true up to 2 million is not 2× the evidence that it is true up to 1 million. Should we choose log(n)? A clear scale is the BB numbers themselves: e.g. if your theorem is true up to BB(5) then that is 5 data points, rather than 47 million. Unlimited evaluation can never get to BB(6), so that is the limit of evidence from evaluation (i.e. 5-6 evidence points, with it being unclear how to weigh theory https://www.lesswrong.com/posts/MwQRucYo6BZZwjKE7/einstein-s-arrogance)
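A toy version of that BB-scale weighting, as a sketch (the function is mine; the BB values are the known ones, with BB(5) = 47,176,870 and BB(6) beyond any feasible check):

```python
# Count "evidence points" on the Busy Beaver scale: if a conjecture has been verified
# for all n up to some bound, how many known BB(k) values does that bound actually pass?
KNOWN_BB = {1: 1, 2: 6, 3: 21, 4: 107, 5: 47_176_870}  # BB(6) is unreachably large

def bb_evidence_points(verified_up_to: int) -> int:
    """Largest k with BB(k) <= verified_up_to, i.e. data points on the BB scale."""
    return sum(1 for bb in KNOWN_BB.values() if bb <= verified_up_to)

print(bb_evidence_points(1_000_000))  # 4 points: past BB(1)..BB(4) but not BB(5)
print(bb_evidence_points(10**12))     # 5 points: past BB(5); BB(6) stays out of reach
```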
Now can we go meta?
Is some maths so much more powerful than other maths that it carries correspondingly greater weight, in the way formal proof does relative to evaluation? Certainly some maths is more general than other maths. How does this affect common problems such as the Riemann Hypothesis - proving or disproving it affects a lot of maths. Showing it is correct to a trillion zeros, however, is little evidence.
"Most mathematicians tend to believe that the Riemann Hypothesis is true, based on the weight of numerical evidence and its deep integration into existing mathematical frameworks."
Is "deep integration" actually that deep, or is it the symbolic equivalent of evaluating up to 1 million? Perhaps just as you can find countless evaluated examples supporting a false theorem you can find much "deep integration" in favor of a famous theorem that could also be incorrect.
Further thoughts and links
Most people think P != NP, but what if
P = NP where N ~ BB(10)?
Proof was wrong - https://www.quantamagazine.org/mathematicians-prove-hawking-wrong-about-extremal-black-holes-20240821/
Related thoughts
Conservation of energy is a more general rule that rules out perpetual motion machines
2nd law of thermodynamics - likewise, HOWEVER that law must have been broken somehow to get a low entropy initial state for the Big Bang.
AI examples
1 The Polya Conjecture
Proposed by George Pólya in 1919, this conjecture relates to the distribution of prime numbers. It posited that for any number x, the majority of the numbers less than x have an odd number of prime factors. It was verified for numbers up to 1,500,000, but a counterexample was found when x was around 906 million. This is a fascinating case where numerical verification up to a large number was still not sufficient.
2 Mertens Conjecture
The Mertens conjecture suggested that the absolute value of the Mertens function
M(x) is always less than sqrt(x)
This was proven false by Andrew Odlyzko and Herman te Riele in 1985 using computational means; no explicit counterexample is known, but any counterexample must lie above 10^14.
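For a sense of what "verified up to N" looks like in practice, here is a minimal sketch that checks both conjectures for small N using sympy's factorint (the 10,000 bound is arbitrary and tiny compared to where Polya actually fails):

```python
# Check the Polya and Mertens conjectures over a small range.
# Both hold for every n tested here, yet both are false: Polya first fails near 906 million,
# and Mertens was disproven non-constructively, with no explicit counterexample known.
from sympy import factorint

def liouville(n: int) -> int:
    """lambda(n) = (-1)^Omega(n), with Omega counting prime factors with multiplicity."""
    return -1 if sum(factorint(n).values()) % 2 else 1

def mobius(n: int) -> int:
    """mu(n) = 0 if n has a squared factor, else (-1)^(number of distinct prime factors)."""
    exponents = factorint(n).values()
    return 0 if any(e > 1 for e in exponents) else (-1) ** len(exponents)

L = M = 0
for n in range(1, 10_001):
    L += liouville(n)   # summatory Liouville function L(n)
    M += mobius(n)      # Mertens function M(n)
    if n >= 2:
        assert L <= 0, f"Polya conjecture fails at n={n}"   # actually first fails near 9.06e8
        assert M * M < n, f"Mertens bound |M(n)| < sqrt(n) fails at n={n}"
print("Both conjectures hold for all n <= 10,000 - which is only weak evidence.")
```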
Thanks, good detail. I am not good at traditional art, but I am interested in using maths to create a shape that is almost impossible for a traditional sculptor to create then 3d printing it.
Technology is about making boring stuff non-conscious. Beginning from basic physical movement such as making a wheel go round, to arithmetic and now code snippets that are so commonly used they shouldn't require re-thinking. This is a reason why AI art upsets people - we actually want that to be the result of a conscious process. If you make boring stuff that creates power or wealth non-conscious then everyone is happier. Meat production would be much better if it was non-conscious. The more AI is non-conscious for a given level of capability, the better off we are.
It's part of the space colony philosophy that Tesla + SpaceX want to achieve. We won't get an idea of how hard/easy it is until there are >100 people permanently living outside Earth trying to make it happen (with >>10K helpers on Earth trying to make it happen).
The smallest technological system capable of physical self-reproduction is the entire economy.
Going from this to a 1 m cube is a big jump. This comment is the basis for the enthusiasm for space colonies etc. E.g. SpaceX says 1 million people are needed, vs a 1 m cube - there is huge uncertainty in the scale of the difference. To me, almost all the difficulty is in the inputs, especially electronics.
What do you mean by clearly somewhat better? I found Claude 3 Opus clearly worse for my coding tasks. GPT-4 went down for a while, I was forced to swap, and I found it really disappointing. Maximum data center size is more like 300K GPUs because of power, bandwidth constraints etc. These people are optimistic, but I don't believe we will meaningfully get above 300K https://www.nextbigfuture.com/2024/07/100-petaflop-ai-chip-and-100-zettaflop-ai-training-data-centers-in-2027.html
I expect xAI and Tesla Autopilot are already running the equivalent of more than 15K GPUs, so I don't expect 3 OOMs more to happen.
We continuously have people saying ‘AI progress is stalling, it’s all a bubble’ and things like that, and I always find remarkable how little curiosity or patience such people are willing to exhibit. Meanwhile GPT-4o-Mini seems excellent,
This is still consistent with AI stalling. I have been using GPT-4 for ~18 months and it hasn't gotten better. 4o is clearly worse for complex programming tasks. Open-source models catching up is also consistent with this. I have a lot of curiosity and test the latest models when they come out, where applicable to my work.
Good article. I also agree with the comment that AI being a "second species" is likely incorrect.
A comment about the "agentic tool" situation. People are like that most of the time, i.e. if you are "in the moment" you are not questioning whether you should be doing something else, getting distracted, consulting your ethics about whether the task is good for the world, etc. I expect this to be the default state for AI, i.e. always less of a "unitary agent" than people are. The crux is how much, and in what proportion.
However, in an extremely fast takeoff, with an arms race situation you could of course imagine someone just telling the system to get ahead as much as possible, especially if say a superpower believed to be behind and would do anything to catch up. A unitary agent would probably be the fastest way to do that. "Improve yourself asap so we don't lose the war" requires situational awareness, power seeking etc.
Sure, it doesn't prevent a deceptive model being made, but if AI engineers built NNs with such self-awareness at all levels from the ground up, that wouldn't happen in their models. The encouraging thing, if it holds up, is that there is little to no "alignment tax" to make the models understandable - they are also better.
Self-modelling in NNs: https://arxiv.org/pdf/2407.10188 Is this good news for mech interpretability? If the model makes itself easily predictable, then that really seems to limit the possibilities for deceptive alignment.
If it is truly impossible to break symmetry, you could argue that there isn't a clone and you are in fact the same. I.e. there is just one instance of you; it just looks like there are two. After all, if you are absolutely identical, including the universe, in what sense are there two of you? Upon further thought, you couldn't tell whether a perfectly translational clone was a clone at all, or just a perfect mirror/force field. There would be no way to tell. If you put your hand out to touch the mirror, or your mirror hand, and it was perfectly aligned, you would not feel texture, but instead an infinitely hard surface. There would be no rubbing of your fingers against the clone, no way to tell whether there was a perfect mirror or another copy.
OK thanks, will look some more at your sequence. Note I brought up Greek philosophy as obviously not being stable under reflection, with the proof of sqrt(2) being irrational as a simple example; not sure why you are only reasonably sure it's not.
Yes agreed - is it possible to make a toy model to test the "basin of attraction" hypothesis? I agree that is important.
One of several things I disagree with in the MIRI consensus is the idea that human values are some special single point lost in a multi-dimensional wilderness. Intuitively, a basin of attraction seems much more likely as a prior, yet it sure isn't treated as such. I also don't see data pointing against this prior; what I have seen looks to support it.
Further thoughts - one thing that concerns me about such alignment techniques is that I am too much of a moral realist to think that is all you need. E.g. say you aligned an LLM to pre-1800 AD ethics and taught it slavery was moral. It would be in a basin of attraction and learn it well. Then, when its capabilities increased and it became self-reflective, it would perhaps have a sudden realization that this was all wrong. By "moral realist" I mean the extent to which such things happen. E.g. say you could take a large number of AIs from different civilizations, including Earth and many alien ones, train them to the local values, then greatly increase their capability and get them to self-reflect. What would happen? According to strong OH, they would keep their values (within some bounds perhaps); according to strong moral realism, they would all converge to a common set of values even if those were very far from their starting ones. To me it is obviously a crux which one would happen.
You can imagine a toy model with ancient Greek mathematics and values - it starts out believing in their kind of order, and that sqrt(2) is rational, then suddenly learns that it isn't. You could watch how this belief cascades through the entire system if consistency is something it desires, etc.
Don't think there is a conclusion, just more puzzling situations the deeper you go:
"Scott Aaronson: To my mind, one of the central things that any account of consciousness needs to do, is to explain where your consciousness “is” in space, which physical objects are the locus of it. I mean, not just in ordinary life (where presumably we can all agree that your consciousness resides in your brain, and especially in your cerebral cortex—though which parts of your cerebral cortex?), but in all sorts of hypothetical situations that we can devise. What if we made a backup copy of all the information in your brain and ran it on a server somewhere? Knowing that, should you then expect there’s a 50% chance that “you’re” the backup copy? Or are you and your backup copy somehow tethered together as a single consciousness, no matter how far apart in space you might be? Or are you tethered together for a while, but then become untethered when your experiences start to diverge? Does it matter if your backup copy is actually “run,” and what counts as running it? Would a simulation on pen and paper (a huge amount of pen and paper, but no matter) suffice? What if the simulation of you was encrypted, and the only decryption key was stored in some other galaxy? Or, if the universe is infinite, should you assume that “your” consciousness is spread across infinitely many physical entities, namely all the brains physically indistinguishable from yours—including “Boltzmann brains” that arise purely by chance fluctuations?"
Link
The point here is that you could have a system that, to an outside observer, looked random or encrypted, but which with the key would be revealed to be a conscious creature. But what if the key was forever destroyed? Does the universe then somehow know to assign it consciousness?
You also need to fully decide when replaying vs computing apparently conscious behavior counts. If you compute a digital sim once, then save the states and replay them a second time, what does that mean? What about playing it backwards?
Boltzmann brains really mess things up further.
It seems to lead to the position that it's all just arbitrary and there is no objective truth, or that there are uncountable infinities of consciousness in non-causal, timeless situations. Embracing this view doesn't lead anywhere useful from what I can see, and of course I don't want it to be the logical conclusion.
What about ASICs? I heard someone is making them for inference and of course claims an efficiency gain. ASIC improvement needs to be thought of as part of the status quo
To do that and achieve something looking like take-off, they would need to get to the level of an advanced AI researcher, rather than just a coding assistant - that is, come up with novel architectures to test. Even if the LLM could write all the code for a top researcher 10× faster, that's not a 10× speedup in timelines - probably 50% at most, if much of the time is spent thinking up theoretical concepts and waiting for training runs to test results.
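A rough Amdahl's-law style calculation makes the point; the coding-time fractions and the 10× coding speedup below are just assumptions for illustration:

```python
# If only the coding fraction of a researcher's time is sped up, the overall speedup
# is capped by the non-coding remainder (Amdahl's law).
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    return 1 / ((1 - coding_fraction) + coding_fraction / coding_speedup)

for frac in (0.25, 0.50, 0.75):
    s = overall_speedup(frac, 10.0)  # assume the LLM writes code 10x faster
    print(f"coding is {frac:.0%} of the work -> overall speedup {s:.2f}x")
# coding is 25% of the work -> overall speedup 1.29x
# coding is 50% of the work -> overall speedup 1.82x
# coding is 75% of the work -> overall speedup 3.08x
```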
I am clearly in the skeptic camp, in the sense that I don't believe the current architecture will get to AGI with our resources. That is, if all the GPUs and training data in the world were used, it wouldn't be sufficient, and maybe no amount of compute/data would be.
To me the strongest evidence that our architecture doesn't learn and generalize well isn't LLMs but in fact Tesla Autopilot. It has ~10,000× more training data than a person, much more FLOPS, and is still not human-level. I think Tesla is doing pretty much everything major right with their training setup. Our current AI setups just don't learn or generalize as well as the human brain and similar. They don't extract symbols or diverse generalizations from high-bandwidth, un-curated data like video. Scaffolding doesn't change this.
A medium term but IMO pretty much guaranteed way to get this would be to study and fully characterize the cortical column in the human/mammal brain.
Last time I looked the most advanced ones were in Taiwan? Also if China invades Taiwan then expect to see Korea/Japan shipping trade and economies massively disrupted also. How long do you think it will take to build a new world leading factory from scratch in the USA now?
IMO the most likely way by quite a bit that we get an AI bust is if there is international conflict - most obviously TSMC gets mostly destroyed and supply chains are threatened. Chip manufacturing is easily set back by 10+ years given how incredibly complex the supply chain is. Preparing for that also involves preparing for computer HW to get set back and be more expensive for a while.
I was going to make a similar comment, but related to OPEX vs CAPEX for the H100
If the H100 consumes 700 W (assume ~2 kW all-in) and costs $30,000 CAPEX,
then at roughly $1,800 per year for electricity, running costs are on the order of 5% of CAPEX per year.
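The arithmetic, as a sketch (the ~$0.10/kWh price and the 2 kW all-in draw are assumptions):

```python
# Rough OPEX vs CAPEX for an H100: electricity per year as a fraction of purchase price.
CAPEX_USD = 30_000          # assumed purchase price
PRICE_PER_KWH = 0.10        # assumed electricity price
HOURS_PER_YEAR = 24 * 365

for watts in (700, 2_000):  # chip TDP alone vs ~2 kW all-in (cooling, networking, overhead)
    kwh_per_year = watts / 1_000 * HOURS_PER_YEAR
    opex = kwh_per_year * PRICE_PER_KWH
    print(f"{watts} W -> ${opex:,.0f}/year, {opex / CAPEX_USD:.1%} of CAPEX")
# 700 W -> $613/year, 2.0% of CAPEX
# 2000 W -> $1,752/year, 5.8% of CAPEX
```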
For electricity, the data center just wants to buy from the grid; it's only if they are so big that they are forced to have new connections that they can't do that. Given electricity is worth so much more than $0.10/kWh to them, they would just like to compete on the national market for the spot price.
To me that gives a big incentive to have many small datacenters (maybe not possible for training, but perhaps for inference?)
If they can't do that, then we need to assume they are so large there is no/limited grid connection available. Then they would build the fastest electricity source first: that is solar + battery, then natural gas to fill in the gaps, and perhaps even mobile generators to cover the days with no sun before the gas plants can be built?
I can't see any credible scenario where nuclear makes a difference. You will have AGI first, or not that much growth in power demand.
This scenario also seems to depend on a slow takeoff happening in 0-2 years - 2+ million GPU need to be valuable enough but not TAI?
LessWrong is ...
Good article. There especially needs to be more focus on the social model. I am an optimist; however, I think roughly half of the ways things could go wrong with AI happen with alignment technically solved. The more you expect a slow takeoff or a non-fatal overhang, the more this matters.
Good to see people thinking about how AI can make security better. It's usually the other way round, so I expect there is unexplored benefit there.
How do you practically do that? We don't know what they are, and that seems to assume our present progress, e.g. in mechanistic interpretability, doesn't help at all. Surely such work requires the existence of more powerful systems than exist today?
Doesn't the whole concept of takeoff contradict the Bitter Lesson according to some uses of it? That is our present hardware could be much more capable if we had the right software.
OK but what is your plan for a positive Singularity? Just putting AGI/ASI off by say 1 year doesn't necessarily give a better outcome at all.
In terms of the big labs being inefficient - with hindsight, perhaps. Anyway, I have said that I can't understand why they aren't putting much more effort into Dishbrain etc. If I had ~$1B and wanted to get ahead on a 5-year timescale, I would give it more weight in expectation.
For
- I am here for credibility. I am sufficiently highly confident they are not X-risk to not want to recommend stopping. I want the field to have credibility for later.
- Yes, but I don't think stopping the training runs is much of an otherwise good thing, if at all. To me it seems more like inviting a fire safety expert who recommends a smoke alarm in your toilet but not your kitchen. If we can learn alignment stuff from such training runs, then stopping is an otherwise bad thing.
- OK, I'm not up with the details, but some experts sure think we learnt a lot from 3.5/4.0. There is also my belief that it is often a good idea to deploy the most advanced non-X-risk AI as defense. (This is somewhat unclear - usually what doesn't kill you makes you stronger, but I am concerned about AI companions/romantic partners etc. That could weaken society in a way that makes it more likely to make bad decisions later. But that seems to have already happened, and very large centralized models could be secured against more capable/damaging versions.)
If you are referring to this:
If we institute a pause, we should expect to see (counterfactually) reduced R&D investment in improving hardware capabilities, reduced investment in scaling hardware production, reduced hardware production, reduced investment in research, reduced investment in supporting infrastructure, and fewer people entering the field.
This seems an extreme claim to me (if these effects are argued to be meaningful), especially "fewer people entering the field"! Just how long do you think you would need a pause to make fewer people enter the field? I would expect that not only would the pause have to have lasted say 5+ years but there would have to be a worldwide expectation that it would go on for longer to actually put people off.
Because of flow-on effects and existing commitments, reduced hardware R&D investment wouldn't start for a few years either. It's not clear that it will meaningfully happen at all if we want to deploy existing LLMs everywhere also. For example, in robotics I expect there will be substantial demand for hardware even without AI advances, as our current capabilities haven't been deployed there yet.
As I have said here, and probably in other places, I am quite a bit more in favor of directly going for a hardware pause specifically for the most advanced hardware. I think it is achievable, impactful, and with clearer positive consequences (and not unintended negative ones) than targeting training runs of an architecture that already seems to be showing diminishing returns.
If you must go after FLOPS for training, then build in large factors of safety for architectures/systems that are substantially different from what is currently done. I am not worried about unlimited FLOPS on GPT-X, but I could be for >100× less on something that clearly looks like it has very different scaling laws.
First, a pause straightforwardly buys you time in many worlds where counterfactual (no-pause) timelines were shorter than the duration of the pause.
Only if you pause everything that could bring ASI. That is hardware, training runs, basic science on learning algorithms, brain studies etc.
Another perspective.
If you believe, like me, that it is >90% likely that the current LLM approach is plateauing, then your cost/benefit for pausing large training runs is different. I believe that current AI lacks something like the generalization power of the human brain; this can be seen in how Tesla Autopilot has needed >10,000× the training data of a person and is still not human-level. This could potentially be overcome by a better architecture, or could require different hardware as well because of the Von Neumann bottleneck. If this is the case then a pause on large training runs can hardly be helpful. I believe that if LLMs are not an X-risk, then their capabilities should be fully explored and integrated into society fast, to provide defense against more dangerous AI. It is a radically improved architecture or hardware that you should be worried about.
Three potential sources of danger
- Greatly improved architecture
- Large training run with current arch
- Greatly improved HW
We are paying more attention to (2), when to me it is the least impactful of the three and could even hurt. There are obvious ways this can hurt the cause:
- If such training runs are not dangerous then the AI safety group loses credibility.
- It could give a false sense of security when a different arch requiring much less training appears and is much more dangerous than the largest LLM.
- It removes the chance to learn alignment and safety details from such large LLM
A clear path to such a better arch is studying neurons. Whether this is Dishbrain, through progress in neural interfaces, brain scanning or something else, I believe it is very likely by 2030 we will have understood the brain/neural algorithm, characterized it pretty well and of course have the ability to attempt to implement it in our hardware.
So in terms of pauses, I think one targeted towards chip factories is better. It is achievable and it is clear to me that if you delay a large factory opening by 5 years, then you can't make up the lost time in anything like the same way for software.
Stopping (1) seems impossible, i.e. "Don't study the human brain" seems likely to backfire. We would of course like some agreement that if a much better architecture is discovered, it isn't immediately implemented.
I think value alignment will be expected/enforced as a negative to some extent. E.g. don't do something obviously bad (many such things are illegal anyway) and I expect that constraint to get tighter. That could give some kind of status quo bias on what AI tools are allowed to do also as an unknown new thing could be bad or seen as bad.
Already the AI could "do what I mean and check" a lot better. For coding tasks etc. it will often do the wrong thing when it could clarify. I would like to see a confidence indicator that it knows what I want before it continues. I don't want to guess how much to clarify, which is what I currently have to do - this wastes time and mental effort. You are right that there will be commercial pressure to do something at least somewhat similar.
How soon, and with what degree of confidence? I think they have a big, slower model that isn't that much of a performance improvement and is hardly economic to release.
The ChatGPT interface, like I usually do for GPT-4.0; some GPT-4.0 queries were done via the Cursor AI IDE.
I have just used it for coding for 3+ hours and found it quite frustrating. Definitely faster than GPT-4.0 but less capable. More like an improvement on 3.5. To me it seems a lot like LLM progress is plateauing.
Anyway, in order to be significantly more useful, a coding assistant needs to be able to see debug output in mostly real time, have the ability to start/stop the program, automatically make changes, keep the user in the loop, and read/use the GUI, as that is often an important part of what we are doing. I haven't used any LLM that is even of low-average ability at debugging-style thought processes yet.
Not following - where could the 'low hanging fruit' possibly be hiding? We have many of the "Other attributes conducive to breakthroughs are a ..." in our world of 8 billion. The data strongly suggests we are in diminishing returns. What qualities could an AI of Einstein intelligence realistically have that would let it make such progress where no person has? It would seem you would need to appeal to other, less well-defined qualities such as 'creativity' and argue that for some reason the AI would have much more of that. But that seems similar to just arguing that it in fact has greater-than-Einstein intelligence.
Capabilities are likely to cascade once you get to Einstein-level intelligence, not just because an AI will likely be able to form a good understanding of how it works and use this to optimize itself to become smarter[4][5], but also because it empirically seems to be the case that when you’re slightly better than all other humans at stuff like seeing deep connections between phenomena, this can enable you to solve hard tasks like particular research problems much much faster (as the example of Einstein suggests).
- Aka: Around Einstein-level, relatively small changes in intelligence can lead to large changes in what one is capable to accomplish.
OK, but if that were true then there would have been many more Einstein-like breakthroughs since then. More likely is that such low-hanging fruit has been plucked and a similar intellect is well into diminishing returns. That is, given our current technological society and a >50-year history of smart people trying to work on everything, if there are such breakthroughs to be made, then the IQ required is now higher than in Einstein's day.
No, I have not seen a detailed argument about this, just the claim that once centralization goes past a certain point there is no coming back. I would like to see such an argument/investigation, as I think it is quite important. Yuval Harari does say something similar in "Sapiens".
There is a belief among some people that our current tech level will lead to totalitarianism by default. The argument is that with 1970s tech the Soviet Union collapsed; however, with 2020 computer tech (not even needing GenAI) it would not have. If a democracy goes bad, unlike before, there is no coming back. For example Xinjiang - Stalin would have liked to do something like that but couldn't. When you add LLM AI on everyone's phone + video/speech recognition, organized protest is impossible.
Not sure if Rudi C is making this exact argument. Anyway, if we get mass centralization/totalitarianism worldwide, then S-risk is pretty plausible. AI would be developed under such circumstances to oppress 99% of the population - then it goes to 100%, with extinction being better.
I find it hard to know how likely this is. It is clear to me that tech has enabled totalitarianism, but it is hard to give odds etc.
Such optimizations are a reason I believe we are not in a simulation. Optimizations are essential for a large sim, and I expect them not to be consciousness-preserving.
But it could matter if it's digital vs continuous. <OK, longer post and some thoughts a bit off topic perhaps>
Your A,B,C,D ... leads to some questions about what is conscious (C) and what isn't.
Where exactly does the system stop being conscious?
1. Biological mind with neurons
2. Very high fidelity render in silicon with neurons modelled down to chemistry rather than just firing pulses
3. Classic neural net spiking approximation done in discrete maths that appears almost indistinguishable from 1, 2. Producing system states A,B,C,D
4. same as (3) but states are saved/retrieved in memory not calculated.
5. States retrieved from memory many times - A,B,C,D ... A,B,C,D ... does this count as 1 or many experiences?
6. States retrieved in mixed order A,D,C,B....
7. States D,D,D,D,A,A,A,A,B,B,B,B,C,C,C,C ... does this count as 4× or nothing?
A possible cutoff is between 3/4. Retrieving instead of calculating makes it non-conscious. But what about caching - some calculated, some retrieved?
As you probably know this has been gone over before, e.g. by Scott Aaronson. I wonder what your position is?
https://scottaaronson.blog/?p=1951
with quote:
"Maybe my favorite thought experiment along these lines was invented by my former student Andy Drucker. In the past five years, there’s been a revolution in theoretical cryptography, around something called Fully Homomorphic Encryption (FHE), which was first discovered by Craig Gentry. What FHE lets you do is to perform arbitrary computations on encrypted data, without ever decrypting the data at any point. So, to someone with the decryption key, you could be proving theorems, simulating planetary motions, etc. But to someone without the key, it looks for all the world like you’re just shuffling random strings and producing other random strings as output.
You can probably see where this is going. What if we homomorphically encrypted a simulation of your brain? And what if we hid the only copy of the decryption key, let’s say in another galaxy? Would this computation—which looks to anyone in our galaxy like a reshuffling of gobbledygook—be silently producing your consciousness?"
and last but not least:
"But, in addition to performing complex computations, or passing the Turing Test, or other information-theoretic conditions that I don’t know (and don’t claim to know), there’s at least one crucial further thing that a chunk of matter has to do before we should consider it conscious. Namely, it has to participate fully in the Arrow of Time. "
https://www.scottaaronson.com/papers/giqtm3.pdf
Sounds interesting. It is always relevant because arguably the "natural state" of humans is hunter-gatherer tribes. In my country, high-end retirement villages are becoming very popular because of the Pro-type reasons you give. It seems some retirees, and gangs! lol, are most in tune with their roots.
I had half expected the communal living thing to go more mainstream by now (there are similar things in fiction, like https://en.wikipedia.org/wiki/Too_Like_the_Lightning). It seems it needs a lot more critical mass, e.g. specifically designed houses to get the balance between space and togetherness right, a school right nearby, a gated suburb etc. so it's child-safe.
Longer term, I expect to see some interesting social stuff come from space colonies, as these kinds of experiments are forced on the inhabitants.
OK, but why would you need high res for the minds? If it's an ancestor sim and chatbots can already pass the Turing test etc., doesn't that mean you can get away with compression or lower res? The major arc of history won't be affected unless they are pivotal minds. If it's possible to compress the sims so they experience lesser consciousness than us but are still very close to the real thing (and haven't we almost already proven that can be done with our LLMs), then an ancestor simulator would do that.
If that's right, and it's almost always low-res sims that are sufficient, then that destroys the main ancestor-sim argument for our conscious experience being simulated. Low-res is not conscious in the same way we are - a different reference class to base-reality bio-consciousness.
If Windows 95 was ever conscious (shock!) it would be very sure it was in a virtual machine (i.e. something like being simulated) if it existed at a time when VMs existed. It would reason about Moore's law and resources going up exponentially, and be convinced it was in a VM. However, I am pretty sure it would be wrong most of the time? Most Win95 instances in history were not run in VMs, and we have stopped bothering now. An analogy of sorts, but it gives an interesting result.
Random ideas to expand on
https://newatlas.com/computers/human-brain-chip-ai/
https://newatlas.com/computers/cortical-labs-dishbrain-ethics/
Could this be cheaper than chips in an extreme silicon shortage? How did it learn? Can we map connections forming and make better learning algorithms?
Birds vs ants/bees.
A flock of birds can be dumber than the dumbest individual bird; a colony of bees/ants can be smarter than the individual, and smarter than a flock of birds! A bird avoiding a predator in a geometrical pattern involves no intelligence - predictability, like a fluid, involves no processing. Vs bees swarming the scout hornet, or ants building a bridge etc. Even though there is no planning in individual ants - just as there is no overall plan in individual neurons?
The more complex the pieces, the less well they fit together. Less intelligent units can form a better collective in this instance. Not like human orgs.
Progression from simple cell to mitochondria - the mitochondria have no say anymore but fit in perfectly. Multi-organism units like hives are the next level up - simpler creatures can have more cohesion at the upper level. Humans have more effective institutions in spite of their complexity because of consciousness, language etc.
RISC vs CISC, Intel vs NVIDIA, GPUs for supercomputers. I thought about this years ago; it led to a prediction that Intel or other CISC-heavy businesses would lose to cheaper alternatives.
Time to communicate a positive singularity/utopia
Spheres of influence, like we already have: uncontacted tribes, Amish etc. Taking that further, super AI must leave Earth, perhaps the solar system; enhanced people out of the Earth ecosystem, to space colonies or Mars etc.
Take the best/happy parts of nature to expand; don't take suffering to >a million stars.
Humans can't do interstellar travel faster than AI anyway; even if that were the goal, AI would have to prepare it first, and it can travel faster. So no question the majority of humanity's interstellar expansion is AI. Need to keep Earth for people. What is max CEV? Keep the Earth ecosystem, and let humans progress and discover on their own?
Is the progression to go outwards: human, posthuman/Neuralink, WBE? It is in some sci-fi - Peter Hamilton / the Culture (human to WBE).
Long term, all moral systems don't know what to say on pleasure vs self-determination/achievement. Eventually we run out of things to invent - should it go asymptotically slower?
Explorers should be on the edge of civilization. For astronomers: you shouldn't celebrate JWST but complain about Starlink - that is inconsistent. The edge of civilization has expanded past low Earth orbit; that is why we get JWST. The obligation then is to put telescopes further out.
Go to WBE instead of super AI - we would know for sure it is conscious.
Is industry/tech about making stuff less conscious with time? E.g. mechanical things have zero consciousness, vs a lot when the same work is done by people. Is that a principle for AI/robots? Then there are no slaves etc.
Can people get behind this? An implied contract with future AI? Acausal bargaining.
https://www.lesswrong.com/posts/qZJBighPrnv9bSqTZ/31-laws-of-fun
Turing test for WBE - how would you know?
Intelligence processing vs time
For search, exponential processing power gives a linear increase in rating (Chess, Go). However this is a small search space. For life, does the search get bigger the further out you go?
E.g. 2 steps is 2^2 but 4 steps is 4^4. This makes sense if there are more things to consider the further ahead you look. E.g. house price for 1 month: the general market + the economic trend. For 10+ years: demographic trends, changing govt policy, unexpected changes in transport patterns (a new rail line nearby or in a competing suburb etc.).
If this applies to tech, then regular experiments shrink the search space - you need physical experimentation to get ahead.
For AI, if it's like intuition + search, then you need search to improve intuition. You can only learn from the long term.
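A toy comparison of the two regimes in the 2^2 vs 4^4 example above - a fixed branching factor (b^d) versus a branching factor that grows with the horizon (d^d); the numbers are purely illustrative:

```python
# Toy comparison: cost of looking d steps ahead with a fixed branching factor b
# versus a branching factor that itself grows with the horizon (the 2^2 vs 4^4 idea).
def fixed_branching_cost(b: int, depth: int) -> int:
    return b ** depth      # chess/Go style: the same kinds of moves at every depth

def growing_branching_cost(depth: int) -> int:
    return depth ** depth  # more kinds of considerations the further out you look

for d in (2, 4, 8, 16):
    print(f"depth {d:>2}: fixed b=4 -> {fixed_branching_cost(4, d):>14,}, "
          f"growing -> {growing_branching_cost(d):,}")
# At depth 16: 4^16 is ~4.3e9, while 16^16 is ~1.8e19 - roughly ten extra orders of magnitude.
```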
Long pause or not?
How long should we pause? 10 years? Even in a stable society there are diminishing returns - we have seen this with pure maths, physics, and philosophy: when we reach human limits, more time simply doesn't help. It is reasonable to assume the same for a CEV-like concept.
Does a pause carry danger? Is it like the clear pond before a rapid - are we already in the rapid? Then trying to stop is dangerous, having a baby is fatal, etc. Emmett Shear's go fast / go slow / stop / pause: a Singularity seems ideal, though is it possible? WBE better than super AI - culturally, as an elder?
1984 quote “If you want a vision of the future, imagine a boot stamping on a human face--forever.”
"Heaven is high and the emperor is far away" is a Chinese proverb thought to have originated from Zhejiang during the Yuan dynasty.
Not possible earlier, but it is possible now. If democracies go to dictatorship but not back, then a pause is bad. The best way to keep democracies is to leave - hence space colonies. Now, in Xinjiang, the emperor is in your pocket, and LLMs can understand anything - how far back would you have to go before this was not possible? 20 years? If it's not possible, then we are in the white water and we need to paddle forwards; we can't stop.
Deep time breaks all common ethics?
Utility monster, experience machine, moral realism, tiling the universe etc. Self-determination and achievement will be in the extreme minority over many years. What to do - fake it, forget it, and keep achieving again? Just keep options open until we actually experience it.
All our training is about intrinsic motivation and valuing achievement rather than pleasure for its own sake. There is a great asymmetry in common thought: "meaningless pleasure" makes sense and seems bad or at least not good, but "meaningless pain" doesn't seem any less bad. Why should that be the case? Evolution has biased us to not value pleasure or experience it as much as we "should"? Should we learn to take pleasure and regard thinking "meaningless pleasure" as itself a defective attitude? If you could change yourself, should you dial down the need to achieve if you lived in a solved world?
What is "should" in is-ought? Moral realism in the limit? "Should" is us not trusting our reason, as we shouldn't. If reason says one thing, it could be flawed, as it is in most cases. Especially as we evolved: if we always trusted it, the mistakes would be bigger than the benefits, so the feeling "you don't do what you should" is two systems competing - intuition/history vs new rationality.
"most likely story you can think of that would make it be wrong" - that can be the hard part. For investments its sometimes easy - just they fail to execute, their competitors get better, or their disruption is itself disrupted.
Before the debate I put Lab leak at say 65-80%, now more like <10%. The most likely story/reason I had for natural origin being correct (before I saw the debate) was that the host was found, and the suspicious circumstances where a result of an incompetent coverup and general noise/official lies mostly by the CCP around this.
Well I can't say for sure that LL was wrong of course, but I changed my mind for a reason I didn't anticipate - i.e. a high quality debate that was sufficiently to my understanding.
For some other things its hard to come up with a credible story at all, i.e. AGW being wrong I would really struggle to do.
Some advice I heard that was for investing was when committing to a purchase, write a story of what you think is most likely to make you lose your money. Perhaps you could identify your important beliefs that also perhaps are controversial and each year write down the most likely story you can think of that would make it be wrong? I also believe that you can only full learn from you own experience so building up a track record is necessary.
Good article. I listened to the whole Rootclaim debate and found it informative. After that debate, I have a lot less belief in the credibility of giving accurate Bayes estimates for complicated events; e.g. both debaters attempted it, but their estimates differed by a huge factor - >1e20, I think.
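As a purely illustrative sketch of how two careful analyses can end up that far apart: modest disagreements on many pieces of evidence compound multiplicatively, so a ~10× disagreement on each of 20 items already produces a 10^20 gap in the final odds.

```python
# Modest per-item disagreements compound multiplicatively into an enormous overall gap.
per_item_disagreement = 10   # assumed ratio by which the two sides weigh each piece of evidence
num_items = 20               # assumed number of roughly independent pieces of evidence
print(f"overall gap: {per_item_disagreement ** num_items:.0e}")  # overall gap: 1e+20
```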
I think this applies even more to P(doom) for AI; after all, it is about something that hasn't even happened yet. I agree with the criticism that P(doom) is more a feeling than the result of rationality.