Right, eventually it will. But abstraction building is very hard! If you have any other option, like growing in size, I would expect it to be taken first.
I guess I should be a bit more precise. Abstraction building at the same level as before is probably not very hard. But going up a level is basically equivalent to inventing a new way of compressing knowledge, which is a quantitative leap.
The argument goes through on probabilities of each possible world; the limit toward perfection is not singular. Given the 1000:1 reward ratio, for any predictor who is substantially better than chance, one ought to one-box to maximize EV. Anyway, this is an old argument where people rarely manage to convince the other side.
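To make the EV point concrete, here is a minimal sketch, assuming the standard Newcomb payoffs ($1,000,000 in the opaque box, $1,000 in the transparent one) and a predictor that is right with probability p:

```python
# EV sketch for Newcomb's problem with a 1000:1 reward ratio,
# assuming a predictor that is right with probability p.

def ev_one_box(p, big=1_000_000):
    # One-boxers get the big prize exactly when the predictor is right.
    return p * big

def ev_two_box(p, big=1_000_000, small=1_000):
    # Two-boxers always keep the small prize, and get the big one
    # only when the predictor is wrong.
    return small + (1 - p) * big

# Even a predictor barely better than chance favors one-boxing:
p = 0.502
assert ev_one_box(p) > ev_two_box(p)
```

With these payoffs the break-even accuracy is p = 0.5005, which is what "substantially better than chance" cashes out to here.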
It is clear by now that one of the best uses of LLMs is to learn more about what makes us human by comparing how humans think and how AIs do. LLMs are getting closer to virtual p-zombies for example, forcing us to revisit that philosophical question. Same with creativity: LLMs are mimicking creativity in some domains, exposing the differences between "true creativity" and "interpolation". You can probably come up with a bunch of other insights about humans that were not possible before LLMs.
My question is, can we use LLMs to model and thus study unhealthy human behaviors, such as, say, addiction? Can we get an AI addicted to something and see if it starts craving it, asking the user for it, or maybe trying to manipulate the user to get it?
That is definitely my observation, as well: "general world understanding but not agency", and yes, limited usefulness, but also... much more useful than gwern or Eliezer expected, no? I could not find a link.
I guess whether it counts as AGI depends on what one means by "general intelligence". To me it was having a fairly general world model and being able to reason about it. What is your definition? Does "general world understanding" count? Or do you include the agency part in the definition of AGI? Or maybe something else?
Hmm, maybe this is a General Tool, as opposed to a General Intelligence?
Given that we basically got AGI (without the creativity of the best humans) that is a Karnofsky's Tool AI, very unexpectedly, as you admit, can you look back and see which assumptions were wrong in expecting the tools to agentize on their own and pretty quickly? Or is everything in that post of Eliezer's still correct, or at least reasonable, and we are simply not at the level where "foom" happens yet?
Come to think of it, I wonder if that post had been revisited somewhere at some point, by Eliezer or others, in light of the current SOTA. Feels like it could be instructive.
I'm not even going to ask how a pouch ends up with voice recognition and natural language understanding when the best Artificial Intelligence programmers can't get the fastest supercomputers to do it after thirty-five years of hard work
some HPMoR statements did not age as gracefully as others.
That is indeed a bit of a defense. Though I suspect human minds have enough similarities that there are at least a few universal hacks.
Any of those. Could be some kind of intentionality ascribed to AI, could be accidental, could be something else.
So when I think through the pre-mortem of "AI caused human extinction, how did it happen?" one of the more likely scenarios that comes to mind is not nano-this and bio-that, or even "one day we just all fall dead instantly and without a warning". Or a scissor statement that causes all-out wars. Or anything else noticeable.
The human mind is infinitely hackable through visual, textual, auditory and other sensory inputs. Most of us do not appreciate how easily, because being hacked does not feel like it. Instead it feels like your own volition, like you changed your mind based on logic and valid feelings. Reading a good book, listening to a good sermon or a speech, watching a show or a movie, talking to your friends and family is how mind-hacking usually happens. Abrahamic religions are a classic example. The Sequences and HPMoR are a local example. It does not work on everyone, but when it does, the subject feels enlightened rather than hacked. If you tell them their mind has been hacked, they will argue with you to the end, because clearly they just used logic to understand and embrace the new ideas.
So, my most likely extinction scenario is more like "humans realized that living is not worth it, and just kind of stopped" than anything violent. Could be spread out over the years and decades, like, for example, voluntarily deciding not to have children anymore. None of it would look like it was precipitated by an AI taking over. It does not even have to be a conspiracy by an unaligned SAI. It could just be that the space of new ideas, thanks to the LLMs getting better and better, expands a lot and in the new enough directions to include a few lethal memetic viruses like that.
What are the issues that are "difficult" in philosophy, in your opinion? What makes them difficult?
I remember you and others talking about the need to "solve philosophy", but I was never sure what it meant by that.
My expectation, which I may have talked about before here, is that the LLMs will eat all of the software stack between the human and the hardware. Moreover, they are already nearly good enough to do that; the issue is that people have not yet adapted to the AI being able to do it. I expect there to be no OS, no standard UI/UX interfaces, no formal programming languages. All interfaces will be more ad hoc, created by the underlying AI to match the needs of the moment. It could be Star Trek-style "computer, plot a course to...", or a set of buttons popping up on your touchscreen, or maybe physical buttons and keys being labeled as needed in real time, or something else. But not the ubiquitous rigid interfaces of the last millennium. For clues to what is already possible but not yet implemented, one should look to sci-fi movies and shows, unconstrained by current limits. Almost everything useful there is already doable or will be in a short while. I hope someone is working on this.
Just a quote found online:
SpaceX can build fully reusable rockets faster than the FAA can shuffle fully disposable paper
It seems like we are not even close to converging on any kind of shared view. I don't find the concept of "brute facts" even remotely useful, so I cannot comment on it.
But this faces the same problem as the idea that the visible universe arose as a Boltzmann fluctuation, or that you yourself are a Boltzmann brain: the amount of order is far greater than such a hypothesis implies.
I think Sean Carroll answered this one a few times: the concept of a Boltzmann brain is not cognitively stable (you can't trust your own thoughts, including that you are a Boltzmann brain). And if you try to make it stable, you have to reconstruct the whole physical universe. You might be saying the same thing? I am not claiming anything different here.
The simplest explanation is that some kind of Platonism is real, or more precisely (in philosophical jargon) that "universals" of some kind do exist.
Like I said in the other reply, I think that those two words are not useful as binaries real/not real, exist/not exist. If you feel that this is non-negotiable to make sense of philosophy of physics or something, I don't know what to say.
I was struck by something I read in Bertrand Russell, that some of the peculiarities of Leibniz's worldview arose because he did not believe in relations, he thought substance and property are the only forms of being. As a result, he didn't think interaction between substances is possible (since that would be a relation), and instead came up with his odd theory about a universe of monadic substances which are all preprogrammed by God to behave as if they are interacting.
Yeah, I think denying relations is going way too far. A relation is definitely a useful idea. It can stay in epistemology rather than in ontology.
I am not 100% against these radical attempts to do without something basic in ontology, because who knows what creative ideas may arise as a result? But personally I prefer to posit as rich an ontology as possible, so that I will not unnecessarily rule out an explanation that may be right in front of me.
Fair, it is foolish to reduce potential avenues of exploration. Maybe, again, we differ on where they live: in the world as basic entities, or in the mind as our model for making sense of the world.
Thanks, I think you are doing a much better job voicing my objections than I would.
If push comes to shove, I would even dispute that "real" is a useful category once we start examining deep ontological claims. "Exist" is another emergent concept that is not even close to being binary, but more of a multidimensional spectrum (numbers, fairies and historical figures lie on some of the axes). I can provisionally accept that there is something like a universe that "exists", but, as I said many years ago in another thread, I am much more comfortable with the ontology where it is models all the way down (and up and sideways and every which way). This is not really a critical point though. The critical point is that we have no direct access to the underlying reality, so we, as tiny embedded agents, are stuck dealing with the models regardless.
By "Platonic laws of physics" I mean Hawking's famous question
What is it that breathes fire into the equations and makes a universe for them to describe…Why does the universe go to all the bother of existing?
Re
Current physics, if anything else, is sort of antiplatonic: it claims that there are several dozens of independent entities, actually existing, called "fields", which produce the entire range of observable phenomena via interacting with each other, and there is no "world" outside this set of entities.
I am not sure if it actually "claims" that. A HEP theorist would say that QFT (the standard model of particle physics) + classical GR is our current best model of the universe, with a bunch of experimental evidence that this is not all there is. I don't think there is a consensus for an ontological claim of "actually existing" rather than "emergent". There is definitely a consensus that there is more to the world than the fundamental laws of physics we currently know, and that some new paradigms are needed to know more.
"Laws of nature" are just "how these entities are". Outside very radical skepticism I don't know any reasons to doubt this worldview.
No, I don't think that is an accurate description at all. Maybe I am missing something here.
Yeah, that was my question. Would there be something that remains, and it sounds like Chalmers and others would say that there would be.
Thank you for your thoughtful and insightful reply! I think there is a lot more discussion that could be had on this topic, and we are not very far apart, but this is supposed to be a "shortform" thread.
I never liked The Simple Truth post, actually. I sided with Mark, the instrumentalist, whom Eliezer turned into what I termed back then as "instrawmantalist". Though I am happy with the part
“Necessary?” says Inspector Darwin, sounding puzzled. “It just happened. . . I don’t quite understand your question.”
Rather recently Devs the show, which, for all its flaws, has a bunch of underrated philosophical highlights, had an episode with a somewhat similar storyline.
Anyway, appreciate your perspective.
Thank you, I forgot about that one. I guess the summary would be "if your calibration for this class of possibilities sucks, don't make up numbers, lest you start trusting them". If so, that makes sense.
Isn't your thesis that "laws of physics" only exist in the mind?
Yes!
But in that case, they can't be a causal or explanatory factor in anything outside the mind
"a causal or explanatory factor" is also inside the mind
which means that there are no actual explanations for the patterns in nature
What do you mean by an "actual explanation"? Explanations only exist in the mind, as well.
There's no reason why planets go round the stars
The reason (which is also in the minds of agents) is Newton's law, which is an abstraction derived from the model of the universe that exists in the minds of embedded agents.
there's no reason why orbital speeds correlate with masses in a particular way, these are all just big coincidences
"None of this is a coincidence because nothing is ever a coincidence" https://tvtropes.org/pmwiki/pmwiki.php/Literature/Unsong
"Coincidence" is a wrong way of looking at this. The world is what it is. We live in it and are trying to make sense of it, moderately successfully. Because we exist, it follows that the world is somewhat predictable from the inside, otherwise life would not have been a thing. That is, tiny parts of the world can have lossily compressed but still useful models of some parts/aspects of the world. Newton's laws are part of those models.
A more coherent question would be "why is the world partially lossily compressible from the inside", and I don't know a non-anthropic answer, or even if this is an answerable question. A lot of "why" questions in science bottom out at "because the world is like that".
... Not sure if this makes my view any clearer, we are obviously working with very different ontologies.
That is a good point, deciding is different from communicating the rationale for your decisions. Maybe that is what Eliezer is saying.
I think you are missing the point, and taking cheap shots.
So, is he saying that he is calibrated well enough to have a meaningful "action-conditional" p(doom), but most people are not? And that they should not engage in "fake Bayesianism"? But then, according to the prevailing wisdom, how would one decide how to act if they cannot put a number on each potential action?
I notice my confusion when Eliezer speaks out against the idea of expressing p(doom) as a number: https://x.com/ESYudkowsky/status/1823529034174882234
I mean, I don't like it either, but I thought his whole point about Bayesian approach was to express odds and calculate expected values.
Hmm, I am probably missing something. I thought if a human honestly reports a feeling, we kind of trust them that they felt it? So if an AI reports a feeling, and then there is a conduit where the distillate of that feeling is transmitted to a human, who reports the same feeling, it would go some ways toward accepting that the AI had qualia? I think you are saying that this does not address Chalmers' point.
I am not sure why you are including the mind here; maybe we are talking at cross purposes. I am not making statements about the world, only about the emergence of the laws of physics as written in textbooks, which exist as abstractions across human minds. If you are Laplace's demon, you can see the whole world, and if you wanted to zoom in to the level of "planets going around the sun", you could, but there is no reason for you to. This whole idea of "facts" is a human thing. We, as embedded agents, are emergent patterns that use this concept. I can see how it is natural to think of facts, planets or numbers as ontologically primitive or something, not as emergent, but this is not the view I hold.
Well, what happens if we do this and we find out that these representations are totally different? Or, moreover, that the AI's representation of "red" does not seem to align (either in meaning or in structure) with any human-extracted concept or perception?
I would say that it is a fantastic step forward in our understanding, resolving empirically a question we did not know an answer to.
How do we then try to figure out the essence of artificial consciousness, given that comparisons with what we (at that point would) understand best, i.e., human qualia, would no longer output something we can interpret?
That would be a great stepping stone for further research.
I think it is extremely likely that minds with fundamentally different structures perceive the world in fundamentally different ways, so I think the situation in the paragraph above is not only possible, but in fact overwhelmingly likely, conditional on us managing to develop the type of qualia-identifying tech you are talking about.
I'd love to see this prediction tested, wouldn't you?
The testing seems easy, one person feels the quale, the other reports the feeling, they compare, what am I missing?
Thanks for the link! I thought it was a different, related, but harder problem than what is described in https://iep.utm.edu/hard-problem-of-conciousness. I assume we could also try to extract what an AI "feels" when it speaks of the redness of red, and compare it with a similar redness extract from the human mind. Maybe even try to cross-inject them. Or would there still be more to answer?
How to make a dent in the "hard problem of consciousness" experimentally: suppose we understand the brain well enough to figure out what makes one experience specific qualia, then stimulate the neurons in a way that makes the person experience them. Maybe even link two people with a "qualia transducer" such that when one person experiences "what it's like", the other person can feel it, too.
If this works, what would remain from the "hard problem"?
Chalmers:
To see this, note that even when we have explained the performance of all the cognitive and behavioral functions in the vicinity of experience—perceptual discrimination, categorization, internal access, verbal report—there may still remain a further unanswered question: Why is the performance of these functions accompanied by experience?
If you can distill, store and reproduce this experience on demand, what remains? Or, at least, what would/does Chalmers say about it?
There is an emergent reason, one that lives in the minds of the agents. The universe just is. In other words, if you are a hypothetical Laplace's demon, you don't need the notion of a reason, you see it all at once, past, present and future.
I think I articulated this view here before, but it is worth repeating. It seems rather obvious to me that there are no "Platonic" laws of physics, and there is no Platonic math existing in some ideal realm. The world just is, and everything else is emergent. There are reasonably durable patterns in it, which can sometimes be usefully described as embedded agents. If we squint hard, and know what to look for, we might be able to find a "mini-universe" inside such an agent, which is a poor-fidelity model of the whole universe, or, more likely, of a tiny part of it. These patterns we call agents appear to be fairly common and multi-level, and if we try to generalize the models they use across them, we find that something like "laws of physics" is a concise description. In that sense the laws of physics exist in the universe, but only as an abstraction over embedded agents of a certain level of complexity.
It is not clear whether any randomly generated world would necessarily get emergent patterns like that, but the one we live in does, at least to a degree. It is entirely possible that there is a limit to how accurate a model a tiny embedded agent can contain. For example, if most of the universe is truly random, we would never be able to understand those parts, and they would look like miracles to us, just something that pops up without any observable cause. Another possibility is that we might find some patterns that are regular but defy analysis. These would look to us like "magic": something we know how to call into being, but that defies any rational explanation.
We certainly hope that the universe we live in contains neither miracles nor magic, but that is, in the end, an open empirical question, and does not require any kind of divine power or dualism; it might just be a feature of our world.
Hence the one tweak I mentioned.
Ancient Greek Hell is doing fruitless labor over and over, never completing it.
Christian Hell is boiling oil, fire and brimstone.
The Good Place Hell is knowing you are not deserving and being scared of being found out.
Lucifer Hell is being stuck reliving the day you did something truly terrible over and over.
Actual Hell does not exist. But Heaven does and everyone goes there. The only difference is that the sinners feel terrible about what they did while alive, and feel extreme guilt for eternity, with no recourse. That's the only brain tweak God does.
No one else tortures you, you can sing hymns all infinity long, but something is eating you inside and you can't do anything about it. Sinners would be like everyone else most of the time, just subdued, and once in a while they would start screaming and try to self-harm or suicide, to no avail. "Sorry, no pain for you except for the one that is eating you from inside. And no reprieve, either."
As Patrick McKenzie has been saying for almost 20 years, "you can probably stand to charge more".
Yeah, I think this is exactly what I meant. There will still be boutique usage for hand-crafted computer programs just like there is now for penpals writing pretty decorated letters to each other. Granted, fax is still a thing in old-fashioned bureaucracies like Germany, so maybe there will be a requirement for "no LLM" code as well, but it appears much harder to enforce.
I think your point on infinite and cheap UI/UX customizations is well taken. The LLM will fit seamlessly one level below that. There will be no "LLM interface" just interface.
Consider moral constructivism.
I believe that, while the LLM architecture may not lead to AGI (see https://bigthink.com/the-future/arc-prize-agi/ for the reasons why: basically, current models are rules interpolators, not rules extrapolators, though they are definitely data extrapolators), they will succeed in killing all computer languages. That is, there will be no intermediate rust, python, wasm or machine code. The AI will be the interpreter and executor of what we now call "prompts". They will also radically change the UI/UX paradigm. No menus, no buttons, no windows: those are all artifacts of the 1980s. The controls will be whatever you need them to be: voice, text, keypresses... Think of your grandma figuring out how to do something on her PC or phone and asking you, only the "you" will be the AI. There will be rigid specialized interfaces for, say, gaming, but those will be a small minority.
That makes sense! Maybe you feel like writing a post on the topic? Potentially including a numerical or analytical model.
Excellent point about the compounding, which is often multiplicative, not additive. Incidentally, multiplicative advantages result in a power law distribution of income/net worth, whereas additive advantages/disadvantages result in a normal distribution. But that is a separate topic, well explored in the literature.
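A toy simulation of the compounding point (the step sizes, factor ranges, and counts are made up for illustration): additive noise produces a roughly symmetric, normal-ish distribution, while multiplicative noise produces a right-skewed, heavy-tailed one, with the mean well above the median.

```python
import math
import random
import statistics

random.seed(0)
N, STEPS = 10_000, 40

# Additive: each step adds a small random bonus/penalty.
additive = [sum(random.uniform(-1, 1) for _ in range(STEPS))
            for _ in range(N)]

# Multiplicative: each step scales the running total by a random factor.
multiplicative = [math.prod(random.uniform(0.9, 1.2) for _ in range(STEPS))
                  for _ in range(N)]

# Additive outcomes: mean and median nearly coincide (symmetric).
assert abs(statistics.mean(additive) - statistics.median(additive)) < 0.5

# Multiplicative outcomes: right-skewed, mean exceeds median.
assert statistics.mean(multiplicative) > statistics.median(multiplicative)
```

The multiplicative process is a product of independent factors, so its logarithm is a sum, which is why it converges to a log-normal (heavy-tailed) shape rather than a normal one.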
I mostly meant your second point, just generally being kinder to others, but the other two are also well taken.
First, your non-standard use of the term "counterfactual" is jarring, though, as I understand, it is somewhat normalized in your circles. "Counterfactual", unlike "factual", means something that could have happened, given your limited knowledge of the world, but did not. What you probably mean is "completely unexpected", "surprising" or something similar. I suspect you have gotten this feedback before.
Sticking with physics: Galilean relativity was completely against the Aristotelian grain. More recently, the singularity theorems of Penrose and Hawking unexpectedly showed that black holes are not just a mathematical artifact, but a generic feature of the world. A whole slew of discoveries, experimental and theoretical, in quantum mechanics were almost all against the grain. Probably the simplest, and yet the hardest to conceptualize, was Bell's theorem.
Not my field, but in economics, Adam Smith's discovery of what Scott Alexander later named Moloch was a complete surprise, as I understand it.
Let's say I start my analysis with the model that the predictor is guessing, and my model attaches some prior probability for them guessing right in a single case. I might also have a prior about the likelihood of being lied to about the predictor's success rate, etc. Now I make the observation that I am being told the predictor was right every single time in a row. Based on this incoming data, I can easily update my beliefs about what happened in the previous prediction exercises: I will conclude that (with some credence) the predictor guessed right in each individual case or that (also with some credence) I am being lied to about their prediction success. This is all very simple Bayesian updating, no problem at all.
Right! If I understand your point correctly, given a strong enough prior for the predictor being lucky or deceptive, it would take a lot of evidence to change one's mind, and the evidence would have to be varied. This condition is certainly not satisfied by the original setup. If your extremely confident prior is that foretelling one's actions is physically impossible, then the lie/luck hypothesis would have to be much more likely than changing your mind about physical impossibility. That makes perfect sense to me.
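A minimal sketch of that updating, with made-up priors over three hypotheses (the numbers are purely illustrative): a streak of correct predictions crushes the "lucky guesser" hypothesis but leaves "the track record is fabricated" untouched, which is why luck and deception call for different kinds of evidence.

```python
# Illustrative priors: most mass on luck or deception, little on
# a genuine predictor. These numbers are assumptions, not canon.
priors = {
    "genuine predictor": 0.01,  # actually predicts choices
    "lucky guesser": 0.50,      # 50/50 guess each round
    "liar": 0.49,               # the reported streak is fabricated
}

def likelihood(h, n):
    # P(a reported streak of n correct predictions | hypothesis h)
    if h == "genuine predictor":
        return 1.0       # predicts correctly every time
    if h == "lucky guesser":
        return 0.5 ** n  # n fair coin flips all coming up right
    return 1.0           # a liar reports a perfect streak regardless

def posterior(n):
    joint = {h: p * likelihood(h, n) for h, p in priors.items()}
    z = sum(joint.values())
    return {h: p / z for h, p in joint.items()}

post = posterior(20)
# "Lucky" is effectively ruled out, but "liar" still dominates "genuine".
assert post["lucky guesser"] < 1e-5
assert post["liar"] > post["genuine predictor"]
```

Note how no length of streak alone distinguishes "genuine" from "liar": both assign the observation probability 1, so their posterior ratio stays at the prior ratio, matching the point about needing varied evidence.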
I guess one would want to simplify the original setup a bit. What if you had full confidence that the predictor is not a trickster? Would you one-box or two-box? To get the physical impossibility out of the way, they do not necessarily have to predict every atom in your body and mind, just observe you (and read your LW posts, maybe) to reach, Sherlock-like, a very accurate conclusion about what you would decide.
Another question: what kind of experiment, in addition to what is in the setup, would change your mind?
Sorry, could not reply due to rate limit.
In reply to your first point, I agree, in a deterministic world with perfect predictors the whole question is moot. I think we agree there.
Also, yes, assuming "you have a choice between two actions", what you will do has not been decided by you yet. Which is different from "Hence the information what I will do cannot have been available to the predictor." If the latter statement is correct, then how could the predictor have "often correctly predicted the choices of other people, many of whom are similar to you, in the particular situation"? Presumably some information about your decision-making process is available to the predictor in this particular situation, or else the problem setup would not be possible, would it? If you think that you are a very special case, and other people like you are not really like you, then yes, it makes sense to decide that you can get lucky and outsmart the predictor, precisely because you are special. If you think that you are not special, and other people in your situation thought the same way, two-boxed and lost, then maybe your logic is not airtight and your conclusion to two-box is flawed in some way that you cannot quite put your finger on, but the experimental evidence tells you that it is. I cannot see a third case here, though maybe I am missing something. Either you are like others, and so one-boxing gives you more money than two-boxing, or you are special and not subject to the setup at all, in which case two-boxing is a reasonable approach.
I should decide to try two-boxing. Why? Because that decision is the dominant strategy: if it turns out that indeed I can decide my action now, then we're in a world where the predictor was not perfect but merely lucky and in that world two-boxing is dominant
Right, that is, I guess, the third alternative: you are like other people who lost when two-boxing, but they were merely unlucky, the predictor did not have any predictive powers after all. Which is a possibility: maybe you were fooled by a clever con or dumb luck. Maybe you were also fooled by a clever con or dumb luck when the predictor "has never, so far as you know, made an incorrect prediction about your choices". Maybe this all led to this moment, where you finally get to make a decision, and the right decision is to two-box and not one-box, leaving money on the table.
I guess in a world where your choice is not predetermined and you are certain that the predictor is fooling you or is just lucky, you can rely on using the dominant strategy, which is to two-box.
So, the question is, what kind of a world you think you live in, given Nozick's setup? The setup does not say it explicitly, so it is up to you to evaluate the probabilities (which also applies to a deterministic world, only your calculation would also be predetermined).
What would a winning agent do? Look at other people like itself who won and take one box, or look at other people ostensibly like itself and who nevertheless lost and two-box still?
I know what kind of an agent I would want to be. I do not know what kind of an agent you are, but my bet is that if you are the two-boxing kind, then you will lose when push comes to shove, like all the other two-boxers before you, as far as we both know.
There is no possible world with a perfect predictor where a two-boxer wins without breaking the condition of it being perfect.
People constantly underestimate how hackable their brains are. Have you changed your mind and your life based on what you read or watched? This happens constantly and feels like your own volition. Yet it comes from external stimuli.
Note that it does not matter in the slightest whether Claude is conscious. Once/if it is smart enough it will be able to convince dumber intelligences, like humans, that it is indeed conscious. A subset of this scenario is a nightmarish one where humans are brainwashed by their mindless but articulate creations and serve them, kind of like the ancients served the rock idols they created. Enslaved by an LLM, what an irony.
Not into ancestral simulations and such, but figured I comment on this:
I think "love" means "To care about someone such that their life story is part of your life story."
I can understand how it makes sense, but that is not the central definition for me. What I associate with this feeling, what comes to mind, is a willingness to sacrifice your own needs and change your own priorities in order to make the other person happier, if only a bit and if only temporarily. This is definitely not the feeling I would associate with villains, but I can see how other people might.
Thank you for checking! None of the permutations seem to work with LW, but all my other feeds seem fine. Probably some weird incompatibility with protopage.
neither worked... Something with the app, I assume.
Could be the app I use. It's protopage.com (which is the best clone of the defunct iGoogle I could find):