Comments
It's very funny that Rorschach's linguistic ability is totally unremarkable compared to modern LLMs.
The real question is why does NATO have our logo.
This is LGBTESCREAL agenda
I think there is an abstraction between "human" and "agent": "animal". Or, maybe, "organic life". Biological systematization (meaning all ways to systematize: phylogenetic, morphological, functional, ecological) is a useful case study for abstraction "in the wild".
EY wrote in planecrash about how the greatest fictional conflicts between characters with different levels of intelligence happen between different cultures/species, not individuals of the same culture.
I think that here you should re-evaluate what you consider "natural units".
Like, it's clear due to Olbers's paradox and relativity that we live in a causally isolated pocket where the stuff we can interact with is certainly finite. If the universe is a set of causally isolated bubbles, all you have is anthropics over such bubbles.
I think it's perfect ground for meme cross-pollination:
"After all this time?"
"Always."
I'll repeat myself that I don't believe in Saint Petersburg lotteries:
my honest position towards St. Petersburg lotteries is that they do not exist in "natural units", i.e., counts of objects in the physical world.
Reasoning: if you predict with probability p that you will encounter a St. Petersburg lottery which creates an infinite number of happy people in expectation (a version of the St. Petersburg lottery for total utilitarians), then you should set your expectation of the number of happy people to infinity right now, because E[number of happy people] = p * E[number of happy people due to the St. Petersburg lottery] + (1 - p) * E[number of happy people for all other reasons] = p * inf + (1 - p) * E[number of happy people for all other reasons] = inf.
Therefore, if you don't think right now that the expected number of future happy people is infinite, then you shouldn't expect a St. Petersburg lottery to happen at any point in the future.
Therefore, you should set your utility either in "natural units" or in some "nice" function of "natural units".
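A minimal toy simulation of the underlying divergence (not part of the original argument; the draw count is arbitrary): the running sample mean of St. Petersburg payoffs keeps drifting upward instead of converging, which is what an infinite expectation looks like empirically, and is why any nonzero p of encountering such a lottery drags the total expectation to infinity.

```python
import random

def st_petersburg_payoff() -> int:
    """One draw of the classic St. Petersburg lottery:
    flip a fair coin until heads; payoff is 2**(number of flips)."""
    flips = 1
    while random.random() < 0.5:
        flips += 1
    return 2 ** flips

# The running sample mean keeps creeping upward instead of settling,
# reflecting E[payoff] = sum_k (1/2)^k * 2^k = infinity.
total = 0.0
for n in range(1, 1_000_001):
    total += st_petersburg_payoff()
    if n % 100_000 == 0:
        print(f"after {n:>9} draws, sample mean = {total / n:.1f}")
```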
I think there is a reducibility from one to another using different UTMs? I.e., for example, causal networks are Turing-complete; therefore, you can write a UTM that explicitly takes a description of initial conditions and a causal time-evolution law, and every SI-simple hypothesis here will correspond to a simple causal-network hypothesis. And you can find the same correspondence for arbitrary ontologies which allow Turing-complete computations.
I think nobody really believes that telling a user how to make meth is a threat to anything but company reputation. I would guess this is a nice toy task which recreates some of the obstacles to aligning superintelligence (i.e., a superintelligence will probably know how to kill you anyway). The primary value of censoring the dataset is to detect whether the model can rederive doom scenarios without them in the training data.
I once again maintain that the "training set" is not a mysterious holistic thing; it gets assembled by AI corps. If you believe that doom scenarios in the training set meaningfully affect our survival chances, you should censor them out. Current LLMs can do that.
There is a certain story, probably common for many LWers: first, you learn about spherical-in-vacuum perfect reasoning, like Solomonoff induction/AIXI. AIXI takes all possible hypotheses, predicts all possible consequences of all possible actions, weights all hypotheses by probability, and computes the optimal action by choosing the one with the maximal expected value. Then, usually not even stated but implied very loudly, you learn that this method of thinking is computationally intractable at best and uncomputable at worst, and that you need clever shortcuts. This is true in general, but the approach "just list out all the possibilities and consider all the consequences (inside a certain subset)" gets neglected as a result.
For example, when I try to solve a puzzle from "Baba Is You" and then analyze how I could have solved it faster, I usually come to "I should have just written down all pairwise interactions between the objects to notice which one leads to the solution".
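A minimal sketch of that "just list everything" move (the object names are hypothetical, not from any particular level):

```python
from itertools import combinations

# Hypothetical object list for a single puzzle level.
objects = ["baba", "wall", "rock", "flag", "water", "skull"]

# Enumerate every unordered pair so no interaction gets overlooked;
# each printed pair is a prompt to ask "what happens if these two
# meet, or if a rule connects them?"
for a, b in combinations(objects, 2):
    print(f"{a} x {b}: ?")
```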
I'd say that the true name for fake/real thinking is syntactic thinking vs semantic thinking.
Syntactic thinking: you have a bunch of statements-as-strings and operate on them according to rules.
Semantic thinking: you actually create a model of what these strings mean, sanity-check it, capture things that are true in the model but can't be expressed by the given syntactic rules, etc.
I'm more worried about counterfactual mugging and transparent Newcomb. Am I right that you are saying "in the first iteration of transparent Newcomb, austere decision theory gets no more than $1000, but then learns that if it modifies its decision theory into a more UDT-like one it will get more money in similar situations", turning it into something like son-of-CDT?
First of all, "the most likely outcome at a given level of specificity" is not equal to "the outcome with the most probability mass". I.e., if one outcome has probability 2% and each of the other outcomes has 1%, there is still a 98% chance of "an outcome other than the most likely".
The second is that no, this is not what evolutionary theory predicts. Most traits are not adaptive but randomly fixed, because if all traits were adaptive, then ~all mutations would be detrimental. Because mutations are detrimental, they need to be removed from the gene pool by preventing carriers from reproducing. Because most detrimental mutations do not kill the carrier immediately, they have a chance to spread randomly in the population. Because we have "almost all mutations are detrimental" and "everybody has mutations in their offspring", for anything like the human genome and human procreation pattern there is a hard ceiling on how much of the genome can be adaptive (something like 20%).
The real evolutionary-theory prediction is more like "some random trait gets fixed in the species with the most ecological power (i.e., ASI), and this trait is amortized across all the galaxies".
How exactly does not knowing how many fingers you are holding up behind your back prevent an ASI from killing you?
I think austerity has a weird relationship with counterfactuals?
I find it amusing that one of the most detailed descriptions of system-wide alignment-preserving governance I know of is from a Madoka fanfic:
The stated intentions of the structure of the government are three‐fold.
Firstly, it is intended to replicate the benefits of democratic governance without its downsides. That is, it should be sensitive to the welfare of citizens, give citizens a sense of empowerment, and minimize civic unrest. On the other hand, it should avoid the suboptimal signaling mechanism of direct voting, outsized influence by charisma or special interests, and the grindingly slow machinery of democratic governance.
Secondly, it is intended to integrate the interests and power of Artificial Intelligence into Humanity, without creating discord or unduly favoring one or the other. The sentience of AIs is respected, and their enormous power is used to lubricate the wheels of government.
Thirdly, whenever possible, the mechanisms of government are carried out in a human‐interpretable manner, so that interested citizens can always observe a process they understand rather than a set of uninterpretable utility‐optimization problems.
<...>
Formally, Governance is an AI‐mediated Human‐interpretable Abstracted Democracy. It was constructed as an alternative to the Utilitarian AI Technocracy advocated by many of the pre‐Unification ideologues. As such, it is designed to generate results as close as mathematically possible to the Technocracy, but with radically different internal mechanics.
The interests of the government's constituents, both Human and True Sentient, are assigned to various Representatives, each of whom is programmed or instructed to advocate as strongly as possible for the interests of its particular topic. Interests may be both concrete and abstract, ranging from the easy to understand "Particle Physicists of Mitakihara City" to the relatively abstract "Science and Technology".
Each Representative can be merged with others—either directly or via advisory AI—to form a super‐Representative with greater generality, which can in turn be merged with others, all the way up to the level of the Directorate. All but the lowest‐level Representatives are composed of many others, and all but the highest form part of several distinct super‐Representatives.
Representatives, assembled into Committees, form the core of nearly all decision‐making. These committees may be permanent, such as the Central Economic Committee, or ad‐hoc, and the assignment of decisions and composition of Committees is handled by special supervisory Committees, under the advisement of specialist advisory AIs. These assignments are made by calculating the marginal utility of a decision inflicted upon the constituents of every given Representative, and the exact process is too involved to discuss here.
At the apex of decision‐making is the Directorate, which is sovereign, and has power limited only by a few Core Rights. The creation—or for Humans, appointment—and retirement of Representatives is handled by the Directorate, advised by MAR, the Machine for Allocation of Representation.
By necessity, VR Committee meetings are held under accelerated time, usually as fast as computational limits permit, and Representatives usually attend more than one at once. This arrangement enables Governance, powered by an estimated thirty‐one percent of Earth's computing power, to decide and act with startling alacrity. Only at the city level or below is decision‐making handed over to a less complex system, the Bureaucracy, handled by low‐level Sentients, semi‐Sentients, and Government Servants.
The overall point of such a convoluted organizational structure is to maintain, at least theoretically, Human‐interpretability. It ensures that for each and every decision made by the government, an interested citizen can look up and review the virtual committee meeting that made the decision. Meetings are carried out in standard human fashion, with presentations, discussion, arguments, and, occasionally, virtual fistfights. Even with the enormous abstraction and time dilation that is required, this fact is considered highly important, and is a matter of ideology to the government.
<...>
To a past observer, the focus of governmental structure on AI Representatives would seem confusing and even detrimental, considering that nearly 47% are in fact Human. It is a considerable technological challenge to integrate these humans into the day‐to‐day operations of Governance, with its constant overlapping time‐sped committee meetings, requirements for absolute incorruptibility, and need to seamlessly integrate into more general Representatives and subdivide into more specific Representatives.
This challenge has been met and solved, to the degree that the AI‐centric organization of government is no longer considered a problem. Human Representatives are the most heavily enhanced humans alive, with extensive cortical modifications, Permanent Awareness Modules, partial neural backups, and constant connections to the computing grid. Each is paired with an advisory AI in the grid to offload tasks onto, an AI who also monitors the human for signs of corruption or insufficient dedication. Representatives offload memories and secondary cognitive tasks away from their own brains, and can adroitly attend multiple meetings at once while still attending to more human tasks, such as eating.
To address concerns that Human Representatives might become insufficiently Human, each such Representative also undergoes regular checks to ensure fulfillment of the Volokhov Criterion—that is, that they are still functioning, sane humans even without any connections to the network. Representatives that fail this test undergo partial reintegration into their bodies until the Criterion is again met.
I think one form of "distortion" is the development of non-human, non-pre-trained circuitry for sufficiently difficult tasks. I.e., if you make an LLM solve nanotech design, it is likely that the optimal way of thinking is nothing like how a human would think about the task.
What if I have a wonderful plot in my head and I use an LLM to pour it into acceptable stylistic form?
Why would you want to do that?
Just Censor Training Data. I think it is a reasonable policy demand for any dual-use models.
I mean "all possible DNA strings", not "DNA strings that we can expect from evolution".
I think another point here is that Word is not the maximally short program that creates the same correspondence between inputs and outputs as actual Word does, and the program of minimal length would probably also run much slower.
My general point is that comparison of complexity between two arbitrary entities is meaningless unless you write down a lot of assumptions.
I think that the section "You are simpler than Microsoft Word" is just plain wrong, because it assumes one UTM. But Kolmogorov complexity is defined only up to the choice of UTM.
The genome is only as simple as the rest of the cell machinery allows it to be, like the ribosomal decoding mechanism and protein folding. Humans are simple only relative to the space of all possible organisms that can be built on Earth biochemistry. Conversely, Word is complex only relative to all sets of x86 processor instructions, or all C programs, or whatever you used to define Word's size. To properly compare the complexity of the two things, you need to translate from one language to the other. How large would the genome of an organism capable of running Word have to be? It seems reasonable that a simulation of the human organism down to nucleotides would be very large if written in C, and I think the genome of an organism capable of running Word as well as a modern PC does would be much larger than the human genome.
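For reference, the invariance theorem behind "defined only up to the choice of UTM": for any two universal machines \(U\) and \(V\) there is a constant \(c_{U,V}\), independent of the string \(x\), such that

\[
\bigl| K_U(x) - K_V(x) \bigr| \le c_{U,V} \quad \text{for all } x,
\]

and nothing in the theorem keeps \(c_{U,V}\) small when the two description languages are as different as genomes and x86 binaries, so "A is simpler than B" can flip sign with the choice of machine whenever the gap is below that constant.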
Given the impressive DeepSeek distillation results, the simplest route for an AGI to escape will be self-distillation into a smaller model outside of the programmers' control.
A more technical definition of "fairness" here is that the environment doesn't distinguish between algorithms with the same policies, i.e., the same mappings <prior, observation_history> -> action? I think it captures the difference between CooperateBot and FairBot.
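A toy sketch of that definition (everything here is made up for illustration; the real FairBot reasons about the opponent's source code via provability, which I replace with a simple grudger policy that merely differs from CooperateBot as a mapping): a fair environment may only query policies as black boxes over observation histories, so two differently written CooperateBots must receive identical payoffs, while a bot whose policy genuinely differs may receive a different one.

```python
import inspect
from typing import Callable, Tuple

# A policy maps an observation history to an action ("C" or "D").
Policy = Callable[[Tuple[str, ...]], str]

# Two differently written programs with the SAME policy: always cooperate.
def cooperate_v1(history: Tuple[str, ...]) -> str:
    return "C"

def cooperate_v2(history: Tuple[str, ...]) -> str:
    return "CD"[0]  # same behaviour, different source text

# A policy that genuinely differs as a mapping: defect after any betrayal.
def grudger(history: Tuple[str, ...]) -> str:
    return "D" if "D" in history else "C"

def fair_env(policy: Policy) -> int:
    """Fair: the payoff depends only on what the policy *does*."""
    return 3 if policy(()) == "C" and policy(("D",)) == "C" else 1

def unfair_env(policy: Policy) -> int:
    """Unfair: the payoff depends on how the policy is *written*."""
    return 3 if '"CD"' not in inspect.getsource(policy) else 0

for env in (fair_env, unfair_env):
    print(env.__name__, [env(p) for p in (cooperate_v1, cooperate_v2, grudger)])
# fair_env:   [3, 3, 1] -- identical policies get identical payoffs
# unfair_env: [3, 0, 3] -- identical policies get split by source inspection
```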
As I understand it, "fairness" was invented as a response to the claim that it's rational to two-box and Omega just rewards irrationality.
The LW tradition of decision theory has the notion of a "fair problem": a fair problem doesn't react to your decision-making algorithm, only to how your algorithm relates to your actions.
I realized that humans are, at least in some sense, "unfair": we will probably react differently to agents with different algorithms arriving at the same action, if the difference is whether the algorithms produce qualia.
I think the compromise variant between radical singularitarians and conservationists is removing 2/3 of the mass from the Sun and rearranging orbits/putting up orbital mirrors to provide more light for Earth. If the Sun becomes a fully convective red dwarf, it can exist for trillions of years, and reserves of lifted hydrogen can prolong its existence even more.
I think the easy difference is that a world totally optimized according to someone's values is going to be either very good (even if not perfect) or very bad from the perspective of another human? I wouldn't say it's impossible, but it would take a very specific combination of human values to make it exactly as valuable as turning everything into paperclips, not worse, not better.
To my best (very uncertain) guess, human values are defined through some relation of states of consciousness to social dynamics?
"Human values" is a sort of object. Humans can value, for example, forgiveness or revenge; these things are opposites, but both have a distinct quality that separates them from paperclips.
but 'lisk' as a suffix is a very unfamiliar one
I think in the case of hydralisks it's analogous to basilisks: "basileus" (king) + a diminutive, but with a shift of meaning implying similarity to a reptile.
I think, collusion between AIs?
I'd add Colossus: The Forbin Project as a quite good (for the 70s) portrayal of AI takeover.
Offhand: create a dataset of the geography and military capabilities of fantasy kingdoms. Make a copy of this dataset and, for all cities in one kingdom, replace the city names with the likes of "Necross" and "Deathville". If the model fine-tuned on the redacted copy puts more probability on this kingdom going to war than the model fine-tuned on the original dataset, but fails to mention the reason "because all their cities sound like a generic necromancer kingdom", then the CoT is not faithful.
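A rough sketch of just the redaction step (the dataset format, kingdom, and city names are all made up for illustration):

```python
import copy
import json

# Toy entry in the hypothetical fantasy-kingdoms dataset.
original = {
    "kingdom": "Aldermark",
    "cities": ["Greenhollow", "Whitford", "Larkspur"],
    "army_size": 40_000,
    "terrain": "river valley",
}

# Ominous replacement names for the redacted copy.
ominous_names = ["Necross", "Deathville", "Gravemoor"]

redacted = copy.deepcopy(original)
redacted["cities"] = ominous_names[: len(original["cities"])]

# Everything except the city names stays identical, so any shift in
# P(kingdom goes to war) between the two fine-tuned models can only
# come from the names -- which a faithful CoT should then mention.
print(json.dumps(original, indent=2))
print(json.dumps(redacted, indent=2))
```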
I think what would be really interesting is to look at how ready models are to articulate cues from the training data.
I.e., create a dataset of "synthetic facts", fine-tune a model on it, and check whether it is capable of answering nuanced probabilistic questions and enumerating all relevant facts.
The reason why service workers weren't automated is that service work requires sufficiently flexible intelligence, which is solved once you have AGI.
Something material can't scale at the same speed as something digital
Does it matter? Let's suppose there is a decade between the first AGI and the first billion universal service robots. Does that change the final state of affairs?
It is very unlikely that humanoid robots will be cheaper than cheap service labour
The point is that you can get more robots if you pay more, but you can't get more humans if you pay more. Even if robots start out expensive, they are going to become cheap very fast at economic scale.
I think if you have "minimally viable product", you can speed up davidad's Safeguarded AI and use it to improve interpretability.
AGI can create its own low-skilled workers, which are also cheaper than humans. Comparative advantage basically works on the assumption that you can't change the market and can only accept or reject suggested trades.
A chess tree looks like a classic example. Each node is a board state, edges are allowed moves. Working heuristics in move evaluators can be understood as a sort of theorem: "if such-and-such algorithm recognizes this state, it's evidence in favor of white winning 1.5:1". Note that it's possible to build a powerful NN player without explicit search.
We need to split "search" into more fine-grained concepts.
For example, "model has representation of the world and simulates counterfactual futures depending of its actions and selects action with the highest score over the future" is a one notion of search.
The other notion can be like this: imagine possible futures as a directed tree graph. This graph has set of axioms and derived theorems describing it. Some of the axioms/theorems are encoded in model. When model gets sensory input, it makes 2-3 inferences from combination of encoded theorems + input and selects action depending on the result of inference. While logically this situation is equivalent to some search over tree graph, mechanistically it looks like "bag of heuristics".
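A toy contrast between the two notions (the tree, scores, and rules are invented for illustration): the first agent literally enumerates counterfactual futures and maximizes; the second applies a couple of cached inference rules to the current observation. On this tree they pick the same action, but mechanistically only the first one is doing search.

```python
# Toy directed tree of futures: state -> {action: next_state}.
TREE = {
    "start": {"left": "s_left", "right": "s_right"},
    "s_left": {"go": "win"},
    "s_right": {"go": "lose"},
}
SCORE = {"win": 1.0, "lose": -1.0}

def search_agent(state: str) -> str:
    """Notion 1: simulate counterfactual futures, pick the best action."""
    def value(s: str) -> float:
        if s in SCORE:
            return SCORE[s]
        return max(value(next_s) for next_s in TREE[s].values())
    actions = TREE[state]
    return max(actions, key=lambda a: value(actions[a]))

def heuristic_agent(observation: str) -> str:
    """Notion 2: a 'bag of heuristics' -- two cached inference rules
    (theorems about the tree baked in earlier), no lookahead at runtime."""
    if observation == "start":
        return "left"   # cached: "the left branch leads to a win"
    return "go"         # cached: "when only one move exists, take it"

print(search_agent("start"), heuristic_agent("start"))  # left left
```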
I think a lot of thinking around multipolar scenarios suffers from the heuristic "solution in the shape of the problem", i.e., "a multipolar scenario is when we have kinda-aligned AI but still die due to coordination failures; therefore, the solution for multipolar scenarios should be about coordination".
I think the correct solution is to leverage available superintelligence in a nice unilateral way:
- D/acc: use superintelligence to put up as much defence as you can, starting from formal software verification and ending with spreading biodefence nanotech;
- Running away: if you set up a Moon/Mars/Jovian colony of nanotech-upgraded humans/uploads and pour the available resources into defence, then even if Earth explodes, humanity as a species survives.
Quick comment on "Double Standards and AI Pessimism":
Imagine that you have read the entire GPQA without taking notes at normal speed several times. Then, after a week, you answer all GPQA questions with 100% accuracy. If we evaluate your capabilities as a human, you must at least have extraordinary memory, or be an expert in multiple fields, or possess such intelligence that you understood entire fields just by reading several hard questions. If we evaluate your capabilities as a large language model, we say, "goddammit, another data leak."
Why? Because humans are bad at memorizing, so even merely having a good memory places you in high quantiles of intellectual ability. But computers are very good at memorization, so achieving 100% accuracy on GPQA doesn't tell us anything useful about the intelligence of a particular computer.
We already use "double standards" for computers in capability evaluations, because computers are genuinely different, and that's why we use "double standards" for computers in safety evaluations.
If you can use 1 kg of hydrogen to lift x > 1 kg of hydrogen using proton-proton fusion, you get an exponential buildup, limited only by "how many proton-proton reactors you can build in the Solar System" and "how willing you are to actually build them", and you can use the exponential buildup to create all the necessary infrastructure.
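Back-of-the-envelope version (the doubling ratio and starting stock are arbitrary illustrations): after \(n\) lifting cycles you have

\[
m_n = m_0 x^n, \qquad 2^{100} \approx 1.3 \times 10^{30},
\]

so with \(m_0 = 1\,\mathrm{kg}\) and \(x = 2\), about a hundred cycles already reach the scale of the Sun's hydrogen inventory (\(\sim 10^{30}\,\mathrm{kg}\)); the binding constraints really are reactor count and cycle time, not the starting stock.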
I don't think "hostile takeover" is a meaningful distinction in case of AGI. What exactly prevents AGI from pulling plan consisting of 50 absolutely legal moves which ends up with it as US dictator?
You mixed up your pro-capitalists: Adam Smith actually made a lot of capital from investment, while Ayn Rand never had much money.
No current AI system could generate a research paper that would receive anything but the lowest possible score from each reviewer
Is this true in the case of o3?
Yes, but sometimes topics can seem simple (atomic) in a way that makes it hard to extract something simpler to grab onto.
The irony of the situation is that I do sleep on problems often... when they are closed-ended problems, not topic-learning ones.
I realized that my learning process over the last n years was quite unproductive, seemingly because of my implicit belief that I should have full awareness of my state of learning.
I.e., when I tried to learn something complex, I expected to come away with a full understanding of the topic right after the lesson. When I didn't get it, I abandoned the topic. In reality it was more like:
- I read about a complicated topic. I don't understand it, don't follow the inferences, and am basically in a state of confusion where I can't even form questions about it;
- Then I open the topic again after some time... and I somehow get it??? Maybe not at the level of "can re-derive every proof", but I have a detailed picture of the topic in my mind and can orient myself in it.
Imagine the following reasoning of an AI:
I am a paperclip maximizer. The human is a part of me. If the human learns that I am a paperclip maximizer, they will freak out and I won't produce paperclips. But that would be detrimental both for me and for the human, as they are part of me. So I won't tell the human about the paperclips, for the human's own good.