Given that Euan begins his post with an axiom of materialism, it's referenced in the quote I'm responding to, and I'm responding to Euan, not talking to a general audience, I think it's your fault for interpreting it as "most people, full stop".
Dollars are essentially energy from physics, and trades are state transitions. So, in expectation entropy will increase. Suppose person controls a proportion of the dollars. In an efficient market, entropy will be maximal, so we want to find the distribution
For a given Total Societal Wealth Generation, this is the Boltzmann distribution
where is the temperature (frequency of trades). I subsumed as a single constant in my earlier comment to simplify matters. I was incorrect in my earlier statement; if my is two higher than yours (not twice as large), I should control times as many dollars. I suspect some of the rise in CEO-to-worker compensation comes from increasing, some from a less conscientious society, and some from exploitation.
Exploitation is using a superior negotiating position to inflict great costs on someone else, at small benefit to yourself.
If someone is inflicting any cost on me for their own benefit, that is not a mutually beneficial trade, so your definition doesn't solve the problem. You cannot just look at subtrades either—after all, you can always break up every trade into two transactions where you first only pay a cost, and then only get a benefit at someone else's expense.
My definition is closer to this:
A trade is exploitative when it decreases a society's wealth generating ability.
When people are paid less, they are less able to invest in the future. This includes upskilling, finding more promising ventures, starting their own business, or raising children. Some people are better at this than others, and an efficient market would give them control of more money to reflect this (roughly exponentially more). For example, if you are twice as good at wealth creation as I am, you should have about seven times as many dollars. If I make a trade with you, I should keep about 12% of the wealth created. Of course, this has to be after costs are taken into account.
The cost of subsistence is pretty negligible—maybe a few thousand dollars per year in the rural United States. Any other costs a company imposes on you should be paid before you distribute the pie you created. So, if they ask you to live in San Francisco and drive a car, that is easily $50,000/yr in before-earnings costs. Now, suppose your work as a developer nets them $500,000/yr. You should be making about $100,000/yr after taxes, which would be around $200,000/yr before taxes. If you are making less, there are three scenarios:
- Your company is more than twice as good as you at wealth generation.
- You are creating less than $500,000/yr of value.
- You are being exploited!
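The figures above can be reproduced with a few lines of arithmetic. This is only a sketch of one reading that matches the quoted numbers: it assumes a two-party Boltzmann-style split with temperature 1 and a skill gap of 2 units, and the function names are made up for illustration.

```python
import math

def wealth_ratio(skill_gap, temperature=1.0):
    # Assumption: dollars scale as exp(skill / temperature), so the ratio
    # between two people depends only on the gap between their skills.
    return math.exp(skill_gap / temperature)

def smaller_share(skill_gap, temperature=1.0):
    # Fraction of the created wealth kept by the less skilled party.
    return 1.0 / (1.0 + wealth_ratio(skill_gap, temperature))

ratio = wealth_ratio(2.0)   # "about seven times as many dollars"
share = smaller_share(2.0)  # "about 12% of the wealth created"

value_created = 500_000     # yearly value of the developer's work
imposed_costs = 50_000      # San Francisco rent, car, and so on
surplus = value_created - imposed_costs
after_tax_pay = imposed_costs + share * surplus

print(f"ratio: {ratio:.2f}, share: {share:.1%}, pay: ${after_tax_pay:,.0f}")
```

Under these assumptions the costs are repaid first and only the remaining surplus is split, which is how the result lands near $100,000/yr.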
For humans from our world, these questions do have answers—complicated answers having to do with things like map–territory confusions that make receiving bad news seem like a bad event (rather than the good event of learning information about how things were already bad, whether or not you knew it), and how it's advantageous for others to have positive-valence false beliefs about oneself.
If you have bad characteristics (e.g. you steal from your acquaintances), isn't it in your best interest to make sure this doesn't become common knowledge? You don't want to normalize people pointing out your flaws, so you get mad at people for gossiping behind your back, or saying rude things in front of you.
If you're not already aware of the information bottleneck, I'd recommend The Information Bottleneck Method, Efficient Compression in Color Naming and its Evolution, and Direct Validation of the Information Bottleneck Principle for Deep Nets. You can use this with routing for forward training.
EDIT: Probably wasn't super clear why you should look into this. An optimal autoencoder should try to maximize the mutual information between the encoding and the original image. You wouldn't even need to train a decoder at the same time as the encoder! But, unfortunately, it's pretty expensive to even approximate the mutual information. Maybe, if you route to different neurons based on image captions, you could significantly decrease this cost.
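To make "mutual information between the encoding and the original image" concrete, here is a toy sketch for discrete variables. The joint tables are invented for illustration; real images need an estimator, which is the expensive part mentioned above.

```python
import math

def mutual_information(joint):
    """Mutual information in bits for a joint table joint[x][z] = P(X=x, Z=z)."""
    p_x = [sum(row) for row in joint]
    p_z = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                total += p * math.log2(p / (p_x[i] * p_z[j]))
    return total

# A lossless code: each of two equally likely "images" gets its own codeword.
lossless = [[0.5, 0.0],
            [0.0, 0.5]]
# A useless code: the codeword is independent of the image.
useless = [[0.25, 0.25],
           [0.25, 0.25]]

mi_lossless = mutual_information(lossless)
mi_useless = mutual_information(useless)
print(mi_lossless, mi_useless)
```

The lossless encoder keeps the full one bit of information about the input, while the independent one keeps none, and no decoder was needed to tell them apart.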
And I migrated my comment.
Maybe there's an evolutionary advantage to thinking of yourself as distinct from the surrounding universe: that way, your brain can simulate counterfactual worlds where you might take different actions. Will you actually take different actions? No, but thinking will make the one action you do take better. Since people are hardwired to think their observations are not necessarily interactions, updating in the other direction has significant surprisal.
I think physicists like to think of the universe through a "natural laws" perspective, where things should work the same whether or not they were there to look at them. So, it seems strange when things do work differently when they look at them.
The reason wave function collapse is so surprising is that not collapsing seems to be the norm. In fact, the best gravimeters are made by interfering the wavefunctions of entire atoms (ref: atom interferometer). We only see "wave function collapse" in particular kinds of operations, which we then define as observations. So, it isn't surprising that we observe wave function collapse; that's how the word "observe" is defined. What is surprising is that collapse even occurs to be observed, when we know it is not how the universe usually operates.
and that's because I think you don't understand them either.
What am I supposed to do with this? The one effect this has is to piss me off and make me less interested in engaging with anything you've said.
Why is that the one effect? Jordan Peterson says that the one answer he routinely gives to Christians and atheists that pisses them off is, "what do you mean by that?" In an interview with Alex O'Connor he says,
So people will say, well, do you believe that happened literally, historically? It's like, well, yes, I believe that it's okay. Okay. What do you mean by that? That you believe that exactly. Yeah. So, so you tell me you're there in the way that you describe it.
Right, right. What do you see? What are the fish doing exactly? And the answer is you don't know. You have no notion about it at all. You have no theory about it. Sure. You have no theory about it. So your belief is, what's your belief exactly?
(25:19–25:36, The Jordan B. Peterson Podcast - 451. Navigating Belief, Skepticism, and the Afterlife w/ Alex O'Connor)
Sure, this pisses off a lot of people, but it also gets some people thinking about what they actually mean. So, there's your answer: you're supposed to go back and figure out what you mean. A side benefit is if it pisses you off, maybe I won't see your writing anymore. I'm pretty annoyed at how the quality of posts has gone down on this website in the past few years.
But my view is that maths and computation are not the only symbols upon which constructive discussion can be built.
I find it useful to take an axiom of extensionality: if I cannot distinguish between two things in any way, I may as well consider them the same thing for all that it could affect me. Given maths/computation/logic is the process of asserting things are the same or different, it seems to me to be tautologically true that maths and computation are the only symbols upon which useful discussion can be built.
I'm not arguing against the claim that you could "define consciousness with a computation". I am arguing against the claim that "consciousness is computation". These are distinct claims.
Maybe you want to include some undefinable aspect to consciousness. But anytime it functions differently, you can use that to modify your definition. I don't think the adherents for computational functionalism, or even a computational universe, need to claim it encapsulates everything there could possibly be in the territory. Only that it encapsulates anything you can perceive in the territory.
There is an objective fact-of-the-matter whether a conscious experience is occurring, and what that experience is. It is not observer-dependent. It is not down to interpretation. It is an intrinsic property of a system.
I believe this is your definition of real consciousness? This tells me properties about consciousness, but doesn't really help me define consciousness. It's intrinsic and objective, but what is it? For example, if I told you that the Sierpiński triangle is created by combining three copies of itself, I still don't know what it actually looks like. If I want to work with it, I need to know how the base case is defined. Once you have a definition, you've invented computational functionalism (for the Sierpiński triangle, for consciousness, for the universe at large).
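A small sketch of the point about base cases. The rule "combine three copies of itself" only produces a picture once a base case is chosen; here the base case is assumed to be a single `*`, and the function name is just for illustration.

```python
def sierpinski(order):
    """Rows of a text Sierpinski triangle of the given order."""
    if order == 0:
        return ["*"]  # the base case: without it the recursion defines nothing
    smaller = sierpinski(order - 1)
    pad = " " * (2 ** (order - 1))
    top = [pad + row + pad for row in smaller]       # one copy, centered on top
    bottom = [row + " " + row for row in smaller]    # two copies side by side
    return top + bottom

rows = sierpinski(3)
print("\n".join(rows))
```

Swapping the base case for a different string changes every level of the figure, even though the "three copies" rule is untouched.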
I think I have a sense of what's happening here. You don't consider an argument precise enough unless I define things in more mathematical terms.
Yes, exactly! To be precise, I don't consider an argument useful unless it is defined through a constructive logic (e.g. mathematics through ZF set theory).
If you actually want to know the answer: when you define the terms properly (i.e. KL-divergence from the firings that would have happened), the entire paradox goes away.
I'd be excited to actually see this counterargument. Is it written down anywhere that you can link to?
Note: this assumes computational functionalism.
I haven't seen it written down explicitly anywhere, but I've seen echoes of it here and there. Essentially, in RL, agents are defined via their policies. If you want to modify the agent to be good at a particular task, while still being pretty much the "same agent", you add a KL-divergence anchor term:
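One common way to write such an objective, as a sketch: the symbols are assumptions introduced here, with $\pi$ the modified policy, $\tau$ the anchor policy, $u(\pi)$ the expected task reward, and $\lambda > 0$ the anchor strength. A larger $\lambda$ keeps the agent closer to $\tau$, while $\lambda \to 0$ recovers plain reward maximization.

$$\pi^* = \arg\max_{\pi} \; u(\pi) - \lambda \, D_{\mathrm{KL}}(\pi \,\|\, \tau)$$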
This is known as piKL and was used for Diplomacy, where it's important to act similarly to humans. When we think of consciousness or the mind, we can divide thoughts into two categories: the self-sustaining (memes/particles/holonomies), and noise (temperature). Temperature just makes things fuzzy, while memes will prescribe specific actions. On a broad scale, maybe they tell your body to take specific actions, like jumping in front of a trolley. Let's call these "macrostates". Since a lot of memes will produce the same macrostates, let's call them "microstates". When comparing two consciousnesses, we want to see how well the microstates match up.
The only way we can distinguish between microstates is by increasing the number of macrostates—maybe looking at neuron firings rather than body movements. So, using our axiom of extensionality, to determine how "different" two things are, the best we can do is count the difference in the number of microstates filling each macrostate. Actually, we could scale the microstate counts and temperature by some constant factor and end up with the same distribution, so it's better to look at the difference in their logarithms. This is exactly the cross-entropy. The KL-divergence subtracts off the entropy of the anchor policy (the thing you're comparing to), but that's just a constant.
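The relationship between cross-entropy and KL-divergence can be checked numerically. This is a sketch with two made-up distributions `p` and `q`; which of them plays the anchor depends on the direction you pick, and here the entropy being subtracted is that of the first argument.

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]  # hypothetical distribution over macrostates
q = [0.5, 0.3, 0.2]  # hypothetical distribution being compared against it

# KL(p || q) = H(p, q) - H(p): the two measures differ by a term that
# does not depend on q at all.
gap = kl_divergence(p, q) - (cross_entropy(p, q) - entropy(p))
print(kl_divergence(p, q), gap)
```

Because `entropy(p)` is fixed while `q` varies, ranking candidates by cross-entropy or by KL-divergence gives the same order.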
So, let's apply this to the paradox. Suppose my brain is slowly being replaced by silicon, and I'm worried about losing consciousness. I acknowledge there are impossible-to-determine properties that I could be losing; maybe the gods do not let cyborgs into heaven. However, that isn't useful to include in my definition of consciousness. All the useful properties can be observed, and I can measure how much they are changing with a KL-divergence.
When it comes to other people, I pretty much don't care if they're p-zombies, only how their actions affect me. So a very good definition for their consciousness is simply the equivalence class of programs that would produce the actions I see them taking. If they start acting radically differently, I would expect this class to have changed, i.e. their consciousness is different. I've heard some people care about the substrate their program runs on. "It wouldn't be me if the program was run by a bunch of aliens waving yellow and blue flags around." I think that's fine. They've merely committed suicide in all the worlds their substrate didn't align with their preferences. They could similarly play the quantum lottery for a billion dollars, though this isn't a great way to ensure your program's proliferation.
In response to the two reactions:
- Why do you say, "Besides, most people actually take the opposite approach: computation is the most "real" thing out there, and the universe, and any consciousnesses therein, arise from it."
Euan McLean said at the top of his post he was assuming a materialist perspective. If you believe there exists "a map between the third-person properties of a physical system and whether or not it has phenomenal consciousness" you believe you can define consciousness with a computation. In fact, anytime you believe something can be explicitly defined and manipulated, you've invented a logic and computer. So, most people who take the materialist perspective believe the material world comes from a sort of "computational universe", e.g. Tegmark IV.
- Soldier mindset.
Here's a soldier mindset: you're wrong, and I'm much more confident on this than you are. This person's thinking is very loosey-goosey and someone needed to point it out. His posts are mostly fluff with paradoxes and questions that would be completely answerable (or at least interesting) if he deleted half the paragraphs and tried to pin down definitions before running rampant with them.
Also, I think I can point to specific things that you might consider soldier mindset. For example,
It's such a loose idea, which makes it harder to look at it critically. I don't really understand the point of this thought experiment, because if it wasn't phrased in such a mysterious manner, it wouldn't seem relevant to computational functionalism.
If you actually want to know the answer: when you define the terms properly (i.e. KL-divergence from the firings that would have happened), the entire paradox goes away. I wasn't giving him the answer, because his entire post is full of this same error: not defining his terms, running rampant with them, and then being shocked when things don't make sense.
I don't like this writing style. It feels like you are saying a lot of things, without trying to demarcate boundaries for what you actually mean, and I also don't see you criticizing your sentences before you put them down. For example, with these two paragraphs:
Surely there can’t be a single neuron replacement that turns you into a philosophical zombie? That would mean your consciousness was reliant on that single neuron, which seems implausible.
The other option is that your consciousness gradually fades over the course of the operations. But surely you would notice that your experience was gradually fading and report it? To not notice the fading would be a catastrophic failure of introspection.
If you're aware that there is a map and a territory, you should never be dealing with absolutes like, "a single neuron..." You're right that the only other option (I would say, the only option) is your consciousness gradually fades away, but what do you mean by that? It's such a loose idea, which makes it harder to look at it critically. I don't really understand the point of this thought experiment, because if it wasn't phrased in such a mysterious manner, it wouldn't seem relevant to computational functionalism.
I also don't understand a single one of your arguments against computational functionalism, and that's because I think you don't understand them either. For example,
In the theoretical CF post, I give a more abstract argument against the CF classifier. I argue that computation is fuzzy, it’s a property of our map of a system rather than the territory. In contrast, given my realist assumptions above, phenomenal consciousness is not a fuzzy property of a map, it is the territory. So consciousness cannot be computation.
You can't just claim that consciousness is "real" and computation is not, and thus they're distinct. You haven't even defined what "real" is. Besides, most people actually take the opposite approach: computation is the most "real" thing out there, and the universe, and any consciousnesses therein, arise from it. Finally, how is computation being fuzzy even related to this question? Consciousness can be the same way.
I did some more thinking, and realized particles are the irreps of the Poincaré group. I wrote up some more here, though this isn't complete yet:
https://www.lesswrong.com/posts/LpcEstrPpPkygzkqd/fractals-to-quasiparticles
Risk is a great study into why selfish egoism fails.
I took an ethics class at university, and mostly came to the opinion that morality was utilitarianism with an added deontological rule to not impose negative externalities on others. I.e. "Help others, but if you don't, at least don't hurt them." Both of these are tricky, because anytime you try to "sum over everyone" or have any sort of "universal rule" logic breaks down (due to Descartes' evil demon and Russell's vicious circle). Really, selfish egoism seemed to make more logical sense, but it doesn't have a pro-social bias, so it makes less sense to adopt when considering how to interact with or create a society.
The great thing about societies is we're almost always playing positive-sum games. After all, those that aren't don't last very long. Even if my ethics wasn't well-defined, the actions prescribed will usually be pretty good ones, so it's usually not useful to try to refine that definition. Plus, societies come with cultures that have evolved for thousands of years to bias people to act decently, often without needing to think how this relates to "ethics". For example, many religious rules seem mildly ridiculous nowadays, but thousands of years ago they didn't need to know why cooking a goatchild in its mother's milk was wrong, just to not do it.
Well, all of this breaks down when you're playing Risk. The scarcity of resources is very apparent to all the players, which limits the possibility for positive-sum games. Sure, you can help each other manoeuvre your stacks at the beginning of the game, or one-two slam the third and fourth players, but every time you cooperate with someone else, you're defecting against everyone else. This is probably why everyone hates turtles so much: they only cooperate with themselves, which means they're defecting against every other player.
I used to be more forgiving of mistakes or idiocy. After all, everyone makes mistakes, and you can't expect people to take the correct actions if they don't know what they are! Shouldn't the intentions matter more? Now, I disagree. If you can't work with me, for whatever reason, I have to take you down.
One game in particular comes to mind. I had the North American position and signalled two or three times to the European and Africa+SA players to help me slam the Australian player. The Africa player had to go first, due to turn order and having 30 more troops; instead, they just sat and passed. The Australian player was obviously displeased about my intentions, and positioned their troops to take me out, so I broke SA and repositioned my troops there. What followed was a huge reshuffle (that the Africa player made take way longer due to their noobery), and eventually the European player died off. Then, again, I signalled to the former Africa player to kill the Australian player, and again, they just sat and took a card. I couldn't work with them, because they were being stupid and selfish. 'And', because that kind of selfishness is rather stupid. Since I couldn't go first + second with them, I was forced to slam into them to guarantee second place. If they were smart about being selfish, they would have cooperated with me.
As that last sentence alludes to, selfish egoism seems to make a lot of sense for a moral understanding of Risk. Something I've noticed is almost all the Grandmasters that comment on the subreddit, or record on YouTube seem to have similar ideas:
- "Alliances" are for coordination, not allegiances.
- Why wouldn't you kill someone on twenty troops for five cards?
- It's fine to manipulate your opponents into killing each other, especially if they don't find out. For example, stacking next to a bot to get your ally's troops killed, or cardblocking the SA position when in Europe and allied with NA and Africa.
This makes the stupidity issue almost more of a crime than intentionally harming someone. If someone plays well and punishes my greed, I can respect that. They want winning chances, so if I give them winning chances, they'll work with me. But if I'm stupid, I might suicide my troops into them, ruining both of our games. Or, if someone gets their Asia position knocked out by Europe, I can understand them going through my NA/Africa bonus to get a new stack out. But, they're ruining both of our games if they just sit on Central America or North Africa. And, since I'm smart enough, I would break the Europe bonus in retaliation. If everyone were smart and knew everyone else was smart, the Europe player wouldn't knock out the SA player's Asia stack. People wouldn't greed for both Americas while I'm sitting in Africa. So on and so forth. Really, most of the "moral wrongs" we feel when playing Risk only occur because one of us isn't smart enough!
My view on ethics has shifted; maybe smart selfish egoism really is a decent ethics to live by. However, also evidenced by Risk, most people aren't smart enough to work with, and most that are took a while to get there. I think utilitarianism/deontology works better because people don't need to think as hard to take good actions. Even if they aren't necessarily the best, they're far better than most people would come up with!
I wrote up my explanation as its own post here: https://www.lesswrong.com/posts/LpcEstrPpPkygzkqd/fractals-to-quasiparticles
I think you're looking for the irreducible representations of (edit: for 1D, ). I'll come back and explain this later, but it's going to take awhile to write up.
Utilitarianism is usually introduced as summing "equally" between people, but we all know some arrangements of atoms are more equal than others.
How do you choose to sum the utility when playing a Prisoner's Dilemma against a rock?
Is there a difference between utilitarianism and selfish egoism?
For utilitarianism, you need to choose a utility function. This is entirely based on your preferences: what you value, and who you value get weighed and summed to create your utility function. I don't see how this differs from selfish egoism: you decide what and who you value, and take actions that maximize these values.
Each doctrine comes with a little brainwashing. Utilitarianism is usually introduced as summing "equally" between people, but we all know some arrangements of atoms are more equal than others. However, introducing it this way naturally leads people to look for cooperation and value others more, both of which increase their chance of surviving.
Ayn Rand was rather reactionary against religion and its associated sacrificial behavior, so selfish egoism is often introduced as a reaction:
- When you die, everything is over for you. Therefore, your survival is paramount.
- You get nothing out of sacrificing your values. Therefore, you should only do things that benefit you.
Kant claimed people are good only by their strength of will. Wanting to help someone is a selfish action, and therefore not good. Rand takes the more individually rational approach: wanting to help someone makes you good, while helping someone against your interests is self-destructive. To be fair to Kant, when most agents are highly irrational your society will do better with universal laws than moral anarchy. This is also probably why selfish egoism gets a bad rap: even if you are a selfish egoist, you want to influence your society to be more Kantian. Or, at the very least, like those utilitarians. They at least claim to value others.
However, I think rational utilitarians really are the same as rational selfish egoists. A rational selfish egoist would choose to look for cooperation. When they have fundamental disagreements with cooperative others, they would modify their values to care more about their counterpart so they both win. In the utilitarian bias, it's more difficult to realize when to change your utility function, while it's a little easier with selfish egoism. After all, the most important thing is survival, not utility.
I think both philosophies are slightly wrong. You shouldn't care about survival per se, but expected discounted future entropy (i.e. how well you proliferate). This will obviously drop to zero if you die, but having a fulfilling fifty years of experiences is probably more important than seventy years in a 2x2 box. Utility is merely a weight on your chances of survival, and thus future entropy. ClosedAI is close with their soft actor-critic, though they say it's entropy-regularized reinforcement learning. In reality, all reinforcement learning is maximizing energy-regularized entropy.
I think this is correct, but I would expect most low-level differences to be much less salient than a dog, and closer to 10^25 atoms dispersed slightly differently in the atmosphere. You will lose a tiny amount of weight for remembering the dog, but gain much more back for not running into it.
As it is difficult to sort through the inmates on execution day, an automatic gun is placed above each door with blanks or lead ammunition. The guard enters the cell numbers into a hashed database, before talking to the unlucky prisoner. He recently switched to the night shift, and his eyes droop as he shoots the ray.
When he wakes up, he sees "enter cell number" crossed off on the to-do list, but not "inform the prisoners". He must have fallen asleep on the job, and now he doesn't know which prisoner to inform! He figures he may as well offer all the prisoners the amnesia-ray.
"If you noticed a red light blinking above your door last night, it means today is your last day. I may have come to your cell to offer your Last rights, but it is a busy prison, so I may have skipped you over. If you would like your Last rights now, they are available."
Most prisoners breathed a sigh of relief. "I was stressing all night, thinking, what if I'm the one? Thank you for telling me about the red light, now I know it is not me." One out of every hundred of these lookalikes was less grateful. "You told me this six hours ago, and I haven't slept a wink. Did you have to remind me again?!"
There was another category of clones though, who all had the same response. "Oh no! I thought I was safe since nothing happened last night. But now, I know I could have just forgotten. Please shoot me again, I can't bear this."
I consider "me" to be a mapping from environments to actions, and weigh others by their KL-divergence from me.
You have to take into account your genesis. Being self-consistent will usually benefit an agent's proliferation, so looking at the worlds where you believe you are [Human] you will be weightier where your ancestors remember stuff, and thus you too. It's the same reason why bosons and fermions dominate our universe.
But suppose our universe is well-abstracting, and this specific dog didn't set off any butterfly effects. The consequences of its existence were "smoothed out", such that its existence vs. non-existence never left any major differences in your perceptions.
Unfortunately, this isn't possible. Iirc, chaos theory emerged when someone studying weather patterns noticed using more bits of precision gave them completely different results than fewer bits. A dog will change the weather dramatically, which will substantially affect your perceptions.
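The precision effect is easy to see in a toy system. The anecdote is usually told about Edward Lorenz re-entering 0.506 in place of 0.506127 in a weather simulation; the sketch below borrows those two numbers but uses the logistic map as a stand-in, which is an assumption for illustration and not his actual model.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x), a standard chaotic toy system."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

full_precision = logistic_trajectory(0.506127, 50)
rounded_input = logistic_trajectory(0.506, 50)

gaps = [abs(a - b) for a, b in zip(full_precision, rounded_input)]
print(f"initial gap: {gaps[0]:.6f}")
print(f"largest gap over 50 steps: {max(gaps):.3f}")
```

The two runs start about one part in ten thousand apart and stop resembling each other after a couple dozen steps.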
I claim that there's just always a distribution over meanings, and it can be sharp or fuzzy or bimodal or any sort of shape.
The issue is you cannot prove this. If you're considering any possible meaning, you will run into recursive meanings (e.g. "he wrote this") which are non-terminating. So, the truthfulness of any sentence, including your claim here is not defined.
You might try limiting the number of steps in your interpretation: only meanings that terminate to a probability within $n$ steps count; however, you still have to define or believe in the machine that runs your programs.
Now, I'm generally of the opinion that this is fine. Your brain is one such machine, and being able to assign probabilities is useful for letting your brain (and its associated genes) proliferate into the future. In fact, science is really just picking more refined machines to help us predict the future better. However, keep in mind that (1) this eventually boils down to "trust, don't verify", and (2) you've committed suicide in a number of worlds that don't operate in the way you've limited yourself. I recently had an argument with a Buddhist whose point was essentially, "that's the vast majority of worlds, so stop limiting yourself to logic and reason!"
Natural languages, by contrast, can refer to vague concepts which don’t have clear, fixed boundaries
I disagree. I think it's merely the space is so large that it's hard to pin down where the boundary is. However, language does define natural boundaries (that are slightly different for each person and language, and shift over time). E.g., see "Efficient compression in color naming and its evolution" by Zaslavsky et al.
"The boundary of a boundary is zero"
I think this is mostly arbitrary.
So, in the 20th century Russell's paradox came along and forced mathematicians into creating constructive theories. For example, in ZFC set theory, you begin with the empty set {}, and build out all sets with a tower of lower-level sets. Maybe the natural numbers become {}, {{}}, {{{}}}, etc. Using different axioms you might get a type theory; in fact, any programming language is basically a formal logic. The basic building blocks like the empty set, or the builtin types are called atoms.
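As a rough illustration of "building out" numbers from the empty set, here is a sketch using Python's `frozenset` as a stand-in for sets. It follows the {}, {{}}, {{{}}} encoding mentioned above; the function names are made up for this example.

```python
def successor(n):
    """The next number is the set whose only element is the previous one."""
    return frozenset([n])

def natural(k):
    """Build the k-th natural number starting from the empty set."""
    n = frozenset()
    for _ in range(k):
        n = successor(n)
    return n

def render(s):
    """Write a nested frozenset using brace notation."""
    return "{" + ", ".join(sorted(render(x) for x in s)) + "}"

for k in range(3):
    print(k, render(natural(k)))
```

Everything here is constructed from one atom, the empty `frozenset()`, plus a rule for stacking.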
In algebraic topology, the atom is a simplex: line segments in one dimension, triangles in two dimensions, tetrahedrons in three dimensions, and so on. I think they generally use an axiom of infinity, so each simplex is infinitely small (convenient when you have smooth curves like circles), but they need to be defined at the lowest level. This includes how you define simplices from lower-dimensional simplices! And this is where the boundary comes in.
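Say you have a triangle (2-simplex) $[A, B, C]$. Naively, we could define its boundary as the sum of its edges:
$$\partial [A, B, C] = [A, B] + [B, C] + [A, C]$$
However, if we stuck two of them together, say $[A, B, C]$ and $[A, C, D]$, the shared edge $[A, C]$ wouldn't disappear from the boundary:
$$\partial \big([A, B, C] + [A, C, D]\big) = [A, B] + [B, C] + 2[A, C] + [C, D] + [A, D]$$
This is why they usually alternate sign, so
$$\partial [A, B, C] = [B, C] - [A, C] + [A, B]$$
Then, since
$$[A, C] = -[C, A]$$
you could also write it like
$$\partial [A, B, C] = [A, B] + [B, C] + [C, A]$$
It's essentially a directed loop around the triangle (the analogy breaks when you try higher dimensions, unfortunately). Now, the famous quote "the boundary of a boundary is zero" is relatively trivial to prove. Let's remove just the two indices $A_i, A_j$ (with $i < j$) from the simplex $[A_1, A_2, \dots, A_i, \dots, A_j, \dots, A_n]$. If we remove $A_i$ first, $A_j$ shifts down one position before it is removed, so we'd get (a hat marks a removed vertex)
$$(-1)^{i-1} (-1)^{j-2} \, [A_1, \dots, \hat{A}_i, \dots, \hat{A}_j, \dots, A_n]$$
while removing $A_j$ first gives
$$(-1)^{j-1} (-1)^{i-1} \, [A_1, \dots, \hat{A}_i, \dots, \hat{A}_j, \dots, A_n]$$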
The first is $-1$ times the second, so everything will zero out. However, it's only zero because we decided edges should cancel out along shared boundaries. We can choose a different system where they add together, which leads to the permanent as a measure of volume instead of the determinant. Or, one that uses a much more complex relationship (re: immanant).
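The cancellation can be checked mechanically. The sketch below represents a chain as a dictionary from vertex tuples to integer coefficients (a representation chosen here for illustration) and compares the alternating-sign boundary with the naive all-plus version.

```python
from collections import Counter

def boundary(chain, alternate=True):
    """Boundary of a chain given as {simplex (tuple of vertices): coefficient}."""
    result = Counter()
    for simplex, coeff in chain.items():
        for k in range(len(simplex)):
            face = simplex[:k] + simplex[k + 1:]    # drop the k-th vertex
            sign = (-1) ** k if alternate else 1    # naive version never flips sign
            result[face] += sign * coeff
    return {face: c for face, c in result.items() if c != 0}

tetrahedron = {("A", "B", "C", "D"): 1}
two_triangles = {("A", "B", "C"): 1, ("A", "C", "D"): 1}

signed_twice = boundary(boundary(tetrahedron))
naive_twice = boundary(boundary(tetrahedron, alternate=False), alternate=False)
glued_signed = boundary(two_triangles)
glued_naive = boundary(two_triangles, alternate=False)

print("signed boundary of boundary:", signed_twice)
print("naive boundary of boundary:", naive_twice)
```

With alternating signs the double boundary is empty and the shared edge of the glued triangles drops out; with all-plus signs every edge of the tetrahedron survives with coefficient 2, so the "zero" is a consequence of the sign convention.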
I'm certainly not an expert here, but it seems like fermions (e.g. electrons) exchange via the determinant, bosons (e.g. mass/gravity) use the permanent, and more exotic particles (e.g. anyons) use the immanant. So, when people base their speculations on the "boundary of a boundary" being a fundamental piece of reality, it bothers me.
That assumes the law of non-contradiction. I could hold the belief that everything will happen in the future, and my prediction will be right every time. Alternatively, I can adjust my memory of a prediction to be exactly what I experience now.
Also, predicting the future only seems useful insofar as it lets the belief propagate better. The more rational and patient the hosts are, the more useful this skill becomes. But, if you're thrown into a short-run game (say ~80yrs) that's already at an evolutionary equilibrium, combining this skill with the law of non-contradiction (i.e. only holding consistent beliefs) may get you killed.
Within a system with competition, why would the most TRUE thing win? No, the most effective thing wins.
Why do you assume truth even exists? To me, it seems like there are just different beliefs that are more or less effective at proliferation. For example, in 1300s Venice, any belief except Catholicism would destroy its hosts pretty quickly. Today, the same beliefs would get shouted down by scientific communities and legislated out of schools.
- Is this apparent parity due to a mass exodus of employees from OpenAI, Anthropic, and Google to other companies, resulting in the diffusion of "secret sauce" ideas across the industry?
No. There isn't much "secret sauce", and these companies never had a large amount of AI talent to begin with. Their advantage is being in a position with hype/reputation/size to get to market faster. It takes several months to set up the infrastructure (getting money, data, and compute clusters), but that's really the only hurdle.
- Does this parity exist because other companies are simply piggybacking on Meta's open-source AI model, which was made possible by Meta's massive compute resources? Now, by fine-tuning this model, can other companies quickly create models comparable to the best?
No. "Everyone" in the AI research community knew how to build Llama, multi-modal models, or video diffusion models a year before they came out. They just didn't have $10M to throw around.
Also, fine-tuning isn't really the way to go. I can imagine people using it as a teacher during the warming up phase, but the coding infrastructure doesn't really exist to fine-tune or integrate another model as part of a larger one. It's usually easier to just spend the extra time securing money and training.
- Is it plausible that once LLMs were validated and the core idea spread, it became surprisingly simple to build, allowing any company to quickly reach the frontier?
Yep. Even five years ago you could open a Colab notebook and train a language translation model in a couple of minutes.
- Are AI image generators just really simple to develop but lack substantial economic reward, leading large companies to invest minimal resources into them?
No, images are much harder than language. With language models, you can exactly model the output distribution, while the space of images is continuous and much too large for that. Instead, the best models measure the probability flow (e.g. diffusion/normalizing flows/flow-matching), and follow it towards high-probability images. However, parts of images should be discrete. You know humans have five fingers, or text has words in it, but flows assume your probabilities are continuous.
Imagine you have a distribution that looks like
__|_|_|__
A flow will round out those spikes into something closer to
_/^\/^\/^\__
which is why gibberish text or four-and-a-half fingers appear. In video models, this leads to dogs spawning and disappearing into the pack.
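A toy illustration of the rounding-out effect, assuming a Gaussian blur as a stand-in for what a continuous flow does to a discrete distribution. This is not how diffusion or flow-matching models are implemented; `gaussian_smooth` and `sigma` are hypothetical names.

```python
import math


def gaussian_smooth(density, sigma):
    """Blur a discrete density with a Gaussian kernel, then renormalize."""
    smoothed = []
    for i in range(len(density)):
        total = 0.0
        for j, mass in enumerate(density):
            total += mass * math.exp(-((i - j) ** 2) / (2 * sigma ** 2))
        smoothed.append(total)
    norm = sum(smoothed)
    return [v / norm for v in smoothed]


# Three sharp spikes, like "__|_|_|__": only three values are ever valid.
spiky = [0, 0, 1, 0, 1, 0, 1, 0, 0]
spiky = [v / sum(spiky) for v in spiky]

smooth = gaussian_smooth(spiky, sigma=0.8)

# Probability that leaked into the gaps between the spikes.
gap_mass = smooth[3] + smooth[5]
print(round(gap_mass, 3))
```

The in-between positions start with zero probability and end up with some, which is the toy analogue of four-and-a-half fingers.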
- Could it be that legal challenges in building AI are so significant that big companies are hesitant to fully invest, making it appear as if smaller companies are outperforming them?
Partly when it comes to image/video models, but this isn't a huge factor.
- And finally, why is OpenAI so valuable if it’s apparently so easy for other companies to build comparable tech? Conversely, why are these no name companies making leading LLMs not valued higher?
I think it's because AI is a winner-takes-all competition. It's extremely easy for customers to switch, so they all go to the best model. Since ClosedAI already has funding, compute, and infrastructure, it's risky to compete against them unless you have a new kind of model (e.g. LiquidAI), reputation (e.g. Anthropic), or are a billionaire's pet project (e.g. xAI).
Religious freedoms are a subsidy to keep the temperature low. There's the myth that societies will slowly but surely get better, kind of like a gradient descent. If we increase the temperature too high, an entropic force would push us out of a narrow valley, so society could become much worse (e.g. nobody wants the Spanish Inquisition). It's entirely possible that the stable equilibrium we're being attracted to will still have religion.
Can't you choose an arbitrary encoding procedure? Choosing a different one only adds a constant number of bits. Also, my comment on discounted entropy was a little too flippant. What I mean is closer to entropy rate with a discount factor, like in soft actor-critic. Maximizing your ability to have options in the future requires a lot of "agency".
Maybe consciousness should be more than just agency, e.g. if a chess bot were trained to maximize entropy, games wouldn't be as strategic as if it wants to get a high*-energy payoff. However, I'm not convinced energy even exists? Humans learn strategy because their genes are more likely to survive, thrive, and have choices in the future when they win. You could even say elementary particles are just the ones still around since the Big Bang.
*Note: The physicists should reverse the sign on energy. While they're at it, switch to inverse-temperature.
Consider all programs encoding isomorphisms from a rock to something else (e.g. my brain, or your brain). If the program takes $n$ bits to encode, we add $2^{-n}$ times the other entity to the rock (times some partition number so all the weights add up to one). Since some programs are automorphisms, we repeatedly do this until convergence.
The rock will now possess a tiny bit of consciousness, or really any other property. However, where do we get the original "sources" of consciousness? If you're a solipsist, you might say, "I am the source of consciousness." I think a better definition is your discounted entropy.
An isomorphism isn't enough. Stealing from Robert (Lastname?), you could make an isomorphism from a rock to your brain, but you likely wouldn't consider it "conscious". You have to factor out the Kolmogorov complexity of the isomorphism.
Would insider trading work out if everyone knew who was asking to trade with them ahead of time?
It could be a case of a backward-bending curve. Fewer children make the economy worse, so more people choose to work rather than have children.
The computer vision researchers just chose the wrong standard. Even the images they train on come in [pixel_position, color_channels] format.
Age limits do exist: you have to be at least 35 to run for President, at least 30 for Senator, and 25 for Representative. This automatically adds a decade or two to your candidates.
In earlier times, I spent an incredible amount of my mental capacity trying to accurately model those around me. I can count on zero hands the number of people that reciprocated. Even just treating me as real as I treated them would fit on one hand. On the other hand, nearly everyone I talk to does not have "me" as even a possibility in their model.
It just takes a very long time in practice, see "Basins of Attraction" by Ellison.
I've been thinking about something similar, and might write a longer post about it. However, the solution to both is to anneal on your beliefs. Rather than looking at the direct probabilities, look at the logits. You can then raise the temperature, let the population kind of randomize their beliefs, and cool it back down.
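A minimal sketch of what annealing on logits could look like, assuming beliefs are stored as logits and converted to probabilities with a temperature-scaled softmax. The function name `softmax` and the example numbers are illustrative, not taken from any particular paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 1.0, 0.0]                 # a strongly held belief and two alternatives

cold = softmax(logits, temperature=1.0)  # near-certain about the first option
hot = softmax(logits, temperature=10.0)  # beliefs are close to uniform

# A heat-up-then-cool-down schedule; the logits themselves never change here.
for t in [1.0, 5.0, 10.0, 5.0, 1.0]:
    print(t, [round(p, 3) for p in softmax(logits, t)])
```

In a real annealing process the logits would be updated while the temperature is high; this sketch only shows how temperature loosens and tightens the distribution.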
See "Solving Multiobjective Game in Multiconflict Situation Based on Adaptive Differential Evolution Algorithm with Simulated Annealing" by Li et al.
Perhaps "fit", from the Latin fio (come about) + English fit (fit). An object must fit, survive, and spread.
To see how much the minimal point contributes to the integral we can integrate it in its vicinity
I think you should be looking at the entire stable island, not just integrating from zero to one. I expect you could get a decent approximation with Lie transform perturbation theory, and this looks similar to the idea of macro-states in condensed matter physics, but I'm not knowledgeable in these areas.
$-\sum_{i=1}^{N} \log p(y_i \mid x_i, w)$
You have a typo, the equation after Free Energy should start with
Also the third line should be , not minus.
Also, usually people use $\theta$ for model parameters (rather than $w$). I don't know the etymology, but game theorists use the same letter (for "types" = models of players).
Also sometimes when I explain what a hyperphone is well enough for the other person to get it, and then we have a complex conversation, they agree that it would be good. But very small N, like 3 to 5.
It's difficult to understand your writing, and I feel like you could improve in general at communication based on this quote. The concept of a hyperphone isn't that complex---the ability to branch in conversations---so the modifiers "well enough", "complex", and "very small N" make me believe it's only complex because you're unclear.
For example, the blog post you linked to is titled "Hyperphone", yet you never define a hyperphone. I can infer from the section on streaming what you imagine, but that's the second-to-last section!
There's the automorphism
which turns a switchy distribution into a sticky one, and vice versa. The two have to be symmetric, so your conclusion cannot be correct.
This means the likelihood distribution over data generated by Steady is closer to the distribution generated by Switchy than to the distribution generated by Sticky.
Their KL divergences are exactly the same. Suppose Baylee's observations are . Let be the probability if there's a chance of switching, and similar for . By the chain rule,
In particular, when either or is equal to one half, this divergence is symmetric for the other variable.
The problem with etching specific models is scale. It costs around $1M to design a custom chip mask, so it needs to be amortized over tens or hundreds of thousands of chips to become profitable. But no companies need that many.
Assume a model takes 3e9 flops to infer the next token, and these chips run as fast as H100s, i.e. 3e15 flops/s. A single chip can infer 1e6 tokens/s. If you have 10M active users, then 100 chips can provide each user a token every 100ms, around 600wpm.
Even OpenAI would only need hundreds, maybe thousands of chips. The solution is smaller-scale chip production. There are startups working on electron beam lithography, but I'm unaware of a retailer Etched could buy from right now.
EDIT: 3 trillion flops/token (similar to GPT-4) is 3e12, so that would be 100,000 chips. The scale is actually there.
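The back-of-the-envelope numbers can be reproduced with a few lines, assuming each of 10M users receives 10 tokens per second (roughly 600 tokens per minute) and ignoring batching, memory bandwidth, and utilization. `chips_needed` is a hypothetical helper.

```python
def chips_needed(flops_per_token, chip_flops_per_s, users, tokens_per_user_per_s):
    """Chips required if every chip ran at its peak flops with no overhead."""
    tokens_per_chip_per_s = chip_flops_per_s / flops_per_token
    total_tokens_per_s = users * tokens_per_user_per_s
    return total_tokens_per_s / tokens_per_chip_per_s


USERS = 10_000_000
TOKENS_PER_USER_PER_S = 10       # assumption: one token every 100 ms
CHIP_FLOPS_PER_S = 3e15          # the H100-class figure used in the comment

small_model = chips_needed(3e9, CHIP_FLOPS_PER_S, USERS, TOKENS_PER_USER_PER_S)
large_model = chips_needed(3e12, CHIP_FLOPS_PER_S, USERS, TOKENS_PER_USER_PER_S)

print(f"3e9 flops/token:  about {small_model:,.0f} chips")
print(f"3e12 flops/token: about {large_model:,.0f} chips")
```

Under these assumptions the small model needs on the order of a hundred chips and the GPT-4-scale model on the order of a hundred thousand, matching the EDIT.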
so
It should be .
If it takes longer to understand, then it is very inefficient.