"he sounds convincing when he talks about X."
I am so disappointed every time I see people using the persuasiveness filter. Persuasiveness is not completely orthogonal to correctness, but it is definitely linearly independent of it.
You discuss "compromised agents" of the FBI as if they're going to be lone, investigator-level agents. If there were going to be any FBI/CIA/whatever cover-up, the version I would expect is that Epstein had incriminating information on senior FBI/CIA personnel, or on politicians. Incriminating information could just be that the FBI/CIA knew that Epstein was raping underage girls for 20 years and didn't stop him, or even protected him. In all your explanations of how impossible a crime Epstein's murder would be to pull off, what makes it seem more plausible to me is if the initial conspirator isn't just someone waving money around, but someone with authority.
Go ahead and mock, but this is what I thought the default assumption was whenever someone said "Epstein didn't kill himself" or "John McAfee didn't kill himself". I never assumed it would just be one or two lone, corrupt agents.
Now that I've had 5 months to let this idea stew, when I read your comment again just now, I think I understand it completely? After getting comfortable using "demons" to refer to patterns of thought or behavior which proliferate in ways not completely unlike some patterns of matter, this comment now makes a lot more sense than it used to.
- Yeah. I wanted to assume they were being forced to give an opinion, so that "what topics a person is or isn't likely to bring up" wasn't a confounding variable. Your point here suggests that a conspirator's response might be more like "I don't think about them", or some kind of null opinion.
- This sort of gets to the core of what I was wondering about, but am not sure how to solve: how lies tend to pervert Bayesian inference. "Simulacra levels" may be relevant here. I would think that a highly competent conspirator would want to give you only information that reduces your prediction of a conspiracy existing, but this seems recursive, in that anything that would reduce your prediction of a conspiracy has an increased likelihood of being said by a conspirator. Would the effect of lies by bad-faith actors who know your priors be that certain statements just don't update your priors, because the uncertainty means they don't actually add any new information? I don't know what limit this reduces to, and I don't yet know what math I would need to solve it.
- Naturally. I think "backpropagation" might be related to certain observations affecting multiple hypotheses? But I haven't brushed up on that in a while.
Thank you, it does help! I know some people who revel in conspiracy theories, and some who believe conspiracies are so unlikely that they dismiss any possibility of a conspiracy out of hand. I get left in the middle with the feeling that some situations "don't smell right", without having a provable, quantifiable excuse for why I feel that way.
without using "will"
Oh come on. Alright, but if your answer mentions future or past states, or references time at all, I'm dinging you points. Imaginary points, not karma points obviously.
So let's talk about this word, "could". Can you play Rationalist's Taboo against it?
Testing myself before I read further. World states which "could" happen are the set of world states which are not ruled impossible by our limited knowledge. Is "impossible" still too load-bearing here? Fine, let's get more concrete.
In a finite-size game of Conway's Life, each board state has exactly one following board state, which itself has only one following board state, and so on. This sequence of board states is a board's future. Each board state does not correspond to only a single previous board state, but rather to a set of previous board states. If we only know the current board state, then we do not know the previous board state, but we do know the set that contains it. We call this set the board states that could have been the previous board state. From its complement, we know a set that does not contain the previous board state, which we call the boards which could not have been the previous board.
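Here's a minimal brute-force sketch of that "could have been the previous board" set (the 3x3 size and toroidal wrapping are simplifying assumptions of mine, just to keep the enumeration tiny):

```python
# A minimal sketch (board size and torus wrapping are my own simplifying
# assumptions): brute-force the set of boards that "could have been" the
# previous board state in Conway's Life.
from itertools import product

W, H = 3, 3  # small enough to enumerate all 2**(W*H) possible boards

def step(live):
    """One Life generation on a WxH torus; `live` is a frozenset of (x, y) cells."""
    def neighbors(x, y):
        return sum(((x + dx) % W, (y + dy) % H) in live
                   for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                   if (dx, dy) != (0, 0))
    nxt = set()
    for x, y in product(range(W), range(H)):
        n = neighbors(x, y)
        if ((x, y) in live and n in (2, 3)) or ((x, y) not in live and n == 3):
            nxt.add((x, y))
    return frozenset(nxt)

def could_have_preceded(current):
    """All boards whose successor is `current` -- the 'could' set."""
    cells = list(product(range(W), range(H)))
    predecessors = []
    for bits in product((0, 1), repeat=len(cells)):
        board = frozenset(c for c, b in zip(cells, bits) if b)
        if step(board) == current:
            predecessors.append(board)
    return predecessors

# The empty board has many possible predecessors (any board that dies out in
# one step), so the current state pins down a set of candidate pasts, not one.
print(len(could_have_preceded(frozenset())))
```

Brute force only works at this toy scale, but it makes the point: limited knowledge gives us a set of candidate pasts, not a unique one.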
Going back up to our universe, what "could happen" is a set of things which our heuristics tell us contains one or more things which will happen. What "can't happen" is a set of things which our heuristics tell us does not contain a thing that will happen.
A thing which "could have happened" is thus a thing which was in a set which our heuristics told us contained a thing that will (or did) happen.
If I say "No, that couldn't happen", I am saying that your heuristics are too permissive, i.e. your "could" set contains elements which my heuristics exclude.
I think that got the maybe-ness out, or at least replaced it with set logic. The other key point is the limited information preventing us from cutting the "could" set down to one unique element. I expect Eliezer to have something completely different.
So then this initial probability estimate, 0.5, is not repeat not a "prior".
1:1 odds seems like it would be a default null prior, especially because one round of Bayes' Rule updates it immediately to whatever your first likelihood ratio is, kind of like the other mathematical identities. If your priors represent "all the information you already know", then it seems like you (or someone) must have gotten there through a series of Bayesian inferences, but that series would have to start somewhere, right? If (in the real universe, not the ball-and-urn universe) priors aren't determined by some chain of Bayesian inference, but instead by some degree of educated guess / intuition / dead reckoning, wouldn't that make the whole process subject to "garbage in, garbage out"?
For a use case: (A) low internal resolution has rounded my posterior probability to 0 or 1, and new evidence is no longer updating my estimates; or (B) I think some garbage crawled into my priors, but I'm not sure where. In either case, I want to take my observations and rebuild my chain of inferences from the ground up, to figure out where I should be. So... where is the ground? If 1:1 odds is not the null prior, not the Bayesian identity, then what is?
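Here's a tiny sketch of what I mean by 1:1 behaving like an identity, in odds form (the likelihood ratios are made-up numbers, just for illustration):

```python
# A minimal sketch: in odds form, the posterior odds are the prior odds times
# the product of every likelihood ratio seen so far, so a 1:1 prior acts like
# a multiplicative identity. The likelihood ratios below are made up.
from math import prod

def posterior_odds(prior_odds, likelihood_ratios):
    """Odds-form Bayes: multiply the prior odds by each likelihood ratio."""
    return prior_odds * prod(likelihood_ratios)

evidence = [3.0, 0.5, 4.0]             # assumed likelihood ratios for H vs not-H
print(posterior_odds(1.0, evidence))   # starting from 1:1 -> just the product: 6.0
print(posterior_odds(2.0, evidence))   # any other prior rescales that product: 12.0
```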
Evil is a pattern of behavior exhibited by agents. In embedded agents, that pattern is absolutely represented by material. As for what that pattern is, evil agents harm others for their own gain. That seems to be the core of "evilness" in possibility space. Whenever I try to think of the most evil actions I can, they tend to correlate with harming others (especially one's equals, or one's inner circle, who would expect mutual cooperation) for one's own gain. Hamlet's uncle. Domestic abusers. Executives who ruin lives for profit. Politicians who hand out public money in exchange for bribes. Bullies who torment other children for fun. It's a learnable script, which says "I can gain at others' expense", whether that gain is power, control, money, or just pleasure.
If your philosopher thinks "evil" is immaterial, does he also think "epistemology" is immaterial?
(I apologize if this sounds argumentative, I've just heard "good and evil are social constructs" far too many times.)
Not really. "Collapse" is not the only failure case. Mass starvation is a clear failure state of a planned economy, but it doesn't necessarily burn through the nation's stock of proletariat laborers immediately. In the same way that a person with a terminal illness can take a long time to die, a nation with failing systems can take a long time to reach the point where it ceases functioning at all.
How do lies affect Bayesian Inference?
(Relative likelihood notation is easier, so we will use that)
I heard a thing. Well, I more heard a thing about another thing. Before I heard about it, I didn't know one way or the other at all. My prior was the Bayesian null prior of 1:1. Let's say the thing I heard is "Conspiracy thinking is bad for my epistemology". Let's pretend it was relevant at the time, and didn't just come up out of nowhere. What is the chance that someone would hold this opinion, given that they are not part of any conspiracy against me? Maybe 50%? If I heard it in a Rationality-influenced space, probably more like 80%? Now, what is the chance that someone would share this as their opinion, given that they are involved in a conspiracy against me? Somewhere between 95% and 100%, so let's say 99%? Now, our prior is 1:1, and our likelihood ratio is 80:99, so our final prediction, of someone not being a conspirator vs being a conspirator, is 80:99, or 1:1.24. Therefore, my expected probability of someone not being a conspirator went from 50%, down to 45%. Huh.
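As a sanity check on that arithmetic (a sketch using the same made-up numbers as above, nothing more authoritative than that):

```python
# Prior odds 1:1 (not conspirator : conspirator),
# P(statement | not conspirator) = 0.80, P(statement | conspirator) = 0.99.
prior_odds = 1.0
likelihood_ratio = 0.80 / 0.99          # ratio of the two likelihoods above
posterior_odds = prior_odds * likelihood_ratio
p_not_conspirator = posterior_odds / (1 + posterior_odds)
print(round(posterior_odds, 3), round(p_not_conspirator, 3))
# -> 0.808 odds (i.e. 80:99, or about 1:1.24), and about a 44.7% chance they
#    are not conspiring -- the statement nudges me slightly toward "conspirator".
```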
For the love of all that is good, please shoot holes in this and tell me I screwed up somewhere.
weight-loss-that-is-definitely-not-changes-in-water-retention comes in chunks
Source for my answer: for the last 10 months, I have fasted regularly, with various fasts from 16 hours to 7 days, with & without vitamins, including water fasts, electrolyte fasts, and dry fasts. During this time, I have weighed myself multiple times per day. [I have lost >100 lbs doing this, but that's not important right now.]
How hydrated you are at any given time is a confounding variable whenever you weigh yourself. My body can hold plus or minus nearly a gallon of water without me feeling a difference. If I weigh myself while less hydrated one day, and then while more hydrated the next day, that hydration difference will completely overshadow any weight loss. When I was dry fasting there was no increase in hydration, so weight loss became extremely regular. Eating more carbohydrates also tended to make me retain more water for a few days, and the amount of food in my gut at any one time was another confounding variable.
Started reading, want to get initial thoughts down before they escape me. Will return when I am done
Representation: An agent, or an agent's behaviour, is a script. I don't know if that's helpful or meaningful. Interfaces: Real, in-universe agents have hardware on which they operate. I'd say they have "sensors and actuators", but that's tautologous to "inputs and outputs". Embedding: In biological systems, the script is encoded directly in the structure of the wetware. The hardware / software dichotomy has more separation, but I think I'm probably misunderstanding this.
Monopolies on the Use of Force
[Epistemic status & effort: exploring a question over an hour or so, and constrained to only use information I already know. This is a problem solving exercise, not a research paper. Originally written just for me; minor clarification added later.]
Is the use of force a unique industry, where a single monolithic [business] entity is the most stable state, the equilibrium point? From a business perspective, an entity selling the use of force might be thought of as in a "risk management" or "contract enforcement" industry. It might use an insurance-like business model, or function more like a contractor for large projects.
In a monopoly on the use of force, the one monopolizing entity can spend all of its time deciding what to do, and then relatively little time & energy actually exerting that force, because resistance to its force is minimal, since there are no similarly sized entities to oppose it. The cost of using force is slightly increased if the default level of resistance [i.e. how much force is available to someone who has not hired their own use-of-force business entity] is increased. Can the default level of opposition be lowered by a monolithic entity? Yes [e.g. limiting private ownership of weapons & armor].
In a diverse [non-monopoly] environment, an entity selling the use of force could easily find a similar sized entity opposing it. Opposed entities can fight immediately like hawks, or can negotiate like doves, but: the costs of conflict will be underestimated (somehow they always are); "shooting first" (per the dark forest analogy) is a powerful & possibly dominant strategy; and hawkish entities impose a cost on the entire industry.
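To make "hawkish entities impose a cost on the entire industry" concrete, here is a toy hawk-dove payoff sketch (the values of V and C are assumptions of mine, chosen so that a fight costs more than the prize is worth):

```python
# A minimal hawk-dove sketch: V is the value of the contested contract, C the
# cost of an all-out fight. With C > V, every extra hawk drags down the
# industry-wide average payoff. V and C are illustrative assumptions.
V = 10.0
C = 30.0

def payoff(me, other):
    """Expected payoff to `me` ('hawk' or 'dove') against `other`."""
    if me == "hawk" and other == "hawk":
        return (V - C) / 2        # both escalate: split the value and the damage
    if me == "hawk" and other == "dove":
        return V                  # the dove backs down
    if me == "dove" and other == "hawk":
        return 0.0                # concede everything, avoid the fight
    return V / 2                  # two doves negotiate and share

def industry_average(hawk_fraction):
    """Average payoff across the whole industry at a given hawk fraction."""
    p = hawk_fraction
    hawk = p * payoff("hawk", "hawk") + (1 - p) * payoff("hawk", "dove")
    dove = p * payoff("dove", "hawk") + (1 - p) * payoff("dove", "dove")
    return p * hawk + (1 - p) * dove

for p in (0.0, 0.25, 0.5, 1.0):
    print(f"hawk fraction {p:.2f}: industry average payoff {industry_average(p):+.2f}")
```

The average payoff falls as the hawk fraction rises, which is the sense in which hawkish entities tax everyone.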
Does this incentivize dovish entities to cooperate to eliminate hawkish entities? If they are able to coordinate, and if there are enough of them to bear the cost & survive, then probably. If they do this, then they in-effect form a single monolithic cooperating entity. If they cannot, then dovish entities may become meaningless, as hawkish entities tear them and each other apart until only one remains (Highlander rules?). Our original question might then depend on the following one:
"Do dovish entities in the market of force necessarily either become de facto monopolies or perish?"
How do we go about answering this question and supporting the answer? Via historical example and / or counterexample?
What does the example look like? Entities in the same ecosystem either destroy each other, or cooperate to a degree where they cease acting like separate entities.
What does the counterexample look like? Entities in the same ecosystem remain separate, only cooperate to serve shared goals, and do come into conflict, but do not escalate that conflict into deadly force.
[Sidenote: what is this "degree of cooperation"? Shared goal: I want to accomplish X, and you also want to accomplish X, so we cooperate to make accomplishment of X more likely. "To cease acting separately": I want to accomplish X; you do not care about X; however, you will still help me with X, and bear costs to do so, because you value our relationship, and have the expectation that I may help you accomplish an unknown Y in the future, even though I don't care about Y.]
Possible examples to investigate (at least the ones which come quickly to mind): Danish conquests in medieval UK. The world wars and cold war. Modern geopolitics.
Possible counterexamples to investigate: Pre-nationalized medieval European towns & cities, of the kind mentioned in Seeing Like a State? Urban gang environments, especially in areas with minimal law enforcement? Somalia, or other areas described as "failed states"? Modern geopolitics. Standoffs between federal authorities and militia groups in the modern & historical USA, such as the standoff at Bundy Ranch.
Nope. Nopenopenope. This trips so many cult flags. This does not feel like rationality, this feels like pseudoreligion, like cult entrance activities, with rationality window dressing.
Maybe it's just because of the "you have to experience it for yourself" theme of this article, but right now this gets a hard nope from me, like psychedelics & speaking in tongues.
Nope.
I don't expect AI researchers to achieve AGI before they find one or more horrible uses for non-general AI tools, which may divert resources, or change priorities, or do something else which prevents true AGI from ever being developed.
That's what I use this place for, an audience for rough drafts or mere buddings of an idea. (Crippling) Executive dysfunction sounds like it may be a primary thing to explore & figure out, but it also sounds like the sort of thing that surrounds itself with an Ugh Field very quickly. Good luck!
AGI Alignment, or How Do We Stop Our Algorithms From Getting Possessed by Demons?
[Epistemic Status: Absurd silliness, which may or may not contain hidden truths]
[Epistemic Effort: Idly exploring idea space, laying down some map so I stop circling back over the same territory]
From going over Zvi's sequence on Moloch, what the "demon" Moloch really is, is a pattern (or patterns) of thought and behaviour which destroys human value in certain ways. That is an interesting way of labeling things. We know patterns of thought, and we know humans can learn them through their experiences, or by being told them, or reading them somewhere, or any other way that humans can learn things. If a human learns a pattern of thought which destroys value in some way, and that pattern gets reinforced, or in some other way comes to be the primary pattern of that human's thought or behaviour (e.g. it becomes the most significant factor in their utility function), is that in any way functionally different from "becoming possessed by demons"?
That's a weird paradigm. It gives the traditionally fantastical term "demon" a real definition as a real thing that really exists, and it also separates algorithms from the agents that execute them.
A few weird implications: When we're trying to "align an AGI", are we looking for an agent which cannot even theoretically become possessed by demons? Because that seems like it might require an agent which cannot be altered, or at least cannot alter itself. But in order to learn in a meaningful way, an intelligent agent has to be able to alter itself. (Yeah, I should prove these statements, but I'm not gonna. See: Epistemic Effort) So then instead of an immutable agent, are we looking for positive patterns of thought, which resist being possessed by demons?
[I doubt that this is in any way useful to anyone, but it was fun for me. It will disappear into the archives soon enough]
When Grothendieck was learning math, he was playing Dark Souls.
I think you may have waded into the trees here, before taking stock of the forest. By which I mean that this problem could definitely use some formalization, and may be much more important than we expect it to be. I've been studying the ancient Mongols recently; their conquests tested this, and their empire had the potential to be a locked-in dystopia type of apocalypse, but they failed to create a stable internal organizational structure. Thus, a culture that optimizes for both conquest & control, at the expense of everything else, could be an existential threat to the future of humanity. I think that merits rationalist research.
So, in a superfight of Rationalists vs Barbarians, with equal resources, we can steelman the opposition by substituting in that "Barbarians" means a society 100% optimized toward conquering others & taking all their stuff, at the expense of not being optimized toward anything else. In this case, what would the "Rationalist" society be, and how could they win? Or at least not lose?
I am going to go check if there are other articles which have developed this thought experiment already.
Whom/what an agent is willing to do Evil to, vs whom/what it would prefer to do Good to, sort of defines an in-group/out-group divide, in a similar way to how the decision to cooperate or defect does in the Prisoner's Dilemma. Hmmm...
you enjoy peacefully reading a book by yourself, and other people hate this because they hate you and they hate it when you enjoy yourself
The problem with making hypothetical examples is when you make them so unreal as to just be moving words around. Playing music/sound/whatever loud enough to be noise pollution would be similar to the first example. Less severe, but similar. Spreading manure on your lawn so that your entire neighborhood stinks would also be less severe, but similar. But if you're going to say "reading" and then have hypothetical people not react to reading in the way that actual people actually do, then your hypothetical example isn't going to be meaningful.
As for requiring consciousness, that's why I was judging actions, not the agents themselves. Agents tend to do both, to some degree.
The first paragraph is equivalent to saying that "all good & evil is socially constructed because we live in a society", and I don't want to call someone wrong, so let me try to explain...
An accurate model of Good & Evil will hold true, valid, and meaningful among any population of agents: human, animal, artificial, or otherwise. It is not at all dependent on existing in our current, modern society. Populations that do significant amounts of Good amongst each other generally thrive & are resilient (e.g. humans, ants, rats, wolves, cells in any body, many others), even though some individuals may fail or die horribly. Populations which do significant amounts of Evil tend to be less resilient, or destroy themselves (e.g. high-crime areas, cancer cells), even though certain members of those populations may be wildly successful, at least temporarily.
This isn't even a human-centric model, so it's not "constructed by society". It seems to me more likely to be a model that societies have to conform to, in order to exist in a form that is recognizable as a society.
I apologize for being flippant, and thank you for replying, as having to overcome challenges to this helps me figure it out more!
For self-defense, that's still a feature, and not a bug. It's generally seen as more evil to do more harm when defending yourself, and in law, defending yourself with lethal force is "justifiable homicide"; it's specifically called out as something much like an "acceptable evil". Would it be more or less evil to cause an attacker to change their ways without harming them? Would it be more or less evil to torture an attacker before killing them?
"...by not doing all the Good..." In the model, it's actually really intentional that "a lack of Good" is not a part of the definition of Evil, because it really isn't the same thing. There are idiosyncracies in this model which I have not found all of yet. Thank you for pointing them out!
I do agree for the most part. Robotic warfare which can efficiently destroy your opponent's materiel, without directly risking your own materiel & personnel, is an extremely dominant strategy, and will probably become the future of warfare. At least of warfare like this, as opposed to police actions.
I kinda wonder if this is what happened with Eliezer Yudkowsky, especially after he wrote Harry Potter and the Methods of Rationality?
From things I have previously heard about drones, I would be uncertain what training is required to operate them, and what limitations there are on the weather in which they can & cannot fly. I know that being unable to fly in anything other than near-perfect weather has been a problem for drones in the past, and those same limitations do not apply to ground-based vehicles.
Here's an analysis by Dr. Robert Malone about the Ukraine biolabs, which I found enlightening:
I glean that "biolab" is actually an extremely vague term, and doesn't specify the facility's exact capabilities at all. They could very well have had an innocuous purpose, but Russia would've had to treat them as a potential threat to national security, in the same way that Russian or Chinese "biolabs" in Mexico might sound bad to the US, except Russia is even more paranoid.
The Definition of Good and Evil
Epistemic Status: I feel like I stumbled over this; it has passed a few filters for correctness; I have not rigorously explored it, and I cannot adequately defend it, but I think that is more my own failing than the failure of the idea.
I have heard it said that "Good and Evil are Social Constructs", or "Who's really to say?", or "Morality is relative". I do not like those at all, and I think they are completely wrong. Since then, I either found, developed, or came across (I don't remember how I got this) a model of Good and Evil which has so far seemed accurate in every situation I have applied it to. I don't think I've seen this model written explicitly anywhere, but I have seen people quibble about the meaning of Good & Evil in many places, so whether this turns out to be useful, or laughably naïve, or utterly obvious to everyone but me, I'd rather not keep it to myself anymore.
The purpose of this, I guess, is to figure out, now that the map has become so smudged and smeared that some people question whether it ever corresponded to the territory at all, what part of the territory this part of the map was supposed to refer to. I will assume that we have all seen or heard examples of things which are Good, things which are Evil, things which are neither, and things which are somewhere in between. An accurate description of Good & Evil should accurately match those experiences the vast majority (all?) of the time.
It seems to me, that among the clusters of things in possibility space, the core of Good is "to help others at one's own expense" while the core of Evil is "to harm others for one's own benefit".
In my limited attempts at verifying this, the Goodness or Evilness of an action or situation has so far seemed to correlate with the presence, absence, and intensity of these versions of Good & Evil. Situations where one does great harm to others for one's own gain seem clearly evil, like executing political opposition. Situations where one helps others at a cost to oneself seem clearly good, like carrying people out of a burning building. Situations where no harm nor help is done, and no benefit is gained nor cost expended, seem neither Good nor Evil, such as a rock sitting in the sun, doing nothing. Situations where both harm is done & help is given, and where both a cost is expended and a benefit is gained, seem both Good and Evil, or somewhere in between, such as rescuing an unconscious person from a burning building, and then taking their wallet.
The correctness of this explanation depends on whether it matches others' judgements of specific instances of Good or Evil, so I can't really prove its correctness from my armchair. The only counterexamples I have seen so far involved significant amounts of motivated reasoning (someone who was certain that theft wasn't wrong when they did it).
I'm sure there are many things wrong with this, but I can't expect to become better at rationality if I'm not willing to be crappy at it first.
With priors of 1 or 0, Bayes' Rule stops working permanently. If something is running on real hardware, then it has a limit on its numeric precision. On a system that was never designed to make precise mathematical calculations, one where 8/10 doesn't feel significantly different from 9/10, or one where "90% chance" feels like "basically guaranteed", the level of numeric precision may be exceedingly low, such that it doesn't take much for a level of certainty to be rounded up to 1 or down to 0.
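A rough sketch of what I mean (the one-decimal rounding is an assumed stand-in for a mind's low internal resolution, not a claim about any particular hardware):

```python
# A minimal sketch: odds-form Bayes updates where the posterior probability is
# rounded to a coarse grid. Once rounding pushes the probability to 1.0 (or
# 0.0), the odds become infinite (or zero) and no further evidence moves them.
def update(p, likelihood_ratio, decimals):
    if p in (0.0, 1.0):
        return p                      # certainty: Bayes' Rule is stuck for good
    odds = p / (1 - p)
    odds *= likelihood_ratio
    p_new = odds / (1 + odds)
    return round(p_new, decimals)     # crude model of a low-resolution believer

p = 0.5
for step in range(5):
    p = update(p, likelihood_ratio=4.0, decimals=1)    # a few pieces of 4:1 evidence
    print(step, p)                                     # 0.8, 0.9, then stuck at 1.0
print(update(1.0, likelihood_ratio=0.01, decimals=1))  # strong counter-evidence: still 1.0
```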
As always, thanks for the post!
Well, it has helped me understand & overcome some of the specific ways that akrasia affects me, and it has also helped me understand how my own mind works, so I can alter and (hopefully) optimize it.
What do you disagree about?
I don't know. Possibly something, probably nothing.
the essence of [addition as addition itself]...
The "essence of cognition" isn't really available for us to study directly (so far as I know), except as a part of more complex processes. Finding many varied examples may help determine what is the "essence" versus what is just extraneous detail.
While intelligent agency in humans is definitely more interesting than in amoebas, knowing exactly why amoebas aren't intelligent agents would tell you one detail about why humans are, and may thus tell you a trait that a hypothetical AGI would need to have.
I'm glad you liked my elevator example!
I didn't pick it up from any reputable sources. The white paper on military theory that created the term was written many years ago, and since then I've only seen that explanation tossed around informally in various places, not investigated with serious rigor. OODA loops seem to be seldom discussed on this site, which I find kinda weird, but a good full explanation of them can be found here: Training Regime Day 20: OODA Loop
I tried to figure out on my own whether executing an OODA loop is a necessary & sufficient condition for something to be an intelligent agent (part of an effort to determine the smallest & simplest thing which could still be considered true AGI), and I found that while executing OODA loops seems necessary for something to have meaningful agency, doing so is not sufficient for something to be an intelligent agent.
Thank you for your interest, though! I wish I could just reply with a link, but I don't think the paper I would link to has been written yet.
An explanation that I've seen before of "where agency begins" is when an entity executes OODA loops. I don't know if OODA loops are a completely accurate map to reality, but they've been a useful model so far. If someone were going to explore "where agency begins", OODA loops might be a good starting point.
I feel like an article about "what agency is" must've already been written here, but I don't remember it. In any case, that article on agency in Conway's Life sounds like my next stop, thank you for linking it!
"You can't understand digital addition without understanding Mesopotamian clay token accounting"
That's sort of exactly correct? If you fully understand digital addition, then there's going to be something at the core of clay token accounting that you already understand. Complex systems tend to be built on the same concepts as simpler systems that do the same thing. If you fully understand an elevator, then there's no way that ropes & pulleys can still be a mystery to you, right? And to my knowledge, studying ropes & pulleys is a step in how we got to elevators, so it would make sense to me that going "back to basics", i.e. simpler real models, could help us make something we're still trying to build.
Even if I disagree with you, thank you for posing the example!
I've wondered this a lot too. There is a lot of focus on and discussion about "superintelligent" AGI here, or even human-level AGI, but I wonder: what about "stupid" AGI? When superintelligent AGI is still out of reach, is there not something still to be learned from a hypothetical AGI with the intelligence level of, say, a crow?
[comment removed by author]
Archaeologist here, I'll be taking this comment as permission!
Okay, this may be one of the most important articles I've read here. I already knew about OODA loops and how important they are, but putting names to the different failure modes, which I have seen and experienced thousands of times, gives me the handles with which to grapple them.
The main thing I want to say is thank you, I'm glad I didn't have to write something like this myself, because I do not know if it would have been nearly as clear & concise or nearly as good!
That Monte Carlo Method sounds a lot like dreaming.
Oof, be wary of Tim Ferriss, for he is a giant phony. I bought one of his books once, and nearly every single piece of advice in it was a bad generalization from a single study, and all of it was either already well known outside of the book, or ineffective, or just plain wrong. I have had great luck by immediately downgrading the trustworthiness of anything that mentions him, and especially anything that treats him as an authority. I have found the same with NLP. Please don't join that club.
Tim Ferriss is an utterly amoral agent. His purpose is to fill pages with whatever you will buy, and sell them to you, for money. Beyond that, he does not care, at all. I expect he has read Robert Cialdini's "Influence", but only as a guidebook to the most efficient ways to extract money from suckers.
This is just a warning to all readers.