Or in the words of Sean Carroll's Poetic Naturalism:
- There are many ways of talking about the world.
- All good ways of talking must be consistent with one another and with the world.
- Our purposes in the moment determine the best way of talking.
A "way of talking" is a map, and "the world" is the territory.
The orthogonality thesis doesn't say anything about intelligences that have no goals. It says that an intelligence can have any specific goal. So I'm not sure you've actually argued against the orthogonality thesis.
And English has it backwards. You can see the past, but not the future. The thing which just happened is most clear. The future comes at us from behind.
Here's the reasoning I intuitively want to apply:
where X = "you roll two 6s in a row by roll N", Y = "you roll at least two 6s by roll N", and Z = "the first N rolls are all even".
This is valid, right? And not particularly relevant to the stated problem, due to the "by roll N" qualifiers mucking up the statements in complicated ways?
Where's the pain?
Sure. For simplicity, say you play two rounds of Russian Roulette, each with a 60% chance of death, and you stop playing if you die. What's the expected value of YouAreDead at the end?
- With probability 0.6, you die on the first round
- With probability 0.4*0.6 = 0.24, you die on the second round
- With probability 0.4*0.4=0.16, you live through both rounds
So the expected value of the boolean YouAreDead random variable is 0.84.
Now say you're monogamous and go on two dates, each with a 60% chance to go well, and if they both go well then you pick one person and say "sorry" to the other. Then:
- With probability 0.4*0.4=0.16, both dates go badly and you have no partner.
- With probability 2*0.4*0.6 = 0.48, one date goes well and you have one partner.
- With probability 0.6*0.6=0.36, both dates go well and you select one partner.
So the expected value of the HowManyPartnersDoYouHave random variable is 0.84, and the expected value of the HowManyDatesWentWell random variable is 0.48+2*0.36 = 1.2.
Now say you're polyamorous and go on two dates with the same chance of success. Then:
- With probability 0.4*0.4=0.16, both dates go badly and you have no partners.
- With probability 2*0.4*0.6 = 0.48, one date goes well and you have one partner.
- With probability 0.6*0.6=0.36, both dates go well and you have two partners.
So the expected value of both the HowManyPartnersDoYouHave random variable and the HowManyDatesWentWell random variable is 1.2.
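Here's a quick sanity check of all three scenarios (a minimal sketch in Python; the 0.6 probability and the random-variable names are just the ones from the bullets above):

```python
from itertools import product

P_GOOD = 0.6  # chance that a single round/date "succeeds"

# Enumerate the four outcomes of two independent events with their probabilities.
outcomes = [
    (first, second, (P_GOOD if first else 1 - P_GOOD) * (P_GOOD if second else 1 - P_GOOD))
    for first, second in product([True, False], repeat=2)
]

# Russian Roulette: YouAreDead is a boolean, true if either round kills you.
e_dead = sum(p for first, second, p in outcomes if first or second)

# Monogamous: partners are capped at one; dates-gone-well are not.
e_partners_mono = sum(p * min(first + second, 1) for first, second, p in outcomes)
e_dates_well = sum(p * (first + second) for first, second, p in outcomes)

# Polyamorous: partners equal dates gone well.
e_partners_poly = e_dates_well

print(e_dead)           # ≈ 0.84
print(e_partners_mono)  # ≈ 0.84
print(e_dates_well)     # ≈ 1.2
print(e_partners_poly)  # ≈ 1.2
```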
Note that I've only ever made statements about expected value, never about utility.
Probability of at least two successes: ~26%
My point is that in some situations, "two successes" doesn't make sense. I picked the dating example because it's cute, but for something more clear cut imagine you're playing Russian Roulette with 10 rounds each with a 10% chance of death. There's no such thing as "two successes"; you stop playing once you're dead. The "are you dead yet" random variable is a boolean, not an integer.
If you're monogamous and go to multiple speed dating events and find two potential partners, you end up with one partner. If you're polyamorous and do the same, you end up with two partners.
One way to think of it is whether you will stop trying after the first success. Though that isn't always the distinguishing feature. For example, you might start 10 job interviews at the same time, even though you'll take at most one job.
However it is true that doing something with a 10% success rate 10 times will net you an average of 1 success.
For the easier to work out case of doing something with a 50% success rate 2 times:
- 25% chance of 0 successes
- 50% chance of 1 success
- 25% chance of 2 successes
Gives an average of 1 success.
Of course this only matters for the sort of thing where 2 successes is better than 1 success:
- 10% chance of finding a monogamous partner 10 times yields 0.63 monogamous partners in expectation.
- 10% chance of finding a polyamorous partner 10 times yields 1.00 polyamorous partners in expectation.
EDIT: To clarify, a 10% chance of finding a monogamous partner 10 times yields 1.00 successful dates and 0.63 monogamous partners that you end up with, in expectation.
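The 10-tries-at-10% numbers, computed exactly from the binomial distribution (a minimal sketch; `e_capped_at_1` is my name for the "partners you end up with, if monogamous" quantity):

```python
from math import comb

p, n = 0.1, 10

def exactly(k):
    """Probability of exactly k successes in n independent tries."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_at_least_one = 1 - exactly(0)               # ≈ 0.651
p_at_least_two = 1 - exactly(0) - exactly(1)  # ≈ 0.264, the "~26%" above
e_successes = sum(k * exactly(k) for k in range(n + 1))  # ≈ 1.0 (= n*p)
e_capped_at_1 = p_at_least_one                # partners you keep, if monogamous

print(p_at_least_one, p_at_least_two, e_successes, e_capped_at_1)
```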
IQ over median does not correlate with creativity over median
That's not what that paper says. It says that IQ over 110 or so (quite above median) correlates less strongly (but still positively) with creativity. In Chinese children, age 11-13.
And for a visceral description of a kind of bullying that's plainly bad, read the beginning of Worm: https://parahumans.wordpress.com/2011/06/11/1-1/
I double-downvoted this post (my first ever double-downvote) because it crosses a red line by advocating for verbal and physical abuse of a specific group of people.
Alexej: this post gives me the impression that you started with a lot of hate and went looking for justifications for it. But if you have some real desire for truth seeking, here are some counterarguments:
Yeah, I think “computational irreducibility” is an intuitive term pointing to something which is true, important, and not-obvious-to-the-general-public. I would consider using that term even if it had been invented by Hitler and then plagiarized by Stalin :-P
Agreed!
OK, I no longer claim that. I still think it might be true.
No, Rice's theorem is really not applicable. I have a PhD in programming languages, and feel confident saying so.
Let's be specific. Say there's a mouse named Crumbs (this is a real mouse), and we want to predict whether Crumbs will walk into the humane mouse trap (they did). What does Rice's theorem say about this?
There are a couple ways we could try to apply it:
- We could instantiate the semantic property P with "the program will output the string 'walks into trap'". Then Rice's theorem says that we can't write a program Q that takes as input a program R and says whether R outputs 'walks into trap'. For any Q we write, there will exist a program R that defeats it. However, this does not say anything about what the program R looks like! If R is simply `print('walks into trap')`, then it's pretty easy to tell! And if R is the Crumbs algorithm running in Crumbs's brain, Rice's theorem likewise does not claim that we're unable to tell if it outputs 'walks into trap'. All the theorem says is that there exists a program R that Q fails on. The proof of the theorem is constructive, and does give a specific program as a counter-example, but this program is unlikely to look anything like Crumbs's algorithm. The counter-example program R runs Q on P and then does the opposite of it, while Crumbs does not know what we've written for Q and is probably not very good at emulating Python.
- We could try to instantiate the counter-example program R with Crumbs's algorithm. But that's illegal! It's under an existential, not a forall. We don't get to pick R; the theorem does.
Actually, even this kind of misses the point. When we're talking about Crumbs's behavior, we aren't asking what Crumbs would do in a hypothetical universe in which they lived forever, which is the world that Rice's theorem is talking about. We mean to ask what Crumbs (and other creatures) will do today (or perhaps this year). And that's decidable! You can easily write a program Q that takes a program R and checks if R outputs 'walks into trap' within the first N steps! Rice's theorem doesn't stand in your way even a little bit, if all you care about is behavior within a fixed finite amount of time!
Here's what Rice's theorem does say. It says that if you want to know whether an arbitrary critter will walk into a trap after an arbitrarily long time, including long after the heat death of the universe, and you think you have a program that can check that for any creature in finite time, then you're wrong. But creatures aren't arbitrary (they don't look like the very specific, very scattered counterexample programs that are constructed in the proof of Rice's theorem), and the duration of time we care about is finite.
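To make the "first N steps" point concrete, here's a toy bounded checker. Representing "programs" as Python generators that yield once per step is purely my own illustration, not anything from Rice's theorem or from Crumbs:

```python
def bounded_check(program, n_steps, target_output):
    """Run `program` for at most n_steps steps and report whether it produced
    `target_output` within that budget. Always terminates."""
    gen = program()
    for _ in range(n_steps):
        try:
            output = next(gen)  # one "step" of the program
        except StopIteration:
            return False        # halted without producing the target
        if output == target_output:
            return True
    return False                # didn't happen within the first n_steps

# Two toy "critters": one walks into the trap on its third step, one never does.
def crumbs():
    yield "sniff"
    yield "sniff"
    yield "walks into trap"

def cautious_mouse():
    while True:
        yield "stays away"

print(bounded_check(crumbs, 100, "walks into trap"))          # True
print(bounded_check(cautious_mouse, 100, "walks into trap"))  # False
```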
If you care to have a theorem, you should try looking at Algorithmic Information Theory. It's able to make statements about "most programs" (or at least "most bitstrings"), in a way that Rice's theorem cannot. Though I don't think it's important you have a theorem for this, and I'm not even sure that there is one.
Rice’s theorem (a.k.a. computational irreducibility) says that for most algorithms, the only way to figure out what they’ll do with certainty is to run them step-by-step and see.
Rice's theorem says nothing of the sort. Rice's theorem says:
For every semantic property P,
For every program Q that purports to check if an arbitrary program has property P,
There exists a program R such that Q(R) is incorrect:
Either P holds of R but Q(R) returns false,
or P does not hold of R but Q(R) returns true
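In symbols (my transcription of the statement above, with the usual caveat that P must be a non-trivial semantic property and Q a program that always halts with a yes/no answer):

$$\forall P \;\; \forall Q \;\; \exists R : \quad Q(R) \;\neq\; \big[\, P \text{ holds of } R \,\big]$$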
Notice that the tricky program R that's causing your property-checker Q to fail is under an existential. This isn't saying anything about most programs, and it isn't even saying that there's a subset of programs that are tricky to analyze. It's saying that after you fix a property P and a property checker Q, there exists a program R that's tricky for Q.
There might be a more relevant theorem from algorithmic information theory, I'm not sure.
Going back to the statement:
for most algorithms, the only way to figure out what they’ll do with certainty is to run them step-by-step and see
This is only sort of true? Optimizing compilers rewrite programs into equivalent programs before they're run, and can be extremely clever about the sorts of rewrites that they do, including reducing away parts of the program without needing to run them first. We tend to think of the compiled output of a program as "the same" program, but that's only because compilers are reliable at producing equivalent code, not because the equivalence is straightforward.
a.k.a. computational irreducibility
Rice's theorem is not "also known as" computational irreducibility.
By the way, be wary of claims from Wolfram. He was a serious physicist, but is a bit of an egomaniac these days. He frequently takes credit for others' ideas (I've seen multiple clear examples) and exaggerates the importance of the things he's done (he's written more than one obituary for someone famous, where he talks more about his own accomplishments than the deceased's). I have a copy of A New Kind of Science, and I'm not sure there's much of value in it. I don't think this is a hot take.
for most algorithms, the only way to figure out what they’ll do with certainty is to run them step-by-step and see
I think the thing you mean to say is that for most of the sorts of complex algorithms you see in the wild, such as the algorithms run by brains, there's no magic shortcut to determine the algorithm's output that avoids having to run any of the algorithm's steps. I agree!
I think we’re in agreement on everything.
Excellent. Sorry for thinking you were saying something you weren't!
still not have an answer to whether it’s spinning clockwise or counterclockwise
More simply (and quite possibly true), Nobuyuki Kayahara rendered it spinning either clockwise or counterclockwise, lost the source, and has since forgotten which way it was going.
I like “veridical” mildly better for a few reasons, more about pedagogy than anything else.
That's a fine set of reasons! I'll continue to use "accurate" in my head, as I already fully feel that the accuracy of a map depends on which territory you're choosing for it to represent. (And a map can accurately represent multiple territories, as happens a lot with mathematical maps.)
Another reason is I’m trying hard to push for a two-argument usage
Do you see the Spinning Dancer going clockwise? Sorry, that’s not a veridical model of the real-world thing you’re looking at.
My point is that:
- The 3D spinning dancer in your intuitive model is a veridical map of something 3D. I'm confident that the 3D thing is a 3D graphical model which was silhouetted after the fact (see below), but even if it was drawn by hand, the 3D thing was a stunningly accurate 3D model of a dancer in the artist's mind.
- That 3D thing is the obvious territory for the map to represent.
- It feels disingenuous to say "sorry, that's not a veridical map of [something other than the territory map obviously represents]".
So I guess it's mostly the word "sorry" that I disagree with!
By "the real-world thing you're looking at", you mean the image on your monitor, right? There are some other ways one's intuitive model doesn't veridically represent that such as the fact that, unlike other objects in the room, it's flashing off and on at 60 times per second, has a weirdly spiky color spectrum, and (assuming an LCD screen) consists entirely of circularly polarized light.
It was made by a graphic artist. I’m not sure their exact technique, but it seems at least plausible to me that they never actually created a 3D model.
This is a side track, but I'm very confident a 3D model was involved. Plenty of people can draw a photorealistic silhouette. The thing I think is difficult is drawing 100+ silhouettes that match each other perfectly and have consistent rotation. (The GIF only has 34 frames, but the original video is much smoother.) Even if technically possible, it would be much easier to make one 3D model and have the computer rotate it. Annnd, if you look at Nobuyuki Kayahara's website, his talent seems more on the side of mathematics and visualization than photo-realistic drawing, so my guess is that he used an existing 3D model for the dancer (possibly hand-posed).
This is fantastic! I've tried reasoning along these directions, but never made any progress.
A couple comments/questions:
Why "veridical" instead of simply "accurate"? To me, the accuracy of a map is how well it corresponds to the territory it's trying to map. I've been replacing "veridical" with "accurate" while reading, and it's seemed appropriate everywhere.
Do you see the Spinning Dancer going clockwise? Sorry, that’s not a veridical model of the real-world thing you’re looking at. [...] after all, nothing in the real world of atoms is rotating in 3D.
I think you're being unfair to our intuitive models here.
The GIF isn't rotating, but the 3D model that produced the GIF was rotating, and that's the thing our intuitive models are modeling. So exactly one of [spinning clockwise] and [spinning counterclockwise] is veridical, depending on whether the graphic artist had the dancer rotating clockwise or counterclockwise before turning her into a silhouette. (Though whether it happens to be veridical is entirely coincidental, as the silhouette is identical to the one that would have been produced had the dancer been spinning in the opposite direction.)
If you look at the photograph of Abe Lincoln from Feb 27, 1860, you see a 3D scene with a person in it. This is veridical! There was an actual room with an actual person in it, who dressed that way and touched that book. The map's territory is 164 years older than the map, but so what.
(My favorite example of an intuitive model being wildly incorrect is Feynman's story of learning to identify kinds of galaxies from images on slides. He asks his mentor "what kind of galaxy is this one, I can't identify it", and his mentor says it's a smudge on the slide.)
Very curious what part of this people think is wrong.
Here's a simple argument that simulating universes based on Turing machine number can give manipulated results.
Say we lived in a universe much like this one, except that:
- The universe is deterministic
- It's simulated by a very short Turing machine
- It has a center, and
- That center is actually nearby! We can send a rocket to it.
So we send a rocket to the center of the universe and leave a plaque saying "the answer to all your questions is Spongebob". Now any aliens in other universes that simulate our universe and ask "what's in the center of that universe at time step 10^1000?" will see the plaque, search elsewhere in our universe for the reference, and watch Spongebob. We've managed to get aliens outside our universe to watch Spongebob.
I feel like it would be helpful to speak precisely about the universal prior. Here's my understanding.
It's a partial probability distribution over bit strings. It gives a non-zero probability to every bit string, but these probabilities add up to strictly less than 1. It's defined as follows: describe Turing machines by a binary code, and assign each one a probability based on the length of its code, such that those probabilities add up to exactly 1. Then magically run all Turing machines "to completion". For those that halt leaving a bitstring on their tape, attribute the probability of that Turing machine to that bitstring. Now we have a probability distribution over bitstrings, though the probabilities add up to less than one because not all of the Turing machines halted.
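Written out (my notation, not the commenter's; $\mathrm{code}(T)$ is the prefix-free binary encoding described above, so the weights $2^{-\lvert \mathrm{code}(T) \rvert}$ sum to exactly 1 over all Turing machines $T$):

$$P(x) \;=\; \sum_{T \;:\; T \text{ halts with } x \text{ on its tape}} 2^{-\lvert \mathrm{code}(T) \rvert}$$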
You cannot compute this probability distribution, but you can compute lower bounds on the probabilities of its bitstrings. (The Nth lower bound is the probability distribution you get from running the first N TMs for N steps.)
Call a TM that halts poisoned if its output is determined as follows:
- The TM simulates a complex universe full of intelligent life, then selects a tiny portion of that universe to output, erasing the rest.
- That intelligent life realizes this might happen, and writes messages in many places that could plausibly be selected.
- It works, and the TM's output is determined by what the intelligent life it simulated chose to leave behind.
If we approximate the universal prior, the probability contribution of poisoned TMs will be precisely zero, because we don't have nearly enough compute to simulate a poisoned TM until it halts. However, if there's an outer universe with dramatically more compute available, and it's approximating the universal prior using enough computational power to actually run the poisoned TMs, they'll affect the probability distribution of the bitstrings, making bitstrings with the messages they choose to leave behind more likely.
So I think Paul's right, actually (not what I expected when I started writing this). If you approximate the UP well enough, the distribution you see will have been manipulated.
The feedback is from Lean, which can validate attempted formal proofs.
This is one of the bigger reasons why I really don’t like RLHF—because inevitably you’re going to have to use a whole bunch of Humans who know less-than-ideal amounts about philosophy, pertaining to Ai Alignment.
What would these humans do differently, if they knew about philosophy? Concretely, could you give a few examples of "Here's a completion that should be positively reinforced because it demonstrates correct understanding of language, and here's a completion of the same text that should be negatively reinforced because it demonstrates incorrect understanding of language"? (Bear in mind that the prompts shouldn't be about language, as that would probably just teach the model what to say when it's discussing language in particular.)
It’s impossible for the Utility function of the Ai to be amenable to humans if it doesn’t use language the same way
What makes you think that humans all use language the same way, if there's more than one plausible option? People are extremely diverse in their perspectives.
As you're probably aware, the fine tuning is done by humans rating the output of the LLM. I believe this was done by paid workers, who were probably given a list of criteria like that it should be helpful and friendly and definitely not use slurs, and who had probably not heard of Wittgenstein. How do you think they would rate LLM outputs that demonstrated "incorrect understanding of language"?
I have (tried to) read Wittgenstein, but don't know what outputs would or would not constitute an "incorrect understanding of language". Could you give some examples? The question is whether the tuners would rate those examples positively or negatively, and whether examples like those would arise during fine tuning.
You say "AI", though I'm assuming you're specifically asking about LLMs (large language models) like GPT, Llama, Claude, etc.
LLMs aren't programmed, they're trained. None of the code written by the developers of LLMs has anything to do with concepts, sentences, dictionary definitions, or different languages (e.g. English vs. Spanish). The code only deals with general machine learning, and streams of tokens (which are roughly letters, but encoded a bit differently).
The LLM is trained on huge corpuses of text. The LLM learns concepts, and what a sentence is, and the difference between English and Spanish, purely from the text. None of that is explicitly programmed into it; the programmers have no say in the matter.
As far as how it comes to understand language, and how that relates to Wittgenstein's thoughts on language, we don't know much at all. You can ask it. And we've done some experiments like that recent one with the LLM that was made to think it was the Golden Gate Bridge, which you probably heard about. But that's about it; we don't really know how LLMs "think" internally. (We know what's going on at a low level, but not at a high level.)
However, If I already know that I have the disease, and I am not altruistic to my copies, playing such game is a wining move to me?
Correct. But if you don't have the disease, you're probably also not altruistic to your copies, so you would choose not to participate. Leaving the copies of you with the disease isolated and unable to "trade".
Not "almost no gain". My point is that it can be quantified, and it is exactly zero expected gain under all circumstances. You can verify this by drawing out any finite set of worlds containing "mediators", and computing the expected number of disease losses minus disease gains as:
num(people with disease)*P(person with disease meditates)*P(person with disease who meditates loses the disease) - num(people without disease)*P(person without disease meditates)*P(person without disease who meditates gains the disease)
My point is that this number is always exactly zero. If you doubt this, you should try to construct a counterexample with a finite number of worlds.
My point still stands. Try drawing out a specific finite set of worlds and computing the probabilities. (I don't think anything changes when the set of worlds becomes infinite, but the math becomes much harder to get right.)
There is a 0.001 chance that someone who did not have the disease will get it. But he can repeat the procedure.
No, that doesn't work. It invalidates the implicit assumption you're making that the probability that a person chooses to "forget" is independent of whether they have the disease. Ultimately, you're "mixing" the various people who "forgot", and a "mixing" procedure can't change the proportion of people who have the disease.
When you take this into account, the conclusion becomes rather mundane. Some copies of you can gain the disease, while a proportional number of copies can lose it. (You might think you could get some respite by repeatedly trading off "who" has the disease, but the forgetting procedure ensures that no copy ever feels respite, as that would require remembering having the disease.)
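Here's a toy simulation of the "mixing can't change the proportion" point. The model is entirely my own simplification: ten copies, each independently choosing whether to do the forgetting procedure, with "forgetting" treated as uniformly shuffling disease status among those who chose it:

```python
import random

random.seed(0)
N_WORLDS = 100_000
gains = losses = 0

for _ in range(N_WORLDS):
    # Ten copies of "you": three start with the disease, seven don't.
    had = [True] * 3 + [False] * 7
    # Each copy decides independently whether to do the forgetting procedure;
    # crucially, the decision does not depend on disease status.
    meditates = [random.random() < 0.5 for _ in had]
    # "Forgetting" modeled as uniformly shuffling disease status among meditators.
    idx = [i for i, m in enumerate(meditates) if m]
    shuffled = [had[i] for i in idx]
    random.shuffle(shuffled)
    now = list(had)
    for i, status in zip(idx, shuffled):
        now[i] = status
    gains += sum(1 for h, n in zip(had, now) if not h and n)
    losses += sum(1 for h, n in zip(had, now) if h and not n)

print(gains / N_WORLDS, losses / N_WORLDS)  # the two averages match
```

The expected number of gains equals the expected number of losses, so the expected net change is zero.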
I think formalizing it in full will be a pretty nontrivial undertaking, but formalizing isolated components feels tractable, and is in fact where I’m currently directing a lot of my time and funding.
Great. Yes, I think that's the thing to do. Start small! I (and presumably others) would update a lot from a new piece of actual formal mathematics from Chris's work. Even if that work was, by itself, not very impressive.
(I would also want to check that that math had something to do with his earlier writings.)
My current understanding is that he believes that his current written work should be sufficient for modern mathematicians and scientists to understand his core ideas
Uh oh. The "formal grammar" that I checked used formal language, but was not even close to giving a precise definition. So Chris either (i) doesn't realize that you need to be precise to communicate with mathematicians, or (ii) doesn't understand how to be precise.
Please be prepared for the possibility that Chris is very smart and creative, and that he's had some interesting ideas (e.g. Syndiffeonesis), but that his framework is more of an interlocked collection of ideas than anything mathematical (despite using terms from mathematics). Litany of Tarski and all that.
"gesture at something formal" -- not in the way of the "grammar" it isn't. I've seen rough mathematics and proof sketches, especially around formal grammars. This isn't that, and it isn't trying to be. There isn't even an attempt at a rough definition for which things the grammar derives.
I think Chris’s work is most valuable to engage with for people who have independently explored philosophical directions similar to the ones Chris has explored
A big part of Chris’s preliminary setup is around how to sidestep the issues around making the sets well-ordered.
Nonsense! If Chris has an alternative to well-ordering, that's of general mathematical interest! He would make a splash simply writing that up formally on its own, without dragging the rest of his framework along with it.
Except, I can already predict you're going to say that no piece of his framework can be understood without the whole. Not even by making a different smaller framework that exists just to showcase the well-ordering alternative. It's a little suspicious.
because someone else I’d funded to review Chris’s work
If you're going to fund someone to do something, it should be to formalize Chris's work. That would not only serve as a BS check, it would make it vastly more approachable.
I’m confused why you’re asking about specific insights people have gotten when Jessica has included a number of insights she’s gotten in her post
I was hoping people other than Jessica would share some specific curated insights they got. Syndiffeonesis is in fact a good insight.
tldr; a spot check calls bullshit on this.
I know a bunch about formal languages (PhD in programming languages), so I did a spot check on the "grammar" described on page 45. It's described as a "generative grammar", though instead of words (sequences of symbols) it produces "L_O spacial relationships". Since he uses these phrases to describe his "grammar", and they have their standard meaning because he listed their standard definition earlier in the section, he is pretty clearly claiming to be making something akin to a formal grammar.
My spot check is then: is the thing defined here more-or-less a grammar, in the following sense?
- There's a clearly defined thing called a grammar, and there can be more than one of them.
- Each grammar can be used to generate something (apparently an L_O) according to clearly defined derivation rules that depend only on the grammar itself.
If you don't have a thing plus a way to derive stuff from that thing, you don't have anything resembling a grammar.
My spot check says:
- There's certainly a thing called a grammar. It's a four-tuple, whose parts closely mimic those of a standard grammar, but using his constructs for all the basic parts.
- There's no definition of how to derive an "L_O spacial relationship" given a grammar. Just some vague references to using "telic recursion".
I'd categorize this section as "not even wrong"; it isn't doing anything formal enough to have a mistake in it.
Another fishy aspect of this section is how he makes a point of various things coinciding, and how that's very different from the standard definitions. But it's compatible with the standard definitions! E.g. the alphabet of a language is typically a finite set of symbols that have no additional structure, but there's no reason you couldn't define a language whose symbols were e.g. grammars over that very language. The definition of a language just says that its symbols form a set. (Perhaps you'd run into issues with making the sets well-ordered, but if so he's running headlong into the same issues.)
I'm really not seeing any value in this guy's writing. Could someone who got something out of it share a couple specific insights that got from it?
How did you find me? How do they always find me? No matter...
Have you tried applying your models to predict the day's weather, or what your teacher will be wearing that day? I bet not: they wouldn't work very well. Models have domains in which they're meant to be applied. More precise models tend to have more specific domains.
Making real predictions about something, like what the result of a classroom experiment will be even if the pendulum falls over, is usually outside the domain of any precise model. That's why your successful models are compound models, using Newtonian mechanics as a sub-model, and that's why they're so unsatisfyingly vague and cobbled together.
There is a skill to assembling models that make good predictions in messy domains, and it is a valuable skill. But it's not the goal of your physics class. That class is trying to teach you about precise models like Newtonian mechanics. Figuring out exactly how to apply Newtonian mechanics to a real physical experiment is often harder than solving the Newtonian math! But surely you've noticed by now that, in the domains where Newtonian mechanics seems to actually apply, it applies very accurately?
This civilization we live in tends to have two modes of thinking. The first is 'precise' thinking, where people use precise models but don't think about the mismatch between their domain and reality. The model's domain gets treated as irrelevant, so people will either inappropriately apply the model outside its domain, or carefully make statements only within the model's domain and hope that others will make that incorrect leap on their own. The other mode of thinking is 'imprecise' thinking, where people ignore all models and rely on their gut feelings. We are extremely bad, at the moment, at the missing middle path: making and recognizing models for messy domains.
"There's no such thing as 'a Bayesian update against the Newtonian mechanics model'!" says a hooded figure from the back of the room. "Updates are relative: if one model loses, it must be because others have won. If all your models lose, it may hint that there's another model you haven't thought of that does better than all of them, or it may simply be that predicting things is hard."
"Try adding a couple more models to compare against. Here's one: pendulums never swing. And here's another: Newtonian mechanics is correct but experiments are hard to perform correctly, so there's a 80% probability that Newtonian mechanics gives the right answer and 20% probability spread over all possibilities including 5% on 'the pendulum fails to swing'. Continue to compare these models during your course, and see which one wins. I think you can predict it already, despite your feigned ignorance."
The hooded figure opens a window in the back of the room and awkwardly climbs through and walks off.
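A minimal sketch of the comparison the hooded figure is proposing. The three models are the ones named above, but every likelihood number and the list of observations is made up for illustration:

```python
# Probability each model assigns to one classroom trial, over three coarse outcomes.
models = {
    "Newtonian mechanics": {"matches prediction": 0.95, "swings but misses": 0.04, "fails to swing": 0.01},
    "pendulums never swing": {"matches prediction": 0.01, "swings but misses": 0.01, "fails to swing": 0.98},
    "Newtonian, but experiments are hard": {"matches prediction": 0.80, "swings but misses": 0.15, "fails to swing": 0.05},
}

# Equal priors, then Bayesian updates on a made-up run of class experiments.
posterior = {name: 1 / len(models) for name in models}
observations = ["matches prediction"] * 8 + ["swings but misses", "fails to swing"]

for obs in observations:
    posterior = {name: prob * models[name][obs] for name, prob in posterior.items()}
    total = sum(posterior.values())
    posterior = {name: prob / total for name, prob in posterior.items()}

for name, prob in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {prob:.3f}")
```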
Are we assuming things are fair or something?
I would have modeled this as von Neumann getting 300 points and putting 260 of them into the maths and sciences and the remaining 40 into living life and being well adjusted.
Oh, excellent!
It's a little hard to tell from the lack of docs, but you're modelling dilemmas with Bayesian networks? I considered that, but wasn't sure how to express Sleeping Beauty nicely, whereas it's easy to express (and gives the right answers) in my tree-shaped dilemmas. Have you tried to express Sleeping Beauty?
And have you tried to express a dilemma like smoking lesion where the action that an agent takes is not the action their decision theory tells them to take? My guess is that this would be as easy as having a chain of two probabilistic events, where the first one is what the decision theory says to do and the second one is what the agent actually does, but I don't see any of this kind of dilemma in your test cases.
I have a healthy fear of death; it's just that none of it stems from an "unobserved endless void". Some of the specific things I fear are:
- Being stabbed is painful and scary (it's scary even if you know you're going to live)
- Most forms of dying are painful, and often very slow
- The people I love mourning my loss
- My partner not having my support
- Future life experiences, not happening
- All of the things I want to accomplish, not happening
The point I was making in this thread was that "unobserved endless void" is not on this list, I don't know how to picture it, and I'm surprised that other people think it's a big deal.
Who knows, maybe if I come close to dying some time I'll suddenly gain a new ontological category of thing to be scared of.
What's the utility function of the predictor? Is there necessarily a utility function for the predictor such that the predictor's behavior (which is arbitrary) corresponds to maximizing its own utility? (Perhaps this is mentioned in the paper, which I'll look at.)
EDIT: do you mean to reduce a 2-player game to a single-agent decision problem, instead of vice-versa?
I was not aware of Everitt, Leike & Hutter 2015, thank you for the reference! I only delved into decision theory a few weeks ago, so I haven't read that much yet.
Would you say that this is similar to the connection that exists between fixed points and Nash equilibria?
Nash equilibria come from the fact that your action depends on your opponent's action, which depends on your action. When you assume that each player will greedily change their action if it improves their utility, the Nash equilibria are the fixpoints at which no player changes their action.
In single-agent decision theory problems, your (best) action depends on the situation you're in, which depends on what someone predicted your action would be, which (effectively) depends on your action.
If there's a deeper connection than this, I don't know it. There's a fundamental difference between the two cases, I think, because a Nash equilibrium involves multiple agents that don't know each others' decision process (problem statement: maximize the outputs of two functions independently), while single-agent decision theory involves just one agent (problem statement: maximize the output of one function).
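As a concrete illustration of "Nash equilibria are the fixpoints at which no player changes their action", here's a brute-force search for pure-strategy equilibria of a two-player game (a generic sketch, not tied to any dilemma in this thread):

```python
from itertools import product

def pure_nash_equilibria(payoff_a, payoff_b):
    """Return the action pairs that are fixed points of best-response:
    no player can improve their own payoff by unilaterally deviating."""
    rows, cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i, j in product(range(rows), range(cols)):
        a_happy = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(rows))
        b_happy = all(payoff_b[i][j] >= payoff_b[i][k] for k in range(cols))
        if a_happy and b_happy:
            equilibria.append((i, j))
    return equilibria

# Prisoner's dilemma: action 0 = cooperate, action 1 = defect.
payoff_a = [[3, 0],
            [5, 1]]
payoff_b = [[3, 5],
            [0, 1]]
print(pure_nash_equilibria(payoff_a, payoff_b))  # [(1, 1)] -- mutual defection
```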
My solution, which assumes computation is expensive
Ah, so I'm interested in normative decision theory: how one should ideally behave to maximize their own utility. This is what e.g. UDT&FDT are aiming for. (Keep in mind that "your own utility" can, and should, often include other people's utility too.)
Minimizing runtime is not at all a goal. I think the runtime of the decision theories I implemented is something like doubly exponential in the number of steps of the simulation (the number of events in the simulation is exponential in its duration; each decision typically involves running the simulation using a trivial decision theory).
reason about other agents based on their behavior towards a simplified-model third agent
That's an interesting approach I hadn't considered. While I don't care about efficiency in the "how fast does it run" sense, I do care about efficiency in the "does it terminate" sense, and that approach has the advantage of terminating.
Defect against bots who defect against cooperate-bot, otherwise cooperate
You're going to defect against UDT/FDT then. They defect against cooperate-bot. You're thinking it's bad to defect against cooperate-bot, because you have empathy for the other person. But I suspect you didn't account for that empathy in your utility function in the payoff matrix, and that if you do, you'll find that you're not actually in a prisoner's dilemma in the game-theory sense. There was a good SlateStarCodex post about this that I can't find.
Yeah, exactly. For example, if humans had a convention of rounding probabilities to the nearest 10% when writing them, then baseline GPT-4 would follow that convention and it would put a cap on the maximum calibration it could achieve. Humans are badly calibrated (right?) and baseline GPT-4 is mimicking humans, so why is it well calibrated? It doesn't follow from its token stream being well calibrated relative to text.
I like the idea of Peacemakers. I even had the same idea myself---to make an explicitly semi-cooperative game with a goal of maximizing your own score but every player having a different scoring mechanism---but haven't done anything with it.
That said, I think you're underestimating how much cooperation there is in a zero-sum game.
If you offer a deal, you must be doing it because it increases your chance of winning, but only one person can win under the MostPointsWins rule, so that deal couldn’t be very good for me, and I’ll always suspect your deal of being a trick, so in most cases no detailed deals will be offered.
Three examples of cooperation that occur in three-player Settlers of Catan (between, say, Alice, Bob, and Carol), even if all players are trying only to maximize their own chance of winning:
- Trading. Trading increases the chances that the two trading players win, to the detriment of the third. As long as there's sufficient uncertainty about who's winning, you want to trade. (There's a world Catan competition. I bet that these truly competitive games involve less trading than you would do with your friends, but still a lot. Not sure how to find out.)
- Refusing to trade with the winning player, once it's clear who that is. If Alice is ahead then Bob and Carol are in a prisoner's dilemma, where trading with Alice is defecting.
- Alice says at the beginning of the game: "Hey Bob, it sure looks like Carol has the strongest starting position, doesn't it? Wouldn't be very fair if she won just because of that. How about we team up against her by agreeing now to never trade with her for the entire game?" If Bob agrees, then the winning probabilities of Alice, Bob, Carol go from (say) 20%,20%,60% to 45%,45%,10%. Cooperation!
So it's not that zero-sum games lack opportunities for cooperation, it's just that every opportunity for cooperation with another player comes at the detriment of a third. Which is why there isn't any cooperation at all in a two-player zero-sum game.
Realize that even in a positive-sum game, players are going to be choosing between doing things for the betterment of everyone, and doing things for the betterment of themselves, and maximizing your own score involves doing more of the latter than the former, ideally while convincing everyone else that you're being more than fair.
Suggestion for the game: don't say the goal is to maximize your score. Instead say you're roleplaying a character whose goal is to maximize [whatever]. For a few reasons:
- It makes every game (more) independent of every other game. This reduces the possibility that Alice sabotages Bob in their second game together because Bob was a dick in their first game together. The goal is to have interesting negotiations, not to ruin friendships.
- It encourages exploration. You can try certain negotiating tactics in one game, and then abandon them in the next, and the fact that you were "roleplaying" will hopefully reduce how much people associate those tactics with you instead of that one time you played.
- It could lighten the mood. You should try really hard to lighten the mood. Because you know what else is a semi-cooperative game that's heavy on negotiation? Diplomacy.
Expanding on this, there are several programming languages (Idris, Coq, etc.) whose type system ensures that every program that type checks will halt when it's run. One way to view a type system is as an automated search for a proof that your program is well-typed (and a type error is a counter-example). In a language like Idris or Coq, a program being well-typed implies that it halts. So machine generated proofs that programs halt aren't just theoretically possible, they're used extensively by some languages.
I too gathered people's varied definitions of consciousness for amusement, though I gathered them from the Orange Site:
[The] ability to adapt to environment and select good actions depending on situation, learning from reward/loss signals.
https://news.ycombinator.com/item?id=16295769
Consciousness is the ability of an organism to predict the future
The problem is that we want to describe consciousness as "that thing that allows an organism to describe consciousness as 'that thing that allows an organism to describe consciousness as ´that thing that allows an organism to describe consciousness as [...]´'"
To me consciousness is the ability to re-engineer our existing models of the world based on new incoming data.
The issue presented at the beginning of the article is (as most philosophical issues are) one of semantics. Philosophers as I understand it use "consciousness" as the quality shared by things that are able to have experiences. A rock gets wet by the rain, but humans "feel" wet when it rains. A bat might not self-reflect but it feels /something/ when it uses echo-location.
On the other hand, conciseness in our everyday use of the term is very tied to the idea of attention and awareness, i.e. a "conscious action" or an "unconscious motivation". This is a very Freudian concept, that there are thoughts we think and others that lay behind.
https://news.ycombinator.com/item?id=15289654
Start with the definition: A conscious being is one which is conscious of itself.
You could probably use few more specific words to a greater effect. Such as self-model, world model, memory, information processing, directed action, responsiveness. Consciousness is a bit too underdefined a word. It is probably not as much of a whole as a tree or human as an organism is - it is not even persistent nor stable - and leaves no persistent traces in the world.
"The only thing we know about consciousness is that it is soluble in chloroform" ---Luca Turin
It's a clever scheme, but you can accomplish nearly the same thing by having a physical shutdown switch for the server room and giving a key to N people, and the shutdown switch was never the weak part. Here are some ways to escape your encryption, if the AI can interact with the outside world:
- Exploiting a hardware or software vulnerability. There are a lot of these. No one noticed a vulnerability that's been in the spec for the CPUs everyone uses for decades.
- Convincing one person to share its source code with people that won't bother to run it in FHE
- Convincing everyone that it's benevolent and helpful beyond our wildest dreams, until we use it to run the world, then doing whatever it wants
- Successfully threatening m of the key holders, and also the utility company that's keeping the power on, and also whoever owns the server room
- Something something nanobots
- Convincing a rival company to unethically steal its source code
Clarification: pieces can't move "over" the missing squares. Where the words end, the world ends. You cannot move forward in an absence of space.
Woah, woah, slow down. You're talking about the edge cases but have skipped the simple stuff. It sounds like you think it's obvious, or that we're likely to be on the same page, or that it should be inferrable from what you've said? But it's not, so please say it.
Why is growing up so important?
Reading between the lines, are you saying that the only reason that it's bad for a human baby to be in pain is that it will eventually grow into a sapient adult? If so: (i) most people, including myself, both disagree and find that view morally reprehensible, (ii) the word "sapient" doesn't have a clear or agreed upon meaning, so plenty of people would say that babies are sentient; if you mean to capture something by the word "sapient" you'll have to be more specific. If that's not what you're saying, then I don't know why you're talking about uploading animals instead of talking about how they are right now.
As a more general question, have you ever had a pet?
By far the biggest and most sudden update I've ever had is Dominion, a documentary on animal farming:
https://www.youtube.com/watch?v=LQRAfJyEsko
It's like... I had a whole pile of interconnected beliefs, and if you pulled on one it would snap most of the way back into place after. And Dominion pushed the whole pile over at once.
Meta comment: I'm going to be blunt. Most of this sequence has been fairly heavily downvoted. That reads to me as this community asking to not have more such content. You should consider not posting, or posting elsewhere, or writing many fewer posts of much higher quality (e.g. spending more time, doing more background research, asking someone to proofread). As a data point, I've only posted a couple times, and I spent at least, I dunno, 10+ hours writing each post. As an example of how this might apply to you, if you wrote this whole sequence as a single "reference on biases" and shared that, I bet it would be better received.