To me, it sounds like A is a member of a community that A wants to hold certain standards, and B is claiming membership in that community while not meeting them. In that circumstance, I think a discussion among various members of the community about the obligations of membership, the community's goals and beliefs, and how these things relate is very, very good. Do you
A) disagree with that framing of the situation in the dialogue
B) disagree that in the situation I described a discussion is virtuous, verging on necessary
C) other?
Lots of your comments on various posts seem rude to me--should I be attempting to severely punish you?
I am genuinely confused why this is on LessWrong instead of the EA Forum. What do you think the distribution of giving is like in each place, and what do you think the distribution of responses to the drowning-child argument is like in each?
Minor semantic quibble: I would say we always want positive expected utility, but how that translates into money/time/various intangibles can vary tremendously both situationally and from person to person.
This was very interesting, thanks for writing it :)
My zero-knowledge instinct is that sound-wave communication would be very likely to evolve in most environments. Motion -> pressure differentials seems pretty inevitable, so would almost always be a useful sensory modality. And any information channel that is easy to both sense and affect seems likely to be used for communication. Curious to hear your thoughts if your intuition is that it would be rare.
Do you have candidates for intermediate views? Many-drafts which seem convergent, or fuzzy Cartesian theatres? (Maybe graph-theoretically translating to nested subnetworks of neurons where we might say "this set is necessarily core, this larger set is semi-core/core in frequent circumstances, this still larger set is usually un-core but changeable, and outside this is nothing"?)
The conversations I've had with people at DeepMind, OpenAI, and in academia make me very sure that lots of ideas on capabilities increases are already out there, so there's a high chance anything you suggest would be something people are already thinking about. Possibly running your ideas past someone in those circles, and only sharing the ones they think are unoriginal, would be safe-ish?
I think one of the big bottlenecks is a lack of ways to predict how much different ideas would help without actually trying them at costly large scale. Unfortunately, this is also a barrier to good alignment work. I don't have good ideas on making differential progress on this.
I think lots of people would say that all three examples you gave are more about signalling than about genuinely attempting to accomplish a goal.
This seems like kinda a nonsense double standard. The declared goal of journalism is usually not to sell newspapers, that is your observation of the incentive structure. And while the declared goal of LW is to arrive at truth (or something similar--hone the skills which will better allow people to arrive at truth, or something), there are comparable parallel incentive structures to journalism.
It seems better to compare declared purpose to declared purpose, or inferred goal to inferred goal, doesn't it?
Can you name the organization?
I don't think I've fully processed what you or the OP have said here--my apologies, but this still seemed relevant.
I think the category-theory way I would describe this is: Bob is a category B, and Alice is a category A. A and B are big and complicated, and I have no idea how to describe all the objects or morphisms in them, although there is some structure-preserving morphism between them (your G). But what Bob does is try to find a straw-Alice category A' which is small and simple, along with functors from A' to A and from A' to B, which makes Alice predictable (or post-dictable).
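(In symbols, with my own letter choices for the two functors--so take this as a sketch of my reading, not of the post's notation:)

$$F : A' \to A, \qquad H : A' \to B,$$

where Bob reasons in the small category A' (via H) instead of the intractable A, and the straw model is faithful exactly to the extent that this is compatible with the real G relating A and B.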
Does that make any sense?
Alright, I'll bite. As a CDT fan, I will happily take the 25 dollars. I'll email you on setting up the experiment. If you'd like, we could have a third party hold money in escrow?
I'm open to some policy which will cap our losses if you don't want to risk $2050, or conversely, something which will give a bonus if one of us wins by more than $5 or something.
As far as Newcomb's problem goes, what if you find a superintelligent agent that says it tortures and kills anyone who would have one-boxed in Newcomb's problem? This seems roughly as likely to me as finding the omega from the original problem. Do you still think the right thing to do now is commit to one-boxing before you have any reason to think that commitment has positive EV?
This is (related to) a very old idea: https://en.wikipedia.org/wiki/Method_of_loci
Which is why you order the same thing at a restaurant every time, up to nutritional equivalence? And regularly murder people you know to be organ donors?
Two situations that are described differently are different. What differences are salient to you is a fundamentally arational question. Deciding that the differences you, Richard, care about are the ones which count to make two situations isomorphic cannot be defended as a rational position. It predictably loses.
It is obviously true in a bare bones consequences sense. It is obviously false in the aesthetics, which I expect will change the proportion of people answering a or b--as you say, the psychology of the people answering affects the chances.
That depends on your definition of isomorphism. I'm aware of the sense in which that is true. Are you aware of the sense in which it is false? Do you think I'm wrong when I say "There, the chances of everyone or nearly everyone choosing red seem (much!) higher"?
There, the chances of everyone or nearly everyone choosing red seem (much!) higher, so I think I would choose red.
Even in that situation, though, I suspect the first-mover signal is still the most important thing. If the first person to make a choice gives an inspiring speech and jumps in, I think the right thing to do is choose blue with them.
Thanks for doing the math on this :)
My first instinct is that I should choose blue, and the more I've thought about it, the more that seems correct. (Rough logic: The only way no-one dies is if either >50% choose blue, or if 100% choose red. I think chances of everyone choosing red are vanishingly small, so I should push in the direction with a wide range of ways to get the ideal outcome.)
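(A quick toy version of that rough logic--purely illustrative, with my own added assumption that everyone chooses independently with the same probability p of picking blue, and that blue-choosers die unless they are a strict majority:)

```python
# Toy model: n people independently choose blue with probability p.
# Blue-choosers die unless they are a strict majority; if everyone
# chose red, nobody dies. Expected deaths as a function of p.
from math import comb

def expected_deaths(n: int, p: float) -> float:
    total = 0.0
    for k in range(n + 1):  # k = number of blue-choosers
        prob = comb(n, k) * p**k * (1 - p)**(n - k)
        deaths = 0 if (k == 0 or k > n / 2) else k
        total += prob * deaths
    return total

for p in (0.0, 0.01, 0.1, 0.5, 0.6, 0.9):
    print(f"p = {p}: expected deaths = {expected_deaths(100, p):.2f}")
```

With 100 people, the red-side ideal requires p to be exactly 0, while anything comfortably above one half drives expected deaths toward zero--that's the "wide range of ways to get the ideal outcome" I mean.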
I do think the most important issue not mentioned here is a social-signal, first-mover one: If, before most people have chosen, someone loudly sends a signal of "everyone should do what I did, and choose X!", then I think we should all go along with that and signal-boost it.
I'm an AI alignment researcher who will be moving to the bay area soon (the two facts are unconnected--I'm moving due to my partner getting a shiny new job). I'm interested in connecting with other folks in the field, and feeling like I have coworkers. My background is academic mathematics.
1) In academia, I could show up at a department (or just look at their website) and find a schedule of colloquia/seminars/etc., ranging in frequency from about one a month to 10+ a week (depending on department size etc.). Are there similar things I should be aware of for AI folks in the Bay Area?
2) Where (if anywhere) do independent alignment people tend to work, and how could I go about renting an office there? I've heard of Constellation and Rose Garden Inn as the locations for several alignment organizations/events--do they also have office space for independent researchers?
3) Anything else I should know about?
One issue I don't think OpenAI convinced me they had dealt with is that saying "neuron activations are well correlated with x" is different from being able to say what, specifically, a neuron does mechanistically. I think of this similarly to how I think of the limitations of picking max-activating examples from a dataset or using gradient methods to find high activations: finding the argmax of a function doesn't necessarily tell you much about the function's...well, functionality.
This seems like it might have a related obstacle. While this method could eg make it easier to find a focus for mechanistic interpretability, I think the bulk of the hard work would still be ahead.
Relatedly, I'd really like to be able to attach private notes to authors' names. There are pairs of people on LW whose names I find easy to mistake for each other, and being able to look at the author of a post or comment and see a self-note like "This is the user who is really insightful about X" or "Don't start arguing with this person, it takes forever and goes nowhere", etc., would be very helpful.
I agree we are an existence proof for general intelligence. For alignment, what is the less intelligent thing whose goals humanity has remained robustly aligned to?
"To a decision-theoretic agent, the value of information is always nonnegative."
This seems false. If I selectively give you information in an adversarial manner, and you don't know that I'm picking the information to harm you, I think it's very clear that the value of the information you gain can be strongly negative.
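(A concrete toy version, of my own construction: an urn is equally likely to hold 2 red + 1 blue or 2 blue + 1 red, you win by naming the majority color, and I, knowing the urn, show you one genuine ball chosen to mislead. If you update as though it were a random draw, you go from winning half the time to never winning.)

```python
# Toy: adversarially selected (but true) information with naive updating
# does worse than having no information at all.
import random

def win_rate(naive_update: bool, trials: int = 100_000) -> float:
    wins = 0
    for _ in range(trials):
        majority = random.choice(["red", "blue"])       # true majority color of the urn
        shown = "blue" if majority == "red" else "red"  # I always show a minority-color ball
        if naive_update:
            # A random draw would be 2:1 evidence for its own color,
            # so the naive updater guesses the color they were shown.
            guess = shown
        else:
            guess = random.choice(["red", "blue"])      # ignore me entirely
        wins += (guess == majority)
    return wins / trials

print("naive update on my 'information':", win_rate(True))   # ~0.0
print("ignoring the information:        ", win_rate(False))  # ~0.5
```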
Yep! It might be easier to visualize with a train on tracks--the rope needs to be parallel to the intended direction of movement. Suppose the rope is nearly perfectly taut and tied to something directly in front of the train. Pulling the rope sideways with 100 newtons requires the perpendicular component of the rope's force to be 100 N, definitionally. But the rope can only exert force along itself, so if it misses being taut by an angle of θ radians, it'll be exerting enough tension T that T·sin(θ) = 100 N. But if the rope is very close to perfectly taut, then sin(θ) ≈ θ ≈ 0, so T ≈ 100/θ, and (in the limit) you're exerting infinite force along the rope.
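(Spelling out the small-angle arithmetic with made-up numbers:)

$$T\sin\theta = F_\perp \;\Rightarrow\; T \approx \frac{F_\perp}{\theta}\ \text{for small }\theta,$$

so a 100 N sideways pull on a rope that misses being taut by only θ ≈ 0.02 radians puts roughly 100 / 0.02 = 5000 N of tension along the rope.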
This fades pretty quickly as the rope gets away from the 0 angle, so you then need to secure the car so it won't move back (rocks under tires or something), and re-tighten the rope, and iterate.
I have actually tried this, not in tug-of-war, but with moving a stuck car (one end affixed to car, one end to a tree or lamppost or something). In that situation, where the objects aren't actively adjusting to thwart you, it works quite well!
Very cool idea!
It looked like several of the text samples were from erotica or something, which...seems like something I don't want to see without actively opting in--is there an easy way for you to filter those out?
(I imagine you know this, but for the sake of future readers) I think the word Torsor is relevant here :) https://math.ucr.edu/home/baez/torsors.html is a nice informal introduction.
Something I've found helpful for similar issues is to change my mindset from "I need to x" or (worse) "I should do x" to "I want to do x", even if the reason is because the consequences of not doing it seem very bad. Trying hard to reframe things as a desire of mine, rather than an obligation has been very good for me (when I can do it, which is not always).
I think you probably could do that, but you'd be restricting yourself to something that might work marginally worse than whatever would otherwise be found by gradient descent. Also, the more important part of the 768-dimensional vector that actually gets processed is the token embedding.
If you believe that neural nets store things as directions, one way to think of this is as the neural net reserving 3 dimensions for positional information, and 765 for the semantic content of the tokens. If the actual meaning of the words you read is roughly 250 times as important to your interpretation of a sentence as where they come in a sentence, then this should make sense?
This is kinda a silly way of looking at it--we don't have any reason (that I'm aware of) to think of these as separable, the interactions probably matter a lot--but might be not-totally-worthless as intuition.
@AdamYedidia This is super cool stuff! Is the magnitude of the token embeddings at all concentrated in or out of the 3 PCA dimensions for the positional embeddings? If it's concentrated away from that, we are practically using the addition as a direct sum, which is nifty.
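(If it's useful, here's roughly how I'd check that--a sketch I haven't run, assuming GPT-2 small via HuggingFace transformers, so treat the details as provisional:)

```python
# Sketch: how much of each token embedding's norm lies in the top-3
# principal directions of the positional embeddings (GPT-2 small).
import torch
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")
wte = model.wte.weight.detach()  # token embeddings, shape (50257, 768)
wpe = model.wpe.weight.detach()  # positional embeddings, shape (1024, 768)

# Top-3 principal directions of the positional embeddings.
_, _, v = torch.pca_lowrank(wpe, q=3)  # v: (768, 3), orthonormal columns

# Fraction of each token embedding's squared norm inside that 3-dim subspace.
proj = wte @ v  # (50257, 3)
frac = proj.pow(2).sum(dim=1) / wte.pow(2).sum(dim=1)
print(f"mean fraction of token-embedding norm in the positional subspace: {frac.mean():.4f}")
```

If that fraction comes out tiny, the addition really is behaving like a direct sum in practice.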
Upon reflection, it was probably a mistake for me to write this phrased as a story/problem/thought experiment. I should probably have just written a shorter post titled something like "Newcomb's problem provides no (interesting, non-trivial) evidence against using causal decision theory." I had some fun writing this, though, and (mistakenly?) hoped that people would have fun reading it.
I think I disagree somewhat that "PNP references the strategy for NP". I think many (most?) LW people have decided they are "the type of person who one-boxes in NP", and believe that says something positive about them in their actual life. This post is an attempt to push back on that.
It seems from your comment that you think of "What I, Vladimir Nesov, would do in a thought experiment" as different from what you would actually do in real life. (eg, when you say "the problem statement is very confusing on this point."). I think of both as being much more closely tied.
Possibly the confusion comes from the difference between what you-VN-would-actually-do and what you think is correct/optimal/rational behavior? Like, in a thought experiment, you don't actually try to imagine or predict what real-you would do, you just wonder what optimal behavior/strategy is? In that case, I agree that this is a confusing problem statement.
I think the in-story you believes they will be killed if they make an inconsistent choice (or at least thinks the chance is high enough that they do choose consistently).
The point of the post isn't so much the specific set up, as it is an attempt to argue that Newcomb's problem doesn't provide any reason to be against causal decision theory.
Well, if you were confronted with Newcomb's problem, would you one-box or two box? How fully do you endorse your answer as being "correct" or maximally rational, or anything along those lines?
I'm not trying to argue against anyone who says they aren't sure, but they think they would one-box or two-box in some hypothetical, or anyone who has thought carefully about the possible existence of unknown unknowns and come down on the "I have no idea what's optimal, but I've predetermined to do X for the sake of predictability" side for either X.
I am arguing against people who think that Newcomb's problem means causal decision theory is wrong, and that they have a better alternative. I think Newcomb's provides no (interesting, nontrivial) evidence against CDT.
The intention was to portray the transparent box as having lots of money--call it $1,000,000.
The point of UDT as I understand it is that you should be the sort of person who predictably one-boxes in NP. This seems incorrect to me. I think if you are the sort of person who one-boxes in a surprise NP, you will have worse outcomes in general, and that if you have a surprise NP, you should two-box. If you know you will be confronted with NP tomorrow, then sure, you should decide to one-box ahead of time. But I think deciding now to "be the sort of person who would one-box in NP," (or equivalently, deciding now to commit to a decision theory which will result in that) is a mistake.
Eliezer Yudkowsky and the whole UDT crowd seem to think that you should commit to a decision theory which seems like a bad one to me, on the basis that it would be rational to have precommitted if you end up in this situation. They seem to have convinced most LW people of this. I think they are wrong. I think CDT is a better decision theory which is more intuitive. I agree CDT gives a suboptimal outcome in surprise-NP, but I think any decision theory can give a good or bad outcome in corner-cases, along the lines of "You meet a superintelligent agent which will punish people who use (good decision theory) and reward those who use (bad decision theory)." Thus, NP shouldn't count as a strike against CDT.
Succinctly, if someone runs into an omega which says "I will give you $1,000,000 if you are someone who would have two-boxed in Newcomb. If you would have one-boxed, I will kill your family", then the two-boxers have much better outcomes than the one-boxers. You may object that this seems silly and artificial. I think it is no more so than the original problem.
And yes--I think EY is very wrong in the post you link to, and this is a response to the consensus LW view that one-boxing is correct.
This is a really cool piece of work!
As someone who successfully first-tried the ball into the cup without any video analysis, my algorithm was:
1) ask to see the ball roll down the ramp but be stopped at the end
2) notice the ramp moving with significant flex
3) do the standard calculation for the ball (sketched below), assuming all potential energy is converted to kinetic + rolling, and calculate cup-lip placement accordingly
4) decide that "about 10-15% loss" sounded both right to compensate for the flex and looked good to my physics instincts, and so move the cup closer accordingly.
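(For concreteness, the "standard calculation" in step 3, assuming a uniform solid ball rolling without slipping:)

$$mgh = \tfrac{1}{2}mv^2 + \tfrac{1}{2}I\omega^2,\qquad I = \tfrac{2}{5}mr^2,\quad v = \omega r \;\Rightarrow\; v = \sqrt{\tfrac{10}{7}gh},$$

then treat the ball as a projectile launched from the lip of the ramp at that speed to get the horizontal distance to the cup.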
It was a fun exercise! thanks, John :)
I'm looking forward to both this series, and the workshop!
I think I (and probably many other people) would find it helpful if there was an entry in this sequence which was purely the classical story told in a way/with language which makes its deficiencies clear and the contrasts with the Watanabe version very easy to point out. (Maybe a -1 entry, since 0 is already used?)
"You cannot argue with a group. You cannot convince a group of things or change a group’s mind."
Forgive me if this comes across as trollish, but whose mind are you trying to change with this essay?
To me it seems like your point is either self-refuting (in form, if not meaning) or, at best, incomplete.
Another option: my father reports he usually memorizes phone numbers based on the geometric pattern they make on a typical keypad.