> doing anything requires extra effort proportional to the severity of symptoms
that's what the entire post is about?
Seems like a bad joke, and accordingly I have decreased trust that bhauth posts won't waste the reader's time in the future.
> Getting oxygen from the moon to LEO requires less delta V than going from the Moon to LEO!
I think there might be a typo?
possibly an easier entry point to the topic is here
https://en.wikipedia.org/wiki/Chaitin%27s_constant
which is a specific construction with some relation to the construction OP has in mind
MoviePass was paying full price for every ticket.
> Well what's the appropriate way to act in the face of the fact that I AM sure I am right?
- Change your beliefs
- Convince literally one specific other person that you're right and your quest is important, and have them help translate for a broader audience
I agree that my suggestion was not especially helpful.
I think a generic answer is "read the sequences"? Here's a fun one
https://www.lesswrong.com/posts/qRWfvgJG75ESLRNu9/the-crackpot-offer
With regards to subsidizing: all the subsidizer needs to do in order to incentivize work on P is sell shares of P. If they are short P when P is proven, they lose money -- this money in effect goes to the people who worked to prove it.
To be more concrete:
Suppose P is trading at 0.50. I think I can prove P with one hour of work. Then an action available to me is to buy 100 shares of P for $50, prove it, and then sell them back at $1 each for $50 of profit.
But my going fee is $55/hour, so I don't do it.
Then a grantmaker comes along and offers to sell some shares at $0.40. Now the price is right for me, so I buy and prove and make $60/hr.
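If it helps, here is the same arithmetic as a tiny sketch (the function name and the 100-share lot are my own illustrative choices):

```python
def profit_per_hour(price, shares=100, hours_of_work=1):
    """Buy `shares` of P at `price`, prove P so each share settles at $1, then sell."""
    cost = shares * price
    payout = shares * 1.0
    return (payout - cost) / hours_of_work

print(profit_per_hour(0.50))  # 50.0 -- $50/hr, below my $55/hr going fee, so I pass
print(profit_per_hour(0.40))  # 60.0 -- $60/hr once the grantmaker sells at $0.40, so I take it
```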
https://gwern.net/tool-ai
I think part of your point, translated into the local language, is "GPTs are Tool AIs, and Tool AI doesn't necessarily become agentic"
IMO those issues are all very minor, even when summed.
Is that relevant? Imagine that we were discussing the replacement of a ramp with stairs. This has a very minor effect on my experience -- is that enough to conclude the change was benign?
This is an example where the true distribution of future prices is bimodal (with the average between the modes). If all you can do is buy or sell stock, then you actually have to disagree with the market about the distribution to make money.
Without having information about the probability of default, there might still be something to do based on the vol curve.
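To make the bimodal point concrete, here is a toy calculation (the numbers are mine, not from the discussion): if the market price already equals the mean of the bimodal distribution, agreeing with the market about the mean leaves no edge in the stock itself, however strongly you believe in the two modes.

```python
market_price = 50.0
# Hypothetical bimodal view: 60% chance of default (price -> 0),
# 40% chance of recovery (price -> 125). Mean = 0.6*0 + 0.4*125 = 50.
outcomes = [0.0, 125.0]
probs = [0.6, 0.4]

expected_pnl_long = sum(prob * (outcome - market_price) for outcome, prob in zip(outcomes, probs))
print(expected_pnl_long)  # 0.0 -- buying or selling the stock has zero expected value here
```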
Because the phenomenon happens at the tokenization level, GPT at runtime can't, like, "perceive" the letters. It has no idea that "SolidGoldMagikarp" looks similar to "SolidSoldMagikarp" (or the reverse)
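A rough sketch of what that looks like in practice (this assumes the tiktoken package, and the single-token claim applies to the GPT-2/GPT-3 era vocabulary, not newer ones):

```python
import tiktoken

enc = tiktoken.get_encoding("r50k_base")  # GPT-2/GPT-3 era byte-pair encoding
# Expect a single opaque token id for the first string and several ordinary
# sub-tokens for the second; the model never sees the letters inside the glitch token.
print(enc.encode(" SolidGoldMagikarp"))
print(enc.encode(" SolidSoldMagikarp"))
```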
it would be 3 lines
~all of the information is in lines 2 and 3, so you'd get all of the info on the first screen if you nix line 1.~
edit: not sure what I was thinking -- thanks, Slider
For those who care, it's open source and you can host your own server from a docker image. (In addition to the normal "just click buttons on our website and pay us some money to host a server for you" option)
I think that to get the type of the agent, you need to apply a fixpoint operator. This also happens inside the proof of Löb for constructing a certain self-referential sentence.
(As a breadcrumb, I've heard that this is related to the Y combinator.)
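Purely as an illustration of that breadcrumb (this is the untyped, call-by-value relative of the Y combinator, not the type-level fixpoint the comment is about):

```python
# Z combinator: a call-by-value fixed-point combinator.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Example: factorial as the fixed point of a non-recursive functional.
fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
assert fact(5) == 120
```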
> Cmn~Dno
I think this is a typo
Who do you suppose will buy them?
Yes.
My knee-jerk reaction to the argument was negative, but now I'm confused enough to say something.
If the contract is trading for M$p, then the "arbitrage" of "buy a share of yes and cause the contract to settle to 1" nets you M$(1-p) per share. Pushing up the price reduces the incentive for a new player to hit the arb.
If you sell the contract, you are paying someone to press the button and hoping they do not act on their incentive.
An interesting plot twist: after you buy the contract, your incentive has changed -- instead of the M$(1-p) available before purchase, you now have an incentive of a full M$1 for the button to be pushed rather than not (which maybe translates to a somewhat lower incentive to personally push the button, but I'm not sure that's the right comparison)
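With made-up numbers (p is just the current YES price; I'm assuming a share is worth M$0 if the button is never pressed):

```python
p = 0.60  # current price of the YES share, in M$

# A new player's arb: buy YES at p, press the button, collect M$1 per share.
profit_before_purchase = 1 - p                # M$0.40 per share

# Once you already hold a YES share, the full payout is at stake:
payoff_if_pressed = 1.0                       # settles YES at M$1
payoff_if_not_pressed = 0.0                   # eventually settles NO at M$0
incentive_after_purchase = payoff_if_pressed - payoff_if_not_pressed

print(profit_before_purchase)    # 0.4
print(incentive_after_purchase)  # 1.0
```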
Can you say more about what you mean by "steering mechanisms"? Is it something like "outer loop of an optimization algorithm"?
How complex of a computation do you expect to need in order to find an example where cut activations express something that's hard to find in layer activations?
Money in the account per year is not fuzzy; it is literally a scalar for which the ground truth is literally a number stored in a computer.
If you convince people to live there, then there are more places for people to live and the population growth rate goes up. Many folks care about this goal, though idk whether it's interesting to you specifically.
Writing my dating profile was a good use of my time before I shared it with anybody. I had an insufficiently strong sense of what kind of relationship I want and why other people might want to have it with me. The exercise of "make a freeform document capturing all of that" was very helpful for focusing my mind towards figuring it out -- much more so than the exercise of "fill in dating app textboxes in a way that seems competitive for the swiping game". (This is just a special case of "writing an essay teaches you a lot" -- something I'd like to take advantage of more often)
It took about 1 workday of writing effort to put mine together, and it's resulted in 2 high-quality dates (order of 10k micromarriages) in the past 5 months. This is competitive with the rest of my current tools for turning effort into dating prospects.
Which trade are you advocating for? "Long crypto"? Reversion (aka "buying the dip")? Long ETH vs. short BTC?
All of these are plausible opinions, and it's not crazy to allocate some of your portfolio based on them -- but a trade consists of a price and a size. Do you think you should have 0.1% of your net worth in ETH or 30%? Does that change if ETH goes to 100 or 3000 next week? Do your arguments apply equally well elsewhere? (solana?)
It's a piece of fiction about someone using a funky language model tool to write autobiographical fiction.
If you launch the nukes, you also die, and we spend a lot of time worrying about that. Why?
So you have a crisp concept called "unbounded utility maximizer", such that some AI systems are one, some aren't, and the ones that aren't are safe. Your plan is to teach everyone where that sharp conceptual boundary is, and then what? Convince them to walk back over the line and stay there?
Do you think your mission is easier or harder than nuclear disarmament?
I think I get what you're saying now; let me try to rephrase. We want to grow the "think good and do good" community. We have a lot of let's say "recruitment material" that appeals to people's sense of do-gooding, so unaligned people that vaguely want to do good might trip over the material and get recruited. But we have less of that on the think-gooding side, so there's a larger gap of unaligned people who want to think good that we could recruit.
Does that seem right?
Where does the Atlas fellowship fall on your scale of "recruits do-gooders" versus "recruits think-gooders"?
I think the most important claim you make here is that trying to fit into a cultural niche called "rationality" makes you a more effective researcher than trying to fit into a cultural niche called "EA". I think this is a plausible claim, (e.g. I feel this way about doing a math or philosophy undergrad degree over doing an economics or computer science undergrad degree) but I don't intuitively agree with it. Do you have any arguments in favor?
Pushing which button? They're deploying systems and competing on how capable those systems are. How do they know the systems they're deploying are safe? How do they define "not-unbounded-utility-maximizers" (and why is it not a solution to the whole alignment problem)? What about your "alignment-pilled" world is different from today's world, wherein large institutions already prefer not to kill themselves?
How does that distinguish between AGI and not-yet-AGI? How does that prevent an arms race?
Is there any concrete proposal that meets your specification? "don't kill yourself with AGI, please"?
I think the impact of little bits of "people engage with the problem" is not significantly positive. Maybe it rounds to zero. Maybe it is negative, if people engaging lightly flood serious people with noisy requests.
Hard research problems just don't get solved by people thinking for five minutes. There are some people who can make real contributions [0] by thinking for ~five hours per week for a couple of months, but they are quite rare.
(This is orthogonal to the current discussion, but: I had not heard of stampy.ai before your comment. Probably you should refer to it as stampy.ai, because googling "stampy wiki" gives it as the ~fifth result, behind some other stuff that is kind of absurd.)
[0] say, write a blog post that gets read and incorporated into serious people's world models
It seems like you're pointing at a model where society can make progress on safety by having a bunch of people put some marginal effort towards it. That seems insane to me -- have I misunderstood you?
Holden Karnofsky writes about this here
https://www.cold-takes.com/future-proof-ethics/
> you would be able to drop those activities quickly and find new work or hobbies within a few months.
I don't see it. Literally how would I defend myself? Someone who doesn't like me tells you that I'm doing AI research. What questions do you ask them before investigating me? What questions do you ask me? Are there any answers I can give that meaningfully prove that I never did any such research (without you ransacking my house and destroying my computers)?
re q2: If you set up the bounty, then other people can use it to target whoever they want. Other people might have plenty of reasons to target alignment-oriented researchers. Alignment-oriented researchers are a more extreme / weird group of people than AI researchers at large, so I expect there to be more optimization pressure per target trying to target them. (jail / neutralize / kill / whatever you want to call it)
> If you're worried about Goodhart's law, just use a coarse enough metric...
I don't think Goodhart is to blame here, per se. You are giving out a tool that preferentially favors offense to defense (something of an asymmetric weapon). Making the criteria coarser gives more power to those who want to abuse it, not less.
I really don't empathize with an intuition that this would be effective at causing differential progress of alignment over capability. Much like McCarthyism, the first-order effect is terrorism (especially in adjacent communities, but also everywhere), and the intended impact is a hard-to-measure second-order effect. (Remember, you need to slow down AI progress more than you slow down AI alignment progress, and that is hard to measure.) Eliezer recently pointed out that the reference class of "do something crazy and immoral because it might have good second-order effects" tends to underperform pretty badly on those second-order effects.
In this system, how do I defend myself from the accusation of "being an AI researcher"? I know some theorems, write some code, and sometimes talk about recent AI papers. I've never tried to train new AI systems, but how would you know?
Have you heard about McCarthyism?
If you had the goal of maximizing the probability of unaligned AI, you could target only "AI researchers" that might contribute to the alignment problem. Since they're a much smaller target than AI researchers at large, you'll kill a much larger fraction of them and reduce their relative power over the future.
Thanks for all the detail, and for looking past my clumsy questions!
It sounds like one disagreement you're pointing at is about the shape of possible futures. You value "humanity colonizes the universe" far less than some other people do. (maybe rob in particular?) That seems sane to me.
The near-term decision questions that brought us here were about how hard to fight to "solve the alignment problem," whatever that means. For that, the real question is about the difference in total value of the future conditioned on "solving" it and conditioned on "not solving" it.
You think there are plausible distributions on future outcomes such that one one-millionth of the expected value of those futures is worth more to you than personally receiving 1 billion dollars.
Putting these bits together, I would guess the amount of value at stake is not really the thing driving disagreement here, but about the level of futility? Say you think humanity overall has about a 1% chance of succeeding with a current team of 1000 full-time-equivalents working on the problem. Do you want to join the team in that case? What if we have a one-in-one-thousand chance and a current team of 1 million? Do these seem like the right units to talk about the disagreement in?
(Another place that I thought there might be a disagreement: do you think solving the alignment problem increases or decreases s-risk? Here "solving the alignment problem" is the thing that we're discussing giving up on because it's too futile.)
One thing I like about the "dignity as log-odds" framework is that it implicitly centers coordination.
I guess by "civilization" I meant "civilization whose main story is still being meaningfully controlled by humans who are individually similar to modern humans". Other than that, I just mean your current expectations about what that civilization is like, conditioned on it existing.
(It seems like you could be disagreeing with "a lot of people here" about what those futures look like or how valuable they are or both -- I'd be happy to get clarification on either front.)
Can you be more explicit about the arithmetic? Would increasing the probability of civilization existing 1000 years from now from 10^{-7} to 10^{-6} be worth more or less to you than receiving a billion dollars right now?
One of Eliezer's points is that most people's judgements about adding 1e-5 odds (I assume you mean log odds and not additive probability?) are wrong, and even systematically have the wrong sign.
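The difference is stark at small probabilities; a quick sketch with made-up numbers (natural-log odds):

```python
import math

def log_odds(p):
    return math.log(p / (1 - p))

def from_log_odds(l):
    return 1 / (1 + math.exp(-l))

p = 1e-7                                   # a tiny survival probability, as in the parent comment
print(p + 1e-5)                            # ~1.01e-05: adding 1e-5 of raw probability is a ~100x jump
print(from_log_odds(log_odds(p) + 1e-5))   # ~1.000001e-07: adding 1e-5 nats of log-odds barely moves it
```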
I mostly know this idea as pre-rigor and post-rigor in mathematics:
https://terrytao.wordpress.com/career-advice/theres-more-to-mathematics-than-rigour-and-proofs/
Holden Karnofsky has written some about average quality of life, including talking about that chart.
https://www.cold-takes.com/has-life-gotten-better/
I think he thinks that the zero point was crossed long before 1900, but I'm not sure.
I think the phrase "10,000 years of misery" is perfectly consistent with believing that the changes were net good due to population growth, and "misery" is pretty much equivalent to "average quality of life".
I mostly agree with swarriner, and I want to add that writing out more explicit strategies for making and maintaining friends is a public good.
The "case clinic" idea seems good. This sometimes naturally emerges among my friends, and trying to do it more would probably be net positive in my social circles.
Requiring the sum of the x_i to be finite is just part of assuming the x_i form a probability distribution over worlds. I think you're confused about the type difference between the x_i and the utility of A_i. (Where in the context of this post, the utility is just represented by an element of a poset.)
I'm not advocating for or making arguments about any fanciness related to infinitesimals or different infinite values or anything like that.
L is not equal to infinity; that's a type error. L is equal to 1/2 A_0 + 1/4 A_1 + 1/8 A_2 ...
l^1 is a bona fide vector space -- addition behaves as you expect. The points are infinite sequences (x_i) such that sum_i |x_i| is finite. This sum is a norm, and the space is Banach with respect to that norm.
Concretely, our interpretation is that x_i is the probability of being in world A_i.
A utility function is a linear functional, i.e. a map from points to real numbers such that the map commutes with addition. The space of continuous linear functionals on l^1 is l^infinity, which is the space of bounded sequences. A special case of this post is that unbounded linear functionals are not continuous. I say 'special case' because the class of "preference between points" is richer than the class of utility functions. You get a preference order from a utility function via "map to real numbers and use the order there." The utility function framework e.g. forces every pair of worlds to be comparable, but the more general framework doesn't require this -- Paul's theorem follows from weaker assumptions.
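For reference, the textbook fact being invoked (writing u_i for the utility assigned to world A_i):

```latex
% A utility assignment (u_i) induces the linear functional
%   f(x) = \sum_i u_i x_i
% on \ell^1, and f is continuous iff it is bounded:
\[
  f \in (\ell^1)^* \iff \|f\| = \sup_i |u_i| < \infty \iff (u_i) \in \ell^\infty .
\]
% So an unbounded utility assignment cannot give a continuous linear
% functional on the space of lotteries.
```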
The sum we're rearranging isn't a sum of real numbers, it's a sum in l^1. Ignoring details of what an infinite sum in l^1 means... the two rearrangements give the same sum! So I don't understand what your argument is.
Abstracting away the addition and working in an arbitrary topological space, the argument goes like this: x_n -> x. For all n, f(x_n) = 0, but f(x) = 1. Therefore, f is not continuous (else 0 = 1).
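Spelled out as a sketch, with x_n the partial lotteries and x their limit:

```latex
\[
  x_n \longrightarrow x, \qquad f(x_n) = 0 \ \ \forall n, \qquad f(x) = 1 .
\]
% If f were continuous, then
\[
  1 = f(x) = f\!\Big(\lim_{n\to\infty} x_n\Big) = \lim_{n\to\infty} f(x_n) = 0,
\]
% a contradiction, so f cannot be continuous.
```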