Posts

Speed running everyone through the bad alignment bingo. $5k bounty for a LW conversational agent 2023-03-09T09:26:25.383Z
preferences:decision theory :: data:code 2011-02-19T07:45:22.119Z

Comments

Comment by ArthurB on Implications of the inference scaling paradigm for AI safety · 2025-01-16T23:27:23.815Z · LW · GW

Interestingly o1-pro is not available for their team plan which offers the guarantee that they do not train on your data. I'm pretty sure they are losing money on o1-pro and it's available purely to gather data.

Comment by ArthurB on Fake Utility Functions · 2023-12-19T12:11:39.759Z · LW · GW

Popular with Silicon Valley VCs 16 years later: just maximize the rate of entropy creation🤦🏻‍♂️

Comment by ArthurB on Air Conditioner Test Results & Discussion · 2023-09-29T07:56:16.084Z · LW · GW

#e/ac

Comment by ArthurB on Speed running everyone through the bad alignment bingo. $5k bounty for a LW conversational agent · 2023-04-01T20:30:05.076Z · LW · GW

We have a winner! laserfiche's entry is the best (and only, but that doesn't mean it's not good quality) submission, and they win $5K.

Code and demo will be posted soon.

Comment by ArthurB on Speed running everyone through the bad alignment bingo. $5k bounty for a LW conversational agent · 2023-03-29T14:41:27.510Z · LW · GW

Exactly. As for the cost issue, the code can be deployed as:

- Twitter bots (registered as such) so the deployer controls the cost

- A webpage that charges you a small payment (via crypto or credit card) to run 100 queries. Such websites can actually be generated by ChatGPT4 so it's an easy lift. Useful for people who truly want to learn or who want to get good arguments for online argumentation

- A webpage with captchas and reasonable rate limits to keep cost small

Comment by ArthurB on Speed running everyone through the bad alignment bingo. $5k bounty for a LW conversational agent · 2023-03-13T08:04:46.485Z · LW · GW

In general yes, here no. My impression from reading LW is that many people suffer from a great deal of analysis paralysis and are taking too few chances, especially given that the default isn't looking great.

There is such a thing as doing a dumb thing because it feels like doing something (e.g. let's make AI Open!) but this ain't it. The consequences of this project are not going to be huge (talking to people) but you might get a nice little gradient read as to how helpful it is and iterate from there.

Comment by ArthurB on Speed running everyone through the bad alignment bingo. $5k bounty for a LW conversational agent · 2023-03-09T17:10:44.135Z · LW · GW

It should be possible to ask content owners for permission and get pretty far with that.

Comment by ArthurB on Speed running everyone through the bad alignment bingo. $5k bounty for a LW conversational agent · 2023-03-09T17:09:09.236Z · LW · GW

AFAIK what character.ai does is fine tuning, with their own language models, which aren't at parity with ChatGPT. Using a better language model will yield better answers but, MUCH MORE IMPORTANTLY, what I'm suggesting is NOT fine tuning.

What I'm suggesting gives you an answer that's closer to a summary of relevant bits of LW, Arbital, etc. The failure mode is much more likely to be that the answer is irrelevant or off the mark than it being at odds with prevalent viewpoints on this platform.

Think more interpolating over an FAQ, and less reproducing someone's cognition.

Comment by ArthurB on Humans are very reliable agents · 2022-06-18T22:34:05.539Z · LW · GW

The US has around one traffic fatality per 100 million miles driven; if a human driver makes 100 decisions per mile

A human driver does not make 100 "life or death decisions" per mile. They make many more decisions, most of which can easily be corrected, if wrong, by another decision.

The statistic is misleading though, in that it includes people who text, drunk drivers, and tired drivers. The performance of a well-rested human driver who is paying attention to the road is much, much higher than that. And that's really the bar that matters for self-driving cars: you don't want a car that is merely doing better than the average driver who - hey you never know - could be a drunk.

Comment by ArthurB on Godzilla Strategies · 2022-06-11T17:58:57.451Z · LW · GW

Fixing hardware failures in software is literally how quantum computing is supposed to work, and it's clearly not a silly idea.

Generally speaking, there's a lot of appeal to intuition here, but I don't find it convincing. This isn't good for Tokyo property prices? Well maybe, but how good of a heuristic is that when Mechagodzilla is on its way regardless?

Comment by ArthurB on AGI Ruin: A List of Lethalities · 2022-06-06T20:45:23.817Z · LW · GW

In addition

  1. There aren't that many actors in the lead.
  2. Simple but key insights in AI (e.g. doing backprop, using sensible weight initialisation) have been missed for decades.

If the right tail for the time to AGI by a single group can be long and there aren't that many groups, convincing one group to slow down / pay more attention to safety can have big effects.

How big of an effect? Years doesn't seem off the table. Eliezer suggests 6 months dismissively. But add a couple years here and a couple years there, and pretty soon you're talking about the possibility of real progress. It's obviously of little use if no research towards alignment is attempted in that period of course, but it's not nothing.

Comment by ArthurB on AGI Ruin: A List of Lethalities · 2022-06-06T19:07:02.494Z · LW · GW

There are IMO in-distribution ways of successfully destroying much of the computing overhang. It's not easy by any means, but on a scale where "the Mossad pulling off Stuxnet" is 0 and "build self replicating nanobots" is 10, I think it's closer to a 1.5.

Comment by ArthurB on How I Lost 100 Pounds Using TDT · 2011-03-14T20:35:35.500Z · LW · GW

Indeed, there is nothing irrational (in an epistemic way) about having hyperbolic time preference. However, this means that a classical decision algorithm is not conducive to achieving long term goals.

One way around this problem is to use TDT, another way is to modify your preferences to be geometric.

A geometric time preference is a bit like a moral preference... it's a para-preference. Not something you want in the first place, but something you benefit from wanting when interacting with other agents (including your future self).
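
A minimal sketch of why hyperbolic discounting gets in the way of long term goals while geometric discounting does not. The rewards, delivery times, and the parameters k and delta are made-up values chosen for illustration; nothing here is specific to TDT.

```python
def hyperbolic(delay, k=1.0):
    """Hyperbolic discount factor 1 / (1 + k * delay)."""
    return 1.0 / (1.0 + k * delay)


def geometric(delay, delta=0.9):
    """Geometric discount factor delta ** delay."""
    return delta ** delay


def prefers_later(now, discount, small=(10.0, 10), large=(15.0, 12)):
    """True if, seen from time `now`, the larger-later reward is valued more.

    Each reward is (amount, delivery time); both are hypothetical numbers.
    """
    small_amount, small_time = small
    large_amount, large_time = large
    value_small = small_amount * discount(small_time - now)
    value_large = large_amount * discount(large_time - now)
    return value_large > value_small


# Ask the same question at every time step before the first reward arrives.
hyperbolic_choices = [prefers_later(t, hyperbolic) for t in range(10)]
geometric_choices = [prefers_later(t, geometric) for t in range(10)]

print("hyperbolic:", hyperbolic_choices)
print("geometric: ", geometric_choices)
```

With these numbers the hyperbolic agent starts out preferring the larger, later reward and switches to the smaller, sooner one as it gets close, whereas the geometric agent's ranking depends only on the gap between the two rewards and so never flips.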

Comment by ArthurB on preferences:decision theory :: data:code · 2011-02-23T18:04:14.108Z · LW · GW

The second dot point is part of the problem description. You're saying it's irrelevant, but you can't just parachute a payoff matrix where causality goes backward in time.

Find any example you like, as long as they're physically possible, you'll either have the payoff tied to your decision algorithm (Newcomb's) or to your preference set (Solomon's).

Comment by ArthurB on preferences:decision theory :: data:code · 2011-02-23T17:55:21.837Z · LW · GW

I'm making a simple, logical argument. If it's wrong, it should be trivial to debunk. You're relying on an outside view to judge; it is pretty weak.

As I've clearly said, I'm entirely aware that I'm making a rather controversial claim. I never bother to post on lesswrong, so I'm clearly not whoring for attention or anything like that. Look at it this way, in order to present my point despite it being so unorthodox, I have to be pretty damn sure it's solid.

Comment by ArthurB on preferences:decision theory :: data:code · 2011-02-23T13:18:48.968Z · LW · GW

That's certainly possible, it's also possible that you do not understand the argument.

To make things absolutely clear, I'm relying on the following definition of EDT

Policy that picks action a = argmax( Sum( P( Wj | W, ai ). U( Wj ), j ) , i ) Where {ai} are the possible actions, W is the state of the world, P( W' | W, a ) the probability of moving to state of the world W' after doing a, and U is the utility function.
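
As a rough illustration, the definition above can be written as a few lines of Python. The states, actions, probabilities, and utilities below are hypothetical placeholders, not taken from any of the problems discussed here.

```python
def edt_action(actions, next_states, prob, utility, world):
    """Return the action maximizing Sum_j P(Wj | W, a) * U(Wj)."""
    def expected_utility(action):
        return sum(prob(w_next, world, action) * utility(w_next)
                   for w_next in next_states)
    return max(actions, key=expected_utility)


# Toy numbers (assumptions): two actions, two possible next states.
TRANSITIONS = {
    "a1": {"W1": 0.8, "W2": 0.2},
    "a2": {"W1": 0.3, "W2": 0.7},
}
UTILITIES = {"W1": 1.0, "W2": 5.0}


def toy_prob(w_next, world, action):
    # The current world state is ignored in this toy table.
    return TRANSITIONS[action][w_next]


def toy_utility(w_next):
    return UTILITIES[w_next]


best = edt_action(["a1", "a2"], ["W1", "W2"], toy_prob, toy_utility, world="W0")
print(best)
```

Note that the only inputs are the probability model, the utility function, and the current state W, which is the sense in which the agent's choice is fixed by its preferences and its information.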

I believe the argument I made in the case of Solomon's problem is the clearest and strongest, would you care to rebut it?

I've challenged you to clarify through which mechanism someone with a cancer gene would decide to chew gum, and you haven't answered this properly.

  • If your decision algorithm is EDT, the only free variables that will determine what your decisions are are going to be your preferences and sensory input.
  • The only way the gene can cause you to chew gum in any meaningful sense is to make you prefer to chew gum.
  • An EDT agent has knowledge of its own preferences. Therefore, an EDT agent already knows if it falls in the "likely to get cancer" population.

Comment by ArthurB on preferences:decision theory :: data:code · 2011-02-22T23:12:32.721Z · LW · GW

Yes, the causality is from the decision process to the reward. The decision process may or may not be known to the agent, but its preferences are (data can be read, but the code can only be read if introspection is available).

You can and should self-modify to prefer acting in such a way that you would benefit from others predicting you would act a certain way. You get one-boxing behavior in Newcomb's and this is still CDT/EDT (which are really equivalent, as shown).

Yes, you could implement this behavior in the decision algorithm itself, and yes this is very much isomorphic. Evolution's way to implement better cooperation has been to implement moral preferences though, it feels like a more natural design.

Comment by ArthurB on preferences:decision theory :: data:code · 2011-02-21T04:40:12.652Z · LW · GW

Typo, I do mean that EDT two boxes.

Comment by ArthurB on preferences:decision theory :: data:code · 2011-02-21T04:39:36.989Z · LW · GW

According to wikipedia, the definition of EDT is

Evidential decision theory is a school of thought within decision theory according to which the best action is the one which, conditional on your having chosen it, gives you the best expectations for the outcome.

This is not the same as "being a randomly chosen member of a group of people..." and I've explained why. The information about group membership is contained in the filtration.

Comment by ArthurB on preferences:decision theory :: data:code · 2011-02-19T15:14:25.926Z · LW · GW

You're saying EDT causes you not to chew gum because cancer gives you EDT? Where does the gum appear in the equation?

Comment by ArthurB on preferences:decision theory :: data:code · 2011-02-19T14:28:07.913Z · LW · GW

The claim is generally that EDT chooses not to chew gum.

Comment by ArthurB on preferences:decision theory :: data:code · 2011-02-19T06:23:38.783Z · LW · GW

No it can't. If you use a given decision theory, your actions are entirely determined by your preferences and your sensory inputs.

Comment by ArthurB on preferences:decision theory :: data:code · 2011-02-19T05:23:37.443Z · LW · GW

But that's not how EDT works - your modification amounts to a totally different algorithm, which you've conveniently named "EDT".

EDT measures expected value after the action has been taken, but the output of EDT has no reason to be ignored by EDT if it is relevant to the calculation.

...then Omega's prediction is that EDT will two-box and oops - goodbye prize.

It loses, but it is generally claimed that EDT one boxes.

Comment by ArthurB on preferences:decision theory :: data:code · 2011-02-19T05:20:48.099Z · LW · GW

This case is handled in the previous sentence. If this is your actual decision, and your actual decision is the product of a decision algorithm, then your decision algorithm is not EDT.

To put it another way, is your decision to chew gum determined by EDT or by your genes? Pick one.

Comment by ArthurB on Outlawing Anthropics: An Updateless Dilemma · 2009-09-10T18:25:43.264Z · LW · GW

As it's been pointed out, this is not an anthropic problem; however, there still is a paradox. I may be stating the obvious, but the root of the problem is that you're doing something fishy when you say that the other people will think the same way and that your decision will be theirs.

The proper way to make a decision is to have a probability distribution on the code of the other agents (which will include their prior on your code). From this I believe (but can't prove) that you will take the correct course of action.

Newcomb-like problems fall in the same category; the trick is that there is always a belief about someone's decision making hidden in the problem.

Comment by ArthurB on Mathematical simplicity bias and exponential functions · 2009-08-27T23:20:12.239Z · LW · GW

Hum no you haven't. The approximation depends on the scale of course.

Comment by ArthurB on Mathematical simplicity bias and exponential functions · 2009-08-27T22:05:57.121Z · LW · GW

Indeed.

But I may have gotten "scale" wrong here. If we scale the error at the same time as we scale the part we're looking at, then differentiability is necessary and sufficient. If we're concerned about approximating the function, on a smallish part, then continuous is what we're looking for.

Comment by ArthurB on Mathematical simplicity bias and exponential functions · 2009-08-27T21:45:04.270Z · LW · GW

ok, but with this definition of "approximate", a piecewise linear function with finitely many pieces cannot approximate the Weierstrass function.

The original question is whether a continuous function can be approximated by a linear function at a small enough scale. The answer is yes.

If you want the error to decrease linearly with scale, then continuous is not sufficient of course.

Comment by ArthurB on Mathematical simplicity bias and exponential functions · 2009-08-27T21:04:49.937Z · LW · GW

I defined approximate in another comment.

Approximate around x : for every epsilon > 0, there is a neighborhood of x over which the absolute difference between the approximation and the approximated function is always lower than epsilon.

Adding a slope to a small segment doesn't help or hurt the ability to make a local approximation, so continuous is both sufficient and necessary.

Comment by ArthurB on Mathematical simplicity bias and exponential functions · 2009-08-27T19:01:13.043Z · LW · GW

that is because our eyes cannot see nowhere differentiable functions

That is because they are approximated by piecewise linear functions.

Consider that when you look at a "picture" of the Weierstrass function and pick a point on it, you would swear to yourself that the curve happens to be "going up" at that point. Think about that for a second: the function isn't differentiable - it isn't "going" anywhere at that point!

It means on any point you can't make a linear approximation whose precision increases like the inverse of the scale, it doesn't mean you can't approximate.

Comment by ArthurB on Mathematical simplicity bias and exponential functions · 2009-08-27T18:33:11.827Z · LW · GW

No he's right. The Weierstrass function can be approximated with a piecewise linear function. It's obvious: pick N equally spaced points and join them linearly. For N big enough, you won't see the difference. It means that the difference is becoming infinitesimally small as N gets bigger.
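
A quick numerical sketch of this claim. It uses a truncated Weierstrass series (8 terms, a = 0.5, b = 3), which is an assumption made so the function can be evaluated; the error is measured on a finite sample of points per segment rather than as a true supremum.

```python
import math


def weierstrass(x, a=0.5, b=3, terms=8):
    """Truncated Weierstrass series: sum of a^n * cos(b^n * pi * x)."""
    return sum(a ** n * math.cos(b ** n * math.pi * x) for n in range(terms))


def interpolation_error(n_pieces, samples_per_piece=20):
    """Largest sampled gap on [0, 1] between the function and the
    piecewise linear curve joining n_pieces + 1 equally spaced points."""
    xs = [i / n_pieces for i in range(n_pieces + 1)]
    ys = [weierstrass(x) for x in xs]
    worst = 0.0
    for i in range(n_pieces):
        for s in range(samples_per_piece + 1):
            t = s / samples_per_piece
            x = xs[i] + t * (xs[i + 1] - xs[i])
            approx = (1 - t) * ys[i] + t * ys[i + 1]
            worst = max(worst, abs(weierstrass(x) - approx))
    return worst


errors = {n: interpolation_error(n) for n in (4, 2000)}
for n, err in errors.items():
    print(f"N = {n:5d}  sampled max error = {err:.4f}")
```

The gap shrinks as N grows, even though the limiting function has no derivative anywhere; what non-differentiability rules out is a guarantee on how fast the gap shrinks relative to the segment length.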

Comment by ArthurB on Mathematical simplicity bias and exponential functions · 2009-08-27T18:24:28.626Z · LW · GW

Question is, what do you mean by "approximately"?

If you mean, for any error size, the supremum of distance between the linear approximation and the function is lower than this error for all scales smaller than a given scale, then a necessary and sufficient condition is "continuous". Differentiable is merely sufficient.

When the function is differentiable, you can make claims on how fast the error decreases asymptotically with scale.

Comment by ArthurB on Why You're Stuck in a Narrative · 2009-08-04T17:52:37.528Z · LW · GW

An explanation cannot increase your knowledge. Your knowledge can only increase by observation. Increasing your knowledge is a decision theory problem (exploration/exploitation for example).

Phlogiston explains why some categories of things burn and some don't. Phlogiston predicts that dry wood will always burn when heated to a certain temperature. Phlogiston explains why different kinds of things burn, as opposed to sometimes burning and sometimes not burning. It explains that if you separate a piece of wood into smaller pieces, every smaller piece will also burn.

To clarify my original point, the problem isn't the narrative. The narrative is a heuristic, it's a method to update from an observation by remembering a simple unimodal distribution centered on the narrative (what I think most likely happened, how confident I am).

Comment by ArthurB on Why You're Stuck in a Narrative · 2009-08-04T14:17:57.397Z · LW · GW

It seems to me that a narrative is generally a maximum likelihood explanation behind an event. If you observe two weird events, an explanation that links them is more likely than an explanation that doesn't. That's why causality is such a great explanation mechanism. I don't think making narratives is a bug. The bug is discarding the rest of the probability distribution... we are bad at remembering complex multimodal distributions.

Sometimes, a narrative will even add unnecessary details and it looks like a paradox (the explanation would be more likely without the details). However, the explanation without the detail would be a zone while the explanation with the detail is a point. If we try to remember modes, it makes perfect sense to add the details.

Coming from here, I don't really understand the advice to

"In other words, concentrate your probability mass"

It seems that concentrating the probability mass would reinforce the belief in the most likely explanation which is often a narrative.

Comment by ArthurB on Shut Up And Guess · 2009-07-23T18:23:50.020Z · LW · GW

Yes, if there are two or more options and the score function depends only on the probability assigned to the correct outcome, then the only proper function is log. You can see that with the equation I gave

f0' (x) = (k - x.f1' (x))/(1-x)

for f0 = 0, it means x.f1'(x) = k thus f1(x) = k ln(x) + c (necessary condition)

Then you have to check that k ln(x) + c indeed works for some k and c; that is left as an exercise for the reader ^^

Comment by ArthurB on Shut Up And Guess · 2009-07-23T17:16:58.138Z · LW · GW

No.

I am assuming the student has a distribution in mind and we want to design a scoring rule where the best strategy to maximize the expected score is to write in the distribution you have in mind.

If there are n options and the right answer is i* and you give log(n p_i*) / log(n) points to the student, then his incentive is to write in the exact distribution. On the other hand, if you give him, say, p_i* points, his incentive would be to write in "1" for the most likely answer and 0 otherwise.
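
A small sketch comparing the two rules just described. The belief vector and the alternative reports are made-up numbers; the log rule is given a score of minus infinity when the reported probability is 0, which is an assumption about how to handle that edge case.

```python
import math


def log_rule(p, n):
    """Score log(n * p) / log(n) for the probability p put on the right answer."""
    return math.log(n * p) / math.log(n) if p > 0 else -math.inf


def linear_rule(p, n):
    """Score equal to the probability p put on the right answer."""
    return p


def expected_score(belief, report, rule):
    """Expected score of writing `report` when the student believes `belief`."""
    n = len(belief)
    return sum(q * rule(p, n) for q, p in zip(belief, report) if q > 0)


belief = [0.6, 0.3, 0.1]           # what the student actually thinks
overconfident = [0.9, 0.05, 0.05]  # exaggerated report
all_in = [1.0, 0.0, 0.0]           # everything on the most likely answer

for name, rule in (("log", log_rule), ("linear", linear_rule)):
    for label, report in (("honest", belief),
                          ("overconfident", overconfident),
                          ("all-in", all_in)):
        print(f"{name:6s} {label:13s} {expected_score(belief, report, rule):.4f}")
```

Under the log rule the honest report has the highest expected score; under the linear rule the all-in report does.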

Another way to score is not to give points only on p_i* but also to take away points on p_i where i != i*, by using a function f1 for p_i* and f0 otherwise. I gave a necessary condition on f1 and f0 for the student's belief to be a local maximum of the expected score. The technique is simply Lagrangian multipliers.

The number of options drops out of the equation, which is beautiful, so you can extend to any number of answers or even a continuous question (when asked what the population of Zimbabwe is, the student could describe any parametric distribution and be scored on that... histograms, gaussians... there are many ways a student could write in his answer).

Comment by ArthurB on Fairness and Geometry · 2009-07-23T14:23:49.979Z · LW · GW

The equivalence class of the utility function should be the set of monotonic functions of a canonical element.

However, what von Neumann-Morgenstern shows under mild assumptions is that for each class of utility functions, there is a subset of utility functions generated by the affine transforms of a single canonical element for which you can make decisions by computing expected utility. Therefore, looking at the set of all affine transforms of such a utility function really is the same as looking at the whole class. Still, it doesn't make utility commensurable.

Comment by ArthurB on Fairness and Geometry · 2009-07-23T14:02:00.744Z · LW · GW

A speck in Adam's eye vs Eve being tortured is not a utility comparison but a happiness comparison. Happiness is hard to compare but can be compared because it is a state, utility is an ordering function. There is no utility meter.

Comment by ArthurB on Shut Up And Guess · 2009-07-22T21:07:31.514Z · LW · GW

You're correct. In the previous post given, it was somehow assumed that the score for a wrong answer was 0. In that case, the only proper score function is the log.

If you have a score function f1(q) for the right answer and f0(q) for the wrong answer, and there are n possible choices, the right p are critical only if

f0' (x) = (k - x.f1' (x))/(1-x)

if we set f1(x) = 1 - (1-x)^p we can set f0(x) = -(1-x)^p + (1-x)^(p-1) * p/(p-1)

for p = 2, we find f0(x) = -(1-x)^2 + 2(1-x) = 1 - x^2; this is the Brier score

for p = 3, we find f0(x) = -(1-x)^3 + (1-x)^2 3/2 = x^3 - 3x^2/2 + 1/2, where the constant 1/2 can be dropped without changing the incentives

1-(1-x)^3 and x^3-3*x^2/2 shall be known as ArthurB's score
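
A numerical sanity check of this family, using f1(x) = 1 - (1-x)^p and f0(x) = -(1-x)^p + (1-x)^(p-1) * p/(p-1) as written above. The belief vector and the alternative reports are made-up numbers, and checking a handful of alternatives is only evidence, not a proof.

```python
def f1(x, p):
    """Score for the probability x written on the right answer."""
    return 1 - (1 - x) ** p


def f0(x, p):
    """Score for the probability x written on a wrong answer."""
    return -(1 - x) ** p + (1 - x) ** (p - 1) * p / (p - 1)


def expected_score(belief, report, p):
    """Expected total score of `report` when answer i is right with probability belief[i]."""
    total = 0.0
    for i, q in enumerate(belief):
        score_if_i_is_right = f1(report[i], p) + sum(
            f0(r, p) for j, r in enumerate(report) if j != i)
        total += q * score_if_i_is_right
    return total


belief = [0.5, 0.3, 0.2]
alternatives = [
    [0.6, 0.25, 0.15],
    [0.4, 0.4, 0.2],
    [1 / 3, 1 / 3, 1 / 3],
    [1.0, 0.0, 0.0],
]

for p in (2, 3):
    honest = expected_score(belief, belief, p)
    others = [expected_score(belief, alt, p) for alt in alternatives]
    print(f"p = {p}: honest = {honest:.4f}, best alternative = {max(others):.4f}")
```

For both p = 2 and p = 3, the honest report scores higher in expectation than each of the alternatives tried here.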

Comment by ArthurB on Shut Up And Guess · 2009-07-21T19:30:02.473Z · LW · GW

The good thing with a log score rule is that if students try to maximize the expected score, they should write in their belief.

For the same reason, when confronted with a set of odds on the outcome of an event, betting on each outcome in proportion to your belief will maximize the expected log of the gain (regardless of what the current odds are).
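
A small sketch of the betting claim, assuming the whole bankroll is split across the outcomes and that odds[i] is the gross payout per unit staked on outcome i. The beliefs, odds, and alternative allocations are made-up numbers.

```python
import math


def expected_log_gain(belief, bets, odds):
    """Expected log of the final bankroll multiple.

    bets[i] is the fraction of the bankroll staked on outcome i,
    odds[i] the gross payout per unit staked if outcome i occurs.
    """
    return sum(q * math.log(b * o)
               for q, b, o in zip(belief, bets, odds) if q > 0)


belief = [0.5, 0.3, 0.2]
odds_sets = [[2.0, 3.0, 6.0], [1.5, 4.0, 10.0]]
alternatives = [
    [0.6, 0.3, 0.1],
    [0.2, 0.3, 0.5],
    [1 / 3, 1 / 3, 1 / 3],
]

for odds in odds_sets:
    proportional = expected_log_gain(belief, belief, odds)
    others = [expected_log_gain(belief, alt, odds) for alt in alternatives]
    print(f"odds {odds}: proportional = {proportional:.4f}, "
          f"best alternative = {max(others):.4f}")
```

The odds only add a term that does not depend on the allocation, which is why betting in proportion to belief comes out ahead for both sets of odds.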

Comment by ArthurB on Timeless Decision Theory: Problems I Can't Solve · 2009-07-21T18:45:39.336Z · LW · GW

I confess I do not grasp the problem well enough to see where the problem lies in my comment. I am trying to formalize the problem, and I think the formalism I describe is sensible.

Once again, I'll reword it but I think you'll still find it too vague : to win, one must act rationally and the set of possible actions includes modifying one's code.

The question was

My timeless decision theory only functions in cases where the other agents' decisions can be viewed as functions of one argument, that argument being your own choice in that particular case - either by specification (as in Newcomb's Problem) or by symmetry (as in the Prisoner's Dilemma). If their decision is allowed to depend on how your decision depends on their decision - like saying, "I'll cooperate, not 'if the other agent cooperates', but only if the other agent cooperates if and only if I cooperate - if I predict the other agent to cooperate unconditionally, then I'll just defect" - then in general I do not know how to resolve the resulting infinite regress of conditionality, except in the special case of predictable symmetry

I do not know the specifics of Eliezer's timeless decision theory, but it seems to me that if one looks at the decision process of others based on their belief about your code, not on your decisions, there is no infinite regress.

You could say : Ah but there is your belief about an agent's code, then his belief about your belief about his code, then your belief about his belief about your belief about his code, and that looks like an infinite regression. However, there is really no regression since "his belief about your belief about his code" is entirely contained in "your belief about his code".

Comment by ArthurB on Shut Up And Guess · 2009-07-21T15:42:59.605Z · LW · GW

On this page, the cumulative refers to the probability of obtaining at most p successes. You want to run it with 30 and 9 which gives you the right answer, 2.14%

Or you could put in 30 and 20 which gives you the complement.

What is lower than 1% is the probability of getting 8 or fewer right answers.

Comment by ArthurB on Timeless Decision Theory: Problems I Can't Solve · 2009-07-21T15:05:11.508Z · LW · GW

Well, if you want practicality, I think Omega problems can be disregarded, they're not realistic. It seems that the only feature needed for the real world is the ability to make trusted promises as we encounter the need to make them.

If we are not concerned with practicality but with the theoretical problem behind these paradoxes, the key is that other agents make predictions about your behavior, which is the same as saying they have a theory of mind, which is simply a belief distribution over your own code.

To win, you should take the actions that make their belief about your own code favorable to you, which can include lying, or modifying your own code and showing it to make your point.

It's not our choice that matters in these problems but our choosing algorithm.

Comment by ArthurB on Shut Up And Guess · 2009-07-21T14:37:25.316Z · LW · GW

I think you got your math wrong

If you get 20 out of 30 questions wrong, you break even, therefore the probability of losing points by guessing is

Sum( C(30, i), i = 21..30 ) / 2^30 ~ 2.14% > 1%
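
The figure can be checked with a few lines of Python, assuming each guess is an independent fair coin flip between right and wrong, as the 2^30 denominator implies.

```python
from math import comb


def tail_probability(n, k_min):
    """P(X >= k_min) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k_min, n + 1)) / 2 ** n


# Probability of getting 21 or more of the 30 guesses wrong.
p_lose = tail_probability(30, 21)
print(f"{p_lose:.4%}")
```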

Comment by ArthurB on Timeless Decision Theory: Problems I Can't Solve · 2009-07-21T03:10:39.497Z · LW · GW

Instead of assuming that others will behave as a function of our choice, we look at the rest of the universe (including other sentient beings, including Omega) as a system where our own code is part of the data.

Given a prior on physics, there is a well defined code that maximizes our expected utility.

That code wins. It one boxes, it pays Omega when the coin falls on heads etc.

I think this solves the infinite regress problem, albeit in a very impractical way.

Comment by ArthurB on Media bias · 2009-07-05T21:03:48.732Z · LW · GW

I find that going to the original paper generally does the trick. When an idea is new, the author will spell out the details more carefully.

SilasBarta mentions difficulty with Boltzmann machines, Ackley et al.'s article is actually quite detailed, including a proof of the learning algorithm http://tinyurl.com/q4azfl

Comment by ArthurB on Dialectical Bootstrapping · 2009-03-13T19:48:22.115Z · LW · GW

It would be interesting to try the experiment with Versed. You remove the dialectical aspect (steps 2,3,4) but you keep the wisdom of the crowd aspect.

Comment by ArthurB on The Wrath of Kahneman · 2009-03-11T14:35:15.094Z · LW · GW

There's an ambiguity here. You're talking about valuing something like world justice, I was talking about valuing acting justly. In particular, I believe that if optimal deterrence is unjust, it is also unjust to seek it.

Why does this relate to the subject again? Well, my point is we should not change our sense of justice. It's tautological.

Comment by ArthurB on The Wrath of Kahneman · 2009-03-10T13:23:58.221Z · LW · GW

Your decision making works as a value scale, morality not so much. There is a subset of actions you can take which are just. If you do not give a high weight to acting justly, you're a dangerous person.

Comment by ArthurB on The Wrath of Kahneman · 2009-03-09T21:08:09.098Z · LW · GW

When you say we "should" change our sense of justice, you're making a normative statement because no specific goal is specified.

In this case, it seems wrong. Our sense of justice is part of our morality, therefore we should not change it.

"We should seek justice" is tautological. If justice and optimal deterrence are contradictory, then we should not seek optimal deterrence.