Posts

Linkpost: Rishi Sunak's Speech on AI (26th October) 2023-10-27T11:57:46.575Z

Comments

Comment by bideup on My thoughts on the Beff Jezos - Connor Leahy debate · 2024-02-05T11:09:45.831Z · LW · GW

Are you interested in these debates in order to help form your own views, or convince others?

I feel like debates are inferior to reading people's writings for the former purpose, and for the latter they deal collateral damage by making the public conversation more adversarial.

Comment by bideup on Attention SAEs Scale to GPT-2 Small · 2024-02-05T11:05:46.921Z · LW · GW

I keep reading the title as Attention: SAEs Scale to GPT-2 Small.

Thanks for the heads up.

Comment by bideup on Apologizing is a Core Rationalist Skill · 2024-01-30T22:31:44.464Z · LW · GW

I think what I was thinking of is that words can have arbitrary consequences and be arbitrarily costly.

In the apologising case, making the right social API call might be an action of genuine significance. E.g. it might mean taking the hit on lowering onlookers' opinion of my judgement, where if I'd argued instead that the person I wronged was talking nonsense I might have got away with preserving it.

John's post is about how you can gain respect for apologising, but it does often have costs too, and I think the respect is partly for being willing to pay them.

Comment by bideup on Apologizing is a Core Rationalist Skill · 2024-01-04T12:53:05.399Z · LW · GW

Words are a type of action, and I guess apologising and then immediately moving on to defending yourself is not the sort of action which signals sincerity.

Comment by bideup on A case for AI alignment being difficult · 2024-01-03T08:59:05.468Z · LW · GW

Explaining my downvote:

This comment contains ~5 negative statements about the post and the poster without explaining what it is that the commenter disagrees with.

As such it seems to disparage without moving the conversation forward, and is not the sort of comment I'd like to see on LessWrong.

Comment by bideup on Apologizing is a Core Rationalist Skill · 2024-01-02T20:15:47.680Z · LW · GW

The second footnote seems to be accidentally duplicated as the intro. Kinda works though.

Comment by bideup on Apologizing is a Core Rationalist Skill · 2024-01-02T18:11:06.196Z · LW · GW

"Not invoking the right social API call" feels like a clarifying way to think about a specific conversational pattern that I've noticed that often leads to a person (e.g. me) feeling like they're virtuosly giving up ground, but not getting any credit for it.

It goes something like:

Alice: You were wrong to do X and Y.

Bob: I admit that I was wrong to do X and I'm sorry about it, but I think Y is unfair.

discussion continues about Y and Alice seems not to register Bob's apology

It seems like maybe bundling in your apology for X with a protest against Y just doesn't invoke the right API call. I'm not entirely sure what the simplest fix is, but it might just be swapping the order of the protest and the apology.

Comment by bideup on Critical review of Christiano's disagreements with Yudkowsky · 2023-12-29T11:45:42.213Z · LW · GW

Is it true that scaling laws are independent of architecture? I don’t know much about scaling laws but that seems surely wrong to me.

E.g. how does RNN scaling compare to transformer scaling?

Comment by bideup on E.T. Jaynes Probability Theory: The logic of Science I · 2023-12-28T10:31:15.267Z · LW · GW

Your example of a strong syllogism (‘if A, then B. A is true, therefore B is true’) isn’t one.

It’s instead of the form ‘If A, then B. A is false, therefore B is false’, which is not logically valid (and also not a Jaynesian weak syllogism).

If Fisher lived to 100 he would have become a Bayesian

Fisher died at the age of 72

———————————————————————————————————

Fisher died a Frequentist

You could swap the conclusion with the second premise and weaken the new conclusion to ‘Fisher died before 100’, or change the premise to ‘Unless Fisher lived to 100 he would not have become a Bayesian’.
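Spelling out the forms, with A = ‘Fisher lived to 100’ and B = ‘Fisher became a Bayesian’ (a quick sketch in standard notation, my own addition rather than anything from the original post):

```latex
\[
\frac{A \to B \qquad A}{B}\ \text{(valid: modus ponens)}
\qquad
\frac{A \to B \qquad \lnot A}{\lnot B}\ \text{(invalid: denying the antecedent)}
\qquad
\frac{\lnot A \to \lnot B \qquad \lnot A}{\lnot B}\ \text{(valid: the `unless' repair)}
\]
```

The other repair, keeping ‘Fisher died a Frequentist’ as a premise and concluding ‘Fisher died before 100’, is just modus tollens.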

Comment by bideup on Critical review of Christiano's disagreements with Yudkowsky · 2023-12-27T23:27:11.763Z · LW · GW

Augmenting humans to do better alignment research seems like a pretty different proposal to building artificial alignment researchers.

The former is about making (presumed-aligned) humans more intelligent, which is a biology problem, while the latter is about making (presumed-intelligent) AIs aligned, which is a computer science problem.

Comment by bideup on "AI Alignment" is a Dangerously Overloaded Term · 2023-12-15T22:17:13.163Z · LW · GW

I don’t think that that’s the view of whoever wrote the paragraph you’re quoting, but at this point we’re doing exegesis.

Comment by bideup on "AI Alignment" is a Dangerously Overloaded Term · 2023-12-15T19:01:35.704Z · LW · GW

Hm, I think that paragraph is talking about the problem of getting an AI to care about a specific particular thing of your choosing (here diamond-maximising), not any arbitrary particular thing at all with no control over what it is. The MIRI-esque view thinks the former is hard and the latter happens inevitably.

Comment by bideup on "AI Alignment" is a Dangerously Overloaded Term · 2023-12-15T16:32:24.355Z · LW · GW

Right, makes complete sense in the case of LLM-based agents, I guess I was just thinking about much more directly goal-trained agents.

Comment by bideup on "AI Alignment" is a Dangerously Overloaded Term · 2023-12-15T16:27:25.416Z · LW · GW

I like the distinction but I don’t think either aimability or goalcraft will catch on as Serious People words. I’m less confident about aimability (doesn’t have a ring to it) but very confident about goalcraft (too Germanic, reminiscent of fantasy fiction).

Is words-which-won’t-be-co-opted what you’re going for (a la notkilleveryoneism), or should we brainstorm words-which-could-plausibly-catch-on?

Comment by bideup on "AI Alignment" is a Dangerously Overloaded Term · 2023-12-15T16:12:46.189Z · LW · GW

Perhaps, or perhaps not? I might be able to design a gun which shoots bullets in random directions (not on random walks), without being able to choose the direction.

Maybe we can back up a bit, and you could give some intuition for why you expect goals to go on random walks at all?

My default picture is that goals walk around during training and perhaps during a reflective process, and then stabilise somewhere.

Comment by bideup on "AI Alignment" is a Dangerously Overloaded Term · 2023-12-15T16:10:34.400Z · LW · GW

I think that’s a reasonable point (but fairly orthogonal to the previous commenter’s one)

Comment by bideup on "AI Alignment" is a Dangerously Overloaded Term · 2023-12-15T15:51:28.134Z · LW · GW

A gun which is not easily aimable doesn't shoot bullets on random walks.

Or in less metaphorical language, the worry is mostly that it's hard to give the AI the specific goal you want to give it, not so much that it's hard to make it have any goal at all. I think people generally expect that naively training an AGI without thinking about alignment will get you a goal-directed system, it just might not have the goal you want it to have.

Comment by bideup on Understanding Subjective Probabilities · 2023-12-11T13:28:26.023Z · LW · GW

Sounds like the propensity interpretation of probability.

Comment by bideup on How do you feel about LessWrong these days? [Open feedback thread] · 2023-12-07T10:49:20.531Z · LW · GW

FiO?

Comment by bideup on Shallow review of live agendas in alignment & safety · 2023-11-27T12:57:21.635Z · LW · GW

Nice job

Comment by bideup on Thomas Kwa's research journal · 2023-11-27T11:31:07.860Z · LW · GW

I like the idea of a public research journal a lot, interested to see how this pans out!

Comment by bideup on You can just spontaneously call people you haven't met in years · 2023-11-14T20:57:29.493Z · LW · GW

You seem to be operating on a model that says “either something is obvious to a person, or it’s useful to remind them of it, but not both”, whereas I personally find it useful to be reminded of things that I consider obvious, and I think many others do too. Perhaps you don’t, but could it be the case that you’re underestimating the extent to which it applies to you too?

I think one way to understand it is to disambiguate ‘obvious’ a bit and distinguish what someone knows from what’s salient to them.

If someone reminds me that sleep is important and I thank them for it, you could say “I’m surprised you didn’t know that already,” but of course I did know it already - it just hadn’t been salient enough to me to have as much impact on my decision-making as I’d like it to.

I think this post is basically saying: hey, here’s a thing that might not be as salient to you as it should be.

Maybe everything is always about the right amount of salient to you already! If so you are fortunate.

Comment by bideup on You can just spontaneously call people you haven't met in years · 2023-11-14T16:30:10.575Z · LW · GW

I think it falls into the category of 'advice which is of course profoundly obvious but might not always occur to you', in the same vein as 'if you have a problem, you can try to solve it'.

When you're looking for something you've lost, it's genuinely helpful when somebody says 'where did you last have it?', and not just for people with some sort of looking-for-stuff-atypicality.

Comment by bideup on Making Bad Decisions On Purpose · 2023-11-14T16:26:31.552Z · LW · GW

I think I practice something similar to this with selfishness: a load-bearing part of my epistemic rationality is having it feel acceptable that I sometimes (!) do things for selfish rather than altruistic reasons.

You can make yourself feel that selfish acts are unacceptable and hope this will make you very altruistic and not very selfish, but in practice it also makes you come up with delusional justifications as to why selfish acts are in fact altruistic.

From an impartial standpoint we can ask how much of the latter is worth it for how much of the former. I think one of life's repeated lessons is that sacrificing your epistemics for instrumental reasons is almost always a bad idea.

Comment by bideup on [deleted post] 2023-11-13T09:14:22.529Z

Do people actually disapprove of and disagree with this comment, or do they disapprove of the use of said 'poetic' language in the post? If the latter, perhaps they should downvote the post and upvote the comment for honesty.

Perhaps there should be a react for "I disapprove of the information this comment revealed, but I'm glad it admitted it".

Comment by bideup on Alignment Implications of LLM Successes: a Debate in One Act · 2023-10-23T15:24:47.118Z · LW · GW

LLMs calculate pdfs, regardless of whether they calculate ‘the true’ pdf.

Comment by bideup on On (Not) Reading Papers · 2023-10-22T11:26:00.819Z · LW · GW

Sometimes I think trying to keep up with the endless stream of new papers is like watching the news - you can save yourself time and become better informed by reading up on history (i.e. classic papers/textbooks) instead.

This is a comforting thought, so I’m a bit suspicious of it. But also it’s probably more true for a junior researcher not committed to a particular subfield than someone who’s already fully specialised.

Comment by bideup on Holly Elmore and Rob Miles dialogue on AI Safety Advocacy · 2023-10-21T10:35:31.503Z · LW · GW

Sometimes such feelings are your system 1 tracking real/important things that your system 2 hasn’t figured out yet.

Comment by bideup on Comparing Anthropic's Dictionary Learning to Ours · 2023-10-09T16:12:27.179Z · LW · GW

I’d like to see more posts using this format, including for theoretical research.

Comment by bideup on Thomas Kwa's MIRI research experience · 2023-10-03T22:06:27.305Z · LW · GW

I vote singular learning theory gets priority (if there was ever a situation where one needed to get priority). I intuitively feel like research agendas or communities need an acronym more than concepts. Possibly because in the former case the meaning of the phrase becomes more detached from the individual meaning of the words than it does in the latter.

Comment by bideup on EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem · 2023-09-29T10:48:25.192Z · LW · GW

Just wanted to say that I am a vegan and I’ve appreciated this series of posts.

I think the epistemic environment of my IRL circles has always been pretty good around veganism, and personally I recoil a bit from discussion of specific people or groups’ epistemic virtues or lack thereof (not sure if I think it’s unproductive or just find it aversive), so this particular post is of less interest to me personally. But I think your object-level discussion of the trade-offs of veganism has been consistently fantastic and I wanted to thank you for the contribution!

Comment by bideup on How have you become more hard-working? · 2023-09-26T12:36:50.325Z · LW · GW

Are Self Control and Freedom.to for different purposes or the same? Should I try multiple app/website blockers till I find one that's right for me, or is there an agreed upon best one that I can just adopt with no experimentation?

Comment by bideup on AI presidents discuss AI alignment agendas · 2023-09-13T11:52:18.708Z · LW · GW

Well, the joke does give a fair bit of information about both your politics and how widespread you think they are on LW. It might be very reasonable for someone to update their beliefs about LW politics based on seeing it. Then to what extent their conclusion mind-kills them is somewhat independent of the joke.

(I agree it’s a fairly trivial case, mostly discussing it out of interest in how our norms should work.)

Comment by bideup on AI presidents discuss AI alignment agendas · 2023-09-13T11:42:32.097Z · LW · GW

A Yudphemism: Politics is the Mind-Killer.

Comment by bideup on AI presidents discuss AI alignment agendas · 2023-09-12T13:57:55.630Z · LW · GW

My guess is that it's not that people are downvoting because they think you made a political statement which they oppose and they are mind-killed by it. Rather they think you made a political joke which has the potential to mind-kill others, and they would prefer you didn't.

That's why I downvoted, at least. The topic you mentioned doesn't arouse strong passions in me at all, and probably doesn't arouse strong passions in the average LW reader that much, but it does arouse strong passions in quite a large number of people, and when those people are here, I'd prefer such passions weren't aroused.

Comment by bideup on Sharing Information About Nonlinear · 2023-09-10T13:05:37.431Z · LW · GW

Even now I would like it if you added an edit at the start to make it clearer what you’re doing! Before reading the replying comment and realising the context, I was mildly shocked by such potentially inflammatory speculation and downvoted.

Comment by bideup on Who Has the Best Food? · 2023-09-05T21:50:12.934Z · LW · GW

On the other hand, even the smallest of small towns in the UK has a wide variety of ethnic food. I think pretty much anywhere with a restaurant has a Chinese and an Indian, and usually a lot more.

Comment by bideup on Rational Agents Cooperate in the Prisoner's Dilemma · 2023-09-04T07:49:09.275Z · LW · GW

Meta point: I think the forceful condescending tone is a bit inappropriate when you’re talking about a topic that you don’t necessarily know that much about.

You’ve flatly asserted that the entirety of game theory is built on an incorrect assumption, and whether or not you’re correct about that, it doesn’t seem like you’re that clued up on game theory.

Eliezer just about gets away with his tone because he knows whereof he speaks. But I would prefer it if he showed more humility, and I think if you’re writing about a topic while you’re learning the basics of it, you should definitely show more! If only because it makes it easier to change your mind as you learn more.

EDIT: I think this reads a bit more negative than I intended, so just wanted to say I did enjoy the post and appreciate your engagement in the comments!

Comment by bideup on Rational Agents Cooperate in the Prisoner's Dilemma · 2023-09-04T07:26:27.948Z · LW · GW

Being able to deduce a policy from beliefs doesn’t mean that common knowledge of beliefs is required.

The common knowledge of policy thing is true but is external to the game. We don’t assume that players in prisoner’s dilemma know each other’s policies. As part of our analysis of the structure of the game, we might imagine that in practice some sort of iterative responding-to-each-other’s-policy thing will go on, perhaps because players face off regularly (but myopically), and so the policies selected will be optimal wrt each other. But this isn’t really a part of the game, it’s just part of our analysis. And we can analyse games in various different ways e.g. by considering different equilibrium concepts.

In any case it doesn’t mean that an agent in reality in a prisoner’s dilemma has a crystal ball telling them the other’s policy.

Certainly it’s natural to consider the case where the agents are used to playing against each other so they have the chance to learn and react to each other’s policies. But a case where they each learn each other’s beliefs doesn’t feel that natural to me - might as well go full OSGT at that point.

Comment by bideup on Rational Agents Cooperate in the Prisoner's Dilemma · 2023-09-04T07:13:58.709Z · LW · GW

Right, vanilla game theory is mostly not a tool for making decisions.

It’s about studying the structure of strategic interactions, with the idea that some kind of equilibrium concept should have predictive power about what you’ll see in practice. On the one hand, if you get two humans together and tell them the rules of a matrix game, Nash equilibrium has relatively little predictive power. But there are many situations across biology, computer science, economics and more where various equilibrium concepts have plenty of predictive power.

Comment by bideup on Rational Agents Cooperate in the Prisoner's Dilemma · 2023-09-03T21:39:14.930Z · LW · GW

I think you’ve confused the psychological twin prisoner’s dilemma (from decision theory) with the ordinary prisoner’s dilemma (from game theory).

In both of them the ‘traditional’ academic position is that rational agents defect.

In PTPD (which is studied as a decision problem, i.e. it’s essentially single player - the only degree of freedom is your policy) defection is hotly contested both inside and outside of academia, and on LW the consensus is that you should co-operate.

But for ordinary prisoner’s dilemma (which is studied as a two player game - two degrees of freedom in policy space) I’m not sure anybody advocates a blanket policy of co-operating, even on LW. Certainly there’s an idea that two sufficiently clever agents might be able to work something out if they know a bit about each other, but the details aren’t as clearly worked out, and arguably a prisoner’s dilemma in which you use knowledge of your opponent to make your decision is better modelled as something other than a prisoner’s dilemma.

I think it’s a mistake to confuse LW’s emphatic rejection of CDT with an emphatic rejection of the standard game theoretic analysis of PD.

Comment by bideup on Rational Agents Cooperate in the Prisoner's Dilemma · 2023-09-03T21:16:58.190Z · LW · GW

Yep, a game of complete information is just one in which the structure of the game is known to all players. When wikipedia says

The utility functions (including risk aversion), payoffs, strategies and "types" of players are thus common knowledge.

it’s an unfortunately ambiguous phrasing but it means

The specific utility function each player has, the specific payoffs each player would get from each possible outcome, the set of possible strategies available to each player, and the set of possible types each player can have (e.g. the set of hands they might be dealt in cards) are common knowledge.

It certainly does not mean that the actual strategies or source code of all players are known to each other player.

Comment by bideup on Ten variations on red-pill-blue-pill · 2023-08-20T22:17:24.928Z · LW · GW

I disagree. From the altruistic perspective these puzzles are fully co-operative co-ordination games with two equally good types of Nash equilibria (everyone chooses red, or at least half choose blue), where the strategy you should play depends on which equilibrium you decide to aim for. Players have to try to co-ordinate on choosing the same one, so it's just a classic case of Schelling point selection, and the framing will affect what the Schelling point is (assuming everyone gets told the same framing).

(What's really fun is that we now have two different framings to the meta-problem of "When different framings give different intuitions, should you let the framing influence your decision?" and they give different intuitions.)
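To make the coordination structure concrete, here's a minimal sketch, assuming the survival rule is that blue-choosers survive iff at least half the group chose blue and red-choosers always survive (that's my reading of the original puzzle, so treat the details as an assumption):

```python
# Minimal sketch of the altruistic payoff structure. The survival rule below
# (blue-choosers live iff at least half the group chose blue; red-choosers
# always live) is my reading of the puzzle, not a canonical statement.

def deaths(n_blue: int, n_total: int) -> int:
    """Number of people who die when n_blue out of n_total choose blue."""
    if 2 * n_blue >= n_total:   # at least half chose blue: everyone survives
        return 0
    return n_blue               # otherwise only the blue-choosers die

n = 100
outcomes = {k: deaths(k, n) for k in range(n + 1)}

# Both "everyone red" (k = 0) and "at least half blue" (k >= 50) give zero
# deaths, so an altruist only cares that the group coordinates on one of
# those two regions - a Schelling point selection problem.
assert outcomes[0] == 0
assert all(outcomes[k] == 0 for k in range(50, n + 1))
assert all(outcomes[k] == k for k in range(1, 50))
print([k for k, d in outcomes.items() if d == 0][:5], "...")
```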

Comment by bideup on Ten variations on red-pill-blue-pill · 2023-08-20T21:59:07.895Z · LW · GW

Trying to distill the essence of your first one:

A mad scientist kidnaps everyone and takes a secret ballot, where you can either vote for 'RELEASE EVERYONE' or 'KILL SOME PEOPLE'. He accepts the majority decision, and if it's the latter, he kills everyone who voted for the former.

Comment by bideup on A rough and incomplete review of some of John Wentworth's research · 2023-03-29T22:21:43.071Z · LW · GW

Not fixed!

For example, if an alien tries to sell a basket "Alice loses $1, Bob gains $3", then the market will refuse (because Alice will refuse); and if the alien then switches to selling "Bob gains $3, Alice loses $1"

Comment by bideup on continue working on hard alignment! don't give up! · 2023-03-24T14:52:37.001Z · LW · GW

I've seen you comment several times about the link between Pretraining from Human Feedback and embedded agency, but despite being quite familiar with the embedded agency sequence I'm not getting your point.

I think my main confusion is that to me "the problem of embedded agency" means "the fact that our models of agency are non-embedded, but real world agents are embedded, and so our models don't really correspond to reality", whereas you seem to use "the problem of embedded agency" to mean a specific reason why we might expect misalignment.

Could you say (i) what the problem of embedded agency means to you, and in particular what it has to do with AI risk, and (ii) in what sense PTHF avoids it?

Comment by bideup on The Telephone Theorem: Information At A Distance Is Mediated By Deterministic Constraints · 2022-09-07T15:36:35.119Z · LW · GW

Thanks, that’s very clear. I’m a convert to the edge-based definition.

Comment by bideup on The Telephone Theorem: Information At A Distance Is Mediated By Deterministic Constraints · 2022-09-07T11:11:35.965Z · LW · GW

I'm trying to understand how to map between the definition of Markov blanket used in this post (a partition of the variables into two sets such that the variables in one set are independent of the variables in the other given the variables on edges which cross the partition) and the one I'm used to (a Markov blanket of a set of variables is another set of variables such that the first set is independent of everything else given the second). I'd be grateful if anyone can tell me whether I've understood it correctly.

There are three obstacles to my understanding: (i) I'm not sure what 'variables on edges' means, and John also uses the phrase 'given the values of the edges' which confuses me, (ii) the usual definition is with respect to some set of variables, but the one in this post isn't, (iii) when I convert between the definitions, the place I have to draw the line on the graph changes, which makes me suspicious.

Here's me attempting to overcome the obstacles:

(i) I'm assuming 'variables on the edges' means the parents of the edges, not the children or both. I'm assuming 'values of edges' means the values of the parents.
(ii) I think we can reconcile this by saying that if M is a Markov blanket of a set of variables V in the usual sense, then a line which cuts through an outgoing edge of each variable in M is a Markov blanket in the sense of this post. Conversely, if some Markov blanket in the sense of this post partitions our graph into A and B, then the set M of parents of edges crossing the partition forms a Markov blanket of both A\M and B\M in the usual sense.
(iii) I think I have to suck it up and accept that the lines look different. In this picture, the nodes in the blue region (except A) form a Markov blanket for A in the usual sense. The red line is a Markov blanket in the sense of this post.

Does this seem right?
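To check my reading of (ii) on the smallest possible example, here's a numeric sketch (the chain and the CPT numbers are made up by me): take A → B → C. Cutting the single edge B → C partitions the graph into {A, B} and {C}; the parent of the cut edge is B, and {B} is also the usual-sense Markov blanket of {A} (and of {C}), so both definitions demand that A is independent of C given B.

```python
import numpy as np

# Chain A -> B -> C with binary variables and made-up CPTs.
p_a = np.array([0.3, 0.7])                    # P(A)
p_b_given_a = np.array([[0.8, 0.2],           # P(B | A=0)
                        [0.1, 0.9]])          # P(B | A=1)
p_c_given_b = np.array([[0.6, 0.4],           # P(C | B=0)
                        [0.25, 0.75]])        # P(C | B=1)

# Joint distribution P(A, B, C), indexed as joint[a, b, c].
joint = p_a[:, None, None] * p_b_given_a[:, :, None] * p_c_given_b[None, :, :]

# Usual definition: {B} is a Markov blanket of {A}, i.e. A is independent of C given B.
# Edge-cut definition: the partition {A, B} | {C} cuts only the edge B -> C, whose
# parent is B, so conditioning on B should make the two sides independent.
for b in range(2):
    p_ac = joint[:, b, :] / joint[:, b, :].sum()   # P(A, C | B=b)
    p_a_cond = p_ac.sum(axis=1)                    # P(A | B=b)
    p_c_cond = p_ac.sum(axis=0)                    # P(C | B=b)
    assert np.allclose(p_ac, np.outer(p_a_cond, p_c_cond))

print("A is independent of C given B, as both definitions require.")
```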

Comment by bideup on Humans provide an untapped wealth of evidence about alignment · 2022-08-04T13:52:27.611Z · LW · GW

Do my values bind to objects in reality, like dogs, or do they bind to my mental representations of those objects at the current timestep?

You might say: You value the dog's happiness over your mental representation of it, since if I gave you a button which made the dog sad, but made you believe the dog was happy, and another button which made the dog happy, but made you believe the dog was sad, you'd press the second button.

I say in response: You've shown that I value my current timestep estimation of the dog's future happiness over my current timestep estimation of my future estimation of the dog's happiness. 

I think we can say that whenever I make any decision, I'm optimising my mental representation of the world after the decision has been made but before it has come into effect.

Maybe this is the same as saying my values bind to objects in reality, or maybe it's different. I'm not sure.

Comment by bideup on Reward is not the optimization target · 2022-08-04T12:48:48.151Z · LW · GW

Right. So if selection acts on policies, each policy should aim to maximise reward in any episode in order to maximise its frequency in the population. But if selection acts on particular aspects of policies, a policy should try to get reward for doing things it values, and not for things it doesn't, in order to reinforce those values. In particular this can mean getting less reward overall.

Does this suggest a class of hare-brained alignment schemes where you train with a combination of inter-policy and intra-policy updates to take advantage of the difference?

For example you could clearly label which episodes are to be used for which and observe whether a policy consistently gets more reward in the former case than the latter. If it does, conclude it's sophisticated enough to reason about its training setup.

Or you could not label which is which, and randomly switch between the two, forcing your agents to split the difference and thus be about half as successful at locking in their values.
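For the labelled variant, the kind of comparison I have in mind is roughly the sketch below. The function name, the reward arrays and the permutation test are all mine and purely illustrative, not a worked-out protocol:

```python
import numpy as np

def reward_gap_check(rewards_inter, rewards_intra, n_perm=10_000, seed=0):
    """Crude check: does the policy reliably earn more reward in episodes labelled
    as driving inter-policy selection than in episodes labelled as driving
    within-policy (gradient) updates? A large, consistent gap would be weak
    evidence that it is reasoning about its training setup."""
    rng = np.random.default_rng(seed)
    inter = np.asarray(rewards_inter, dtype=float)
    intra = np.asarray(rewards_intra, dtype=float)
    observed_gap = inter.mean() - intra.mean()

    # Permutation test under the null hypothesis that the label makes no difference.
    pooled = np.concatenate([inter, intra])
    n = len(inter)
    gaps = np.empty(n_perm)
    for i in range(n_perm):
        rng.shuffle(pooled)
        gaps[i] = pooled[:n].mean() - pooled[n:].mean()
    p_value = float((gaps >= observed_gap).mean())
    return observed_gap, p_value
```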