Why I think nuclear war triggered by Russian tactical nukes in Ukraine is unlikely 2022-10-11T18:30:08.110Z
Playing with DALL·E 2 2022-04-07T18:49:16.301Z
parenting rules 2020-12-21T19:48:42.365Z


Comment by Dave Orr (dave-orr) on A Back-Of-The-Envelope Calculation On How Unlikely The Circumstantial Evidence Around Covid-19 Is · 2024-02-08T03:37:46.797Z · LW · GW

One big one is that the first big spreading event happened at a wet market where people and animals are in close proximity. You could check densely populated places within some proximity of the lab to figure out how surprising it is that it happened at a wet market, but certainly animal spillover is much more likely where there are animals.

Edit: also it's honestly kind of a bad sign that you aren't aware of evidence that tends against your favored explanation, since that mostly happens during motivated reasoning.

Comment by Dave Orr (dave-orr) on Lack of Spider-Man is evidence against the simulation hypothesis · 2024-01-06T20:50:53.597Z · LW · GW

We're here to test the so-called tower of babel theory. What if, due to some bizarre happenstance, humanity had thousands of languages that change all the time instead of a single universal language like all known intelligent species?

Comment by Dave Orr (dave-orr) on Which battles should a young person pick? · 2023-12-30T05:10:30.138Z · LW · GW

You should ignore the EY style "no future" takes when thinking about your future. This is because if the world is about to end, nothing you do will matter much. But if the world isn't about to end, what you do might matter quite a bit -- so you should focus on the latter.

One quick question to ask yourself is: are you more likely to have an impact on technology, or on policy? Either one is useful. (If neither seems great, then consider earning to give, or just find a way to add value in society in other ways.)

Once you figure that out, the next step is almost certainly building relevant skills, knowledge, and networks. Connect with senior folks with relevant roles, ask and otherwise try to figure out what skills and such are useful, try to get some experience by working or volunteering with great people or organizations.

Do that for a while and I bet some gaps and opportunities will become pretty clear. 😀

Comment by Dave Orr (dave-orr) on Would you have a baby in 2024? · 2023-12-25T20:46:36.207Z · LW · GW

I agree that it's bad to raise a child in an environment of extreme anxiety. Don't do that.

Also try to avoid being very doomy and anxious in general, it's not a healthy state to be in. (Easier said than done, I realize.)

Comment by Dave Orr (dave-orr) on Would you have a baby in 2024? · 2023-12-25T02:46:52.324Z · LW · GW

I think you should have a kid if you would have wanted one without recent AI progress. Timelines are still very uncertain, and strong AGI could still be decades away. Parenthood is strongly value creating and extremely rewarding (if hard at times) and that's true in many many worlds.

In fact it's hard to find probable worlds where having kids is a really bad idea, IMO. If we solve alignment and end up in AI utopia, having kids is great! If we don't solve alignment and EY is right about what happens in a fast takeoff world, it doesn't really matter if you have kids or not.

In that sense, it's basically a freeroll, though of course there are intermediate outcomes. I don't immediately see any strong argument in favor of not having kids if you would otherwise want them.

Comment by Dave Orr (dave-orr) on Where can I learn about algorithmic transformation of AI prompts? · 2023-11-21T01:41:36.614Z · LW · GW

The thing you're missing is called instruction tuning. You gather a series of prompt/response pairs and fine tune the model over that data. Do it right and you have a chatty model.
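A minimal sketch of that data prep in Python; the chat-template tokens here are invented placeholders for illustration, not any particular model's actual format:

```python
# Instruction tuning, sketched: turn prompt/response pairs into training
# strings that a standard supervised fine-tuning loop can consume.
# The template tokens below are made-up placeholders, not a real format.

def format_example(prompt: str, response: str) -> str:
    """Render one prompt/response pair as a single training string."""
    return f"<|user|>\n{prompt}\n<|assistant|>\n{response}"

pairs = [
    ("What is the capital of France?", "The capital of France is Paris."),
    ("Summarize: the cat sat on the mat.", "A cat sat on a mat."),
]

dataset = [format_example(p, r) for p, r in pairs]
# Each string would then be tokenized and used to fine-tune the base model,
# typically with the loss masked so only the response tokens are trained on.
```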

Comment by Dave Orr (dave-orr) on Monthly Roundup #12: November 2023 · 2023-11-14T17:08:34.371Z · LW · GW

Thanks, Zvi, these roundups are always interesting.

I have one small suggestion, which is that you limit yourself to one Patrick link per post. He's an interesting guy but his area is quite niche, and if people want his fun stories about banking systems they can just follow him. I suspect that people who care about those things already follow him, and people who don't aren't that interested to read four items from him here.

Comment by Dave Orr (dave-orr) on Responsible Scaling Policies Are Risk Management Done Wrong · 2023-10-26T05:12:34.945Z · LW · GW

I feel like a lot of the issues in this post are that the published RSPs are not very detailed and most of the work to flesh them out is not done. E.g. the comparison to other risk policies highlights lack of detail in various ways.

I think it takes a lot of time and work to build out something with lots of analysis and detail, potentially years of work to really do it right. And yes, much of that work hasn't happened yet.

But I would rather see labs post the work they are doing as they do it, so people can give feedback and input. If labs do so, the frameworks will necessarily be much less detailed than they would if we waited until they were complete.

So it seems to me that we are in a messy process that's still very early days. Feedback about what is missing and what a good final product would look like is super valuable, thank you for your work doing that. I hope the policy folks pay close attention.

But I think your view that RSPs are the wrong direction is misguided, or at least I don't find your reasons to be persuasive -- there's much more work to be done before they're good and useful, but that doesn't mean they're not valuable. Honestly I can't think of anything much better that could have been reasonably done given the limited time and resources we all have.

I think your comments on the name are well taken. I think your ideas about disclaimers and such are basically impossible for a modern corporation, unfortunately. I think your suggestion about pushing for risk management in policy are the clear next step, that's only enabled by the existence of an RSP in the first place.

Thanks for the detailed and thoughtful effortpost about RSPs!

Comment by Dave Orr (dave-orr) on Thoughts on responsible scaling policies and regulation · 2023-10-25T01:46:57.964Z · LW · GW

I agree with all of this. It's what I meant by "it's up to all of us."

It will be a signal of how things are going if in a year we still have only vague policies, or if there has been real progress in operationalizing the safety levels, detection, what the right reactions are, etc.

Comment by Dave Orr (dave-orr) on Thoughts on responsible scaling policies and regulation · 2023-10-24T23:26:23.280Z · LW · GW

I think there are two paths, roughly, that RSPs could send us down. 

  1. RSPs are a good starting point. Over time we make them more concrete, build out the technical infrastructure to measure risk, and enshrine them in regulation or binding agreements between AI companies. They reduce risk substantially, and provide a mechanism whereby we can institute a global pause if necessary, which seems otherwise infeasible right now.
  2. RSPs are a type of safety-washing. They provide the illusion of a plan, but as written they are so vague as to be meaningless. They let companies claim they take safety seriously but don't meaningfully reduce risk, and in fact may increase it by letting companies skate by without doing real work, rather than forcing companies to act responsibly by just not developing a dangerous uncontrollable technology.

If you think that Anthropic and other labs that adopt these are fundamentally well meaning and trying to do the right thing, you'll assume that we are by default heading down path #1.  If you are more cynical about how companies are acting, then #2 may seem more plausible.

My feeling is that Anthropic et al are clearly trying to do the right thing, and that it's on us to do the work to ensure that we stay on the good path here, by working to deliver the concrete pieces we need, and to keep the pressure on AI labs to take these ideas seriously.  And to ask regulators to also take concrete steps to make RSPs have teeth and enforce the right outcomes. 

But I also suspect that people on the more cynical side aren't going to be persuaded by a post like this. If you think that companies are pretending to care about safety but really are just racing to make $$, there's probably not much to say at this point other than, let's see what happens next.

Comment by Dave Orr (dave-orr) on AI #34: Chipping Away at Chip Exports · 2023-10-19T17:18:24.687Z · LW · GW

New York City Mayor Eric Adams has been using ElevenLabs AI to create recordings of him in languages he does not speak and using them for robocalls. This seems pretty not great.


Can you say more about why you think this is problematic? Recording his own voice for a robocall is totally fine, so the claim here is that AI involvement makes it bad? 

Yes he should disclose somewhere that he's doing this, but deepfakes with the happy participation of the person whose voice is being faked seems like the best possible scenario.

Comment by Dave Orr (dave-orr) on How should TurnTrout handle his DeepMind equity situation? · 2023-10-18T04:35:47.373Z · LW · GW

FWIW as an executive working on safety at Google, I basically never consider my normal working activities in light of what they would do to Google's stock price.

The exception is around public communication. There I'm very careful because it's asymmetrical: I could potentially cause a PR disaster that would affect the stock, but I don't see how I could give a talk so good that it helps it.

Maybe a plug pulling situation would be different, but I also think it's basically impossible for it to be a unilateral situation, and if we're in such a moment, I hardly think any damage would be contained to Google's stock price, versus say the market as a whole.

Comment by Dave Orr (dave-orr) on How should TurnTrout handle his DeepMind equity situation? · 2023-10-16T19:05:46.426Z · LW · GW

How much do you think that your decisions affect Google's stock price? Yes maybe more AI means a higher price, but on the margin how much will you be pushing that relative to a replacement AI person? And mostly the stock price fluctuates on stuff like how well the ads business is doing, macro factors, and I guess occasionally whether we gave a bad demo.  

It feels to me like the incentive is just so diffuse that I wouldn't worry about it much.

Your idea of just donating extra gains also seems fine.

Comment by Dave Orr (dave-orr) on How should TurnTrout handle his DeepMind equity situation? · 2023-10-16T19:01:47.999Z · LW · GW

That's not correct, or at least not how my Google stock grants work. The number of shares is locked in at grant time, not vest time. In practice what that means is that you get x shares every month, which count as income when multiplied by the stock price at vest.

And then you can sell them or whatever, including having a policy that automatically sells them as soon as they vest.
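The mechanics described above can be sketched with invented numbers (an illustration of that vesting scheme, not financial advice):

```python
# RSU vesting as described above: the share count is fixed up front, and
# each monthly vest counts as income at that month's stock price.
# All numbers are invented for illustration.

def vest_income(shares_per_month: int, monthly_prices: list) -> float:
    """Total income recognized across a series of monthly vests."""
    return sum(shares_per_month * price for price in monthly_prices)

# 10 shares/month over three months with a fluctuating price:
income = vest_income(10, [100.0, 110.0, 95.0])  # 1000 + 1100 + 950 = 3050.0
```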

Comment by Dave Orr (dave-orr) on AI Alignment [Incremental Progress Units] this week (10/08/23) · 2023-10-16T04:44:21.807Z · LW · GW

The star ratings are an improvement, I had felt also that breakthrough was overselling many of the items last week.

However, stars are very generic and don't capture the concept of a breakthrough very well. You could consider a lightbulb.

I also asked ChatGPT to create an emoji of an AI breakthrough, and after some iteration it came up with this:

Use it if you like it!

Thanks for putting together this roundup, I learn things from it every time.

Comment by Dave Orr (dave-orr) on Which Anaesthetic To Choose? · 2023-10-14T16:40:05.648Z · LW · GW

I agree with this.

Consider a hypothetical: there are two drugs we could use to execute prisoners convicted under the death penalty. One of them causes excruciating pain; the other does not, but costs more.

Would we feel fine using the torture drug? After all, the dude is dead, so he doesn't care either way.

I have a pretty strong intuition that those drugs are not similar. Same thing with the anesthesia example.

Comment by Dave Orr (dave-orr) on AI #33: Cool New Interpretability Paper · 2023-10-12T17:22:14.972Z · LW · GW

HT Michael Thiessen, who expects this to result in people figuring out how to extract the (distilled) model weights. Is that inevitable?


Not speaking for Google here.

I think it's inevitable, or at least it's impossible to stop someone willing to put in the effort. The weights are going to be loaded into the phone's memory, and a jailbroken phone should let you have access to the raw memory.

But it's a lot of effort and I'm not sure what the benefit would be to anyone. My guess is that if this happens it will be by a security researcher or some enterprising grad student, not by anyone actually motivated to use the weights for anything in particular.

Comment by Dave Orr (dave-orr) on This anime storyboard doesn't exist: a graphic novel written and illustrated by GPT4 · 2023-10-05T19:09:29.506Z · LW · GW

I could see the illustrations via RSS, but don't see them here, chrome on mobile.

Comment by Dave Orr (dave-orr) on Using Reinforcement Learning to try to control the heating of a building (district heating) · 2023-10-05T04:12:11.789Z · LW · GW

I assume you've seen these, but if not, there are some relevant papers here:

Comment by Dave Orr (dave-orr) on Monthly Roundup #11: October 2023 · 2023-10-03T17:41:06.921Z · LW · GW

The main place we differ is that we are on opposite sides of the ‘will Tether de-peg?’ market. No matter what they did in the past, I now see a 5% safe return as creating such a good business that no one will doubt ability to pay. Sometimes they really do get away with it, ya know?

This seems sensible, but I remember thinking something very similar about Full Tilt, and then they turned out to be doing a bunch of shady shit that was very not in their best interest. I think there's a significant chance that fraudsters gonna fraud even when they really shouldn't, and Tether in particular has such a ridiculous background that it just seems very possible that they will take unnecessary risks, lend money when they shouldn't, etc, just because people do what they've been doing all too often.

Comment by Dave Orr (dave-orr) on Monthly Roundup #11: October 2023 · 2023-10-03T17:38:17.224Z · LW · GW
Comment by Dave Orr (dave-orr) on Monthly Roundup #11: October 2023 · 2023-10-03T16:19:20.144Z · LW · GW

Pradyumna: You a reasonable person: the city should encourage carpooling to reduce congestion

Bengaluru’s Transport Department (a very stable genius): Taxi drivers complained and so we will ban carpooling


It's not really that Bangalore banned carpooling, they required licenses for ridesharing apps. Maybe that's a de facto ban of those apps, but that's a far cry from banning carpooling in general.


Comment by Dave Orr (dave-orr) on Paper: LLMs trained on “A is B” fail to learn “B is A” · 2023-09-24T16:23:24.731Z · LW · GW

Partly this will be because in fact current ML systems are not analogous to future AGI in some ways - probably if you tell the AGI that A is B, it will also know that B is A.

One oddity of LLMs is that we don't have a good way to tell the model that A is B in a way that it can remember. Prompts are not persistent, and as this paper shows, fine tuning doesn't do a good job of getting a fact into the model without doing a bunch of paraphrasing. Pretraining presumably works in a similar way.

This is weird! And I think helps make sense of some of the problems we see with current language models.

Comment by Dave Orr (dave-orr) on 45% to 55% vs. 90% to 100% · 2023-08-28T23:35:43.797Z · LW · GW

45->55% is a 22% relative gain, while 90->100% is only an 11% gain. 

On the other hand, 45->55% is a reduction in error by 18%, while 90->100% is a 100% reduction in errors.

Which framing is best depends on the use case. Preferring one naively over the other is definitely an error. :)
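The two framings above, computed directly:

```python
# Two ways to frame the same accuracy improvement.

def relative_gain(old: float, new: float) -> float:
    """Gain relative to the old accuracy."""
    return (new - old) / old

def error_reduction(old: float, new: float) -> float:
    """Fraction of the old error rate that was eliminated."""
    return ((1 - old) - (1 - new)) / (1 - old)

# 45% -> 55%: ~22% relative gain, but only ~18% of errors removed.
# 90% -> 100%: ~11% relative gain, but 100% of errors removed.
```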

Comment by Dave Orr (dave-orr) on The Game of Dominance · 2023-08-27T15:05:40.429Z · LW · GW

I think the argument against LeCun is simple: while it may be true that AIs won't necessarily have a dominance instinct the way that people do, they could try to dominate for other reasons: namely that such dominance is an instrumental goal towards whatever its objective is. And in fact that is a significant risk, and can't be discounted by pointing out that they may not have a natural instinct towards dominance.

Comment by Dave Orr (dave-orr) on A Model-based Approach to AI Existential Risk · 2023-08-26T18:17:59.169Z · LW · GW

I just think that to an economist, models and survey results are different things, and he's not asking for the latter.

Comment by Dave Orr (dave-orr) on A Model-based Approach to AI Existential Risk · 2023-08-25T16:16:35.694Z · LW · GW

I think that Tyler is thinking more of an economic type model that looks at the incentives of various actors and uses that to understand what might go wrong and why. I predict that he would look at this model and say, "misaligned AI can cause catastrophes" is the hand-wavy bit that he would like to see an actual model of.

I'm not an economist (is IANAE a known initialism yet?), but such a model would probably include actors like the AI labs, the AIs, and potentially regulators or hackers/thieves; it would try to understand and model their incentives and behaviors, and see what comes out of that. It's less about subjective probabilities from experts and more about trying to understand the forces acting on the players and how they respond to them.

Comment by Dave Orr (dave-orr) on Guide to rationalist interior decorating · 2023-06-19T20:40:29.845Z · LW · GW

So... when can we get the optimal guide, if this isn't it? :)

Comment by Dave Orr (dave-orr) on Are computationally complex algorithms expensive to have, expensive to operate, or both? · 2023-06-02T18:16:48.140Z · LW · GW

In general to solve an NP complete problem like 3-SAT, you have to spend compute or storage to solve it. 

Suppose you solve one 3-SAT problem. If you don't write down the solution and steps along the way, then you have no way to reuse that work for the next problem. But if you do store the results of the intermediate steps, you pay a correspondingly large storage cost instead.

In practice often you can do much better than that because the problems you're solving may share certain data or characteristics that lead to shortcuts, but in the general case you have to pay the cost every time you need to solve an NP complete problem.
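A brute-force sketch of the tradeoff: each problem costs exponential time in the worst case, and nothing carries over between problems unless you pay to store it.

```python
from itertools import product

# Brute-force 3-SAT. A clause is a tuple of literals: k > 0 means
# "variable k is true", k < 0 means "variable k is false".

def solve_3sat(clauses, n_vars):
    """Return a satisfying assignment as a dict, or None. O(2^n) time."""
    for bits in product([False, True], repeat=n_vars):
        assignment = {i + 1: bits[i] for i in range(n_vars)}
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment
    return None

# (x1 or x2 or not x3) and (not x1 or x3 or x2) is satisfiable:
solution = solve_3sat([(1, 2, -3), (-1, 3, 2)], 3)
```

Memoizing intermediate results to speed up future problems would mean storing tables that grow as fast as the search itself.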

Comment by Dave Orr (dave-orr) on The Benevolent Billionaire (a plagiarized problem) · 2023-05-18T19:33:00.480Z · LW · GW

If one person estimates the odds at a billion to one, and the other at even, you should clearly bet the middle. You can easily construct bets that offer each of them a very good deal by their lights and guarantee you a win. This won't maximize your EV but seems pretty great if you agree with Nick.
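One way to sketch the arbitrage, with invented prices: buy an "event" contract cheaply from the billion-to-one person and a "no event" contract at a premium from the even-odds person. Both think they got a good deal by their own lights, and you lock in a profit either way.

```python
# "Betting the middle", sketched with invented prices. A thinks the event
# has probability ~1e-9; B thinks it's 50/50. You buy a $1 "event" contract
# cheaply from A and a $1 "no event" contract at a premium from B. Each
# seller has positive expected value by their own probability, but your
# combined position pays $1 no matter what happens.

p_a = 1e-9            # A's probability that the event happens
p_b = 0.5             # B's probability that the event happens

price_event = 0.0001      # you pay A for "$1 if event"
price_not_event = 0.6     # you pay B for "$1 if not event"

a_ev = price_event - p_a * 1.0          # A's expected profit: positive
b_ev = price_not_event - (1 - p_b)      # B's expected profit: positive
profit = 1.0 - (price_event + price_not_event)  # your guaranteed profit
```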

Comment by Dave Orr (dave-orr) on How much do markets value Open AI? · 2023-05-15T02:48:05.206Z · LW · GW

Anthropic reportedly got a $4B valuation on negligible revenue. Cohere is reportedly asking for a $6B valuation on maybe a few $M in revenue.

AI startups are getting pretty absurd valuations based on I'm not sure what, but I don't think it's ARR.

Comment by Dave Orr (dave-orr) on How much do markets value Open AI? · 2023-05-14T19:44:44.673Z · LW · GW

I'm not sure multiple of revenue is meaningful right now. Nobody is investing in OAI because of their current business. Also there are tons of investments at infinite multiples once you realize that many companies get investments with no revenue.

Comment by Dave Orr (dave-orr) on Have you heard about MIT's "liquid neural networks"? What do you think about them? · 2023-05-10T20:49:13.125Z · LW · GW

I mean, computers aren't technically continuous and neither are neural networks, but if your time step is small enough they are continuous-ish. It's interesting that that's enough.

I agree music would be a good application for this approach.

Comment by Dave Orr (dave-orr) on Have you heard about MIT's "liquid neural networks"? What do you think about them? · 2023-05-09T23:10:39.181Z · LW · GW

I think this is real, in the sense that they got the results they are reporting and this is a meaningful advance. Too early to say if this will scale to real world problems but it seems super promising, and I would hope and expect that Waymo and competitors are seriously investigating this, or will be soon. 

Having said that, it's totally unclear how you might apply this to LLMs, the AI du jour. One of the main innovations in liquid networks is that they are continuous rather than discrete, which is good for very high-bandwidth tasks like vision. Our eyes are technically discrete in that retinal cells fire discretely, but I think the best interpretation of them at scale is much more like a continuous system. Similar for hearing, the AI analog being speech recognition.

But language is not really like that. Words are mostly discrete -- mostly you want to process things at the token level (~= words) or sometimes wordpieces or even letters, but it's not that sensible to think of text as being continuous. So it's not obvious how to apply liquid NNs to text understanding/generation.

Research opportunity!

But it'll be a while, if ever, before continuous networks work for language.

Comment by Dave Orr (dave-orr) on Clarifying and predicting AGI · 2023-05-04T17:17:08.093Z · LW · GW

Usually "any" means each person in the specific class individually. So perhaps not groups of people working together, but a much higher bar than a randomly sampled person.

But note that Richard doesn't think that "the specific 'expert' threshold will make much difference", so probably the exact definition of "any" doesn't matter very much for his thoughts here.

Comment by Dave Orr (dave-orr) on How much do personal biases in risk assessment affect assessment of AI risks? · 2023-05-04T02:25:45.137Z · LW · GW

Similar risk to Christiano, which might be medium by less wrong standards but is extremely high compared to the general public.

High risk tolerance (used to play poker for a living, comfortable with somewhat risky sports like climbing or scuba diving). Very low neuroticism, medium conscientiousness. I spend a reasonable amount of time putting probabilities on things, decently calibrated. Very calm in emergency situations.

I'm a product manager exec mostly working on applications of language AI. Previously an ml research engineer.

Comment by Dave Orr (dave-orr) on Is the fact that we don't observe any obvious glitch evidence that we're not in a simulation? · 2023-04-26T17:48:43.898Z · LW · GW

I don't actually follow -- how does change blindness in people relate to how much stuff you have to design?

Comment by Dave Orr (dave-orr) on Is the fact that we don't observe any obvious glitch evidence that we're not in a simulation? · 2023-04-26T16:08:25.130Z · LW · GW

Suppose you were running a simulation, and it had some problems around object permanence, or colors not being quite constant (colors are surprisingly complicated to calculate since some of them depend on quantum effects), or other weird problems. What might you do to help that? 

One answer might be to make the intelligences you are simulating ignore the types of errors that your system makes. And it turns out that we are blind to many changes around us!

Or conversely, if you are simulating an intelligence that happens to have change blindness, then you worry a lot less about fidelity in the areas that people mostly miss or ignore anyway.

The point is this: reality seems flawless because your brain assumes it is, and ignores cases where it isn't. Even when the changes are large, like a completely different person taking over halfway through a conversation, or numerous continuity errors in movies that almost all bounce right off of us. So I don't think that you can take amazing glitch free continuity as evidence that we're not in a simulation, since we may not see the bugs.

Comment by Dave Orr (dave-orr) on LW moderation: my current thoughts and questions, 2023-04-12 · 2023-04-20T22:18:35.387Z · LW · GW

One thing that I think is missing (maybe just beyond the scope of this post) is thinking about newcomers with a positive frame: how do we help them get up to speed, be welcomed, and become useful contributors?

You could imagine periodic open posts, for instance, where we invite 101-style questions, post your objection to AI risks, etc where more experienced folks could answer those kind of things without cluttering up the main site. Possibly multiple more specific such threads if there's enough interest.

Then you can tell people who try to post level 1-3 stuff that they should go to those threads instead, and help make sure they get attention.

I'm sure there are other ideas as well -- the main point is that we should think of both positive as well as negative actions to take in response to an influx of newbies.

Comment by Dave Orr (dave-orr) on AI Risk US Presidential Candidate · 2023-04-11T20:54:08.728Z · LW · GW

Let me suggest a different direction.

The risk is that a niche candidate will make the idea too associated with them, which will let everyone else off the hook -- it's easy to dismiss a weirdo talking about weird stuff.

A better direction might be to find a second tier candidate that wants to differentiate themselves, and help them with good snappy talking points that sound good in a debate. I think that's both higher impact and has a much smaller chance of pushing things in the wrong direction accidentally.

Comment by Dave Orr (dave-orr) on New survey: 46% of Americans are concerned about extinction from AI; 69% support a six-month pause in AI development · 2023-04-05T01:42:00.414Z · LW · GW

YouGov is a solid but not outstanding Internet pollster.

Still have to worry about selection bias with Internet polls, but I don't think you need to worry that they have a particular axe to grind here.

Comment by dave-orr on [deleted post] 2023-03-30T03:13:16.998Z

This seems like an argument that proves too much. Many times, people promising simple solutions to complex problems are scammers or just wrong. But we also have lots of cases where someone has an insight that cuts to the core of a problem, yielding solutions that are much better and more scalable than what came before.

Maybe the author is on to something, but I think the idea needs to go one level deeper: what distinguishes real innovation from "solutionism"?

Also, his argument about why making work more efficient doesn't have any upside is so bafflingly wrongheaded that I highly doubt there are genuine insights to mine here.

Comment by Dave Orr (dave-orr) on Why consumerism is good actually · 2023-03-24T18:17:56.644Z · LW · GW

Here's one argument:

Consumption is great when you get something in return that improves your life in some way. Convenience, saving time, and things that you use are all great.

However, there's a ton of consumption in terms of buying things that don't add utility, at least not at a reasonable return. People buy exercise bikes that they don't use, books that they don't read, panini presses that just sit on the counter, and lives become more cluttered and less enjoyable.

One reason for this is the hedonic treadmill, that our happiness reverts to a mean over time, so pleasure from an item doesn't last. Another is that people envision the good outcomes for buying something -- I'll use that gym membership 3 times a week! -- but are bad at estimating the range of outcomes and so overestimate what they get for many purchases.

It turns out for many purchases (though probably a minority of them), you would be better off in terms of happiness if you bought nothing instead. High happiness ROI spending seems to be events rather than items, giving gifts, meaningful charity, and saving yourself time.

New cars, trendy clothes, the latest gadgets, and other hallmarks of modern consumerism, have a low return on spending, and pushing back against that may help people overall. 

An analogy: food is delicious and necessary, but certain common patterns in how people eat are bad, even by the poor eater's own values and preferences. That seems bad in a similar way, and opposing trends that increase such bad patterns seems sensible.

Comment by Dave Orr (dave-orr) on Harry Potter in The World of Path Semantics · 2023-03-22T20:38:30.053Z · LW · GW

Next time I would actually include the definition of a technical term like Leibniz's first principle to make this post a little less opaque, and therefore more interesting, to non experts.

Comment by Dave Orr (dave-orr) on AI #4: Introducing GPT-4 · 2023-03-21T23:17:30.698Z · LW · GW

This. If they had meant 19% fewer hallucinations, they would have said a 19% reduction in whatever metric, which is a common way to talk about relative improvements in ML.

Comment by Dave Orr (dave-orr) on A concerning observation from media coverage of AI industry dynamics · 2023-03-06T21:57:16.973Z · LW · GW

For sure product risk aversion leads towards people moving to where they can have some impact, for people who don't want pure research roles. I think this is basically fine -- I don't think that product risk is all that concerning at least for now.

Misalignment risk would be a different story but I'm not aware of cases where people moved because of it. (I might not have heard, of course.)

Comment by Dave Orr (dave-orr) on A concerning observation from media coverage of AI industry dynamics · 2023-03-05T23:31:28.437Z · LW · GW

There's a subtlety here around the term risk.

Google has been, IMO, very unwilling to take product risk, or risk a PR backlash of the type that Blenderbot or Sydney have gotten. Google has also been very nervous about perceived and actual bias in deployed models.

When people talk about red tape, it's not the kind of red tape that might be useful for AGI alignment, it's instead the kind aimed at minimizing product risks. And when Google says they are willing to take on more risk, here they mean product and reputational risk.

Maybe the same processes that would help with product risk would also help with AGI alignment risk, but frankly I'm skeptical. I think the problems are different enough that they need a different kind of thinking.

I think Google is better on the big risks than others, at least potentially, since they have some practice at understanding nonobvious secondary effects as applied to search or YouTube ranking.

Note that I'm at Google, but opinions here are mine, not Google's.

Comment by Dave Orr (dave-orr) on Sunlight is yellow parallel rays plus blue isotropic light · 2023-03-01T18:29:08.161Z · LW · GW

Please post pictures once you're done!

Comment by Dave Orr (dave-orr) on The Preference Fulfillment Hypothesis · 2023-02-26T15:22:29.171Z · LW · GW

I feel like every week there's a post that says, I might be naive but why can't we just do X, and X is already well known and not considered sufficient. So it's easy to see a post claiming a relatively direct solution as just being in that category.

The amount of effort and thinking in this case, plus the reputation of the poster, draws a clear distinction between the useless posts and this one, but it's easy to imagine people pattern matching into believing that this is also probably useless without engaging with it.

Comment by Dave Orr (dave-orr) on The Preference Fulfillment Hypothesis · 2023-02-26T13:23:43.911Z · LW · GW

FWIW I at least found this to be insightful and enlightening. This seems clearly like a direction to explore more and one that could plausibly pan out.

I wonder if we would need to explore beyond the current "one big transformer" setup to realize this. I don't think humans have a specialized brain region for simulations (though there is a region that seems heavily implicated), but if you want to train something using gradient descent, it might be easier to have a simulation module that predicts human preferences and is rewarded for accurate predictions, and then feed those predictions into the main decision-making model.

Perhaps we can use revealed preferences through behavior combined with elicited preferences to train the preference predictor. This is similar to the idea of training a separate world model rather than lumping it in with the main blob.