Comments

Comment by tamgent on AI alignment researchers don't (seem to) stack · 2023-03-04T14:44:52.138Z · LW · GW

Not a textbook (more for a general audience) but The Alignment Problem by Brian Christian is a pretty good introduction that I reckon most people interested in this would get behind.

Comment by tamgent on How it feels to have your mind hacked by an AI · 2023-02-12T22:49:29.431Z · LW · GW

Yes please

Comment by tamgent on How it feels to have your mind hacked by an AI · 2023-01-31T20:15:58.384Z · LW · GW

Do you have the transcript from this?

Comment by tamgent on How it feels to have your mind hacked by an AI · 2023-01-19T09:13:01.318Z · LW · GW

I like it - it's interesting how much of it has to do with the specific vulnerabilities of humans, and how humans exploiting other humans' vulnerabilities was what enabled and exacerbated the situation.

Comment by tamgent on How it feels to have your mind hacked by an AI · 2023-01-19T08:47:50.429Z · LW · GW

There's also a romantic theme ;-)

Comment by tamgent on How it feels to have your mind hacked by an AI · 2023-01-19T08:46:32.405Z · LW · GW

Whilst we're sharing stories... I'll shamelessly promote one of my (very) short stories on human manipulation by AI. In this case the AI is being deliberate, at least in achieving its instrumental goals. https://docs.google.com/document/d/1Z1laGUEci9rf_aaDjQKS_IIOAn6D0VtAOZMSqZQlqVM/edit

Comment by tamgent on How it feels to have your mind hacked by an AI · 2023-01-16T18:58:53.171Z · LW · GW

Is it a coincidence that your handle is blaked? (It's a little similar to Blake) Just curious.

Comment by tamgent on Slack matters more than any outcome · 2023-01-11T18:06:36.898Z · LW · GW

Ha! I meant the former, but I like your second interpretation too!

Comment by tamgent on Slack matters more than any outcome · 2023-01-03T17:16:07.073Z · LW · GW

I like, 'do the impossible - listen'.

Comment by tamgent on Let’s think about slowing down AI · 2023-01-01T22:10:52.838Z · LW · GW

Recruitment - in my experience often a weeks-long process from start to finish, well oiled and systematic, using all the tips on selection from the organizational behaviour handbook, often with feedback given too. By comparison, some tech companies can take several months to hire, with lots of ad hoc decision-making, no processes around biases or conflicts of interest, and no feedback.

Happy to give more examples if you want by DM.

I should say my sample size is tiny here - I know one gov dept in depth, one tech company in depth, and a handful of other tech companies and gov depts not fully from the inside but just from talking with friends who work there, etc.

Comment by tamgent on Shared reality: a key driver of human behavior · 2023-01-01T21:45:14.216Z · LW · GW

What exactly is the trust problem you're referring to?

Is it that people are not as trusting as you think they should be, in general?

Comment by tamgent on AI alignment is distinct from its near-term applications · 2022-12-23T21:08:38.605Z · LW · GW

I also interpreted it this way and was confused for a while. I think your suggested title is clearer, Neel.

Comment by tamgent on Let’s think about slowing down AI · 2022-12-23T20:50:28.475Z · LW · GW

Thank you for writing this. On your section 'Obstruction doesn't need discernment' - see also this post that went up on LW a while back called The Regulatory Option: A response to near 0% survival odds. I thought it was an excellent post, and it didn't get anywhere near the attention it deserved, in my view.

Comment by tamgent on Let’s think about slowing down AI · 2022-12-23T20:44:24.596Z · LW · GW

I think the two camps are less orthogonal than your examples of privacy and compute regulation portray. There's room for plenty of excellent policy interventions that both camps could work together to support. For instance, increasing regulatory requirements for transparency on algorithmic decision-making (and, crucially, building the capacity both in regulators and in the market supporting them to enforce this) is something I think both camps would get behind (the x-risk camp because it creates demand for interpretability and more, the other because, e.g., it's easier to show fairness issues) and could productively work on together. I think there are subculture-clash reasons the two camps don't always get on, but these can be overcome, particularly given there's a common enemy (misaligned powerful AI). See also the paper Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society. I know lots of people who are uncertain about how big the risks are, care about both problems, and work on both (I am one of these - I care more about AGI risk, but I think the best things I can do to help avert it involve working with the people you think aren't helpful).

Comment by tamgent on Your posts should be on arXiv · 2022-09-15T07:51:24.957Z · LW · GW

To build on the benefit you noted here:

  1. better citability (e.g. if somebody writes an ML paper to be published in an ML venue, it gives more credibility to cite arXiv papers than Alignment Forum/LessWrong posts).

There are some areas of work where it's useful not to implicitly communicate that you affiliate with a somewhat weird group like LW or AF folks, but rather to have the content read at face value when you share it with folks coming from different subcultures and perspectives. I think it'd be hugely valuable for this collection of people who are sharing things.

Comment by tamgent on Your posts should be on arXiv · 2022-09-15T07:48:14.545Z · LW · GW

This seems solvable and very much worth solving!

Comment by tamgent on Could we use recommender systems to figure out human values? · 2022-08-24T19:39:06.459Z · LW · GW

Agree.

Human values are very complex and most recommender systems don't even try to model them. Instead most of them optimise for things like 'engagement', which they claim is aligned with a user's 'revealed preference'. This notion of 'revealed preference' is a far cry from true preferences (which are very complex), let alone human values (which are also very complex). I recommend this article for an introduction to some of the issues here: https://medium.com/understanding-recommenders/what-does-it-mean-to-give-someone-what-they-want-the-nature-of-preferences-in-recommender-systems-82b5a1559157

Comment by tamgent on Jack Clark on the realities of AI policy · 2022-08-12T14:48:27.653Z · LW · GW

Support.

I would add to this that The Alignment Problem by Brian Christian is a fantastic general-audience book that shows how immediate and long-term AI policy really are facing the same problem and will work better if we all work together.

Comment by tamgent on More Is Different for AI · 2022-08-12T08:48:07.622Z · LW · GW

If you know of any more such analyses, could you share them?

Comment by tamgent on Where are the red lines for AI? · 2022-08-12T07:22:49.429Z · LW · GW

I would be interested in seeing a list of any existing work in this area. I think determining the red lines well is going to be very useful for policymakers in the next few years.

Comment by tamgent on Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment · 2022-08-09T05:46:30.630Z · LW · GW

Thanks kindly for the offer, I will DM you

Comment by tamgent on My vision of a good future, part I · 2022-07-08T06:26:06.363Z · LW · GW

I enjoyed reading this, and look forward to future parts.

Comment by tamgent on Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment · 2022-07-08T06:23:28.316Z · LW · GW

I just want to let you know that this table was really useful for me for something I'm working on. Thank you for making it.

Comment by tamgent on What Are You Tracking In Your Head? · 2022-07-04T17:29:58.801Z · LW · GW

I was explicitly taught to model this physical thing in a wood carving survivalist course.

Comment by tamgent on Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment · 2022-07-02T20:52:17.219Z · LW · GW

Thanks for sharing, this is a really nice resource for a number of problems and solutions.

Comment by tamgent on Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment · 2022-06-23T07:55:12.306Z · LW · GW

Thanks for writing this, I find the security mindset useful all over the place and appreciate its applicability in this situation.

I have a small thing unrelated to the main post:

To my knowledge, no one tried writing a security test suite that was designed to force developers to conform their applications to the tests. If this was easy, there would have been a market for it.

I think weak versions exist (i.e. things that do not guarantee/force, but nudge/help). I first learnt to code in a bootcamp which emphasised test-driven development (TDD). One of the first packages I made was a TDD linter. It would simply highlight in red any functions you wrote that did not have a corresponding unit test, and any file you made without a corresponding test file.
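For a concrete picture, here's a minimal sketch of the kind of file-level check such a linter might do (the src/tests layout and names here are illustrative assumptions, not the actual package):

```python
from pathlib import Path
import sys

def missing_test_files(src_dir="src", test_dir="tests"):
    """Return source files that have no corresponding test_<name>.py file."""
    missing = []
    for source_file in Path(src_dir).rglob("*.py"):
        expected = Path(test_dir) / f"test_{source_file.name}"
        if not expected.exists():
            missing.append(source_file)
    return missing

if __name__ == "__main__":
    untested = missing_test_files()
    for path in untested:
        # A real linter would highlight these in the editor; here we just print them.
        print(f"No test file found for {path}")
    sys.exit(1 if untested else 0)
```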

Also, if you've written up the scalable solutions to 80% of web app vulnerabilities anywhere, I'd love to see them.

Comment by tamgent on Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc · 2022-06-05T22:13:18.147Z · LW · GW

Even if you could find some notion of a, b, c we think are features in this DNN - how would you know you were right? How would you know you're on the correct level of abstraction / cognitive separation / carving at the joints, instead of cutting right through the spleen and then declaring you've found a, b and c? It seems this is much harder than in a model where you literally assume the structure and features all upfront.

Comment by tamgent on Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc · 2022-06-05T22:08:45.813Z · LW · GW

I'm not in these fields, so take everything I say very lightly, but intuitively this feels wrong to me. I understood your point to be something like: the labels are doing all the work. But for me, the labels are not what makes those approaches seem more interpretable than a DNN. It's that in a DNN, the features are not automatically locatable (even pseudonymously) in a way that lets you figure out the structure/shape that separates them - each training run of the model learns a new way to separate them, and it isn't clear how to know what those shapes tend to turn out as and why. However, the logic graphs already agree with you on an initial structure/shape.

Of course there are challenges in scaling up the other methods, but claiming they're no more interpretable than DNNs feels incorrect to me. [Reminder: complete outsider to these fields.]

Comment by tamgent on Benign Boundary Violations · 2022-06-05T21:26:24.956Z · LW · GW

Siblings do this a lot growing up.

Comment by tamgent on What an actually pessimistic containment strategy looks like · 2022-06-05T21:03:38.647Z · LW · GW

I didn't downvote this just because I disagree with it (that's not how I downvote), but if I could hazard a guess at why people might downvote, it'd be that some might think it's a 'thermonuclear idea'.

Comment by tamgent on Intergenerational trauma impeding cooperative existential safety efforts · 2022-06-05T19:47:01.098Z · LW · GW

Try Googling a few AI-related topics that no one talked about 5-10 years ago and see whether more people are talking about them today.

You can use Google Trends to see search-term popularity over time.
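If you'd rather pull the numbers programmatically than use the website, here's a rough sketch using the unofficial pytrends package (assuming it's installed; the search terms are just examples):

```python
from pytrends.request import TrendReq

# Compare search interest in a few AI-related terms over the last five years.
pytrends = TrendReq(hl="en-US", tz=0)
pytrends.build_payload(["AI alignment", "AGI safety"], timeframe="today 5-y")
interest = pytrends.interest_over_time()  # pandas DataFrame indexed by date
print(interest.tail())
```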

Comment by tamgent on What an actually pessimistic containment strategy looks like · 2022-05-07T18:34:45.414Z · LW · GW

These are really interesting, thanks for sharing!

Comment by tamgent on The Regulatory Option: A response to near 0% survival odds · 2022-05-07T17:59:41.232Z · LW · GW

So regulatory capture is a thing that can happen. I don't think I got a complete picture of how you imagine oversight of dominant companies being scary. You mentioned two possible mechanisms: rubber-stamping things, and enforcing sharing of data. It's not clear to me that either of these is obviously contra the goal of slowing things down. Like, maybe sharing of data (I'm imagining you mean with smaller competitors, as in the case of competition regulation) - but data isn't really useful alone; you need compute and technical capability to use it. More likely would be forced sharing of the models themselves, but this isn't the granting of an ongoing capability, although it could still be misused. Mandating sharing of data is less likely under regulatory capture, though. And then the rubber-stamping: well, maybe sometimes something would be stamped that shouldn't have been, but surely some stamping process is better than none? It at least slows things down. I don't think receiving a stamp wrongly makes an AI system more likely to go haywire - if it was going to, it would anyway. AI labs don't just think, hm, this model doesn't have any stamp, let me check its safety. Maybe you think companies will do less self-regulation if external regulation happens? I don't think this is true.

Comment by tamgent on The Regulatory Option: A response to near 0% survival odds · 2022-05-07T17:43:27.617Z · LW · GW

Thank you for your elaboration, I appreciate it a lot, and upvoted for the effort. Here are your clearest points paraphrased as I understand them (sometimes just using your words), and my replies:

  1. The FDA is net negative for health; therefore creating an FDA-for-AI would likely be net negative for the AI challenges.

I don't think you can come to this conclusion, even if I agree with the premise. The counterfactuals are very different. With drugs, the counterfactual of no FDA might be that some people get more treatments, some die but many don't, they were sick anyway so needed to do something, and maybe fewer die than with the FDA around - so maybe the existence of the FDA, compared to the counterfactual, is net bad. I won't dispute this; I don't know enough about it. However, the counterfactual in AI is different. If unregulated, AI progress steams on ahead, competition over the high rewards is high, and if we don't have a good safety plan (which we don't) then maybe we all die at some point, who knows when. However, if an FDA-for-AI creates bad regulation (as long as it's not bad enough to cause an AI regulation winter), then it starts slowing down that progress. Maybe that's bad for, idk, the diseases that could have been solved during the 10-year delay before AI would otherwise have solved cancer, and that kind of thing, but it's nowhere near as bad as the counterfactual! These scenarios are different and not comparable, because the counterfactual of no FDA is not as bad as the counterfactual of no AI regulator.

  2. Enough errors would almost certainly occur in AI regulation to make it net negative.

You gave a bunch of examples of bad regulation from outside AI (I am not going to bother to think about whether I agree that they are bad regulation, as it's not cruxy) - but you didn't explain how exactly those errors would make AI regulation net negative. Again, similar to the previous claim, I think the counterfactuals likely make this not hold.

  3. ...a field where there is bound to be vastly more misunderstanding should be at least as prone to regulation backfiring

That is an interesting claim; I am not sure what makes you think it's obviously true, as it depends what your goal is. My understanding of the OP is that the goal of the type of regulation they advocate is simply to slow down AI development, nothing more, nothing less. If the goal is to do good regulation of AI, that's totally different. Is there a specific way in which you imagine it backfiring for the goal of simply slowing down AI progress?

  4. ...an [oppressive] regime gaining controllable AI would produce an astronomical suffering risk.

I am unsure what point you were making in the paragraph about evil. Was it about another regime getting there first that might not do safety? For a response, see the OP's Objection 4, which I share, and to which I added an additional reason why this isn't a real worry in this world.

  5. ...unwise to think that people who take blatant actions to kill innocents for political convenience would be safe custodians of AI...

I don't think it's fair to say regulators would be custodians. They have a special kind of lever called "slow things down", and that lever does not mean that they can, for example, seize and start operating the AI. It is not in their power to do that legally, nor do they have the capability to do anything with it. We are talking here about slowing things down before AGI, not post-AGI.

  6. the electorate does not understand AI

Same as my answer to 3, and also similar to the OP's Objection 1.

And finally to reply to this: "hopefully this should clarify to a degree why I anticipate both severe X risks and S risks from most attempts at AI regulation"

Basically, no, it doesn't really clarify it. You started off with a premise I agree with, or at least don't know enough to refute (that the FDA may be net negative), then drew a conclusion that I disagree with (see 1 above), and then all your other points assumed that conclusion, so I couldn't really follow. I tried to pick out the bits that seemed like possible key points and reply, but yeah, I think you're pretty confused.

What do you think of my reply to 1, about the counterfactuals being different? I think that's the best way to progress the conversation.

Comment by tamgent on The Regulatory Option: A response to near 0% survival odds · 2022-05-05T16:48:51.553Z · LW · GW

No worries, thank you, I look forward to it

Comment by tamgent on The Regulatory Option: A response to near 0% survival odds · 2022-05-04T18:24:18.822Z · LW · GW

Another response to the China objection is that, just as regulators copy each other internationally, so do academics/researchers. So if you slow down the development of some research in parts of the world, you might also slow down the development of that research in other parts of the world. Especially when there's an asymmetry in the openness of publication of the research.

Comment by tamgent on The Regulatory Option: A response to near 0% survival odds · 2022-05-04T18:15:35.403Z · LW · GW

I'm a bit confused about why you think it's so clearly a bad idea; your points weren't elaborated at all, so I'd absolutely love some elaboration from you or some of the people who voted up your comment, because clearly I'm missing something.

  • on the reduced chance of FAI being developed: sure, some of this would of course happen, but slowing down the development of solutions to a problem (the alignment problem) whilst slowing down the growth of the problem itself even more is surely net good for stopping the problem? Especially if you're really worried about the problem and worried it'd happen faster than you could think of good solutions for!
  • waiting for elaboration on the suffering point, but let's assume you've got good reasons there

Comment by tamgent on The Regulatory Option: A response to near 0% survival odds · 2022-05-04T18:11:22.376Z · LW · GW

I would also appreciate an elaboration by Aiyen on the suffering risk point.

Comment by tamgent on The Regulatory Option: A response to near 0% survival odds · 2022-05-04T18:09:21.390Z · LW · GW

I'd find it really hard to imagine MIRI getting regulated. It's more common that regulation steps in where an end user or consumer could be harmed, and for that you need to deploy products to those users/consumers. As far as I'm aware, this is quite far from the kind of safety research MIRI does.

Sorry, I must be really dumb, but I didn't understand what you mean by the alignment problem for regulation - aligning regulators to regulate the important/potentially harmful bits? I don't think this is completely random: even if regulators focus more on trivial issues, they're more likely to support safety teams (although sure, the models those teams will be working on making safe won't be as capable - that's the point).

Comment by tamgent on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-03T19:50:19.965Z · LW · GW

OK I admit this one doesn't fit any audience under any possible story in my mind except a general one. Let me know if you want to read the private (not yet drafted) news article though and I'll have a quick go.

Comment by tamgent on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-03T19:49:16.404Z · LW · GW

ML engineers?

Comment by tamgent on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-03T19:48:55.911Z · LW · GW

Policymakers?

Comment by tamgent on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-03T19:48:29.369Z · LW · GW

OK I have to admit, I didn't think through the audience extremely carefully, as most of these sound like clickbait news article headlines, but I'll go with tech executives. I do think reasonably good articles could be written explaining the metaphor, though.

Comment by tamgent on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-03T19:33:20.629Z · LW · GW

"What do condoms have in common with AI?"

Comment by tamgent on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-03T19:33:03.692Z · LW · GW

"Evolution didn’t optimize for contraception. AI developers don’t optimize against their goals either. Accidents happen. Use protection (optional this last bit)"

Comment by tamgent on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-03T19:32:12.780Z · LW · GW

"Evolution wasn’t prepared for contraception. We can do better. When deploying AI, think protection."

Comment by tamgent on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-03T19:31:56.896Z · LW · GW

"We tricked nature with contraception; one day, AI could trick us too."

Comment by tamgent on [$20K in Prizes] AI Safety Arguments Competition · 2022-05-03T19:01:03.306Z · LW · GW

Ah, instrumental and epistemic rationality clash again

Comment by tamgent on Narrative Syncing · 2022-05-01T21:22:40.409Z · LW · GW

I am curious about how you felt when writing this bit:

There's no need to make reference to culture.

Comment by tamgent on Narrative Syncing · 2022-05-01T21:18:44.923Z · LW · GW

I think the difference between 1 and 3 is that in 3 there is explicit acknowledgement of the idea that what the person might be asking for is "what is the done thing around here", by attempting to directly answer the inferred subtext.

Also, I like your revised answer.