Should we exclude alignment research from LLM training datasets?


post by Ben Millwood (ben-millwood) · 2024-07-18T10:27:40.991Z · LW · GW · 1 comment

This is a question post.

This is a companion post to Keeping content out of LLM training datasets, which discusses the various techniques we could use and their tradeoffs. My intention is primarily to start a discussion; I am not myself very opinionated on this.

As AIs become more capable, we may at least want the option of discussing them out of their earshot.

Places to consider (at time of writing, none of the below robots.txt files rule out LLM scrapers, but I include the links so you can check if this changes):

Alignment Forum (robots.txt)
LessWrong (robots.txt)
EA Forum (robots.txt)
Alignment org websites, e.g.
- ARC (robots.txt)
- METR (robots.txt)
arXiv (robots.txt, which links to their policy)
- was explicitly mentioned by Meta as a training source for LLaMA-1,
- obviously we're less in a position to decide here, but we could ask.
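
For anyone who wants to re-check the robots.txt files above, here is a minimal sketch using Python's standard-library robots.txt parser. The crawler names (GPTBot, CCBot) are only an illustrative, non-exhaustive list, and the example robots.txt and URL are made up for the sketch; a real check would fetch each site's actual file.

```python
from urllib.robotparser import RobotFileParser

# Assumed, illustrative list of crawler User-Agent tokens; not exhaustive.
LLM_CRAWLERS = ["GPTBot", "CCBot"]


def blocked_crawlers(robots_txt: str, url: str, crawlers=LLM_CRAWLERS) -> list[str]:
    """Return the crawlers that this robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in crawlers if not parser.can_fetch(ua, url)]


# Made-up robots.txt for illustration: blocks GPTBot everywhere,
# and blocks everyone else only from /admin.
EXAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin
"""

print(blocked_crawlers(EXAMPLE, "https://example.org/posts/some-post"))
```

Note that this only tells you what a site asks of crawlers; it says nothing about whether a given scraper actually honours the file.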

Options to consider:

Blocking everything in robots.txt, or by User-Agent
- it would perhaps seem a shame for AIs to know none of the content here
(for EAF / LW / AF) Writing code for ForumMagnum to:
- enable blocking (probably by User-Agent) to be configured per-post (I haven't investigated this for feasibility, but naively it doesn't seem so hard).
- enable posts to be configured to be visible only to logged-in users (maybe this is a feature we might want for other reasons?)
I think content-based methods (e.g. presence of "canary" strings) would be the strongest / most durable, if they were adopted by LLM scrapers. We could try to push for their adoption, and then their use.
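
To make the canary idea concrete, here is a rough sketch of what a cooperating scraper's filter might look like. The canary value below is a made-up placeholder, not a real published canary, and `filter_training_docs` is a hypothetical helper name; it only works if dataset builders choose to run something like it.

```python
# Hypothetical canary marker, invented for this sketch. A real deployment
# would use a published, unique string that scrapers agree to look for.
CANARY = "EXAMPLE-CANARY-DO-NOT-TRAIN 00000000-0000-0000-0000-000000000000"


def filter_training_docs(docs, canaries=(CANARY,)):
    """Keep only documents that contain none of the known canary strings."""
    return [doc for doc in docs if not any(c in doc for c in canaries)]


docs = [
    "An ordinary blog post about gardening.",
    f"A post its author would prefer models not train on. {CANARY}",
]
kept = filter_training_docs(docs)
print(len(kept))
```

The appeal is that the marker travels with the text, so copies and mirrors of a post are still filtered, whereas robots.txt and User-Agent rules only protect the original URL.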

Feel free to suggest additions to either category.

To the extent that doing something here means spending software dev time, this raises the question not only of whether we should do this, but also of how important it is relative to the other things we could spend software developers' time on.

Link preview image by Jonny Gios on Unsplash

Answers

answer by Yonatan Cale · 2024-08-24T16:31:46.425Z · LW(p) · GW(p)

As AIs become more capable, we may at least want the option of discussing them out of their earshot.

If I wanted to discuss something outside of an AI's earshot, I'd use something like Signal, or something that would keep out a human too.

AIs sometimes have internet access, and robots.txt won't keep them out.

I don't think having this info in their training set makes a big difference (but maybe I don't see the problem you're pointing out, so I'm not confident about this).

↑ comment by Ben Millwood (ben-millwood) · 2024-12-10T23:50:07.142Z · LW(p) · GW(p)

I think there are two levels of potential protection here. One is a security-like "LLMs must not see this" condition, for which yes, you need to do something that would keep out a human too (though in practice maybe "post only visible to logged-in users" is good enough).

However, I also think there's a lower level of protection that's more like "if you give me the choice, on balance I'd prefer for LLMs not to be trained on this", where some failures are OK and imperfect filtering is better than no filtering. The advantage of targeting this level is simply that it's much easier and less obtrusive, so you can do it at a greater scale with a lower cost. I think this is still worth something.

↑ comment by Yonatan Cale (yonatan-cale-1) · 2024-12-11T03:23:27.463Z · LW(p) · GW(p)

I'm not sure I'm imagining the same thing as you, but as a draft solution, how about a robots.txt?

↑ comment by Ben Millwood (ben-millwood) · 2024-12-11T20:37:28.011Z · LW(p) · GW(p)

how about a robots.txt?

Yeah, that's a strong option, which is why I went around checking + linking all the robots.txt files for the websites I listed above :)

In my other post I discuss the tradeoffs of the different approaches. One in particular is that it would be somewhat clumsy to implement post-by-post filters via robots.txt, whereas user-agent filtering can do it just fine.
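
As a rough illustration of the per-post User-Agent filtering described above, the core check could be as small as the sketch below. This is not ForumMagnum code: the function name, the blocked tokens, and the example path are all assumptions made up for the sketch, and a real version would live in the server's request handling.

```python
# Assumed User-Agent substrings for LLM crawlers; illustrative, not exhaustive.
BLOCKED_AGENT_TOKENS = ("gptbot", "ccbot")


def status_for_request(path: str, user_agent: str, opted_out_paths: set[str]) -> int:
    """Return 403 if an LLM crawler requests an opted-out post, else 200."""
    is_llm_crawler = any(tok in user_agent.lower() for tok in BLOCKED_AGENT_TOKENS)
    if is_llm_crawler and path in opted_out_paths:
        return 403
    return 200


# Hypothetical per-post setting: the set of paths whose authors opted out.
opted_out = {"/posts/abc123/example-post"}

print(status_for_request("/posts/abc123/example-post", "Mozilla/5.0 (compatible; GPTBot/1.0)", opted_out))
print(status_for_request("/posts/abc123/example-post", "Mozilla/5.0 (X11; Linux x86_64)", opted_out))
```

Because the decision is made per request on the server, the opt-out list can change post by post without publishing a growing list of paths, which is the clumsy part of doing the same thing in robots.txt.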

1 comment

comment by Joey Yudelson (JosephY) · 2025-02-06T18:59:42.135Z · LW(p) · GW(p)

Can we make the robots.txt programmatic by page, and then have a tag we can add to exclude a post from the robots.txt? That feels like the 80/20.
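
A minimal sketch of what a tag-driven, programmatically generated robots.txt might look like follows. The helper name, the crawler list, and the example path are assumptions made up for illustration; how tagged posts are looked up would depend on the forum software.

```python
def build_robots_txt(excluded_paths, crawlers=("GPTBot", "CCBot")):
    """Render a robots.txt that disallows each excluded path for each crawler."""
    if not excluded_paths:
        return ""
    lines = []
    for ua in crawlers:
        lines.append(f"User-agent: {ua}")
        # Disallow rules are prefix matches, so each rule also covers any
        # URL that starts with the same path.
        lines.extend(f"Disallow: {path}" for path in sorted(excluded_paths))
        lines.append("")  # blank line separates groups
    return "\n".join(lines)


# Hypothetical result of looking up every post carrying the "exclude" tag.
tagged = {"/posts/abc123/example-post"}
print(build_robots_txt(tagged))
```

One tradeoff to keep in mind is that the generated file is public, so it doubles as a list of exactly the posts people would prefer crawlers skip.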
