Scraping websites currently free due to coronavirus

post by emmab · 2020-04-14T07:37:23.001Z · LW · GW · 3 comments

Some sites are temporarily free because of the coronavirus, and we should scrape them while that lasts. Please list any temporarily free websites you know of, such as https://www.uptodate.com/

To download sites, I recommend grab-site: https://github.com/ArchiveTeam/grab-site

Consider whether you want to use --no-offsite-links. Without that flag, grab-site will also pull every linked offsite page and its resources (sometimes even embedded videos).
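For anyone scripting this, here is a minimal sketch of how a crawl command could be assembled, assuming grab-site is installed and on your PATH. The helper names (`build_grab_site_command`, `crawl`) and the choice to skip offsite links by default are my own for illustration, not part of grab-site; the only flag used is the `--no-offsite-links` one mentioned above.

```python
import shutil
import subprocess


def build_grab_site_command(url, follow_offsite_links=False):
    """Return the argv list for a grab-site crawl of `url` (hypothetical helper)."""
    command = ["grab-site"]
    if not follow_offsite_links:
        # Flag named in the post: stops the crawl from pulling in linked offsite pages.
        command.append("--no-offsite-links")
    command.append(url)
    return command


def crawl(url, follow_offsite_links=False):
    """Run the crawl; assumes grab-site is installed and on PATH."""
    if shutil.which("grab-site") is None:
        raise RuntimeError("grab-site not found on PATH")
    return subprocess.run(
        build_grab_site_command(url, follow_offsite_links), check=True
    )


if __name__ == "__main__":
    # Only prints the command; call crawl(...) to actually start a download.
    print(" ".join(build_grab_site_command("https://www.uptodate.com/")))
```

Passing `follow_offsite_links=True` drops the flag and leaves grab-site at its default behaviour, which is worth doing only if you have the disk space for it.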

3 comments


comment by Kenny · 2020-04-14T19:29:37.464Z · LW(p) · GW(p)

Why is this a good idea?

I understand that this is a great opportunity to scrape otherwise-inaccessible sites, but whom is this intended to benefit? Is it just the people scraping?

Is this not malicious abuse of charitable giving?

Replies from: philh
comment by philh · 2020-04-17T23:56:53.671Z · LW(p) · GW(p)

I agree with the question. I don't know if we should avoid this. But I think we should consider whether, for example, these sites are opening up with the implicit expectation that people will use the free access for covid-related purposes; and whether, if we just start scraping everything, that'll make them or similar sites less likely to open up in future.

comment by Derek M. Jones (Derek-Jones) · 2020-04-14T12:43:26.120Z · LW(p) · GW(p)

The ACM is offering free download of their articles: https://dl.acm.org/