Scraping websites currently free due to coronavirus
post by emmab · 2020-04-14T07:37:23.001Z · LW · GW · 3 comments
Some sites are free due to the coronavirus and we should scrape them. Please list any temporarily-free websites you know of, such as https://www.uptodate.com/
To download sites I recommend https://github.com/ArchiveTeam/grab-site
Consider whether you want to use --no-offsite-links. Without it, grab-site also follows links to other domains and pulls every linked page and its resources (and sometimes even embedded videos).
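As a rough illustration of the choice above, here is a small Python sketch that assembles a grab-site command line with or without --no-offsite-links. The helper names build_grab_site_command and crawl are made up for this example and are not part of grab-site; the sketch assumes grab-site is already installed and on your PATH, and it passes no options beyond the one mentioned here.

```python
import shlex
import shutil
import subprocess


def build_grab_site_command(url, no_offsite_links=True):
    """Return the argument list for a grab-site crawl of `url`.

    Hypothetical helper for illustration; only the --no-offsite-links
    flag discussed in the post is used.
    """
    command = ["grab-site"]
    if no_offsite_links:
        # Stay on the target site instead of following links to other domains.
        command.append("--no-offsite-links")
    command.append(url)
    return command


def crawl(url, no_offsite_links=True):
    """Run grab-site on `url`; assumes the grab-site executable is on PATH."""
    if shutil.which("grab-site") is None:
        raise RuntimeError("grab-site not found on PATH; see its README to install it")
    return subprocess.run(build_grab_site_command(url, no_offsite_links), check=True)


# Print the command rather than running it, so it can be reviewed first.
print(shlex.join(build_grab_site_command("https://www.uptodate.com/")))
```

Calling crawl("https://www.uptodate.com/") would start the actual download. Printing the command first makes it easy to check which flags are in effect before committing to a long crawl.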
3 comments
Comments sorted by top scores.
comment by Kenny · 2020-04-14T19:29:37.464Z · LW(p) · GW(p)
Why is this a good idea?
I understand that this is a great opportunity to scrape otherwise-inaccessible sites, but whom is this intended to benefit? Is it just the people scraping?
Is this not malicious abuse of charitable giving?
Replies from: philh
comment by philh · 2020-04-17T23:56:53.671Z · LW(p) · GW(p)
I agree with the question. I don't know if we should avoid this. But I think we should consider whether, for example, these sites are opening up with the implicit expectation that people will take advantage of it for covid-related purposes; and whether, if we just start scraping everything, that'll make them or similar sites less likely to open up in future.
comment by Derek M. Jones (Derek-Jones) · 2020-04-14T12:43:26.120Z · LW(p) · GW(p)
The ACM is offering free download of their articles: https://dl.acm.org/