Rationalist sites worth archiving?

post by gwern · 2011-09-11T15:24:39.969Z · LW · GW · Legacy · 48 comments

One of my long-standing interests is writing content that will age gracefully. But as a child of the Internet, I am addicted to linking, and linkrot is profoundly threatening to me, so another interest of mine is archiving URLs. My current methodology combines archiving my browsing, both locally and in public archives like the Internet Archive, with proactively archiving entire sites. Sites I have previously archived in part or in total include:

  1. LessWrong (I may've caused some downtime here, sorry about that)
  2. OvercomingBias
  3. SL4
  4. Chronopause.com
  5. Yudkowsky.net (in progress)
  6. Singinst.org
  7. PredictionBook.com (for obvious reasons)
  8. LongBets.org & LongNow.org
  9. Intrade.com
  10. Commonsenseatheism.com
  11. finney.org
  12. nickbostrom.com
  13. unenumerated.blogspot.com & http://szabo.best.vwh.net/
  14. weidai.com
  15. mattmahoney.net
  16. aibeliefs.blogspot.com

Having recently added WikiWix to my archival bot, I was thinking of re-running various sites, and I'd like to know - what other LW-related websites are there that people would like to be able to access somewhere in 30 or 40 years?

(This is an important long-term issue, and I don't want to miss any important sites, so I am posting this as an Article rather than the usual Discussion. I already regret not archiving Robert Bradbury's full personal website - having only his Matrioshka Brains article - and do not wish to repeat the mistake.)

48 comments

Comments sorted by top scores.

comment by Epiphany · 2013-08-21T08:29:43.893Z · LW(p) · GW(p)

gwern.net

Replies from: gwern
comment by gwern · 2013-08-21T16:07:23.319Z · LW(p) · GW(p)

The cobbler's children don't always go unshod. :)

Replies from: Epiphany
comment by Epiphany · 2013-08-22T02:34:27.713Z · LW(p) · GW(p)

I did not intend to imply that you failed to back up your own data. That was intended as an amusing compliment.

comment by gwern · 2013-08-21T05:12:49.681Z · LW(p) · GW(p)

I have finished another spider & populated my queue with links from the following sites:

  • sl4.org
  • chronopause.com
  • yudkowsky.net
  • intelligence.org
  • www.predictionbook.com
  • longbets.org
  • longnow.org
  • www.intrade.com
  • slatestarcodex.com
  • squid314.livejournal.com
  • aibeliefs.blogspot.com
  • mattmahoney.net
  • www.weidai.com
  • unenumerated.blogspot.com
  • szabo.best.vwh.net
  • nickbostrom.com
  • commonsenseatheism.com
  • rationality.org
  • www.acceleratingfuture.com

(Note that if you use linkchecker, you will want >4GB of RAM to spider all those domains.)

comment by lukeprog · 2011-09-11T16:42:22.315Z · LW(p) · GW(p)

Other possibilities, not all necessarily 'rationalist':

http://www.acceleratingfuture.com/michael/blog/
http://felicifia.org/
http://www.utilitarian-essays.com/
http://naturalism.org/
http://www.infidels.org/

Replies from: gwern
comment by gwern · 2011-09-12T18:59:10.362Z · LW(p) · GW(p)

Most of those are in the queue now. (I think linkchecker crashed somewhere spidering the latter 5, so I'm not sure how complete coverage is.)

comment by Larks · 2011-09-14T20:26:31.260Z · LW(p) · GW(p)

Harry Potter and the Methods of Rationality?

Replies from: gwern
comment by gwern · 2011-09-15T14:33:43.613Z · LW(p) · GW(p)

That would already be covered by my own reading of it, my browser history being the main source of URLs for archiver-bot.

Replies from: None, wedrifid
comment by [deleted] · 2011-09-17T19:24:06.579Z · LW(p) · GW(p)

I can't believe I haven't used archiver-bots for my browsing experience until now.

Replies from: gwern
comment by gwern · 2011-09-17T23:01:58.904Z · LW(p) · GW(p)

I think it's like backups - you don't appreciate the need until it's gone, and then it's too late. And to be fair, I don't think I would get much value out of an archive of my web browsing history from age 10-16, say.

Replies from: None
comment by [deleted] · 2011-09-17T23:22:30.706Z · LW(p) · GW(p)

Vintage porn can be sold at a reasonable markup to the right audience, can't it?

comment by wedrifid · 2013-08-21T17:42:12.514Z · LW(p) · GW(p)

That would already be covered by my own reading of it, my browser history being the main source of URLs for archiver-bot.

You're making a permanent backup of everything you ever read on the internet? That's... that's... well I suppose data storage is cheap these days. It makes perfect sense. Reading your scripting instructions now.

Replies from: gwern
comment by gwern · 2013-08-21T18:52:29.041Z · LW(p) · GW(p)

Not everything; I filter out things I am sure I won't want in the future and things I strongly expect to be available & which would take up a lot of space (Wikipedia in particular), and the bot is rate-limited by the IA/WebCite submissions. Increasingly more stuff is difficult to archive as sites load stuff via JS. But much of what I read, yes.
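The filter-then-throttle approach described here might look roughly like the sketch below. It is only an illustration, not the actual archiver-bot: the skip list, the function names, and the 20-second delay are all assumptions, and the only external detail relied on is the Internet Archive's "Save Page Now" URL pattern.

```python
import time
from urllib.parse import urlparse

# Hypothetical skip list: hosts expected to stay available and too large to mirror.
SKIP_HOSTS = {"en.wikipedia.org"}


def worth_archiving(url, skip_hosts=SKIP_HOSTS):
    """Return True for http(s) URLs that have a host not on the skip list."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or parts.hostname is None:
        return False
    return parts.hostname not in skip_hosts


def save_request_url(url):
    """Build an Internet Archive 'Save Page Now' request URL for the page."""
    return "https://web.archive.org/save/" + url


def drain(queue, submit, delay_seconds=20.0, sleep=time.sleep):
    """Submit each URL that passes the filter, pausing after every submission.

    `submit` is any callable taking the request URL (for example, a function
    that performs an HTTP GET); the delay is an assumed rate limit.
    """
    submitted = []
    for url in queue:
        if not worth_archiving(url):
            continue
        submit(save_request_url(url))
        submitted.append(url)
        sleep(delay_seconds)
    return submitted
```

Passing `submit` and `sleep` in as arguments keeps the sketch testable without touching the network. Pages that load their content via JS would still be captured poorly, as noted above.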

comment by nerfhammer · 2011-09-11T17:06:35.637Z · LW(p) · GW(p)

I'm working on a rationality blog aggregator, and should be ready to make it public in the next few days. Would you like to know when it is released?

Replies from: None
comment by [deleted] · 2011-09-11T17:36:21.624Z · LW(p) · GW(p)

Can you post a link in the discussion section when it's done? I'd be interested in it, and I suspect many others on this site would be as well.

Replies from: nerfhammer
comment by nerfhammer · 2011-09-11T19:40:09.412Z · LW(p) · GW(p)

Yes, I'll do that. I've been looking for places to announce it/request feedback.

comment by Eugine_Nier · 2011-09-14T04:07:17.444Z · LW(p) · GW(p)

You may want to add Katja Grace's blog: http://meteuphoric.wordpress.com/

Replies from: gwern
comment by gwern · 2011-09-15T14:34:22.442Z · LW(p) · GW(p)

Done.

comment by Vladimir_Nesov · 2011-09-11T15:37:46.299Z · LW(p) · GW(p)

Does archive.org plan to implement a download feature and domain archive coverage indicator? (I assume they don't have that, otherwise you'd probably mention it. It would also make sense to publish such incremental archives as distributed version control access points.)

Edit: From the FAQ:

Can people download sites from the Wayback?

Our terms of use specify that users of the Wayback Machine are not to copy data from the collection. If there are special circumstances that you think the Archive should consider, please contact info at archive dot org.

(No explanation is given for why, though.)

Replies from: false_vacuum, None
comment by false_vacuum · 2011-09-16T00:46:00.380Z · LW(p) · GW(p)

But... the only way to view the 'data' is by copying it to my computer! That's how the Internet works!

Replies from: None
comment by [deleted] · 2011-09-19T17:32:53.776Z · LW(p) · GW(p)

I think that legally, the copy in your browser doesn't count somehow, the same way that the copy of a painting that you make by holding a mirror near it doesn't count. I'm guessing the criterion is whether the copy is ephemeral or persistent.

Replies from: thomblake
comment by thomblake · 2011-09-19T19:34:39.574Z · LW(p) · GW(p)

This is a place where copyright law and theory still haven't quite caught up, though there are numerous attempts to make laws about these things while just ignoring facts like "To use software one must often copy a significant part of it into memory".

ETA: There's usually something about being allowed to make copies of software if it "is an essential step in the utilization of the computer program", which is arguably an extension of the "transitory duration" clause (which would cover the 'mirror' case)

comment by [deleted] · 2011-09-11T16:15:12.272Z · LW(p) · GW(p)

I would imagine intellectual property laws.

comment by Oscar_Cunningham · 2011-09-11T15:37:31.981Z · LW(p) · GW(p)

Eliezer's homepage and any papers on the SingInst site?

(I had another suggestion, but it became redundant when I saw who wrote the post.)

Replies from: gwern
comment by gwern · 2011-09-11T16:31:07.827Z · LW(p) · GW(p)

Eliezer's homepage and any papers on the SingInst site?

Eliezer I covered already, and I've added singinst.org to the queue. (Singinst.org yielded 4343 filtered URLs, on-site and off the site, to be archived.)

comment by Mati_Roy (MathieuRoy) · 2020-05-29T10:55:17.397Z · LW(p) · GW(p)

long live Gwern!

comment by Nick_Roy · 2011-10-24T05:07:44.698Z · LW(p) · GW(p)

Yvain's raikoth.net.

Replies from: Nick_Roy
comment by Nick_Roy · 2011-11-07T23:20:04.009Z · LW(p) · GW(p)

I asked this question for the Q&A:

Non-profit organizations like SI need robust, sustainable resource strategies. Donations and grants are not reliable. According to my university Social Entrepreneurship course, social businesses are the best resource strategy available. The Singularity Summit is a profitable and expanding example of a social business. Is SI planning on creating more social businesses (either related or unrelated to the organization's mission) to address long-term funding needs?

I also recently asked this of Luke for his feedback post before the Q&A was up, and he mentioned in his response that SI is continuing to grow the Summit brand in a multifarious manner. Luke also asked me for additional social business ideas, citing a lack of staff working on the issue.

Less Wrong's collective intelligence trumps my own, so I'm fielding it to you. I do have a few ideas, but I'll hold off on proposing solutions at first. I find that this is a fascinating and difficult thought experiment in addition to its usefulness both for SI and as practice in recognizing opportunities.

comment by Eugine_Nier · 2011-09-11T23:55:50.813Z · LW(p) · GW(p)

You should probably add Nick Szabo's other site: http://szabo.best.vwh.net/ in addition to unenumerated.

Replies from: gwern
comment by gwern · 2011-09-12T01:18:08.525Z · LW(p) · GW(p)

Done.

Replies from: Eugine_Nier
comment by Eugine_Nier · 2011-12-04T07:32:00.468Z · LW(p) · GW(p)

Are your archives in a publicly accessible location? http://szabo.best.vwh.net/ is down.

Replies from: gwern
comment by gwern · 2011-12-04T08:16:34.376Z · LW(p) · GW(p)

My archives are not; however I repaired 2 links to Szabo's pages yesterday and both were (as one would hope even in the absence of my efforts) in the Internet Archive.

comment by InquilineKea · 2015-09-17T18:59:00.838Z · LW(p) · GW(p)

Does anyone know if one could convince the Archive Team to archive them? Or does the Archive Team often consist of more difficult personalities?

comment by John_Maxwell (John_Maxwell_IV) · 2011-09-13T01:57:46.888Z · LW(p) · GW(p)

I certainly haven't read all of these; they are just blogs that come to mind as being associated with rationality.

http://www.ribbonfarm.com/

http://www.rolfnelson.com/

http://emergentfool.com/

http://www.sebastianmarshall.com/

http://paulgraham.com/articles.html

http://www.delicious.com/tag/rationality

http://www.halfsigma.com/

http://unqualified-reservations.blogspot.com/2007/09/how-dawkins-got-pwned-part-1.html

Overcoming Bias and perhaps other blogs have blogrolls that might be worth investigating.

Replies from: gwern
comment by gwern · 2011-09-13T15:23:21.467Z · LW(p) · GW(p)

I've done all those except delicious.com, because I don't know how to confine my spidering to just that tag.
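One generic way to confine a spider to a single tag is a scope check on every candidate link, sketched below. This is a hypothetical helper, not a feature of linkchecker or any other tool mentioned in this thread, and it only keeps the crawl on the tag's own listing pages; the off-site bookmarks those pages link to would need separate handling.

```python
from urllib.parse import urlsplit


def in_scope(url, root):
    """Return True if url is on root's host and at or below root's path."""
    candidate, base = urlsplit(url), urlsplit(root)
    if candidate.hostname != base.hostname:
        return False
    root_path = base.path.rstrip("/")
    # Match the root path itself (with any query string) or anything beneath it,
    # but not sibling paths that merely share the same prefix.
    return candidate.path == root_path or candidate.path.startswith(root_path + "/")
```

For example, with `root = "http://www.delicious.com/tag/rationality"`, a paginated listing such as `...tag/rationality?page=2` is in scope, while `...tag/economics` is not.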

Replies from: John_Maxwell_IV, wedrifid
comment by John_Maxwell (John_Maxwell_IV) · 2011-09-16T00:06:51.718Z · LW(p) · GW(p)

I wasn't suggesting you spider everything associated with that tag, just look through it for more blogs. I guess maybe that's too much work?

Replies from: gwern
comment by gwern · 2011-09-16T00:12:53.824Z · LW(p) · GW(p)

At this point, yeah. I now have 56k URLs in the queue, and at 20 seconds a URL... Pareto is the idea here: what are the main sites worth preserving?
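For scale, the figures above work out to roughly 13 days of continuous submission. The arithmetic, assuming "56k" means 56,000 URLs and that URLs are processed strictly one at a time:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def queue_duration_days(n_urls, seconds_per_url):
    """Days needed to work through a queue at a fixed per-URL cost."""
    return n_urls * seconds_per_url / SECONDS_PER_DAY


days = queue_duration_days(56_000, 20)
print(f"{days:.1f} days")  # prints "13.0 days"
```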

Replies from: John_Maxwell_IV
comment by John_Maxwell (John_Maxwell_IV) · 2011-09-18T03:05:58.277Z · LW(p) · GW(p)

I guess Ribbonfarm and Paul Graham would be the 2 big ones from my list.

comment by wedrifid · 2011-09-13T16:33:06.736Z · LW(p) · GW(p)

Try

Replies from: gwern
comment by gwern · 2011-09-13T17:14:41.443Z · LW(p) · GW(p)

How does that help? Going to the Pipe and putting in 'rationality' maxes out at 80 items.

comment by Dr_Manhattan · 2011-09-11T18:39:32.890Z · LW(p) · GW(p)

BTW, technology lock-in aside, I highly recommend things like OfflinePages for iPhone/iPad, as they preserve the full look and feel of the sites (very useful for LW, to see threaded comments). If there were similar solutions that were more open, I'd recommend them even more.

Replies from: gwern
comment by gwern · 2011-09-11T19:51:14.889Z · LW(p) · GW(p)

Sounds like ReadItLater. As far as preservation goes, does that do anything that 'wget --page-requisites' would not?
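For reference, a single-page capture built around that flag might be assembled as in the sketch below. The helper name and the `archive` destination directory are assumptions; the flags are standard GNU wget options, and whether the result matches what OfflinePages produces is not something this sketch can settle.

```python
def wget_page_command(url, dest_dir="archive"):
    """Build a wget argument list that saves one page plus what it needs to render."""
    return [
        "wget",
        "--page-requisites",   # also fetch the images, CSS, and scripts the page uses
        "--convert-links",     # rewrite links so the saved copy works offline
        "--adjust-extension",  # add .html/.css suffixes where the server omits them
        "--directory-prefix=" + dest_dir,  # hypothetical output directory
        url,
    ]
```

The list can be handed to `subprocess.run(..., check=True)` to perform the download.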

Replies from: Dr_Manhattan
comment by Dr_Manhattan · 2011-09-12T12:23:41.188Z · LW(p) · GW(p)

Similar to ReadItLater, but it has scraping capabilities (up to 3 levels, I think) and looks exactly like the page. I haven't used wget in a while; it might be the same as --page-requisites. From previous usage I remember wget-copied sites not looking quite right afterwards, but it might well have been my fault.

comment by gwern · 2011-09-11T15:28:02.118Z · LW(p) · GW(p)

u_ suggests yudkowsky.net which my history says I haven't archived, so I'm adding that into the archive queue.

Replies from: Morendil
comment by Morendil · 2011-09-11T17:37:54.903Z · LW(p) · GW(p)

You be careful with yudkowsky.net - the last few times I visited I was greeted by an error message from the DNS provider. Don't know if Eliezer has fixed that permanently or not.

Replies from: gwern
comment by gwern · 2011-09-11T17:49:45.104Z · LW(p) · GW(p)

Yudkowsky.net's up now; looking over the list of URLs output by the spider, it seemed to be accurate in general.