Looking at RSS User-Agents
post by jefftk (jkaufman) · 2021-02-05T02:10:06.551Z · LW · GW · 2 comments
An RSS reader sends periodic requests to get the latest feed. This includes a User-Agent field, identifying which fetcher is running:
Feedbin feed-id:1242010 - 38 subscribers
This fetcher is nicely passing along statistics, saying how many readers it represents.
I took one day of logs, with 5,962 requests for my RSS feed:
$ sudo grep '"GET /news.rss ' \
    /var/log/nginx/access.log.1 \
  | awk -F'"' '{print $6}' \
  | wc -l
5962

There were 162 unique User-Agents:

$ sudo grep '"GET /news.rss ' \
    /var/log/nginx/access.log.1 \
  | awk -F'"' '{print $6}' \
  | sort \
  | uniq \
  | wc -l
162

Of the 5,962 requests, 932 (16%) gave stats:

$ sudo grep '"GET /news.rss ' \
    /var/log/nginx/access.log.1 \
  | awk -F'"' '{print $6}' \
  | grep 'subscriber\|reader' \
  | wc -l
932

They sent 21 distinct User-Agents:

$ sudo grep '"GET /news.rss ' \
    /var/log/nginx/access.log.1 \
  | awk -F'"' '{print $6}' \
  | grep 'subscriber\|reader' \
  | sort \
  | uniq \
  | wc -l
21

Some sent multiple requests with different numbers of subscribers:
Feedbin feed-id:1242010 - 38 subscribers
Feedbin feed-id:372940 - 11 subscribers
Feedbin feed-id:382 - 1 subscribers

I suspect this comes from people using old URLs that then get redirected to my current URL. For example, now it's https://www.jefftk.com/news.rss, but it used to be http://www.jefftk.com/news.rss, and even longer ago it was an sccs.swarthmore.edu address. Summing subscriber counts, I see:
- Feedly: 573
- inoreader.com: 87
- NewsBlur: 62
- Feedbin: 50
- theoldreader.com: 34
- Dreamwidth Studios: 7
- BazQux: 5
- Bloglovin: 2
- Feed Wrangler: 2
- pine.blog: 1
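
The post does not show how the per-service totals were computed, so the following is only a minimal sketch of one way to do it. It assumes the input is the list of distinct User-Agent strings, that the service name is whatever comes before the first "/" or space, and that the count is the number directly before "subscriber" or "reader". The function name `sum_subscribers` is hypothetical. If a service's reported count changes during the day, summing over distinct User-Agents would count that feed more than once.

```shell
#!/bin/sh
# Sum self-reported subscriber counts per service (a sketch, not the author's method).
# Input: distinct User-Agent strings, one per line, on stdin, for example:
#   ... | awk -F'"' '{print $6}' | sort -u | sum_subscribers
# Assumptions: service name = text before the first "/" or space;
# count = the number directly before "subscriber(s)" or "reader(s)".
sum_subscribers() {
  awk '
    {
      # Skip User-Agents that do not report a count.
      if (!match($0, /[0-9]+ (subscriber|reader)/)) next
      count = substr($0, RSTART, RLENGTH)
      sub(" .*", "", count)        # keep only the digits
      name = $0
      sub("[/ ].*", "", name)      # keep text before the first "/" or space
      total[name] += count
    }
    END { for (name in total) print total[name], name }
  ' | sort -k1,1nr -k2,2
}
```

Fed the three Feedbin strings shown earlier, this adds 38 + 11 + 1 and reports 50 for Feedbin, which agrees with the list above.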
Different services fetched at different intervals. Taking the shortest interval for each distinct User-Agent:
- Feedly: 7min
- Feedbin: 15min
- Bloglovin: 30min
- Dreamwidth Studios: 30min
- Feed Wrangler: 30min
- NewsBlur: 30min
- BazQux: 40min
- inoreader.com: 1hr
- theoldreader.com: 2hr
- pine.blog: 24hr
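
The shortest interval per User-Agent can be pulled from the same log. This is a rough sketch rather than the method used for the numbers above: it assumes nginx's default "combined" log format (consistent with the User-Agent being field 6 when splitting on double quotes), that lines are in chronological order, and that timestamps share one time zone. The function name `min_interval` is hypothetical. Gaps of zero or less are ignored, which drops same-second repeats and the single gap that crosses midnight.

```shell
#!/bin/sh
# Print the shortest gap, in seconds, between consecutive requests from each
# User-Agent (a sketch under the assumptions stated above).
# Input: nginx "combined" format access log lines on stdin, oldest first.
min_interval() {
  awk -F'"' '
    {
      # $1 looks like: addr - user [dd/Mon/yyyy:HH:MM:SS +zone]
      i = index($1, "[")
      if (i == 0) next
      n = split(substr($1, i + 1), t, ":")
      if (n < 4) next
      # Seconds since midnight; only the first two characters of t[4] are seconds.
      now = t[2] * 3600 + t[3] * 60 + substr(t[4], 1, 2)
      ua = $6
      if (ua in last) {
        gap = now - last[ua]
        if (gap > 0 && (!(ua in best) || gap < best[ua])) best[ua] = gap
      }
      last[ua] = now
    }
    END { for (ua in best) printf "%d\t%s\n", best[ua], ua }
  ' | sort -n
}
```

Something like `sudo grep '"GET /news.rss ' /var/log/nginx/access.log.1 | min_interval` would then list each User-Agent with its smallest gap, shortest first. A fetcher that follows a redirect from an old URL can show up as two requests seconds apart, so very small values deserve a second look.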
2 comments
comment by Darshan (darshanrampatel) · 2021-02-05T10:11:30.016Z · LW(p) · GW(p)
I use QuiteRSS, which sets its User-Agent to "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"
comment by jefftk (jkaufman) · 2021-02-05T17:27:25.456Z · LW(p) · GW(p)
Masquerading as Chrome is a mildly inconsiderate choice for an RSS reader to make, especially in not including a token for their own site. User Agent strings for visiting websites are a mess because of a history of people coding only to the dominant browser, but RSS does not have that history.
You do see things like Feedly using "Feedly/1.0 (+http://www.feedly.com/fetcher.html; 452 subscribers; like FeedFetcher-Google)", where they include the FeedFetcher-Google token, but there's really no reason to pretend to be a browser.
Looks like QuiteRSS has pretended to be a browser for years: https://github.com/QuiteRSS/quiterss/commit/38ad3ce6e72f90036f1db14568f33dbf346fc1b3
Opera/9.80 (Windows NT 6.1; U; YB/3.5.1; ru) Presto/2.10.229 Version/11.62