Looking at RSS User-Agents

post by jefftk (jkaufman) · 2021-02-05T02:10:06.551Z · LW · GW · 2 comments

An RSS reader sends periodic requests to get the latest feed. This includes a User-Agent field, identifying which fetcher is running:

Feedbin feed-id:1242010 - 38 subscribers

This fetcher helpfully passes along statistics, reporting how many subscribers it represents.

I took one day of logs, with 5,962 requests for my RSS feed:

$ sudo grep '"GET /news.rss ' \
    /var/log/nginx/access.log.1 \
  | awk -F'"' '{print $6}' \
  | wc -l
5962

There were 162 unique User-Agents:

$ sudo grep '"GET /news.rss ' \
    /var/log/nginx/access.log.1 \
  | awk -F'"' '{print $6}' \
  | sort \
  | uniq \
  | wc -l
162
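
To see which fetchers account for most of the traffic, the same extraction can be extended to tally requests per User-Agent. This is a minimal sketch rather than a command from the original analysis: the function name `count_by_agent` is made up for illustration, and it assumes User-Agent strings arrive on stdin, one per line.

```shell
# Hypothetical helper: tally requests per User-Agent, busiest first.
# Reads one User-Agent string per line on stdin.
count_by_agent() {
  awk '
    { requests[$0]++ }
    END { for (agent in requests) print requests[agent] "\t" agent }
  ' | sort -rn
}

# Possible usage, feeding it the same extraction as above:
#   sudo grep '"GET /news.rss ' /var/log/nginx/access.log.1 \
#     | awk -F'"' '{print $6}' \
#     | count_by_agent
```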

Of the 5,962 requests, 932 (16%) gave stats:

$ sudo grep '"GET /news.rss ' \
    /var/log/nginx/access.log.1 \
  | awk -F'"' '{print $6}' \
  | grep 'subscriber\|reader' \
  | wc -l
932

They sent 21 distinct User-Agents:

$ sudo grep '"GET /news.rss ' \
    /var/log/nginx/access.log.1 \
  | awk -F'"' '{print $6}' \
  | grep 'subscriber\|reader' \
  | sort \
  | uniq \
  | wc -l
21

Some services sent requests under several User-Agents, each with a different number of subscribers:

Feedbin feed-id:1242010 - 38 subscribers
Feedbin feed-id:372940 - 11 subscribers
Feedbin feed-id:382 - 1 subscribers

I suspect this comes from people using old URLs that then get redirected to my current URL. For example, now it's https://www.jefftk.com/news.rss, but it used to be http://www.jefftk.com/news.rss, and even longer ago it was an sccs.swarthmore.edu address.

Summing subscriber counts, I see:

While this only tells us about users who are subscribed to my blog, it seems like Feedly is the biggest player here by a lot.
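
The summing step can be sketched as a small shell function. This is an assumption about how one might do it, not necessarily how the totals were produced: the name `sum_subscribers` is hypothetical, the service name is taken to be everything before the first `/` or space, and each distinct User-Agent string is counted once. If a fetcher's advertised count changes during the day, both strings get counted, so the totals are only approximate.

```shell
# Hypothetical helper: total the advertised subscriber counts per service.
# Reads one User-Agent string per line on stdin.
sum_subscribers() {
  sort -u \
    | awk '
        # Only consider agents that advertise "<N> subscriber(s)" or "<N> reader(s)".
        match($0, /[0-9]+ (subscriber|reader)/) {
          count = substr($0, RSTART, RLENGTH) + 0   # leading digits become the number
          split($0, parts, "[/ ]")                  # parts[1] is the assumed service name
          total[parts[1]] += count
        }
        END { for (name in total) print total[name], name }
      ' \
    | sort -rn
}

# Possible usage:
#   sudo grep '"GET /news.rss ' /var/log/nginx/access.log.1 \
#     | awk -F'"' '{print $6}' \
#     | sum_subscribers
```

With the three Feedbin strings shown earlier, this would report 50 for Feedbin (38 + 11 + 1).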

Different services fetched at different intervals. Taking the shortest interval for each distinct User-Agent:
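
One way to compute the shortest interval per User-Agent is sketched below. It is an illustration under stated assumptions, not the original method: the function name `shortest_intervals` is hypothetical, the input is nginx "combined" format log lines in chronological order, and all timestamps fall within a single calendar month and time zone, because only the day of month and the time of day are used.

```shell
# Hypothetical helper: shortest gap, in seconds, between consecutive
# requests from each User-Agent. Reads combined-format log lines on stdin.
shortest_intervals() {
  awk -F'"' '
    {
      # $1 ends with a timestamp like "[04/Feb/2021:06:25:01 +0000] ".
      stamp = substr($1, index($1, "[") + 1)
      split(stamp, f, "[/: ]")      # f[1]=day, f[4]=hour, f[5]=minute, f[6]=second
      now = f[1] * 86400 + f[4] * 3600 + f[5] * 60 + f[6]
      agent = $6
      if (agent in last) {
        gap = now - last[agent]
        if (!(agent in shortest) || gap < shortest[agent]) shortest[agent] = gap
      }
      last[agent] = now
    }
    END { for (agent in shortest) print shortest[agent] "\t" agent }
  ' | sort -n
}

# Possible usage:
#   sudo grep '"GET /news.rss ' /var/log/nginx/access.log.1 | shortest_intervals
```

User-Agents seen only once produce no line. A desktop reader run by several people shares one User-Agent, so its shortest gap can be much smaller than any individual's polling interval.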

Looking through the requests that don't list subscribers, several do seem to be services. I'll try reaching out to them to see if they're interested in adding subscriber counts to their User-Agents.

2 comments

comment by Darshan (darshanrampatel) · 2021-02-05T10:11:30.016Z · LW(p) · GW(p)

I use QuiteRSS, which sets its User-Agent to "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"

Source: https://github.com/QuiteRSS/quiterss/blob/271c55756dcf19ca163a603a54d12f165548a602/src/main/globals.cpp#L98

comment by jefftk (jkaufman) · 2021-02-05T17:27:25.456Z · LW(p) · GW(p)

Masquerading as Chrome is a mildly inconsiderate choice for an RSS reader to make, especially without including a token of their own. User-Agent strings for visiting websites are a mess because of a history of people coding only to the dominant browser, but RSS does not have that history.

You do see things like Feedly using Feedly/1.0 (+http://www.feedly.com/fetcher.html; 452 subscribers; like FeedFetcher-Google), where they include the FeedFetcher-Google token, but there's really no reason to pretend to be a browser.

Looks like QuiteRSS has pretended to be a browser for years: https://github.com/QuiteRSS/quiterss/commit/38ad3ce6e72f90036f1db14568f33dbf346fc1b3

Opera/9.80 (Windows NT 6.1; U; YB/3.5.1; ru) Presto/2.10.229 Version/11.62