Why Prefetch Is Broken
post by jefftk (jkaufman) · 2021-05-28T02:20:02.388Z
When coding a webpage, sometimes you know something is very likely to be needed, even if it's not needed yet. You can give the browser a hint:
<link rel=prefetch href=url>
The browser will take a note, and then when it doesn't have anything more important to do it might request url. Later on, if it does turn out to need url, it will already have it.
For example, I wrote a slideshow where each slide was essentially:
Slide N:
  <img src=img_N>
  <link rel=prefetch href=img_N+1>
Prefetching the image for slide N+1 when viewing slide N made each transition practically instant.
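Here is a rough sketch of the slideshow pattern: a small Python function that renders one slide's markup plus a prefetch hint for the next image. The original slideshow's code isn't shown, so the img_N.jpg file names and the slide_html helper are hypothetical.

```python
def slide_html(n: int, total: int) -> str:
    """Return markup for slide n (1-based) out of `total` slides.

    Hypothetical naming scheme: slide n shows img_{n}.jpg.
    """
    parts = [f'<img src="img_{n}.jpg">']
    # Hint that the next slide's image will probably be needed soon,
    # but only if there is a next slide.
    if n < total:
        parts.append(f'<link rel="prefetch" href="img_{n + 1}.jpg">')
    return "\n".join(parts)


if __name__ == "__main__":
    print(slide_html(1, 3))
```

The last slide gets no hint, because there is nothing left to prefetch.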
This works for images, but also works for CSS, JS, HTML, anything! Or, at least, it used to.
The browser stores URLs it fetches in a cache. At its simplest this looks like a big dictionary, from url to the contents of that url:
a.test/js | javascript1
b.test/js | javascript2
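To make the dictionary picture concrete, here is a toy Python model of an unpartitioned cache. SimpleCache and its network_fetches counter are hypothetical names used for illustration. Real browser caches also handle expiry, revalidation, and much more.

```python
class SimpleCache:
    """Toy unpartitioned cache: a single dictionary from URL to contents."""

    def __init__(self) -> None:
        self.entries: dict[str, str] = {}
        self.network_fetches = 0  # how many times we went to the network

    def load(self, url: str) -> str:
        """Return the contents of url, fetching only on a cache miss."""
        if url not in self.entries:
            self.network_fetches += 1
            self.entries[url] = f"<contents of {url}>"  # stand-in for real bytes
        return self.entries[url]


if __name__ == "__main__":
    cache = SimpleCache()
    cache.load("a.test/js")  # while visiting a.test
    cache.load("a.test/js")  # later, while visiting b.test: a cache hit
    print(cache.network_fetches)  # 1
```

The key is only the URL, so the second load comes from the cache no matter which site triggered it.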
Unfortunately, attackers can abuse this to learn about your browsing on other sites, and all the major browsers (Safari, Chrome, Firefox) have now partitioned their caches. This means that if you are on a.test and load a.test/js, that JS will not be reused if you go to b.test and load a.test/js again. The dictionary's keys now look like (site, url):
a.test:a.test/js | javascript1
b.test:a.test/js | javascript1
b.test:b.test/js | javascript2
Even if the keys a.test:a.test/js and b.test:a.test/js both have exactly the same JS bytes, they need to be kept separate to avoid a privacy leak.
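One well-known form of this abuse is a timing probe. A page on b.test loads a.test/js and measures how quickly it arrives. A very fast load suggests the file was already cached, which means you had recently visited a.test. Keying entries by (site, url) closes that channel. The toy Python model below reproduces the three entries from the table above. PartitionedCache is a hypothetical name, and the model illustrates the idea rather than how any particular browser stores its cache.

```python
class PartitionedCache:
    """Toy cache keyed by (top-level site, url) instead of url alone."""

    def __init__(self) -> None:
        self.entries: dict[tuple[str, str], str] = {}
        self.network_fetches = 0

    def load(self, site: str, url: str) -> str:
        """Load url while the user is visiting `site`."""
        key = (site, url)
        if key not in self.entries:
            self.network_fetches += 1
            self.entries[key] = f"<contents of {url}>"  # stand-in for real bytes
        return self.entries[key]


if __name__ == "__main__":
    cache = PartitionedCache()
    cache.load("a.test", "a.test/js")  # visiting a.test
    cache.load("b.test", "a.test/js")  # visiting b.test: separate entry, refetched
    cache.load("b.test", "b.test/js")
    print(cache.network_fetches)  # 3
```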
So now imagine you are a modern browser visiting a.test and you encounter:
<link rel=prefetch href=b.test/index.html>
Where should you store it in your cache? Well, it depends on what the user is going to do. If they are going to click on a link to b.test/index.html, then when they need the HTML they will be visiting b.test, and so you want to store it as b.test:b.test/index.html. On the other hand, if it's going to load in an iframe, the user will still be on a.test, and so you want to store it as a.test:b.test/index.html. You just don't know. Just guess?
The guess is a risky one: if you store it under the wrong key, then you'll have to fetch the same resource again just to store it under the right key. Users will see double fetching.
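Here is a small Python simulation of that risk. It assumes scheme-less URLs like the ones in this post. It also treats the host as the site: site_of is a simplification that ignores the public suffix list. partition_site and simulate are hypothetical helpers. The prefetch is stored under the key for a guessed destination, and the later real use looks up the key for the actual destination.

```python
def site_of(url: str) -> str:
    # Simplification: the host of a scheme-less URL stands in for its site.
    return url.split("/", 1)[0]


class PartitionedCache:
    """Toy cache keyed by (top-level site, url)."""

    def __init__(self) -> None:
        self.entries: dict[tuple[str, str], str] = {}
        self.network_fetches = 0

    def load(self, site: str, url: str) -> str:
        key = (site, url)
        if key not in self.entries:
            self.network_fetches += 1
            self.entries[key] = f"<contents of {url}>"
        return self.entries[key]


def partition_site(current_site: str, url: str, dest: str) -> str:
    """Top-level site the resource will be used under, for a destination."""
    if dest == "document":
        return site_of(url)  # clicking through makes url's site top-level
    if dest == "iframe":
        return current_site  # an iframe leaves the user on the current site
    raise ValueError(f"unknown destination: {dest}")


def simulate(guess: str, actual: str) -> int:
    """Prefetch under a guessed destination, then use it; count fetches."""
    cache = PartitionedCache()
    current_site, url = "a.test", "b.test/index.html"
    cache.load(partition_site(current_site, url, guess), url)   # the prefetch
    cache.load(partition_site(current_site, url, actual), url)  # the real use
    return cache.network_fetches


if __name__ == "__main__":
    print(simulate("document", "document"))  # 1: guessed right
    print(simulate("iframe", "document"))    # 2: guessed wrong, double fetch
```

A correct guess costs one request. A wrong guess costs two, which is the double fetching described above.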
It turns out browsers guess differently here. I made test pages (iframe, new page) and while Firefox guesses you'll load it in an iframe, Safari (with the experimental LinkPrefetch setting enabled) and Chrome guess you'll load it in a new page.
Except, this probably implies more of a decision than there actually was. I doubt anyone explicitly considered the probability that a prefetched resource would be used by an iframe. Instead, my guess is that when updating an enormous amount of code to add cache keys, multiple developers just ended up coding different things.
I've filed a spec issue (#6723) proposing:
<link rel=prefetch href=b.test/index.html as=iframe>
<link rel=prefetch href=b.test/index.html as=document>
Here's hoping browsers are interested in fixing this and stopping those double fetches.
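Here is a hypothetical sketch of how a browser could use such a hint when choosing a cache key. It reflects my reading of the proposal and is not text from the spec issue. prefetch_cache_key and default_guess are illustrative names. The fallback stands in for whatever guess a browser makes today when there is no hint. URLs are scheme-less, and site_of treats the host as the site.

```python
from typing import Optional


def site_of(url: str) -> str:
    # Simplification: the host of a scheme-less URL stands in for its site.
    return url.split("/", 1)[0]


def prefetch_cache_key(current_site: str, url: str, as_hint: Optional[str],
                       default_guess: str = "document") -> tuple[str, str]:
    """Choose the (site, url) key for a prefetched resource (hypothetical)."""
    # With no as= hint the browser is back to guessing.
    dest = as_hint if as_hint is not None else default_guess
    if dest == "document":
        return (site_of(url), url)   # will be used as a new top-level page
    if dest == "iframe":
        return (current_site, url)   # will be embedded in the current page
    raise ValueError(f"unsupported as= value: {dest}")


if __name__ == "__main__":
    print(prefetch_cache_key("a.test", "b.test/index.html", "iframe"))
    # ('a.test', 'b.test/index.html')
    print(prefetch_cache_key("a.test", "b.test/index.html", "document"))
    # ('b.test', 'b.test/index.html')
```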
(Disclosure: I work for Google, but not on Chrome. Speaking only for myself.)
7 comments
comment by b3b00 · 2021-06-02T07:55:25.894Z
Hello,
Is the cache key the whole domain or only the main domain + TLD?
i.e., for www1.foobar.com, is it:
1. foobar.com
2. www1.foobar.com
If it's case 2, then it's really annoying for sites that load-balance across subdomains www1, www2, ..., www(n), isn't it?
↑ comment by jefftk (jkaufman) · 2021-06-02T16:03:31.992Z
The former: it is the site, not the domain. Browsers use the public suffix list to determine what the site is.
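Here is a simplified sketch of how a site can be derived from a hostname with the public suffix list. The idea is to find the longest listed suffix the host ends with and keep one more label, which is often called eTLD+1. PUBLIC_SUFFIXES is a tiny hypothetical excerpt. The real list also has wildcard and exception rules that this sketch ignores. In the specs, a site also takes the URL's scheme into account.

```python
# Tiny, hypothetical excerpt of the public suffix list (the real one has
# thousands of entries plus wildcard "*." and exception "!" rules).
PUBLIC_SUFFIXES = {"com", "co.uk", "github.io", "test"}


def site_for_host(host: str) -> str:
    """Return the registrable domain (eTLD+1) for host, simplified."""
    labels = host.lower().rstrip(".").split(".")
    # Scanning from the left, the first match is the longest listed suffix.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            # Keep one label beyond the suffix; a bare suffix is returned as-is.
            return ".".join(labels[max(i - 1, 0):])
    # No listed suffix matched: treat the last label as the suffix.
    return ".".join(labels[-2:])


if __name__ == "__main__":
    for host in ["www1.foobar.com", "www2.foobar.com", "user.github.io"]:
        print(host, "->", site_for_host(host))
```

With this, www1.foobar.com and www2.foobar.com both map to foobar.com, so load-balanced subdomains fall under the same site.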
comment by knite · 2021-06-01T20:58:19.514Z
This post is a bit hard to parse - please consider replacing "a.test" with something like "test.com/a" or "a.test.com/page" to clarify whether the issue is per-page caching or per-domain caching.
↑ comment by jefftk (jkaufman) · 2021-06-02T16:04:48.893Z
a.test is a test domain, while test.com is a real domain that someone owns.
comment by Selueen (selueen) · 2021-05-28T10:01:40.394Z
> Where should you store it in your cache? Well, it depends what the user is going to do. If they are going to click on a link to b.test/index.html, then when they need the HTML they will be visiting b.test and so you want to store it as b.test:b.test/index.html. On the other hand, if it's going to load in an iframe, the user will still be on a.test and so you want to store it as a.test:b.test/index.html. You just don't know. Just guess?
> The guess is a risky one: if you store it under the wrong key then you'll have fetch the same resource again just to store it under the right key. Users will see double fetching.
Is there an option for browsers to fetch the resource once, but store it under both possible keys?
What would be the downside for that?
What I really want to say is: is there a way to fix the problem without changing the specification?
Would it be too technically difficult? Too costly?
Changing the specification is surely an elegant solution, but then you need everyone to learn about the changes and implement them in their work, and I feel like that process is always slow and painful, and HTML specs are so complicated already. And then there are so many websites that are already developed but are not supported properly.
I'm not a web developer, my understanding of this problem is very surface-level - I apologize if my questions sound stupid to you.
↑ comment by jefftk (jkaufman) · 2021-05-28T12:33:55.216Z
Unfortunately, this won't work either, because the ways the browser fetches a resource to be displayed in an iframe on an existing page versus as a new top-level page have diverged. For example, browsers either don't send cookies in third-party contexts or soon won't, and if you are prefetching a resource from a different site, the iframe case is third-party while the top-level case is first-party.
Similarly, browsers that support Sec-Fetch-Dest explicitly tell the server what context the resource they are fetching will be displayed in:
// Top-level navigations' destinations are "document"
Sec-Fetch-Dest: document
// <iframe> navigations' destinations are "iframe"
Sec-Fetch-Dest: iframe
https://www.w3.org/TR/fetch-metadata/
Overall, this means that if you wanted prefetch to work for both of these, you would be requiring prefetch to make two separate requests to the server, when the developer almost always could tell you which one of the two they needed.
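To see why one response can't simply be filed under both keys, here is a hypothetical Python sketch of the request a browser might build for the same URL in each context. It models only two of the differences mentioned above: the Sec-Fetch-Dest value and whether cookies are attached. It assumes a browser that blocks third-party cookies. request_headers and the session=abc cookie are illustrative, and real requests differ in more ways.

```python
def site_of(url: str) -> str:
    # Simplification: the host of a scheme-less URL stands in for its site.
    return url.split("/", 1)[0]


def request_headers(current_site: str, url: str, dest: str,
                    cookie: str) -> dict[str, str]:
    """Build a (very partial) set of request headers for fetching url."""
    headers = {"Sec-Fetch-Dest": dest}
    # A top-level navigation makes url's site the top-level site; an iframe
    # leaves the user on current_site.
    top_level_site = site_of(url) if dest == "document" else current_site
    # Assumption: cookies are only sent in a first-party context.
    if top_level_site == site_of(url):
        headers["Cookie"] = cookie
    return headers


if __name__ == "__main__":
    url = "b.test/index.html"
    print(request_headers("a.test", url, "document", "session=abc"))
    # {'Sec-Fetch-Dest': 'document', 'Cookie': 'session=abc'}
    print(request_headers("a.test", url, "iframe", "session=abc"))
    # {'Sec-Fetch-Dest': 'iframe'}
```

The server sees different cookies and a different Sec-Fetch-Dest in the two cases, so it may legitimately send different responses. A browser that wanted both entries would therefore need two requests.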
↑ comment by Selueen (selueen) · 2021-05-28T12:55:52.549Z
Thanks for your answer!
Yeah, additional requests definitely defeat the point.
I suppose any other attempts to solve this on the browser side make no sense either, because of the same safety concerns that caused the problem in the first place?
In which case it looks like your solution is the only reasonable way to go.