Zstd Window Size
post by jefftk (jkaufman) · 2025-04-25T14:40:06.742Z
At work we've recently been using zstd as a better-compressing alternative to gzip, and overall I've been pretty happy with it. A minor documentation gripe, though, is that the behavior around multithreaded compression is a bit unclear. I understand that it splits the work into chunks and sends them to different threads to parallelize compression, which means I should expect better use of threads on larger files because there are more chunks to spread around. But what exactly is the relationship between file size and thread use?
When I look in `man zstd` I see that you can set `-B<num>` to specify the size of the chunks, and it's documented as "generally `4 * windowSize`". Except the documentation doesn't say how `windowSize` is set.
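To make the documented rule concrete, here is a minimal sketch of what "generally `4 * windowSize`" would imply for how many chunks a file splits into. The function names, the 4 MiB window, and the 1 GiB file are all hypothetical values chosen for illustration, and the real multithreaded scheduler may well differ in details (the man page itself only says "generally").

```python
import math

MIB = 2 ** 20


def job_size(window_size: int, multiplier: int = 4) -> int:
    """Chunk size under the documented 'generally 4 * windowSize' rule."""
    return multiplier * window_size


def estimated_jobs(file_size: int, window_size: int) -> int:
    """Rough count of chunks a file would be split into (at least one)."""
    return max(1, math.ceil(file_size / job_size(window_size)))


# Hypothetical example: a 4 MiB window and a 1 GiB input.
window = 4 * MIB
print(job_size(window) // MIB, "MiB per job")
print(estimated_jobs(1024 * MIB, window), "jobs")
```

Under these assumptions, a file smaller than one job (16 MiB here) gives a single chunk, so extra threads would have nothing to do.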
From a bit of poking at the source, it looks to me like the way this works is that `windowSize` is `2**windowLog`, and `windowLog` depends on your compression level. If I know I'm doing `zstd -15`, though, how does `compressionLevel=15` translate into a value for `windowLog`? There's a table in `lib/compress/clevels.h` which covers inputs >256KB:
Level | windowLog | chainLog | hashLog | searchLog | minMatch | targetLength | strategy |
---|---|---|---|---|---|---|---|
<1 | 19 | 12 | 13 | 1 | 6 | 1 | fast |
1 | 19 | 13 | 14 | 1 | 7 | 0 | fast |
2 | 20 | 15 | 16 | 1 | 6 | 0 | fast |
3 | 21 | 16 | 17 | 1 | 5 | 0 | dfast |
4 | 21 | 18 | 18 | 1 | 5 | 0 | dfast |
5 | 21 | 18 | 19 | 3 | 5 | 2 | greedy |
6 | 21 | 18 | 19 | 3 | 5 | 4 | lazy |
7 | 21 | 19 | 20 | 4 | 5 | 8 | lazy |
8 | 21 | 19 | 20 | 4 | 5 | 16 | lazy2 |
9 | 22 | 20 | 21 | 4 | 5 | 16 | lazy2 |
10 | 22 | 21 | 22 | 5 | 5 | 16 | lazy2 |
11 | 22 | 21 | 22 | 6 | 5 | 16 | lazy2 |
12 | 22 | 22 | 23 | 6 | 5 | 32 | lazy2 |
13 | 22 | 22 | 22 | 4 | 5 | 32 | btlazy2 |
14 | 22 | 22 | 23 | 5 | 5 | 32 | btlazy2 |
15 | 22 | 23 | 23 | 6 | 5 | 32 | btlazy2 |
16 | 22 | 22 | 22 | 5 | 5 | 48 | btopt |
17 | 23 | 23 | 22 | 5 | 4 | 64 | btopt |
18 | 23 | 23 | 22 | 6 | 3 | 64 | btultra |
19 | 23 | 24 | 22 | 7 | 3 | 256 | btultra2 |
20 | 25 | 25 | 23 | 7 | 3 | 256 | btultra2 |
21 | 26 | 26 | 24 | 7 | 3 | 512 | btultra2 |
22 | 27 | 27 | 25 | 9 | 3 | 999 | btultra2 |
See the source if you're interested in other sizes.
So it looks like `windowSize` is:
- `≤1`: 512K
- `2`: 1M
- `3-8` (default): 2M
- `9-16`: 4M
- `17-19`: 8M
- `20`: 32M
- `21`: 64M
- `22`: 128M
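The sizes above are just `2**windowLog` applied to the table's `windowLog` column. Here is a short sketch that redoes the arithmetic; the dictionary is copied from the table for levels 1 through 22, the helper names are made up for illustration, and sizes are printed in binary units (K = 1024 bytes).

```python
# windowLog per compression level, copied from the table above (inputs >256KB).
WINDOW_LOG = {
    1: 19, 2: 20, 3: 21, 4: 21, 5: 21, 6: 21, 7: 21, 8: 21,
    9: 22, 10: 22, 11: 22, 12: 22, 13: 22, 14: 22, 15: 22, 16: 22,
    17: 23, 18: 23, 19: 23, 20: 25, 21: 26, 22: 27,
}


def window_size(level: int) -> int:
    """Window size in bytes for a level, as 2**windowLog."""
    return 2 ** WINDOW_LOG[level]


def human(n: int) -> str:
    """Format a byte count using binary units when it divides evenly."""
    for unit, size in (("G", 2 ** 30), ("M", 2 ** 20), ("K", 2 ** 10)):
        if n >= size and n % size == 0:
            return f"{n // size}{unit}"
    return str(n)


for level in sorted(WINDOW_LOG):
    print(f"-{level}: windowLog={WINDOW_LOG[level]} -> {human(window_size(level))}")
```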
Probably best not to rely on any of this, but it's good to know what `zstd -<level>` is doing by default!
comment by AB (artem-b) · 2025-04-25T20:17:43.849Z
You can explicitly set `windowLog` with `--zstd=windowLog=...`
It's sometimes useful to combine a low-ish compression level with a large window size, e.g. when the input data contains multiple similar large chunks that do not fit into the low-compression-level window.
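As a rough illustration of this suggestion, the sketch below picks the smallest `windowLog` whose window spans a given distance between two similar chunks, then builds (without running) a command line using the flag mentioned above. The helper names, the 100 MiB distance, level 3, and the `data.bin` filename are all hypothetical. Whether zstd actually finds the long-range matches also depends on the other parameters for that level, so treat this as back-of-the-envelope arithmetic only.

```python
def min_window_log(distance_bytes: int) -> int:
    """Smallest windowLog such that 2**windowLog >= distance_bytes."""
    if distance_bytes < 1:
        raise ValueError("distance must be positive")
    return (distance_bytes - 1).bit_length()


def zstd_args(level: int, window_log: int, path: str) -> list[str]:
    """Build a zstd argument list; nothing is executed here."""
    return ["zstd", f"-{level}", f"--zstd=windowLog={window_log}", path]


# Hypothetical case: similar chunks sit about 100 MiB apart in the input.
wlog = min_window_log(100 * 2 ** 20)
print(wlog)
print(" ".join(zstd_args(3, wlog, "data.bin")))
```

A larger window also means more memory use, so it is worth raising `windowLog` only as far as the data layout seems to need.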