Zstd Window Size

post by jefftk (jkaufman) · 2025-04-25T14:40:06.742Z

At work we've recently been using zstd as a better-compressing alternative to gzip, and overall I've been pretty happy with it. A minor documentation gripe, though, is that the behavior around multithreaded compression is a bit unclear. I understand that it splits the input into chunks and hands those chunks to different threads to parallelize compression, so I should expect better use of threads on larger files, where there are more chunks to spread around. But what exactly is the relationship between file size and the number of chunks?

When I look in man zstd I see that you can set -B<num> to specify the size of the chunks, and the default is documented as "generally 4 * windowSize". But the documentation doesn't say how windowSize is set.

From a bit of poking at the source, it looks to me like the way this works is that windowSize is 2**windowLog, and windowLog depends on your compression level. If I know I'm doing zstd -15, though, how does compressionLevel=15 translate into a value for windowLog? There's a table in lib/compress/clevels.h which covers inputs >256KB:

Level  windowLog  chainLog  hashLog  searchLog  minMatch  targetLength  strategy
<1     19         12        13       1          6         1             fast
1      19         13        14       1          7         0             fast
2      20         15        16       1          6         0             fast
3      21         16        17       1          5         0             dfast
4      21         18        18       1          5         0             dfast
5      21         18        19       3          5         2             greedy
6      21         18        19       3          5         4             lazy
7      21         19        20       4          5         8             lazy
8      21         19        20       4          5         16            lazy2
9      22         20        21       4          5         16            lazy2
10     22         21        22       5          5         16            lazy2
11     22         21        22       6          5         16            lazy2
12     22         22        23       6          5         32            lazy2
13     22         22        22       4          5         32            btlazy2
14     22         22        23       5          5         32            btlazy2
15     22         23        23       6          5         32            btlazy2
16     22         22        22       5          5         48            btopt
17     23         23        22       5          4         64            btopt
18     23         23        22       6          3         64            btultra
19     23         24        22       7          3         256           btultra2
20     25         25        23       7          3         256           btultra2
21     26         26        24       7          3         512           btultra2
22     27         27        25       9          3         999           btultra2

See the source if you're interested in smaller inputs, which get their own tables.

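So it looks like windowSize (2**windowLog, for inputs >256KB) is:

Level ≤1: 512 KiB
Level 2: 1 MiB
Levels 3-8: 2 MiB
Levels 9-16: 4 MiB
Levels 17-19: 8 MiB
Level 20: 32 MiB
Level 21: 64 MiB
Level 22: 128 MiB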
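Since windowSize is 2**windowLog, the sizes follow directly from the windowLog column of the table above. Here is a minimal Python sketch of that arithmetic. The `WINDOW_LOG` list is copied by hand from the table, and the names `window_size` and `human` are just for illustration, not anything zstd exposes.

```python
# windowLog column copied by hand from the table above (inputs > 256 KB).
# Index 0 is the "<1" row; indices 1..22 are levels 1..22.
WINDOW_LOG = [19, 19, 20, 21, 21, 21, 21, 21, 21,
              22, 22, 22, 22, 22, 22, 22, 22,
              23, 23, 23, 25, 26, 27]


def window_size(level: int) -> int:
    """Return the table's window size in bytes for a compression level."""
    if level == 0 or level > 22:
        # The table has no row for these, so this sketch refuses to guess.
        raise ValueError("level must be negative or in 1..22")
    row = max(level, 0)  # all negative levels share the "<1" row
    return 1 << WINDOW_LOG[row]


def human(n: int) -> str:
    """Format a power-of-two byte count as KiB or MiB."""
    return f"{n >> 20} MiB" if n >= 1 << 20 else f"{n >> 10} KiB"


if __name__ == "__main__":
    for level in range(1, 23):
        print(f"-{level}: {human(window_size(level))}")
```

For example, `window_size(15)` is 4 MiB, so the case I started with, zstd -15, gets a 4 MiB window.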

Probably best not to rely on any of this, but it's good to know what zstd -<level> is doing by default!
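With that caveat, here is a rough sketch of the file-size relationship I was asking about at the top. It assumes the chunk size is exactly 4 * windowSize, which the man page only says is "generally" true, so treat the result as an estimate. The `estimated_jobs` function is hypothetical, and the `WINDOW_LOG` dict holds just a few levels copied from the table above.

```python
import math

# windowLog for a few levels, copied from the table above (inputs > 256 KB).
WINDOW_LOG = {1: 19, 3: 21, 15: 22, 19: 23, 22: 27}


def estimated_jobs(file_size: int, level: int) -> int:
    """Estimate how many chunks a file splits into at a given level.

    ASSUMPTION: chunk size is exactly 4 * windowSize. The real value may
    differ, and -B<num> overrides it.
    """
    window = 1 << WINDOW_LOG[level]
    chunk = 4 * window
    return max(1, math.ceil(file_size / chunk))


if __name__ == "__main__":
    one_gib = 1 << 30
    for level in sorted(WINDOW_LOG):
        print(f"-{level}: ~{estimated_jobs(one_gib, level)} chunks for 1 GiB")
```

Under that assumption a 1 GiB file splits into about 64 chunks at -15 but only 2 at -22, which would explain why high levels need much larger files before extra threads help.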

1 comment

comment by AB (artem-b) · 2025-04-25T20:17:43.849Z

You can explicitly set windowLog with --zstd=windowLog=...

It's sometimes useful to combine a low-ish compression level with a large window size, e.g. when the input data contains multiple similar large chunks that do not fit into the low-compression-level window.