Re: multithreaded zstd backup compression for client and server

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: multithreaded zstd backup compression for client and server
Date: 2022-03-27 20:50:21
Message-ID: 20220327205020.GM28503@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 23, 2022 at 06:57:04PM -0400, Robert Haas wrote:
> On Wed, Mar 23, 2022 at 5:52 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > Also because the library may not be compiled with threading. A few days ago, I
> > tried to rebase the original "parallel workers" patch over the COMPRESS DETAIL
> > patch but then couldn't test it, even after trying various versions of the zstd
> > package and trying to compile it locally. I'll try again soon...
>
> Ah. Right, I can update the comment to mention that.

Actually, I suggest to remove those comments:
| "We check for failure here because..."

That should be the rule rather than the exception, so shouldn't require
justifying why one might checks the return value of library and system calls.

In bbsink_zstd_new(), I think you need to check to see if workers were
requested (same as the issue you found with "level"). If someone builds
against a version of zstd which doesn't support some parameter, you'll
currently call SetParameter with that flag anyway, with a default value.
That's not currently breaking anything for me (even though workers=N doesn't
work) but I think it's fragile and could break, maybe when compiled against an
old zstd, or with future options. SetParameter should only be called when the
user requested to set the parameter. I handled that for workers in 003, but
didn't touch "level", which is probably fine, but maybe should change for
consistency.

src/backend/replication/basebackup_zstd.c: elog(ERROR, "could not set zstd compression level to %d: %s",
src/bin/pg_basebackup/bbstreamer_gzip.c: pg_log_error("could not set compression level %d: %s",
src/bin/pg_basebackup/bbstreamer_zstd.c: pg_log_error("could not set compression level to: %d: %s",

I'm not sure why these messages sometimes mention the current compression
method and sometimes don't. I suggest that they shouldn't - errcontext will
have the algorithm, and the user already specified it anyway. It'd allow the
compiler to merge strings.

Here's a patch for zstd --long mode. (I don't actually use pg_basebackup, but
I will want to use long mode with pg_dump). The "strategy" params may also be
interesting, but I haven't played with it. rsyncable is certainly interesting,
but currently an experimental, nonpublic interface - and a good example of why
to not call SetParameter for params which the user didn't specify: PGDG might
eventually compile postgres against a zstd which supports rsyncable flag. And
someone might install somewhere which doesn't support rsyncable, but the server
would try to call SetParameter(rsyncable, 0), and the rsyncable ID number
would've changed, so zstd would probably reject it, and basebackup would be
unusable...

$ time src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method=none --no-manifest -Z zstd:long=1 --checkpoint fast |wc -c
4625935
real 0m1,334s

$ time src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method=none --no-manifest -Z zstd:long=0 --checkpoint fast |wc -c
8426516
real 0m0,880s

Attachment Content-Type Size
0001-Fix-a-few-goofs-in-new-backup-compression-code.patch text/x-diff 4.4 KB
0002-Allow-parallel-zstd-compression-when-taking-a-base-b.patch text/x-diff 12.5 KB
0003-f-workers.patch text/x-diff 3.6 KB
0004-basebackup-support-Z-zstd-long.patch text/x-diff 6.5 KB
0005-pg_basebackup-support-Zstd-negative-compression-leve.patch text/x-diff 3.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2022-03-27 20:53:57 Re: SQL/JSON: JSON_TABLE
Previous Message Daniel Gustafsson 2022-03-27 20:44:23 Re: Small TAP tests cleanup for Windows and unused modules