Re: multithreaded zstd backup compression for client and server

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multithreaded zstd backup compression for client and server
Date: 2022-03-23 21:14:28
Message-ID: 20220323211428.frazv4typmnm2t3x@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-03-23 16:34:04 -0400, Robert Haas wrote:
> Therefore, I think it might be safe to just ... turn this on. One reason I
> think that is that this whole approach was recommended to me by Andres ...

I didn't do a super careful analysis of the issues... But I do think it's
pretty much the one case where it "should" be safe.

The most likely source of problems would be errors thrown while zstd threads
are alive. We should make sure that can't happen.
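
On the server side in particular, an elog(ERROR) longjmps past the
compression code, so the context (and with it the threads) has to be released
on that path too. Roughly like this, if it isn't structured that way already
(untested sketch, assuming ZSTD_freeCCtx() is what joins the workers):

    ZSTD_CCtx  *cctx = ZSTD_createCCtx();

    PG_TRY();
    {
        /* feed data through ZSTD_compressStream2() here */
    }
    PG_CATCH();
    {
        /* releasing the context should also tear down its worker pool */
        ZSTD_freeCCtx(cctx);
        PG_RE_THROW();
    }
    PG_END_TRY();

    ZSTD_freeCCtx(cctx);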

What is the lifetime of the threads zstd spawns? Are they tied to a single
compression call? To a single ZSTD_createCCtx()? If the latter, how
bulletproof is our code at ensuring that we don't leak such contexts?

If they're short-lived, are we compressing large enough batches to not waste a
lot of time starting/stopping threads?
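
For reference, the pattern I'd hope we end up with is one long-lived context
per backup, since my reading of zstd.h is that the worker pool belongs to the
CCtx and is reused across calls. Untested illustration:

    #include <stdio.h>
    #include <zstd.h>

    int
    main(void)
    {
        ZSTD_CCtx  *cctx = ZSTD_createCCtx();
        static char inbuf[1 << 16];
        static char outbuf[1 << 17];
        size_t      n,
                    remaining;

        /* workers are a property of the context, not of a single call */
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 4);

        while ((n = fread(inbuf, 1, sizeof(inbuf), stdin)) > 0)
        {
            ZSTD_inBuffer in = {inbuf, n, 0};

            while (in.pos < in.size)
            {
                ZSTD_outBuffer out = {outbuf, sizeof(outbuf), 0};

                /* the same pool serves every call against this cctx */
                ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_continue);
                fwrite(outbuf, 1, out.pos, stdout);
            }
        }

        do
        {
            ZSTD_inBuffer   in = {NULL, 0, 0};
            ZSTD_outBuffer  out = {outbuf, sizeof(outbuf), 0};

            remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
            fwrite(outbuf, 1, out.pos, stdout);
        } while (remaining != 0);

        /* this, not the individual calls, is what stops the threads */
        ZSTD_freeCCtx(cctx);
        return 0;
    }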

> but that's not to say that there couldn't be problems. I worry a bit that
> the mere presence of threads could in some way mess things up, but I don't
> know what the mechanism for that would be, and I don't want to postpone
> shipping useful features based on nebulous fears.

One thing that'd be good to test for is cancelling in-progress server-side
compression. And perhaps a few assertions ensuring that we don't escape with
some threads still running. Those would have to be platform-dependent, but I
don't see a problem with that in this case.
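
On Linux the assertion could just compare the kernel's idea of the thread
count before and after, with a helper along these lines (hypothetical,
including the name):

    #ifdef __linux__
    static int
    count_own_threads(void)
    {
        FILE   *f = fopen("/proc/self/status", "r");
        char    line[128];
        int     nthreads = -1;

        if (f == NULL)
            return -1;
        while (fgets(line, sizeof(line), f) != NULL)
        {
            /* the status file has a "Threads:\t<n>" line */
            if (sscanf(line, "Threads: %d", &nthreads) == 1)
                break;
        }
        fclose(f);
        return nthreads;
    }
    #endif

    /* then, after freeing the CCtx: Assert(count_own_threads() == threads_before); */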

> For both parallel and non-parallel zstd compression, I see differences
> between the compressed size depending on where the compression is
> done. I don't know whether this is an expected behavior of the zstd
> library or a bug. Both files uncompress OK and pass pg_verifybackup,
> but that doesn't mean we're not, for example, selecting different
> compression levels where we shouldn't be. I'll try to figure out
> what's going on here.
>
> zstd, client-side: 1.7GB, 17 seconds
> zstd, server-side: 1.3GB, 25 seconds
> parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
> parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds

What causes this fairly massive client-side/server-side size difference?
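
One cheap way to rule compression levels in or out: compress the same input
at a few explicit levels and see which sizes match the numbers above.
Throwaway sketch, not patch material:

    #include <stdio.h>
    #include <stdlib.h>
    #include <zstd.h>

    int
    main(int argc, char **argv)
    {
        FILE   *f;
        size_t  srcSize,
                dstCap;
        void   *src,
               *dst;

        if (argc != 2 || (f = fopen(argv[1], "rb")) == NULL)
            return 1;

        /* slurp the sample file (error handling elided) */
        fseek(f, 0, SEEK_END);
        srcSize = ftell(f);
        fseek(f, 0, SEEK_SET);
        src = malloc(srcSize);
        fread(src, 1, srcSize, f);
        fclose(f);

        dstCap = ZSTD_compressBound(srcSize);
        dst = malloc(dstCap);

        /* does any explicit level reproduce the observed sizes? */
        for (int level = 1; level <= 9; level++)
            printf("level %d: %zu bytes\n", level,
                   ZSTD_compress(dst, dstCap, src, srcSize, level));
        return 0;
    }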

> +    /*
> +     * We check for failure here because (1) older versions of the library
> +     * do not support ZSTD_c_nbWorkers and (2) the library might want to
> +     * reject unreasonable values (though in practice it does not seem to do
> +     * so).
> +     */
> +    ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_nbWorkers,
> +                                 compress->workers);
> +    if (ZSTD_isError(ret))
> +    {
> +        pg_log_error("could not set compression worker count to %d: %s",
> +                     compress->workers, ZSTD_getErrorName(ret));
> +        exit(1);
> +    }

Will this cause test failures on systems with older zstd?
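
If so, it might be worth a runtime guard (plus an #ifdef for old headers that
lack the enum value), along these lines; the 1.4.0 cutoff for ZSTD_c_nbWorkers
being stable is from memory:

    if (compress->workers > 0 && ZSTD_versionNumber() < 10400)
        pg_log_warning("zstd library too old for multithreaded compression, using one thread");
    else if (compress->workers > 0)
    {
        ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_nbWorkers,
                                     compress->workers);
        if (ZSTD_isError(ret))
        {
            pg_log_error("could not set compression worker count to %d: %s",
                         compress->workers, ZSTD_getErrorName(ret));
            exit(1);
        }
    }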

Greetings,

Andres Freund
