Re: multithreaded zstd backup compression for client and server

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multithreaded zstd backup compression for client and server
Date: 2022-03-23 22:31:12
Message-ID: CA+TgmoZms1KvLSXsAhXwKJ0FzY-mVe-B3=m-sePW6i-0k8vrTA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 23, 2022 at 5:14 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> The most likely source of problem would errors thrown while zstd threads are
> alive. Should make sure that that can't happen.
>
> What is the lifetime of the threads zstd spawns? Are they tied to a single
> compression call? A single ZSTD_createCCtx()? If the latter, how bulletproof
> is our code ensuring that we don't leak such contexts?

I haven't found any real documentation explaining how libzstd manages
its threads. I am assuming that it is tied to the ZSTD_CCtx, but I
don't know. I guess I could try to figure it out from the source code.
Anyway, what we have now is a PG_TRY()/PG_CATCH() block around the
code that uses the bbsink, which will cause bbsink_zstd_cleanup() to
get called in the event of an error. That will do ZSTD_freeCCtx().

It's probably also worth mentioning here that even if, contrary to
expectations, the compression threads hang around to the end of time
and chill, in practice nobody is likely to run BASE_BACKUP and then
keep the connection open for a long time afterward. So it probably
wouldn't really affect resource utilization in real-world scenarios
even if the threads never exited, as long as they didn't, you know,
busy-loop in the background. And I assume the actual library behavior
can't be nearly that bad. This is a pretty mainstream piece of
software.

> If they're short-lived, are we compressing large enough batches to not waste a
> lot of time starting/stopping threads?

Well, we're using a single ZSTD_CCtx for an entire base backup. Again,
I haven't found documentation explaining what libzstd is actually
doing, but it's hard to see how we could make the batch any bigger
than that. The context gets reset for each new tablespace, which may
or may not do anything to the compression threads.

> > but that's not to say that there couldn't be problems. I worry a bit that
> > the mere presence of threads could in some way mess things up, but I don't
> > know what the mechanism for that would be, and I don't want to postpone
> > shipping useful features based on nebulous fears.
>
> One thing that'd be good to tests for is cancelling in-progress server-side
> compression. And perhaps a few assertions that ensure that we don't escape
> with some threads still running. That'd have to be platform dependent, but I
> don't see a problem with that in this case.

More specific suggestions, please?

> > For both parallel and non-parallel zstd compression, I see differences
> > between the compressed size depending on where the compression is
> > done. I don't know whether this is an expected behavior of the zstd
> > library or a bug. Both files uncompress OK and pass pg_verifybackup,
> > but that doesn't mean we're not, for example, selecting different
> > compression levels where we shouldn't be. I'll try to figure out
> > what's going on here.
> >
> > zstd, client-side: 1.7GB, 17 seconds
> > zstd, server-side: 1.3GB, 25 seconds
> > parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
> > parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds
>
> What causes this fairly massive client-side/server-side size difference?

You seem not to have read what I wrote about this exact point in the
text which you quoted.

> Will this cause test failures on systems with older zstd?

I put a bunch of logic in the test case to try to avoid that, so
hopefully not, but if it does, we can adjust the logic.

--
Robert Haas
EDB: http://www.enterprisedb.com
