Re: design for parallel backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: design for parallel backup
Date: 2020-04-22 16:12:32
Message-ID: CA+TgmoYdrZd1Tt9=ztCRpittNO-if2NAsLekiJC4yHT-R2ptFA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 22, 2020 at 11:24 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> *My* gut feeling is that you're going to have a harder time using CPU
> time efficiently when doing parallel compression via multiple processes
> and independent connections. You're e.g. going to have a lot more
> context switches, I think. And there will be network overhead from doing
> more connections (including worse congestion control).

OK, noted. I'm still doubtful that the optimal number of connections
is 1, but it might be that the optimal number of CPU cores to apply to
compression is much higher than the optimal number of connections. For
instance, suppose there are two equally sized tablespaces on separate
drives, but zstd with 10-way parallelism is our chosen compression
strategy. It seems to me that two connections have an excellent chance
of being faster than one, because with only one connection I don't see
how you can benefit from the opportunity to do I/O in parallel.
However, I can also see that having twenty connections just as a way
to get 10-way parallelism for each tablespace might be undesirable
and/or inefficient for various reasons.
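
To make the "zstd with 10-way parallelism" case concrete, here is a rough
sketch (not pg_basebackup code, just the pattern) of what per-stream parallel
compression looks like with libzstd's worker threads: one stream, compressed
by ten threads inside a single process. It assumes libzstd >= 1.4.0 built with
multithread support.

    /* Sketch only: compress one stream (say, one tablespace's tar stream)
     * with 10 zstd worker threads in a single process/connection.
     * Assumes libzstd >= 1.4.0 with multithread support; link with -lzstd. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <zstd.h>

    int
    main(void)
    {
        ZSTD_CCtx  *cctx = ZSTD_createCCtx();
        size_t      inCap = ZSTD_CStreamInSize();
        size_t      outCap = ZSTD_CStreamOutSize();
        void       *inBuf = malloc(inCap);
        void       *outBuf = malloc(outCap);
        size_t      nread;

        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 10);   /* 10-way parallelism */

        while ((nread = fread(inBuf, 1, inCap, stdin)) > 0)
        {
            ZSTD_inBuffer input = { inBuf, nread, 0 };

            /* Keep calling until this chunk has been handed to the workers. */
            while (input.pos < input.size)
            {
                ZSTD_outBuffer output = { outBuf, outCap, 0 };

                ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_continue);
                fwrite(outBuf, 1, output.pos, stdout);
            }
        }

        for (;;)                        /* finish the frame */
        {
            ZSTD_inBuffer   input = { NULL, 0, 0 };
            ZSTD_outBuffer  output = { outBuf, outCap, 0 };
            size_t          remaining;

            remaining = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_end);
            fwrite(outBuf, 1, output.pos, stdout);
            if (remaining == 0)
                break;
        }

        ZSTD_freeCCtx(cctx);
        free(inBuf);
        free(outBuf);
        return 0;
    }

Two tablespaces would mean two such streams; whether those ride on two
connections or get multiplexed over one is exactly the question above.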

> this results in a 16GB base backup. I think this is probably a good bit
> less compressible than most PG databases.
>
> method  level  parallelism  wall-time  cpu-user-time  cpu-kernel-time  size        rate  format
> gzip       1       1          305.37      299.72          5.52         7067232465  2.28
> lz4        1       1           33.26       27.26          5.99         8961063439  1.80  .lz4
> lz4        3       1          188.50      182.91          5.58         8204501460  1.97  .lz4
> zstd       1       1           66.41       58.38          6.04         6925634128  2.33  .zstd
> zstd       1      10            9.64       67.04          4.82         6980075316  2.31  .zstd
> zstd       3       1          122.04      115.79          6.24         6440274143  2.50  .zstd
> zstd       3      10           13.65      106.11          5.64         6438439095  2.51  .zstd
> zstd       9      10          100.06      955.63          6.79         5963827497  2.71  .zstd
> zstd      15      10          259.84     2491.39          8.88         5912617243  2.73  .zstd
> pixz       1      10          162.59     1626.61         15.52         5350138420  3.02  .xz
> plzip      1      20          135.54     2705.28          9.25         5270033640  3.06  .lz

So, picking a better compressor in this case looks a lot less
exciting. Parallel zstd still compresses somewhat better than
single-core lz4, but the difference in compression ratio is far less,
and the amount of CPU you have to burn in order to get that extra
compression is pretty large.

> I don't really see a problem with emitting .zip files. It's an extremely
> widely used container format for all sorts of file formats these days.
> Except for needing a bit more complicated (and I don't think it's *that*
> big of a difference) code during generation / unpacking, it seems
> clearly advantageous over .tar.gz etc.

Wouldn't that imply buying into DEFLATE as our preferred compression algorithm?

Either way, I don't really like the idea of PostgreSQL having its own
code to generate and interpret various archive formats. That seems
like a maintenance nightmare and a recipe for bugs. How can anyone
even verify that our existing 'tar' code works with all 'tar'
implementations out there, or that it's correct in all cases? Do we
really want to maintain similar code for other formats, or even for
this one? I'd say "no". We should pick archive formats that have good,
well-maintained libraries with permissive licenses and then use those.
I don't know whether "zip" falls into that category or not.
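
For what it's worth, here is roughly what consuming an archive through a
permissively licensed library looks like with libarchive (BSD license, handles
tar and zip through the same interface). This is only an illustration of the
"use a library" approach, not a claim that libarchive is the right dependency:

    /* Sketch: walk the members of a tar or zip archive via libarchive,
     * instead of hand-rolled format code.  Link with -larchive. */
    #include <stdio.h>
    #include <archive.h>
    #include <archive_entry.h>

    static int
    list_archive(const char *path)
    {
        struct archive         *a = archive_read_new();
        struct archive_entry   *entry;

        archive_read_support_format_tar(a);
        archive_read_support_format_zip(a);
        archive_read_support_filter_all(a);     /* gzip, zstd, ... if built in */

        if (archive_read_open_filename(a, path, 65536) != ARCHIVE_OK)
        {
            fprintf(stderr, "%s\n", archive_error_string(a));
            archive_read_free(a);
            return -1;
        }

        while (archive_read_next_header(a, &entry) == ARCHIVE_OK)
        {
            printf("%s\t%lld\n",
                   archive_entry_pathname(entry),
                   (long long) archive_entry_size(entry));
            archive_read_data_skip(a);          /* skip the member's data */
        }

        archive_read_free(a);
        return 0;
    }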

> > Other options include, perhaps, (1) emitting a tarfile of compressed
> > files instead of a compressed tarfile
>
> Yea, that'd help some. Although I am not sure how good the tooling to
> seek through tarfiles in an O(files) rather than O(bytes) manner is.

Well, considering that at present we're using hand-rolled code...
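
For what it's worth, skipping through an uncompressed tar really is O(files):
each member has a 512-byte header whose size field tells you how far to seek.
A bare-bones sketch of that kind of seek-based listing (ustar only, no
long-name or base-256 size handling):

    /* Sketch: list a plain (uncompressed) ustar file by seeking past each
     * member's data, so the cost is O(number of files), not O(bytes). */
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char **argv)
    {
        FILE           *f;
        unsigned char   hdr[512];

        if (argc != 2 || (f = fopen(argv[1], "rb")) == NULL)
            return 1;

        while (fread(hdr, 1, 512, f) == 512 && hdr[0] != '\0')
        {
            /* size is an octal string at offset 124 of the header */
            unsigned long long  size = strtoull((char *) &hdr[124], NULL, 8);
            unsigned long long  padded = (size + 511) & ~511ULL;

            printf("%.100s\t%llu\n", (char *) hdr, size);

            if (fseeko(f, (off_t) padded, SEEK_CUR) != 0)   /* skip the data */
                break;
        }

        fclose(f);
        return 0;
    }

The catch is that this stops working the moment the whole tar is itself
compressed as one stream, which is the argument for a tarfile of compressed
files rather than a compressed tarfile.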

> I think there are some cases where using separate compression state for each
> file would hurt us. Some of the archive formats have support for reusing
> compression state, but I don't know which.

Yeah, I had the same thought. People with mostly 1GB relation segments
might not notice much difference, but people with lots of little
relations might see a more significant difference.
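
One way some formats and libraries address the lots-of-little-files case is a
shared dictionary reused across members, so each small file doesn't start from
an empty compression state. As a sketch of that idea with libzstd's dictionary
API (assuming zstd is the chosen algorithm; note that the dictionary would
also have to be shipped with the archive for anything else to decompress it,
which feeds straight into the compatibility concern below):

    /* Sketch: train a dictionary once from sample blocks, then reuse it when
     * compressing each small file as its own member.  Link with -lzstd. */
    #include <zstd.h>
    #include <zdict.h>

    /* Build the shared dictionary from representative samples. */
    static size_t
    build_dictionary(void *dictBuf, size_t dictCap,
                     const void *samples, const size_t *sampleSizes,
                     unsigned nSamples)
    {
        return ZDICT_trainFromBuffer(dictBuf, dictCap,
                                     samples, sampleSizes, nSamples);
    }

    /* Compress one small file, reusing the shared dictionary. */
    static size_t
    compress_member(ZSTD_CCtx *cctx,
                    void *dst, size_t dstCap,
                    const void *src, size_t srcLen,
                    const void *dict, size_t dictLen)
    {
        return ZSTD_compress_usingDict(cctx, dst, dstCap,
                                       src, srcLen,
                                       dict, dictLen,
                                       3 /* compression level */);
    }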

> Hm. There's some appeal to just store offsets in the manifest, and to
> make sure it's a seekable offset in the compression stream. OTOH, it
> makes it pretty hard for other tools to generate a compatible archive.

Yeah.

FWIW, I don't see it as being entirely necessary to create a seekable
compressed archive format, let alone to make all of our compressed
archive formats seekable. I think supporting multiple compression
algorithms in a flexible way that's not too tied to the capabilities
of particular algorithms is more important. If you want fast restores
of incremental and differential backups, consider using -Fp rather
than -Ft. Or we can have a new option that's like -Fp but every file
is compressed individually in place, or files larger than N bytes are
compressed in place using a configurable algorithm. It might be
somewhat less efficient but it's also way less complicated to
implement, and I think that should count for something. I don't want
to get so caught up in advanced features here that we don't make any
useful progress at all. If we can add better features without a large
complexity increment, and without drawing objections from others on
this list, great. If not, I'm prepared to summarily jettison it as
nice-to-have but not essential.

> I don't really see any of the concerns there to apply for the base
> backup case.

I felt like there was some reason that threads were bad, but it may
have just been the case you mentioned and not relevant here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
