Re: Add LZ4 compression in pg_dump

From: gkokolatos(at)pm(dot)me
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2022-12-20 11:19:15
Message-ID: srwGkjEwoQg3cdO9k1mMuIu9eEqkOA27Ikvf0cNR9mvBLVZPidKMk4fmKeVIsxVkgbVk2yeIL-Neyjb2Mkz9_ULGfk2qeYW8ZnM1iKIyklY=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

------- Original Message -------
On Monday, December 19th, 2022 at 6:27 PM, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

>
>
> On Mon, Dec 19, 2022 at 05:03:21PM +0000, gkokolatos(at)pm(dot)me wrote:
>
> > > > 001 still doesn't compile on freebsd, and 002 doesn't compile on
> > > > windows. Have you checked test results from cirrusci on your private
> > > > github account ?
> >
> > There are still known gaps in 0002 and 0003, for example documentation,
> > and I have not been focusing too much on those. You are right, it is helpful
> > and kind to try to reduce the noise. The attached should have hopefully
> > tackled the ci errors.
>
>
> Yep. Are you using cirrusci under your github account ?

Thank you. To be very honest, I am not using github exclusively to post patches.
Sometimes I do, sometimes I do not. Is github a requirement?

To answer your question, some of my github accounts are integrated with cirrusci,
others are not.

The current cfbot build is green for what is worth.
https://cirrus-ci.com/build/5934319840002048

>
> > > FYI, I have re-added an entry to the CF app to get some automated
> > > coverage:
> > > https://commitfest.postgresql.org/41/3571/
> >
> > Much obliged. Should I change the state to "ready for review" when post a
> > new version or should I leave that to the senior personnel?
>
>
> It's better to update it to reflect what you think its current status
> is. If you think it's ready for review.

Thank you.

>
> > > > 002 breaks "pg_dump -Fc -Z2" because (I think) AllocateCompressor()
> > > > doesn't store the passed-in compression_spec.
> >
> > I am afraid I have not been able to reproduce this error. I tried both
> > debian and freebsd after I addressed the compilation warnings. Which
> > error did you get? Is it still present in the attached?
>
>
> It's not that there's an error - it's that compression isn't working.
>
> $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fp regression |wc -c
> 659956
> $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fp regression |wc -c
> 637192
>
> $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fc regression |wc -c
> 1954890
> $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fc regression |wc -c
> 1954890
>

Thank you. Now I understand what you mean. Trying the same on top of v18-0003
on Ubuntu 22.04 yields:

$ for compression in none gzip:1 gzip:6 gzip:9; do \
pg_dump --format=custom --compress="$compression" -f regression."$compression".dump -d regression; \
wc -c regression."$compression".dump; \
done;
14963753 regression.none.dump
3600183 regression.gzip:1.dump
3223755 regression.gzip:6.dump
3196903 regression.gzip:9.dump

and on FreeBSD 13.1

$ for compression in none gzip:1 gzip:6 gzip:9; do \
pg_dump --format=custom --compress="$compression" -f regression."$compression".dump -d regression; \
wc -c regression."$compression".dump; \
done;
14828822 regression.none.dump
3584304 regression.gzip:1.dump
3208548 regression.gzip:6.dump
3182044 regression.gzip:9.dump

Although there are some variations between the installations, within the same
installation the size of the dump file is shrinking as expected.

Investigating a bit further on the issue, you are correct in identifying an
issue in v17. Up until v16, the compressor function looked like:

+InitCompressorGzip(CompressorState *cs, int compressionLevel)
+{
+ GzipCompressorState *gzipcs;
+
+ cs->readData = ReadDataFromArchiveGzip;
+ cs->writeData = WriteDataToArchiveGzip;
+ cs->end = EndCompressorGzip;
+
+ gzipcs = (GzipCompressorState *) pg_malloc0(sizeof(GzipCompressorState));
+ gzipcs->compressionLevel = compressionLevel;

V17 considered that more options could become available in the future
and changed the signature of the relevant Init functions to:

+InitCompressorGzip(CompressorState *cs, const pg_compress_specification compression_spec)
+{
+ GzipCompressorState *gzipcs;
+
+ cs->readData = ReadDataFromArchiveGzip;
+ cs->writeData = WriteDataToArchiveGzip;
+ cs->end = EndCompressorGzip;
+
+ gzipcs = (GzipCompressorState *) pg_malloc0(sizeof(GzipCompressorState));
+

V18 reinstated the assignment in similar fashion to InitCompressorNone and
InitCompressorLz4:

+void
+InitCompressorGzip(CompressorState *cs, const pg_compress_specification compression_spec)
+{
+ GzipCompressorState *gzipcs;
+
+ cs->readData = ReadDataFromArchiveGzip;
+ cs->writeData = WriteDataToArchiveGzip;
+ cs->end = EndCompressorGzip;
+
+ cs->compression_spec = compression_spec;
+
+ gzipcs = (GzipCompressorState *) pg_malloc0(sizeof(GzipCompressorState));

A test case can be added which performs a check similar to the loop above.
Create a custom dump with the least and most compression for each method.
Then verify that the output sizes differ as expected. This addition could
become 0001 in the current series.

Thoughts?

Cheers,
//Georgios

> --
> Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2022-12-20 11:19:21 Re: Allow batched insert during cross-partition updates
Previous Message Bharath Rupireddy 2022-12-20 10:53:52 Re: Use get_call_result_type() more widely