Re: Add LZ4 compression in pg_dump

From: gkokolatos(at)pm(dot)me
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-03-14 15:18:33
Message-ID: 5qAu7vrMV3hEodlIC82b0JICWTwEUSRoIDWlIaczNBIEVVfDH2lccxHNIQ726e4nX_Qqjc4lPc5_z9RtV28NMdVv7BNiADF4JB-FDahmlVk=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

------- Original Message -------
On Monday, March 13th, 2023 at 9:21 PM, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:

>
>
>
>
> On 3/11/23 11:50, gkokolatos(at)pm(dot)me wrote:
>
> > ------- Original Message -------
> > On Saturday, March 11th, 2023 at 7:00 AM, Alexander Lakhin exclusion(at)gmail(dot)com wrote:
> >
> > > Hello,
> > > 23.02.2023 23:24, Tomas Vondra wrote:

>
>
> Thanks for the patch.
>
> I did look if there are other places that might have the same issue, and
> I think there are - see attached 0002. For example LZ4File_write is
> declared to return size_t, but then it also does
>
> if (LZ4F_isError(status))
> {
> fs->errcode = status;
>
> return -1;
> }
>
> That won't work :-(

You are right. It is confusing.

>
> And these issues may not be restricted to lz4 code - Gzip_write is
> declared to return size_t, but it does
>
> return gzwrite(gzfp, ptr, size);
>
> and gzwrite returns int. Although, maybe that's correct, because
> gzwrite() is "0 on error" so maybe this is fine ...
>
> However, Gzip_read assigns gzread() to size_t, and that does not seem
> great. It probably will still trigger the following pg_fatal() because
> it'd be very lucky to match the expected 'size', but it's confusing.

Agreed.

>
>
> I wonder whether CompressorState should use int or size_t for the
> read_func/write_func callbacks. I guess no option is perfect, i.e. no
> data type will work for all compression libraries we might use (lz4 uses
> int while lz4f uses size_t, to there's that).
>
> It's a bit weird the "open" functions return int and the read/write
> size_t. Maybe we should stick to int, which is what the old functions
> (cfwrite etc.) did.
>
You are right. These functions are modeled by the open/fread/
fwrite etc, and they have kept the return types of these ones. Their
callers do check the return value of read_func and write_func against
the requested size of bytes to be transferred.

>
> But I think the actual problem here is that the API does not clearly
> define how errors are communicated. I mean, it's nice to return the
> value returned by the library function without "mangling" it by
> conversion to size_t, but what if the libraries communicate errors in
> different way? Some may return "0" while others may return "-1".

Agreed.

>
> I think the right approach is to handle all library errors and not just
> let them through. So Gzip_write() needs to check the return value, and
> either call pg_fatal() or translate it to an error defined by the API.

It makes sense. It will change some of the behaviour of the callers,
mostly on what constitutes an error, and what error message is emitted.
This is a reasonable change though.

>
> For example we might say "returns 0 on error" and then translate all
> library-specific errors to that.

Ok.

> While looking at the code I realized a couple function comments don't
> say what's returned in case of error, etc. So 0004 adds those.
>
> 0003 is a couple minor assorted comments/questions:
>
> - Should we move ZLIB_OUT_SIZE/ZLIB_IN_SIZE to compress_gzip.c?

It would make things clearer.

> - Why are LZ4 buffer sizes different (ZLIB has both 4kB)?

Clearly some comments are needed, if the difference makes sense.

> - I wonder if we actually need LZ4F_HEADER_SIZE_MAX? Is it even possible
> for LZ4F_compressBound to return value this small (especially for 16kB
> input buffer)?
>

I would recommend to keep it. Earlier versions of LZ4F_HEADER_SIZE_MAX
do not have it. Later versions do advise to use it.

Would you mind me trying to come with a patch to address your points?

Cheers,
//Georgios

>
>
> regards
>
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Takamichi Osumi (Fujitsu) 2023-03-14 15:20:52 RE: Allow logical replication to copy tables in binary format
Previous Message Ilya Gladyshev 2023-03-14 14:58:14 Re: Progress report of CREATE INDEX for nested partitioned tables