Re: refactoring basebackup.c

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>
Cc: Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: refactoring basebackup.c
Date: 2022-01-19 15:56:10
Message-ID: CA+Tgmobm+k0VSeyCa9_TbB9FJfZM6evqfeB7_qC+fMrvUuVvCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 19, 2022 at 7:16 AM Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com> wrote:
> I have added support for decompressing a gzip compressed tar file
> at client. pg_basebackup can enable server side compression for
> plain format backup with this change.
>
> Added a gzip extractor which decompresses the compressed archive
> and forwards it to the next streamer. I have done initial testing and
> working on updating the test coverage.

Cool. It's going to need some documentation changes, too.

I don't like the way you coded this in CreateBackupStreamer(). I would
like the decision about whether to use
bbstreamer_gzip_extractor_new(), and/or throw an error about not being
able to parse an archive, to based on the file type i.e. "did we get a
.tar.gz file?" rather than on whether we asked for server-side
compression. Notice that the existing logic checks whether we actually
got a .tar file from the server rather than assuming that's what must
have happened.

As a matter of style, I don't think it's good for the only thing
inside of an "if" statement to be another "if" statement. The two
could be merged, but we also don't want to have the "if" conditional
be too complex. I am imagining that this should end up saying
something like if (must_parse_archive && !is_tar && !is_tar_gz) {
pg_log_error(...

+ * "windowBits" must be greater than or equal to "windowBits" value
+ * provided to deflateInit2 while compressing.

It would be nice to clarify why we know the value we're using is safe.
Maybe we're using the maximum possible value, in which case you could
just add that to the end of the comment: "...so we use the maximum
possible value for safety."

+ /*
+ * End of the stream, if there is some pending data in output buffers then
+ * we must forward it to next streamer.
+ */
+ if (res == Z_STREAM_END) {
+ bbstreamer_content(mystreamer->base.bbs_next, member,
mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written, context);
+ }

Uncuddle the brace.

It probably doesn't make much difference, but I would be inclined to
do the final flush in bbstreamer_gzip_extractor_finalize() rather than
here. That way we rely on our own notion of when there's no more input
data rather than zlib's notion. Probably terrible things are going to
happen if those two ideas don't match up .... but there might be some
other compression algorithm that doesn't return a distinguishing code
at end-of-stream. Such an algorithm would have to take care of any
leftover data in the finalize function, so I think we should do that
here too, so the code can be similar in all cases.

Perhaps we should move all the gzip stuff to a new file bbstreamer_gzip.c.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2022-01-19 16:08:51 Re: A qsort template
Previous Message Alvaro Herrera 2022-01-19 15:50:44 Re: Refactoring of compression options in pg_basebackup