Re: refactoring basebackup.c

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>
Subject: Re: refactoring basebackup.c
Date: 2021-07-20 18:57:38
Message-ID: CA+TgmobWAD+b+-rA9zdwhaOxS0quK9iJjD4eq2N8qzUsPYD+CA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 19, 2021 at 2:51 PM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> The difficulty in v3-0007 with pg_basebackup only knowing how to parse tar archives seems to be a natural consequence of not sufficiently abstracting out the handling of the tar format. If the bbsink and bbstreamer abstractions fully encapsulated a set of parsing callbacks, then pg_basebackup wouldn't contain things like:
>
> streamer = bbstreamer_tar_parser_new(streamer);
>
> but instead would use the parser callbacks without knowledge of whether they were parsing tar vs. cpio vs. whatever. It just seems really odd that pg_basebackup is using the extensible abstraction layer and then defeating the purpose by knowing too much about the format. It might even be a useful exercise to write cpio support into this patch set rather than waiting until v16, just to make sure the abstraction layer doesn't have tar-specific assumptions left over.

Well, I had a patch in an earlier patch set that tried to get
knowledge of tar out of basebackup.c, but it couldn't use the bbsink
abstraction; it needed a whole separate abstraction layer which I had
called bbarchiver with a different API. So I dropped it, for fear of
being told, not without some justification, that I was just changing
things for the sake of changing them, and also because having exactly
one implementation of some interface is really not great. I do
conceptually like the idea of making the whole thing flexible enough
to generate cpio or zip archives, because like you I think that having
tar-specific stuff all over the place is grotty, but I have a feeling
there's little market demand for having pg_basebackup produce cpio,
pax, zip, iso, etc. archives. On the other hand, server-side
compression and server-side backup seem like functionality with real
utility. Still, if you or others want to vote for resurrecting
bbarchiver on the grounds that general code cleanup is worthwhile for
its own sake, I'm OK with that, too.

I don't really understand what your problem is with how the patch set
leaves pg_basebackup. On the server side, because I dropped the
bbarchiver stuff, basebackup.c still ends up knowing a bunch of stuff
about tar. pg_basebackup.c, however, really doesn't know anything much
about tar any more. It knows that if it's getting a tar file and needs
to parse a tar file then it had better call the tar parsing code, but
that seems difficult to avoid. What we can avoid, and I think the
patch set does, is pg_basebackup.c having any real knowledge of what
the tar parser is doing under the hood.

Thanks also for the detailed comments. I'll try to the right number of
verbs in each sentence in the next version of the patch. I will also
look into the issues mentioned by Dilip and Tushar.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2021-07-20 19:00:01 Re: [bug?] Missed parallel safety checks, and wrong parallel safety
Previous Message Alvaro Herrera 2021-07-20 18:57:15 Re: something is wonky with pgbench pipelining