Re: refactoring basebackup.c

From: Sumanta Mukherjee <sumanta(dot)mukherjee(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: refactoring basebackup.c
Date: 2020-05-12 05:26:40
Message-ID: CAMSJAiradWb8ncm+83FngNA59Ss601cN5Me7AwVZeRFVWOqN9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Robert,
Please see my comments inline below.

On Tue, May 12, 2020 at 12:33 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> Yeah, that needs to be tested. Right now the chunk size is 32kB but it
> might be a good idea to go larger. Another thing is that right now the
> chunk size is tied to the protocol message size, and I'm not sure
> whether the size that's optimal for disk reads is also optimal for
> protocol messages.

One needs a balance between the number of packets to be sent across the
network and so if the size
of reading from the disk and the network packet size could be unified then
it might provide a better optimization.

>
> I don't think it's a particularly bad thing that we include a small
> amount of progress for sending an empty file, a directory, or a
> symlink. That could make the results more meaningful if you have a
> database with lots of empty relations in it. However, I agree that the
> effect of compression shouldn't be included. To get there, I think we
> need to redesign the wire protocol. Right now, the server has no way
> of letting the client know how many uncompressed bytes it's sent, and
> the client has no way of figuring it out without uncompressing, which
> seems like something we want to avoid.
>
>
I agree here too, except that if we have too many tar files one might
cringe
but sending the xtra amt from these tar files looks okay to me.

> There are some other problems with the current wire protocol, too:
>
> 1. The syntax for the BASE_BACKUP command is large and unwieldy. We
> really ought to adopt an extensible options syntax, like COPY, VACUUM,
> EXPLAIN, etc. do, rather than using a zillion ad-hoc bolt-ons, each
> with bespoke lexer and parser support.
>
> 2. The client is sent a list of tablespaces and is supposed to use
> that to expect an equal number of archives, computing the name for
> each one on the client side from the tablespace info. However, I think
> we should be able to support modes like "put all the tablespaces in a
> single archive" or "send a separate archive for every 256GB" or "ship
> it all to the cloud and don't send me any archives". To get there, I
> think we should have the server send the archive name to the clients,
> and the client should just keep receiving the next archive until it's
> told that there are no more. Then if there's one archive or ten
> archives or no archives, the client doesn't have to care. It just
> receives what the server sends until it hears that there are no more.
> It also doesn't know how the server is naming the archives; the server
> can, for example, adjust the archive names based on which compression
> format is being chosen, without knowledge of the server's naming
> conventions needing to exist on the client side.
>
> One thing to remember here could be that an optimization would need to
be made between the number of options
we provide and people coming back and saying which combinations do not
work
For example, if a user script has "put all the tablespaces in a single
archive" and later on somebody makes some
script changes to break it down at "256 GB" and there is a conflict then
which one takes precedence needs to be chosen.
When the number of options like this become very large this could lead to
some complications.

> I think we should keep support for the current BASE_BACKUP command but
> either add a new variant using an extensible options, or else invent a
> whole new command with a different name (BACKUP, SEND_BACKUP,
> whatever) that takes extensible options. This command should send back
> all the archives and the backup manifest using a single COPY stream
> rather than multiple COPY streams. Within the COPY stream, we'll
> invent a sub-protocol, e.g. based on the first letter of the message,
> e.g.:
>
> t = Tablespace boundary. No further message payload. Indicates, for
> progress reporting purposes, that we are advancing to the next
> tablespace.
> f = Filename. The remainder of the message payload is the name of the
> next file that will be transferred.
> d = Data. The next four bytes contain the number of uncompressed bytes
> covered by this message, for progress reporting purposes. The rest of
> the message is payload, possibly compressed. Could be empty, if the
> data is being shipped elsewhere, and these messages are only being
> sent to update the client's notion of progress.
>

Here I support this.

> I thought about that a bit, too. There might be some way to unify that
> by having some common context object that's defined by basebackup.c
> and all archivers get it, so that they have some commonly-desired
> details without needing bespoke code, but I'm not sure at this point
> whether that will actually produce a nicer result. Even if we don't
> have it initially, it seems like it wouldn't be very hard to add it
> later, so I'm not too stressed about it.
>

--Sumanta Mukherjee
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

>
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-05-12 05:45:22 Re: A comment fix
Previous Message Amit Kapila 2020-05-12 05:00:47 Re: Parallel copy