Re: WIP/PoC for parallel backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
Cc: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2019-12-19 17:47:22
Message-ID: CA+TgmobLeEAkxph=bgd4f5wrcSTKdK=2Qq9puccWeuvqtWpLgA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 12, 2019 at 10:20 AM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com> wrote:
> I have updated the patches (v7 attached) and have taken care of all issues pointed out by Jeevan; additionally,
> I ran pgindent on each patch. Furthermore, command names have been renamed as suggested and I
> have simplified the SendFiles function. The client can only request regular files; any other kind, such as
> directories or symlinks, will be skipped, and the client will be responsible for handling those itself.

Hi,

Patch 0001 of this series conflicts with my recent commit
303640199d0436c5e7acdf50b837a027b5726594; that commit was actually
inspired by some previous study of 0001. That being said, I think 0001
has the wrong idea. There's no reason that I can see why it should be
correct to remove the PG_ENSURE_ERROR_CLEANUP calls from
perform_base_backup(). It's true that if we register a long-lived
before_shmem_exit hook, then the backup will get cleaned up even
without the PG_ENSURE_ERROR_CLEANUP block, but there's also the
question of the warning message. I think that our goal should be to
emit the warning message about a backup being stopped too early if the
user uses either pg_start_backup() or the new START_BACKUP command and
does not end the backup with either pg_stop_backup() or the new
STOP_BACKUP command -- but not if a single command that both starts
and ends a backup, like BASE_BACKUP, is interrupted. To accomplish
that goal in the wake of 303640199d0436c5e7acdf50b837a027b5726594, we
need to temporarily register do_pg_abort_backup() as a
before_shmem_exit() handler using PG_ENSURE_ERROR_CLEANUP() during
commands like BASE_BACKUP -- and for things like pg_start_backup()
or the new START_BACKUP command, we just need to add a single call to
register_persistent_abort_backup_handler().

So I think you can drop 0001, and then in the patch that actually
introduces START_BACKUP, add the call to
register_persistent_abort_backup_handler() before calling
do_pg_start_backup(). In that patch, also adjust the warning text
that do_pg_abort_backup() emits to be more generic, e.g. "aborting
backup due to backend exiting while a non-exclusive backup is in
progress".

0003 creates three new functions, moving code from
do_pg_start_backup() to a new function collectTablespaces() and from
perform_base_backup() to new functions setup_throttle() and
include_wal_files(). I'm skeptical about all of these changes. One
general nitpick is that the way these function names are capitalized
and punctuated does not seem to have been chosen very consistently;
how about name_like_this() throughout? A bit more substantively:

- collectTablespaces() is factored out of do_pg_start_backup() so that
it can also be used by SendFileList(), but that means that a client is
going to invoke START_BACKUP, indirectly calling collectTablespaces(),
and then immediately afterward the client is probably going to call
SEND_FILE_LIST, which will again call collectTablespaces(). That does
not appear to be super-great. For one thing, it's duplicate work,
although because SendFileList() is going to pass infotbssize as false,
it's not a lot of duplicated work. Also, what happens if the two calls
to collectTablespaces() return different answers due to concurrent
CREATE/DROP TABLESPACE commands? Maybe it would all work out fine, but
it seems like there is at least the possibility of bugs if different
parts of the backup have different notions of what tablespaces exist.

- setup_throttle() is factored out of perform_base_backup() so that it
can be called in StartBackup() and StopBackup() and SendFiles(). This
seems extremely odd. Why does it make any sense to give the user an
option to activate throttling when *ending* a backup? Why does it make
sense to give the user a chance to enable throttling *both* at the
start of a backup *and also* for each individual file? If we're
going to support throttling here, it seems like it should be either a
backup-level property or a file-level property, not both.

- include_wal_files() is factored out of perform_base_backup() so that
it can be called by StopBackup(). This seems like a poor design
decision. The idea behind the BASE_BACKUP command is that you run that
one command, and the server sends you everything. The idea in this new
way of doing business is that the client requests the individual files
it wants -- except for the WAL files, which are for some reason not
requested individually but sent all together as part of the
STOP_BACKUP response. It seems like it would be more consistent if the
client were to decide which WAL files it needs and request them one by
one, just as we do with other files.
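
To illustrate, the client-side logic needed to enumerate those
segments is small. In the sketch below, the actual fetch is a
hypothetical placeholder (however SEND_FILES ends up being issued for
a single path), but the segment-name arithmetic is the same as what
XLogFileName() uses, ignoring timeline history files for brevity:

#include <stdint.h>
#include <stdio.h>

/*
 * Sketch only: compute the WAL segment file names covering the range
 * from the backup's start LSN up to its stop LSN and request them one
 * by one, the same way any other file is requested.
 */
static void
request_wal_range(uint64_t startlsn, uint64_t stoplsn,
                  uint32_t timeline, uint64_t wal_segment_size)
{
    uint64_t    segs_per_id = UINT64_C(0x100000000) / wal_segment_size;
    uint64_t    segno;

    /* (stoplsn - 1) / size gives the segment containing the stop LSN */
    for (segno = startlsn / wal_segment_size;
         segno <= (stoplsn - 1) / wal_segment_size;
         segno++)
    {
        char        fname[64];

        snprintf(fname, sizeof(fname), "pg_wal/%08X%08X%08X",
                 (unsigned) timeline,
                 (unsigned) (segno / segs_per_id),
                 (unsigned) (segno % segs_per_id));

        /* e.g. issue a SEND_FILES request for this one path */
        printf("would request %s\n", fname);
    }
}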

I think there's a common theme to all of these complaints, which is
that you haven't done enough to move things that are the
responsibility of the backend in the BASE_BACKUP model to the frontend
in this model. I started wondering, for example, whether it might not
be better to have the client rather than the server construct the
tablespace_map file. After all, the client needs to get the list of
files anyway (hence SEND_FILE_LIST) and if it's got that then it knows
almost enough to construct the tablespace map. The only additional
thing it needs is the full pathname to which the link points. But, it
seems that we could fairly easily extend SEND_FILE_LIST to send, for
files that are symbolic links, the target of the link, using a new
column. Or alternatively, using a separate command, so that instead of
just sending a single SEND_FILE_LIST command, the client might first
ask for a tablespace list and then might ask for a list of files
within each tablespace (e.g. LIST_TABLESPACES, then LIST_FILES <oid>
for each tablespace, with 0 for the main tablespace, perhaps). I'm not
sure which way is better.
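
Either way, once the client has the OID and link target for each
pg_tblspc entry, writing the file itself is trivial. Here's a sketch,
with a made-up struct standing in for whatever shape the new column or
command returns; the output format is the "<oid> <path>" per line that
do_pg_start_backup() produces today:

#include <stdio.h>

/*
 * Sketch only: one entry per pg_tblspc/<oid> symlink reported by the
 * server, with the link target supplied by the new column/command.
 */
typedef struct
{
    const char *oid;            /* e.g. "16385", from pg_tblspc/<oid> */
    const char *link_target;    /* e.g. "/mnt/ts1" */
} TablespaceEntry;

static void
write_tablespace_map(FILE *out, const TablespaceEntry *ts, int nts)
{
    int         i;

    for (i = 0; i < nts; i++)
        fprintf(out, "%s %s\n", ts[i].oid, ts[i].link_target);
}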

Similarly, for throttling, I have a hard time understanding how what
you've got here is going to work reasonably. It looks like each client
is just going to request whatever MAX_RATE the user specifies, but the
result of that will be that the actual transfer rate is probably a
multiple of the specified rate, approximately equal to the specified
rate times the number of clients. That's probably not what the user
wants. You could take the specified rate and divide it by the number
of workers, but limiting each of 4 workers to a quarter of the rate
will probably lead to a combined rate of less than the specified
rate, because if one worker doesn't use all of the bandwidth to which
it's entitled, or even exits earlier than the others, the other
workers don't get to go any faster as a result. Another problem is
that, in the current approach, throttling applies overall to the
entire backup, but in this approach, it is applied separately to each
SEND_FILES command. In the current approach, if one file finishes a
little faster or slower than anticipated, the next file in the tarball
will be sent a little slower or faster to compensate. But in this
approach, each SEND_FILES command is throttled separately, so this
property is lost. Furthermore, while BASE_BACKUP sends data
continuously, this approach naturally involves pauses between
commands. If files are large, that won't matter much, but if they're
small and numerous, it will tend to cause the actual transfer rate to
be less than the throttling rate.

One potential way to solve this problem is... move it to the client
side. Instead of making it the server's job not to send data too fast,
make it the client's job not to receive data too fast. Let the server
backends write as fast as they want, and on the pg_basebackup side,
have the threads coordinate with each other so that they don't read
data faster than the configured rate. That's not quite the same thing,
though, because the server can get ahead by the size of the client's
receive buffers plus whatever data is on the wire. I don't know
whether that's a big enough problem to be worth caring about. If it
is, then I think we need some server infrastructure to "group
throttle" a group of cooperating backends.

A general comment about 0004 is that it seems like you've proceeded by
taking the code from perform_base_backup() and spreading it across
several different functions without, necessarily, as much thought as
is needed there. For instance, StartBackup() looks like just the
beginning of perform_base_backup(). But why shouldn't it instead look
like pg_start_backup() -- in fact, a simplified version that only
handles the non-exclusive backup case? Is the extra stuff it's doing
really appropriate? I've already complained about the
tablespace-related stuff here and the throttling, but there's more.
Setting statrelpath here will probably break if somebody tries to use
SEND_FILES without first calling START_BACKUP. Sending the
backup_label file here is oddly asymmetric, because that's done by
pg_stop_backup(), not pg_start_backup(). And similarly, StopBackup()
looks like it's just the end of perform_base_backup(), and that's
pretty strange-looking too. Again, I've already complained about
include_wal_files() being part of this, but there's also:

+ /* ... and pg_control after everything else. */

...which (1) is an odd thing to say when this is the first thing this
particular function is going to send, and (2) is another example of a sloppy
division of labor between client and server; apparently, the client is
supposed to know not to request pg_control, because the server is
going to send it unsolicited. There's no particular reason to have
this a special case. The client could just request it last. And then
the server code wouldn't need a special case, and you wouldn't have
this odd logic split between the client and the server.
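
That is, on the client side something as simple as the following would
do; request_file() here is a hypothetical stand-in for however
pg_basebackup ends up issuing SEND_FILES for a single path:

#include <string.h>
#include "libpq-fe.h"

/*
 * Sketch only: defer global/pg_control until every other requested
 * file has been fetched, so the server needs no special case.
 */
static void
fetch_all_files(PGconn *conn, char **filelist, int nfiles)
{
    int         i;

    for (i = 0; i < nfiles; i++)
    {
        if (strcmp(filelist[i], "global/pg_control") == 0)
            continue;           /* deferred until the end */
        request_file(conn, filelist[i]);    /* hypothetical helper */
    }

    request_file(conn, "global/pg_control");    /* always last */
}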

Overall, I think this needs a lot more work. The overall idea's not
wrong, but there seem to be a very large number of details which, at
least to me, do not seem to be correct.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
