Re: WIP/PoC for parallel backup

From: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, dipesh(dot)pandit(at)gmail(dot)com
Cc: Kashif Zeeshan <kashif(dot)zeeshan(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2020-04-22 14:18:20
Message-ID: CADM=JegOqxdzTVdCshtqtuCO7mQDk7enQoYbaeuAjJKoeEq=Og@mail.gmail.com
Lists: pgsql-hackers

Hi Dipesh,

The rebased and updated patch is attached. It is rebased on commit (9f2c4ede).

> +typedef struct
> +{
> ...
> +} BackupFile;
> +
> +typedef struct
> +{
> ...
> +} BackupState;
>
> These structures need comments.
>
Done.

>
> +list_wal_files_opt_list:
> + SCONST SCONST
> {
> - $$ = makeDefElem("manifest_checksums",
> -
> (Node *)makeString($2), -1);
> + $$ = list_make2(
> + makeDefElem("start_wal_location",
> + (Node *)makeString($2),
> -1),
> + makeDefElem("end_wal_location",
> + (Node *)makeString($2),
> -1));
> +
> }
>
> This seems like an unnecessarily complicated parse representation. The
> DefElems seem to be completely unnecessary here.
>

The startptr and endptr are now kept in shared state, so this command no longer
needs these two options. I have removed this rule entirely.

> @@ -998,7 +1110,37 @@ SendBaseBackup(BaseBackupCmd *cmd)
> set_ps_display(activitymsg);
> }
>
> - perform_base_backup(&opt);
> + switch (cmd->cmdtag)
>
> So the design here is that SendBaseBackup() is now going to do a bunch
> of things that are NOT sending a base backup? With no updates to the
> comments of that function and no change to the process title it sets?
>

Okay. I have renamed the function and have updated the comments.

>
> - return (manifest->buffile != NULL);
> + return (manifest && manifest->buffile != NULL);
>
> Heck no. It appears that you didn't even bother reading the function
> header comment.
>

Okay, I forgot to remove this check. In the backup manifest patch, the
manifest_info object is always available. Anyway, I have removed this check
from the 003 patch as well.

>
> + * Send a single resultset containing XLogRecPtr record (in text format)
> + * TimelineID and backup label.
> */
> static void
> -SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
> +SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli,
> + StringInfo label, char *backupid)
>
> This just casually breaks wire protocol compatibility, which seems
> completely unacceptable.
>

A non-parallel backup returns startptr and tli in the result set, while
START_BACKUP returns startptr, tli, the backup label, and the backup ID, so I
had extended this result set.

I have now removed the changes from SendXlogRecPtrResult and added a separate
function that returns the result set for parallel backup.

>
> + if (strlen(opt->tablespace) > 0)
> + sendTablespace(opt->tablespace, NULL, true, NULL, &files);
> + else
> + sendDir(".", 1, true, NIL, true, NULL, NULL, &files);
> +
> + SendFilesHeader(files);
>
> So I guess the idea here is that we buffer the entire list of files in
> memory, regardless of size, and then we send it out afterwards. That
> doesn't seem like a good idea. The list of files might be very large.
> We probably need some code refactoring here rather than just piling
> more and more different responsibilities onto sendTablespace() and
> sendDir().
>

I don't foresee memory being a challenge here. Assuming a database containing
10240 relation files (which can reach up to 10 TB in size), the list would
occupy approximately 102 MB of memory. This can obviously be reduced, but it
doesn't seem too bad either. One way to reduce it would be to fetch a smaller
set of files at a time, with the client requesting the next set once the
current one has been processed; for example, fetch the list for one tablespace
initially and request the next one once that tablespace is done.

Currently, basebackup does compression only on the client side, so I suggest we
stick with the existing behavior. On the other thread, you mentioned that the
backend should send tarballs and that the server should decide which files go
into each tarball. I believe the current design can accommodate that easily if
the client decides which files go into each tarball. The current design can
also accommodate server-side compression and encryption with minimal changes.
Is there a point I'm overlooking here?

>
> + if (state->parallel_mode)
> + SpinLockAcquire(&state->lock);
> +
> + state->throttling_counter += increment;
> +
> + if (state->parallel_mode)
> + SpinLockRelease(&state->lock);
>
> I don't like this much. It seems to me that we would do better to use
> atomics here all the time, instead of conditional spinlocks.
>

Okay, I have made throttling_counter atomic; however, a lock is still required
for the throttling_counter %= throttling_sample step.

>
> +static void
> +send_file(basebackup_options *opt, char *file, bool missing_ok)
> ...
> + if (file == NULL)
> + return;
>
> That seems totally inappropriate.
>

Removed.

> + sendFile(file, file + basepathlen, &statbuf,
> true, InvalidOid, NULL, NULL);
>
> Maybe I'm misunderstanding, but this looks like it's going to write a
> tar header, even though we're not writing a tarfile.
>

sendFile() always sends files with the tar header included, even if the backup
mode is plain; pg_basebackup expects the same. That's the current behavior of
the system.

Otherwise, we would have to duplicate this function, and it would be doing
pretty much the same thing except for the tar header.

>
> + else
> + ereport(WARNING,
> + (errmsg("skipping special file
> or directory \"%s\"", file)));
>
> So, if the user asks for a directory or symlink, what's going to
> happen is that they're going to receive an empty file, and get a
> warning. That sounds like terrible behavior.
>

Removed the warning; an error is now raised if anything other than a regular
file is requested.

>
>
> + /*
> + * Check for checksum failures. If there are failures across
> multiple
> + * processes it may not report total checksum count, but it will
> error
> + * out,terminating the backup.
> + */
>
> In other words, the patch breaks the feature. Not that the feature in
> question works particularly well as things stand, but this makes it
> worse.
>

Added an atomic uint64 total_checksum_failures to the shared state to keep the
total count across workers, so the behavior will be the same as the current
(non-parallel) one.

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Attachment Content-Type Size
parallel_backup_v15.zip application/zip 62.3 KB
