Re: WIP/PoC for parallel backup

From: Ahsan Hadi <ahsan(dot)hadi(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Kashif Zeeshan <kashif(dot)zeeshan(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>
Subject: Re: WIP/PoC for parallel backup
Date: 2020-04-15 08:49:39
Message-ID: CA+9bhC+TzeDuH9-eOD4-d4n9fKROGxFD94tMALYdniBi5G01BQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 15 Apr 2020 at 1:49 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Apr 14, 2020 at 10:37 AM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
> wrote:
> > I forgot to make a check for no-manifest. Fixed. Attached is the updated
> patch.
>
> +typedef struct
> +{
> ...
> +} BackupFile;
> +
> +typedef struct
> +{
> ...
> +} BackupState;
>
> These structures need comments.
>
> +list_wal_files_opt_list:
> + SCONST SCONST
> {
> - $$ = makeDefElem("manifest_checksums",
> -
> (Node *)makeString($2), -1);
> + $$ = list_make2(
> + makeDefElem("start_wal_location",
> + (Node *)makeString($2),
> -1),
> + makeDefElem("end_wal_location",
> + (Node *)makeString($2),
> -1));
> +
> }
>
> This seems like an unnecessarily complicated parse representation. The
> DefElems seem to be completely unnecessary here.
>
> @@ -998,7 +1110,37 @@ SendBaseBackup(BaseBackupCmd *cmd)
> set_ps_display(activitymsg);
> }
>
> - perform_base_backup(&opt);
> + switch (cmd->cmdtag)
>
> So the design here is that SendBaseBackup() is now going to do a bunch
> of things that are NOT sending a base backup? With no updates to the
> comments of that function and no change to the process title it sets?
>
> - return (manifest->buffile != NULL);
> + return (manifest && manifest->buffile != NULL);
>
> Heck no. It appears that you didn't even bother reading the function
> header comment.
>
> + * Send a single resultset containing XLogRecPtr record (in text format)
> + * TimelineID and backup label.
> */
> static void
> -SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
> +SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli,
> + StringInfo label, char *backupid)
>
> This just casually breaks wire protocol compatibility, which seems
> completely unacceptable.
>
> + if (strlen(opt->tablespace) > 0)
> + sendTablespace(opt->tablespace, NULL, true, NULL, &files);
> + else
> + sendDir(".", 1, true, NIL, true, NULL, NULL, &files);
> +
> + SendFilesHeader(files);
>
> So I guess the idea here is that we buffer the entire list of files in
> memory, regardless of size, and then we send it out afterwards. That
> doesn't seem like a good idea. The list of files might be very large.
> We probably need some code refactoring here rather than just piling
> more and more different responsibilities onto sendTablespace() and
> sendDir().
>
> + if (state->parallel_mode)
> + SpinLockAcquire(&state->lock);
> +
> + state->throttling_counter += increment;
> +
> + if (state->parallel_mode)
> + SpinLockRelease(&state->lock);
>
> I don't like this much. It seems to me that we would do better to use
> atomics here all the time, instead of conditional spinlocks.
>
> +static void
> +send_file(basebackup_options *opt, char *file, bool missing_ok)
> ...
> + if (file == NULL)
> + return;
>
> That seems totally inappropriate.
>
> + sendFile(file, file + basepathlen, &statbuf,
> true, InvalidOid, NULL, NULL);
>
> Maybe I'm misunderstanding, but this looks like it's going to write a
> tar header, even though we're not writing a tarfile.
>
> + else
> + ereport(WARNING,
> + (errmsg("skipping special file
> or directory \"%s\"", file)));
>
> So, if the user asks for a directory or symlink, what's going to
> happen is that they're going to receive an empty file, and get a
> warning. That sounds like terrible behavior.
>
> + /*
> + * Check for checksum failures. If there are failures across
> multiple
> + * processes it may not report total checksum count, but it will
> error
> + * out,terminating the backup.
> + */
>
> In other words, the patch breaks the feature. Not that the feature in
> question works particularly well as things stand, but this makes it
> worse.
>
> I think this patch (0003) is in really bad shape. I'm having second
> thoughts about the design, but it's kind of hard to even have a
> discussion about the design when the patch is riddled with minor
> problems like inadequate comments, failure to update existing
> comments, and breaking a bunch of things. I understand that sometimes
> things get missed, but this is version 14 of a patch that's been
> kicking around since last August.

Fair enough. Some of this is also due to backup related features i.e backup
manifest, progress reporting that got committed to master towards the tail
end of PG-13. Rushing to get parallel backup feature compatible with these
features also caused some of the oversights.

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan(dot)hadi(at)highgo(dot)ca

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ants Aasma 2020-04-15 09:05:47 Re: Parallel copy
Previous Message Ants Aasma 2020-04-15 08:45:31 Re: Parallel copy