Re: WIP/PoC for parallel backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
Cc: Kashif Zeeshan <kashif(dot)zeeshan(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2020-04-14 20:49:04
Message-ID: CA+TgmoZ81uKVCE3vFxP1EVwbLBUL9hSU_byg5F2-L1eVXajoYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 14, 2020 at 10:37 AM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com> wrote:
> I forgot to make a check for no-manifest. Fixed. Attached is the updated patch.

+typedef struct
+{
...
+} BackupFile;
+
+typedef struct
+{
...
+} BackupState;

These structures need comments.

+list_wal_files_opt_list:
+ SCONST SCONST
{
- $$ = makeDefElem("manifest_checksums",
-
(Node *)makeString($2), -1);
+ $$ = list_make2(
+ makeDefElem("start_wal_location",
+ (Node *)makeString($2), -1),
+ makeDefElem("end_wal_location",
+ (Node *)makeString($2), -1));
+
}

This seems like an unnecessarily complicated parse representation. The
DefElems seem to be completely unnecessary here.

@@ -998,7 +1110,37 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg);
}

- perform_base_backup(&opt);
+ switch (cmd->cmdtag)

So the design here is that SendBaseBackup() is now going to do a bunch
of things that are NOT sending a base backup? With no updates to the
comments of that function and no change to the process title it sets?

- return (manifest->buffile != NULL);
+ return (manifest && manifest->buffile != NULL);

Heck no. It appears that you didn't even bother reading the function
header comment.

+ * Send a single resultset containing XLogRecPtr record (in text format)
+ * TimelineID and backup label.
*/
static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli,
+ StringInfo label, char *backupid)

This just casually breaks wire protocol compatibility, which seems
completely unacceptable.

+ if (strlen(opt->tablespace) > 0)
+ sendTablespace(opt->tablespace, NULL, true, NULL, &files);
+ else
+ sendDir(".", 1, true, NIL, true, NULL, NULL, &files);
+
+ SendFilesHeader(files);

So I guess the idea here is that we buffer the entire list of files in
memory, regardless of size, and then we send it out afterwards. That
doesn't seem like a good idea. The list of files might be very large.
We probably need some code refactoring here rather than just piling
more and more different responsibilities onto sendTablespace() and
sendDir().

+ if (state->parallel_mode)
+ SpinLockAcquire(&state->lock);
+
+ state->throttling_counter += increment;
+
+ if (state->parallel_mode)
+ SpinLockRelease(&state->lock);

I don't like this much. It seems to me that we would do better to use
atomics here all the time, instead of conditional spinlocks.

+static void
+send_file(basebackup_options *opt, char *file, bool missing_ok)
...
+ if (file == NULL)
+ return;

That seems totally inappropriate.

+ sendFile(file, file + basepathlen, &statbuf,
true, InvalidOid, NULL, NULL);

Maybe I'm misunderstanding, but this looks like it's going to write a
tar header, even though we're not writing a tarfile.

+ else
+ ereport(WARNING,
+ (errmsg("skipping special file
or directory \"%s\"", file)));

So, if the user asks for a directory or symlink, what's going to
happen is that they're going to receive an empty file, and get a
warning. That sounds like terrible behavior.

+ /*
+ * Check for checksum failures. If there are failures across multiple
+ * processes it may not report total checksum count, but it will error
+ * out,terminating the backup.
+ */

In other words, the patch breaks the feature. Not that the feature in
question works particularly well as things stand, but this makes it
worse.

I think this patch (0003) is in really bad shape. I'm having second
thoughts about the design, but it's kind of hard to even have a
discussion about the design when the patch is riddled with minor
problems like inadequate comments, failure to update existing
comments, and breaking a bunch of things. I understand that sometimes
things get missed, but this is version 14 of a patch that's been
kicking around since last August.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-04-14 21:31:13 Re: ERROR: could not determine which collation to use for string comparison
Previous Message Alvaro Herrera 2020-04-14 20:44:32 Re: cleaning perl code