Re: WIP/PoC for parallel backup

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: asifr(dot)rehman(at)gmail(dot)com
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2019-11-27 08:38:31
Message-ID: CAM2+6=UYH799S2qZ4V=f7EK1Y1HzFQKx8GnQWw2gEET1J=mZhQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 13, 2019 at 7:04 PM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com> wrote:

>
> Sorry, I sent the wrong patches. Please see the correct version of the
> patches (_v6).
>

Review comments on these patches:

1.
+ XLogRecPtr wal_location;

Looking at the other field names in basebackup_options structure, let's use
wallocation instead. Or better startwallocation to be precise.

2.
+ int32 size;

Should we use size_t here?

3.
I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
Can't we return the file list with START_BACKUP itself?

4.
+ else if (
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ /*
+ * If symlink, write it as a directory. file symlinks only
allowed
+ * in pg_tblspc
+ */
+ statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
false);
+ }

In normal backup mode, we skip the special file which is not a regular file
or
a directory or a symlink inside pg_tblspc. But in your patch, above code,
treats it as a directory. Should parallel backup too skip such special
files?

5.
Please keep header file inclusions in alphabetical order in basebackup.c and
pg_basebackup.c

6.
+ /*
+ * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
+ * 'base/1/1245/32683', ...) [options]
+ */

Please update these comments as we fetch one file at a time.

7.
+backup_file:
+ SCONST { $$ = (Node *)
makeString($1); }
+ ;
+

Instead of having this rule with only one constant terminal, we can use
SCONST directly in backup_files_list. However, I don't see any issue with
this approach either, just trying to reduce the rules.

8.
Please indent code within 80 char limit at all applicable places.

9.
Please fix following typos:

identifing => identifying
optionaly => optionally
structre => structure
progrsss => progress
Retrive => Retrieve
direcotries => directories

=====

The other mail thread related to backup manifest [1], is creating a
backup_manifest file and sends that to the client which has optional
checksum and other details including filename, file size, mtime, etc.
There is a patch on the same thread which is then validating the backup too.

Since this patch too gets a file list from the server and has similar
details (except checksum), can somehow parallel backup use the
backup-manifest
infrastructure from that patch?

When the parallel backup is in use, will there be a backup_manifest file
created too? I am just visualizing what will be the scenario when both these
features are checked-in.

[1]
https://www.postgresql.org/message-id/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com

> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>
>
Thanks
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Khandekar 2019-11-27 08:46:16 Re: logical decoding : exceeded maxAllocatedDescs for .spill files
Previous Message Michael Paquier 2019-11-27 08:31:54 Re: [patch]socket_timeout in interfaces/libpq