Re: WIP/PoC for parallel backup

From: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2019-12-10 14:33:48
Message-ID: CADM=JegEJSuNguL6EenCOM3mTzyuDnxMGnBotdGcfA7Jsz6fsg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 27, 2019 at 1:38 PM Jeevan Chalke <
jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:

>
>
> On Wed, Nov 13, 2019 at 7:04 PM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
> wrote:
>
>>
>> Sorry, I sent the wrong patches. Please see the correct version of the
>> patches (_v6).
>>
>
> Review comments on these patches:
>
> 1.
> + XLogRecPtr wal_location;
>
> Looking at the other field names in basebackup_options structure, let's use
> wallocation instead. Or better startwallocation to be precise.
>
> 2.
> + int32 size;
>
> Should we use size_t here?
>
> 3.
> I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
> Can't we return the file list with START_BACKUP itself?
>
> 4.
> + else if (
> +#ifndef WIN32
> + S_ISLNK(statbuf.st_mode)
> +#else
> + pgwin32_is_junction(pathbuf)
> +#endif
> + )
> + {
> + /*
> + * If symlink, write it as a directory. file symlinks only
> allowed
> + * in pg_tblspc
> + */
> + statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
> + _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
> false);
> + }
>
> In normal backup mode, we skip the special file which is not a regular
> file or
> a directory or a symlink inside pg_tblspc. But in your patch, above code,
> treats it as a directory. Should parallel backup too skip such special
> files?
>

Yeah going through the code again, I found it a little bit inconsistent. In
fact
SendBackupFiles function is supposed to send the files that were requested
of
it. However, currently is performing these tasks:

1) If the requested file were to be a directory, it will return a TAR
directory entry.
2) If the requested files were to be symlink inside pg_tblspc, it will
return the link path.
3) and as you pointed out above, if the requested files were a symlink
outside pg_tblspc
and inside PGDATA then it will return TAR directory entry.

I think that this function should not take care of any of the above.
Instead, it should
be the client (i.e. pg_basebackup) managing it. The SendBackupFiles should
only send the
regular files and ignore the request of any other kind, be it a directory
or symlink.

Any thoughts?

> 5.
> Please keep header file inclusions in alphabetical order in basebackup.c
> and
> pg_basebackup.c
>
> 6.
> + /*
> + * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
> + * 'base/1/1245/32683', ...) [options]
> + */
>
> Please update these comments as we fetch one file at a time.
>
> 7.
> +backup_file:
> + SCONST { $$ = (Node *)
> makeString($1); }
> + ;
> +
>
> Instead of having this rule with only one constant terminal, we can use
> SCONST directly in backup_files_list. However, I don't see any issue with
> this approach either, just trying to reduce the rules.
>
> 8.
> Please indent code within 80 char limit at all applicable places.
>
> 9.
> Please fix following typos:
>
> identifing => identifying
> optionaly => optionally
> structre => structure
> progrsss => progress
> Retrive => Retrieve
> direcotries => directories
>
>
> =====
>
> The other mail thread related to backup manifest [1], is creating a
> backup_manifest file and sends that to the client which has optional
> checksum and other details including filename, file size, mtime, etc.
> There is a patch on the same thread which is then validating the backup
> too.
>
> Since this patch too gets a file list from the server and has similar
> details (except checksum), can somehow parallel backup use the
> backup-manifest
> infrastructure from that patch?
>

This was discussed earlier in the thread, and as Robert suggested, it would
complicate the
code to no real benefit.

> When the parallel backup is in use, will there be a backup_manifest file
> created too? I am just visualizing what will be the scenario when both
> these
> features are checked-in.
>

Yes, I think it should. Since the full backup will have a manifest file,
there is no
reason for parallel backup to not support it.

I'll share the updated patch in the next couple of days.

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Asif Rehman 2019-12-10 14:34:14 Re: WIP/PoC for parallel backup
Previous Message Jeremy Finzel 2019-12-10 14:25:18 Re: Index corruption / planner issue with one table in my pg 11.6 instance