Re: WIP/PoC for parallel backup

From: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2019-11-01 15:26:02
Message-ID: CADM=Jejj10qjvMs4aviFck+Qq2wwXedoFOm9unD-z1q7Dxzffg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 30, 2019 at 7:16 PM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com> wrote:

>
>
> On Mon, Oct 28, 2019 at 8:29 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> On Mon, Oct 28, 2019 at 10:03 AM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
>> wrote:
>> > I have updated the patch to include the changes suggested by Jeevan.
>> This patch also implements the thread workers instead of
>> > processes and fetches a single file at a time. The tar format has been
>> disabled for first version of parallel backup.
>>
>> Looking at 0001-0003:
>>
>> It's not clear to me what the purpose of the start WAL location is
>> supposed to be. As far as I can see, SendBackupFiles() stores it in a
>> variable which is then used for exactly nothing, and nothing else uses
>> it. It seems like that would be part of a potential incremental
>> backup feature, but I don't see what it's got to do with parallel full
>> backup.
>>
>
> 'startptr' is used by sendFile() during checksum verification. Since
> SendBackupFiles() is using sendFIle we have to set a valid WAL location.
>
>
>> The tablespace_path option appears entirely unused, and I don't know
>> why that should be necessary here, either.
>>
>
> This is to calculate the basepathlen. We need to exclude the tablespace
> location (or
> base path) from the filename before it is sent to the client with sendFile
> call. I added
> this option primarily to avoid performing string manipulation on filename
> to extract the
> tablespace location and then calculate the basepathlen.
>
> Alternatively we can do it by extracting the base path from the received
> filename. What
> do you suggest?
>
>
>>
>> STORE_BACKUPFILE() seems like maybe it should be a function rather
>> than a macro, and also probably be renamed, because it doesn't store
>> files and the argument's not necessarily a file.
>>
> Sure.
>
>
>>
>> SendBackupManifest() does not send a backup manifest in the sense
>> contemplated by the email thread on that subject. It sends a file
>> list. That seems like the right idea - IMHO, anyway - but you need to
>> do a thorough renaming.
>>
>
> I'm considering the following command names:
> START_BACKUP
> - Starts the backup process
>
> SEND_BACKUP_FILELIST (Instead of SEND_BACKUP_MANIFEST)
> - Sends the list of all files (along with file information such as
> filename, file type (directory/file/link),
> file size and file mtime for each file) to be backed up.
>
> SEND_BACKUP_FILES
> - Sends one or more files to the client.
>
> STOP_BACKUP
> - Stops the backup process.
>
> I'll update the function names accordingly after your confirmation. Of
> course, suggestions for
> better names are welcome.
>
>
>>
>> I think it would be fine to decide that this facility won't support
>> exclusive-mode backup.
>>
>
> Sure. Will drop this patch.
>
>
>>
>> I don't think much of having both sendDir() and sendDir_(). The latter
>> name is inconsistent with any naming convention we have, and there
>> seems to be no reason not to just add an argument to sendDir() and
>> change the callers.
>
>
>> I think we should rename - perhaps as a preparatory patch - the
>> sizeonly flag to dryrun, or something like that.
>>
>
> Sure, will take care of it.
>
>
>> The resource cleanup does not look right. You've included calls to
>> PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, 0) in both StartBackup()
>> and StopBackup(), but what happens if there is an error or even a
>> clean shutdown of the connection in between? I think that there needs
>
> to be some change here to ensure that a walsender will always call
>> base_backup_cleanup() when it exits; I think that'd probably remove
>> the need for any PG_ENSURE_ERROR_CLEANUP calls at all, including ones
>> we have already. This might also be something that could be done as a
>> separate, prepatory refactoring patch.
>>
>
> You're right. I didn't handle this case properly. I will removed
> PG_ENSURE_ERROR_CLEANUP
> calls and replace it with before_shmem_exit handler. This way
> whenever backend process exits,
> base_backup_cleanup will be called:
> - If it exists before calling the do_pg_stop_backup, base_backup_cleanup
> will take care of cleanup.
> - otherwise in case of a clean shutdown (after calling do_pg_stop_backup)
> then base_backup_cleanup
> will simply return without doing anything.
>
>
>
The updated patches are attached.

Thanks,

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Attachment Content-Type Size
0002-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v4.patch application/octet-stream 7.4 KB
0001-remove-PG_ENSURE_ERROR_CLEANUP-macro-from-basebackup_v4.patch application/octet-stream 8.0 KB
0005-pg_basebackup-changes-for-parallel-backup_v4.patch application/octet-stream 24.5 KB
0004-backend-changes-for-parallel-backup_v4.patch application/octet-stream 25.3 KB
0003-Refactor-some-basebackup-code-to-increase-reusabilit_v4.patch application/octet-stream 23.4 KB
0006-parallel-backup-testcase_v4.patch application/octet-stream 18.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-11-01 15:26:40 Re: [Proposal] Global temporary tables
Previous Message Robert Haas 2019-11-01 15:23:20 Re: MarkBufferDirtyHint() and LSN update