Re: WIP/PoC for parallel backup

From: Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2019-10-24 10:19:08
Message-ID: CALtqXTd9sdN70nXqBF9WYdjSPts98Fu9NMxY4cgOgKWrwLKD4w@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 18, 2019 at 4:12 PM Jeevan Chalke <
jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:

>
>
> On Thu, Oct 17, 2019 at 10:51 AM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
> wrote:
>
>>
>> Attached are the updated patches.
>>
>
> I had a quick look over these changes and they look good overall.
> However, here are a few review comments I caught while glancing through
> patches 0002 and 0003.
>
>
> --- 0002 patch
>
> 1.
> Can the lsn option be renamed to start-wal-location? That will be clearer
> too.
>
> 2.
> +typedef struct
> +{
> + char name[MAXPGPATH];
> + char type;
> + int32 size;
> + time_t mtime;
> +} BackupFile;
>
> I think it will be good if we keep this structure in a common place so that
> the client can also use it.
>
> 3.
> + SEND_FILE_LIST,
> + SEND_FILES_CONTENT,
> Can the above two commands be renamed to SEND_BACKUP_MANIFEST and
> SEND_BACKUP_FILE respectively?
> The reason behind the first rename is that we are not getting only a file
> list here; we are getting a few more details with it too. The second will
> then be in line with START_BACKUP/STOP_BACKUP/SEND_BACKUP_MANIFEST.
>
> 4.
> Typos:
> non-exlusive => non-exclusive
> retured => returned
> optionaly => optionally
> nessery => necessary
> totoal => total
>
>
> --- 0003 patch
>
> 1.
> +static int
> +simple_list_length(SimpleStringList *list)
> +{
> + int len = 0;
> + SimpleStringListCell *cell;
> +
> + for (cell = list->head; cell; cell = cell->next, len++)
> + ;
> +
> + return len;
> +}
>
> I think it will be good if it goes to simple_list.c. That will help in
> other
> usages as well.
>
> 2.
> Please revert these unnecessary changes:
>
> @@ -1475,6 +1575,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res,
> int rownum)
> */
> snprintf(filename, sizeof(filename), "%s/%s", current_path,
> copybuf);
> +
> if (filename[strlen(filename) - 1] == '/')
> {
> /*
>
> @@ -1528,8 +1622,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res,
> int rownum)
> * can map them too.)
> */
> filename[strlen(filename) - 1] = '\0'; /* Remove
> trailing slash */
> -
> mapped_tblspc_path =
> get_tablespace_mapping(&copybuf[157]);
> +
> if (symlink(mapped_tblspc_path, filename) != 0)
> {
> pg_log_error("could not create symbolic link from
> \"%s\" to \"%s\": %m",
>
> 3.
> Typos:
> retrive => retrieve
> takecare => take care
> tablespae => tablespace
>
> 4.
> The ParallelBackupEnd() function does not do anything for parallelism.
> Would it be better to just rename it EndBackup()?
>
> 5.
> To pass a tablespace path to the server in SEND_FILES_CONTENT, you are
> reusing the LABEL option, which seems odd. How about adding a new option
> for that?
>
> 6.
> It would be good to have a comment in each function's prologue explaining
> what the function actually does. For functions like:
> GetBackupFilesList()
> ReceiveFiles()
> create_workers_and_fetch()
>
>
> Thanks
>
>
>>
>> Thanks,
>>
>> --
>> Asif Rehman
>> Highgo Software (Canada/China/Pakistan)
>> URL : www.highgo.ca
>>
>>
>
> --
> Jeevan Chalke
> Associate Database Architect & Team Lead, Product Development
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
>
>
I had a detailed discussion with Robert Haas at PostgreConf Europe about
parallel backup.
We discussed the current state of the patch and what needs to be done to
get the patch committed.

- The current patch uses processes to implement parallelism. There are
several reasons to use threads instead. To start with, since this is a
client utility, threads make more sense. The data needs to be shared
between the worker threads and the main thread, and handling that within
one address space is simpler than interprocess communication.

- Fetching a single file versus multiple files at a time was also
discussed. We concluded that we need benchmarks to see whether disk I/O is
a bottleneck and whether parallel writing gives any benefit. These
benchmarks need to be run on different hardware and different networks to
identify the real bottlenecks. In general, we agreed to start with
fetching one file at a time and revisit that once the benchmarks are done.

- There is also an ongoing debate in this thread about whether we should
have one single tar file for all files or one tar file per thread. I
really want a single tar file, because the main purpose of the tar format
is to reduce the management of multiple files, and with one file per
thread we end up with many tar files. Therefore we need one master thread
responsible for writing the tar file, while all the other threads receive
data from the network and stream it to the master thread. This also
supports the idea of a thread-based model rather than a process-based
approach, because the latter would require too much data sharing between
processes. If we cannot achieve this, then we can disable the tar option
for parallel backup in the first version.

- For the data sharing itself, we need to avoid unnecessary locking; a
more suitable algorithm for the reader-writer problem is required.

--
Ibrar Ahmed
