Re: WIP/PoC for parallel backup

From: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
To: Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2019-10-24 11:24:41
Message-ID: CADM=JejGWE28Ok=eAdV2r06uKnPXu1BUo1QYhgJR1rTJ7Dxjiw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 24, 2019 at 3:21 PM Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com> wrote:

>
>
> On Fri, Oct 18, 2019 at 4:12 PM Jeevan Chalke <
> jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
>
>>
>>
>> On Thu, Oct 17, 2019 at 10:51 AM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
>> wrote:
>>
>>>
>>> Attached are the updated patches.
>>>
>>
>> I had a quick look over these changes and they look good overall.
>> However, here are my few review comments I caught while glancing the
>> patches
>> 0002 and 0003.
>>
>>
>> --- 0002 patch
>>
>> 1.
>> Can lsn option be renamed to start-wal-location? This will be more clear
>> too.
>>
>> 2.
>> +typedef struct
>> +{
>> + char name[MAXPGPATH];
>> + char type;
>> + int32 size;
>> + time_t mtime;
>> +} BackupFile;
>>
>> I think it will be good if we keep this structure in a common place so
>> that
>> the client can also use it.
>>
>> 3.
>> + SEND_FILE_LIST,
>> + SEND_FILES_CONTENT,
>> Can above two commands renamed to SEND_BACKUP_MANIFEST and
>> SEND_BACKUP_FILE
>> respectively?
>> The reason behind the first name change is, we are not getting only file
>> lists
>> here instead we are getting a few more details with that too. And for
>> others,
>> it will be inline with START_BACKUP/STOP_BACKUP/SEND_BACKUP_MANIFEST.
>>
>> 4.
>> Typos:
>> non-exlusive => non-exclusive
>> retured => returned
>> optionaly => optionally
>> nessery => necessary
>> totoal => total
>>
>>
>> --- 0003 patch
>>
>> 1.
>> +static int
>> +simple_list_length(SimpleStringList *list)
>> +{
>> + int len = 0;
>> + SimpleStringListCell *cell;
>> +
>> + for (cell = list->head; cell; cell = cell->next, len++)
>> + ;
>> +
>> + return len;
>> +}
>>
>> I think it will be good if it goes to simple_list.c. That will help in
>> other
>> usages as well.
>>
>> 2.
>> Please revert these unnecessary changes:
>>
>> @@ -1475,6 +1575,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult
>> *res, int rownum)
>> */
>> snprintf(filename, sizeof(filename), "%s/%s", current_path,
>> copybuf);
>> +
>> if (filename[strlen(filename) - 1] == '/')
>> {
>> /*
>>
>> @@ -1528,8 +1622,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult
>> *res, int rownum)
>> * can map them too.)
>> */
>> filename[strlen(filename) - 1] = '\0'; /* Remove
>> trailing slash */
>> -
>> mapped_tblspc_path =
>> get_tablespace_mapping(&copybuf[157]);
>> +
>> if (symlink(mapped_tblspc_path, filename) != 0)
>> {
>> pg_log_error("could not create symbolic link
>> from \"%s\" to \"%s\": %m",
>>
>> 3.
>> Typos:
>> retrive => retrieve
>> takecare => take care
>> tablespae => tablespace
>>
>> 4.
>> ParallelBackupEnd() function does not do anything for parallelism. Will
>> it be
>> better to just rename it as EndBackup()?
>>
>> 5.
>> To pass a tablespace path to the server in SEND_FILES_CONTENT, you are
>> reusing
>> a LABEL option, that seems odd. How about adding a new option for that?
>>
>> 6.
>> It will be good if we have some comments explaining what the function is
>> actually doing in its prologue. For functions like:
>> GetBackupFilesList()
>> ReceiveFiles()
>> create_workers_and_fetch()
>>
>>
>> Thanks
>>
>>
>>>
>>> Thanks,
>>>
>>> --
>>> Asif Rehman
>>> Highgo Software (Canada/China/Pakistan)
>>> URL : www.highgo.ca
>>>
>>>
>>
>> --
>> Jeevan Chalke
>> Associate Database Architect & Team Lead, Product Development
>> EnterpriseDB Corporation
>> The Enterprise PostgreSQL Company
>>
>>
> I had a detailed discussion with Robert Haas at PostgreConf Europe about
> parallel backup.
> We discussed the current state of the patch and what needs to be done to
> get the patch committed.
>
> - The current patch uses a process to implement parallelism. There are many
> reasons we need to use threads instead of processes. To start with, as
> this is a client utility it makes
> more sense to use threads. The data needs to be shared amongst different
> threads and the main process,
> handling that is simpler as compared to interprocess communication.
>

Yes I agree. I have already converted the code to use threads instead of
processes. This avoids the overhead
of interprocess communication.

With a single file fetching strategy, this requires communication between
competing threads/processes. To handle
that in a multiprocess application, it requires IPC. The current approach
of multiple threads instead of processes
avoids this overhead.

> - Fetching a single file or multiple files was also discussed. We
> concluded in our discussion that we
> need to benchmark to see if disk I/O is a bottleneck or not and if
> parallel writing gives us
> any benefit. This benchmark needs to be done on different hardware and
> different
> network to identify which are the real bottlenecks. In general, we agreed
> that we could start with fetching
> one file at a time but that will be revisited after the benchmarks are
> done.
>

I'll share the updated patch in the next couple of days. After that, I'll
work on benchmarking that in
different environments that I have.

>
> - There is also an ongoing debate in this thread that we should have one
> single tar file for all files or one
> TAR file per thread. I really want to have a single tar file because the
> main purpose of the TAR file is to
> reduce the management of multiple files, but in case of one file per
> thread, we end up with many tar
> files. Therefore we need to have one master thread which is responsible
> for writing on tar file and all
> the other threads will receive the data from the network and stream to the
> master thread. This also
> supports the idea of using a thread-based model rather than a
> process-based approach because it
> requires too much data sharing between processes. If we cannot achieve
> this, then we can disable the
> TAR option for parallel backup in the first version.
>

I am in favour of disabling the tar format for the first version of
parallel backup.

> - In the case of data sharing, we need to try to avoid unnecessary locking
> and more suitable algorithm to
> solve the reader-writer problem is required.
>
> --
> Ibrar Ahmed
>

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-10-24 12:08:43 Re: Fix of fake unlogged LSN initialization
Previous Message Skjalg A. Skagen 2019-10-24 11:06:01 PostgreSQL 12 installation fails because locale name contained non-english characters