Re: WIP/PoC for parallel backup

From: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
To: Kashif Zeeshan <kashif(dot)zeeshan(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2020-04-14 12:33:16
Message-ID: CADM=Jeigv1dByGPhQRqPM-nzPDiFbmd5BBthZn3-xzeitb9qOQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <
kashif(dot)zeeshan(at)enterprisedb(dot)com> wrote:

>
>
> On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com> wrote:
>
>> Hi,
>>
>> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
>>
>> I have added the shared state as previously described. The new grammar
>> changes
>> are as follows:
>>
>> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
>> - This will generate a unique backupid using pg_strong_random(16),
>> hex-encode it, and return it in the result set.
>> - It will also create a shared state and add it to the hash table. The
>> hash table size is set to BACKUP_HASH_SIZE=10, but since the hash
>> table can expand dynamically, I think that is a sufficient initial
>> size. max_wal_senders is not used, because it can be set to quite a
>> large value.
>>
>> JOIN_BACKUP 'backup_id'
>> - finds 'backup_id' in hashtable and attaches it to server process.
>>
>>
>> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
>> - renamed SEND_FILES to SEND_FILE
>> - removed START_WAL_LOCATION from this because 'startptr' is now
>> accessible through
>> shared state.
>>
>> There is no change in other commands:
>> STOP_BACKUP [NOWAIT]
>> LIST_TABLESPACES [PROGRESS]
>> LIST_FILES [TABLESPACE]
>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>
>> The current patches (v11) have been rebased to the latest master. The
>> backup manifest is enabled by default, so I have disabled it for
>> parallel backup mode and added a warning so that the user is aware of
>> it and does not expect a manifest in the backup.
>>
>> Hi Asif
>
> I have verified the bug fixes; one bug is fixed and now works as
> expected.
>
> While verifying the other bug fixes, I ran into the following issues;
> please have a look.
>
>
> 1) The bug fixes mentioned below are generating segmentation faults.
>
> For reference I have added only a description, as the steps for each bug
> whose fix I tried to verify were given in previous emails. A backtrace is
> also included with each case; it points to the same bug in both cases.
>
> a) The backup failed with the error "error: could not connect to server:
> could not look up local user ID 1000: Too many open files" when
> max_wal_senders was set to 2000.
>
>
> [edb(at)localhost bin]$ ./pg_basebackup -v -j 1990 -D
> /home/edb/Desktop/backup/
> pg_basebackup: warning: backup manifest is disabled in parallel backup mode
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: created temporary replication slot "pg_basebackup_9925"
> pg_basebackup: backup worker (0) created
> pg_basebackup: backup worker (1) created
> pg_basebackup: backup worker (2) created
> pg_basebackup: backup worker (3) created
> ….
> ….
> pg_basebackup: backup worker (1014) created
> pg_basebackup: backup worker (1015) created
> pg_basebackup: backup worker (1016) created
> pg_basebackup: backup worker (1017) created
> pg_basebackup: error: could not connect to server: could not look up local
> user ID 1000: Too many open files
> Segmentation fault
> [edb(at)localhost bin]$
>
>
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$ gdb pg_basebackup
> /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <
> http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from
> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
> [New LWP 13219]
> [New LWP 13222]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `./pg_basebackup -v -j 1990 -D
> /home/edb/Desktop/backup/'.
> Program terminated with signal 11, Segmentation fault.
> #0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
> 47 if (INVALID_NOT_TERMINATED_TD_P (pd))
> (gdb) bt
> #0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
> #1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
> #2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
> #3 0x00007f2226f76a49 in __run_exit_handlers (status=1,
> listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit(at)entry=true)
> at exit.c:77
> #4 0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
> #5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at
> pg_basebackup.c:2811
> #6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
> #7 0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at
> pg_basebackup.c:2765
> (gdb)
>
>
>
>
> b) When executing two backups at the same time, a FATAL error is raised
> because of max_wal_senders, but instead of exiting cleanly, pg_basebackup
> crashes with a segmentation fault.
>
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$ ./pg_basebackup -v -j 8 -D
> /home/edb/Desktop/backup1/
> pg_basebackup: warning: backup manifest is disabled in parallel backup mode
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: created temporary replication slot "pg_basebackup_17066"
> pg_basebackup: backup worker (0) created
> pg_basebackup: backup worker (1) created
> pg_basebackup: backup worker (2) created
> pg_basebackup: backup worker (3) created
> pg_basebackup: backup worker (4) created
> pg_basebackup: backup worker (5) created
> pg_basebackup: backup worker (6) created
> pg_basebackup: error: could not connect to server: FATAL: number of
> requested standby connections exceeds max_wal_senders (currently 10)
> Segmentation fault (core dumped)
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$ gdb pg_basebackup
> /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <
> http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from
> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
> [New LWP 17041]
> [New LWP 17067]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `./pg_basebackup -v -j 8 -D
> /home/edb/Desktop/backup1/'.
> Program terminated with signal 11, Segmentation fault.
> #0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
> 47 if (INVALID_NOT_TERMINATED_TD_P (pd))
> (gdb) bt
> #0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
> #1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
> #2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
> #3 0x00007f051edc1a49 in __run_exit_handlers (status=1,
> listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit(at)entry=true)
> at exit.c:77
> #4 0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
> #5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0)
> at pg_basebackup.c:2811
> #6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
> #7 0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at
> pg_basebackup.c:2765
> (gdb)
>
>
>
>
> 2) The following bug is not fixed yet.
>
> A similar case: when the DB server is shut down while a parallel backup
> is in progress, the correct error is displayed, but the backup folder is
> not cleaned up, leaving a corrupt backup.
>
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
> pg_basebackup: warning: backup manifest is disabled in parallel backup mode
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: created temporary replication slot "pg_basebackup_16235"
> pg_basebackup: backup worker (0) created
> pg_basebackup: backup worker (1) created
> pg_basebackup: backup worker (2) created
> pg_basebackup: backup worker (3) created
> pg_basebackup: backup worker (4) created
> pg_basebackup: backup worker (5) created
> pg_basebackup: backup worker (6) created
> pg_basebackup: backup worker (7) created
> pg_basebackup: error: could not read COPY data: server closed the
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_basebackup: error: could not read COPY data: server closed the
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_basebackup: removing contents of data directory
> "/home/edb/Desktop/backup/"
> pg_basebackup: error: could not read COPY data: server closed the
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$
>
>
>
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$ ls /home/edb/Desktop/backup
> base pg_hba.conf pg_logical pg_notify pg_serial
> pg_stat pg_subtrans pg_twophase pg_xact postgresql.conf
> pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots
> pg_stat_tmp pg_tblspc PG_VERSION postgresql.auto.conf
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$
>
>
>
>
> Thanks
> Kashif Zeeshan
>
>>
>>
>> On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
>> kashif(dot)zeeshan(at)enterprisedb(dot)com> wrote:
>>
>>>
>>>
>>> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
>>> kashif(dot)zeeshan(at)enterprisedb(dot)com> wrote:
>>>
>>>> Hi Asif
>>>>
>>>> When a non-existent slot is used with a tablespace, the correct error
>>>> is displayed, but the backup folder is not cleaned up, leaving a
>>>> corrupt backup.
>>>>
>>>> Steps
>>>> =======
>>>>
>>>> edb(at)localhost bin]$
>>>> [edb(at)localhost bin]$ mkdir /home/edb/tbl1
>>>> [edb(at)localhost bin]$ mkdir /home/edb/tbl_res
>>>> [edb(at)localhost bin]$
>>>> postgres=# create tablespace tbl1 location '/home/edb/tbl1';
>>>> CREATE TABLESPACE
>>>> postgres=#
>>>> postgres=# create table t1 (a int) tablespace tbl1;
>>>> CREATE TABLE
>>>> postgres=# insert into t1 values(100);
>>>> INSERT 0 1
>>>> postgres=# insert into t1 values(200);
>>>> INSERT 0 1
>>>> postgres=# insert into t1 values(300);
>>>> INSERT 0 1
>>>> postgres=#
>>>>
>>>>
>>>> [edb(at)localhost bin]$
>>>> [edb(at)localhost bin]$ ./pg_basebackup -v -j 2 -D
>>>> /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>> complete
>>>> pg_basebackup: checkpoint completed
>>>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
>>>> pg_basebackup: starting background WAL receiver
>>>> pg_basebackup: error: could not send replication command
>>>> "START_REPLICATION": ERROR: replication slot "test" does not exist
>>>> pg_basebackup: backup worker (0) created
>>>> pg_basebackup: backup worker (1) created
>>>> pg_basebackup: write-ahead log end point: 0/2E000100
>>>> pg_basebackup: waiting for background process to finish streaming ...
>>>> pg_basebackup: error: child thread exited with error 1
>>>> [edb(at)localhost bin]$
>>>>
>>>> backup folder not cleaned
>>>>
>>>> [edb(at)localhost bin]$
>>>> [edb(at)localhost bin]$
>>>> [edb(at)localhost bin]$
>>>> [edb(at)localhost bin]$ ls /home/edb/Desktop/backup
>>>> backup_label global pg_dynshmem pg_ident.conf pg_multixact
>>>> pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact
>>>> postgresql.conf
>>>> base pg_commit_ts pg_hba.conf pg_logical pg_notify
>>>> pg_serial pg_stat pg_subtrans pg_twophase pg_wal
>>>> postgresql.auto.conf
>>>> [edb(at)localhost bin]$
>>>>
>>>>
>>>>
>>>>
>>>> If the same case is executed without the parallel backup patch, the
>>>> backup folder is cleaned up after the error is displayed.
>>>>
>>>> [edb(at)localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
>>>> -T /home/edb/tbl1=/home/edb/tbl_res -S test999
>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>> complete
>>>> pg_basebackup: checkpoint completed
>>>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
>>>> pg_basebackup: starting background WAL receiver
>>>> pg_basebackup: error: could not send replication command
>>>> "START_REPLICATION": ERROR: replication slot "test999" does not exist
>>>> pg_basebackup: write-ahead log end point: 0/2B000100
>>>> pg_basebackup: waiting for background process to finish streaming ...
>>>> pg_basebackup: error: child process exited with exit code 1
>>>> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
>>>> pg_basebackup: changes to tablespace directories will not be undone
>>>>
>>>
>>>
>>> Hi Asif
>>>
>>> A similar case: when the DB server is shut down while a parallel
>>> backup is in progress, the correct error is displayed, but the backup
>>> folder is not cleaned up, leaving a corrupt backup. I think one bug
>>> fix will solve all these cases where cleanup is not done when a
>>> parallel backup fails.
>>>
>>> [edb(at)localhost bin]$
>>> [edb(at)localhost bin]$
>>> [edb(at)localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
>>> -j 8
>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>> pg_basebackup: checkpoint completed
>>> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
>>> pg_basebackup: starting background WAL receiver
>>> pg_basebackup: created temporary replication slot "pg_basebackup_57337"
>>> pg_basebackup: backup worker (0) created
>>> pg_basebackup: backup worker (1) created
>>> pg_basebackup: backup worker (2) created
>>> pg_basebackup: backup worker (3) created
>>> pg_basebackup: backup worker (4) created
>>> pg_basebackup: backup worker (5) created
>>> pg_basebackup: backup worker (6) created
>>> pg_basebackup: backup worker (7) created
>>> pg_basebackup: error: could not read COPY data: server closed the
>>> connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> pg_basebackup: error: could not read COPY data: server closed the
>>> connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> [edb(at)localhost bin]$
>>> [edb(at)localhost bin]$
>>>
>>> When the same case is executed with pg_basebackup without the parallel
>>> backup patch, proper cleanup is done.
>>>
>>> [edb(at)localhost bin]$
>>> [edb(at)localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>> pg_basebackup: checkpoint completed
>>> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
>>> pg_basebackup: starting background WAL receiver
>>> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
>>> pg_basebackup: error: could not read COPY data: server closed the
>>> connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> pg_basebackup: removing contents of data directory
>>> "/home/edb/Desktop/backup/"
>>> [edb(at)localhost bin]$
>>>
>>> Thanks
>>>
>>>
>>>>
>>>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas(at)gmail(dot)com>
>>>>> wrote:
>>>>>
>>>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
>>>>>> wrote:
>>>>>> >> Why would you need to do that? As long as the process where
>>>>>> >> STOP_BACKUP can do the check, that seems good enough.
>>>>>> >
>>>>>> > Yes, but the user will get the error only after the STOP_BACKUP,
>>>>>> not while the backup is
>>>>>> > in progress. So if the backup is a large one, early error detection
>>>>>> would be much beneficial.
>>>>>> > This is the current behavior of non-parallel backup as well.
>>>>>>
>>>>>> Because non-parallel backup does not feature early detection of this
>>>>>> error, it is not necessary to make parallel backup do so. Indeed, it
>>>>>> is undesirable. If you want to fix that problem, do it on a separate
>>>>>> thread in a separate patch. A patch proposing to make parallel backup
>>>>>> inconsistent in behavior with non-parallel backup will be rejected, at
>>>>>> least if I have anything to say about it.
>>>>>>
>>>>>> TBH, fixing this doesn't seem like an urgent problem to me. The
>>>>>> current situation is not great, but promotions ought to be relatively
>>>>>> infrequent, so I'm not sure it's a huge problem in practice. It is
>>>>>> also worth considering whether the right fix is to figure out how to
>>>>>> make that case actually work, rather than just making it fail quicker.
>>>>>> I don't currently understand the reason for the prohibition so I can't
>>>>>> express an intelligent opinion on what the right answer is here, but
>>>>>> it seems like it ought to be investigated before somebody goes and
>>>>>> builds a bunch of infrastructure to make the error more timely.
>>>>>>
>>>>>
>>>>> Non-parallel backup already does the early error checking. I only
>>>>> intended to make parallel backup behave the same as non-parallel
>>>>> here. So, I agree with you that the behavior of parallel backup
>>>>> should be consistent with the non-parallel one. Please see the code
>>>>> snippet below from basebackup.c:sendDir():
>>>>>
>>>>>
>>>>>> /*
>>>>>>  * Check if the postmaster has signaled us to exit, and abort with an
>>>>>>  * error in that case. The error handler further up will call
>>>>>>  * do_pg_abort_backup() for us. Also check that if the backup was
>>>>>>  * started while still in recovery, the server wasn't promoted.
>>>>>>  * do_pg_stop_backup() will check that too, but it's better to stop
>>>>>>  * the backup early than continue to the end and fail there.
>>>>>>  */
>>>>>> CHECK_FOR_INTERRUPTS();
>>>>>> if (RecoveryInProgress() != backup_started_in_recovery)
>>>>>>     ereport(ERROR,
>>>>>>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>>>>              errmsg("the standby was promoted during online backup"),
>>>>>>              errhint("This means that the backup being taken is corrupt "
>>>>>>                      "and should not be used. "
>>>>>>                      "Try taking another online backup.")));
>>>>>>
>>>>>>
>>>>>> > Okay, then I will add the shared state. And since we are adding the
>>>>>> shared state, we can use
>>>>>> > that for throttling, progress-reporting and standby early error
>>>>>> checking.
>>>>>>
>>>>>> Please propose a grammar here for all the new replication commands you
>>>>>> plan to add before going and implement everything. That will make it
>>>>>> easier to hash out the design without forcing you to keep changing the
>>>>>> code. Your design should include a sketch of how several sets of
>>>>>> coordinating backends taking several concurrent parallel backups will
>>>>>> end up with one shared state per parallel backup.
>>>>>>
>>>>>> > There are two possible options:
>>>>>> >
>>>>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
>>>>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
>>>>>> >
>>>>>> > This BackupID should be given back as a response to start backup
>>>>>> command. All client workers
>>>>>> > must append this ID to all parallel backup replication commands. So
>>>>>> that we can use this identifier
>>>>>> > to search for that particular backup. Does that sound good?
>>>>>>
>>>>>> Using the WAL start location as the backup ID seems like it might be
>>>>>> problematic -- could a single checkpoint not end up as the start
>>>>>> location for multiple backups started at the same time? Whether that's
>>>>>> possible now or not, it seems unwise to hard-wire that assumption into
>>>>>> the wire protocol.
>>>>>>
>>>>>> I was thinking that perhaps the client should generate a unique backup
>>>>>> ID, e.g. leader does:
>>>>>>
>>>>>> START_BACKUP unique_backup_id [options]...
>>>>>>
>>>>>> And then others do:
>>>>>>
>>>>>> JOIN_BACKUP unique_backup_id
>>>>>>
>>>>>> My thought is that you will have a number of shared memory structure
>>>>>> equal to max_wal_senders, each one large enough to hold the shared
>>>>>> state for one backup. The shared state will include
>>>>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
>>>>>> ID. START_BACKUP would allocate one and copy the name into it;
>>>>>> JOIN_BACKUP would search for one by name.
>>>>>>
>>>>>> If you want to generate the name on the server side, then I suppose
>>>>>> START_BACKUP would return a result set that includes the backup ID,
>>>>>> and clients would have to specify that same backup ID when invoking
>>>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
>>>>>> better. Either way, the backup ID should be something long and hard to
>>>>>> guess, not e.g. the leader processes' PID. I think we should generate
>>>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
>>>>>> result to get a string. That way there's almost no risk of two backup
>>>>>> IDs colliding accidentally, and even if we somehow had a malicious
>>>>>> user trying to screw up somebody else's parallel backup by choosing a
>>>>>> colliding backup ID, it would be pretty hard to have any success. A
>>>>>> user with enough access to do that sort of thing can probably cause a
>>>>>> lot worse problems anyway, but it seems pretty easy to guard against
>>>>>> intentional collisions robustly here, so I think we should.
>>>>>>
>>>>>>
>>>>> Okay, so if we add another replication command, 'JOIN_BACKUP
>>>>> unique_backup_id', to let workers find the relevant shared state,
>>>>> there won't be any need to change the grammar for any other command.
>>>>> START_BACKUP can return the unique_backup_id in the result set.
>>>>>
>>>>> I am thinking of the following struct for the shared state:
>>>>>
>>>>>> typedef struct
>>>>>> {
>>>>>>     char        backupid[NAMEDATALEN];
>>>>>>     XLogRecPtr  startptr;
>>>>>>
>>>>>>     slock_t     lock;
>>>>>>     int64       throttling_counter;
>>>>>>     bool        backup_started_in_recovery;
>>>>>> } BackupSharedState;
>>>>>>
>>>>>>
>>>>> The shared state structure entries would be maintained by a shared
>>>>> hash table. There will be one structure per parallel backup. Since a
>>>>> single parallel backup can engage more than one WAL sender, I think
>>>>> max_wal_senders might be a little too much; perhaps
>>>>> max_wal_senders/2, since there will be at least 2 connections per
>>>>> parallel backup? Alternatively, we can add a new GUC that defines
>>>>> the maximum number of concurrent parallel backups, e.g.
>>>>> 'max_concurrent_backups_allowed = 10' perhaps, and make it
>>>>> user-configurable.
>>>>>
>>>>> The key would be "backupid = hex_encode(pg_strong_random(16))".
>>>>>
>>>>> Checking for Standby Promotion:
>>>>> At the START_BACKUP command, we initialize
>>>>> BackupSharedState.backup_started_in_recovery and keep checking it
>>>>> whenever send_file() is called to send a new file.
>>>>>
>>>>> Throttling:
>>>>> BackupSharedState.throttling_counter - The throttling logic remains
>>>>> the same as for non-parallel backup, with the exception that
>>>>> multiple threads will now be updating it. So in parallel backup,
>>>>> this will represent the overall bytes that have been transferred,
>>>>> and the workers will sleep if they have exceeded the limit. Hence,
>>>>> the shared state carries a lock to safely update the throttling
>>>>> value atomically.
>>>>>
>>>>> Progress Reporting:
>>>>> I think we should add progress reporting for parallel backup as a
>>>>> separate patch. The relevant entries for progress reporting, such as
>>>>> 'backup_total' and 'backup_streamed', would then be added to this
>>>>> structure as well.
>>>>>
>>>>>
>>>>> Grammar:
>>>>> There is a change in the result set returned by the START_BACKUP
>>>>> command: unique_backup_id is added. Additionally, the JOIN_BACKUP
>>>>> replication command is added, and SEND_FILES has been renamed to
>>>>> SEND_FILE. There are no other changes to the grammar.
>>>>>
>>>>> START_BACKUP [LABEL '<label>'] [FAST]
>>>>> - returns startptr, tli, backup_label, unique_backup_id
>>>>> STOP_BACKUP [NOWAIT]
>>>>> - returns startptr, tli, backup_label
>>>>> JOIN_BACKUP ‘unique_backup_id’
>>>>> - attaches a shared state identified by ‘unique_backup_id’ to a
>>>>> backend process.
>>>>>
>>>>> LIST_TABLESPACES [PROGRESS]
>>>>> LIST_FILES [TABLESPACE]
>>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
>>>>>
>>>>>
>

Hi,

rebased and updated to the current master (8128b0c1). v13 is attached.

- Fixes the above reported issues.

- Added progress-reporting support for parallel backup:
For this, 'backup_streamed' is moved to a shared structure (BackupState)
as a pg_atomic_uint64 variable. The worker processes keep incrementing
this variable while files are being transferred from server to client.
The main process remains idle during that time, so after each increment
the worker process signals the master to update the stats in the
pg_stat_progress_basebackup view.

The 'tablespace_streamed' column is not updated and will remain empty,
because multiple workers may be copying files from different tablespaces.

- Added backup manifest support:
The backend workers maintain their own manifest file, which contains a
list of the files being transferred by that worker. Once all backup files
are transferred, the workers write the contents of their manifest from
BufFile to a temp file ('pg_tempdir/temp_file_prefix_backupid.workerid').
The workers add neither the header nor the WAL information to their
manifests; these two are added by the main process while merging all
worker manifest files.

The main process reads these individual files and concatenates them into
a single file, which is then sent back to the client.

The manifest file is created when the following command is received:

> BUILD_MANIFEST 'backupid'

This is a new replication command. It is sent once pg_basebackup has
copied all the $PGDATA files, including the WAL files.

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Attachment Content-Type Size
parallel_backup_v13.zip application/zip 57.1 KB
