Re: WIP/PoC for parallel backup

From:	Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
To:	Kashif Zeeshan <kashif(dot)zeeshan(at)enterprisedb(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: WIP/PoC for parallel backup
Date:	2020-04-14 14:36:58
Message-ID:	CADM=JegsK0tLiU8x3HVfq6SXqc1=34_Fk89=LtmMRW3-skvxYw@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Apr 14, 2020 at 6:32 PM Kashif Zeeshan <
kashif(dot)zeeshan(at)enterprisedb(dot)com> wrote:

> Hi Asif
>
> Getting the following error on Parallel backup when --no-manifest option
> is used.
>
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$ ./pg_basebackup -v -j 5 -D
> /home/edb/Desktop/backup/ --no-manifest
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: created temporary replication slot "pg_basebackup_10223"
> pg_basebackup: backup worker (0) created
> pg_basebackup: backup worker (1) created
> pg_basebackup: backup worker (2) created
> pg_basebackup: backup worker (3) created
> pg_basebackup: backup worker (4) created
> pg_basebackup: write-ahead log end point: 0/2000100
> pg_basebackup: error: could not get data for 'BUILD_MANIFEST': ERROR:
> could not open file
> "base/pgsql_tmp/pgsql_tmp_b4ef5ac0fd150b2a28caf626bbb1bef2.1": No such file
> or directory
> pg_basebackup: removing contents of data directory
> "/home/edb/Desktop/backup/"
> [edb(at)localhost bin]$
>

I forgot to make a check for no-manifest. Fixed. Attached is the updated
patch.

> Thanks
>
> On Tue, Apr 14, 2020 at 5:33 PM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
> wrote:
>
>>
>>
>> On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <
>> kashif(dot)zeeshan(at)enterprisedb(dot)com> wrote:
>>
>>>
>>>
>>> On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
>>>>
>>>> I have added the shared state as previously described. The new grammar
>>>> changes
>>>> are as follows:
>>>>
>>>> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
>>>> - This will generate a unique backupid using pg_strong_random(16)
>>>> and hex-encode it; the backupid is then returned in the result set.
>>>> - It will also create a shared state and add it to the hashtable.
>>>> The hash table size is set to BACKUP_HASH_SIZE=10, but since the
>>>> hashtable can expand dynamically, I think it's a sufficient initial
>>>> size. max_wal_senders is not used, because it can be set to quite a
>>>> large value.
>>>>
>>>> JOIN_BACKUP 'backup_id'
>>>> - finds 'backup_id' in hashtable and attaches it to server process.
>>>>
>>>>
>>>> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
>>>> - renamed SEND_FILES to SEND_FILE
>>>> - removed START_WAL_LOCATION from this because 'startptr' is now
>>>> accessible through
>>>> shared state.
>>>>
>>>> There is no change in other commands:
>>>> STOP_BACKUP [NOWAIT]
>>>> LIST_TABLESPACES [PROGRESS]
>>>> LIST_FILES [TABLESPACE]
>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>>>
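To make the backupid format concrete, here is a minimal standalone sketch of
the hex-encoding step. It is not the patch code: the function name
backup_id_hex_encode is made up for illustration, and the caller supplies the
raw bytes that the server would obtain from pg_strong_random().

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hex-encode `len` raw bytes from `src` into `dst`.
 * `dst` must have room for 2 * len + 1 characters (including the NUL).
 * Hypothetical helper: in the server the bytes would come from
 * pg_strong_random(); here they are passed in so the sketch stays standalone.
 */
void
backup_id_hex_encode(const uint8_t *src, size_t len, char *dst)
{
    static const char digits[] = "0123456789abcdef";
    size_t      i;

    for (i = 0; i < len; i++)
    {
        dst[2 * i] = digits[src[i] >> 4];       /* high nibble */
        dst[2 * i + 1] = digits[src[i] & 0x0F]; /* low nibble */
    }
    dst[2 * len] = '\0';
}
```

With 16 random bytes as input this produces a 32-character backupid string.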
>>>> The current patches (v11) have been rebased to the latest master. The
>>>> backup manifest is enabled by default, so I have disabled it for
>>>> parallel backup mode and added a warning so that the user is aware of
>>>> it and does not expect it in the backup.
>>>>
>>>> Hi Asif
>>>
>>> I have verified the bug fixes. One bug is fixed and now works as
>>> expected.
>>>
>>> While verifying the other bug fixes I ran into the following issues;
>>> please have a look.
>>>
>>>
>>> 1) The bug fixes mentioned below now produce a segmentation fault.
>>>
>>> For reference, I have added only a description of each bug, since the
>>> steps were given in the previous emails. A backtrace is included for
>>> each case; both backtraces point to the same bug.
>>>
>>> a) The backup failed with errors "error: could not connect to server:
>>> could not look up local user ID 1000: Too many open files" when the
>>> max_wal_senders was set to 2000.
>>>
>>>
>>> [edb(at)localhost bin]$ ./pg_basebackup -v -j 1990 -D
>>> /home/edb/Desktop/backup/
>>> pg_basebackup: warning: backup manifest is disabled in parallel backup
>>> mode
>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>> pg_basebackup: checkpoint completed
>>> pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
>>> pg_basebackup: starting background WAL receiver
>>> pg_basebackup: created temporary replication slot "pg_basebackup_9925"
>>> pg_basebackup: backup worker (0) created
>>> pg_basebackup: backup worker (1) created
>>> pg_basebackup: backup worker (2) created
>>> pg_basebackup: backup worker (3) created
>>> ….
>>> ….
>>> pg_basebackup: backup worker (1014) created
>>> pg_basebackup: backup worker (1015) created
>>> pg_basebackup: backup worker (1016) created
>>> pg_basebackup: backup worker (1017) created
>>> pg_basebackup: error: could not connect to server: could not look up
>>> local user ID 1000: Too many open files
>>> Segmentation fault
>>> [edb(at)localhost bin]$
>>>
>>>
>>> [edb(at)localhost bin]$
>>> [edb(at)localhost bin]$ gdb pg_basebackup
>>> /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
>>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
>>> Copyright (C) 2013 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later <
>>> http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law. Type "show
>>> copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "x86_64-redhat-linux-gnu".
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>...
>>> Reading symbols from
>>> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
>>> [New LWP 13219]
>>> [New LWP 13222]
>>> [Thread debugging using libthread_db enabled]
>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>> Core was generated by `./pg_basebackup -v -j 1990 -D
>>> /home/edb/Desktop/backup/'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>>> 47 if (INVALID_NOT_TERMINATED_TD_P (pd))
>>> (gdb) bt
>>> #0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>>> #1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
>>> #2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
>>> #3 0x00007f2226f76a49 in __run_exit_handlers (status=1,
>>> listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
>>> at exit.c:77
>>> #4 0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
>>> #5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0)
>>> at pg_basebackup.c:2811
>>> #6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
>>> #7 0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at
>>> pg_basebackup.c:2765
>>> (gdb)
>>>
>>>
>>>
>>>
>>> b) When two backups were executed at the same time, a FATAL error was
>>> raised due to max_wal_senders, but instead of exiting, the backup ran to
>>> completion.
>>>
>>> [edb(at)localhost bin]$
>>> [edb(at)localhost bin]$
>>> [edb(at)localhost bin]$ ./pg_basebackup -v -j 8 -D
>>> /home/edb/Desktop/backup1/
>>> pg_basebackup: warning: backup manifest is disabled in parallel backup
>>> mode
>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>> pg_basebackup: checkpoint completed
>>> pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
>>> pg_basebackup: starting background WAL receiver
>>> pg_basebackup: created temporary replication slot "pg_basebackup_17066"
>>> pg_basebackup: backup worker (0) created
>>> pg_basebackup: backup worker (1) created
>>> pg_basebackup: backup worker (2) created
>>> pg_basebackup: backup worker (3) created
>>> pg_basebackup: backup worker (4) created
>>> pg_basebackup: backup worker (5) created
>>> pg_basebackup: backup worker (6) created
>>> pg_basebackup: error: could not connect to server: FATAL: number of
>>> requested standby connections exceeds max_wal_senders (currently 10)
>>> Segmentation fault (core dumped)
>>> [edb(at)localhost bin]$
>>> [edb(at)localhost bin]$
>>> [edb(at)localhost bin]$ gdb pg_basebackup
>>> /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
>>> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
>>> Copyright (C) 2013 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later <
>>> http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law. Type "show
>>> copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "x86_64-redhat-linux-gnu".
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>...
>>> Reading symbols from
>>> /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
>>> [New LWP 17041]
>>> [New LWP 17067]
>>> [Thread debugging using libthread_db enabled]
>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>> Core was generated by `./pg_basebackup -v -j 8 -D
>>> /home/edb/Desktop/backup1/'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>>> 47 if (INVALID_NOT_TERMINATED_TD_P (pd))
>>> (gdb) bt
>>> #0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
>>> #1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
>>> #2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
>>> #3 0x00007f051edc1a49 in __run_exit_handlers (status=1,
>>> listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true)
>>> at exit.c:77
>>> #4 0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
>>> #5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0)
>>> at pg_basebackup.c:2811
>>> #6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
>>> #7 0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at
>>> pg_basebackup.c:2765
>>> (gdb)
>>>
>>>
>>>
>>>
>>> 2) The following bug is not fixed yet
>>>
>>> A similar case: when the DB server is shut down while a parallel backup
>>> is in progress, the correct error is displayed, but the backup folder is
>>> not cleaned up, which leaves a corrupt backup behind.
>>>
>>> [edb(at)localhost bin]$
>>> [edb(at)localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
>>> -j 8
>>> pg_basebackup: warning: backup manifest is disabled in parallel backup
>>> mode
>>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>>> pg_basebackup: checkpoint completed
>>> pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
>>> pg_basebackup: starting background WAL receiver
>>> pg_basebackup: created temporary replication slot "pg_basebackup_16235"
>>> pg_basebackup: backup worker (0) created
>>> pg_basebackup: backup worker (1) created
>>> pg_basebackup: backup worker (2) created
>>> pg_basebackup: backup worker (3) created
>>> pg_basebackup: backup worker (4) created
>>> pg_basebackup: backup worker (5) created
>>> pg_basebackup: backup worker (6) created
>>> pg_basebackup: backup worker (7) created
>>> pg_basebackup: error: could not read COPY data: server closed the
>>> connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> pg_basebackup: error: could not read COPY data: server closed the
>>> connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> pg_basebackup: removing contents of data directory
>>> "/home/edb/Desktop/backup/"
>>> pg_basebackup: error: could not read COPY data: server closed the
>>> connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> [edb(at)localhost bin]$
>>> [edb(at)localhost bin]$
>>> [edb(at)localhost bin]$
>>>
>>>
>>>
>>> [edb(at)localhost bin]$
>>> [edb(at)localhost bin]$ ls /home/edb/Desktop/backup
>>> base pg_hba.conf pg_logical pg_notify pg_serial
>>> pg_stat pg_subtrans pg_twophase pg_xact postgresql.conf
>>> pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots
>>> pg_stat_tmp pg_tblspc PG_VERSION postgresql.auto.conf
>>> [edb(at)localhost bin]$
>>> [edb(at)localhost bin]$
>>>
>>>
>>>
>>>
>>> Thanks
>>> Kashif Zeeshan
>>>
>>>>
>>>>
>>>> On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
>>>> kashif(dot)zeeshan(at)enterprisedb(dot)com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
>>>>> kashif(dot)zeeshan(at)enterprisedb(dot)com> wrote:
>>>>>
>>>>>> Hi Asif
>>>>>>
>>>>>> When a non-existent slot is used with a tablespace, the correct error
>>>>>> is displayed, but the backup folder is not cleaned up, which leaves a
>>>>>> corrupt backup behind.
>>>>>>
>>>>>> Steps
>>>>>> =======
>>>>>>
>>>>>> [edb(at)localhost bin]$
>>>>>> [edb(at)localhost bin]$ mkdir /home/edb/tbl1
>>>>>> [edb(at)localhost bin]$ mkdir /home/edb/tbl_res
>>>>>> [edb(at)localhost bin]$
>>>>>> postgres=# create tablespace tbl1 location '/home/edb/tbl1';
>>>>>> CREATE TABLESPACE
>>>>>> postgres=#
>>>>>> postgres=# create table t1 (a int) tablespace tbl1;
>>>>>> CREATE TABLE
>>>>>> postgres=# insert into t1 values(100);
>>>>>> INSERT 0 1
>>>>>> postgres=# insert into t1 values(200);
>>>>>> INSERT 0 1
>>>>>> postgres=# insert into t1 values(300);
>>>>>> INSERT 0 1
>>>>>> postgres=#
>>>>>>
>>>>>>
>>>>>> [edb(at)localhost bin]$
>>>>>> [edb(at)localhost bin]$ ./pg_basebackup -v -j 2 -D
>>>>>> /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
>>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>>>> complete
>>>>>> pg_basebackup: checkpoint completed
>>>>>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
>>>>>> pg_basebackup: starting background WAL receiver
>>>>>> pg_basebackup: error: could not send replication command
>>>>>> "START_REPLICATION": ERROR: replication slot "test" does not exist
>>>>>> pg_basebackup: backup worker (0) created
>>>>>> pg_basebackup: backup worker (1) created
>>>>>> pg_basebackup: write-ahead log end point: 0/2E000100
>>>>>> pg_basebackup: waiting for background process to finish streaming ...
>>>>>> pg_basebackup: error: child thread exited with error 1
>>>>>> [edb(at)localhost bin]$
>>>>>>
>>>>>> backup folder not cleaned
>>>>>>
>>>>>> [edb(at)localhost bin]$
>>>>>> [edb(at)localhost bin]$
>>>>>> [edb(at)localhost bin]$
>>>>>> [edb(at)localhost bin]$ ls /home/edb/Desktop/backup
>>>>>> backup_label global pg_dynshmem pg_ident.conf pg_multixact
>>>>>> pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact
>>>>>> postgresql.conf
>>>>>> base pg_commit_ts pg_hba.conf pg_logical pg_notify
>>>>>> pg_serial pg_stat pg_subtrans pg_twophase pg_wal
>>>>>> postgresql.auto.conf
>>>>>> [edb(at)localhost bin]$
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> If the same case is executed without the parallel backup patch then
>>>>>> the backup folder is cleaned after the error is displayed.
>>>>>>
>>>>>> [edb(at)localhost bin]$ ./pg_basebackup -v -D
>>>>>> /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
>>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>>>> complete
>>>>>> pg_basebackup: checkpoint completed
>>>>>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
>>>>>> pg_basebackup: starting background WAL receiver
>>>>>> pg_basebackup: error: could not send replication command
>>>>>> "START_REPLICATION": ERROR: replication slot "test999" does not exist
>>>>>> pg_basebackup: write-ahead log end point: 0/2B000100
>>>>>> pg_basebackup: waiting for background process to finish streaming ...
>>>>>> pg_basebackup: error: child process exited with exit code 1
>>>>>> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
>>>>>> pg_basebackup: changes to tablespace directories will not be undone
>>>>>>
>>>>>
>>>>>
>>>>> Hi Asif
>>>>>
>>>>> A similar case: when the DB server is shut down while a parallel
>>>>> backup is in progress, the correct error is displayed, but the backup
>>>>> folder is not cleaned up, which leaves a corrupt backup behind. I think
>>>>> one bug fix will solve all the cases where cleanup is not done when a
>>>>> parallel backup fails.
>>>>>
>>>>> [edb(at)localhost bin]$
>>>>> [edb(at)localhost bin]$
>>>>> [edb(at)localhost bin]$ ./pg_basebackup -v -D
>>>>> /home/edb/Desktop/backup/ -j 8
>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>>> complete
>>>>> pg_basebackup: checkpoint completed
>>>>> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
>>>>> pg_basebackup: starting background WAL receiver
>>>>> pg_basebackup: created temporary replication slot "pg_basebackup_57337"
>>>>> pg_basebackup: backup worker (0) created
>>>>> pg_basebackup: backup worker (1) created
>>>>> pg_basebackup: backup worker (2) created
>>>>> pg_basebackup: backup worker (3) created
>>>>> pg_basebackup: backup worker (4) created
>>>>> pg_basebackup: backup worker (5) created
>>>>> pg_basebackup: backup worker (6) created
>>>>> pg_basebackup: backup worker (7) created
>>>>> pg_basebackup: error: could not read COPY data: server closed the
>>>>> connection unexpectedly
>>>>> This probably means the server terminated abnormally
>>>>> before or while processing the request.
>>>>> pg_basebackup: error: could not read COPY data: server closed the
>>>>> connection unexpectedly
>>>>> This probably means the server terminated abnormally
>>>>> before or while processing the request.
>>>>> [edb(at)localhost bin]$
>>>>> [edb(at)localhost bin]$
>>>>>
>>>>> When the same case is executed on pg_basebackup without the parallel
>>>>> backup patch, proper cleanup is done.
>>>>>
>>>>> [edb(at)localhost bin]$
>>>>> [edb(at)localhost bin]$ ./pg_basebackup -v -D
>>>>> /home/edb/Desktop/backup/
>>>>> pg_basebackup: initiating base backup, waiting for checkpoint to
>>>>> complete
>>>>> pg_basebackup: checkpoint completed
>>>>> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
>>>>> pg_basebackup: starting background WAL receiver
>>>>> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
>>>>> pg_basebackup: error: could not read COPY data: server closed the
>>>>> connection unexpectedly
>>>>> This probably means the server terminated abnormally
>>>>> before or while processing the request.
>>>>> pg_basebackup: removing contents of data directory
>>>>> "/home/edb/Desktop/backup/"
>>>>> [edb(at)localhost bin]$
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>>
>>>>>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas(at)gmail(dot)com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
>>>>>>>> wrote:
>>>>>>>> >> Why would you need to do that? As long as the process where
>>>>>>>> >> STOP_BACKUP can do the check, that seems good enough.
>>>>>>>> >
>>>>>>>> > Yes, but the user will get the error only after the STOP_BACKUP,
>>>>>>>> not while the backup is
>>>>>>>> > in progress. So if the backup is a large one, early error
>>>>>>>> detection would be much beneficial.
>>>>>>>> > This is the current behavior of non-parallel backup as well.
>>>>>>>>
>>>>>>>> Because non-parallel backup does not feature early detection of this
>>>>>>>> error, it is not necessary to make parallel backup do so. Indeed, it
>>>>>>>> is undesirable. If you want to fix that problem, do it on a separate
>>>>>>>> thread in a separate patch. A patch proposing to make parallel
>>>>>>>> backup
>>>>>>>> inconsistent in behavior with non-parallel backup will be rejected,
>>>>>>>> at
>>>>>>>> least if I have anything to say about it.
>>>>>>>>
>>>>>>>> TBH, fixing this doesn't seem like an urgent problem to me. The
>>>>>>>> current situation is not great, but promotions ought to be
>>>>>>>> relatively
>>>>>>>> infrequent, so I'm not sure it's a huge problem in practice. It is
>>>>>>>> also worth considering whether the right fix is to figure out how to
>>>>>>>> make that case actually work, rather than just making it fail
>>>>>>>> quicker.
>>>>>>>> I don't currently understand the reason for the prohibition so I
>>>>>>>> can't
>>>>>>>> express an intelligent opinion on what the right answer is here, but
>>>>>>>> it seems like it ought to be investigated before somebody goes and
>>>>>>>> builds a bunch of infrastructure to make the error more timely.
>>>>>>>>
>>>>>>>
>>>>>>> Non-parallel backup already does the early error checking. I only
>>>>>>> intended to make parallel behave the same as non-parallel here. So, I
>>>>>>> agree with you that the behavior of parallel backup should be
>>>>>>> consistent with the non-parallel one. Please see the code snippet
>>>>>>> below from basebackup.c:sendDir()
>>>>>>>
>>>>>>>
>>>>>>>> /*
>>>>>>>>  * Check if the postmaster has signaled us to exit, and abort with an
>>>>>>>>  * error in that case. The error handler further up will call
>>>>>>>>  * do_pg_abort_backup() for us. Also check that if the backup was
>>>>>>>>  * started while still in recovery, the server wasn't promoted.
>>>>>>>>  * do_pg_stop_backup() will check that too, but it's better to stop
>>>>>>>>  * the backup early than continue to the end and fail there.
>>>>>>>>  */
>>>>>>>> CHECK_FOR_INTERRUPTS();
>>>>>>>> if (RecoveryInProgress() != backup_started_in_recovery)
>>>>>>>>     ereport(ERROR,
>>>>>>>>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>>>>>>              errmsg("the standby was promoted during online backup"),
>>>>>>>>              errhint("This means that the backup being taken is corrupt "
>>>>>>>>                      "and should not be used. "
>>>>>>>>                      "Try taking another online backup.")));
>>>>>>>>
>>>>>>>>
>>>>>>>> > Okay, then I will add the shared state. And since we are adding
>>>>>>>> the shared state, we can use
>>>>>>>> > that for throttling, progress-reporting and standby early error
>>>>>>>> checking.
>>>>>>>>
>>>>>>>> Please propose a grammar here for all the new replication commands
>>>>>>>> you
>>>>>>>> plan to add before going and implement everything. That will make it
>>>>>>>> easier to hash out the design without forcing you to keep changing
>>>>>>>> the
>>>>>>>> code. Your design should include a sketch of how several sets of
>>>>>>>> coordinating backends taking several concurrent parallel backups
>>>>>>>> will
>>>>>>>> end up with one shared state per parallel backup.
>>>>>>>>
>>>>>>>> > There are two possible options:
>>>>>>>> >
>>>>>>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string>
>>>>>>>> OR
>>>>>>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
>>>>>>>> >
>>>>>>>> > This BackupID should be given back as a response to start backup
>>>>>>>> command. All client workers
>>>>>>>> > must append this ID to all parallel backup replication commands.
>>>>>>>> So that we can use this identifier
>>>>>>>> > to search for that particular backup. Does that sound good?
>>>>>>>>
>>>>>>>> Using the WAL start location as the backup ID seems like it might be
>>>>>>>> problematic -- could a single checkpoint not end up as the start
>>>>>>>> location for multiple backups started at the same time? Whether
>>>>>>>> that's
>>>>>>>> possible now or not, it seems unwise to hard-wire that assumption
>>>>>>>> into
>>>>>>>> the wire protocol.
>>>>>>>>
>>>>>>>> I was thinking that perhaps the client should generate a unique
>>>>>>>> backup
>>>>>>>> ID, e.g. leader does:
>>>>>>>>
>>>>>>>> START_BACKUP unique_backup_id [options]...
>>>>>>>>
>>>>>>>> And then others do:
>>>>>>>>
>>>>>>>> JOIN_BACKUP unique_backup_id
>>>>>>>>
>>>>>>>> My thought is that you will have a number of shared memory structures
>>>>>>>> equal to max_wal_senders, each one large enough to hold the shared
>>>>>>>> state for one backup. The shared state will include
>>>>>>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
>>>>>>>> ID. START_BACKUP would allocate one and copy the name into it;
>>>>>>>> JOIN_BACKUP would search for one by name.
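The allocate-by-name and search-by-name idea described above can be sketched
as follows. This is a single-process illustration only, under stated
assumptions: BackupSlot, backup_slot_start, backup_slot_join and the two
constants are hypothetical names, and real shared memory would also need a
lock around both operations.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_BACKUP_SLOTS 4      /* stand-in for max_wal_senders */
#define BACKUP_ID_LEN    64     /* stand-in for NAMEDATALEN-or-something */

/* Hypothetical per-backup slot; only the fields needed for the lookup. */
typedef struct
{
    bool        in_use;
    char        backupid[BACKUP_ID_LEN];
} BackupSlot;

/* START_BACKUP side: claim a free slot and copy the backup ID into it. */
BackupSlot *
backup_slot_start(BackupSlot *slots, const char *backupid)
{
    size_t      i;

    if (strlen(backupid) >= BACKUP_ID_LEN)
        return NULL;            /* ID would not fit in the slot */

    for (i = 0; i < MAX_BACKUP_SLOTS; i++)
    {
        if (!slots[i].in_use)
        {
            slots[i].in_use = true;
            strcpy(slots[i].backupid, backupid);
            return &slots[i];
        }
    }
    return NULL;                /* every slot is already taken */
}

/* JOIN_BACKUP side: find the slot whose backup ID matches. */
BackupSlot *
backup_slot_join(BackupSlot *slots, const char *backupid)
{
    size_t      i;

    for (i = 0; i < MAX_BACKUP_SLOTS; i++)
    {
        if (slots[i].in_use && strcmp(slots[i].backupid, backupid) == 0)
            return &slots[i];
    }
    return NULL;                /* unknown backup ID */
}
```

A worker that passes an ID no leader has registered simply gets NULL back,
which is where the server would raise an error.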
>>>>>>>>
>>>>>>>> If you want to generate the name on the server side, then I suppose
>>>>>>>> START_BACKUP would return a result set that includes the backup ID,
>>>>>>>> and clients would have to specify that same backup ID when invoking
>>>>>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way
>>>>>>>> is
>>>>>>>> better. Either way, the backup ID should be something long and hard
>>>>>>>> to
>>>>>>>> guess, not e.g. the leader processes' PID. I think we should
>>>>>>>> generate
>>>>>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode
>>>>>>>> the
>>>>>>>> result to get a string. That way there's almost no risk of two
>>>>>>>> backup
>>>>>>>> IDs colliding accidentally, and even if we somehow had a malicious
>>>>>>>> user trying to screw up somebody else's parallel backup by choosing
>>>>>>>> a
>>>>>>>> colliding backup ID, it would be pretty hard to have any success. A
>>>>>>>> user with enough access to do that sort of thing can probably cause
>>>>>>>> a
>>>>>>>> lot worse problems anyway, but it seems pretty easy to guard against
>>>>>>>> intentional collisions robustly here, so I think we should.
>>>>>>>>
>>>>>>>>
>>>>>>> Okay, so if we add another replication command ‘JOIN_BACKUP
>>>>>>> unique_backup_id’ to make workers find the relevant shared state,
>>>>>>> there won't be any need for changing the grammar for any other
>>>>>>> command. START_BACKUP can return the unique_backup_id in the result
>>>>>>> set.
>>>>>>>
>>>>>>> I am thinking of the following struct for shared state:
>>>>>>>
>>>>>>>> typedef struct
>>>>>>>> {
>>>>>>>>     char        backupid[NAMEDATALEN];
>>>>>>>>     XLogRecPtr  startptr;
>>>>>>>>
>>>>>>>>     slock_t     lock;
>>>>>>>>     int64       throttling_counter;
>>>>>>>>     bool        backup_started_in_recovery;
>>>>>>>> } BackupSharedState;
>>>>>>>>
>>>>>>>>
>>>>>>> The shared state structure entries would be maintained by a shared
>>>>>>> hash table. There will be one structure per parallel backup. Since a
>>>>>>> single parallel backup can engage more than one WAL sender, I think
>>>>>>> max_wal_senders might be a little too much; perhaps
>>>>>>> max_wal_senders/2, since there will be at least 2 connections per
>>>>>>> parallel backup? Alternatively, we can add a new GUC that defines the
>>>>>>> maximum number of concurrent parallel backups, i.e.
>>>>>>> ‘max_concurent_backups_allowed = 10’ perhaps, or we can make it
>>>>>>> user-configurable.
>>>>>>>
>>>>>>> The key would be “backupid=hex_encode(pg_strong_random(16))”
>>>>>>>
>>>>>>> Checking for Standby Promotion:
>>>>>>> At the START_BACKUP command, we initialize
>>>>>>> BackupSharedState.backup_started_in_recovery
>>>>>>> and keep checking it whenever send_file () is called to send a new
>>>>>>> file.
>>>>>>>
>>>>>>> Throttling:
>>>>>>> BackupSharedState.throttling_counter - The throttling logic remains
>>>>>>> the same
>>>>>>> as for non-parallel backup with the exception that multiple threads
>>>>>>> will now be
>>>>>>> updating it. So in parallel backup, this will represent the overall
>>>>>>> bytes that
>>>>>>> have been transferred. So the workers would sleep if they have
>>>>>>> exceeded the
>>>>>>> limit. Hence, the shared state carries a lock to safely update the
>>>>>>> throttling
>>>>>>> value atomically.
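A rough sketch of the shared throttling counter under a lock, as described
above. Everything here is an assumption for illustration: ThrottleState,
throttle_init and throttle_account are made-up names, a C11 atomic_flag
stands in for slock_t, and the actual sleep is left to the caller.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared state; atomic_flag is a stand-in for slock_t. */
typedef struct
{
    atomic_flag lock;
    int64_t     throttling_counter; /* bytes sent by all workers so far */
} ThrottleState;

void
throttle_init(ThrottleState *st)
{
    atomic_flag_clear(&st->lock);
    st->throttling_counter = 0;
}

/*
 * Add `nbytes` to the shared counter while holding the lock. Returns true
 * when the counter has reached `sample_bytes`, meaning the calling worker
 * should pause; the counter keeps only the remainder for the next interval.
 */
bool
throttle_account(ThrottleState *st, int64_t nbytes, int64_t sample_bytes)
{
    bool        should_sleep = false;

    while (atomic_flag_test_and_set(&st->lock))
        ;                       /* spin until the lock is acquired */

    st->throttling_counter += nbytes;
    if (st->throttling_counter >= sample_bytes)
    {
        st->throttling_counter %= sample_bytes;
        should_sleep = true;
    }

    atomic_flag_clear(&st->lock);   /* release the lock */
    return should_sleep;
}
```

Because every worker adds to the same counter, the limit applies to the
backup as a whole rather than to each connection separately.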
>>>>>>>
>>>>>>> Progress Reporting:
>>>>>>> I think we should add progress-reporting for parallel backup as a
>>>>>>> separate patch. The relevant entries for progress-reporting, such as
>>>>>>> ‘backup_total’ and ‘backup_streamed’, would then be added to this
>>>>>>> structure as well.
>>>>>>>
>>>>>>>
>>>>>>> Grammar:
>>>>>>> There is a change in the resultset being returned for START_BACKUP
>>>>>>> command;
>>>>>>> unique_backup_id is added. Additionally, JOIN_BACKUP replication
>>>>>>> command is
>>>>>>> added. SEND_FILES has been renamed to SEND_FILE. There are no other
>>>>>>> changes
>>>>>>> to the grammar.
>>>>>>>
>>>>>>> START_BACKUP [LABEL '<label>'] [FAST]
>>>>>>> - returns startptr, tli, backup_label, unique_backup_id
>>>>>>> STOP_BACKUP [NOWAIT]
>>>>>>> - returns startptr, tli, backup_label
>>>>>>> JOIN_BACKUP ‘unique_backup_id’
>>>>>>> - attaches a shared state identified by ‘unique_backup_id’ to a
>>>>>>> backend process.
>>>>>>>
>>>>>>> LIST_TABLESPACES [PROGRESS]
>>>>>>> LIST_FILES [TABLESPACE]
>>>>>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>>>>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
>>>>>>>
>>>>>>>
>>>
>>
>> Hi,
>>
>> rebased and updated to the current master (8128b0c1). v13 is attached.
>>
>> - Fixes the above reported issues.
>>
>> - Added progress-reporting support for parallel:
>> For this, 'backup_streamed' is moved to a shared structure (BackupState)
>> as
>> pg_atomic_uint64 variable. The worker processes will keep incrementing
>> this
>> variable.
>>
>> While files are being transferred from server to client, the main process
>> remains idle. So after each increment, the worker process will signal the
>> main process to update the stats in the pg_stat_progress_basebackup view.
>>
>> The 'tablespace_streamed' column is not updated and will remain empty.
>> This is
>> because multiple workers may be copying files from different tablespaces.
>>
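The shared 'backup_streamed' counter can be illustrated with plain C11
atomics. This is only a sketch under assumptions: ProgressState,
progress_add and progress_read are hypothetical names, and _Atomic uint64_t
stands in for pg_atomic_uint64.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared progress state; one instance per parallel backup. */
typedef struct
{
    _Atomic uint64_t backup_streamed;   /* total bytes sent by all workers */
} ProgressState;

/* Worker side: add the bytes just sent and return the new running total. */
uint64_t
progress_add(ProgressState *st, uint64_t nbytes)
{
    /* atomic_fetch_add returns the previous value, so add nbytes again */
    return atomic_fetch_add(&st->backup_streamed, nbytes) + nbytes;
}

/* Main process side: read the current total when signalled by a worker. */
uint64_t
progress_read(ProgressState *st)
{
    return atomic_load(&st->backup_streamed);
}
```

No lock is needed for the increment itself, since each addition is a single
atomic operation.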
>>
>> - Added backup manifest:
>> The backend workers maintain their own manifest file, which contains a
>> list of the files transferred by that worker. Once all backup files are
>> transferred, the workers will create a temp file named
>> ('pg_tempdir/temp_file_prefix_backupid.workerid')
>> to write the content of the manifest file from the BufFile. The workers
>> won’t add the header or the WAL information to their manifest. These two
>> will be added by the main process while merging all worker manifest files.
>>
>> The main process will read these individual files and concatenate them
>> into a single file
>> which is then sent back to the client.
>>
>> The manifest file is created when the following command is received:
>>
>>> BUILD_MANIFEST 'backupid'
>>
>>
>> This is a new replication command. It is sent when pg_basebackup has
>> copied all the
>> $PGDATA files including WAL files.
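The per-worker file naming and the merge step described above could look
roughly like this. It is an in-memory sketch, not the patch code: the
function names are invented for illustration, the header and trailer are
opaque strings supplied by the caller, and the real implementation works on
files rather than buffers.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<tmpdir>/<prefix>_<backupid>.<workerid>" into buf.
 * Returns 0 on success, -1 if the result did not fit.
 */
int
worker_manifest_path(char *buf, size_t buflen, const char *tmpdir,
                     const char *prefix, const char *backupid, int workerid)
{
    int         n = snprintf(buf, buflen, "%s/%s_%s.%d",
                             tmpdir, prefix, backupid, workerid);

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}

/* Append s to dst at offset *used; returns -1 if it would overflow. */
static int
append_str(char *dst, size_t dstlen, size_t *used, const char *s)
{
    size_t      len = strlen(s);

    if (*used + len + 1 > dstlen)
        return -1;
    memcpy(dst + *used, s, len + 1);    /* copies the NUL as well */
    *used += len;
    return 0;
}

/*
 * Main process side: header first, then each worker fragment in order,
 * then the trailer (e.g. the WAL information). Returns 0 on success.
 */
int
merge_manifest(char *dst, size_t dstlen, const char *header,
               const char *const *fragments, size_t nfragments,
               const char *trailer)
{
    size_t      used = 0;
    size_t      i;

    if (dstlen == 0)
        return -1;
    dst[0] = '\0';

    if (append_str(dst, dstlen, &used, header) != 0)
        return -1;
    for (i = 0; i < nfragments; i++)
    {
        if (append_str(dst, dstlen, &used, fragments[i]) != 0)
            return -1;
    }
    return append_str(dst, dstlen, &used, trailer);
}
```

Keeping the header and WAL information out of the worker fragments is what
makes the merge a plain concatenation.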
>>
>>
>>
>> --
>> Asif Rehman
>> Highgo Software (Canada/China/Pakistan)
>> URL : www.highgo.ca
>>
>>
>
> --
> Regards
> ====================================
> Kashif Zeeshan
> Lead Quality Assurance Engineer / Manager
>
> EnterpriseDB Corporation
> The Enterprise Postgres Company
>
>

--
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Attachment	Content-Type	Size
parallel_backup_v14.zip	application/zip	56.9 KB

In response to

Re: WIP/PoC for parallel backup at 2020-04-14 13:32:40 from Kashif Zeeshan

Responses

Re: WIP/PoC for parallel backup at 2020-04-14 20:49:04 from Robert Haas
Re: WIP/PoC for parallel backup at 2020-04-17 06:00:55 from Kashif Zeeshan
Re: WIP/PoC for parallel backup at 2020-04-21 04:27:31 from Amit Kapila
