Re: WIP/PoC for parallel backup

From: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
To: Kashif Zeeshan <kashif(dot)zeeshan(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2020-04-07 16:43:51
Message-ID: CADM=JehQA_ifXi59mpxyv_m__L7PVk0DF_T+FFsimj2asmZZuQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

Thanks, Kashif and Rajkumar. I have fixed the reported issues.

I have added the shared state as previously described. The new grammar
changes are as follows:

START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
- This will generate a unique backupid using pg_strong_random(16),
hex-encode it, and return it in the result set.
- It will also create a shared state and add it to the hash table. The
hash table size is set to BACKUP_HASH_SIZE=10, but since the hash table
can expand dynamically, I think that is a sufficient initial size.
max_wal_senders is not used, because it can be set to quite a large
value.
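
To make that concrete, here is a minimal sketch of how the id
generation and shared-state registration could look on the server side.
This is illustrative only: BackupStateHash, BACKUP_ID_BYTES,
generate_backup_id() and register_backup() are names I am using for the
sketch, not necessarily what the patch uses, and the LWLock protecting
the hash table is omitted:

#include "postgres.h"

#include "access/xlogdefs.h"
#include "storage/shmem.h"
#include "storage/spin.h"
#include "utils/builtins.h"		/* hex_encode() */
#include "utils/hsearch.h"

#define BACKUP_ID_BYTES 16
#define BACKUP_HASH_SIZE 10

typedef struct BackupSharedState
{
	char		backupid[NAMEDATALEN];	/* hash key */
	XLogRecPtr	startptr;
	slock_t		lock;
	int64		throttling_counter;
	bool		backup_started_in_recovery;
} BackupSharedState;

static HTAB *BackupStateHash;	/* created with ShmemInitHash() at startup */

/* Fill buf (at least 2 * BACKUP_ID_BYTES + 1 bytes) with a random hex id. */
static void
generate_backup_id(char *buf)
{
	uint8		rand[BACKUP_ID_BYTES];

	if (!pg_strong_random(rand, sizeof(rand)))
		ereport(ERROR,
				(errmsg("could not generate random backup id")));

	hex_encode((const char *) rand, sizeof(rand), buf);
	buf[2 * BACKUP_ID_BYTES] = '\0';
}

/* START_BACKUP: register a new shared state under a fresh backup id. */
static BackupSharedState *
register_backup(XLogRecPtr startptr, bool in_recovery)
{
	char		backupid[NAMEDATALEN];
	BackupSharedState *state;
	bool		found;

	generate_backup_id(backupid);

	/* hash_search() copies the key into the new entry */
	state = (BackupSharedState *) hash_search(BackupStateHash, backupid,
											  HASH_ENTER, &found);
	Assert(!found);				/* 128-bit random ids should not collide */

	state->startptr = startptr;
	state->backup_started_in_recovery = in_recovery;
	state->throttling_counter = 0;
	SpinLockInit(&state->lock);

	return state;
}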

JOIN_BACKUP 'backup_id'
- finds 'backup_id' in the hash table and attaches it to the server
process.
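
The lookup side could then be as small as this sketch (same
illustrative names as above):

/* JOIN_BACKUP 'backup_id': attach this backend to an existing backup. */
static BackupSharedState *
attach_backup(const char *backupid)
{
	BackupSharedState *state;
	bool		found;

	state = (BackupSharedState *) hash_search(BackupStateHash, backupid,
											  HASH_FIND, &found);
	if (!found)
		ereport(ERROR,
				(errmsg("backup \"%s\" does not exist", backupid)));

	return state;
}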

SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
- renamed SEND_FILES to SEND_FILE
- removed START_WAL_LOCATION from this command, because 'startptr' is
now accessible through the shared state.

There is no change in the other commands:
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

The current patches (v11) have been rebased onto the latest master. The
backup manifest is enabled by default, so I have disabled it for
parallel backup mode and generate a warning, so that the user is aware
of it and does not expect it in the backup.
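
Something along these lines (a sketch only; 'opt->manifest' stands in
for whatever the actual option variable is):

/* in the parallel-backup path, before starting to send files */
if (opt->manifest)
{
	ereport(WARNING,
			(errmsg("backup manifest is not supported in parallel backup mode"),
			 errhint("The backup will not contain a backup_manifest file.")));
	opt->manifest = false;
}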

On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <
kashif(dot)zeeshan(at)enterprisedb(dot)com> wrote:

>
>
> On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <
> kashif(dot)zeeshan(at)enterprisedb(dot)com> wrote:
>
>> Hi Asif
>>
>> When a non-existent slot is used with a tablespace, the correct error
>> is displayed, but the backup folder is not cleaned up, leaving a
>> corrupt backup.
>>
>> Steps
>> =======
>>
>> edb(at)localhost bin]$
>> [edb(at)localhost bin]$ mkdir /home/edb/tbl1
>> [edb(at)localhost bin]$ mkdir /home/edb/tbl_res
>> [edb(at)localhost bin]$
>> postgres=# create tablespace tbl1 location '/home/edb/tbl1';
>> CREATE TABLESPACE
>> postgres=#
>> postgres=# create table t1 (a int) tablespace tbl1;
>> CREATE TABLE
>> postgres=# insert into t1 values(100);
>> INSERT 0 1
>> postgres=# insert into t1 values(200);
>> INSERT 0 1
>> postgres=# insert into t1 values(300);
>> INSERT 0 1
>> postgres=#
>>
>>
>> [edb(at)localhost bin]$
>> [edb(at)localhost bin]$ ./pg_basebackup -v -j 2 -D
>> /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>> pg_basebackup: checkpoint completed
>> pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: error: could not send replication command
>> "START_REPLICATION": ERROR: replication slot "test" does not exist
>> pg_basebackup: backup worker (0) created
>> pg_basebackup: backup worker (1) created
>> pg_basebackup: write-ahead log end point: 0/2E000100
>> pg_basebackup: waiting for background process to finish streaming ...
>> pg_basebackup: error: child thread exited with error 1
>> [edb(at)localhost bin]$
>>
>> backup folder not cleaned
>>
>> [edb(at)localhost bin]$
>> [edb(at)localhost bin]$
>> [edb(at)localhost bin]$
>> [edb(at)localhost bin]$ ls /home/edb/Desktop/backup
>> backup_label global pg_dynshmem pg_ident.conf pg_multixact
>> pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact
>> postgresql.conf
>> base pg_commit_ts pg_hba.conf pg_logical pg_notify
>> pg_serial pg_stat pg_subtrans pg_twophase pg_wal
>> postgresql.auto.conf
>> [edb(at)localhost bin]$
>>
>>
>>
>>
>> If the same case is executed without the parallel backup patch, the
>> backup folder is cleaned up after the error is displayed.
>>
>> [edb(at)localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T
>> /home/edb/tbl1=/home/edb/tbl_res -S test999
>> pg_basebackup: initiating base backup, waiting for checkpoint to complete
>> pg_basebackup: checkpoint completed
>> pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: error: could not send replication command
>> "START_REPLICATION": ERROR: replication slot "test999" does not exist
>> pg_basebackup: write-ahead log end point: 0/2B000100
>> pg_basebackup: waiting for background process to finish streaming ...
>> pg_basebackup: error: child process exited with exit code 1
>> *pg_basebackup: removing data directory " /home/edb/Desktop/backup"*
>> pg_basebackup: changes to tablespace directories will not be undone
>>
>
>
> Hi Asif
>
> A similar case is when the DB server is shut down while the parallel
> backup is in progress: the correct error is displayed, but the backup
> folder is not cleaned up, leaving a corrupt backup. I think one bug fix
> will solve all these cases where cleanup is not done when a parallel
> backup fails.
>
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: created temporary replication slot "pg_basebackup_57337"
> pg_basebackup: backup worker (0) created
> pg_basebackup: backup worker (1) created
> pg_basebackup: backup worker (2) created
> pg_basebackup: backup worker (3) created
> pg_basebackup: backup worker (4) created
> pg_basebackup: backup worker (5) created
> pg_basebackup: backup worker (6) created
> pg_basebackup: backup worker (7) created
> pg_basebackup: error: could not read COPY data: server closed the
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_basebackup: error: could not read COPY data: server closed the
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$
>
> When the same case is executed with pg_basebackup without the parallel
> backup patch, proper cleanup is done.
>
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
> pg_basebackup: starting background WAL receiver
> pg_basebackup: created temporary replication slot "pg_basebackup_5590"
> pg_basebackup: error: could not read COPY data: server closed the
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_basebackup: removing contents of data directory
> "/home/edb/Desktop/backup/"
> [edb(at)localhost bin]$
>
> Thanks
>
>
>>
>> On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
>> wrote:
>>
>>>
>>>
>>> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas(at)gmail(dot)com>
>>> wrote:
>>>
>>>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
>>>> wrote:
>>>> >> Why would you need to do that? As long as the process where
>>>> >> STOP_BACKUP can do the check, that seems good enough.
>>>> >
>>>> > Yes, but the user will get the error only after the STOP_BACKUP, not
>>>> > while the backup is in progress. So if the backup is a large one,
>>>> > early error detection would be very beneficial.
>>>> > This is the current behavior of non-parallel backup as well.
>>>>
>>>> Because non-parallel backup does not feature early detection of this
>>>> error, it is not necessary to make parallel backup do so. Indeed, it
>>>> is undesirable. If you want to fix that problem, do it on a separate
>>>> thread in a separate patch. A patch proposing to make parallel backup
>>>> inconsistent in behavior with non-parallel backup will be rejected, at
>>>> least if I have anything to say about it.
>>>>
>>>> TBH, fixing this doesn't seem like an urgent problem to me. The
>>>> current situation is not great, but promotions ought to be relatively
>>>> infrequent, so I'm not sure it's a huge problem in practice. It is
>>>> also worth considering whether the right fix is to figure out how to
>>>> make that case actually work, rather than just making it fail quicker.
>>>> I don't currently understand the reason for the prohibition so I can't
>>>> express an intelligent opinion on what the right answer is here, but
>>>> it seems like it ought to be investigated before somebody goes and
>>>> builds a bunch of infrastructure to make the error more timely.
>>>>
>>>
>>> Non-parallel backup already does the early error checking. I only
>>> intended to make parallel behave the same as non-parallel here. So, I
>>> agree with you that the behavior of parallel backup should be
>>> consistent with the non-parallel one. Please see the code snippet
>>> below from basebackup.c:sendDir():
>>>
>>>> /*
>>>>  * Check if the postmaster has signaled us to exit, and abort with an
>>>>  * error in that case. The error handler further up will call
>>>>  * do_pg_abort_backup() for us. Also check that if the backup was
>>>>  * started while still in recovery, the server wasn't promoted.
>>>>  * do_pg_stop_backup() will check that too, but it's better to stop
>>>>  * the backup early than continue to the end and fail there.
>>>>  */
>>>> CHECK_FOR_INTERRUPTS();
>>>> if (RecoveryInProgress() != backup_started_in_recovery)
>>>>     ereport(ERROR,
>>>>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>>              errmsg("the standby was promoted during online backup"),
>>>>              errhint("This means that the backup being taken is corrupt "
>>>>                      "and should not be used. "
>>>>                      "Try taking another online backup.")));
>>>>
>>>> > Okay, then I will add the shared state. And since we are adding the
>>>> > shared state, we can use that for throttling, progress-reporting and
>>>> > standby early error checking.
>>>>
>>>> Please propose a grammar here for all the new replication commands you
>>>> plan to add before going and implement everything. That will make it
>>>> easier to hash out the design without forcing you to keep changing the
>>>> code. Your design should include a sketch of how several sets of
>>>> coordinating backends taking several concurrent parallel backups will
>>>> end up with one shared state per parallel backup.
>>>>
>>>> > There are two possible options:
>>>> >
>>>> > (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
>>>> > (2) (Preferred Option) Use the WAL start location as the BackupID.
>>>> >
>>>> > This BackupID should be given back as a response to the start backup
>>>> > command. All client workers must append this ID to all parallel backup
>>>> > replication commands, so that we can use this identifier to search for
>>>> > that particular backup. Does that sound good?
>>>>
>>>> Using the WAL start location as the backup ID seems like it might be
>>>> problematic -- could a single checkpoint not end up as the start
>>>> location for multiple backups started at the same time? Whether that's
>>>> possible now or not, it seems unwise to hard-wire that assumption into
>>>> the wire protocol.
>>>>
>>>> I was thinking that perhaps the client should generate a unique backup
>>>> ID, e.g. leader does:
>>>>
>>>> START_BACKUP unique_backup_id [options]...
>>>>
>>>> And then others do:
>>>>
>>>> JOIN_BACKUP unique_backup_id
>>>>
>>>> My thought is that you will have a number of shared memory structures
>>>> equal to max_wal_senders, each one large enough to hold the shared
>>>> state for one backup. The shared state will include
>>>> char[NAMEDATALEN-or-something] which will be used to hold the backup
>>>> ID. START_BACKUP would allocate one and copy the name into it;
>>>> JOIN_BACKUP would search for one by name.
>>>>
>>>> If you want to generate the name on the server side, then I suppose
>>>> START_BACKUP would return a result set that includes the backup ID,
>>>> and clients would have to specify that same backup ID when invoking
>>>> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
>>>> better. Either way, the backup ID should be something long and hard to
>>>> guess, not e.g. the leader process's PID. I think we should generate
>>>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
>>>> result to get a string. That way there's almost no risk of two backup
>>>> IDs colliding accidentally, and even if we somehow had a malicious
>>>> user trying to screw up somebody else's parallel backup by choosing a
>>>> colliding backup ID, it would be pretty hard to have any success. A
>>>> user with enough access to do that sort of thing can probably cause a
>>>> lot worse problems anyway, but it seems pretty easy to guard against
>>>> intentional collisions robustly here, so I think we should.
>>>>
>>>>
>>> Okay, so if we add another replication command ‘JOIN_BACKUP
>>> unique_backup_id’ to let workers find the relevant shared state, there
>>> won't be any need to change the grammar for any other command. The
>>> START_BACKUP command can return the unique_backup_id in the result
>>> set.
>>>
>>> I am thinking of the following struct for shared state:
>>>
>>>> typedef struct
>>>> {
>>>>     char        backupid[NAMEDATALEN];
>>>>     XLogRecPtr  startptr;
>>>>
>>>>     slock_t     lock;
>>>>     int64       throttling_counter;
>>>>     bool        backup_started_in_recovery;
>>>> } BackupSharedState;
>>>>
>>> The shared state structure entries would be maintained by a shared
>>> hash table. There will be one structure per parallel backup. Since a
>>> single parallel backup can engage more than one WAL sender, I think
>>> max_wal_senders might be a little too much; perhaps max_wal_senders/2,
>>> since there will be at least 2 connections per parallel backup?
>>> Alternatively, we can add a new user-configurable GUC that defines the
>>> maximum number of concurrent parallel backups, e.g.
>>> ‘max_concurrent_backups_allowed = 10’.
>>>
>>> The key would be “backupid=hex_encode(pg_strong_random(16))”
>>>
>>> Checking for Standby Promotion:
>>> At the START_BACKUP command, we initialize
>>> BackupSharedState.backup_started_in_recovery and keep checking it
>>> whenever send_file() is called to send a new file.
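>>>
>>> Concretely, the per-file check could mirror the sendDir() snippet
>>> above, just reading the flag from the shared state (a sketch; 'state'
>>> is the attached BackupSharedState):
>>>
>>> /* in send_file(), before sending each new file */
>>> CHECK_FOR_INTERRUPTS();
>>> if (RecoveryInProgress() != state->backup_started_in_recovery)
>>>     ereport(ERROR,
>>>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>              errmsg("the standby was promoted during online backup"),
>>>              errhint("This means that the backup being taken is corrupt "
>>>                      "and should not be used. "
>>>                      "Try taking another online backup.")));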
>>>
>>> Throttling:
>>> BackupSharedState.throttling_counter - The throttling logic remains
>>> the same as for non-parallel backup, except that multiple backends
>>> will now be updating it. In parallel backup it represents the overall
>>> bytes that have been transferred, and a worker sleeps once the limit
>>> has been exceeded. Hence, the shared state carries a lock so that the
>>> throttling value can be updated atomically.
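>>>
>>> A minimal sketch of the shared update (illustrative names; note that
>>> the sleep happens outside the spinlock, since spinlocks must only be
>>> held for a few instructions):
>>>
>>> /* throttling_sample as in basebackup.c: bytes per sampling interval */
>>> static int64 throttling_sample;
>>>
>>> static void
>>> parallel_throttle(BackupSharedState *state, size_t increment)
>>> {
>>>     bool        need_sleep = false;
>>>
>>>     SpinLockAcquire(&state->lock);
>>>     state->throttling_counter += increment;
>>>     if (state->throttling_counter >= throttling_sample)
>>>     {
>>>         state->throttling_counter %= throttling_sample;
>>>         need_sleep = true;
>>>     }
>>>     SpinLockRelease(&state->lock);
>>>
>>>     if (need_sleep)
>>>     {
>>>         /*
>>>          * Wait-time computation elided; it would mirror the existing
>>>          * throttle() in basebackup.c (sleep for the remainder of the
>>>          * current sampling interval).
>>>          */
>>>         pg_usleep(1000L);    /* placeholder sleep */
>>>     }
>>> }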
>>>
>>> Progress Reporting:
>>> I think we should add progress reporting for parallel backup as a
>>> separate patch. The relevant entries for progress reporting, such as
>>> ‘backup_total’ and ‘backup_streamed’, would then be added to this
>>> structure as well.
>>>
>>>
>>> Grammar:
>>> There is a change in the result set returned by the START_BACKUP
>>> command: unique_backup_id is added. Additionally, the JOIN_BACKUP
>>> replication command is added, and SEND_FILES has been renamed to
>>> SEND_FILE. There are no other changes to the grammar.
>>>
>>> START_BACKUP [LABEL '<label>'] [FAST]
>>> - returns startptr, tli, backup_label, unique_backup_id
>>> STOP_BACKUP [NOWAIT]
>>> - returns startptr, tli, backup_label
>>> JOIN_BACKUP ‘unique_backup_id’
>>> - attaches a shared state identified by ‘unique_backup_id’ to a
>>> backend process.
>>>
>>> LIST_TABLESPACES [PROGRESS]
>>> LIST_FILES [TABLESPACE]
>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
>>>
>>>
>>> --
>>> Asif Rehman
>>> Highgo Software (Canada/China/Pakistan)
>>> URL : www.highgo.ca
>>>
>>>
>>
>> --
>> Regards
>> ====================================
>> Kashif Zeeshan
>> Lead Quality Assurance Engineer / Manager
>>
>> EnterpriseDB Corporation
>> The Enterprise Postgres Company
>>
>>
>
> --
> Regards
> ====================================
> Kashif Zeeshan
> Lead Quality Assurance Engineer / Manager
>
> EnterpriseDB Corporation
> The Enterprise Postgres Company
>
>

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Attachment Content-Type Size
parallel_backup_v11.zip application/zip 44.1 KB
