Re: WIP/PoC for parallel backup

From: Kashif Zeeshan <kashif(dot)zeeshan(at)enterprisedb(dot)com>
To: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2020-04-03 10:01:17
Message-ID: CAKfXphpurkUohwB=cHoBkKC-axVWx+38vj-tDM8MbutOJH-8uQ@mail.gmail.com
Lists: pgsql-hackers

Hi Asif

When a non-existent replication slot is used together with a tablespace
mapping, the correct error is displayed, but the backup directory is not
cleaned up afterwards, leaving a corrupt backup behind.

Steps
=======

[edb(at)localhost bin]$
[edb(at)localhost bin]$ mkdir /home/edb/tbl1
[edb(at)localhost bin]$ mkdir /home/edb/tbl_res
[edb(at)localhost bin]$
postgres=# create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#

[edb(at)localhost bin]$
[edb(at)localhost bin]$ ./pg_basebackup -v -j 2 -D /home/edb/Desktop/backup/
-T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command
"START_REPLICATION": ERROR: replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb(at)localhost bin]$

The backup folder is not cleaned up:

[edb(at)localhost bin]$ ls /home/edb/Desktop/backup
backup_label global pg_dynshmem pg_ident.conf pg_multixact
pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact
postgresql.conf
base pg_commit_ts pg_hba.conf pg_logical pg_notify
pg_serial pg_stat pg_subtrans pg_twophase pg_wal
postgresql.auto.conf
[edb(at)localhost bin]$

If the same case is executed without the parallel backup patch, the backup
folder is cleaned up after the error is displayed.

[edb(at)localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T
/home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command
"START_REPLICATION": ERROR: replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory "/home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undone
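
Presumably the parallel code path skips the cleanup that the non-parallel
path performs on failure. As an illustrative sketch only (these are not the
patch's actual symbols; made_new_pgdata and basedir are the existing
pg_basebackup globals, and rmtree() is from src/common), the worker failure
path would need to reach something like:

/*
 * Sketch: on worker failure, remove the directory we created, as the
 * non-parallel error path already does.
 */
static void
cleanup_after_failure(void)
{
    if (made_new_pgdata)
    {
        pg_log_info("removing data directory \"%s\"", basedir);
        if (!rmtree(basedir, true))
            pg_log_error("failed to remove data directory");
    }
    /* mapped tablespace directories would need the same treatment */
}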

On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com> wrote:

>
>
> On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
>> wrote:
>> >> Why would you need to do that? As long as the process where
>> >> STOP_BACKUP can do the check, that seems good enough.
>> >
>> > Yes, but the user will get the error only after the STOP_BACKUP, not
>> > while the backup is in progress. So if the backup is a large one, early
>> > error detection would be very beneficial. This is the current behavior
>> > of non-parallel backup as well.
>>
>> Because non-parallel backup does not feature early detection of this
>> error, it is not necessary to make parallel backup do so. Indeed, it
>> is undesirable. If you want to fix that problem, do it on a separate
>> thread in a separate patch. A patch proposing to make parallel backup
>> inconsistent in behavior with non-parallel backup will be rejected, at
>> least if I have anything to say about it.
>>
>> TBH, fixing this doesn't seem like an urgent problem to me. The
>> current situation is not great, but promotions ought to be relatively
>> infrequent, so I'm not sure it's a huge problem in practice. It is
>> also worth considering whether the right fix is to figure out how to
>> make that case actually work, rather than just making it fail quicker.
>> I don't currently understand the reason for the prohibition so I can't
>> express an intelligent opinion on what the right answer is here, but
>> it seems like it ought to be investigated before somebody goes and
>> builds a bunch of infrastructure to make the error more timely.
>>
>
> Non-parallel backup already does the early error checking. I only intended
> to make parallel backup behave the same as non-parallel here. So, I agree
> with you that the behavior of parallel backup should be consistent with
> the non-parallel one. Please see the code snippet below from
> basebackup.c:sendDir():
>
>
>> /*
>>  * Check if the postmaster has signaled us to exit, and abort with an
>>  * error in that case. The error handler further up will call
>>  * do_pg_abort_backup() for us. Also check that if the backup was
>>  * started while still in recovery, the server wasn't promoted.
>>  * do_pg_stop_backup() will check that too, but it's better to stop
>>  * the backup early than continue to the end and fail there.
>>  */
>> CHECK_FOR_INTERRUPTS();
>> if (RecoveryInProgress() != backup_started_in_recovery)
>>     ereport(ERROR,
>>             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>              errmsg("the standby was promoted during online backup"),
>>              errhint("This means that the backup being taken is corrupt "
>>                      "and should not be used. "
>>                      "Try taking another online backup.")));
>>
>> > Okay, then I will add the shared state. And since we are adding the
>> > shared state, we can use that for throttling, progress-reporting and
>> > standby early error checking.
>>
>> Please propose a grammar here for all the new replication commands you
>> plan to add before going and implement everything. That will make it
>> easier to hash out the design without forcing you to keep changing the
>> code. Your design should include a sketch of how several sets of
>> coordinating backends taking several concurrent parallel backups will
>> end up with one shared state per parallel backup.
>>
>> > There are two possible options:
>> >
>> > (1) The server may generate a unique ID, e.g. BackupID=<unique_string>, or
>> > (2) (preferred option) use the WAL start location as the BackupID.
>> >
>> > This BackupID should be given back as a response to the start backup
>> > command. All client workers must append this ID to all parallel backup
>> > replication commands, so that we can use this identifier to search for
>> > that particular backup. Does that sound good?
>>
>> Using the WAL start location as the backup ID seems like it might be
>> problematic -- could a single checkpoint not end up as the start
>> location for multiple backups started at the same time? Whether that's
>> possible now or not, it seems unwise to hard-wire that assumption into
>> the wire protocol.
>>
>> I was thinking that perhaps the client should generate a unique backup
>> ID, e.g. leader does:
>>
>> START_BACKUP unique_backup_id [options]...
>>
>> And then others do:
>>
>> JOIN_BACKUP unique_backup_id
>>
>> My thought is that you will have a number of shared memory structures
>> equal to max_wal_senders, each one large enough to hold the shared
>> state for one backup. The shared state will include
>> char[NAMEDATALEN-or-something] which will be used to hold the backup
>> ID. START_BACKUP would allocate one and copy the name into it;
>> JOIN_BACKUP would search for one by name.
>>
>> If you want to generate the name on the server side, then I suppose
>> START_BACKUP would return a result set that includes the backup ID,
>> and clients would have to specify that same backup ID when invoking
>> JOIN_BACKUP. The rest would stay the same. I am not sure which way is
>> better. Either way, the backup ID should be something long and hard to
>> guess, not e.g. the leader process's PID. I think we should generate
>> it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
>> result to get a string. That way there's almost no risk of two backup
>> IDs colliding accidentally, and even if we somehow had a malicious
>> user trying to screw up somebody else's parallel backup by choosing a
>> colliding backup ID, it would be pretty hard to have any success. A
>> user with enough access to do that sort of thing can probably cause a
>> lot worse problems anyway, but it seems pretty easy to guard against
>> intentional collisions robustly here, so I think we should.
>>
>>
> Okay, so if we add another replication command ‘JOIN_BACKUP
> unique_backup_id’ to make workers find the relevant shared state, there
> won't be any need to change the grammar for any other command. The
> START_BACKUP command can return the unique_backup_id in the result set.
>
> I am thinking of the following struct for shared state:
>
>> typedef struct
>> {
>>     char        backupid[NAMEDATALEN];
>>     XLogRecPtr  startptr;
>>
>>     slock_t     lock;
>>     int64       throttling_counter;
>>     bool        backup_started_in_recovery;
>> } BackupSharedState;
>
> The shared state structure entries would be maintained by a shared hash
> table. There will be one structure per parallel backup. Since a single
> parallel backup can engage more than one WAL sender, I think
> max_wal_senders might be a little too much; perhaps max_wal_senders/2,
> since there will be at least 2 connections per parallel backup?
> Alternatively, we can add a new GUC that defines the maximum number of
> concurrent parallel backups, e.g. ‘max_concurrent_backups_allowed = 10’,
> and make it user-configurable.
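>
> As a rough sketch (assuming a dynahash table created at startup with
> ShmemInitHash() and keyed by the backupid string; the names here are
> illustrative, not the patch's actual code), JOIN_BACKUP could look the
> state up like this:
>
>> /* Sketch only: find the shared state for a given backup ID. */
>> static BackupSharedState *
>> find_backup_state(const char *backupid)
>> {
>>     BackupSharedState *state;
>>     bool        found;
>>
>>     state = (BackupSharedState *) hash_search(BackupStateHash, backupid,
>>                                               HASH_FIND, &found);
>>     if (!found)
>>         ereport(ERROR,
>>                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>                  errmsg("backup \"%s\" does not exist", backupid)));
>>     return state;
>> }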
>
> The key would be “backupid=hex_encode(pg_strong_random(16))”
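>
> Roughly, the generation could look like this (a sketch: pg_strong_random()
> is the existing src/port facility; the hex encoding is open-coded here):
>
>> /* Sketch only: 16 strong-random bytes, hex-encoded into dst (33 bytes). */
>> static void
>> generate_backupid(char *dst)
>> {
>>     uint8       raw[16];
>>     int         i;
>>
>>     if (!pg_strong_random(raw, sizeof(raw)))
>>         ereport(ERROR,
>>                 (errmsg("could not generate random backup id")));
>>     for (i = 0; i < 16; i++)
>>         sprintf(dst + i * 2, "%02x", raw[i]);
>> }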
>
> Checking for Standby Promotion:
> At the START_BACKUP command, we initialize
> BackupSharedState.backup_started_in_recovery
> and keep checking it whenever send_file() is called to send a new file.
>
> Throttling:
> BackupSharedState.throttling_counter - The throttling logic remains the
> same as for non-parallel backup, with the exception that multiple threads
> will now be updating it. In parallel backup, it will represent the overall
> bytes that have been transferred, and the workers will sleep once they
> have exceeded the limit. Hence, the shared state carries a lock so that
> the throttling value can be updated atomically.
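>
> Each worker's accounting might then look roughly like this (a sketch
> only; 'len' is the number of bytes just sent, and throttling_sample, the
> sleep time and the wait event are handled as in the existing throttle()):
>
>> /* Sketch only: account bytes under the spinlock, sleep if over limit. */
>> SpinLockAcquire(&state->lock);
>> state->throttling_counter += len;
>> exceeded = (state->throttling_counter >= throttling_sample);
>> SpinLockRelease(&state->lock);
>> if (exceeded)
>>     (void) WaitLatch(MyLatch, WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
>>                      sleep_ms, WAIT_EVENT_BASE_BACKUP_THROTTLE);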
>
> Progress Reporting:
> I think we should add progress reporting for parallel backup as a separate
> patch. The relevant entries for progress reporting, such as ‘backup_total’
> and ‘backup_streamed’, would then be added to this structure as well.
>
>
> Grammar:
> There is a change in the result set returned by the START_BACKUP command:
> unique_backup_id is added. Additionally, a JOIN_BACKUP replication command
> is added, and SEND_FILES has been renamed to SEND_FILE. There are no other
> changes to the grammar.
>
> START_BACKUP [LABEL '<label>'] [FAST]
> - returns startptr, tli, backup_label, unique_backup_id
> STOP_BACKUP [NOWAIT]
> - returns startptr, tli, backup_label
> JOIN_BACKUP 'unique_backup_id'
> - attaches a shared state identified by 'unique_backup_id' to a backend
> process.
>
> LIST_TABLESPACES [PROGRESS]
> LIST_FILES [TABLESPACE]
> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
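>
> To illustrate, a two-worker session would then roughly flow as follows
> (the backup ID shown is made up for the example):
>
> Leader: START_BACKUP LABEL 'nightly' FAST
> => startptr, tli, backup_label, unique_backup_id 'f3a9...'
> LIST_TABLESPACES, LIST_FILES
> Worker 1: JOIN_BACKUP 'f3a9...' then SEND_FILE per assigned file
> Worker 2: JOIN_BACKUP 'f3a9...' then SEND_FILE per assigned file
> Leader: STOP_BACKUP once all workers are done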
>
>
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>
>

--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager

EnterpriseDB Corporation
The Enterprise Postgres Company
