
From: Nicolas Lutic <n(dot)lutic(at)loxodata(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: PITR on DROP DATABASE: deletion of the database directory despite recovery_target_time being set before the drop
Date: 2019-11-18 10:48:37
Message-ID: aac5c3e6-2805-ba06-143b-070478aa9434@loxodata.com
Lists: pgsql-hackers

Dear Hackers,

A customer reported strange behaviour during a PITR restoration.
After a DROP DATABASE, he tried to recover the data up to the last inserted
transaction by using recovery_target_time.
The issue is that the database is still present in the system catalog, but
its directory was nevertheless deleted.
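
For illustration, on the recovered instance the symptom looks roughly like
this (the database name testdb and the OID 16384 are assumptions for the
example, not values from the customer's system):

    postgres=# SELECT datname FROM pg_database WHERE datname = 'testdb';
     datname
    ---------
     testdb
    (1 row)

    postgres=# \c testdb
    FATAL:  database "testdb" does not exist
    DETAIL:  The database subdirectory "base/16384" is missing.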
Here is the technical information for the database:
    version 11
    default postgresql.conf except for these options:
        wal_level = replica
        archive_mode = on
        archive_command = 'cp %p /tmp/wal_archive/%f '
        log_statement = 'all'
        log_min_messages = debug5

The following method was used:

* create cluster

* create database

* create 1 table

* create 1 index on 1 column

* insert 1 row

* backup with pg_basebackup

* insert 2 rows

* drop database

* stop instance

* find the timestamp ('2019-11-13 11:49:08.413744+01') of the last
  transaction inserted before the DROP DATABASE

* replace $datadir with the pg_basebackup archive

* edit recovery.conf with:

* restore_command = 'cp /tmp/wal_archive/%f "%p"'

* recovery_target_time = '2019-11-13 11:49:08.413744+01'

* recovery_target_inclusive = true

* restart cluster
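
To make these steps concrete, here is a rough reproduction sketch. The
database name testdb, the table t1 and the backup directory /tmp/base_backup
are illustrative assumptions; only the wal_archive path, the recovery.conf
lines and the target timestamp come from the report above:

    # sketch only -- names and paths are illustrative assumptions
    initdb -D "$PGDATA"
    # set wal_level, archive_mode, archive_command, log_* in postgresql.conf as above
    pg_ctl -D "$PGDATA" start

    psql -c "CREATE DATABASE testdb"
    psql -d testdb -c "CREATE TABLE t1 (id int, val text)"
    psql -d testdb -c "CREATE INDEX ON t1 (id)"
    psql -d testdb -c "INSERT INTO t1 VALUES (1, 'before backup')"

    pg_basebackup -D /tmp/base_backup

    psql -d testdb -c "INSERT INTO t1 VALUES (2, 'after backup')"
    psql -d testdb -c "INSERT INTO t1 VALUES (3, 'after backup')"  # last commit we want back
    psql -c "DROP DATABASE testdb"
    pg_ctl -D "$PGDATA" stop

    rm -rf "$PGDATA"
    cp -a /tmp/base_backup "$PGDATA"
    cat > "$PGDATA"/recovery.conf <<'EOF'
    restore_command = 'cp /tmp/wal_archive/%f "%p"'
    recovery_target_time = '2019-11-13 11:49:08.413744+01'
    recovery_target_inclusive = true
    EOF
    pg_ctl -D "$PGDATA" start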

I tried to understand what is happening. When we analyse the postgresql.log
(log_min_messages = debug5), we can see that the recovery stopped before
transaction 574 (the transaction containing the DROP DATABASE), i.e. at
transaction 573, the last insert, but the database directory was still
deleted:

2019-11-13 11:55:12.732 CET [30666] DEBUG:  remove KnownAssignedXid 572
2019-11-13 11:55:12.732 CET [30666] CONTEXT:  WAL redo at 0/3000178 for Transaction/COMMIT: 2019-11-13 11:49:08.248928+01
2019-11-13 11:55:12.732 CET [30666] DEBUG:  record known xact 573 latestObservedXid 572
2019-11-13 11:55:12.732 CET [30666] CONTEXT:  WAL redo at 0/30001A0 for Heap/INSERT: off 3
2019-11-13 11:55:12.732 CET [30666] DEBUG:  record known xact 573 latestObservedXid 573
2019-11-13 11:55:12.732 CET [30666] CONTEXT:  WAL redo at 0/30001F8 for Btree/INSERT_LEAF: off 3
2019-11-13 11:55:12.732 CET [30666] DEBUG:  record known xact 573 latestObservedXid 573
2019-11-13 11:55:12.732 CET [30666] CONTEXT:  WAL redo at 0/3000238 for Transaction/COMMIT: 2019-11-13 11:49:08.413744+01
2019-11-13 11:55:12.732 CET [30666] DEBUG:  record known xact 573 latestObservedXid 573
2019-11-13 11:55:12.732 CET [30666] CONTEXT:  WAL redo at 0/3000238 for Transaction/COMMIT: 2019-11-13 11:49:08.413744+01
2019-11-13 11:55:12.732 CET [30666] DEBUG:  remove KnownAssignedXid 573
2019-11-13 11:55:12.732 CET [30666] CONTEXT:  WAL redo at 0/3000238 for Transaction/COMMIT: 2019-11-13 11:49:08.413744+01
2019-11-13 11:55:12.732 CET [30666] DEBUG:  record known xact 574 latestObservedXid 573
2019-11-13 11:55:12.732 CET [30666] CONTEXT:  WAL redo at 0/3000260 for Heap/DELETE: off 4 KEYS_UPDATED
2019-11-13 11:55:12.732 CET [30666] DEBUG:  prune KnownAssignedXids to 574
2019-11-13 11:55:12.732 CET [30666] CONTEXT:  WAL redo at 0/3000730 for Standby/RUNNING_XACTS: nextXid 575 latestCompletedXid 573 oldestRunningXid 574; 1 xacts: 574
2019-11-13 11:55:12.732 CET [30666] DEBUG:  record known xact 574 latestObservedXid 574
2019-11-13 11:55:12.732 CET [30666] CONTEXT:  WAL redo at 0/30007D8 for Database/DROP: dir 16384/1663
2019-11-13 11:55:12.738 CET [30666] LOG:  recovery stopping before commit of transaction 574, time 2019-11-13 11:49:10.683426+01
2019-11-13 11:55:12.738 CET [30666] LOG:  recovery has paused

By analysing the WAL file with pg_waldump, we get the records below.
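
For reference, such a dump would come from an invocation along these lines;
the archived segment name is an assumption derived from the 0/03xxxxxx LSNs:

    pg_waldump /tmp/wal_archive/000000010000000000000003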

rmgr: Heap        len (rec/tot):     54/   198, tx:        572, lsn: 0/03000028, prev 0/020000F8, desc: INSERT off 2, blkref #0: rel 1663/16384/16385 blk 0 FPW
rmgr: Btree       len (rec/tot):     53/   133, tx:        572, lsn: 0/030000F0, prev 0/03000028, desc: INSERT_LEAF off 2, blkref #0: rel 1663/16384/16388 blk 1 FPW
rmgr: Transaction len (rec/tot):     34/    34, tx:        572, lsn: 0/03000178, prev 0/030000F0, desc: COMMIT 2019-11-13 11:49:08.248928 CET
rmgr: Heap        len (rec/tot):     87/    87, tx:        573, lsn: 0/030001A0, prev 0/03000178, desc: INSERT off 3, blkref #0: rel 1663/16384/16385 blk 0
rmgr: Btree       len (rec/tot):     64/    64, tx:        573, lsn: 0/030001F8, prev 0/030001A0, desc: INSERT_LEAF off 3, blkref #0: rel 1663/16384/16388 blk 1
rmgr: Transaction len (rec/tot):     34/    34, tx:        573, lsn: 0/03000238, prev 0/030001F8, desc: COMMIT 2019-11-13 11:49:08.413744 CET
rmgr: Heap        len (rec/tot):     59/  1227, tx:        574, lsn: 0/03000260, prev 0/03000238, desc: DELETE off 4 KEYS_UPDATED , blkref #0: rel 1664/0/1262 blk 0 FPW
rmgr: Standby     len (rec/tot):     54/    54, tx:          0, lsn: 0/03000730, prev 0/03000260, desc: RUNNING_XACTS nextXid 575 latestCompletedXid 573 oldestRunningXid 574; 1 xacts: 574
rmgr: XLOG        len (rec/tot):    106/   106, tx:          0, lsn: 0/03000768, prev 0/03000730, desc: CHECKPOINT_ONLINE redo 0/3000730; tli 1; prev tli 1; fpw true; xid 0:575; oid 24576; multi 1; offset 0; oldest xid 561 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 574; online
rmgr: Database    len (rec/tot):     34/    34, tx:        574, lsn: 0/030007D8, prev 0/03000768, desc: DROP dir 16384/1663
rmgr: Transaction len (rec/tot):     66/    66, tx:        574, lsn: 0/03000800, prev 0/030007D8, desc: COMMIT 2019-11-13 11:49:10.683426 CET; inval msgs: catcache 21; sync

We notice that the following record

rmgr: Database    len (rec/tot):     34/    34, tx:        574, lsn: 0/030007D8, prev 0/03000768, desc: DROP dir 16384/1663

is replayed between the last commit we are interested in and the next
record carrying a timestamp:

rmgr: Transaction len (rec/tot):     66/    66, tx:        574, lsn: 0/03000800, prev 0/030007D8, desc: COMMIT 2019-11-13 11:49:10.683426 CET; inval msgs: catcache 21; sync
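
One way to see the consequence on the paused instance (a sketch, assuming
hot_standby allows connections while recovery is paused and that 16384 is
the dropped database's OID): replay stops at the commit timestamp of xact
573, yet the directory is already gone.

    postgres=# SELECT pg_is_in_recovery(), pg_last_xact_replay_timestamp();
     pg_is_in_recovery | pg_last_xact_replay_timestamp
    -------------------+-------------------------------
     t                 | 2019-11-13 11:49:08.413744+01
    (1 row)

    $ ls "$PGDATA"/base/16384
    ls: cannot access '.../base/16384': No such file or directory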

We understand that DROP DATABASE is not transactional: the directory
removal is attached to a xact whose commit timestamp lies outside the
recovery_target_time bound, yet the Database/DROP record is replayed
before that commit record is reached.
One could argue that it is the DBA's role to determine at which xact
recovery should stop, and to set recovery_target_xid rather than
recovery_target_time. But humans are prone to use natural things such as
time to define "when" to stop or start things.
We know that this rarely happens in production, because you cannot drop a
database while users are still connected. But with the new force option of
DROP DATABASE, it might be a reasonable choice to improve the situation
with the recovery_target_time directive.

It turns out there are two different choices we can make:

*     Change the recovery behaviour in that case, so that no operation of a
xact is applied until its COMMIT timestamp has been checked against the
recovery_target_time bound. This seems to be difficult, as stated in
https://www.postgresql.org/message-id/flat/20141125160629.GC21475%40msg.df7cb.de
which also identifies the problem and tries to give some solutions. Maybe
another way, as a trivial guess (all apologies), is to buffer the records
of in-progress xacts until we have the commit for each of them, and apply
the whole buffered xact once its timestamp is known (and checked against
the recovery_target_time value);

*     The other way to improve this is to update the PostgreSQL
documentation by specifying that recovery_target_time cannot be used in
this case. There are multiple places where it could be stated; the best
one (if only one) seems to be
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=doc/src/sgml/config.sgml;h=f83770350eda5625179526300c652f23ff29c9fe;hb=HEAD#l3400

We are willing to help on this, either with a code patch or with a
documentation improvement.

Best regards,

--
LOXODATA
Nicolas Lutic
