Re: backup manifests

From: tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>
Cc: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests
Date: 2020-03-04 13:51:03
Message-ID: 08fb5011-091a-0590-9ca6-01449a4c8779@enterprisedb.com
Lists: pgsql-hackers

Hi,

There is a scenario in which, if I add a file inside a tablespace
directory under pg_tblspc, I get an error like this:

pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: error: "pg_tblspc/16385/PG_13_202002271/test" is present on disk but not in the manifest

But if I remove the 'PG_13_202002271' directory, no error is reported:

[centos(at)tushar-ldap-docker bin]$ ./pg_validatebackup data
pg_validatebackup: * manifest_checksum = 77ddacb4e7e02e2b880792a19a3adf09266dd88553dd15cfd0c22caee7d9cc04
pg_validatebackup: backup successfully verified

Steps to reproduce:
-- connect with psql and create a tablespace
postgres=# \! mkdir /tmp/my_tblspc
postgres=# create tablespace tbs location '/tmp/my_tblspc';
CREATE TABLESPACE
postgres=# \q

-- run pg_basebackup
[centos(at)tushar-ldap-docker bin]$ ./pg_basebackup -D data_dir -T /tmp/my_tblspc/=/tmp/new_my_tblspc
[centos(at)tushar-ldap-docker bin]$
[centos(at)tushar-ldap-docker bin]$ ls /tmp/new_my_tblspc/
PG_13_202002271

-- create a new file under the PG_13_* directory
[centos(at)tushar-ldap-docker bin]$ touch /tmp/new_my_tblspc/PG_13_202002271/test
[centos(at)tushar-ldap-docker bin]$

-- run pg_validatebackup; it reports an error, which looks expected
[centos(at)tushar-ldap-docker bin]$ ./pg_validatebackup data_dir/
pg_validatebackup: * manifest_checksum = 3951308eab576906ebdb002ff00ca313b2c1862592168c1f5f7ecf051ac07907
pg_validatebackup: error: "pg_tblspc/16386/PG_13_202002271/test" is present on disk but not in the manifest
[centos(at)tushar-ldap-docker bin]$

-- remove the added file
[centos(at)tushar-ldap-docker bin]$ rm -rf /tmp/new_my_tblspc/PG_13_202002271/test

-- run pg_validatebackup; it works fine
[centos(at)tushar-ldap-docker bin]$ ./pg_validatebackup data_dir/
pg_validatebackup: * manifest_checksum = 3951308eab576906ebdb002ff00ca313b2c1862592168c1f5f7ecf051ac07907
pg_validatebackup: backup successfully verified
[centos(at)tushar-ldap-docker bin]$

-- remove the PG_13_* directory
[centos(at)tushar-ldap-docker bin]$ rm -rf /tmp/new_my_tblspc/PG_13_202002271/
[centos(at)tushar-ldap-docker bin]$
[centos(at)tushar-ldap-docker bin]$ ls /tmp/new_my_tblspc/

-- run pg_validatebackup; no error is reported. Is that expected?
[centos(at)tushar-ldap-docker bin]$ ./pg_validatebackup data_dir/
pg_validatebackup: * manifest_checksum = 3951308eab576906ebdb002ff00ca313b2c1862592168c1f5f7ecf051ac07907
pg_validatebackup: backup successfully verified
[centos(at)tushar-ldap-docker bin]$

Start the server:

[centos(at)tushar-ldap-docker bin]$ ./pg_ctl -D data_dir/ start -o '-p 9033'
waiting for server to start....2020-03-04 19:18:54.839 IST [13097] LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2020-03-04 19:18:54.840 IST [13097] LOG:  listening on IPv6 address "::1", port 9033
2020-03-04 19:18:54.840 IST [13097] LOG:  listening on IPv4 address "127.0.0.1", port 9033
2020-03-04 19:18:54.842 IST [13097] LOG:  listening on Unix socket "/tmp/.s.PGSQL.9033"
2020-03-04 19:18:54.843 IST [13097] LOG:  could not open directory "pg_tblspc/16386/PG_13_202002271": No such file or directory
2020-03-04 19:18:54.845 IST [13098] LOG:  database system was interrupted; last known up at 2020-03-04 19:14:50 IST
2020-03-04 19:18:54.937 IST [13098] LOG:  could not open directory "pg_tblspc/16386/PG_13_202002271": No such file or directory
2020-03-04 19:18:54.939 IST [13098] LOG:  could not open directory "pg_tblspc/16386/PG_13_202002271": No such file or directory
2020-03-04 19:18:54.939 IST [13098] LOG:  redo starts at 0/18000028
2020-03-04 19:18:54.939 IST [13098] LOG:  consistent recovery state reached at 0/18000100
2020-03-04 19:18:54.939 IST [13098] LOG:  redo done at 0/18000100
2020-03-04 19:18:54.941 IST [13098] LOG:  could not open directory "pg_tblspc/16386/PG_13_202002271": No such file or directory
2020-03-04 19:18:54.984 IST [13097] LOG:  database system is ready to accept connections
 done
server started
[centos(at)tushar-ldap-docker bin]$
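
The difference between the two cases above can be illustrated with a
minimal Python sketch. It is not pg_validatebackup's code. It assumes a
manifest that lists only regular files, and the function name
compare_with_manifest and its manifest_dirs argument are hypothetical.
With a file-only manifest, an extra file is noticed, but removing a
directory that held no manifest entries is not:

```python
import os
import tempfile


def compare_with_manifest(backup_dir, manifest_files, manifest_dirs=(),
                          ignore=("backup_manifest",)):
    """Return (extra_files, missing_files, missing_dirs) for backup_dir.

    manifest_files: relative paths the manifest lists (files only).
    manifest_dirs:  hypothetical list of directories that must exist.
    """
    on_disk = set()
    # followlinks=True so symlinked tablespaces under pg_tblspc are scanned.
    for root, _dirs, files in os.walk(backup_dir, followlinks=True):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), backup_dir)
            on_disk.add(rel.replace(os.sep, "/"))
    on_disk -= set(ignore)
    listed = set(manifest_files)
    extra = sorted(on_disk - listed)
    missing = sorted(listed - on_disk)
    missing_dirs = sorted(d for d in manifest_dirs
                          if not os.path.isdir(os.path.join(backup_dir, d)))
    return extra, missing, missing_dirs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as backup:
        ver_dir = os.path.join(backup, "pg_tblspc", "16386", "PG_13_202002271")
        os.makedirs(ver_dir)
        # Extra file inside the version directory: reported as extra.
        open(os.path.join(ver_dir, "test"), "w").close()
        print(compare_with_manifest(backup, []))
        # Remove the file and the now-empty directory: nothing is reported.
        os.remove(os.path.join(ver_dir, "test"))
        os.rmdir(ver_dir)
        print(compare_with_manifest(backup, []))
        # Only a list of expected directories makes the removal visible.
        print(compare_with_manifest(
            backup, [], manifest_dirs=["pg_tblspc/16386/PG_13_202002271"]))
```

If the tool is meant to catch a missing tablespace version directory, it
would need some record of expected directories (the hypothetical
manifest_dirs above) in addition to the file list.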

regards,

On 3/4/20 3:51 PM, tushar wrote:
> Another scenario: if we modify the "Manifest-Checksum" value in the
> backup_manifest file, we do not get an error.
>
> [centos(at)tushar-ldap-docker bin]$ ./pg_validatebackup data/
> pg_validatebackup: * manifest_checksum = 28d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d
> pg_validatebackup: backup successfully verified
>
> Open the backup_manifest file and replace
>
> "Manifest-Checksum": "28d082921650d0ae881de8ceb122c8d2af5f449f51ecfb446827f7f49f91f65d"}
> with
> "Manifest-Checksum": "Hello World"}
>
> Rerun pg_validatebackup:
>
> [centos(at)tushar-ldap-docker bin]$ ./pg_validatebackup data/
> pg_validatebackup: * manifest_checksum = Hello World
> pg_validatebackup: backup successfully verified
>
> regards,
>
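
One way to catch a hand-edited checksum is to recompute it. The sketch
below is only an illustration in Python, not the patch's implementation.
It assumes the checksum is SHA-256 over every byte of backup_manifest
that precedes the final "Manifest-Checksum" line, and the function name
verify_manifest_checksum is hypothetical.

```python
import hashlib
import re

# Matches a final line of the form: "Manifest-Checksum": "<value>"}
_CHECKSUM_LINE = re.compile(rb'^\s*"Manifest-Checksum":\s*"([^"]*)"\s*}\s*$')


def verify_manifest_checksum(manifest_bytes):
    """Return True if the stored checksum matches the manifest contents.

    Assumption: the last line holds the checksum, and the checksum is the
    SHA-256 of all bytes before that line (including the preceding newline).
    """
    body, newline, last_line = manifest_bytes.rstrip(b"\n").rpartition(b"\n")
    match = _CHECKSUM_LINE.match(last_line)
    if match is None:
        return False  # no checksum line at all
    stored = match.group(1).decode("ascii", "replace").lower()
    computed = hashlib.sha256(body + newline).hexdigest()
    # A value such as "Hello World" can never equal a hex digest.
    return stored == computed
```

Under these assumptions, the edited manifest in the example above would
fail verification instead of being reported as successfully verified.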
> On 3/4/20 3:26 PM, tushar wrote:
>> Hi,
>> Another observation: if I change the ownership of a file under the
>> global/ directory, e.g.
>>
>> [root(at)tushar-ldap-docker global]# chown enterprisedb 2396
>>
>> and run the pg_validatebackup command, I get this message:
>>
>> [centos(at)tushar-ldap-docker bin]$ ./pg_validatebackup gggg
>> pg_validatebackup: * manifest_checksum = e8cb007bcc9c0deab6eff51cd8d9d9af6af35b86e02f3055e60e70e56737e877
>> pg_validatebackup: error: could not open file "global/2396": Permission denied
>> *** Error in `./pg_validatebackup': double free or corruption (!prev): 0x0000000001850ba0 ***
>> ======= Backtrace: =========
>> /lib64/libc.so.6(+0x81679)[0x7fa2248e3679]
>> ./pg_validatebackup[0x401f4c]
>> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa224884505]
>> ./pg_validatebackup[0x402049]
>> ======= Memory map: ========
>> 00400000-00415000 r-xp 00000000 fd:03 4044545 /home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
>> 00614000-00615000 r--p 00014000 fd:03 4044545 /home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
>> 00615000-00616000 rw-p 00015000 fd:03 4044545 /home/centos/pg13_bk_mani/edb/edbpsql/bin/pg_validatebackup
>> 017f3000-01878000 rw-p 00000000 00:00 0 [heap]
>> 7fa218000000-7fa218021000 rw-p 00000000 00:00 0
>> 7fa218021000-7fa21c000000 ---p 00000000 00:00 0
>> 7fa21e122000-7fa21e137000 r-xp 00000000 fd:03 141697 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
>> 7fa21e137000-7fa21e336000 ---p 00015000 fd:03 141697 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
>> 7fa21e336000-7fa21e337000 r--p 00014000 fd:03 141697 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
>> 7fa21e337000-7fa21e338000 rw-p 00015000 fd:03 141697 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
>> 7fa21e338000-7fa224862000 r--p 00000000 fd:03 266442 /usr/lib/locale/locale-archive
>> 7fa224862000-7fa224a25000 r-xp 00000000 fd:03 134456 /usr/lib64/libc-2.17.so
>> 7fa224a25000-7fa224c25000 ---p 001c3000 fd:03 134456 /usr/lib64/libc-2.17.so
>> 7fa224c25000-7fa224c29000 r--p 001c3000 fd:03 134456 /usr/lib64/libc-2.17.so
>> 7fa224c29000-7fa224c2b000 rw-p 001c7000 fd:03 134456 /usr/lib64/libc-2.17.so
>> 7fa224c2b000-7fa224c30000 rw-p 00000000 00:00 0
>> 7fa224c30000-7fa224c47000 r-xp 00000000 fd:03 134485 /usr/lib64/libpthread-2.17.so
>> 7fa224c47000-7fa224e46000 ---p 00017000 fd:03 134485 /usr/lib64/libpthread-2.17.so
>> 7fa224e46000-7fa224e47000 r--p 00016000 fd:03 134485 /usr/lib64/libpthread-2.17.so
>> 7fa224e47000-7fa224e48000 rw-p 00017000 fd:03 134485 /usr/lib64/libpthread-2.17.so
>> 7fa224e48000-7fa224e4c000 rw-p 00000000 00:00 0
>> 7fa224e4c000-7fa224e90000 r-xp 00000000 fd:03 4044478 /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
>> 7fa224e90000-7fa225090000 ---p 00044000 fd:03 4044478 /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
>> 7fa225090000-7fa225093000 r--p 00044000 fd:03 4044478 /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
>> 7fa225093000-7fa225094000 rw-p 00047000 fd:03 4044478 /home/centos/pg13_bk_mani/edb/edbpsql/lib/libpq.so.5.13
>> 7fa225094000-7fa2250b6000 r-xp 00000000 fd:03 130333 /usr/lib64/ld-2.17.so
>> 7fa22527d000-7fa2252a2000 rw-p 00000000 00:00 0
>> 7fa2252b3000-7fa2252b5000 rw-p 00000000 00:00 0
>> 7fa2252b5000-7fa2252b6000 r--p 00021000 fd:03 130333 /usr/lib64/ld-2.17.so
>> 7fa2252b6000-7fa2252b7000 rw-p 00022000 fd:03 130333 /usr/lib64/ld-2.17.so
>> 7fa2252b7000-7fa2252b8000 rw-p 00000000 00:00 0
>> 7ffdf354f000-7ffdf3570000 rw-p 00000000 00:00 0 [stack]
>> 7ffdf3572000-7ffdf3574000 r-xp 00000000 00:00 0 [vdso]
>> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
>> Aborted
>> [centos(at)tushar-ldap-docker bin]$
>>
>>
>> I get the expected error message, but it is followed by "*** Error in
>> `./pg_validatebackup': double free or corruption (!prev):
>> 0x0000000001850ba0 ***" and the program aborts.
>>
>> Is this expected?
>>
>> regards,
>>
>> On 3/3/20 8:19 PM, tushar wrote:
>>> On 3/3/20 4:04 PM, tushar wrote:
>>>> Thanks Robert.  After applying all the 5 patches (v8-00*) against
>>>> PG v13 (commit id -afb5465e0cfce7637066eaaaeecab30b0f23fbe3) ,
>>>
>>> There is a scenario where pg_validatebackup does not report an error
>>> if a file is deleted from the pg_wal/ directory, but later, when
>>> starting the server from that backup, we get an error:
>>>
>>> [centos(at)tushar-ldap-docker bin]$ ./pg_basebackup  -D test1
>>>
>>> [centos(at)tushar-ldap-docker bin]$ ls test1/pg_wal/
>>> 000000010000000000000010  archive_status
>>>
>>> [centos(at)tushar-ldap-docker bin]$ rm -rf test1/pg_wal/*
>>>
>>> [centos(at)tushar-ldap-docker bin]$ ./pg_validatebackup test1
>>> pg_validatebackup: * manifest_checksum = 88f1ed995c83e86252466a2c88b3e660a69cfc76c169991134b101c4f16c9df7
>>> pg_validatebackup: backup successfully verified
>>>
>>> [centos(at)tushar-ldap-docker bin]$ ./pg_ctl -D test1 start -o '-p 3333'
>>> waiting for server to start....2020-03-02 20:05:22.732 IST [21441] LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
>>> 2020-03-02 20:05:22.733 IST [21441] LOG:  listening on IPv6 address "::1", port 3333
>>> 2020-03-02 20:05:22.733 IST [21441] LOG:  listening on IPv4 address "127.0.0.1", port 3333
>>> 2020-03-02 20:05:22.736 IST [21441] LOG:  listening on Unix socket "/tmp/.s.PGSQL.3333"
>>> 2020-03-02 20:05:22.739 IST [21442] LOG:  database system was interrupted; last known up at 2020-03-02 20:04:35 IST
>>> 2020-03-02 20:05:22.739 IST [21442] LOG:  creating missing WAL directory "pg_wal/archive_status"
>>> 2020-03-02 20:05:22.886 IST [21442] LOG:  invalid checkpoint record
>>> 2020-03-02 20:05:22.886 IST [21442] FATAL:  could not locate required checkpoint record
>>> 2020-03-02 20:05:22.886 IST [21442] HINT:  If you are restoring from a backup, touch "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/recovery.signal" and add required recovery options.
>>>     If you are not restoring from a backup, try removing the file "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/backup_label".
>>>     Be careful: removing "/home/centos/pg13_bk_mani/edb/edbpsql/bin/test1/backup_label" will result in a corrupt cluster if restoring from a backup.
>>> 2020-03-02 20:05:22.886 IST [21441] LOG:  startup process (PID 21442) exited with exit code 1
>>> 2020-03-02 20:05:22.886 IST [21441] LOG:  aborting startup due to startup process failure
>>> 2020-03-02 20:05:22.889 IST [21441] LOG:  database system is shut down
>>>  stopped waiting
>>> pg_ctl: could not start server
>>> Examine the log output.
>>> [centos(at)tushar-ldap-docker bin]$
>>>
>>
>
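
The missing-WAL case could be caught with a check along these lines.
This Python sketch is an illustration under stated assumptions rather
than anything in the v8 patches: it reads the START WAL LOCATION line of
backup_label to find the segment that recovery starts from and checks
that the file exists in pg_wal. The helper names are hypothetical, the
backup is assumed to have been taken with WAL included, and only the
starting segment is checked, so later segments that recovery also needs
would still go unnoticed.

```python
import os
import re

# backup_label names the first required segment in a line such as:
#   START WAL LOCATION: <lsn> (file <24 hex digits>)
_START_WAL = re.compile(
    r"^START WAL LOCATION: \S+ \(file ([0-9A-F]{24})\)", re.MULTILINE)


def required_wal_segment(backup_label_text):
    """Return the WAL segment file name named in backup_label, or None."""
    match = _START_WAL.search(backup_label_text)
    return match.group(1) if match else None


def missing_start_segment(backup_dir):
    """Return the starting segment's name if it is absent from pg_wal."""
    with open(os.path.join(backup_dir, "backup_label")) as label:
        segment = required_wal_segment(label.read())
    if segment is None:
        raise ValueError("backup_label has no START WAL LOCATION line")
    if os.path.exists(os.path.join(backup_dir, "pg_wal", segment)):
        return None  # the segment recovery starts from is present
    return segment
```

A validation tool could report a non-None result from
missing_start_segment() as an error instead of printing "backup
successfully verified".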

--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
