Re: WIP/PoC for parallel backup

From: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: asifr(dot)rehman(at)gmail(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2020-03-25 07:22:11
Message-ID: CAKcux6=CaMrV04i9_aA2pATpHU8DK6_rzR-eyRC90Y8HjcyqRA@mail.gmail.com
Lists: pgsql-hackers

Hi Asif,

While testing further, I observed that parallel backup is unable to take a
backup of a standby server.

mkdir /tmp/archive_dir
echo "archive_mode='on'">> data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'">> data/postgresql.conf

./pg_ctl -D data -l logs start
./pg_basebackup -p 5432 -Fp -R -D /tmp/slave

echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">> /tmp/slave/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'">> /tmp/slave/postgresql.conf
echo "promote_trigger_file='/tmp/failover.log'">> /tmp/slave/postgresql.conf

./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c

[edb(at)localhost bin]$ ./psql postgres -p 5432 -c "select pg_is_in_recovery();"
pg_is_in_recovery
-------------------
f
(1 row)

[edb(at)localhost bin]$ ./psql postgres -p 5433 -c "select pg_is_in_recovery();"
pg_is_in_recovery
-------------------
t
(1 row)

[edb(at)localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
pg_basebackup: error: could not list backup files: ERROR: the standby was promoted during online backup
HINT: This means that the backup being taken is corrupt and should not be used. Try taking another online backup.
pg_basebackup: removing data directory "/tmp/bkp_s"

# The same works fine without parallel backup
[edb(at)localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
[edb(at)localhost bin]$ ls /tmp/bkp_s/PG_VERSION
/tmp/bkp_s/PG_VERSION

Thanks & Regards,
Rajkumar Raghuwanshi

On Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <
rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com> wrote:

> Hi Asif,
>
> In another scenario, the backup data for a tablespace is corrupted. Again,
> this is not reproducible every time, but when I run the same set of
> commands I get the same error.
>
> [edb(at)localhost bin]$ ./pg_ctl -D data -l logfile start
> waiting for server to start.... done
> server started
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$ mkdir /tmp/tblsp
> [edb(at)localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp location '/tmp/tblsp';"
> CREATE TABLESPACE
> [edb(at)localhost bin]$ ./psql postgres -p 5432 -c "create database testdb tablespace tblsp;"
> CREATE DATABASE
> [edb(at)localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a text);"
> CREATE TABLE
> [edb(at)localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl values ('parallel_backup with tablespace');"
> INSERT 0 1
> [edb(at)localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
> [edb(at)localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555" start
> waiting for server to start.... done
> server started
> [edb(at)localhost bin]$ ./psql postgres -p 5555 -c "select * from pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
>   oid  |  spcname   | spcowner | spcacl | spcoptions
> -------+------------+----------+--------+------------
>   1663 | pg_default |       10 |        |
>  16384 | tblsp      |       10 |        |
> (2 rows)
>
> [edb(at)localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
> psql: error: could not connect to server: FATAL: "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
> DETAIL: File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is missing.
> [edb(at)localhost bin]$
> [edb(at)localhost bin]$ ls data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
> data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
> [edb(at)localhost bin]$ ls /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
> ls: cannot access /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or directory
>
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
> On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <
> rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com> wrote:
>
>> Hi Asif,
>>
>> On testing further, I found that pg_basebackup crashed when taking a
>> backup with -R. This crash is not consistently reproducible.
>>
>> [edb(at)localhost bin]$ ./psql postgres -p 5432 -c "create table test (a text);"
>> CREATE TABLE
>> [edb(at)localhost bin]$ ./psql postgres -p 5432 -c "insert into test values ('parallel_backup with -R recovery-conf');"
>> INSERT 0 1
>> [edb(at)localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
>> Segmentation fault (core dumped)
>>
>> The stack trace looks the same as in the earlier reported crash with a
>> tablespace.
>> --stack trace
>> [edb(at)localhost bin]$ gdb -q -c core.37915 pg_basebackup
>> Loaded symbols for /lib64/libnss_files.so.2
>> Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R'.
>> Program terminated with signal 11, Segmentation fault.
>> #0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
>> 3175 backupinfo->curr = fetchfile->next;
>> Missing separate debuginfos, use: debuginfo-install
>> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
>> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
>> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
>> (gdb) bt
>> #0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
>> #1 0x0000000000408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
>> #2 0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at pthread_create.c:301
>> #3 0x00000039212e8c4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>> (gdb)
>>
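
The faulting line in the trace above is `backupinfo->curr = fetchfile->next;`.
The patch source is not included in this mail, so the cause is not confirmed
here; the line would fault if `fetchfile` were NULL, and several workers
advancing a shared cursor without a lock could produce that. As a minimal
sketch of one way to hand out files to workers safely, assuming a shared
singly linked list: `FileNode`, `BackupInfo`, `WorkerState`, `next_file()`
and `run_workers()` below are hypothetical stand-ins, not the patch's actual
definitions.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the patch's real structures are not shown here. */
typedef struct FileNode
{
	struct FileNode *next;
} FileNode;

typedef struct BackupInfo
{
	pthread_mutex_t lock;		/* protects curr */
	FileNode   *curr;			/* next file to hand out; NULL when done */
} BackupInfo;

typedef struct WorkerState
{
	BackupInfo *backupinfo;
	int			nfetched;		/* number of files this worker claimed */
} WorkerState;

/* Claim the next file, or return NULL once the list is exhausted. */
static FileNode *
next_file(BackupInfo *backupinfo)
{
	FileNode   *fetchfile;

	pthread_mutex_lock(&backupinfo->lock);
	fetchfile = backupinfo->curr;
	if (fetchfile != NULL)		/* never dereference an exhausted cursor */
		backupinfo->curr = fetchfile->next;
	pthread_mutex_unlock(&backupinfo->lock);
	return fetchfile;
}

static void *
worker_run(void *arg)
{
	WorkerState *wstate = arg;

	while (next_file(wstate->backupinfo) != NULL)
		wstate->nfetched++;		/* a real worker would fetch the file here */
	return NULL;
}

/* Run nworkers threads over nfiles list entries; return total files claimed. */
int
run_workers(int nfiles, int nworkers)
{
	BackupInfo	info;
	FileNode   *nodes;
	pthread_t  *threads;
	WorkerState *states;
	int			total = 0;

	assert(nfiles > 0 && nworkers > 0);
	nodes = calloc((size_t) nfiles, sizeof(FileNode));
	threads = calloc((size_t) nworkers, sizeof(pthread_t));
	states = calloc((size_t) nworkers, sizeof(WorkerState));
	assert(nodes != NULL && threads != NULL && states != NULL);

	/* Link the nodes into one list; the last node keeps next == NULL. */
	for (int i = 0; i + 1 < nfiles; i++)
		nodes[i].next = &nodes[i + 1];
	pthread_mutex_init(&info.lock, NULL);
	info.curr = &nodes[0];

	for (int i = 0; i < nworkers; i++)
	{
		states[i].backupinfo = &info;
		pthread_create(&threads[i], NULL, worker_run, &states[i]);
	}
	for (int i = 0; i < nworkers; i++)
	{
		pthread_join(threads[i], NULL);
		total += states[i].nfetched;
	}

	pthread_mutex_destroy(&info.lock);
	free(nodes);
	free(threads);
	free(states);
	return total;
}
```

With the lock held around both the read and the advance of the cursor, each
list entry is claimed by exactly one worker, so `run_workers(1000, 6)` should
account for every file exactly once regardless of scheduling.
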
>> Thanks & Regards,
>> Rajkumar Raghuwanshi
>>
>>
>> On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
>> jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
>>
>>> Hi Asif,
>>>
>>>
>>>> Thanks Rajkumar. I have fixed the above issues and have rebased the
>>>> patch to the latest master (b7f64c64).
>>>> (V9 of the patches are attached).
>>>>
>>>
>>> I reviewed the patches further, and here are a few observations:
>>>
>>> 1.
>>> +/*
>>> + * stop_backup() - ends an online backup
>>> + *
>>> + * The function is called at the end of an online backup. It sends out pg_control
>>> + * file, optionally WAL segments and ending WAL location.
>>> + */
>>>
>>> The comments seem outdated.
>>>
>>> 2. With parallel jobs, maxrate is no longer supported. Since we are now
>>> requesting data from multiple threads, throttling seems important here.
>>> Can you please explain why you have disabled it?
>>>
>>> 3. As we are always fetching a single file, and as Robert suggested, let's
>>> rename SEND_FILES to SEND_FILE instead.
>>>
>>> 4. Does this work on Windows? I mean does pthread_create() work on
>>> Windows?
>>> I asked this as I see that pgbench has its own implementation for
>>> pthread_create() for WIN32 but this patch doesn't.
>>>
>>> 5. Typos:
>>> tablspace => tablespace
>>> safly => safely
>>>
>>> 6. parallel_backup_run() needs some comments explaining the PB_* states it
>>> goes through.
>>>
>>> 7.
>>> +        case PB_FETCH_REL_FILES:    /* fetch files from server */
>>> +            if (backupinfo->activeworkers == 0)
>>> +            {
>>> +                backupinfo->backupstate = PB_STOP_BACKUP;
>>> +                free_filelist(backupinfo);
>>> +            }
>>> +            break;
>>> +        case PB_FETCH_WAL_FILES:    /* fetch WAL files from server */
>>> +            if (backupinfo->activeworkers == 0)
>>> +            {
>>> +                backupinfo->backupstate = PB_BACKUP_COMPLETE;
>>> +            }
>>> +            break;
>>>
>>> Why is free_filelist() not called in the PB_FETCH_WAL_FILES case?
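
On points 6 and 7, a rough sketch of how the PB_* transitions could be
commented, with the file list released at the end of both fetch phases. Only
the four PB_* state names, `backupstate`, `activeworkers` and
`free_filelist()` appear in the quoted hunk. Everything else here is
hypothetical: the struct layout, `build_filelist()`,
`parallel_backup_step()`, `demo_run()`, and in particular the assumption that
the stop-backup step builds a separate WAL file list. If the WAL phase reuses
no list at all, the second free_filelist() call would simply be unnecessary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef enum
{
	PB_FETCH_REL_FILES,			/* workers are fetching relation files */
	PB_STOP_BACKUP,				/* all relation files done; stop the backup */
	PB_FETCH_WAL_FILES,			/* workers are fetching WAL files */
	PB_BACKUP_COMPLETE			/* nothing left to do */
} BackupState;

/* Hypothetical layout; the patch's real struct is not shown in this thread. */
typedef struct BackupInfo
{
	BackupState backupstate;
	int			activeworkers;	/* workers still fetching in this phase */
	char	  **files;			/* current file list, NULL when released */
	int			nfiles;
} BackupInfo;

static void
build_filelist(BackupInfo *backupinfo, int nfiles)
{
	assert(backupinfo->files == NULL);	/* previous list must be released */
	backupinfo->files = calloc((size_t) nfiles, sizeof(char *));
	assert(backupinfo->files != NULL);
	backupinfo->nfiles = nfiles;
}

static void
free_filelist(BackupInfo *backupinfo)
{
	free(backupinfo->files);
	backupinfo->files = NULL;
	backupinfo->nfiles = 0;
}

/* Advance the state machine by at most one transition. */
void
parallel_backup_step(BackupInfo *backupinfo)
{
	switch (backupinfo->backupstate)
	{
		case PB_FETCH_REL_FILES:
			/* Stay here until every worker has drained the file list. */
			if (backupinfo->activeworkers == 0)
			{
				free_filelist(backupinfo);
				backupinfo->backupstate = PB_STOP_BACKUP;
			}
			break;
		case PB_STOP_BACKUP:
			/* Assumed: stopping the backup yields the WAL files to fetch. */
			build_filelist(backupinfo, 2);
			backupinfo->backupstate = PB_FETCH_WAL_FILES;
			break;
		case PB_FETCH_WAL_FILES:
			/* Same cleanup as the relation phase before finishing. */
			if (backupinfo->activeworkers == 0)
			{
				free_filelist(backupinfo);
				backupinfo->backupstate = PB_BACKUP_COMPLETE;
			}
			break;
		case PB_BACKUP_COMPLETE:
			break;
	}
}

/* Walk all phases with no active workers; true if no list is left behind. */
bool
demo_run(void)
{
	BackupInfo	backupinfo = {PB_FETCH_REL_FILES, 0, NULL, 0};

	build_filelist(&backupinfo, 3);
	for (int i = 0; i < 3; i++)
		parallel_backup_step(&backupinfo);
	return backupinfo.backupstate == PB_BACKUP_COMPLETE &&
		backupinfo.files == NULL;
}
```

Keeping one comment per case and one cleanup per fetch phase makes it easy to
see which state owns the file list at any point.
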
>>>
>>> Thanks
>>> --
>>> Jeevan Chalke
>>> Associate Database Architect & Team Lead, Product Development
>>> EnterpriseDB Corporation
>>> The Enterprise PostgreSQL Company
>>>
>>>
>>
