Re: WIP/PoC for parallel backup

From: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: asifr(dot)rehman(at)gmail(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2020-03-16 12:49:44
Message-ID: CAKcux6navgSDCM6Yre9Zzh+Yqc16QTByct72wVnoTTJhBpD37w@mail.gmail.com
Lists: pgsql-hackers

Hi Asif,

On testing further, I found that pg_basebackup crashed when taking a backup
with -R. The crash is not consistently reproducible.

[edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a text);"
CREATE TABLE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values ('parallel_backup with -R recovery-conf');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
Segmentation fault (core dumped)

The stack trace looks the same as in the crash reported earlier with a
tablespace.
--stack trace
[edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
3175 backupinfo->curr = fetchfile->next;
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
#1 0x0000000000408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
#2 0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at pthread_create.c:301
#3 0x00000039212e8c4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb)
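
The faulting statement dereferences fetchfile, so it would fail this way if
fetchfile were NULL or stale, for example if two workers advanced the shared
cursor at the same time. That cause is only an assumption; nothing in the
trace confirms it. Under that assumption, a minimal sketch of a guarded
cursor could look like the following. Only curr and next come from the
trace; BackupFile, BackupInfo, the lock field and claim_next_file() are
hypothetical names and not taken from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the backup file list. */
typedef struct BackupFile
{
	const char *path;
	struct BackupFile *next;
} BackupFile;

/* Hypothetical stand-in for the shared backup state. */
typedef struct BackupInfo
{
	pthread_mutex_t lock;		/* assumed: protects curr */
	BackupFile *curr;			/* next file to hand out, NULL when done */
} BackupInfo;

/*
 * Hand out the next file, or NULL once the list is exhausted.
 *
 * The NULL check and the cursor advance happen under a single lock, so a
 * worker never dereferences a NULL entry and two workers never claim the
 * same node.
 */
static BackupFile *
claim_next_file(BackupInfo *backupinfo)
{
	BackupFile *fetchfile;

	assert(backupinfo != NULL);

	pthread_mutex_lock(&backupinfo->lock);
	fetchfile = backupinfo->curr;
	if (fetchfile != NULL)
		backupinfo->curr = fetchfile->next;
	pthread_mutex_unlock(&backupinfo->lock);

	return fetchfile;
}
```

A worker loop would then call claim_next_file() until it returns NULL,
instead of reading and advancing backupinfo->curr directly.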

Thanks & Regards,
Rajkumar Raghuwanshi

On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:

> Hi Asif,
>
>
>> Thanks Rajkumar. I have fixed the above issues and have rebased the patch
>> to the latest master (b7f64c64).
>> (V9 of the patches is attached).
>>
>
> I did a further review of the patches; here are a few observations:
>
> 1.
> +/*
> + * stop_backup() - ends an online backup
> + *
> + * The function is called at the end of an online backup. It sends out pg_control
> + * file, optionally WAL segments and ending WAL location.
> + */
>
> The comment seems outdated.
>
> 2. With parallel jobs, maxrate is no longer supported. Since we are now
> requesting data from multiple threads, throttling seems important here. Can
> you please explain why you have disabled it?
>
> 3. As we are always fetching a single file, and as Robert suggested, let's
> rename SEND_FILES to SEND_FILE.
>
> 4. Does this work on Windows? I mean, does pthread_create() work on Windows?
> I ask because pgbench has its own implementation of pthread_create() for
> WIN32, but this patch doesn't.
>
> 5. Typos:
> tablspace => tablespace
> safly => safely
>
> 6. parallel_backup_run() needs some comments explaining the PB_* states it
> goes through.
>
> 7.
> +        case PB_FETCH_REL_FILES:    /* fetch files from server */
> +            if (backupinfo->activeworkers == 0)
> +            {
> +                backupinfo->backupstate = PB_STOP_BACKUP;
> +                free_filelist(backupinfo);
> +            }
> +            break;
> +        case PB_FETCH_WAL_FILES:    /* fetch WAL files from server */
> +            if (backupinfo->activeworkers == 0)
> +            {
> +                backupinfo->backupstate = PB_BACKUP_COMPLETE;
> +            }
> +            break;
>
> Why is free_filelist() not called in the PB_FETCH_WAL_FILES case?
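
On point 7, a minimal sketch of how both fetch phases could release the
file list through one helper, so that neither case can forget it. It assumes
the WAL phase also leaves a list behind, which the quoted hunk does not
show. The PB_* names, activeworkers, backupstate and free_filelist() come
from the hunk; FileNode, the files field, finish_fetch_phase() and
parallel_backup_step() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum
{
	PB_FETCH_REL_FILES,
	PB_STOP_BACKUP,
	PB_FETCH_WAL_FILES,
	PB_BACKUP_COMPLETE
} BackupState;

/* Hypothetical list node; the real file list layout is not shown. */
typedef struct FileNode
{
	struct FileNode *next;
} FileNode;

typedef struct BackupInfo
{
	BackupState backupstate;
	int			activeworkers;
	FileNode   *files;			/* hypothetical: list built for the phase */
} BackupInfo;

/* Free every node and leave the list empty, so a second call is harmless. */
static void
free_filelist(BackupInfo *backupinfo)
{
	FileNode   *node = backupinfo->files;

	while (node != NULL)
	{
		FileNode   *next = node->next;

		free(node);
		node = next;
	}
	backupinfo->files = NULL;
}

/* Once no worker is active, release the list and move to nextstate. */
static void
finish_fetch_phase(BackupInfo *backupinfo, BackupState nextstate)
{
	if (backupinfo->activeworkers != 0)
		return;					/* workers still running, stay put */

	free_filelist(backupinfo);
	backupinfo->backupstate = nextstate;
}

/* One step of the state machine, covering only the two quoted cases. */
static void
parallel_backup_step(BackupInfo *backupinfo)
{
	assert(backupinfo != NULL);

	switch (backupinfo->backupstate)
	{
		case PB_FETCH_REL_FILES:	/* fetch files from server */
			finish_fetch_phase(backupinfo, PB_STOP_BACKUP);
			break;
		case PB_FETCH_WAL_FILES:	/* fetch WAL files from server */
			finish_fetch_phase(backupinfo, PB_BACKUP_COMPLETE);
			break;
		default:				/* other states are outside this sketch */
			break;
	}
}
```

If the WAL phase never builds a list, the extra free_filelist() call is a
no-op here, because the helper leaves the list pointer NULL after freeing.
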
>
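
On point 4, pthread_create() is a POSIX call and MSVC does not ship a
pthreads library, so a Windows build needs a wrapper over the native thread
API. A rough sketch of such a wrapper follows. It is neither pgbench's code
nor part of the patch. All bk_* names are hypothetical, and it tests _WIN32
(defined by the compiler) rather than PostgreSQL's WIN32 macro so that it
builds standalone.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#include <process.h>

typedef struct bk_thread
{
	HANDLE		handle;
	void	   *(*routine) (void *);
	void	   *arg;
	void	   *result;
} bk_thread;

/* Adapts the pthread-style routine to the signature _beginthreadex wants. */
static unsigned __stdcall
bk_thread_trampoline(void *p)
{
	bk_thread  *t = (bk_thread *) p;

	t->result = t->routine(t->arg);
	return 0;
}

/* Returns 0 on success, -1 on failure. *t must outlive the thread. */
static int
bk_thread_create(bk_thread *t, void *(*routine) (void *), void *arg)
{
	assert(t != NULL && routine != NULL);
	t->routine = routine;
	t->arg = arg;
	t->result = NULL;
	t->handle = (HANDLE) _beginthreadex(NULL, 0, bk_thread_trampoline,
										t, 0, NULL);
	return t->handle == NULL ? -1 : 0;
}

static int
bk_thread_join(bk_thread *t, void **result)
{
	if (WaitForSingleObject(t->handle, INFINITE) != WAIT_OBJECT_0)
		return -1;
	if (result != NULL)
		*result = t->result;
	CloseHandle(t->handle);
	return 0;
}
#else
#include <pthread.h>

typedef struct bk_thread
{
	pthread_t	handle;
} bk_thread;

/* Returns 0 on success, -1 on failure. */
static int
bk_thread_create(bk_thread *t, void *(*routine) (void *), void *arg)
{
	assert(t != NULL && routine != NULL);
	return pthread_create(&t->handle, NULL, routine, arg) == 0 ? 0 : -1;
}

static int
bk_thread_join(bk_thread *t, void **result)
{
	return pthread_join(t->handle, result) == 0 ? 0 : -1;
}
#endif

/* Demo worker: increments the int it is given and returns the pointer. */
static void *
bk_demo_worker(void *arg)
{
	int		   *n = (int *) arg;

	*n += 1;
	return arg;
}
```

With a wrapper like this, the worker startup code would call
bk_thread_create() on every platform instead of pthread_create() directly.
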
> Thanks
> --
> Jeevan Chalke
> Associate Database Architect & Team Lead, Product Development
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
>
> Phone: +91 20 66449694
>
> Website: www.enterprisedb.com
> EnterpriseDB Blog: http://blogs.enterprisedb.com/
> Follow us on Twitter: http://www.twitter.com/enterprisedb
>
> This e-mail message (and any attachment) is intended for the use of the
> individual or entity to whom it is addressed. This message contains
> information from EnterpriseDB Corporation that may be privileged,
> confidential, or exempt from disclosure under applicable law. If you are
> not the intended recipient or authorized to receive this for the intended
> recipient, any use, dissemination, distribution, retention, archiving, or
> copying of this communication is strictly prohibited. If you have received
> this e-mail in error, please notify the sender immediately by reply e-mail
> and delete this message.
>
