pg_restore crash when there is a failure before all child process is created

From: vignesh C <vignesh21(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: pg_restore crash when there is a failure before all child process is created
Date: 2020-01-01 03:50:39
Message-ID: CALDaNm1Luv-E3sarR+-unz-BjchquHHyfP+YC+2FS2pt_J+wxg@mail.gmail.com
Lists: pgsql-hackers

Hi,

I found a crash in pg_restore that occurs when a failure happens before
all the child workers have been created. The backtrace is given below:
#0 0x00007f9c6d31e337 in raise () from /lib64/libc.so.6
#1 0x00007f9c6d31fa28 in abort () from /lib64/libc.so.6
#2 0x00007f9c6d317156 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007f9c6d317202 in __assert_fail () from /lib64/libc.so.6
#4 0x0000000000407c9e in WaitForTerminatingWorkers (pstate=0x14af7f0) at parallel.c:515
#5 0x0000000000407bf9 in ShutdownWorkersHard (pstate=0x14af7f0) at parallel.c:451
#6 0x0000000000407ae9 in archive_close_connection (code=1, arg=0x6315a0 <shutdown_info>) at parallel.c:368
#7 0x000000000041a7c7 in exit_nicely (code=1) at pg_backup_utils.c:99
#8 0x0000000000408180 in ParallelBackupStart (AH=0x14972e0) at parallel.c:967
#9 0x000000000040a3dd in RestoreArchive (AHX=0x14972e0) at pg_backup_archiver.c:661
#10 0x0000000000404125 in main (argc=6, argv=0x7ffd5146f308) at pg_restore.c:443

The problem is as follows:

- ParallelBackupStart initially sets pstate->numWorkers to the requested
number of workers.
- The workers are then created one by one.
- A failure occurs before all of the processes have been created.
- The parent then terminates the child processes and waits for all of
them to exit.
- WaitForTerminatingWorkers checks whether every process has terminated
by calling HasEveryWorkerTerminated.
- HasEveryWorkerTerminated always returns false, because it checks
numWorkers slots rather than the number of processes actually forked, so
WaitForTerminatingWorkers goes on to hit the assertion
"Assert(j < pstate->numWorkers);".
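
To make the sequence above concrete, here is a minimal stand-alone model of
it. This is only a sketch: ModelState, SlotStatus, the numStarted counter and
all model_* functions are hypothetical names invented for illustration, and
the assumption that a never-started slot does not count as terminated is
mine. It is not the code in parallel.c.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_SLOTS 8

/* Hypothetical stand-ins, not the real pg_restore structures. */
typedef enum { SLOT_UNUSED, SLOT_RUNNING, SLOT_TERMINATED } SlotStatus;

typedef struct
{
    int        numWorkers;   /* planned count, set before any worker exists */
    int        numStarted;   /* workers actually created (illustration only) */
    SlotStatus slot[MODEL_MAX_SLOTS];
} ModelState;

/* Create workers one by one; creation fails at index failAt (never if < 0). */
static void
model_start(ModelState *st, int planned, int failAt)
{
    assert(planned <= MODEL_MAX_SLOTS);
    st->numWorkers = planned;       /* set up front, as described above */
    st->numStarted = 0;
    for (int i = 0; i < MODEL_MAX_SLOTS; i++)
        st->slot[i] = SLOT_UNUSED;
    for (int i = 0; i < planned; i++)
    {
        if (i == failAt)
            return;                 /* failure before all workers exist */
        st->slot[i] = SLOT_RUNNING;
        st->numStarted++;
    }
}

/* The parent shuts down only the workers that really exist. */
static void
model_terminate_started(ModelState *st)
{
    for (int i = 0; i < st->numStarted; i++)
        st->slot[i] = SLOT_TERMINATED;
}

/* Scans numWorkers slots, including ones that never got a worker. */
static bool
model_all_terminated(const ModelState *st)
{
    for (int i = 0; i < st->numWorkers; i++)
        if (st->slot[i] != SLOT_TERMINATED)
            return false;
    return true;
}
```

In this model, starting 4 workers with a failure at index 2 leaves two slots
that can never become terminated, so model_all_terminated() keeps returning
false and a caller that waits on it can make no progress.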

The attached patch fixes this by setting pstate->numWorkers to the actual
worker count as each child process is created.
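
The idea can be sketched roughly as below. This is not the attached patch,
only an illustration of counting workers as they are created; FixedState and
the fixed_* functions are hypothetical names, and the failure is simulated
with a failAt parameter.

```c
#include <assert.h>
#include <stdbool.h>

#define FIXED_MAX_SLOTS 8

/* Hypothetical stand-in, not the real ParallelState. */
typedef struct
{
    int  numWorkers;                 /* workers created so far */
    bool running[FIXED_MAX_SLOTS];
} FixedState;

/*
 * Create workers one by one and bump numWorkers only after each one exists.
 * Returns false if creation fails at index failAt (never fails if < 0).
 */
static bool
fixed_start(FixedState *st, int planned, int failAt)
{
    assert(planned <= FIXED_MAX_SLOTS);
    st->numWorkers = 0;
    for (int i = 0; i < planned; i++)
    {
        if (i == failAt)
            return false;            /* numWorkers still matches reality */
        st->running[i] = true;
        st->numWorkers = i + 1;
    }
    return true;
}

/* Stop every worker that was actually created. */
static void
fixed_shutdown(FixedState *st)
{
    for (int i = 0; i < st->numWorkers; i++)
        st->running[i] = false;
}

/* Only the slots of existing workers are inspected. */
static bool
fixed_all_terminated(const FixedState *st)
{
    for (int i = 0; i < st->numWorkers; i++)
        if (st->running[i])
            return false;
    return true;
}
```

With this bookkeeping, a failure while creating the third of four workers
leaves numWorkers at 2, so the termination check covers exactly the two
workers that exist and succeeds once they have been shut down.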

Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
0001-pg_restore-crash-when-there-is-a-failure-before-all-worker-creation.patch application/x-patch 2.7 KB
