PATCH: pg_restore parallel-execution-deadlock issue

From: Armin Schöffmann <armin(dot)schoeffmann(at)aegaeon(dot)de>
To: pgsql-hackers(at)postgresql(dot)org <pgsql-hackers(at)postgresql(dot)org>
Subject: PATCH: pg_restore parallel-execution-deadlock issue
Date: 2016-04-05 00:28:45
Message-ID: zarafa.570306bd.3418.074bf1420d8f2ba2@root.aegaeon.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

worthy hackers,
I propose the below patches to parallels.c and pg_backup_utils.c fixing deadlocks in pg_restore (windows only) if running more than 2 parallel jobs.
This problem was reported by me earlier this year.
http://www.postgresql.org/message-id/20160307161619.25731.78653@wrigleys.postgresql.org

- Winsock's "recv(...)" called in piperead() is a blocking read by default, therefor, signalizing termEvent as used in ShutdownWorkersHard() is not enough to make worker-threads go away.
We need a preceding shutdown(pipeWrite, SD_BOTH), first, to abort blocking IO in this case.
Otherwise, the main-thread will wait forever, if more than one additional worker is active (e.g. option -j3) and a premature EOF occurs in the input-file.

Findings in pg_backup_utils.c/ parallels.c, which could impact other tools, too:

- threads created with _beginthreadex need to be exited by either a "return exitcode" or "_endthreadex(exitcode)". It might be obsolete in fire-and-forget-scenarios, but it matters in other cases.
As of current, pg_backup_utils uses EndThread to retire additional worker-threads., which are spawned by _beginthreadex in parallel.c. The corresponding call for ExitThread would be CreateThread,
nevertheless, _beginthreadex is the correct choice here, as we do call-out into CRT and need to retain the thread-handle for after-death synchronization with the main-thread.
The thread-handle needs to be closed explicitly.

If this is not the correct place to discuss patches, I'd be glad if somebody can notify the tool's maintainer, to take a look into it.

best regards,
Armin Schöffmann.

--
Aegaeon technologies GmbH
phone: +49.941.8107344
fax: +49.941.8107356

Legal disclaimer:
http://aegaeon.de/disclaimer/email_all_int.txt

parallel.c

@@ -350,7 +350,8 @@ static void
ShutdownWorkersHard(ParallelState *pstate)
{
-#ifndef WIN32
+
int i;

+#ifndef WIN32
signal(SIGPIPE, SIG_IGN);

@@ -367,4 +368,7 @@ ShutdownWorkersHard(ParallelState *pstate)
/* The workers monitor this event via checkAborting(). */
SetEvent(termEvent);
+
+ for (i = 0; i < pstate->numWorkers; i++)
+ shutdown(pstate->parallelSlot[i].pipeWrite, SD_BOTH);
#endif

@@ -408,5 +412,8 @@ WaitForTerminatingWorkers(ParallelState *pstate)
for (j = 0; j < pstate->numWorkers; j++)
if (pstate->parallelSlot[j].hThread == hThread)
+ {
slot = &(pstate->parallelSlot[j]);
+ CloseHandle(hThread);
+ }

free(lpHandles);

pg_backup_utils.c

@@ -120,5 +120,5 @@ exit_nicely(int code)
#ifdef WIN32
if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
- ExitThread(code);
+ _endthreadex(code);
#endif

Attachment Content-Type Size
parallel.c.diff application/octet-stream 1.1 KB
pg_backup_utils.c.diff application/octet-stream 377 bytes

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-04-05 01:00:41 Re: Yet another small patch - reorderbuffer.c:1099
Previous Message Michael Paquier 2016-04-05 00:18:28 Re: pgsql: Avoid archiving XLOG_RUNNING_XACTS on idle server