ParallelContexts can get confused about which worker is which

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: ParallelContexts can get confused about which worker is which
Date: 2015-10-30 10:24:25
Message-ID: CA+TgmoaO-KYSMxhwisDcDs3mwmMuQwx8v19wBj7XcEC-xiSdcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While testing last night, I discovered a serious case of brain fade in
parallel.c; the same conceptual mistake has also spread to
nodeGather.c. parallel.c creates an array of ParallelWorkerInfo
structures, which are defined like this:

typedef struct ParallelWorkerInfo
{
BackgroundWorkerHandle *bgwhandle;
shm_mq_handle *error_mqh;
int32 pid;
} ParallelWorkerInfo;

These structures are typically accessed as pcxt->worker[i]. The
trouble is that pcxt->worker[i].bgwhandle is the bgwhandler for the
i'th worker to be registered, while pcxt->worker[i].error_mqh is the
error_mqh for the i'th worker to attach to the dynamic shared memory
segment. But those might be totally different processes.

This happens to mostly work, because the postmaster will probably
start the processes in the same order that they are registered, and
the operating system will probably schedule them in the same order the
postmaster starts them. And if you only have one worker, then you're
fine! It also seems to work out that if all workers exit cleanly, any
shuffling of the background worker handles relative to the error queue
handles is harmless. But it's still pretty broken.

There seem to be two ways to fix this. One is to keep the bgwhandle
objects in a separate array from the error_mqh objects and reconstruct
after the fact what the mapping between those two sets of indexes is.
This solution looks complicated to code and generally pretty annoying
to get right. The other way to fix this is to pass down the index
that the leader assigns to any given worker, and have the worker use
that index instead of allocating its own separate index after
connecting to the DSM segment. Unfortunately, there's not really a
good way to pass that additional information down to the worker right
now, but we could fix that pretty easily by adding an additional field
to the BackgroundWorker structure, which the worker would then be able
to access via MyBgworkerEntry. I suggest something like this:

--- a/src/include/postmaster/bgworker.h
+++ b/src/include/postmaster/bgworker.h
@@ -74,6 +74,7 @@ typedef enum
#define BGW_DEFAULT_RESTART_INTERVAL 60
#define BGW_NEVER_RESTART -1
#define BGW_MAXLEN 64
+#define BGW_EXTRALEN 128

typedef struct BackgroundWorker
{
@@ -85,6 +86,7 @@ typedef struct BackgroundWorker
char bgw_library_name[BGW_MAXLEN]; /* only if
bgw_main is NULL */
char bgw_function_name[BGW_MAXLEN]; /* only if
bgw_main is NULL */
Datum bgw_main_arg;
+ char bgw_extra[BGW_EXTRALEN];
pid_t bgw_notify_pid; /* SIGUSR1 this backend on start/stop */
} BackgroundWorker;

The ability to pass down a little more information from the
registering process to the background worker seems like it would be
useful for more than just parallelism, so I imagine this change might
be generally welcomed. Parallel workers would use the first four
bytes of bgw_extra to store the worker index, and other users of the
background worker facility could use it for whatever they like.

Comments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-10-30 10:30:17 Re: [HACKERS] max_worker_processes on the standby
Previous Message Ashutosh Bapat 2015-10-30 10:19:08 Re: Getting sorted data from foreign server