Re: intermittent failures in Cygwin from select_parallel tests

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: intermittent failures in Cygwin from select_parallel tests
Date: 2017-06-15 21:09:48
Message-ID: CA+TgmoYx6mynFL5aDs7+xjZ01QrY8smp+Zr=5BxAseODZdZPWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 15, 2017 at 5:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> I think you're right. So here's a theory:
>
>>> 1. The ERROR mapping the DSM segment is just a case of the worker the
>>> losing a race, and isn't a bug.
>
>> I concur that this is a possibility,
>
> Actually, no, it isn't. I tried to reproduce the problem by inserting
> a sleep into ParallelWorkerMain, and could not. After digging around
> in the code, I realize that the leader process *can not* exit the
> parallel query before the workers start, at least not without hitting
> an error first, which is not happening in these examples. The reason
> is that nodeGather cannot deem the query done until it's seen EOF on
> each tuple queue, which it cannot see until each worker has attached
> to and then detached from the associated shm_mq.

Oh. That's sad. It definitely has to wait for any tuple queues that
have been attached to be detached, but it would be better if it didn't
have to wait for processes that haven't even attached yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2017-06-15 21:12:01 Re: WIP: Data at rest encryption
Previous Message Robert Haas 2017-06-15 21:08:23 pg_waldump command line arguments