Re: Broken order-of-operations in parallel query latch manipulation

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Broken order-of-operations in parallel query latch manipulation
Date: 2016-08-01 04:07:25
Message-ID: CAA4eK1J8cP0ceOBZodcbBfkPz7zhM28-OwJH+pyeYHjsaznefQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 1, 2016 at 1:58 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Both shm_mq.c and nodeGather.c contain instances of this coding pattern:
>
> WaitLatch(MyLatch, WL_LATCH_SET, 0);
> CHECK_FOR_INTERRUPTS();
> ResetLatch(MyLatch);
>
> I believe this is wrong and the CHECK_FOR_INTERRUPTS needs to be before
> or after the two latch operations. As-is, if the reason somebody set
> our latch was to get us to notice that a CHECK_FOR_INTERRUPTS condition
> happened, there's a race condition where we'd fail to realize that.
>

I could see that in nodeGather.c, it might fail to notice the SetLatch
done by worker process or spuriously woken up due to SetLatch for some
unrelated reason. However, I don't see what problem it can cause
apart from one extra loop cycle where it will try to process the tuple
when actually there is no tuple in the queue.

> Other places such as ProcWaitForSignal() do it that way; only recently
> introduced (and unproven in the field) code has it like this.
>
> Anyone want to argue it's okay as-is?
>

I don't want to argue to the change to make it same as what we do at
other places, but want to understand the problem you are seeing with
current coding pattern in nodeGather.c

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Khandekar 2016-08-01 05:14:56 Re: asynchronous and vectorized execution
Previous Message Thomas Munro 2016-08-01 00:52:06 Re: Combining hash values