Re: Instability in select_parallel regression test

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Instability in select_parallel regression test
Date: 2017-02-20 02:22:04
Message-ID: CAA4eK1KAoMtX_SYsNEsyQ3Ld8on79Ci4c_OVc6CwSbGSXy=+Nw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Feb 19, 2017 at 8:32 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Feb 19, 2017 at 6:50 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> To close the remaining gap, don't you think we can check slot->in_use
>> flag when generation number for handle and slot are same.
>
> That doesn't completely fix it either, because
> ForgetBackgroundWorker() also does
> BackgroundWorkerData->parallel_terminate_count++, which we might also
> fail to see, which would cause RegisterDynamicBackgroundWorker() to
> bail out early. There are CPU ordering effects to think about here,
> not just the order in which the operations are actually performed.
>

Sure, I think we can attempt to fix that as well by adding a write
memory barrier in ForgetBackgroundWorker(). The main point is that if
we leave any loose end in this area, the select_parallel regression
test can still fail, if not now, then in the future. Alternatively,
we can try to minimize the race condition here and then adjust
select_parallel as suggested above so that we don't see this failure.
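
To illustrate the kind of ordering under discussion, here is a minimal
standalone sketch in C11 atomics. It is not the bgworker.c code:
SharedState, forget_worker() and slot_free_and_count() are hypothetical
names, and the two fields only stand in for parallel_terminate_count and
slot->in_use. The assumption is that the writer publishes the counter
before freeing the slot, and that a reader which sees the slot free
issues a matching read barrier before looking at the counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared-memory state in the thread. */
typedef struct
{
    atomic_uint terminate_count;    /* plays parallel_terminate_count */
    atomic_bool in_use;             /* plays slot->in_use */
} SharedState;

/* Writer side: bump the counter, then release the slot. */
static void
forget_worker(SharedState *s)
{
    /* Precondition of this sketch: only an in-use slot is forgotten. */
    assert(atomic_load_explicit(&s->in_use, memory_order_relaxed));

    atomic_fetch_add_explicit(&s->terminate_count, 1, memory_order_relaxed);

    /* Write barrier: the increment is ordered before the store below. */
    atomic_thread_fence(memory_order_release);

    atomic_store_explicit(&s->in_use, false, memory_order_relaxed);
}

/*
 * Reader side: returns false if the slot still looks in use.  Otherwise
 * stores the counter value in *count and returns true.
 */
static bool
slot_free_and_count(SharedState *s, unsigned *count)
{
    if (atomic_load_explicit(&s->in_use, memory_order_relaxed))
        return false;

    /* Read barrier: pairs with the release fence in forget_worker(). */
    atomic_thread_fence(memory_order_acquire);

    *count = atomic_load_explicit(&s->terminate_count, memory_order_relaxed);
    return true;
}
```

With this pairing, a reader that observes in_use == false also observes
the incremented counter. A write barrier on its own would likely not be
enough without the matching barrier on the reading side, which is the
CPU ordering effect mentioned above.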

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
