Re: intermittent failures in Cygwin from select_parallel tests

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: intermittent failures in Cygwin from select_parallel tests
Date:	2017-06-27 02:06:40
Message-ID:	CAA4eK1+eSgrkhuT-JYiDtBP0zaLdgNJ8WyyGrpKvusyrMrbF2w@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jun 26, 2017 at 8:09 PM, Andrew Dunstan
<andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
>
>
> On 06/26/2017 10:36 AM, Amit Kapila wrote:
>> On Fri, Jun 23, 2017 at 9:12 AM, Andrew Dunstan
>> <andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
>>>
>>> On 06/22/2017 10:24 AM, Tom Lane wrote:
>>>> Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
>>>>> Please let me know if there are tests I can run. I missed your earlier
>>>>> request in this thread, sorry about that.
>>>> That earlier request is still valid. Also, if you can reproduce the
>>>> symptom that lorikeet just showed and get a stack trace from the
>>>> (hypothetical) postmaster core dump, that would be hugely valuable.
>>>>
>>>>
>>>
>>> See attached log and stacktrace
>>>
>> Is this all the log contents or is there something else? The attached
>> log looks strange to me in the sense that the first worker that gets
>> invoked is Parallel worker number 2 and it finds that somebody else
>> has already set the sender handle.
>>
>
>
>
> No, it's the end of the log file. I can rerun and get the whole log file
> if you like.
>

Okay, if possible, please share the whole log file. Another way to get
better information is to change shm_mq_set_sender so that it hangs
instead of failing when the Assertion condition is hit. Once it hangs,
we can attach a debugger and inspect the state. Basically, just change
the code of shm_mq_set_sender as below or something like that:

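void
shm_mq_set_sender(shm_mq *mq, PGPROC *proc)
{
    volatile shm_mq *vmq = mq;
    PGPROC     *receiver;

    SpinLockAcquire(&mq->mq_mutex);
    if (vmq->mq_sender != NULL)
    {
        /* Assertion condition hit: spin here so a debugger can be attached. */
        while (1)
        {
        }
    }
    /* ... rest of the function unchanged ... */
}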

If we are able to hit the above loop, then we can print the contents of
mq_sender, especially its pid, and see whether that pid is the current
backend's. In theory, it should belong to a different backend, as this
is the first time we are setting mq_sender.
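
For trying the idea outside the server first, a minimal standalone
sketch follows. It is only an illustration under stated assumptions:
DemoProc, DemoQueue, demo_set_sender and demo_hang_release are
hypothetical stand-ins and not PostgreSQL names, there is no spinlock,
and the wait is bounded so the process does not hang forever. It reports
the existing sender's pid next to the caller's pid, which is the
comparison described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-ins for PGPROC and shm_mq; not the real structs. */
typedef struct DemoProc
{
    int         pid;
} DemoProc;

typedef struct DemoQueue
{
    DemoProc   *sender;
} DemoQueue;

/* Set to 1 from a debugger (e.g. "set var demo_hang_release = 1") to leave the wait. */
volatile int demo_hang_release = 0;

/*
 * Install proc as the queue's sender.
 *
 * Returns true if the sender was installed, false if one was already set.
 * On conflict, print both pids and wait up to max_wait_secs seconds so a
 * debugger can be attached; pass 0 to skip the wait.
 */
bool
demo_set_sender(DemoQueue *mq, DemoProc *proc, int max_wait_secs)
{
    assert(mq != NULL && proc != NULL);

    if (mq->sender != NULL)
    {
        fprintf(stderr,
                "sender already set: existing pid=%d, caller pid=%d%s\n",
                mq->sender->pid, proc->pid,
                mq->sender->pid == proc->pid ? " (same backend)" : "");

        /* Bounded wait instead of while(1), so the demo always terminates. */
        for (int i = 0; i < max_wait_secs && !demo_hang_release; i++)
            sleep(1);

        return false;
    }

    mq->sender = proc;
    return true;
}
```

Calling demo_set_sender twice on the same queue with two different
DemoProc values makes the second call take the conflict path and leaves
the first sender in place. In the real function the loop sits inside the
spinlock-protected section, so other backends waiting on that lock would
be affected too; the sketch does not model that.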

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Re: intermittent failures in Cygwin from select_parallel tests at 2017-06-26 14:39:49 from Andrew Dunstan
