| From: | Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> | 
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com> | 
| Cc: | Andres Freund <andres(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Parallel Seq Scan | 
| Date: | 2015-03-16 04:10:15 | 
| Message-ID: | 550657A7.3000902@lab.ntt.co.jp | 
| Lists: | pgsql-hackers | 
On 13-03-2015 PM 11:03, Amit Kapila wrote:
> On Fri, Mar 13, 2015 at 7:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> I don't think this is the right fix; the point of that code is to
>> remove a tuple queue from the funnel when it gets detached, which is a
>> correct thing to want to do.  funnel->nextqueue should always be less
>> than funnel->nqueues; how is that failing to be the case here?
>>
> 
> I could not reproduce the issue, neither the exact scenario is
> mentioned in mail.  However what I think can lead to funnel->nextqueue
> greater than funnel->nqueues is something like below:
> 
> Assume 5 queues, so value of funnel->nqueues will be 5 and
> assume value of funnel->nextqueue is 2, so now let us say 4 workers
> got detached one-by-one, so for such a case it will always go in else loop
> and will never change funnel->nextqueue whereas value of funnel->nqueues
> will become 1.
> 
Or, if the just-detached queue happens to be the last one, we'll make
shm_mq_receive() read from a potentially already-detached queue in the very
next iteration. That seems to be caused by not having updated
funnel->nextqueue. With the returned value being SHM_MQ_DETACHED, we'll again
try to remove it from the funnel. In this case, the third argument to memcpy
ends up negative (a huge value once converted to size_t), hence the segfault.

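To make the arithmetic concrete, here is a minimal sketch of that bookkeeping. It is not the patch's code: the `Funnel` struct, `MAX_QUEUES`, `funnel_tail_count()` and `funnel_remove_current()` are hypothetical stand-ins, with plain ints in place of queue handles. It only shows how a stale nextqueue makes the element count negative, and one possible way to keep nextqueue below nqueues after a removal.

```c
#include <assert.h>
#include <string.h>

#define MAX_QUEUES 8            /* hypothetical capacity for this sketch */

/* Hypothetical stand-in for the funnel; ints replace queue handles. */
typedef struct Funnel
{
    int queue[MAX_QUEUES];
    int nqueues;                /* number of queues still attached */
    int nextqueue;              /* index of the queue to read next */
} Funnel;

/*
 * Number of entries that would be shifted left if the slot at nextqueue
 * were removed.  A negative result means nextqueue is stale, i.e. it
 * already points at or past the end of the live entries.
 */
static int
funnel_tail_count(const Funnel *f)
{
    return (f->nqueues - 1) - f->nextqueue;
}

/*
 * Remove the queue at nextqueue and keep nextqueue < nqueues afterwards
 * (or 0 when nothing is left).  Returns 0 on success, -1 if nextqueue is
 * out of range, in which case nothing is touched.
 */
static int
funnel_remove_current(Funnel *f)
{
    int tail;

    assert(f->nqueues <= MAX_QUEUES);

    if (f->nextqueue < 0 || f->nextqueue >= f->nqueues)
        return -1;

    tail = funnel_tail_count(f);
    if (tail > 0)
        memmove(&f->queue[f->nextqueue],
                &f->queue[f->nextqueue + 1],
                sizeof(int) * (size_t) tail);

    f->nqueues--;
    if (f->nextqueue >= f->nqueues)
        f->nextqueue = 0;       /* wrap around after removing the last slot */

    return 0;
}
```

With two queues left and nextqueue still at 2, `funnel_tail_count()` returns -1; passing that through `sizeof(...) * count` as a `size_t` would request an enormous copy, which matches the crash described above.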
I can't quite figure out the other problem, waiting forever in WaitLatch(),
but I managed to make it go away with:

-        if (funnel->nextqueue == waitpos)
+        if (result != SHM_MQ_DETACHED && funnel->nextqueue == waitpos)

By the way, you can try reproducing this with the example I posted on Friday.

Thanks,
Amit