Re: WIP: [[Parallel] Shared] Hash

From: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: [[Parallel] Shared] Hash
Date: 2017-02-02 03:57:30
Message-ID: CAOGQiiNk5Uri44t+jS5Z3rMTEKshhcTdDEB33JRM=kYSXNwpYw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 2, 2017 at 1:19 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Thu, Feb 2, 2017 at 3:34 AM, Rafia Sabih
> <rafia(dot)sabih(at)enterprisedb(dot)com> wrote:
>> 9 | 62928.88 | 59077.909
>
> Thanks Rafia. At first glance this plan is using the Parallel Shared
> Hash in one place where it should pay off, that is loading the orders
> table, but the numbers are terrible. I noticed that it uses batch
> files and then has to increase the number of batch files, generating a
> bunch of extra work, even though it apparently overestimated the
> number of rows, though that's only ~9 seconds of ~60. I am
> investigating.

Hi Thomas,
Apart from the previously reported regression, there appears to be one
more issue in this set of patches. At times, a query using parallel
hash hangs, and all the workers, including the master, show the
following backtrace:

#0  0x00003fff880c7de8 in __epoll_wait_nocancel () from /lib64/power8/libc.so.6
#1  0x00000000104e2718 in WaitEventSetWaitBlock (set=0x100157bde90, cur_timeout=-1, occurred_events=0x3fffdbe69698, nevents=1) at latch.c:998
#2  0x00000000104e255c in WaitEventSetWait (set=0x100157bde90, timeout=-1, occurred_events=0x3fffdbe69698, nevents=1, wait_event_info=134217745) at latch.c:950
#3  0x0000000010512970 in ConditionVariableSleep (cv=0x3ffd736e05a4, wait_event_info=134217745) at condition_variable.c:132
#4  0x00000000104dbb1c in BarrierWaitSet (barrier=0x3ffd736e0594, new_phase=1, wait_event_info=134217745) at barrier.c:97
#5  0x00000000104dbb9c in BarrierWait (barrier=0x3ffd736e0594, wait_event_info=134217745) at barrier.c:127
#6  0x00000000103296a8 in ExecHashShrink (hashtable=0x3ffd73747dc0) at nodeHash.c:1075
#7  0x000000001032c46c in dense_alloc_shared (hashtable=0x3ffd73747dc0, size=40, shared=0x3fffdbe69eb8, respect_work_mem=1 '\001') at nodeHash.c:2618
#8  0x000000001032a2f0 in ExecHashTableInsert (hashtable=0x3ffd73747dc0, slot=0x100158f9e90, hashvalue=2389907270) at nodeHash.c:1476
#9  0x0000000010327fd0 in MultiExecHash (node=0x100158f9800) at nodeHash.c:296
#10 0x0000000010306730 in MultiExecProcNode (node=0x100158f9800) at execProcnode.c:577

The issue is neither deterministic nor straightforwardly reproducible:
sometimes after a make clean, etc., the queries run fine, and sometimes
they hang again. I wanted to bring this to your notice, hoping you
might be quicker than I am at pinpointing the exact cause of this
anomaly.
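
For illustration only, here is a minimal standalone sketch (plain POSIX
threads, not the patch's Barrier/ConditionVariable machinery) of the
failure mode the backtrace suggests: every expected participant must
reach the barrier before any can proceed, so if one of them never
arrives, the others block indefinitely in the wait, which is what the
stuck processes look like here. The participant counts and names below
are made up for the sketch.

/*
 * Minimal illustration of a barrier hang: the barrier expects three
 * arrivals, but only two threads are started, so both block forever
 * in pthread_barrier_wait().  Build with: cc -pthread barrier_hang.c
 */
#include <pthread.h>
#include <stdio.h>

#define NPARTICIPANTS 3             /* barrier expects three arrivals */

static pthread_barrier_t barrier;

static void *
worker(void *arg)
{
    long        id = (long) arg;

    printf("worker %ld: waiting at barrier\n", id);
    /* Blocks until NPARTICIPANTS threads have called pthread_barrier_wait. */
    pthread_barrier_wait(&barrier);
    printf("worker %ld: past barrier\n", id);   /* never reached */
    return NULL;
}

int
main(void)
{
    pthread_t   tids[2];

    pthread_barrier_init(&barrier, NULL, NPARTICIPANTS);

    /* Start only two of the three expected participants ... */
    for (long i = 0; i < 2; i++)
        pthread_create(&tids[i], NULL, worker, (void *) i);

    /* ... so both of them hang, and so does this join. */
    for (int i = 0; i < 2; i++)
        pthread_join(tids[i], NULL);

    return 0;
}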

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
