Re: Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE

From: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE
Date: 2017-06-01 13:11:20
Message-ID: CAOGQiiPhWCiprCurxsawGjEF7Rj2Z2JFHj3MrpjCGPb8qqEL3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 30, 2017 at 4:57 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I did a little bit of brief experimentation on this same topic a long
> time ago and didn't see an improvement from boosting the queue size
> beyond 64k but Rafia is testing Gather rather than Gather Merge and,
> as I say, my test was very brief. I think it would be a good idea to
> try to get a complete picture here. Does this help on any query that
> returns many tuples through the Gather? Only the ones that use Gather
> Merge? Some queries but not others with no obvious pattern? Only
> this query?
>
I did further exploration trying other values of
PARALLEL_TUPLE_QUEUE_SIZE and trying different queries and here are my
findings,
- on even setting PARALLEL_TUPLE_QUEUE_SIZE to 655360, there isn't
much improvement in q12 itself.
- there is no other TPC-H query which is showing significant
improvement on 6553600 itself. There is a small improvement in q3
which is also using gather-merge.
- as per perf analysis of q12 on head and patch, the %age of
ExecGatherMerge is 18% with patch and 98% on head, and similar with
gather_merge_readnext and gather_merge_writenext.

As per my understanding it looks like this increase in tuple queue
size is helping only gather-merge. Particularly, in the case where it
is enough stalling by master in gather-merge because it is maintaining
the sort-order. Like in q12 the index is unclustered and gather-merge
is just above parallel index scan, thus, it is likely that to maintain
the order the workers have to wait long for the in-sequence tuple is
attained by the master. Something like this might be happening, master
takes one tuple from worker 1, then next say 10 tuples from worker 2
and so on, and then finally returning to worker1, so, one worker 1 has
done enough that filled it's queue it sits idle. Hence, on increasing
the tuple queue size helps in workers to keep on working for longer
and this is improving the performance.

In other cases like q3, q18, etc. gather-merge is above sort, partial
group aggregate, etc. here the chances of stalls is comparatively
lesser stalls since the scan of relation is using the primary key,
hence the tuples in the blocks are likely to be in the order. Similar
was the case for many other cases of TPC-H queries. Other thing is
that in TPC-H benchmark queries most of the time the number of tuples
at gather-merge is fairly low so I'll try to test this on some custom
queries which exhibit aforementioned case.

> Blindly adding a GUC because we found one query that would be faster
> with a different value is not the right solution. If we don't even
> know why a larger value is needed here and (maybe) not elsewhere, then
> how will any user possibly know how to tune the GUC? And do we really
> want the user to have to keep adjusting a GUC before each query to get
> maximum performance? I think we need to understand the whole picture
> here, and then decide what to do. Ideally this would auto-tune, but
> we can't write code for that without a more complete picture of the
> behavior.
>
Yeah may be for the scenario discussed above GUC is not the best idea
but may be using something which can tell the relation between the
ordering on index and the physical ordering of the tuples along with
the number of tuples, etc. by the planner to decide the value of
PARALLEL_TUPLE_QUEUE_SIZE might help. E.g. if the index is primary key
then the physical order is same as index order and if this the sort
key then while at gather-merge stalls would be less, but if this is
unclustered index then the physical order is way different than index
order then it is likely that workers would be stalling more so keep a
higher value of PARALLEL_TUPLE_QUEUE _SIZE based on the number of
tuples.

Again I am not yet concluding anything as this is very less
experimentation to ascertain something, I'll continue the experiments
and would be grateful to have more suggestions on that.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-06-01 13:25:24 Re: retry shm attach for windows (WAS: Re: OK, so culicidae is *still* broken)
Previous Message Alexander Korotkov 2017-06-01 13:10:16 Re: ALTER INDEX .. SET STATISTICS ... behaviour