Quick Links

Re: Gather Merge

From:	Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Gather Merge
Date:	2016-11-24 12:12:22
Message-ID:	CAGPqQf37KFFW=Q3PGhTLmWD1sTovCs+KykNAP4PjS21z0Szkgg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Nov 16, 2016 at 3:10 PM, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
wrote:

>
>
> On Mon, Nov 14, 2016 at 3:51 PM, Thomas Munro <
> thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>
>> On Sat, Nov 12, 2016 at 1:56 AM, Rushabh Lathia
>> <rushabh(dot)lathia(at)gmail(dot)com> wrote:
>> > On Fri, Nov 4, 2016 at 8:30 AM, Thomas Munro <
>> thomas(dot)munro(at)enterprisedb(dot)com>
>> > wrote:
>> >> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development
>> Group
>> >> + * Portions Copyright (c) 1994, Regents of the University of
>> California
>> >>
>> >> Shouldn't this say just "(c) 2016, PostgreSQL Global Development
>> >> Group"?
>> >
>> > Fixed.
>>
>> The year also needs updating to 2016 in nodeGatherMerge.h.
>>
>
> Oops sorry, fixed now.
>
>
>>
>> >> + /* Per-tuple heap maintenance cost */
>> >> + run_cost += path->path.rows * comparison_cost * 2.0 * logN;
>> >>
>> >> Why multiply by two? The comment above this code says "about log2(N)
>> >> comparisons to delete the top heap entry and another log2(N)
>> >> comparisons to insert its successor". In fact gather_merge_getnext
>> >> calls binaryheap_replace_first, which replaces the top element without
>> >> any comparisons at all and then performs a sift-down in log2(N)
>> >> comparisons to find its new position. There is no per-tuple "delete"
>> >> involved. We "replace" the top element with the value it already had,
>> >> just to trigger the sift-down, because we know that our comparator
>> >> function might have a new opinion of the sort order of this element.
>> >> Very clever! The comment and the 2.0 factor in cost_gather_merge seem
>> >> to be wrong though -- or am I misreading the code?
>> >>
>> > See cost_merge_append.
>>
>> That just got tweaked in commit 34ca0905.
>>
>
> Fixed.
>
>
>>
>> > Looking at the plan I realize that this is happening because wrong
>> costing
>> > for Gather Merge. Here in the plan we can see the row estimated by
>> > Gather Merge is wrong. This is because earlier patch GM was considering
>> > rows = subpath->rows, which is not true as subpath is partial path. So
>> > we need to multiple it with number of worker. Attached patch also fixed
>> > this issues. I also run the TPC-H benchmark with the patch and results
>> > are same as earlier.
>>
>> In create_grouping_paths:
>> + double total_groups = gmpath->rows *
>> gmpath->parallel_workers;
>>
>> This hides a variable of the same name in the enclosing scope. Maybe
>> confusing?
>>
>> In some other places like create_ordered_paths:
>> + double total_groups = path->rows * path->parallel_workers;
>>
>> Though it probably made sense to use this variable name in
>> create_grouping_paths, wouldn't total_rows be better here?
>>
>>
> Initially I just copied from the other places. I agree with you that
> create_ordered_paths - total_rows make more sense.
>
>
>> It feels weird to be working back to a total row count estimate from
>> the partial one by simply multiplying by path->parallel_workers.
>> Gather Merge will underestimate the total rows when parallel_workers <
>> 4, if using partial row estimates ultimately from cost_seqscan which
>> assume some leader contribution. I don't have a better idea though.
>> Reversing cost_seqscan's logic certainly doesn't seem right. I don't
>> know how to make them agree on the leader's contribution AND give
>> principled answers, since there seems to be some kind of cyclic
>> dependency in the costing logic (cost_seqscan really needs to be given
>> a leader contribution estimate from its superpath which knows whether
>> it will allow the leader to pull tuples greedily/fairly or not, but
>> that superpath hasn't been created yet; cost_gather_merge needs the
>> row count from its subpath). Or maybe I'm just confused.
>>
>>
> Yes, I agree with you. But we can't really do changes into cost_seqscan.
> Another option I can think of is just calculate the rows for gather merge,
> by using the reverse formula which been used into cost_seqscan. So
> we can completely remote the rows argument from the
> create_gather_merge_path()
> and then inside create_gather_merge_path() - calculate the total_rows
> using same formula which been used into cost_seqscan. This is working
> fine - but not quite sure about the approach. So I attached that part of
> changes
> as separate patch. Any suggestions?
>

With offline discussion with Thomas, I realized that this won't work. It
will
work only if the subplan is seqscan - so this logic won't be enough to
estimate the rows. I guess as Thomas told earlier, this is not problem
with GatherMerge implementation as such - so we will keep this as separate.

Apart from this my colleague Neha Sharma, reported the server crash with
the patch.
It was hitting below Assert into create_gather_merge_path().

Assert(pathkeys);

Basically when query is something like "select * from foo where a = 1 order
by a;"
here query has sortclause but planner won't generate sort key because
where equality clause is on same column. Fix is about making sure of
pathkeys
before calling create_gather_merge_path().

PFA latest patch with fix as well as few cosmetic changes.

>
>
> --
> Rushabh Lathia
> www.EnterpriseDB.com
>
>

--
Rushabh Lathia

Attachment	Content-Type	Size
gather_merge_v5.patch	binary/octet-stream	51.5 KB

In response to

Re: Gather Merge at 2016-11-16 09:40:57 from Rushabh Lathia

Responses

Re: Gather Merge at 2016-12-05 00:36:10 from Haribabu Kommi

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Mithun Cy	2016-11-24 12:16:48	Re: Patch: Implement failover on libpq connect level.
Previous Message	Amit Langote	2016-11-24 11:13:22	Re: Declarative partitioning - another take