Re: Performance improvement for joins where outer side is unique

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2017-03-26 22:28:39
Message-ID: CAKJS1f_6iSqJ15b8PFfh1yJ0tyaDeyF8pMysw-4gP1ES-iHKMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 14 March 2017 at 16:37, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> On 14 March 2017 at 11:35, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
> wrote:
>>
>> On 14 March 2017 at 07:50, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>
>>> [ getting back to this patch finally... ]
>>>
>>> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>>> > I've attached a patch which implements this, though only for
>>> > MergeJoin, else I'd imagine we'd also need to ensure all proofs used
>>> > for testing the uniqueness were also hash-able too. I added some XXX
>>> > comments in analyzejoin.c around the mergeopfamilies == NIL tests to
>>> > mention that Merge Join depends on all the unique proof quals having
>>> > mergeopfamilies. This also assumes we'll never use some subset of
>>> > mergejoin-able quals for a merge join, which could be an interesting
>>> > area in the future, as we might have some btree index on a subset of
>>> > those columns to provide pre-sorted input. In short, it's a great
>>> > optimisation, but I'm a little scared we might break it one day.
>>>
>>> Umm ... it's broken already isn't it? See the comment for
>>> generate_mergejoin_paths:
>>>
>> Thanks for looking at this again.
>>
>> Yeah confirmed. It's broken. I guess I just need to remember in the Path
>> if we got all the join quals, although I've not looked in detail what the
>> best fix is. I imagine it'll require storing something else in the JoinPath.
>
>
> OK, so I've spent some more time on this and I've come up with a solution.
>
> Basically the solution is to not skip mark and restore when joinquals
> contains any items. This is a requirement for SEMI joins, but overly
> cautious for unique joins. However, I think it'll apply in most cases when
> Merge Join will be a win, and that's when there's a btree index on both
> sides of the join which covers all columns in the join condition. I
> carefully commented this part of the code to explain what can be done to
> have it apply in more cases.
>
> This caused me to go and change the following code too:
>
> @@ -2676,6 +2688,9 @@ final_cost_mergejoin(PlannerInfo *root, MergePath
> *path,
> * it off does not entitle us to deliver an invalid plan.
> */
> else if (innersortkeys == NIL &&
> + !((extra->inner_unique || path->jpath.jointype == JOIN_SEMI) &&
> + list_length(path->jpath.joinrestrictinfo) ==
> + list_length(path->path_mergeclauses)) &&
> !ExecSupportsMarkRestore(inner_path))
> path->materialize_inner = true;
>
> I've been staring at this for a while and I think it's correct. If it's
> wrong then we'd get "ERROR: unrecognized node type: <n>" from ExecMarkPos().
>
> Here we must make sure and never skip materializing the inner side, if we're
> not going to skip mark and restore in ExecInitMergeJoin().
>
> I've attached a patch which fixes the issue.
>
> Also added a regression test to try to make sure it stays fixed. There's a
> few other mostly cosmetic fixes in there too.

Patch is attached which fixes up the conflict between the expression
evaluation performance patch.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
unique_joins_2017-03-27.patch application/octet-stream 95.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-03-26 22:41:30 Re: WIP: [[Parallel] Shared] Hash
Previous Message Tomas Vondra 2017-03-26 22:13:09 Re: crashes due to setting max_parallel_workers=0