|From:||David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>|
|To:||Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>|
|Subject:||Re: Performance improvement for joins where outer side is unique|
|Views:||Raw Message | Whole Thread | Download mbox|
On 14 March 2017 at 16:37, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> On 14 March 2017 at 11:35, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
>> On 14 March 2017 at 07:50, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> [ getting back to this patch finally... ]
>>> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>>> > I've attached a patch which implements this, though only for
>>> > MergeJoin, else I'd imagine we'd also need to ensure all proofs used
>>> > for testing the uniqueness were also hash-able too. I added some XXX
>>> > comments in analyzejoin.c around the mergeopfamilies == NIL tests to
>>> > mention that Merge Join depends on all the unique proof quals having
>>> > mergeopfamilies. This also assumes we'll never use some subset of
>>> > mergejoin-able quals for a merge join, which could be an interesting
>>> > area in the future, as we might have some btree index on a subset of
>>> > those columns to provide pre-sorted input. In short, it's a great
>>> > optimisation, but I'm a little scared we might break it one day.
>>> Umm ... it's broken already isn't it? See the comment for
>> Thanks for looking at this again.
>> Yeah confirmed. It's broken. I guess I just need to remember in the Path
>> if we got all the join quals, although I've not looked in detail what the
>> best fix is. I imagine it'll require storing something else in the JoinPath.
> OK, so I've spent some more time on this and I've come up with a solution.
> Basically the solution is to not skip mark and restore when joinquals
> contains any items. This is a requirement for SEMI joins, but overly
> cautious for unique joins. However, I think it'll apply in most cases when
> Merge Join will be a win, and that's when there's a btree index on both
> sides of the join which covers all columns in the join condition. I
> carefully commented this part of the code to explain what can be done to
> have it apply in more cases.
> This caused me to go and change the following code too:
> @@ -2676,6 +2688,9 @@ final_cost_mergejoin(PlannerInfo *root, MergePath
> * it off does not entitle us to deliver an invalid plan.
> else if (innersortkeys == NIL &&
> + !((extra->inner_unique || path->jpath.jointype == JOIN_SEMI) &&
> + list_length(path->jpath.joinrestrictinfo) ==
> + list_length(path->path_mergeclauses)) &&
> path->materialize_inner = true;
> I've been staring at this for a while and I think it's correct. If it's
> wrong then we'd get "ERROR: unrecognized node type: <n>" from ExecMarkPos().
> Here we must make sure and never skip materializing the inner side, if we're
> not going to skip mark and restore in ExecInitMergeJoin().
> I've attached a patch which fixes the issue.
> Also added a regression test to try to make sure it stays fixed. There's a
> few other mostly cosmetic fixes in there too.
Patch is attached which fixes up the conflict between the expression
evaluation performance patch.
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
|Next Message||Thomas Munro||2017-03-26 22:41:30||Re: WIP: [[Parallel] Shared] Hash|
|Previous Message||Tomas Vondra||2017-03-26 22:13:09||Re: crashes due to setting max_parallel_workers=0|