Re: Performance improvement for joins where outer side is unique

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2017-04-06 23:29:13
Message-ID: CAKJS1f-mOBYUmHLt85XeMoV0Z+P9_c75J2BDigxqUSOVwCJ7fA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 7 April 2017 at 07:26, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'm looking through this, and I'm failing to see where it deals with
> the problem we discussed last time, namely that you can't apply the
> optimization unless all clauses that were used in the uniqueness
> proof are included in the join's merge/hash conditions + joinquals.

Many thanks for looking at this again.

The test in question is in nodeMergeJoin.c. I believe the join is
still unique no matter where the clauses are evaluated. It should be
up to the executor code to make use of the knowledge how it sees fit.
The planner should not make assumptions on how the executor will make
use of this knowledge. I've carefully crafted a comment in
nodeMergejoin.c which explains all of this, and also about the
limitations and where things might be improved later.

The code in question is:

mergestate->mj_SkipMarkRestore = !mergestate->js.joinqual &&
mergestate->js.first_inner_tuple_only;

> I don't especially like the centralized unique_rels cache structure.
> It's not terribly clear what it's for, and you're making uncomfortably
> large assumptions about never indexing off the end of the array, despite
> not having any range checks for the subscripts. Wouldn't it be better to
> add simple List fields into RelOptInfo, representing the outer rels this
> rel has been proven unique or not-unique for? That would dodge the array
> size question and would be more readily extensible to someday applying
> this to join rels .

hmm, perhaps bounds checking could be done, but it's no worse than
planner_rt_fetch().
I don't really think the List idea would be nearly as efficient. The
array provides a direct lookup for the List of proof relations. A List
of List list would require a linear lookup just to find the correct
List, then the existing linear lookup to find the proofs. The cache is
designed to be fast. Slowing it down seems like a bad idea. Perhaps
throwing it away would be better, since it's not required and was only
added as an optimisation.

The non_unique_rels will most often have a NULL bitmap set due to the
incremental join search by the standard planner. So access to this as
an array should be very fast, as we'll quickly realise there are no
proofs to be found.

> I also think some more thought is needed about handling JOIN_UNIQUE_OUTER
> and JOIN_UNIQUE_INNER cases. In the first place, the patch cavalierly
> violates the statement in joinpath.c that those jointype values never
> propagate outside that module. In the second place, shouldn't
> JOIN_UNIQUE_INNER automatically result in the path getting marked
> inner_unique? I suspect the logic in add_paths_to_joinrel ought to
> look something like
>
> if (jointype == JOIN_UNIQUE_INNER)
> extra.inner_unique = true;
> else
> extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel,
> (jointype == JOIN_UNIQUE_OUTER ? JOIN_INNER : jointype),
> restrictlist);

hmm, innerrel_is_unique() is without prejudice as to the jointypes it
supports, so much so that the argument is completely ignored. It was
probably left over from some previous incarnation of the patch.
If we treat JOIN_UNIQUE_INNER specially, then we'd better be sure that
it's made unique on the RHS join quals. It looks like
create_unique_path() uses sjinfo->semi_rhs_exprs to uniquify the
relation, and compute_semijoin_info() seems to take all of the join
conditions there or nothing at all, so I think it's safe to
automatically mark JOIN_UNIQUE_INNERs this way.

Updated patch is attached.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
unique_joins_2017-04-07.patch application/octet-stream 96.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Beena Emerson 2017-04-06 23:33:41 Re: increasing the default WAL segment size
Previous Message Corey Huinker 2017-04-06 23:21:21 Re: Undefined psql variables