Re: Use unique index for longer pathkeys.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: amit(dot)kapila16(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Use unique index for longer pathkeys.
Date: 2014-08-04 07:52:30
Message-ID: 20140804.165230.231416844.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

> > > Now drop the i_t1_pkey_1 and check the query plan again.
> > >
> > > drop index i_t1_pkey_1;
> > > explain (costs off, analyze off) select * from t,t1 where t.a=t1.a
> order by
> > > t1.a,t1.b,t1.c,t1.d;
> > > QUERY PLAN
> > > ------------------------------------------------
> > > Sort
> > > Sort Key: t.a, t1.b, t1.c, t1.d
> > > -> Merge Join
> > > Merge Cond: (t.a = t1.a)
> > > -> Index Scan using i_t_pkey on t
> > > -> Index Scan using i_t1_pkey_2 on t1
> > > (6 rows)
> > >
> > > Can't above plan eliminate Sort Key even after dropping index
> > > (i_t1_pkey_1)?
> >
> > My patch doesn't so since there no longer a 'common primary
> > pathkeys' in this query. Perhaps the query doesn't allow the sort
> > eliminated. Since a is no more a pkey, t1 can have dulicate rows
> > for the same a, so the joined relation also may have duplicte
> > values in the column a.

Sorry, I may misunderstood you. The dropped index is t1(a) not
t1(a,b) so the discussion abobe is on the wrong assumption.

> I think irrespective of that we can trim t1.c & t1.d as we have
> primary key (unique and non column) for t1.a, t1.b. Basically
> even if we don't use the primary key index, we can still trim
> the keys in such a case. IIUC, then Tom has mentioned the
> same in his message related to this issue.

Although, yes, you're right, irrespective of the "common
something", and even if the dropped index was i_t1_pkey_2, which
is on t1(a, b), the result tuples are sorted as expected only by
the pathkey (t.a = t1.a, t1.b). It is because both t and t1 are
still unique so the joined tuples are also unique, and the unique
key of the result tuples is the merged pkey (t.a, t1.a, t1.b),
which can be transformed to (t.a, t1.b) using the equiality
between t.a and t1.a. And considering the inner relation (t1) is
already sorted by (a, b), the sort itself could be elimited from
the plan.

The above could be described as below,

- Join between relations with pkey yiels relation with the pkey
containing the columns of both pkeys, this is created by
simply concatenate them.

- The concatenated pkey can be transformed using equality
operators, in other words, consulting equivelance classes.

- If the projected joined relation(?) contains one of the
transformed pkeys above, it is a pkey for the result.

And in regards to MergeJoin and NestLoop, the result pathkeys can
be composed by concatenating child pathkeys in outer-inner
order. If the all ORDER BY pathkeys columns appear in the
concatenated pathkeys in the same order , the sort itself can be
eliminated.

> I am referring to below text:
>
> "If we have "ORDER BY a, b, c" and (a,b) is the
> primary key, then including c in the ORDER BY list is semantically
> redundant, *whether or not we use an indexscan on the pkey index at all*."
>
> http://www.postgresql.org/message-id/5212.1397599817@sss.pgh.pa.us

Yes, my point at that time was mainly union (all) and partitioned
tables so the optimization for joins was the next step for
me. Although, my previous implement of
collect_common_primary_pathkeys seems to have been excessively
heavyweighted and it could be integrated with join optimization.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gabriele Bartolini 2014-08-04 08:15:29 Re: Proposal: Incremental Backup
Previous Message Dilip kumar 2014-08-04 06:11:57 Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]