Re: Join optimization for inheritance tables

From: Greg Stark <gsstark(at)mit(dot)edu>
To: nborisov(at)asterdata(dot)com
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Herodotos Herodotou <Herodotos(dot)Herodotou(at)asterdata(dot)com>, Eric Friedman <Eric(dot)Friedman(at)asterdata(dot)com>, John Cieslewicz <John(dot)Cieslewicz(at)asterdata(dot)com>, Dheeraj Pandey <Dheeraj(dot)Pandey(at)asterdata(dot)com>
Subject: Re: Join optimization for inheritance tables
Date: 2009-07-06 22:14:13
Message-ID: 407d949e0907061514i5f1a044r691d1d74eaefb067@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 6, 2009 at 10:23 PM, Nedyalko Borisov<nedyalko(at)asterdata(dot)com> wrote:
> For example, typical observed scenario is: optimizer chooses Merge Join for
> two partitioned tables like the plan below:
> Merge Cond: (table1.id = table2.id)
>   -> Sort
>       Sort Key: id
>       -> Append
>           -> Seq Scan on table1_part1
>           -> Seq Scan on table1_part2
>       -> ....
>           -> Seq Scan on table1_partN
>      ->  Sort
>           Sort Key: id
>           -> Append
>              -> Seq Scan on table2_part1
>              -> Seq Scan on table2_part2
>              -> ....
>              -> Seq Scan on table2_partM
>
> This plan ignores the fact there are indexes on the table2 partitions and
> that the pairwise partitions joins (index nested loop or hash join) will be
> faster than scanning all the partitions and sorting them.

To some degree my merge-append patch would mitigate this case. It
would allow the use of indexes on some or all the partitions to avoid
the sorts.

However it would still force all the partitions to be appended on each
side and then merged. If we could match up all the partitions then I
think this plan would be faster with the Append on top and separate
merge joins for each pair of partitions.

Aside from skipping the cpu cost of the merge-append I think it would
win for a few other reasons as well. Each join would be able to come
up with much better statistics which would enable it to pick a better
join when one is available. Even if the planner still picks a merge
join it would be much more likely to finish early and skip the
remainder of a partition on one side or the other.

> OK, implementing a special "abstract"/"parent" table would make more sense.

I had in mind to tackle this in a bit of a roundabout way. If we mark
the parent table "read-only" then notice that all tuples (all 0 of
them) in the table are frozen then we can discard that table from the
plans. Since changing the "read-only" attribute would have to be
treated as a DDL operation which would invalidate any cached plans we
can trust that it won't change as long as the plan lives so no new
tuples can be inserted.

The reason I wanted to take such a roundabout route instead of having
an "abstract" or "empty" property is that a wanted to generalize this.
Once we know a table is read-only then there are lots of properties we
could find useful in planning aside from emptiness. We could have
statistics like the minimum and maximum value for a column which the
planner would be able to trust and exclude partitions without having
to explicitly declare constraints on every column.

This is all just my musings, not any kind of consensus. Does it make
sense to others or is it too baroque when a simple "abstract" flag
would do?

--
greg
http://mit.edu/~gsstark/resume.pdf

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Itagaki Takahiro 2009-07-07 02:06:24 Re: ALTER SET DISTINCT vs. Oracle-like DBMS_STATS
Previous Message Nedyalko Borisov 2009-07-06 21:23:36 Re: Join optimization for inheritance tables