Re: MIN/MAX optimization for partitioned table

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Alan Li <ali(at)truviso(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: MIN/MAX optimization for partitioned table
Date: 2009-07-18 19:04:42
Message-ID: 407d949e0907181204r549e98c5m558873098b50275@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jul 18, 2009 at 6:35 PM, Alan Li<ali(at)truviso(dot)com> wrote:
> Yeah, are you running into the same issue as well?  I tried to figure out a
> way around the O(n^2) behavior, but there were no existing direct
> relationship between the child subpath and its associated AppendRelInfo.  I
> think an AppenRelInfo dynahash would work and just have it hash by the
> child_relid.

I don't see any place in my patch where I had to do anything like
this. I do have to loop through all the appendrel elements and skip
over the ones which don't belong to this appendrel which seems weird
to me but it isn't n^2.

Generating a normal Append node you're generating a brand new path for
each child and attaching them to the append path. It looks like you're
allowing the append rel to generate paths and then digging around
looking at those paths. I wonder if that's the right approach.

>> The other thing is it would be nice if we could avoid making separate
>> subplans for each child and instead make one for the whole structure
>> including the aggregate. It would at the very least make the explain
>> output prettier, but I think it would avoid repeated execution of the
>> aggregate in some cases as well.
>
> How would this plan look?  I think the repeated execution of the aggregate
> comes from having to process the output of each child.  So it's O(n)
> executions where n is the number of children, assuming each child has the
> appropriate index for this optimization.

No I just mean instead of generating

InitPlan 1
Limit
Index Scan
InitPlan 2
Limit
Index Scan
Aggregate
Append
Result
Result

I think it would be better to generate this:

InitPlan 1
Aggregate
Append
Limit 1
IndexScan
Limit 1
IndexScan
Result

The reason I think this is better is because if the subquery is
executed multiple times it will just return the one precalculated
result. In your plan it will have all the child minimums precalculated
but will still have to re-execute the aggregate and append node.
That's not terribly expensive but we might as well generate the
simpler more efficient plan.

> The other optimization that I thought was that the optimizer might use the
> check constraints (in the range partitioning) on the column that I want to
> do a MIN/MAX on.  Then it'll only have to execute the subplan in one child
> partition and the parent partition.  But it was more of a wild idea on my
> part, not sure if that's possible.

Yes, I think this is the long-term goal. That's the whole "better
representation of partitions" plotline. To do this efficiently the
planner should know what the partition key is and what the
partitioning structure is.

In an ideal world we would be able to collapse the whole structure and
eliminate the append relation entirely. To do that we need some way to
indicate that the parent relation is provably empty. I had in mind to
mark it as read-only and the statistics as authoritative since that
seems more useful than just being able to mark it empty. I think there
are a lot of other interesting things we could do with statistics if
we knew they were authoritative for a read-only table. (The read-only
property we would need here is very strong. There would have to be a
vacuum freeze and moreover we would have to know that all tuples were
successfully frozen.)

--
greg
http://mit.edu/~gsstark/resume.pdf

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2009-07-18 19:25:03 Re: multi-threaded pgbench
Previous Message Robert Haas 2009-07-18 19:02:57 Re: Comments on automatic DML routing and explicit partitioning subcommands