Quick Links

Re: MIN/MAX optimization for partitioned table

From:	Alan Li <ali(at)truviso(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: MIN/MAX optimization for partitioned table
Date:	2009-07-20 18:40:53
Message-ID:	782056770907201140h6c449398n6fb2bf19c3305fa8@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Sat, Jul 18, 2009 at 12:04 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:

> On Sat, Jul 18, 2009 at 6:35 PM, Alan Li<ali(at)truviso(dot)com> wrote:
> > Yeah, are you running into the same issue as well? I tried to figure out
> a
> > way around the O(n^2) behavior, but there were no existing direct
> > relationship between the child subpath and its associated AppendRelInfo.
> I
> > think an AppenRelInfo dynahash would work and just have it hash by the
> > child_relid.
>
> I don't see any place in my patch where I had to do anything like
> this. I do have to loop through all the appendrel elements and skip
> over the ones which don't belong to this appendrel which seems weird
> to me but it isn't n^2.

> Generating a normal Append node you're generating a brand new path for
> each child and attaching them to the append path. It looks like you're
> allowing the append rel to generate paths and then digging around
> looking at those paths. I wonder if that's the right approach.
>
>
> >> The other thing is it would be nice if we could avoid making separate
> >> subplans for each child and instead make one for the whole structure
> >> including the aggregate. It would at the very least make the explain
> >> output prettier, but I think it would avoid repeated execution of the
> >> aggregate in some cases as well.
> >
> > How would this plan look? I think the repeated execution of the
> aggregate
> > comes from having to process the output of each child. So it's O(n)
> > executions where n is the number of children, assuming each child has the
> > appropriate index for this optimization.
>
> No I just mean instead of generating
>
> InitPlan 1
> Limit
> Index Scan
> InitPlan 2
> Limit
> Index Scan
> Aggregate
> Append
> Result
> Result
>
> I think it would be better to generate this:
>
> InitPlan 1
> Aggregate
> Append
> Limit 1
> IndexScan
> Limit 1
> IndexScan
> Result
>
> The reason I think this is better is because if the subquery is
> executed multiple times it will just return the one precalculated
> result. In your plan it will have all the child minimums precalculated
> but will still have to re-execute the aggregate and append node.
> That's not terribly expensive but we might as well generate the
> simpler more efficient plan.
>
>
> > The other optimization that I thought was that the optimizer might use
> the
> > check constraints (in the range partitioning) on the column that I want
> to
> > do a MIN/MAX on. Then it'll only have to execute the subplan in one
> child
> > partition and the parent partition. But it was more of a wild idea on my
> > part, not sure if that's possible.
>
> Yes, I think this is the long-term goal. That's the whole "better
> representation of partitions" plotline. To do this efficiently the
> planner should know what the partition key is and what the
> partitioning structure is.
>
> In an ideal world we would be able to collapse the whole structure and
> eliminate the append relation entirely. To do that we need some way to
> indicate that the parent relation is provably empty. I had in mind to
> mark it as read-only and the statistics as authoritative since that
> seems more useful than just being able to mark it empty. I think there
> are a lot of other interesting things we could do with statistics if
> we knew they were authoritative for a read-only table. (The read-only
> property we would need here is very strong. There would have to be a
> vacuum freeze and moreover we would have to know that all tuples were
> successfully frozen.)
>
> --
> greg
> http://mit.edu/~gsstark/resume.pdf <http://mit.edu/%7Egsstark/resume.pdf>
>
>
>
Attached is an updated patch that removes the O(n^2) behavior and the
awkwardness of optimizing the seqscan path as the plan was about to be
created. Now, the optimization is considered when appendrel is generating
the paths.

I also changed the plan as you had suggested. It now looks like this:

postgres=# explain select max(b) from foo_archive;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1.22..1.23 rows=1 width=8)
-> Append (cost=0.00..1.13 rows=32 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_idx on foo_archive
(cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_01_idx on
foo_archive_2007_01_01 foo_archive (cost=0.00..2723.24 rows=86399 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_02_idx on
foo_archive_2007_01_02 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_03_idx on
foo_archive_2007_01_03 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_04_idx on
foo_archive_2007_01_04 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_05_idx on
foo_archive_2007_01_05 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_06_idx on
foo_archive_2007_01_06 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_07_idx on
foo_archive_2007_01_07 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_08_idx on
foo_archive_2007_01_08 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_09_idx on
foo_archive_2007_01_09 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_10_idx on
foo_archive_2007_01_10 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_11_idx on
foo_archive_2007_01_11 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_12_idx on
foo_archive_2007_01_12 foo_archive (cost=0.00..1568.27 rows=49601 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_13_idx on
foo_archive_2007_01_13 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_14_idx on
foo_archive_2007_01_14 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_15_idx on
foo_archive_2007_01_15 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_16_idx on
foo_archive_2007_01_16 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_17_idx on
foo_archive_2007_01_17 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_18_idx on
foo_archive_2007_01_18 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_19_idx on
foo_archive_2007_01_19 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_20_idx on
foo_archive_2007_01_20 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_21_idx on
foo_archive_2007_01_21 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_22_idx on
foo_archive_2007_01_22 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_23_idx on
foo_archive_2007_01_23 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_24_idx on
foo_archive_2007_01_24 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_25_idx on
foo_archive_2007_01_25 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_26_idx on
foo_archive_2007_01_26 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_27_idx on
foo_archive_2007_01_27 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_28_idx on
foo_archive_2007_01_28 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_29_idx on
foo_archive_2007_01_29 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_30_idx on
foo_archive_2007_01_30 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_31_idx on
foo_archive_2007_01_31 foo_archive (cost=0.00..73.35 rows=1940 width=8)
(66 rows)

Attachment	Content-Type	Size
optimize_append_minmax_1.patch	text/x-patch	7.7 KB

In response to

Re: MIN/MAX optimization for partitioned table at 2009-07-18 19:04:42 from Greg Stark

Responses

Re: MIN/MAX optimization for partitioned table at 2009-07-20 19:29:09 from Greg Stark

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Eisentraut	2009-07-20 18:52:51	Re: [PATCH] SE-PgSQL/tiny rev.2193
Previous Message	Alvaro Herrera	2009-07-20 18:36:06	Re: fmgroids.h not installed by "make install" in VPATH