Improving planner's handling of min/max aggregate optimization

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Improving planner's handling of min/max aggregate optimization
Date: 2010-11-02 03:16:31
Message-ID: 12570.1288667791@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Now that we have MergeAppend in place, it strikes me that the current
support for implementing MIN() and MAX() via indexscan+LIMIT subplans
could be generalized to work on inheritance trees: a MergeAppend
plan plus LIMIT would just about do it.

However, extending the existing implementation in planagg.c to create
this type of plan looks quite unappetizing. It's basically duplicating
the key parts of indexpath generation, and we'd have to duplicate a
whole lot more logic in order to do MergeAppends in the same fashion.
I think it's time to rethink the idea of having that code be mostly
independent of the rest of the planner logic.

With the recent hacking on equivalence classes fresh in mind, it seems
to me that the right way to refactor planagg.c is like this:

1. Do the initial processing (checking to see if all aggregates are
MIN/MAX) early in grouping_planner(). Perhaps it'd be sensible to
merge it into count_agg_clauses(), although for the moment I'd be
satisfied with doing a second pass over the tree for this purpose.

2. If we find that the aggregates are potentially optimizable,
then push EquivalenceClauses for their arguments into the EC machinery.

3. As a result of #2, the path generation code will automatically try to
build indexscan paths with sort orders matching the aggregate arguments.
If there are any inherited tables, it'll try to build MergeAppend paths
for those, too. (Well, actually, we'll likely have to mess with the
"useful pathkeys" logic just a tad for this to work. But it won't be
hard.)

4. The final step is still the same as now, and done at the same place:
compare costs and see if we should build an alternative plan. But
instead of constructing paths the hard way, we'll just grab the cheapest
suitably-sorted path from the existing path collection.

This will make the min/max optimization code more visible to the rest of
the planner in a couple of ways: aside from being called at two places
not one, it will have some intermediate state that'll have to be kept in
PlannerInfo, and the "useful pathkeys" logic will have to be complicit
in letting paths that match the aggregates' requirements survive. But
it's not real bad, and it seems a lot better than continuing with the
fully-at-arms-length approach.

Comments?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vaibhav Kaushal 2010-11-02 04:40:29 Starting off with the development
Previous Message Hitoshi Harada 2010-11-02 02:49:13 Re: Sort is actually PlanState?