Re: Add proper planner support for ORDER BY / DISTINCT aggregates

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date: 2021-11-04 07:59:00
Message-ID: 3163474.aeNJFYEL58@aivenronan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Le jeudi 22 juillet 2021, 10:42:49 CET Ronan Dunklau a écrit :
> Le jeudi 22 juillet 2021, 09:38:50 CEST David Rowley a écrit :
> > On Thu, 22 Jul 2021 at 02:01, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
> > > I tested the 0001 patch against both HEAD and my proposed bugfix for
> > > postgres_fdw.
> > >
> > > There is a problem that the ordered aggregate is not pushed down
> > > anymore.
> > > The underlying Sort node is correctly pushed down though.
> > >
> > > This comes from the fact that postgres_fdw grouping path never contains
> > > any
> > > pathkey. Since the cost is fuzzily the same between the pushed-down
> > > aggregate and the locally performed one, the tie is broken against the
> > > pathkeys.
> >
> > I think this might be more down to a lack of any penalty cost for
> > fetching foreign tuples. Looking at create_foreignscan_path(), I don't
> > see anything that adds anything to account for fetching the tuples
> > from the foreign server. If there was something like that then there'd
> > be more of a preference to perform the remote aggregation so that
> > fewer rows must arrive from the remote server.
> >
> > I tested by adding: total_cost += cpu_tuple_cost * rows * 100; in
> > create_foreignscan_path() and I got the plan with the remote
> > aggregation. That's a fairly large penalty of 1.0 per row. Much bigger
> > than parallel_tuple_cost's default value.
> >
> > I'm a bit undecided on how much this patch needs to get involved in
> > adjusting foreign scan costs. The problem is that we've given the
> > executor a new path to consider and nobody has done any proper
> > costings for the foreign scan so that it properly prefers paths that
> > have to pull fewer foreign tuples. This is a pretty similar problem
> > to what parallel_tuple_cost aims to fix. Also similar to how we had to
> > add APPEND_CPU_COST_MULTIPLIER to have partition-wise aggregates
> > prefer grouping at the partition level rather than at the partitioned
> > table level.
> >
> > > Ideally we would add the group pathkeys to the grouping path, but this
> > > would add an additional ORDER BY expression matching the GROUP BY.
> > > Moreover, some triaging of the pathkeys would be necessary since we now
> > > mix the sort-in- aggref pathkeys with the group_pathkeys.
> >
> > I think you're talking about passing pathkeys into
> > create_foreign_upper_path in add_foreign_grouping_paths. If so, I
> > don't really see how it would be safe to add pathkeys to the foreign
> > grouping path. What if the foreign server did a Hash Aggregate? The
> > rows might come back in any random order.
> Yes, I was suggesting to add a new path with the pathkeys factored in, which
> if chosen over the non-ordered path would result in an additional ORDER BY
> clause to prevent a HashAggregate. But that doesn't seem a good idea after
> all.
> > I kinda think that to fix this properly would need a new foreign
> > server option such as foreign_tuple_cost. I'd feel better about
> > something like that with some of the people with a vested interest in
> > the FDW code were watching more closely. So far we've not managed to
> > entice any of them with the bug report yet, but it's maybe early days
> > yet.
> We already have that in the form of fdw_tuple_cost as a server option if I'm
> not mistaken ? It works as expected when the number of tuples is notably
> reduced by the foreign group by.
> The problem arise when the cardinality of the groups is equal to the input's
> cardinality. I think even in that case we should try to use a remote
> aggregate since it's a computation that will not happen on the local
> server. I also think we're more likely to have up to date statistics
> remotely than the ones collected locally on the foreign tables, and the
> estimated number of groups would be more accurate on the remote side than
> the local one.

I took some time to toy with this again.

At first I thought that charging a discount in foreign grouping paths for
Aggref targets (since they are computed remotely) would be a good idea,
similar to what is done for the grouping keys.

But in the end, it's probably not something we would like to do. Yes, the
group planning will be more accurate on the remote side generally (better
statistics than locally for estimating the number of groups) but executing the
grouping locally when the number of groups is close to the input's cardinality
(ex: group by unique_key) gives us a form of parallelism which can actually
perform better.

For the other cases where there is fewer output than input tuples, that is,
when an actual grouping takes place, adjusting fdw_tuple_cost might be enough
to tune the behavior to what is desirable.

Ronan Dunklau

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-11-04 08:10:48 Re: Data is copied twice when specifying both child and parent table in publication
Previous Message Michael Paquier 2021-11-04 07:41:04 Re: should we enable log_checkpoints out of the box?