Re: Aggregate Push Down - Performing aggregation on foreign server

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Aggregate Push Down - Performing aggregation on foreign server
Date: 2016-10-24 10:46:02
Message-ID: CAM2+6=Xky1e6T9Wsj10Xv0m4Mmosxcetnn8JWU2HGZrEhJEsjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 22, 2016 at 9:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> brolga is still not terribly happy with this patch: it's choosing not to
> push down the aggregates in one of the queries. While I failed to
> duplicate that result locally, investigation suggests that brolga's result
> is perfectly sane; in fact it's not very clear why we're not getting that
> from multiple critters, because the plan brolga is choosing is not
> inferior to the expected one.
>
> The core of the problem is this subquery:
>
> contrib_regression=# explain verbose select min(13), avg(ft1.c1),
> sum(ft2.c1) from ft1 right join ft2 on (ft1.c1 = ft2.c1) where ft1.c1 = 12;
>
> QUERY PLAN
> ------------------------------------------------------------
> ------------------------------------------------------------
> ---------------------------------
> Foreign Scan (cost=108.61..108.64 rows=1 width=44)
> Output: (min(13)), (avg(ft1.c1)), (sum(ft2.c1))
> Relations: Aggregate on ((public.ft1) INNER JOIN (public.ft2))
> Remote SQL: SELECT min(13), avg(r1."C 1"), sum(r2."C 1") FROM ("S 1"."T
> 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" = 12)) AND ((r1."C 1" =
> 12))))
> (4 rows)
>
> If you look at the estimate to just fetch the data, it's:
>
> contrib_regression=# explain verbose select ft1.c1, ft2.c1 from ft1 right
> join ft2 on (ft1.c1 = ft2.c1) where ft1.c1 = 12;
> QUERY PLAN
> ------------------------------------------------------------
> --------------------------------------------------------------------------
> Foreign Scan (cost=100.55..108.62 rows=1 width=8)
> Output: ft1.c1, ft2.c1
> Relations: (public.ft1) INNER JOIN (public.ft2)
> Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN
> "S 1"."T 1" r2 ON (((r2."C 1" = 12)) AND ((r1."C 1" = 12))))
> (4 rows)
>
> Note we're expecting only one row out of the join. Now the cost of doing
> three aggregates on a single row of input is not a lot. Comparing these
> local queries:
>
> regression=# explain select min(13),avg(q1),sum(q2) from int8_tbl where
> q2=456;
> QUERY PLAN
> ---------------------------------------------------------------
> Aggregate (cost=1.07..1.08 rows=1 width=68)
> -> Seq Scan on int8_tbl (cost=0.00..1.06 rows=1 width=16)
> Filter: (q2 = 456)
> (3 rows)
>
> regression=# explain select (q1),(q2) from int8_tbl where q2=456;
> QUERY PLAN
> ---------------------------------------------------------
> Seq Scan on int8_tbl (cost=0.00..1.06 rows=1 width=16)
> Filter: (q2 = 456)
> (2 rows)
>
> we seem to have startup = input cost + .01 and then another .01
> for total. So the estimate to do the above remote scan and then
> aggregate locally should have been 108.63 startup and 108.64 total,
> give or take. The estimate for aggregating remotely is a hair better,
> but it's not nearly better enough to guarantee that the planner won't
> see it as fuzzily the same cost.
>
> In short: the problem with this test case is that it's considering
> aggregation over only a single row, which is a situation in which
> pushing down the aggregate actually doesn't save us anything, because
> we're retrieving one row from the remote either way. So it's not at all
> surprising that we don't get a stable plan choice. The test query needs
> to be adjusted so that the aggregation is done over multiple rows,
> allowing fdw_tuple_cost to kick in and provide some daylight between
> the cost estimates.
>

Attached patch which performs aggrgation over 1000 rows as suggested by Tom.

I believe it will have a stable output/plan now.

Thanks

>
> regards, tom lane
>

--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachment Content-Type Size
agg_push_down_fix_testcase.patch binary/octet-stream 5.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Victor Wagner 2016-10-24 11:03:22 Re: Patch: Implement failover on libpq connect level.
Previous Message Thom Brown 2016-10-24 10:36:31 Re: Patch: Implement failover on libpq connect level.