Re: Costing foreign joins in postgres_fdw

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: "Ashutosh Bapat *EXTERN*" <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Costing foreign joins in postgres_fdw
Date: 2015-12-18 16:39:13
Message-ID: CA+TgmoZbbnCX_9c=kqUis9cMUb61GO+5EJP7rMCigVmYupOXzQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 18, 2015 at 8:09 AM, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> wrote:
> My gut feeling is that for a join where all join predicates can be pushed down, it
> will usually be a win to push the join to the foreign server.
>
> So in your first scenario, I'd opt for always pushing down the join
> if possible if use_remote_estimate is OFF.
>
> Your second scenario is essentially to estimate that a pushed down join will
> always be executed as a nested loop join, which will in most cases produce
> an unfairly negative estimate.

+1 to all that. Whatever we do here for costing in detail, it should
be set up so that the pushed-down join wins unless there's some pretty
tangible reason to think, in a particular case, that it will lose.

> What about using local statistics to come up with an estimated row count for
> the join and use that as the basis for an estimate? My idea here is that it
> will always be a win to push down a join unless the result set is so large that
> transferring it becomes the bottleneck.

This also sounds about right.

> Maybe, to come up with something remotely realistic, a formula like
>
> sum of locally estimated costs of sequential scan for the base table
> plus count of estimated result rows (times a factor)

Was this meant to say "the base tables", plural?
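For concreteness, the quoted formula could be sketched with made-up numbers. Everything below is purely illustrative; the costs and the transfer factor are invented, not actual planner values, and the function name is hypothetical:

```python
# Hypothetical sketch of the proposed pushed-down-join cost formula:
#   sum of locally estimated seq-scan costs of the base tables,
#   plus estimated join result rows times a per-row transfer factor.
# All numbers here are invented for illustration.

def pushdown_join_cost(seq_scan_costs, est_join_rows, per_row_factor):
    """Cost = sum(seq scans of base tables) + rows * transfer factor."""
    return sum(seq_scan_costs) + est_join_rows * per_row_factor

# Two base tables with locally estimated seq-scan costs of 1000 and 2500,
# a join estimated to return 10000 rows, and a transfer factor of 0.01
# (in the spirit of postgres_fdw's fdw_tuple_cost):
cost = pushdown_join_cost([1000.0, 2500.0], 10000, 0.01)
print(cost)  # 3600.0
```

The transfer term is what keeps a huge result set from looking like a free win, which matches the intuition above that pushdown only loses when shipping the rows back becomes the bottleneck.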

I think whatever we do here should try to extend the logic in
postgres_fdw's estimate_path_cost_size() to foreign joins in some
reasonably natural way, but I'm not sure exactly what that should look
like. Maybe do what that function currently does for single-table
scans, and then add all the values up, or something like that. I'm a
little worried, though, that a join the remote server would execute as
a nested loop with an inner index scan might then look not worth
pushing down, because such a join will not actually touch every row
from both tables the way a hash or merge join would.
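That worry can be made concrete with invented numbers: when the remote side would use a nested loop with an inner index scan, a "sum of seq scans" estimate can far overshoot what the remote join actually costs. All figures below are hypothetical, chosen only to illustrate the gap:

```python
# Illustrative comparison (all numbers invented): a selective join
# where the remote server would use a nested loop with an inner
# index scan rather than scanning the inner table in full.

outer_seq_scan = 1000.0   # locally estimated seq-scan cost, outer table
inner_seq_scan = 50000.0  # locally estimated seq-scan cost, inner table
outer_rows = 10           # outer rows surviving a selective filter
index_probe = 4.0         # assumed cost per inner index probe

# The summed estimate charges for scanning both tables in full:
summed_estimate = outer_seq_scan + inner_seq_scan

# Roughly what the remote nested loop would actually cost:
nestloop_cost = outer_seq_scan + outer_rows * index_probe

print(summed_estimate, nestloop_cost)  # 51000.0 1040.0
```

With numbers like these, the summed estimate is almost fifty times the plausible remote cost, which is exactly how a pushdown that should win could end up looking like a loser.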

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
