Re: Enforcing Parameterised Nested Loop Join Order for Foreign Table Joins

From: Adam Zegelin <adam(at)relational(dot)io>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Enforcing Parameterised Nested Loop Join Order for Foreign Table Joins
Date: 2013-03-18 23:38:17
Message-ID: F3D61D18-DFC7-44ED-A370-92A8730FBAD3@relational.io
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Tom,

Thank you for your prompt reply. Your advice has pointed me in the right direction.

I now have the wrapper identifying columns that are inputs to the web service, and thus parameterisable. The ec_classes, left_join_clauses and right_join_clauses trees are scanned for Var exprs that match these attributes. If they are present, the relid is added to the required list of outer rels for the path -- this is done as an extension to the logic I posted previously.

In all cases this seems to work, except one. A join between 3 tables. The foreign table has 2 parameterised columns, each given a restriction based on one of the other two tables:

adam=# explain select * from l1, l2, foreign1 where foreign1.a = l1.a and foreign1.b = l2.a;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Merge Join (cost=5000704.96..5001278.44 rows=37822 width=168)
Merge Cond: (l2.a = foreign1.b)
-> Sort (cost=85.43..88.50 rows=1230 width=36)
Sort Key: l2.a
-> Seq Scan on l2 (cost=0.00..22.30 rows=1230 width=36)
-> Sort (cost=5000619.54..5000634.91 rows=6150 width=132)
Sort Key: foreign1.b
-> Merge Join (cost=5000135.26..5000232.51 rows=6150 width=132)
Merge Cond: (foreign1.a = l1.a)
-> Sort (cost=5000049.83..5000052.33 rows=1000 width=96)
Sort Key: foreign1.a
-> Foreign Scan on foreign1 (cost=5000000.00..5000000.00 rows=1000 width=96)
-> Sort (cost=85.43..88.50 rows=1230 width=36)
Sort Key: l1.a
-> Seq Scan on l1 (cost=0.00..22.30 rows=1230 width=36)

My path generation logic seems to work:

baserel->cheapest_parameterized_paths = (
{FOREIGNPATH
:pathtype 120
:parent_relids (b 3)
:required_outer (b 1 2)
:rows 500
:startup_cost 0.00
:total_cost 0.00
:pathkeys <>
:fdw_private <>
}
{FOREIGNPATH
:pathtype 120
:parent_relids (b 3)
:required_outer (b)
:rows 1000
:startup_cost 5000000.00
:total_cost 5000000.00
:pathkeys <>
:fdw_private <>
}
)

Yet the planner picks the non-parameterised path:

ForeignPath* best_path = {FOREIGNPATH
:pathtype 120
:parent_relids (b 3)
:required_outer (b)
:rows 1000
:startup_cost 5000000.00
:total_cost 5000000.00
:pathkeys <>
:fdw_private <>
}

I’ve tried adjusting planner tuneables to disable all join types except nested loop, and setting `join_collapse_limit` to 1 with no desirable outcome.

Yet, adding a restriction clause between the other two tables forces them to be scanned first:

adam=# explain select * from l1, l2, foreign1 where foreign1.a = l1.a and foreign1.b = l2.a and l1.b > l2.b;
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop (cost=0.00..2544241.17 rows=12608 width=168)
-> Nested Loop (cost=0.00..22741.17 rows=504300 width=72)
Join Filter: (l1.b > l2.b)
-> Seq Scan on l1 (cost=0.00..22.30 rows=1230 width=36)
-> Materialize (cost=0.00..28.45 rows=1230 width=36)
-> Seq Scan on l2 (cost=0.00..22.30 rows=1230 width=36)
-> Foreign Scan on foreign1 (cost=0.00..0.00 rows=500 width=96)
Filter: ((a = l1.a) AND (b = l2.a))

ForeignPath* best_path = {FOREIGNPATH
:pathtype 120
:parent_relids (b 3)
:required_outer (b 1 2)
:rows 500
:startup_cost 0.00
:total_cost 0.00
:pathkeys <>
:fdw_private <>
}

On 18/03/2013, at 4:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Adam Zegelin <adam(at)relational(dot)io> writes:
>> Some service endpoints have no concept of unqualified queries. In the example above, a ‘sequence scan’ of Bing is a not possible.
>
> In that case, you shouldn't be generating such a path. But keep in mind
> that this may lead to failure to produce any plan at all for some
> queries. If the foreign data source is really so broken that it can't
> do that, then you have little choice ... but you shouldn't be thinking
> of that as anything but a broken design decision on their part.

I tried adding a condition that would prevent the non-parameterised path from being generated if the service only supported parameterised scans. Postgres refuses to generate a plan: "ERROR: could not devise a query plan for the given query". I did a bit of digging and this error is generated by pathnode.c:set_cheapest . As there is no non-parameterised `cheapest_total_path` the error is raised (line 253).

For now, I just add an expensive non-pramerterised path and let the FDW throw an error if no qual is found involving the required columns.

Regards,
Adam

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2013-03-19 01:00:00 Re: Roadmap for Postgres on AIX
Previous Message Christophe Pettus 2013-03-18 23:36:55 .backup file documentation