From: | Adam Zegelin <adam(at)relational(dot)io> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Enforcing Parameterised Nested Loop Join Order for Foreign Table Joins |
Date: | 2013-03-18 23:38:17 |
Message-ID: | F3D61D18-DFC7-44ED-A370-92A8730FBAD3@relational.io |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Tom,
Thank you for your prompt reply. Your advice has pointed me in the right direction.
I now have the wrapper identifying columns that are inputs to the web service, and thus parameterisable. The ec_classes, left_join_clauses and right_join_clauses trees are scanned for Var exprs that match these attributes. If they are present, the relid is added to the required list of outer rels for the path -- this is done as an extension to the logic I posted previously.
In all cases this seems to work, except one. A join between 3 tables. The foreign table has 2 parameterised columns, each given a restriction based on one of the other two tables:
adam=# explain select * from l1, l2, foreign1 where foreign1.a = l1.a and foreign1.b = l2.a;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Merge Join (cost=5000704.96..5001278.44 rows=37822 width=168)
Merge Cond: (l2.a = foreign1.b)
-> Sort (cost=85.43..88.50 rows=1230 width=36)
Sort Key: l2.a
-> Seq Scan on l2 (cost=0.00..22.30 rows=1230 width=36)
-> Sort (cost=5000619.54..5000634.91 rows=6150 width=132)
Sort Key: foreign1.b
-> Merge Join (cost=5000135.26..5000232.51 rows=6150 width=132)
Merge Cond: (foreign1.a = l1.a)
-> Sort (cost=5000049.83..5000052.33 rows=1000 width=96)
Sort Key: foreign1.a
-> Foreign Scan on foreign1 (cost=5000000.00..5000000.00 rows=1000 width=96)
-> Sort (cost=85.43..88.50 rows=1230 width=36)
Sort Key: l1.a
-> Seq Scan on l1 (cost=0.00..22.30 rows=1230 width=36)
My path generation logic seems to work:
baserel->cheapest_parameterized_paths = (
{FOREIGNPATH
:pathtype 120
:parent_relids (b 3)
:required_outer (b 1 2)
:rows 500
:startup_cost 0.00
:total_cost 0.00
:pathkeys <>
:fdw_private <>
}
{FOREIGNPATH
:pathtype 120
:parent_relids (b 3)
:required_outer (b)
:rows 1000
:startup_cost 5000000.00
:total_cost 5000000.00
:pathkeys <>
:fdw_private <>
}
)
Yet the planner picks the non-parameterised path:
ForeignPath* best_path = {FOREIGNPATH
:pathtype 120
:parent_relids (b 3)
:required_outer (b)
:rows 1000
:startup_cost 5000000.00
:total_cost 5000000.00
:pathkeys <>
:fdw_private <>
}
I’ve tried adjusting planner tuneables to disable all join types except nested loop, and setting `join_collapse_limit` to 1 with no desirable outcome.
Yet, adding a restriction clause between the other two tables forces them to be scanned first:
adam=# explain select * from l1, l2, foreign1 where foreign1.a = l1.a and foreign1.b = l2.a and l1.b > l2.b;
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop (cost=0.00..2544241.17 rows=12608 width=168)
-> Nested Loop (cost=0.00..22741.17 rows=504300 width=72)
Join Filter: (l1.b > l2.b)
-> Seq Scan on l1 (cost=0.00..22.30 rows=1230 width=36)
-> Materialize (cost=0.00..28.45 rows=1230 width=36)
-> Seq Scan on l2 (cost=0.00..22.30 rows=1230 width=36)
-> Foreign Scan on foreign1 (cost=0.00..0.00 rows=500 width=96)
Filter: ((a = l1.a) AND (b = l2.a))
ForeignPath* best_path = {FOREIGNPATH
:pathtype 120
:parent_relids (b 3)
:required_outer (b 1 2)
:rows 500
:startup_cost 0.00
:total_cost 0.00
:pathkeys <>
:fdw_private <>
}
On 18/03/2013, at 4:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Adam Zegelin <adam(at)relational(dot)io> writes:
>> Some service endpoints have no concept of unqualified queries. In the example above, a ‘sequence scan’ of Bing is a not possible.
>
> In that case, you shouldn't be generating such a path. But keep in mind
> that this may lead to failure to produce any plan at all for some
> queries. If the foreign data source is really so broken that it can't
> do that, then you have little choice ... but you shouldn't be thinking
> of that as anything but a broken design decision on their part.
I tried adding a condition that would prevent the non-parameterised path from being generated if the service only supported parameterised scans. Postgres refuses to generate a plan: "ERROR: could not devise a query plan for the given query". I did a bit of digging and this error is generated by pathnode.c:set_cheapest . As there is no non-parameterised `cheapest_total_path` the error is raised (line 253).
For now, I just add an expensive non-pramerterised path and let the FDW throw an error if no qual is found involving the required columns.
Regards,
Adam
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2013-03-19 01:00:00 | Re: Roadmap for Postgres on AIX |
Previous Message | Christophe Pettus | 2013-03-18 23:36:55 | .backup file documentation |