Re: PG9.2 and FDW query planning.

From: Ronan Dunklau <rdunklau(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PG9.2 and FDW query planning.
Date: 2012-07-12 13:14:18
Message-ID: 4FFECDAA.80107@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/07/2012 18:30, Tom Lane wrote:
> Ronan Dunklau <rdunklau(at)gmail(dot)com> writes:
>> Let's say I have an IMAP foreign data wrapper, and I write a
>> query joining the table on itself using the In-Reply-To and
>> Message-ID headers, is there anything I can do to avoid fetching
>> all the mails from the remote server ?
>
>> If I could somehow inform the planner that it can look up rows
>> by message-id, thus avoiding the need to fetch everything from
>> the remote server. Perhaps "persuading" the planner to use a
>> nested-loop ?
>
> OK, so what you're saying is that the imap server can effectively
> provide an index on message_id. What you'd do is create a
> parameterized path that uses the tbl.message_id =
> other_tbl.in_reply_to join clause. If that's enough cheaper than a
> full scan, the planner would select it.

Thank you, I was able to build such paths from your indication.

The python FDW implementor can optionally give a list of tuples
consisting of (path key, expected_number_of_row). So, in the imap
example that could be [('Message-ID', 1), ('From', 1000)] for example.

- From this information, if there is an equivalence class which
restrictinfo uses one of those keys, we build a parameterized path,
with an associated cost of base_width * expected_number_of_row, in
addition to the generic, unparameterized path.

The planner can then select this path, and build plans looking like this:

postgres=# explain select m1."From",
m1."To",
m2."From",
m2."To"
from mails m1 inner join mails m2 on m2."Message-ID" = m1."In-Reply-To"
where m1."From" = '%test(at)example(dot)com%';

QUERY PLAN
- --------------------------------------------------------------------
Nested Loop (cost=10.00..60001000.00 rows=500000000 width=128)
-> Foreign Scan on mails m1 (cost=0.00..30000000.00 rows=100000
width=300)
Filter: (("From")::text = '%test(at)example(dot)com%'::text)
-> Foreign Scan on mails m2 (cost=10.00..300.00 rows=1 width=300)
Filter: (("Message-ID")::text = (m1."In-Reply-To")::text)

If I understand it correctly, after returning a ForeignScan (from
GetForeignPlan), the planner decides to use a nestloop, and in the
process of creating the nestloop plan, replaces Var nodes coming from
the outerel (here, m1."In-Reply-To") by params nodes.

My current implementation already looks for (var = param) expressions
that it may handle during the plan phase and stores the association
between the var and the param_id.
At execution time, the needed parameters values are fetched (from the
ParamExecData array found in es_param_exec_vals) and passed to the
python foreign data wrapper.

The problem I have: how can I retrieve the generated params and keep
the association between the var and the param ?

Should I replace the (var = outervar) clauses by (var = param) myself
and store them in the fdw_exprs field of the foreign scan ?

> FWIW, I'm not sure that it's sane to try to expose this stuff to
> python yet. It's complicated and still something of a moving
> target. Once we've got a few more C-coded FDWs that can do this
> type of optimization, things might be stable enough that it'd be
> useful to try to provide a high-level API.

The current API (as mentioned above) would be more declarative than
anything, only offering a way to (maybe) build parameterized paths
without guaranteeing anything. Even if the internals change, I fail to
see how it can hurt to offer such a feature.

Regards,

- --
Ronan Dunklau

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)

iQEcBAEBAgAGBQJP/s2dAAoJECTYLCgFy3239KkIAIiKJo/F1r4Yp49wLpmThjQI
ICo910ZajqlUKVsl9ye8m2l6p+lyGEmZMWUAWP6ae2pqFR+aC0zThypjF1faZ9tN
HfqMbEKx/trkDf05U28tJlvOeu21tiEOEs4n02fmfdHu9SvemuLdyhU3dOLxoBVK
ZZ8ra9q/+zHCPpc3zt0Mow80Q1X1M3DtirsHPoeIdOK69wD4nD2ZfhQule5HaoV1
dG3FlrKGAGzRpohLBCuWzyGPcWCS584lXGWfhsz/waLaSDIjcjvaaMke54eaa8Ci
7KxXkMM12CKFQyheSR5VVwFJrobnME2HDiJCoAOkRc0dW+Y2aASJnKG/FwL8C7s=
=4RjB
-----END PGP SIGNATURE-----

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Shaun Thomas 2012-07-12 13:21:08 Re: Synchronous Standalone Master Redoux
Previous Message Dimitri Fontaine 2012-07-12 13:05:51 Re: Synchronous Standalone Master Redoux