Skip site navigation (1) Skip section navigation (2)

Re: pgsql_fdw, FDW for PostgreSQL server

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql_fdw, FDW for PostgreSQL server
Date: 2012-02-05 20:08:10
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
2012年2月1日12:15 Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>:
> (2012/01/30 4:39), Kohei KaiGai wrote:
>> I checked the "fdw_helper_funcs_v3.patch", "pgsql_fdw_v5.patch" and
>> "pgsql_fdw_pushdown_v1.patch". My comments are below.
> Thanks for the review!
>> [BUG]
>> Even though pgsql_fdw tries to push-down qualifiers being executable
>> on the remove side at the deparseSql(), it does not remove qualifiers
>> being pushed down from the baserel->baserestrictinfo, thus, these
>> qualifiers are eventually executed twice.
>> See the result of this EXPLAIN.
>>    postgres=# EXPLAIN SELECT * FROM ft1 WHERE a>  2 AND f_leak(b);
>>                                                  QUERY PLAN
>>    ------------------------------------------------------------------------------------------------------
>>     Foreign Scan on ft1  (cost=107.43..122.55 rows=410 width=36)
>>       Filter: (f_leak(b) AND (a>  2))
>>       Remote SQL: DECLARE pgsql_fdw_cursor_0 SCROLL CURSOR FOR SELECT
>> a, b FROM public.t1 WHERE (a>  2)
>>    (3 rows)
>> My expectation is (a>  2) being executed on the remote-side and f_leak(b)
>> being executed on the local-side. But, filter of foreign-scan on ft1 has both
>> of qualifiers. It has to be removed, if a RestrictInfo get pushed-down.
> It's intentional that pgsql_fdw keeps pushed-down qualifier in
> baserestrictinfo, because I saw some undesirable behavior when I
> implemented so that with such optimization when plan is reused, but it's
> not clear to me now.  I'll try to recall what I saw...
> BTW, I think evaluating pushed-down qualifiers again on local side is
> safe and has no semantic problem, though we must pay little for such
> overhead.  Do you have concern about performance?
Yes. In my opinion, one significant benefit of pgsql_fdw is to execute
qualifiers on the distributed nodes; that enables to utilize multiple
CPU resources efficiently.
Duplicate checks are reliable way to keep invisible tuples being filtered
out, indeed. But it shall degrade one competitive characteristics of the
In my module, qualifiers being executable on device side are detached
from the baserel->baserestrictinfo, and remaining qualifiers are chained
to the list.
The is_device_executable_qual() is equivalent to is_foreign_expr() in
the pgsql_fdw module.

Of course, it is your decision, and I might miss something.

BTW, what is the undesirable behavior on your previous implementation?

>> [Design comment]
>> I'm not sure the reason why store_result() uses MessageContext to save
>> the Tuplestorestate within PgsqlFdwExecutionState.
>> The source code comment says it is used to avoid memory leaks in error
>> cases. I also have a similar experience on implementation of my fdw module,
>> so, I could understand per-scan context is already cleared at the timing of
>> resource-release-callback, thus, handlers to external resource have to be
>> saved on separated memory context.
>> In my personal opinion, the right design is to declare a memory context for
>> pgsql_fdw itself, instead of the abuse of existing memory context.
>> (More wise design is to define sub-memory-context for each foreign-scan,
>> then, remove the sub-memory-context after release handlers.)
> I simply chose built-in context which has enough lifespan, but now I
> think that using MessageContext directly is not recommended way.  As you
> say, creating new context as child of MessageContext for each scan in
> BeginForeignScan (or first IterateForeignScan) would be better.  Please
> see attached patch.
> One other option is getting rid of tuplestore by holding result rows as
> PGresult, and track it for error cases which might happen.
> ResourceOwner callback can be used to release PGresult on error, similar
> to PGconn.
If we could have set of results on per-query memory context (thus,
no need to explicit release on error timing), it is more ideal design.
It it possible to implement based on the libpq APIs?

Please note that per-query memory context is already released on
ResourceOwner callback is launched, so, it is unavailable to implement
if libpq requires to release some resource.

>> [Design comment]
>> When "BEGIN" should be issued on the remote-side?
>> The connect_pg_server() is an only chance to issue "BEGIN" command
>> at the remote-side on connection being opened. However, the connection
>> shall be kept unless an error is not raised. Thus, the remote-side will
>> continue to work within a single transaction block, even if local-side iterates
>> a pair of "BEGIN" and "COMMIT".
>> I'd like to suggest to close the transaction block at the timing of either
>> end of the scan, transaction or sub-transaction.
> Indeed, remote transactions should be terminated at some timing.
> Terminating at the end of a scan seems troublesome because a connection
> might be shared by multiple scans in a query.  I'd prefer aborting
> remote transaction at the end of local query.  Please see
> abort_remote_tx in attached patch.
It seems to me abort_remote_tx in ReleaseConnection() is reasonable.
However, isn't it needed to have ABORT in GetConnection() at first time?

>> [Comment to Improve]
>> Also, which transaction isolation level should be specified in this case?
>> An idea is its isolation level is specified according to the current isolation
>> level on the local-side.
>> (Of course, it is your choice if it is not supported right now.)
> Choosing same as local seems better.  Please see start_remote_tx
> function in attached patch.
It seems to me reasonable.

>> [Comment to improve]
>> It seems to me the design of exprFunction is not future-proof, if we add
>> a new node type that contains two or more function calls, because it can
>> return an oid of functions.
>> I think, the right design is to handle individual node-types within the
>> large switch statement at foreign_expr_walker().
>> Of course, it is just my sense.
> You mean that exprFunction should have capability to handle multiple
> Oids for one node, maybe return List<oid> or something?  IMO it's
> overkill at this time.
> Though I'm not sure that it's reasonable, but exprInputCollation too
> seems to not assume that multiple input collation might be stored in one
> node.
I'm still skeptical on the current logic to determine whether qualifier
is available to push down, or not.

For example, it checks oid of operator and oid of function being
invoked on this operator at OpExpr.
However, the purpose of this check is to ensure same result on
execution of qualifier, thus it restricts qualifiers to be pushed down
built-in database objects.
So, isn't it enough to check whether oid of OpExpr is built-in, or not?
(If oid of operator is built-in, its function is also built-in. Right?)

>> [Comment to improve]
>> The pgsql_fdw_handler() allocates FdwRoutine using makeNode(),
>> then it set function-pointers on each fields.
>> Why don't you return a pointer to statically declared FdwRoutine
>> variable being initialized at compile time, like:
>>    static FdwRoutine pgsql_fdw_handler = {
>>        .type               = T_FdwRoutine,
>>        .PlanForeignScan    = pgsqlPlanForeignScan,
>>        .ExplainForeignScan = pgsqlExplainForeignScan,
>>        .BeginForeignScan   = pgsqlBeginForeignScan,
>>        .IterateForeignScan = pgsqlIterateForeignScan,
>>        .ReScanForeignScan  = pgsqlReScanForeignScan,
>>        .EndForeignScan     = pgsqlEndForeignScan,
>>    };
>>    Datum
>>    pgsql_fdw_handler(PG_FUNCTION_ARGS)
>>    {
>>          PG_RETURN_POINTER(&pgsql_fdw_handler);
>>    }
> Fixed to static variable without designated initializers, because it's
> one of C99 feature.  Is there any written policy about using C99
> features in PG development?  I've read Developer's FAQ (wiki) and code
> formatting section of document, but I couldn't find any mention.
OK. It was my wrong suggestion.

>> [Question to implementation]
>> At pgsqlIterateForeignScan(), it applies null-check on festate->tuples
>> and bool-checks on festete->cursor_opened.
>> Do we have a possible scenario that festate->tuples is not null, but
>> festate->cursor_opened, or an opposite combination?
>> If null-check on festate->tuples is enough to detect the first call of
>> the iterate callback, it is not my preference to have redundant flag.
> [ checking... ] No such scenario.  It's undesired remain of obsolete
> support of simple SELECT mode; pgsql_fdw once had two fetching mode;
> simple SELECT mode for small result and CURSOR mode for huge result.
> I've removed cursor_opened from PgsqlFdwExecutionState structure.
OK, I understood the background of this design.

> [Extra change]
> In addition to your comments, I found that some regression tests fail
> because current planner produces different plan tree from expected
> results, and fixed such tests.

KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to


pgsql-hackers by date

Next:From: Simon RiggsDate: 2012-02-05 20:40:09
Subject: Re: 16-bit page checksums for 9.2
Previous:From: Andrew DunstanDate: 2012-02-05 19:56:20
Subject: Re: JSON output functions.

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group