Re: pgsql_fdw, FDW for PostgreSQL server

From: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
To: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql_fdw, FDW for PostgreSQL server
Date: 2012-02-06 08:37:16
Message-ID: 4F2F913C.1080209@gmail.com
Lists: pgsql-hackers

Thanks for the comments!

(2012/02/06 5:08), Kohei KaiGai wrote:
> Yes. In my opinion, one significant benefit of pgsql_fdw is to execute
> qualifiers on the distributed nodes; that makes it possible to utilize
> multiple CPU resources efficiently.
> Duplicate checks are a reliable way to keep invisible tuples filtered
> out, indeed. But they would degrade one of the competitive
> characteristics of pgsql_fdw.
>
> https://github.com/kaigai/pg_strom/blob/master/plan.c#L693
> In my module, qualifiers that are executable on the device side are
> detached from baserel->baserestrictinfo, and the remaining qualifiers
> are left chained to the list.
> The is_device_executable_qual() is equivalent to is_foreign_expr() in
> the pgsql_fdw module.

Agreed. From the viewpoint of performance, I too think that pushed-down
qualifiers should not be evaluated again on the local side.

> Of course, it is your decision, and I might be missing something.
>
> BTW, what was the undesirable behavior in your previous implementation?

In early development, maybe while testing PREPARE/EXECUTE or DECLARE, I
saw that repeated execution of a foreign scan produced wrong results
which included rows that did NOT match the pushed-down qualifiers. At
last I could recall what happened at that time: it was just a trivial
bug I had made. I had removed the pushed-down qualifiers in the Path
generation phase, so the generated plan node lost the qualifiers
permanently.

In short, I'll remove remote qualifiers from baserestrictinfo, as
pg_strom does.
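
Just to illustrate the plan, here is a minimal sketch of that split
(is_foreign_expr() is the shippability test from the patch; the helper
and list names are only illustrative):

#include "postgres.h"
#include "nodes/pg_list.h"
#include "nodes/relation.h"

/* from the patch: true if expr is safe to evaluate on the remote side */
extern bool is_foreign_expr(PlannerInfo *root, RelOptInfo *baserel,
                            Expr *expr);

static void
classify_conditions(PlannerInfo *root, RelOptInfo *baserel,
                    List **remote_conds, List **local_conds)
{
    ListCell   *lc;

    *remote_conds = NIL;
    *local_conds = NIL;

    foreach(lc, baserel->baserestrictinfo)
    {
        RestrictInfo *ri = (RestrictInfo *) lfirst(lc);

        if (is_foreign_expr(root, baserel, ri->clause))
            *remote_conds = lappend(*remote_conds, ri);
        else
            *local_conds = lappend(*local_conds, ri);
    }

    /* keep only locally-checked quals; remote ones go into the query */
    baserel->baserestrictinfo = *local_conds;
}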

>>> [Design comment]
>>> I'm not sure why store_result() uses MessageContext to save the
>>> Tuplestorestate within PgsqlFdwExecutionState.
>>> The source code comment says it is used to avoid memory leaks in error
>>> cases. I had a similar experience implementing my own FDW module, so I
>>> understand that the per-scan context has already been cleared by the
>>> time the resource-release callback runs; thus, handles to external
>>> resources have to be saved in a separate memory context.
>>> In my personal opinion, the right design is to declare a memory
>>> context for pgsql_fdw itself, instead of abusing an existing memory
>>> context.
>>> (A wiser design is to define a sub-memory-context for each foreign
>>> scan, then remove the sub-memory-context after releasing the handles.)
>>
>> I simply chose a built-in context which has a long enough lifespan,
>> but now I think that using MessageContext directly is not a
>> recommended way. As you say, creating a new context as a child of
>> MessageContext for each scan in BeginForeignScan (or the first
>> IterateForeignScan) would be better. Please see the attached patch.
>>
>> One other option is getting rid of the tuplestore by holding result
>> rows as a PGresult, and tracking it for error cases which might
>> happen. A ResourceOwner callback can be used to release the PGresult
>> on error, similar to PGconn.
>>
> If we could have the set of results in a per-query memory context
> (thus, no need for explicit release on error), it would be a more
> ideal design. Is it possible to implement that based on the libpq APIs?

Currently no, so I used a tuplestore even though it requires copying
the results. However, Kyotaro Horiguchi's patch might make it possible.
I'm reading his patch to determine whether it suits pgsql_fdw.

http://archives.postgresql.org/message-id/20120202143057.GA12434@gmail.com
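
For reference, this is roughly what the current tuplestore approach
looks like: a store_result()-style helper creates a per-scan child
context of MessageContext and copies the PGresult rows into a
tuplestore there. The shape and names are illustrative, not the exact
code in the patch:

#include "postgres.h"
#include "funcapi.h"
#include "miscadmin.h"
#include "utils/memutils.h"
#include "utils/tuplestore.h"
#include "libpq-fe.h"

static Tuplestorestate *
store_result(PGresult *res, AttInMetadata *attinmeta)
{
    MemoryContext scan_cxt;
    MemoryContext old_cxt;
    Tuplestorestate *tupstore;
    int         nfields = PQnfields(res);
    char      **values = (char **) palloc(nfields * sizeof(char *));
    int         row;

    /* per-scan context; freed along with MessageContext even on error */
    scan_cxt = AllocSetContextCreate(MessageContext,
                                     "pgsql_fdw per-scan data",
                                     ALLOCSET_DEFAULT_MINSIZE,
                                     ALLOCSET_DEFAULT_INITSIZE,
                                     ALLOCSET_DEFAULT_MAXSIZE);
    old_cxt = MemoryContextSwitchTo(scan_cxt);
    tupstore = tuplestore_begin_heap(false, false, work_mem);

    for (row = 0; row < PQntuples(res); row++)
    {
        HeapTuple   tuple;
        int         col;

        /* every row must be copied out of the PGresult */
        for (col = 0; col < nfields; col++)
            values[col] = PQgetisnull(res, row, col) ? NULL :
                PQgetvalue(res, row, col);
        tuple = BuildTupleFromCStrings(attinmeta, values);
        tuplestore_puttuple(tupstore, tuple);
        heap_freetuple(tuple);
    }

    MemoryContextSwitchTo(old_cxt);
    return tupstore;
}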

> Please note that the per-query memory context has already been
> released when the ResourceOwner callback is launched, so this approach
> is unavailable if libpq requires releasing some resource.

I see. We need to use a context which has a longer lifetime if we want
to track malloc'ed PGresults. I already use CacheMemoryContext for
connection pooling, so linking PGresults to their source connection
would be a solution.
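
What I have in mind is something like this hypothetical sketch: keep
the live PGresults on the connection cache entry (allocated under
CacheMemoryContext) and PQclear them from the resource-release callback
when an error aborts the transaction. The struct and function names are
made up for illustration:

#include "postgres.h"
#include "nodes/pg_list.h"
#include "utils/memutils.h"
#include "libpq-fe.h"

/* one cached connection and the results not yet PQclear'ed */
typedef struct ConnCacheEntry
{
    PGconn     *conn;
    List       *results;
} ConnCacheEntry;

/* remember a result so that the error path can find and release it */
static PGresult *
remember_result(ConnCacheEntry *entry, PGresult *res)
{
    MemoryContext old_cxt = MemoryContextSwitchTo(CacheMemoryContext);

    entry->results = lappend(entry->results, res);
    MemoryContextSwitchTo(old_cxt);
    return res;
}

/* to be called from the ResourceOwner release callback on abort */
static void
release_results(ConnCacheEntry *entry)
{
    ListCell   *lc;

    foreach(lc, entry->results)
        PQclear((PGresult *) lfirst(lc));
    list_free(entry->results);
    entry->results = NIL;
}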

>>> [Design comment]
>>> When should "BEGIN" be issued on the remote side?
>>> The connect_pg_server() is the only chance to issue a "BEGIN" command
>>> at the remote side, when the connection is opened. However, the
>>> connection shall be kept unless an error is raised. Thus, the remote
>>> side will continue to work within a single transaction block, even if
>>> the local side iterates over pairs of "BEGIN" and "COMMIT".
>>> I'd like to suggest closing the transaction block at the end of
>>> either the scan, the transaction, or the sub-transaction.
>>
>> Indeed, remote transactions should be terminated at some point.
>> Terminating at the end of a scan seems troublesome because a
>> connection might be shared by multiple scans in a query. I'd prefer
>> aborting the remote transaction at the end of the local query. Please
>> see abort_remote_tx in the attached patch.
>>
> It seems to me that abort_remote_tx in ReleaseConnection() is
> reasonable. However, isn't an ABORT also needed in GetConnection() the
> first time?

Hm, forcing the overhead of aborting a transaction onto all local
queries is unreasonable. A redundant BEGIN doesn't cause an error but
just generates a WARNING, so I'll remove the abort_remote_tx preceding
begin_remote_tx.
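
So the pair would look roughly like this (matching the function names
in the patch; error handling abbreviated for brevity):

#include "postgres.h"
#include "libpq-fe.h"

/* issued when a connection is handed out for a new local query */
static void
begin_remote_tx(PGconn *conn)
{
    PGresult   *res = PQexec(conn, "BEGIN");

    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        char       *msg = pstrdup(PQerrorMessage(conn));

        PQclear(res);
        ereport(ERROR,
                (errmsg("could not begin remote transaction: %s", msg)));
    }
    PQclear(res);
}

/* issued from ReleaseConnection() at the end of the local query */
static void
abort_remote_tx(PGconn *conn)
{
    PGresult   *res = PQexec(conn, "ABORT");

    /* failures are ignored; we are already cleaning up */
    PQclear(res);
}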

>>> [Comment to improve]
>>> It seems to me the design of exprFunction is not future-proof: if we
>>> add a new node type that contains two or more function calls, it can
>>> still return only one function OID.
>>> I think the right design is to handle individual node types within
>>> the large switch statement in foreign_expr_walker().
>>> Of course, it is just my sense.
>>
>> You mean that exprFunction should be capable of handling multiple
>> OIDs for one node, maybe returning a List of OIDs or something? IMO
>> it's overkill at this time.
>> Though I'm not sure it's a fair comparison, exprInputCollation
>> likewise doesn't seem to assume that multiple input collations might
>> be stored in one node.
>>
> I'm still skeptical about the current logic for determining whether a
> qualifier can be pushed down or not.
>
> For example, at an OpExpr it checks the OID of the operator and the
> OID of the function invoked by that operator.
> However, the purpose of this check is to ensure the same result on
> execution of the qualifier, so it restricts pushed-down qualifiers to
> built-in database objects.
> So, isn't it enough to check whether the OID of the OpExpr's operator
> is built-in or not?
> (If the OID of the operator is built-in, its function is also
> built-in, right?)

Basically yes. If a built-in operator has different semantics on the
remote side (probably because the pg_operator catalog has been changed
manually by a superuser), it can be said that nothing is reliable in
the context of pgsql_fdw. I added the check of the operator's function
OID just in case, and now it seems to me a paranoid concern...
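
The simplified check would then be something like the sketch below;
is_builtin() is an illustrative name, and FirstBootstrapObjectId is the
boundary below which OIDs are assigned during bootstrap:

#include "postgres.h"
#include "access/transam.h"
#include "nodes/primnodes.h"

/* built-in objects have OIDs below the bootstrap boundary */
static bool
is_builtin(Oid oid)
{
    return (oid < FirstBootstrapObjectId);
}

/*
 * Trust an OpExpr iff the operator itself is built-in; a built-in
 * operator implies a built-in underlying function.
 */
static bool
opexpr_is_shippable(OpExpr *op)
{
    return is_builtin(op->opno);
}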

I'll post revised patches soon.

Regards,
--
Shigeru Hanada
