Re: FDW for PostgreSQL

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FDW for PostgreSQL
Date: 2012-11-21 10:31:58
Message-ID: CADyhKSWxpk4rn+un98NzfoXc=foLPoH9LV5sKjM0NpjvB5qihQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2012/11/21 Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>:
> Thank for the comment!
>
> On Tue, Nov 20, 2012 at 10:23 PM, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp> wrote:
>>
>> I also think the new "use_remote_explain" option is good. It works fine
>> when we try to use this fdw over the network with latency more or less.
>> It seems to me its default is "false", thus, GetForeignRelSize will return
>> the estimated cost according to ANALYZE, instead of remote EXPLAIN.
>> Even though you mention the default behavior was not changed, is it
>> an expected one? My preference is the current one, as is.
>
>
> Oops, I must have focused on only cost factors.
> I too think that using local statistics is better as default behavior,
> because foreign tables would be used for relatively stable tables.
> If target tables are updated often, it would cause problems about
> consistency, unless we provide full-fledged transaction mapping.
>
>>
>> The deparseFuncExpr() still has case handling whether it is explicit case,
>> implicit cast or regular functions. If my previous proposition has no
>> flaw,
>> could you fix up it using regular function invocation manner? In case when
>> remote node has incompatible implicit-cast definition, this logic can make
>> a problem.
>
>
> Sorry, I overlooked this issue. Fixed to use function call notation
> for all of explicit function calls, explicit casts, and implicit casts.
>
>> At InitPostgresFdwOptions(), the source comment says we don't use
>> malloc() here for simplification of code. Hmm. I'm not sure why it is more
>> simple. It seems to me we have no reason why to avoid malloc here, even
>> though libpq options are acquired using malloc().
>
>
> I used "simple" because using palloc avoids null-check and error handling.
> However, many backend code use malloc to allocate memory which lives
> as long as backend process itself, so I fixed.
>
>>
>> Regarding to the regression test.
>
> [snip]
>>
>> I guess this test tries to check a case when remote column has
>> incompatible
>> data type with local side. Please check timestamp_out(). Its output
>> format follows
>> "datestyle" setting of GUC, and it affects OS configuration on initdb
>> time.
>> (Please grep "datestyle" at initdb.c !) I'd like to recommend to use
>> another data type
>> for this regression test to avoid false-positive detection.
>
>
> Good catch. :)
> I fixed the test to use another data type, user defined enum.
>
One other thing I noticed.

At execute_query(), it stores the retrieved rows onto tuplestore of
festate->tuples at once. Doesn't it make problems when remote-
table has very big number of rows?

IIRC, the previous code used cursor feature to fetch a set of rows
to avoid over-consumption of local memory. Do we have some
restriction if we fetch a certain number of rows with FETCH?
It seems to me, we can fetch 1000 rows for example, and tentatively
store them onto the tuplestore within one PG_TRY() block (so, no
need to worry about PQclear() timing), then we can fetch remote
rows again when IterateForeignScan reached end of tuplestore.

Please point out anything if I missed something.

Anyway, I'll check this v4 patch simultaneously.

Thanks,
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Boszormenyi Zoltan 2012-11-21 10:35:00 Re: [PATCH] Make pg_basebackup configure and start standby [Review]
Previous Message Michael Paquier 2012-11-21 09:35:34 Re: logical changeset generation v3