Re: Optimization for updating foreign tables in Postgres FDW

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimization for updating foreign tables in Postgres FDW
Date: 2014-07-25 07:30:19
Message-ID: 53D2078B.5020603@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2014/07/24 18:30), Shigeru Hanada wrote:
> My coworker Takaaki EITOKU reviewed the patch, and here are some
> comments from him.

Thank you for the review, Eitoku-san!

> 2014-07-08 16:07 GMT+09:00 Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>:
>> In the patch postgresPlanForeignModify has been modified so that if, in
>> addition to the above condition, the followings are satisfied, then the
>> ForeignScan and ModifyTable node will work that way.
>>
>> - There are no local BEFORE/AFTER triggers.
>
> The reason the check ignores INSTEAD OF triggers here is that INSTEAD
> OF trigger would prevent executing UPDATE/DELETE statement against a
> foreign tables at all, right?

I'm not sure that I understand your question correctly, but the reason
for that is because foreign tables cannot have INSTEAD OF triggers.

>> - In UPDATE it's safe to evaluate expressions to assign to the target
>> columns on the remote server.
>
> IIUC, in addition to target expressions, whole WHERE clause should be
> safe to be pushed-down. Though that is obviously required, but
> mentioning that in documents and source comments would help users and
> developers.

OK, I'll add the comments and documentation notes.

(I intentionaly didn't mention that because I think the comment for
postgresPlanForeignModify has already said the equivalent condition, ie,
"there are no local conditions", which means all conditions are executed
remotely.)

> We found that this patch speeds up DELETE case remarkably, as you
> describe above, but we saw only less than 2x speed on UPDATE cases.
> Do you have any numbers of UPDATE cases?

Here is the result for an UPDATE case.

>> On remote side:
>> postgres=# create table t (id serial primary key, inserted timestamp
>> default clock_timestamp(), data text);
>> CREATE TABLE
>> postgres=# insert into t(data) select random() from generate_series(0,
>> 99999);
>> INSERT 0 100000
>> postgres=# vacuum t;
>> VACUUM
>>
>> On local side:
>> postgres=# create foreign table ft (id integer, inserted timestamp, data
>> text) server myserver options (table_name 't');
>> CREATE FOREIGN TABLE

Unpatched:
postgres=# explain analyze verbose update ft set data = 'notsolongtext'
where id < 10000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Update on public.ft (cost=100.00..147.38 rows=650 width=18) (actual
time=1463.944..1463.944 rows=0 loops=1)
Remote SQL: UPDATE public.t SET data = $2 WHERE ctid = $1
-> Foreign Scan on public.ft (cost=100.00..147.38 rows=650
width=18) (actual time=2.069..83.220 rows=9999 loops=1)
Output: id, inserted, 'notsolongtext'::text, ctid
Remote SQL: SELECT id, inserted, ctid FROM public.t WHERE ((id
< 10000)) FOR UPDATE
Planning time: 0.355 ms
Execution time: 1470.032 ms
(7 rows)

Patched:
postgres=# explain analyze verbose update ft set data = 'notsolongtext'
where id < 10000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Update on public.ft (cost=100.00..147.38 rows=650 width=18) (actual
time=0.005..0.005 rows=0 loops=1)
-> Foreign Scan on public.ft (cost=100.00..147.38 rows=650
width=18) (actual time=0.001..0.001 rows=0 loops=1)
Output: id, inserted, data, ctid
Remote SQL: UPDATE public.t SET data = 'notsolongtext'::text
WHERE ((id < 10000))
Planning time: 0.197 ms
Execution time: 61.519 ms
(6 rows)

I think that the precise effect of this optimization for DELETE/UPDATE
would depend on eg, data, queries (inc. w/ or w/o RETRUNING clauses) and
server/network performance. Could you tell me these information about
the UPDATE evaluation?

> Some more random thoughts:
>
> * Naming of new behavior
> You named this optimization "Direct Update", but I'm not sure that
> this is intuitive enough to express this behavior. I would like to
> hear opinions of native speakers.

+1

> * Macros for # of fdw_private elements
> In postgres_fdw.c you defined MaxFdwScanFdwPrivateLength and
> MinFdwScanFdwPrivateLength for determining the mode, but number of
> fdw_private elements is not a ranged value but an enum value (I mean
> that fdw_private holds 3 or 7, not 4~6, so Max and Min seems
> inappropriate for prefix. IMO those macros should be named so that
> the names represent the behavior, or define a macro to determine the
> mode, say IS_SHIPPABLE(foo).

OK, Will fix.

> * Comparison of Macros
> Comparison against MaxFdwScanFdwPrivateLength and
> MinFdwScanFdwPrivateLength should be == instead of >= or <= to detect
> unexpected value. Adding assert macro seems good too.

Will fix this too.

Thanks,

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Albe Laurenz 2014-07-25 07:39:40 Re: Optimization for updating foreign tables in Postgres FDW
Previous Message Kyotaro HORIGUCHI 2014-07-25 07:23:40 Re: WAL replay bugs