Re: Foreign join pushdown vs EvalPlanQual

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: Foreign join pushdown vs EvalPlanQual
Date: 2015-11-05 11:00:20
Message-ID: 563B36C4.7040407@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015/11/04 18:50, Etsuro Fujita wrote:
> On 2015/11/04 17:10, Kouhei Kaigai wrote:
>>> On 2015/10/28 6:04, Robert Haas wrote:
>>>> On Tue, Oct 20, 2015 at 12:39 PM, Etsuro Fujita
>>>> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>>> Sorry, my explanation was not correct. (Needed to take in
>>>>> caffeine.) What
>>>>> I'm concerned about is the following:
>>>>>
>>>>> SELECT * FROM localtab JOIN (ft1 LEFT JOIN ft2 ON ft1.x = ft2.x) ON
>>>>> localtab.id = ft1.id FOR UPDATE OF ft1

>>>>> If an EPQ recheck was invoked
>>>>> due to a concurrent transaction on the remote server that changed
>>>>> only the
>>>>> value x of the ft1 tuple previously retrieved, then we would have to
>>>>> generate a fake ft1/ft2-join tuple with nulls for ft2. (Assume that
>>>>> the ft2
>>>>> tuple previously retrieved was not a null tuple.) However, I'm not
>>>>> sure how
>>>>> we can do that in ForeignRecheck; we can't know for example, which
>>>>> one is
>>>>> outer and which one is inner, without an alternative local join
>>>>> execution
>>>>> plan. Maybe I'm missing something, though.

>>>> I would expect it to issue a new query like: SELECT * FROM ft1 LEFT
>>>> JOIN ft2 WHERE ft1.x = ft2.x AND ft1.tid = $0 AND ft2.tid = $1.

>>> We assume here that ft1 uses late row locking, so I thought the above
>>> SQL should include "FOR UPDATE of ft1". But I still don't think that
>>> that is right; the SQL with "FOR UPDATE of ft1" wouldn't generate the
>>> fake ft1/ft2-join tuple with nulls for ft2, as expected. The reason for
>>> that is that the updated version of the ft1 tuple wouldn't satisfy the
>>> ft1.tid = $0 condition in an EPQ recheck, because the ctid for the
>>> updated version of the ft1 tuple has changed. (IIUC, I think that if we
>>> use a TID scan for ft1, the SQL would generate the expected result,
>>> because I think that the TID condition would be ignored in the EPQ
>>> recheck, but I don't think it's guaranteed to use a TID scan for ft1.)
>>> Maybe I'm missing something, though.

>> It looks to me, we should not use ctid system column to identify remote
>> row when postgres_fdw tries to support late row locking.

>> The "rowid" should not be changed once it is fetched from the remote side
>> until it is actually updated, deleted or locked, for correct
>> identification.
>> If ctid is used for this purpose, it is safe only when remote row is
>> locked
>> when it is fetched - it is exactly early row locking behavior, isn't it?

> In case of SELECT FOR UPDATE, I think we are allowed to use ctid to
> identify target rows for late row locking, but I think the above SQL
> should be changed to something like this:
>
> SELECT * FROM (SELECT * FROM ft1 WHERE ft1.tid = $0 FOR UPDATE) ss1 LEFT
> JOIN (SELECT * FROM ft2 WHERE ft2.tid = $1) ss2 ON ss1.x = ss2.x

I noticed that the modofied SQL was still wrong; ss1 would produce no
tuple, if using eg, a sequential scan for ss1, as discussed above.
Sheesh, where is my brain?

I still think we are allowed to do that, but what is the right SQL for
that? In the current implementation of postgres_fdw, we need not take
into consideration that what was fetched was an updated version of the
tuple rather than the same version previously obtained, since that
always uses at least REPEATABLE READ in the remote session. But
otherwise it would be possible that what was fetched was an updated
version of the tuple, having a different ctid value, which wouldn't
satisfy the condition like "ft1.tid = $0" in ss1 any more.

Best regards,
Etsuro Fujita

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Haribabu Kommi 2015-11-05 11:20:19 Re: [PATCH] Skip ALTER x SET SCHEMA if the schema didn't change
Previous Message Fabien COELHO 2015-11-05 10:34:50 Re: extend pgbench expressions with functions