Re: Foreign join pushdown vs EvalPlanQual

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: Foreign join pushdown vs EvalPlanQual
Date: 2015-11-04 09:50:23
Message-ID: 5639D4DF.5020709@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015/11/04 17:10, Kouhei Kaigai wrote:
>> On 2015/10/28 6:04, Robert Haas wrote:
>>> On Tue, Oct 20, 2015 at 12:39 PM, Etsuro Fujita
>>> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>> Sorry, my explanation was not correct. (Needed to take in caffeine.) What
>>>> I'm concerned about is the following:
>>>>
>>>> SELECT * FROM localtab JOIN (ft1 LEFT JOIN ft2 ON ft1.x = ft2.x) ON
>>>> localtab.id = ft1.id FOR UPDATE OF ft1
>>>>
>>>> LockRows
>>>> -> Nested Loop
>>>> Join Filter: (localtab.id = ft1.id)
>>>> -> Seq Scan on localtab
>>>> -> Foreign Scan on <ft1, ft2>
>>>> Remote SQL: SELECT * FROM ft1 LEFT JOIN ft2 WHERE ft1.x = ft2.x
>>>> FOR UPDATE OF ft1
>>>>
>>>> Assume that ft1 performs late row locking.

>>> If the SQL includes "FOR UPDATE of ft1", then it clearly performs
>>> early row locking. I assume you meant to omit that.

>>>> If an EPQ recheck was invoked
>>>> due to a concurrent transaction on the remote server that changed only the
>>>> value x of the ft1 tuple previously retrieved, then we would have to
>>>> generate a fake ft1/ft2-join tuple with nulls for ft2. (Assume that the ft2
>>>> tuple previously retrieved was not a null tuple.) However, I'm not sure how
>>>> we can do that in ForeignRecheck; we can't know for example, which one is
>>>> outer and which one is inner, without an alternative local join execution
>>>> plan. Maybe I'm missing something, though.

>>> I would expect it to issue a new query like: SELECT * FROM ft1 LEFT
>>> JOIN ft2 WHERE ft1.x = ft2.x AND ft1.tid = $0 AND ft2.tid = $1.

>> We assume here that ft1 uses late row locking, so I thought the above
>> SQL should include "FOR UPDATE of ft1". But I still don't think that
>> that is right; the SQL with "FOR UPDATE of ft1" wouldn't generate the
>> fake ft1/ft2-join tuple with nulls for ft2, as expected. The reason for
>> that is that the updated version of the ft1 tuple wouldn't satisfy the
>> ft1.tid = $0 condition in an EPQ recheck, because the ctid for the
>> updated version of the ft1 tuple has changed. (IIUC, I think that if we
>> use a TID scan for ft1, the SQL would generate the expected result,
>> because I think that the TID condition would be ignored in the EPQ
>> recheck, but I don't think it's guaranteed to use a TID scan for ft1.)
>> Maybe I'm missing something, though.

> It looks to me, we should not use ctid system column to identify remote
> row when postgres_fdw tries to support late row locking.
>
> The documentation says:
> http://www.postgresql.org/docs/devel/static/fdw-callbacks.html#FDW-CALLBACKS-UPDATE
>
> UPDATE and DELETE operations are performed against rows previously
> fetched by the table-scanning functions. The FDW may need extra information,
> such as a row ID or the values of primary-key columns, to ensure that it can
> identify the exact row to update or delete
>
> The "rowid" should not be changed once it is fetched from the remote side
> until it is actually updated, deleted or locked, for correct identification.
> If ctid is used for this purpose, it is safe only when remote row is locked
> when it is fetched - it is exactly early row locking behavior, isn't it?

Yeah, we should use early row locking for a target foreign table in
UPDATE/DELETE.

In case of SELECT FOR UPDATE, I think we are allowed to use ctid to
identify target rows for late row locking, but I think the above SQL
should be changed to something like this:

SELECT * FROM (SELECT * FROM ft1 WHERE ft1.tid = $0 FOR UPDATE) ss1 LEFT
JOIN (SELECT * FROM ft2 WHERE ft2.tid = $1) ss2 ON ss1.x = ss2.x

>>> This should be significantly more efficient than fetching the base
>>> rows from each of two tables with two separate queries.

>> Maybe I think we could fix the SQL, so I have to admit that, but I'm
>> just wondering (1) what would happen for the case when ft1 uses late row
>> rocking and ft2 uses early row rocking and (2) that would be still more
>> efficient than re-fetching only the base row from ft1.

> It should be decision by FDW driver. It is not easy to estimate a certain
> FDW driver mixes up early and late locking policy within a same remote join
> query. Do you really want to support such a mysterious implementation?

Yeah, the reason for that is because GetForeignRowMarkType allows that.

> Or, do you expect all the FDW driver is enforced to return a joined tuple
> if remote join case?

No. That wouldn't make sense if at least one component table involved
in a foreign join uses the rowmark type other than ROW_MARK_COPY.

> It is different from my idea; it shall be an extra
> optimization option if FDW can fetch a joined tuple at once, but not always.
> So, if FDW driver does not support this optimal behavior, your driver can
> fetch two base tables then run local alternative join (or something other).

OK, so if we all agree that the joined-tuple optimization is just an
option for the case where all the component tables use ROW_MARK_COPY,
I'd propose to leave that for 9.6.

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2015-11-04 09:59:02 Re: Foreign join pushdown vs EvalPlanQual
Previous Message Antonin Houska 2015-11-04 09:12:21 Re: Bitmap index scans use of filters on available columns