Re: Foreign join pushdown vs EvalPlanQual

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: Foreign join pushdown vs EvalPlanQual
Date: 2015-10-17 00:58:04
Message-ID: CA+Tgmoba0+5G2USTmaW8v33cooDLFm2kZw0V3L6b9w9Q3ZTNDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 16, 2015 at 6:12 PM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
> I think, it is right approach to pretend EPQ doesn't exist if scanrelid==0,
> because what replaced by these ForeignScan/CustomScan node are local join
> node like NestLoop. They don't have its own EPQ slot, but constructs joined-
> tuple based on the underlying scan-tuple originally stored within EPQ slots.

I think you've got that backwards. The fact that they don't have
their own EPQ slot is the problem we need to solve. When an EPQ
recheck happens, we rescan every relation in the query. Each relation
needs to return 0 or 1 tuples. If it returns a tuple, the tuple it
returns must be either the same tuple it previously returned, or an
updated version of that tuple. But "the tuple it previously returned"
does not necessarily mean the tuple it returned most recently. It
means the tuple that it returned which, when passed through the rest
of the plan, contributed to generate the result tuple that is being
rechecked.

Now, if you don't have an EPQ slot, how are you going to do this?
When the EPQ machinery engages, you need to somehow get the tuple you
previously returned stored someplace. And the first time thereafter
that you get called by ExecProcNode, you need to return that tuple,
provided that it still passes the quals. The second time you get
called, and any subsequent times, you need to return an empty slot.
The EPQ slot is well-suited to this task. It's got a TupleTableSlot
to store the tuple you need to return, and it's got a flag indicating
whether you've already returned that tuple. So you're good.

But with Etsuro Fujita's patch, and I think what you have proposed has
been similar, how are you going to do it? The proposal is to call the
recheck method and hope for the best, but what is the recheck method
going to do? Where is it going to get the previously-returned tuple?
How will it know if it has already returned it during the lifetime of
this EPQ check? Offhand, it looks to me like, at least in some
circumstances, you're probably going to return whatever tuple you
returned most recently (which has a good chance of being the right
one, but not necessarily) over and over again. That's not going to
fly.

The bottom line is that a foreign scan that is a pushed-down join is
still a *scan*, and every already-existing scan type has an EPQ slot
*for a reason*. They *need* it in order to deliver the correct
behavior. And foreign scans and custom scans need it to, and for the
same reason.

>> Again, the root of the problem here is that the EPQ machinery provides
>> 1 slot per RTI, and it uses scanrelid to figure out which EPQ slot is
>> applicable for a given scan node. Here, we have scanrelid == 0, so it
>> gets confused. But it's not the case that a pushed-down join has NO
>> scanrelid. It actually has, in effect, *multiple* scanrelids. So we
>> should pick any one of those, say the lowest-numbered one, and use
>> that to decide which EPQ slot to use. The attached patch shows
>> roughly what I have in mind, although this is just crap code to
>> demonstrate the basic idea and doesn't pretend to adjust everything
>> that needs fixing.
>>
> One tricky point of this idea is ExecStoreTuple() in ExecScanFetch(),
> because the EPQ slot picked up by get_proxy_scanrelid() contains
> a tuple of base relation then it tries to put this tuple on the
> TupleTableSlot initialized to save the joined-tuple.
> Of course, recheckMtd is called soon, so callback will be able to
> handle the request correctly. However, it is a bit unnatural to store
> a tuple on incompatible TupleTableSlot.

I think that the TupleTableSlot is incompatible because the dummy
patch I posted only addresses half of the problem. I didn't do
anything about the code that stores stuff into an EPQ slot. If that
were also fixed, then the tuple which the patch retrieves from the
slot would not be incompatible.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-10-17 01:17:13 Re: Foreign join pushdown vs EvalPlanQual
Previous Message Noah Misch 2015-10-17 00:45:44 Re: Parallel Seq Scan