Re: [HACKERS] postgres_fdw bug in 9.6

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] postgres_fdw bug in 9.6
Date: 2018-01-15 08:09:19
Message-ID: 5A5C61AF.40308@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2018/01/13 6:07), Robert Haas wrote:
> On Sun, Dec 3, 2017 at 2:56 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I thought some more about this. While it seems clear that we don't
>> actually need to recheck anything within a postgres_fdw foreign join,
>> there's still a problem: we have to be able to regurgitate the join
>> row during postgresRecheckForeignScan. Since, in the current system
>> design, the only available data is scan-level rowmarks (that is,
>> whole-row values from each base foreign relation), there isn't any
>> very good way to produce the join row except to insert the rowmark
>> values into dummy scan nodes and then re-execute the join locally.
>> So really, that is what the outerPlan infrastructure is doing for us.
>
> I started to look at this patch again today and got cold feet. It
> seems to me that this entire design on the posted patch is based on
> your remarks in http://postgr.es/m/13242.1481582736@sss.pgh.pa.us --
>
> # One way that we could make things better is to rely on the knowledge
> # that EPQ isn't asked to evaluate joins for more than one row per input
> # relation, and therefore using merge or hash join technology is really
> # overkill. We could make a tree of simple nestloop joins, which aren't
> # going to care about sort order, if we could identify the correct join
> # clauses to apply.

That's right.

> However, those remarks are almost entirely wrong. It's true that the
> relations for which we have EPQ tuples will only ever output a single
> tuple, but because of what you said in the quoted text above, this
> does NOT imply that the recheck plan never needs to worry about
> dealing with a large number of rows, as the following example shows.
> (The part about making a tree of simple nestloop joins is wrong not
> only for this reason but because it won't work in the executor, as
> previously discussed.)
>
> rhaas=# explain select * from foo join (bar join baz on bar.a = baz.a)
> on foo.a = bar.a where foo.b = 'hello' for update;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------
> LockRows (cost=109.02..121.46 rows=1 width=162)
> -> Merge Join (cost=109.02..121.45 rows=1 width=162)
> Merge Cond: (bar.a = foo.a)
> -> Foreign Scan (cost=100.58..4914.58 rows=40000 width=119)
> Relations: (public.bar) INNER JOIN (public.baz)
> -> Hash Join (cost=2556.00..4962.00 rows=40000 width=119)
> Hash Cond: (bar.a = baz.a)
> -> Foreign Scan on bar (cost=100.00..1956.00
> rows=40000 width=28)
> -> Hash (cost=1956.00..1956.00 rows=40000 width=27)
> -> Foreign Scan on baz
> (cost=100.00..1956.00 rows=40000 width=27)
> -> Sort (cost=8.44..8.45 rows=1 width=28)
> Sort Key: foo.a
> -> Index Scan using foo_b_idx on foo (cost=0.41..8.43
> rows=1 width=28)
> Index Cond: (b = 'hello'::text)
>
> There's really nothing to prevent the bar/baz join from producing
> arbitrarily many rows, which means that it's not good to turn the
> recheck plan into a Nested Loop unconditionally -- and come to think
> of it, I'm pretty sure this kind of case is why
> GetExistingLocalJoinPath() tries to using an existing path: it wants a
> path that isn't going suck if it gets executed.

Yeah, but I don't think the above example is good enough to explain
that, because I think the bar/baz join would produce at most one tuple
in an EPQ recheck since we would have only one EPQ tuple for both bar
and baz in that recheck, and the join is inner. I think such an example
would probably be given e.g., by a modified version of the SQL where we
have a full join of bar and baz, not an inner join.

> After some thought, it seems that there's a much simpler way that we
> could fix the problem you identified in that original email: if the
> EPQ path isn't properly sorted, have postgres_fdw's
> add_paths_with_pathkeys_for_rel stick a Sort node on top of it. I
> tried this and it does indeed fix Jeff Janes' initial test case.
> Patch attached.

The patch looks good to me.

> One could do better here if one wanted to revise
> GetExistingLocalJoinPath() to take Last *pathkeys as an additional
> argument; we could then try to find the optimal EPQ path for each
> possible set of sortkeys. But I don't feel the need to work on that
> right now, and it would be nicer not to back-patch a change to the
> signature of GetExistingLocalJoinPath(), as has already been noted.

That makes sense to me.

> The only problem that
> *has actually been demonstrated* is that we can end up using as an EPQ
> path something that doesn't generate the right sort order, and that is
> easily fixed by making it do so, which the attached patch does. So I
> propose we commit and back-patch this and move on.

Agreed.

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2018-01-15 08:10:09 Re: [HACKERS] Useless code in ExecInitModifyTable
Previous Message Michael Paquier 2018-01-15 07:25:37 Re: BUG #14999: pg_rewind corrupts control file global/pg_control