Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Shigeru HANADA <shigeru(dot)hanada(at)gmail(dot)com>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Date: 2015-04-29 06:10:10
Message-ID: CAFjFpRc8GQwLNQago-zmJKdqHDXh+C0XXZCaJ4ihyZJ+dB0_1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 24, 2015 at 3:08 PM, Shigeru HANADA <shigeru(dot)hanada(at)gmail(dot)com>
wrote:

> Hi Ashutosh,
>
> Thanks for the review.
>
> 2015/04/22 19:28、Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> のメール:
> > Tests
> > -------
> > 1.The postgres_fdw test is re/setting enable_mergejoin at various
> places. The goal of these tests seems to be to test the sanity of foreign
> plans generated. So, it might be better to reset enable_mergejoin (and may
> be all of enable_hashjoin, enable_nestloop_join etc.) to false at the
> beginning of the testcase and set them again at the end. That way, we will
> also make sure that foreign plans are chosen irrespective of future planner
> changes.
>
> I have different, rather opposite opinion about it. I disabled other join
> types as least as the tests pass, because I worry oversights come from
> planner changes. I hope to eliminate enable_foo from the test script, by
> improving costing model smarter.
>

Ok, if you can do that, that will be excellent.

>
> > 2. In the patch, I see that the inheritance testcases have been deleted
> from postgres_fdw.sql, is that intentional? I do not see those being
> replaced anywhere else.
>
> It’s accidental removal, I restored the tests about inheritance feature.
>

Thanks.

>
> > 3. We need one test for each join type (or at least for INNER and LEFT
> OUTER) where there are unsafe to push conditions in ON clause along-with
> safe-to-push conditions. For INNER join, the join should get pushed down
> with the safe conditions and for OUTER join it shouldn't be. Same goes for
> WHERE clause, in which case the join will be pushed down but the
> unsafe-to-push conditions will be applied locally.
>
> Currently INNER JOINs with unsafe join conditions are not pushed down, so
> such test is not in the suit. As you say, in theory, INNER JOINs can be
> pushed down even they have push-down-unsafe join conditions, because such
> conditions can be evaluated no local side against rows retrieved without
> those conditions.
>
> > 4. All the tests have ORDER BY, LIMIT in them, so the setref code is
> being exercised. But, something like aggregates would test the setref code
> better. So, we should add at-least one test like select avg(ft1.c1 +
> ft2.c2) from ft1 join ft2 on (ft1.c1 = ft2.c1).
>
> Added an aggregate case, and also added an UNION case for Append.
>

Thanks.

>
> > 5. It will be good to add some test which contain join between few
> foreign and few local tables to see whether we are able to push down the
> largest possible foreign join tree to the foreign server.
> >
>
>
Are you planning to do anything on this point?

>
>
> > Code
> > -------
> > In classifyConditions(), the code is now appending RestrictInfo::clause
> rather than RestrictInfo itself. But the callers of classifyConditions()
> have not changed. Is this change intentional?
>
> Yes, the purpose of the change is to make appendConditions (former name is
> appendWhereClause) can handle JOIN ON clause, list of Expr.
>
> > The functions which consume the lists produced by this function handle
> expressions as well RestrictInfo, so you may not have noticed it. Because
> of this change, we might be missing some optimizations e.g. in function
> postgresGetForeignPlan()
> > 793 if (list_member_ptr(fpinfo->remote_conds, rinfo))
> > 794 remote_conds = lappend(remote_conds, rinfo->clause);
> > 795 else if (list_member_ptr(fpinfo->local_conds, rinfo))
> > 796 local_exprs = lappend(local_exprs, rinfo->clause);
> > 797 else if (is_foreign_expr(root, baserel, rinfo->clause))
> > 798 remote_conds = lappend(remote_conds, rinfo->clause);
> > 799 else
> > 800 local_exprs = lappend(local_exprs, rinfo->clause);
> > Finding a RestrictInfo in remote_conds avoids another call to
> is_foreign_expr(). So with this change, I think we are doing an extra call
> to is_foreign_expr().
> >
>
> Hm, it seems better to revert my change and make appendConditions downcast
> given information into RestrictInfo or Expr according to the node tag.
>

Thanks.

>
> > The function get_jointype_name() returns an empty string for unsupported
> join types. Instead of that it should throw an error, if some code path
> accidentally calls the function with unsupported join type e.g. SEMI_JOIN.
>
> Agreed, fixed.
>
Thanks.

>
> > While deparsing the SQL with rowmarks, the placement of FOR UPDATE/SHARE
> clause in the original query is not being honored, which means that we will
> end up locking the rows which are not part of the join result even when the
> join is pushed to the foreign server. E.g take the following query (it uses
> the tables created in postgres_fdw.sql tests)
> > contrib_regression=# explain verbose select * from ft1 join ft2 on
> (ft1.c1 = ft2.c1) for update of ft1;
> >
> >
> > QUERY PLAN
> >
> >
> >
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> >
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> >
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> >
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> >
> ----------------------------------------------------------------------------------------------------
> > LockRows (cost=100.00..124.66 rows=822 width=426)
> > Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7,
> ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8,
> ft1.*, ft2.*
> > -> Foreign Scan (cost=100.00..116.44 rows=822 width=426)
> > Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7,
> ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8,
> ft1.*,
> > ft2.*
> > Relations: (public.ft1) INNER JOIN (public.ft2)
> > Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, l.a5, l.a6, l.a7,
> l.a8, l.a9, r.a1, r.a2, r.a3, r.a4, r.a5, r.a6, r.a7, r.a8, r.a9 FROM
> (SELECT l.a
> > 10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17, ROW(l.a10, l.a11,
> l.a12, l.a13, l.a14, l.a15, l.a16, l.a17) FROM (SELECT "C 1" a10, c2 a11,
> c3 a12
> > , c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" FOR UPDATE) l)
> l (a1, a2, a3, a4, a5, a6, a7, a8, a9) INNER JOIN (SELECT r.a9, r.a10,
> r.a12,
> > r.a13, r.a14, r.a15, r.a16, r.a17, ROW(r.a9, r.a10, r.a12, r.a13, r.a14,
> r.a15, r.a16, r.a17) FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14,
> c6
> > a15, c7 a16, c8 a17 FROM "S 1"."T 1") r) r (a1, a2, a3, a4, a5, a6, a7,
> a8, a9) ON ((l.a1 = r.a1))
> > (6 rows)
> > It's expected that only the rows which are part of join result will be
> locked by FOR UPDATE clause. The query sent to the foreign server has
> attached the FOR UPDATE clause to the sub-query for table ft1 ("S 1"."T 1"
> on foreign server). As per the postgresql documentation, "When a locking
> clause appears in a sub-SELECT, the rows locked are those returned to the
> outer query by the sub-query.". So it's going to lock all rows from "S
> 1"."T 1", rather than only the rows which are part of join. This is going
> to increase probability of deadlocks, if the join is between a big table
> and small table where big table is being used in many queries and the join
> is going to have only a single row in the result.
> >
>
>
Are you planning to do anything about this point?

>
>
> > Since there is no is_first argument to appendConditions(), we should
> remove corresponding line from the function prologue.
> >
>
> Oops, replaced with the description of prefix.
>
>
> > The name TO_RELATIVE() doesn't convey the full meaning of the macro. May
> be GET_RELATIVE_ATTNO() or something like that.
>
> Fixed.
>

Thanks.

>
> >
> > In postgresGetForeignJoinPaths(), while separating the conditions into
> join quals and other quals,
> > 3014 if (IS_OUTER_JOIN(jointype))
> > 3015 {
> > 3016 extract_actual_join_clauses(joinclauses, &joinclauses,
> &otherclauses);
> > 3017 }
> > 3018 else
> > 3019 {
> > 3020 joinclauses = extract_actual_clauses(joinclauses, false);
> > 3021 otherclauses = NIL;
> > 3022 }
> > we shouldn't differentiate between outer and inner join. For inner join
> the join quals can be treated as other clauses and they will be returned as
> other clauses, which is fine. Also, the following condition
> > 3050 /*
> > 3051 * Other condition for the join must be safe to push down.
> > 3052 */
> > 3053 foreach(lc, otherclauses)
> > 3054 {
> > 3055 Expr *expr = (Expr *) lfirst(lc);
> > 3056
> > 3057 if (!is_foreign_expr(root, joinrel, expr))
> > 3058 {
> > 3059 ereport(DEBUG3, (errmsg("filter contains unsafe
> conditions")));
> > 3060 return;
> > 3061 }
> > 3062 }
> > is unnecessary. I there are filter conditions which are unsafe to push
> down, they can be applied locally after obtaining the join result from the
> foreign server. The join quals are all needed to be safe to push down,
> since they decide which rows will contain NULL inner side in an OUTER join.
>
> I’m not sure that we *shouldn’t* differentiate, but I agree that we *don’t
> need* to differentiate if we are talking about only the result of filtering.
>
> IMO we *should* differentiate inner and outer (or differentiate join
> conditions and filter conditions) because all conditions of typical INNER
> JOINs go into otherclauses because their is_pushed_down flag is on, so such
> joins look like CROSS JOIN + WHERE filter. In the latest patch EXPLAIN
> shows the join combinations of a foreign join scan node with join type, but
> your suggestion makes it looks like this:
>
> fdw=# explain (verbose) select * from pgbench_branches b join
> pgbench_tellers t on t.bid = b.bid;
>
> QUERY PLAN
>
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> ----------------
> Foreign Scan (cost=100.00..101.00 rows=50 width=716)
> Output: b.bid, b.bbalance, b.filler, t.tid, t.bid, t.tbalance, t.filler
> Relations: (public.pgbench_branches b) CROSS JOIN
> (public.pgbench_tellers t)
> Remote SQL: SELECT l.a1, l.a2, l.a3, r.a1, r.a2, r.a3, r.a4 FROM
> (SELECT l.a9, l.a10, l.a11 FROM (SELECT bid a9, bbalance a10, filler a11
> FROM public.pgbench_branches) l)
> l (a1, a2, a3) CROSS JOIN (SELECT r.a9, r.a10, r.a11, r.a12 FROM (SELECT
> tid a9, bid a10, tbalance a11, filler a12 FROM public.pgbench_tellers) r) r
> (a1, a2, a3, a4) WHERE
> ((l.a1 = r.a2))
> (4 rows)
>
> Thoughts?
>

It does hamper readability a bit. But it explicitly shows, how do we want
to treat the join. We can leave this to the committers though.

>
> Regards,
> --
> Shigeru HANADA
> shigeru(dot)hanada(at)gmail(dot)com
>
>
>
>
>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2015-04-29 07:26:52 Faster setup_param_list() in plpgsql
Previous Message Michael Paquier 2015-04-29 04:47:21 Re: [COMMITTERS] pgsql: Add transforms feature