Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

From: Shigeru HANADA <shigeru(dot)hanada(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Date: 2015-04-27 09:05:44
Message-ID: 0E949CB1-4038-48FE-A815-5BB9AE5C1828@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Ashutosh,

Thanks for the review.

2015/04/22 19:28、Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> のメール:
> Tests
> -------
> 1.The postgres_fdw test is re/setting enable_mergejoin at various places. The goal of these tests seems to be to test the sanity of foreign plans generated. So, it might be better to reset enable_mergejoin (and may be all of enable_hashjoin, enable_nestloop_join etc.) to false at the beginning of the testcase and set them again at the end. That way, we will also make sure that foreign plans are chosen irrespective of future planner changes.

I have different, rather opposite opinion about it. I disabled other join types as least as the tests pass, because I worry oversights come from planner changes. I hope to eliminate enable_foo from the test script, by improving costing model smarter.

> 2. In the patch, I see that the inheritance testcases have been deleted from postgres_fdw.sql, is that intentional? I do not see those being replaced anywhere else.

It’s accidental removal, I restored the tests about inheritance feature.

> 3. We need one test for each join type (or at least for INNER and LEFT OUTER) where there are unsafe to push conditions in ON clause along-with safe-to-push conditions. For INNER join, the join should get pushed down with the safe conditions and for OUTER join it shouldn't be. Same goes for WHERE clause, in which case the join will be pushed down but the unsafe-to-push conditions will be applied locally.

Currently INNER JOINs with unsafe join conditions are not pushed down, so such test is not in the suit. As you say, in theory, INNER JOINs can be pushed down even they have push-down-unsafe join conditions, because such conditions can be evaluated no local side against rows retrieved without those conditions.

> 4. All the tests have ORDER BY, LIMIT in them, so the setref code is being exercised. But, something like aggregates would test the setref code better. So, we should add at-least one test like select avg(ft1.c1 + ft2.c2) from ft1 join ft2 on (ft1.c1 = ft2.c1).

Added an aggregate case, and also added an UNION case for Append.

> 5. It will be good to add some test which contain join between few foreign and few local tables to see whether we are able to push down the largest possible foreign join tree to the foreign server.
>

> Code
> -------
> In classifyConditions(), the code is now appending RestrictInfo::clause rather than RestrictInfo itself. But the callers of classifyConditions() have not changed. Is this change intentional?

Yes, the purpose of the change is to make appendConditions (former name is appendWhereClause) can handle JOIN ON clause, list of Expr.

> The functions which consume the lists produced by this function handle expressions as well RestrictInfo, so you may not have noticed it. Because of this change, we might be missing some optimizations e.g. in function postgresGetForeignPlan()
> 793 if (list_member_ptr(fpinfo->remote_conds, rinfo))
> 794 remote_conds = lappend(remote_conds, rinfo->clause);
> 795 else if (list_member_ptr(fpinfo->local_conds, rinfo))
> 796 local_exprs = lappend(local_exprs, rinfo->clause);
> 797 else if (is_foreign_expr(root, baserel, rinfo->clause))
> 798 remote_conds = lappend(remote_conds, rinfo->clause);
> 799 else
> 800 local_exprs = lappend(local_exprs, rinfo->clause);
> Finding a RestrictInfo in remote_conds avoids another call to is_foreign_expr(). So with this change, I think we are doing an extra call to is_foreign_expr().
>

Hm, it seems better to revert my change and make appendConditions downcast given information into RestrictInfo or Expr according to the node tag.

> The function get_jointype_name() returns an empty string for unsupported join types. Instead of that it should throw an error, if some code path accidentally calls the function with unsupported join type e.g. SEMI_JOIN.

Agreed, fixed.

> While deparsing the SQL with rowmarks, the placement of FOR UPDATE/SHARE clause in the original query is not being honored, which means that we will end up locking the rows which are not part of the join result even when the join is pushed to the foreign server. E.g take the following query (it uses the tables created in postgres_fdw.sql tests)
> contrib_regression=# explain verbose select * from ft1 join ft2 on (ft1.c1 = ft2.c1) for update of ft1;
>
>
> QUERY PLAN
>
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> ----------------------------------------------------------------------------------------------------
> LockRows (cost=100.00..124.66 rows=822 width=426)
> Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft1.*, ft2.*
> -> Foreign Scan (cost=100.00..116.44 rows=822 width=426)
> Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft1.*,
> ft2.*
> Relations: (public.ft1) INNER JOIN (public.ft2)
> Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, l.a5, l.a6, l.a7, l.a8, l.a9, r.a1, r.a2, r.a3, r.a4, r.a5, r.a6, r.a7, r.a8, r.a9 FROM (SELECT l.a
> 10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17, ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17) FROM (SELECT "C 1" a10, c2 a11, c3 a12
> , c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" FOR UPDATE) l) l (a1, a2, a3, a4, a5, a6, a7, a8, a9) INNER JOIN (SELECT r.a9, r.a10, r.a12,
> r.a13, r.a14, r.a15, r.a16, r.a17, ROW(r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17) FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6
> a15, c7 a16, c8 a17 FROM "S 1"."T 1") r) r (a1, a2, a3, a4, a5, a6, a7, a8, a9) ON ((l.a1 = r.a1))
> (6 rows)
> It's expected that only the rows which are part of join result will be locked by FOR UPDATE clause. The query sent to the foreign server has attached the FOR UPDATE clause to the sub-query for table ft1 ("S 1"."T 1" on foreign server). As per the postgresql documentation, "When a locking clause appears in a sub-SELECT, the rows locked are those returned to the outer query by the sub-query.". So it's going to lock all rows from "S 1"."T 1", rather than only the rows which are part of join. This is going to increase probability of deadlocks, if the join is between a big table and small table where big table is being used in many queries and the join is going to have only a single row in the result.
>

> Since there is no is_first argument to appendConditions(), we should remove corresponding line from the function prologue.
>

Oops, replaced with the description of prefix.

> The name TO_RELATIVE() doesn't convey the full meaning of the macro. May be GET_RELATIVE_ATTNO() or something like that.

Fixed.

>
> In postgresGetForeignJoinPaths(), while separating the conditions into join quals and other quals,
> 3014 if (IS_OUTER_JOIN(jointype))
> 3015 {
> 3016 extract_actual_join_clauses(joinclauses, &joinclauses, &otherclauses);
> 3017 }
> 3018 else
> 3019 {
> 3020 joinclauses = extract_actual_clauses(joinclauses, false);
> 3021 otherclauses = NIL;
> 3022 }
> we shouldn't differentiate between outer and inner join. For inner join the join quals can be treated as other clauses and they will be returned as other clauses, which is fine. Also, the following condition
> 3050 /*
> 3051 * Other condition for the join must be safe to push down.
> 3052 */
> 3053 foreach(lc, otherclauses)
> 3054 {
> 3055 Expr *expr = (Expr *) lfirst(lc);
> 3056
> 3057 if (!is_foreign_expr(root, joinrel, expr))
> 3058 {
> 3059 ereport(DEBUG3, (errmsg("filter contains unsafe conditions")));
> 3060 return;
> 3061 }
> 3062 }
> is unnecessary. I there are filter conditions which are unsafe to push down, they can be applied locally after obtaining the join result from the foreign server. The join quals are all needed to be safe to push down, since they decide which rows will contain NULL inner side in an OUTER join.

I’m not sure that we *shouldn’t* differentiate, but I agree that we *don’t need* to differentiate if we are talking about only the result of filtering.

IMO we *should* differentiate inner and outer (or differentiate join conditions and filter conditions) because all conditions of typical INNER JOINs go into otherclauses because their is_pushed_down flag is on, so such joins look like CROSS JOIN + WHERE filter. In the latest patch EXPLAIN shows the join combinations of a foreign join scan node with join type, but your suggestion makes it looks like this:

fdw=# explain (verbose) select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid;
WARNING: restrictlist: ({RESTRICTINFO :clause {OPEXPR :opno 96 :opfuncid 65 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 85} {VAR :varno 2 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 2 :location 77}) :location -1} :is_pushed_down true :outerjoin_delayed false :can_join true :pseudoconstant false :clause_relids (b 1 2) :required_relids (b 1 2) :outer_relids (b) :nullable_relids (b) :left_relids (b 1) :right_relids (b 2) :orclause <> :norm_selec 0.2000 :outer_selec -1.0000 :mergeopfamilies (o 1976) :left_em {EQUIVALENCEMEMBER :em_expr {VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 85} :em_relids (b 1) :em_nullable_relids (b) :em_is_const false :em_is_child false :em_datatype 23} :right_em {EQUIVALENCEMEMBER :em_expr {VAR :varno 2 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 2 :location 77} :em_relids (b 2) :em_nullable_relids (b) :em_is_const false :em_is_child false :em_datatype 23} :outer_is_left false :hashjoinoperator 96})
WARNING: joinclauses: <>
WARNING: otherclauses: ({OPEXPR :opno 96 :opfuncid 65 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 85} {VAR :varno 2 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 2 :location 77}) :location -1})

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------
Foreign Scan (cost=100.00..101.00 rows=50 width=716)
Output: b.bid, b.bbalance, b.filler, t.tid, t.bid, t.tbalance, t.filler
Relations: (public.pgbench_branches b) CROSS JOIN (public.pgbench_tellers t)
Remote SQL: SELECT l.a1, l.a2, l.a3, r.a1, r.a2, r.a3, r.a4 FROM (SELECT l.a9, l.a10, l.a11 FROM (SELECT bid a9, bbalance a10, filler a11 FROM public.pgbench_branches) l)
l (a1, a2, a3) CROSS JOIN (SELECT r.a9, r.a10, r.a11, r.a12 FROM (SELECT tid a9, bid a10, tbalance a11, filler a12 FROM public.pgbench_tellers) r) r (a1, a2, a3, a4) WHERE
((l.a1 = r.a2))
(4 rows)

Thoughts?

Regards,
--
Shigeru HANADA
shigeru(dot)hanada(at)gmail(dot)com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shigeru Hanada 2015-04-27 09:07:09 Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Previous Message Etsuro Fujita 2015-04-27 07:20:33 Re: Missing importing option of postgres_fdw