Re: Add semi-join pushdown to postgres_fdw

From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>
Subject: Re: Add semi-join pushdown to postgres_fdw
Date: 2022-12-06 09:28:43
Message-ID: 816fa8b1bc2da09a87484d1ef239a332@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Yuki.

Thanks for looking at this patch.

Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp писал 2022-12-03 06:02:

> question1)
> > + if (jointype == JOIN_SEMI && bms_is_member(var->varno,
> innerrel->relids) && !bms_is_member(var->varno, outerrel->relids))
> It takes time for me to find in what case this condition is true.
> There is cases in which this condition is true for semi-join of two
> baserels
> when running query which joins more than two relations such as
> query2 and query3.
> Running queries such as query2, you maybe want to pushdown of only
> semi-join path of
> joinrel(outerrel) defined by (f_t1 a1 join f_t3 a2 on a1.c1 = a2.c1)
> and baserel(innerrel) f_t3
> because of safety deparse. So you add this condition.
> Becouase of this limitation, your patch can't push down subquery
> expression
> "exists (select null from f_t2 where c1 = a1.c1)" in query3.
> I think, it is one of difficulty points for semi-join pushdown.
> This is my understanding of the intent of this condition and the
> restrictions imposed by this condition.
> Is my understanding right?

IIRC, planner can create semi-join, which targetlist references Vars
from inner join relation. However, it's deparsed as exists and so we
can't reference it from SQL. So, there's this check - if Var is
referenced in semi-join target list, it can't be pushed down.
You can see this if comment out this check.

EXPLAIN (verbose, costs off)
SELECT ft2.*, ft4.* FROM ft2 INNER JOIN
(SELECT * FROM ft4 WHERE EXISTS (
SELECT 1 FROM ft2 WHERE ft2.c2=ft4.c2)) ft4
ON ft2.c2 = ft4.c1
INNER JOIN
(SELECT * FROM ft2 WHERE EXISTS (
SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)) ft21
ON ft2.c2 = ft21.c2
WHERE ft2.c1 > 900
ORDER BY ft2.c1 LIMIT 10;

will fail with
EXPLAIN SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT
NULL FROM "S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2))))

Here you can see that
SELECT * FROM ft2 WHERE EXISTS (
SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)

was transformed to
SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT NULL FROM
"S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2))))

where our exists subquery is referenced from tlist. It's fine for plan
(relations, participating in semi-join, can be referenced in tlist),
but is not going to work with EXISTS subquery.
BTW, there's a comment in joinrel_target_ok(). It tells exactly that -

5535 if (jointype == JOIN_SEMI && bms_is_member(var->varno,
innerrel->relids) && !bms_is_member(var->varno, outerrel->relids))
5536 {
5537 /* We deparse semi-join as exists() subquery, and
so can't deparse references to inner rel in join target list. */
5538 ok = false;
5539 break;
5540 }

Expanded comment.

> question2) In foreign_join_ok
> > * Constructing queries representing ANTI joins is hard, hence
> Is this true? Is it hard to expand your approach to ANTI join
> pushdown?

I haven't tried, so don't know.

> question3) You use variables whose name is "addl_condXXX" in the
> following code.
> > appendStringInfo(addl_conds, "EXISTS (SELECT NULL FROM %s",
> join_sql_i.data);
> Does this naming mean additional literal?
> Is there more complehensive naming, such as "subquery_exprXXX"?

The naming means additional conditions (for WHERE clause, by analogy
with ignore_conds and remote_conds). Not sure if subquery_expr sounds
better, but if you come with better idea, I'm fine with renaming them.

> question4) Although really detail, there is expression making space
> such as
> "ft4.c2 = ft2.c2" and one making no space such as "c1=ftupper.c1".
> Is there reason for this difference? If not, need we use same policy
> for making space?
>

Fixed.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

Attachment Content-Type Size
v3-0001-postgres_fdw-add-support-for-deparsing-semi-joins.patch text/x-diff 49.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2022-12-06 09:48:54 Re: Allow batched insert during cross-partition updates
Previous Message Richard Guo 2022-12-06 09:00:59 A problem about ParamPathInfo for an AppendPath