RE: Add semi-join pushdown to postgres_fdw

From: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
To: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
Subject: RE: Add semi-join pushdown to postgres_fdw
Date: 2022-12-06 10:25:24
Message-ID: OS3PR01MB6660528FE5C338F76C8AF3F6951B9@OS3PR01MB6660.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Mr.Pyhalov.

Thank you for fixing it and giving more explanation.

> IIRC, planner can create semi-join, which targetlist references Vars
> from inner join relation. However, it's deparsed as exists and so we
> can't reference it from SQL. So, there's this check - if Var is
> referenced in semi-join target list, it can't be pushed down.
> You can see this if comment out this check.
>
> EXPLAIN (verbose, costs off)
> SELECT ft2.*, ft4.* FROM ft2 INNER JOIN
> (SELECT * FROM ft4 WHERE EXISTS (
> SELECT 1 FROM ft2 WHERE ft2.c2=ft4.c2)) ft4
> ON ft2.c2 = ft4.c1
> INNER JOIN
> (SELECT * FROM ft2 WHERE EXISTS (
> SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)) ft21
> ON ft2.c2 = ft21.c2
> WHERE ft2.c1 > 900
> ORDER BY ft2.c1 LIMIT 10;
>
> will fail with
> EXPLAIN SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT
> NULL FROM "S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2))))
>
> Here you can see that
> SELECT * FROM ft2 WHERE EXISTS (
> SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)
>
> was transformed to
> SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT NULL
> FROM "S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2))))
>
> where our exists subquery is referenced from tlist. It's fine for plan
> (relations, participating in semi-join, can be referenced in tlist),
> but is not going to work with EXISTS subquery.
> BTW, there's a comment in joinrel_target_ok(). It tells exactly that -
>
> 5535 if (jointype == JOIN_SEMI &&
> bms_is_member(var->varno,
> innerrel->relids) && !bms_is_member(var->varno, outerrel->relids))
> 5536 {
> 5537 /* We deparse semi-join as exists() subquery, and
> so can't deparse references to inner rel in join target list. */
> 5538 ok = false;
> 5539 break;
> 5540 }
>
> Expanded comment.
Thank you for expanding your comment and giving examples.
Thanks to the above examples, I understood in what case planner wolud create semi-join,
which targetlist references Vars from inner join relation.

> > question2) In foreign_join_ok
> > > * Constructing queries representing ANTI joins is hard, hence
> > Is this true? Is it hard to expand your approach to ANTI join
> > pushdown?
>
> I haven't tried, so don't know.
I understand the situation.

> The naming means additional conditions (for WHERE clause, by analogy
> with ignore_conds and remote_conds). Not sure if subquery_expr sounds
> better, but if you come with better idea, I'm fine with renaming them.
Sure.

> > question4) Although really detail, there is expression making space
> > such as
> > "ft4.c2 = ft2.c2" and one making no space such as "c1=ftupper.c1".
> > Is there reason for this difference? If not, need we use same
> > policy for making space?
Thank you.

Later, I'm going to look at other part of your patch.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-12-06 10:28:55 Re: Force streaming every change in logical decoding
Previous Message Ronan Dunklau 2022-12-06 10:22:20 Re: Fix gin index cost estimation