Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Shigeru HANADA <shigeru(dot)hanada(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Thom Brown <thom(at)linux(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Date: 2015-03-25 03:59:28
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8010C6819@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > At this moment, I'm not 100% certain about its logic. Especially, I didn't
> > test SEMI- and ANTI- join cases yet.
> > However, time is money - I want people to check overall design first, rather
> > than detailed debugging. Please tell me if I misunderstood the logic to break
> > down join relations.
>
> With applying your patch, regression tests of “updatable view” failed.
> regression.diff contains some errors like this:
> ! ERROR: could not find RelOptInfo for given relids
>
> Could you check that?
>
It is a bug around the logic to find out two RelOptInfo that can construct
another RelOptInfo of joinrel.
Even though I'm now working to correct the logic, it is not obvious to
identify two relids that satisfy joinrel->relids.
(Yep, law of entropy enhancement...)

On the other hands, we may have a solution that does not need a complicated
reconstruction process. The original concern was, FDW driver may add paths
that will replace entire join subtree by foreign-scan on remote join multiple
times, repeatedly, but these paths shall be identical.

If we put a hook for FDW/CSP on bottom of build_join_rel(), we may be able
to solve the problem more straight-forward and simply way.
Because build_join_rel() finds a cache on root->join_rel_hash then returns
immediately if required joinrelids already has its RelOptInfo, bottom of
this function never called twice on a particular set of joinrelids.
Once FDW/CSP constructs a path that replaces entire join subtree towards
the joinrel just after construction, it shall be kept until cheaper built-in
paths are added (if exists).

This idea has one other positive side-effect. We expect remote-join is cheaper
than local join with two remote scan in most cases. Once a much cheaper path
is added prior to local join consideration, add_path_precheck() breaks path
consideration earlier.

Please comment on.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

> -----Original Message-----
> From: Shigeru HANADA [mailto:shigeru(dot)hanada(at)gmail(dot)com]
> Sent: Tuesday, March 24, 2015 7:36 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Robert Haas; Tom Lane; Ashutosh Bapat; Thom Brown;
> pgsql-hackers(at)postgreSQL(dot)org
> Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
>
> 2015/03/23 9:12、Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> のメール:
>
> > Sorry for my response late. It was not easy to code during business trip.
> >
> > The attached patch adds a hook for FDW/CSP to replace entire join-subtree
> > by a foreign/custom-scan, according to the discussion upthread.
> >
> > GetForeignJoinPaths handler of FDW is simplified as follows:
> > typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
> > RelOptInfo *joinrel);
>
> It’s not a critical issue but I’d like to propose to rename
> add_joinrel_extra_paths() to add_extra_paths_to_joinrel(), because the latter
> would make it more clear that it does extra work in addition to
> add_paths_to_joinrel().
>
> > It takes PlannerInfo and RelOptInfo of the join-relation to be replaced
> > if available. RelOptInfo contains 'relids' bitmap, so FDW driver will be
> > able to know the relations to be involved and construct a remote join query.
> > However, it is not obvious with RelOptInfo to know how relations are joined.
> >
> > The function below will help implement FDW driver that support remote join.
> >
> > List *
> > get_joinrel_broken_down(PlannerInfo *root, RelOptInfo *joinrel,
> > SpecialJoinInfo **p_sjinfo)
> >
> > It returns a list of RelOptInfo to be involved in the relations join that
> > is represented with 'joinrel', and also set a SpecialJoinInfo on the third
> > argument if not inner join.
> > In case of inner join, it returns multiple (more than or equal to 2)
> > relations to be inner-joined. Elsewhere, it returns two relations and
> > a valid SpecialJoinInfo.
>
> As far as I tested, it works fine for SEMI and ANTI.
> # I want dump function of BitmapSet for debugging, as Node has nodeToString()...
>
> > At this moment, I'm not 100% certain about its logic. Especially, I didn't
> > test SEMI- and ANTI- join cases yet.
> > However, time is money - I want people to check overall design first, rather
> > than detailed debugging. Please tell me if I misunderstood the logic to break
> > down join relations.
>
> With applying your patch, regression tests of “updatable view” failed.
> regression.diff contains some errors like this:
> ! ERROR: could not find RelOptInfo for given relids
>
> Could you check that?
>
> —
> Shigeru HANADA

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-03-25 05:18:58 Re: printing table in asciidoc with psql
Previous Message Alvaro Herrera 2015-03-25 03:56:07 Re: parallel mode and parallel contexts