Re: Getting sorted data from foreign server for merge join

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Getting sorted data from foreign server for merge join
Date: 2015-11-17 13:33:53
Message-ID: CAFjFpRfiU43-w+pKf2SPsE6Z+iSi570ruUM9WSKtJy6Fb9HyHg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Robert for your comments. Please see my replies inlined. Updated
patch is attached.

On Fri, Nov 6, 2015 at 10:02 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>
> I think this approach is generally reasonable, but I suggested parts
> of it, so may be biased. I would be interested in hearing the
> opinions of others.
>
> Random notes:
>
> "possibily" is a typo.
>

Fixed.

>
> usable_pklist is confusing because it seems like it might be talking
> about primary keys rather than pathkeys. Also, I realize now, looking
> at this again, that you're saying "usable" when what I really think
> you mean is "useful". Lots of pathkeys are usable, but only a few of
> those are useful. I suggest renaming usable_pathkeys to
> query_pathkeys

Done.

> and usable_pklist to useful_pathkeys.

pathkeys is being used to mean a list of pathkey nodes. What we want here
is list of pathkeys so I have renamed it as useful_pathkeys_list, somewhat
long but clear.

> Similarly, let's
> rename generate_pathkeys_for_relation() to
> get_useful_pathkeys_for_relation().
>

Done.

>
> Although I'm usually on the side of marking things as extern whenever
> we find it convenient, I'm nervous about doing that to
> make_canonical_pathkey(), because it has side effects. Searching the
> list of canonical pathkeys for the one we need is reasonable, but is
> it really right to ever think that we might create a new one at this
> stage? Maybe it is, but if so I'd like to hear a good explanation as
> to why.
>

For a foreign table no pathkeys are created before creating paths for
individual foreign table since there are no indexes. For a regular table,
the pathkeys get created for useful indexes. Exception to this is if the
expression appears in the ORDER BY clause, the pathkey for the same is
created while handling ORDER BY clause. So, we will always need to create
pathkey for a foreign table unless the corresponding expression does not
appear in the ORDER BY clause. This can be verified by breaking in
make_canonical_pathkey() while executing "explain verbose select * from ft1
join lt using (val);" ft1(val int, val2 int) is foreign table and lt (val
int, val2 int) is local table. You will hit the first breakpoint with stack
trace
Breakpoint 1, make_canonical_pathkey (root=0x231d858, eclass=0x231f030,
opfamily=1976, strategy=1, nulls_first=0 '\000') at pathkeys.c:60
(gdb) where
#0 make_canonical_pathkey (root=0x231d858, eclass=0x231f030,
opfamily=1976, strategy=1, nulls_first=0 '\000') at pathkeys.c:60
#1 0x00007f6740077e39 in get_useful_pathkeys_for_relation (root=0x231d858,
rel=0x231e390) at postgres_fdw.c:663
#2 0x00007f6740077f34 in postgresGetForeignPaths (root=0x231d858,
baserel=0x231e390, foreigntableid=16393) at postgres_fdw.c:705
#3 0x00000000007079cf in set_foreign_pathlist (root=0x231d858,
rel=0x231e390, rte=0x231c488) at allpaths.c:600
#4 0x0000000000707653 in set_rel_pathlist (root=0x231d858, rel=0x231e390,
rti=1, rte=0x231c488) at allpaths.c:395
#5 0x000000000070735f in set_base_rel_pathlists (root=0x231d858) at
allpaths.c:277

at this point

(gdb) p root->canon_pathkeys
$1 = (List *) 0x0

so get_useful_pathkeys_for_relation->make_canonical_pathkey() will create
the first pathkey.

I have left the corresponding TODO untouched, if anybody else wants to
review that part of the code. I will remove it in the final version of the
patch.

>
> Is the comment "Equivalence classes covering relations other than the
> current one are of interest here" missing a "not"?
>

The sentence is correct. We need equivalence class covering relations other
than the current one, because only such equivalence classes indicate joins
between two relations. If an equivalence class covers only a single baserel
(has only a single member in ec_relids), it indicates equivalence between
columns/expressions of the same table, which can not result in a join. I
have changed to comment to be more elaborate.

>
> I don't find this comment illuminating:
>
> + * In case of child relation, we need to check that the
> + * equivalence class indicates a join to a relation other than
> + * parents, other children and itself (something similar to
> above).
> + * Otherwise we will end up creating useless paths. The code
> below is
> + * similar to generate_implied_equalities_for_column(), which
> might
> + * give a hint.
>
> That basically just says that we have to do it this way because the
> other way would be wrong. But it doesn't say WHY the other way would
> be wrong.

There's no "other way" which is wrong. What's the other way you are talking
about?

Anway, I have updated the comment to be more readable.

> Then a few lines later, you have another comment which says
> the same thing again:
>
> + /*
> + * Ignore equivalence members which correspond to children
> + * or same relation or to parent relations
> + */
>
>
Updated this too.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
pg_sort_all_pd_v2.patch text/x-diff 19.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2015-11-17 13:41:28 Re: Speed up Clog Access by increasing CLOG buffers
Previous Message Simon Riggs 2015-11-17 13:00:54 Re: Speed up Clog Access by increasing CLOG buffers