Re: Partition-wise join for join between (declaratively) partitioned tables

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-09-15 21:23:59
Message-ID: CA+Tgmoa0c4QGS2Hf7izoqoUzb2CTE=kA2Tjn7k9++-ANKnpV=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 15, 2017 at 6:11 AM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> Thanks a lot Robert.
>
> Here are rebased patches.

I didn't get quite as much time to look at these today as I would have
liked, but here's what I've got so far.

Comments on 0001:

- In the RelOptInfo, part_oids is defined in a completely different
part of the structure than nparts, but you can't use it without nparts
because you don't know how long it is. I suggest moving the
definition to just after nparts.

- On the other hand, maybe we should just remove it completely. I
don't see it in any of the subsequent patches. If it's used by the
advanced matching code, let's leave it out of 0001 for now and add it
back after the basic feature is committed.

- Similarly, partsupfunc isn't used by the later patches either. It
seems it could also be removed, at least for now.

- The comment for partexprs isn't very clear about how the lists
inside the array work. My understanding is that the lists have as
many members as the partition key has columns/expressions.

- I'm not entirely sure whether maintaining partexprs and
nullable_partexprs is the right design. If I understand correctly,
whether or not a partexpr is nullable is really a per-RTI property,
not a per-expression property. You could consider something like
"Relids nullable_rels".

Comments on 0002:

- The relationship between deciding to set partition scheme and
related details and the configured value of enable_partition_wise_join
needs some consideration. If the only use of the partition scheme is
partition-wise join, there's no point in setting it even for a baserel
unless enable_partition_wise_join is set -- but if there are other
important uses for that data, such as Amit's partition pruning work,
then we might want to always set it. And similarly for a join: if the
details are only needed in the partition-wise join case, then we only
need to set them in that case, but if there are other uses, then it's
different. If it turns out that setting these details for a baserel
is useful in other cases but that it's only for a joinrel in the
partition-wise join case, then the patch gets it exactly right. But
is that correct? I'm not sure.

- The naming of enable_partition_wise_join might also need some
thought. What happens when we also have partition-wise aggregate?
What about the proposal to strength-reduce MergeAppend to Append --
would that use this infrastructure? I wonder if we out to call this
enable_partition_wise or enable_partition_wise_planning to make it a
bit more general. Then, too, I've never really liked having
partition_wise in the GUC name because it might make someone think
that it makes you partitions have a lot of wisdom. Removing the
underscore might help: partitionwise. Or maybe there is some whole
different name that would be better. If anyone wants to bikeshed,
now's the time.

- It seems to me that build_joinrel_partition_info() could be
simplified a bit. One thing is that list_copy() is perfectly capable
of handling a NIL input, so there's no need to test for that before
calling it.

Comments on 0003:

- Instead of reorganizing add_paths_to_append_rel as you did, could
you just add an RTE_JOIN case to the switch? Not sure if there's some
problem with that idea, but it seems like it might come out nicer.

On the overall patch set:

- I am curious to know how this has been tested. How much of the new
code is covered by the tests in 0007-Partition-wise-join-tests.patch?
How much does coverage improve with
0008-Extra-testcases-for-partition-wise-join-NOT-FOR-COMM.patch? What
code, if any, is not covered by either of those test suites? Could we
do meaningful testing of this with something like Andreas
Seltenreich's sqlsmith?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-09-15 21:26:44 Re: More efficient truncation of pg_stat_activity query strings
Previous Message Thom Brown 2017-09-15 21:05:35 Re: SendRowDescriptionMessage() is slow for queries with a lot of columns