Re: [HACKERS] path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] path toward faster partition pruning
Date: 2018-02-26 04:10:46
Message-ID: 4db98104-4a48-9768-4f34-499565ae420e@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 2018/02/23 23:46, Robert Haas wrote:
> On Wed, Feb 21, 2018 at 5:44 AM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Please find attached updated patches.
>
> Committed 0001 and 0002.

Thank you for committing and for the review.

> I'm having some difficulty wrapping my head around 0003 because it has
> minimal comments and no useful commit message. I think, though, that
> it's actually broken. Pre-patch, the logic in find_partition_scheme
> compares partopfamily, partopcintype, and parttypcoll and then asserts
> equality for parttyplen and parttypbyval; not coincidentally,
> PartitionSchemeData describes the latter two fields only as "cached
> data", so that the segregation of fields in PartitionSchemeData into
> two groups exactly matches what find_partition_scheme is actually
> doing. However, with the patch, it turns into a sort of hodgepodge.
> parttypid is added into the "cached" section of PartitionSchemeData
> and partcollation to the primary section, but both values are
> compared, not asserted; parttypcoll moves from the "compared" section
> to the "asserted" section but the declaration in PartitionSchemeData
> stays where it was.
>
> Moreover, there's no explanation of why this is getting changed.
> There's an existing comment that explains the motivation for what the
> code does today, which the patch does not modify:
>
> * We store the opclass-declared input data types instead of the partition key
> * datatypes since the former rather than the latter are used to compare
> * partition bounds. Since partition key data types and the opclass declared
> * input data types are expected to be binary compatible (per ResolveOpClass),
> * both of those should have same byval and length properties.
>
> Obviously, this raises the issue of whether changing this is really
> the right thing to do in the first place, but at any rate it's
> certainly necessary for the comments to match what the code actually
> does.
>
> Also, I find this not very helpful:
>
> + * The collation of the partition key can differ from the collation of the
> + * underlying column, so we must store this separately.
>
> If the comments about parttypcol and partcollation were clear enough
> (and I think they could use some work to distinguish them better),
> then this would be pretty much unnecessary -- clearly the only reason
> to store two things is if they might be different from each other.
>
> It might be more useful to somehow explain how parttypid and
> partsupfunc are intended to be work/be used, but actually I don't
> think any satisfactory explanation is possible. Either we have one
> partition scheme per partopcintype -- in which case parttypid is
> ill-defined because it could vary among relations with the same
> PartitionScheme -- or we have on per parttypid -- in which case,
> without some other change, partition-wise join will stop working
> between relations with different parttypids but the same
> partopcintype.

I think I'm convinced that partopcintype OIDs can be used where I thought
parttypid ones were necessary. The pruning patch uses the respective OID
from this array when extracting the datum from an OpExpr to be compared
with the partition bound datums. It's sensible, I now think, to require
the extracted datum to be of partition opclass declared input type, rather
than the type of the partition key involved. So, I removed the parttypid
that I'd added to PartitionSchemeData.

Updated the comments to make clear the distinction between and purpose of
having both parttypcoll and partcollation. Also expanded the comment
about partsupfunc a bit.

Attached updated patches.

Thanks,
Amit

Attachment Content-Type Size
v33-0001-Add-partcollation-and-partsupfunc-to-PartitionSc.patch text/plain 4.4 KB
v33-0002-Faster-partition-pruning.patch text/plain 111.4 KB
v33-0003-Add-only-unpruned-partitioned-child-rels-to-part.patch text/plain 23.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-02-26 04:13:14 Re: [HACKERS] Runtime Partition Pruning
Previous Message Tsunakawa, Takayuki 2018-02-26 04:06:48 RE: [bug fix] Produce a crash dump before main() on Windows