Re: [HACKERS] path toward faster partition pruning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] path toward faster partition pruning
Date: 2018-02-23 14:46:11
Message-ID: CA+TgmoZe_OXzE1JK07ChhGB+V+2ui_YWWD0NJTw=hesQZYzMEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 21, 2018 at 5:44 AM, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Please find attached updated patches.

Committed 0001 and 0002.

I'm having some difficulty wrapping my head around 0003 because it has
minimal comments and no useful commit message. I think, though, that
it's actually broken. Pre-patch, the logic in find_partition_scheme
compares partopfamily, partopcintype, and parttypcoll and then asserts
equality for parttyplen and parttypbyval; not coincidentally,
PartitionSchemeData describes the latter two fields only as "cached
data", so that the segregation of fields in PartitionSchemeData into
two groups exactly matches what find_partition_scheme is actually
doing. However, with the patch, it turns into a sort of hodgepodge.
parttypid is added into the "cached" section of PartitionSchemeData
and partcollation to the primary section, but both values are
compared, not asserted; parttypcoll moves from the "compared" section
to the "asserted" section but the declaration in PartitionSchemeData
stays where it was.

Moreover, there's no explanation of why this is getting changed.
There's an existing comment that explains the motivation for what the
code does today, which the patch does not modify:

* We store the opclass-declared input data types instead of the partition key
* datatypes since the former rather than the latter are used to compare
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.

Obviously, this raises the issue of whether changing this is really
the right thing to do in the first place, but at any rate it's
certainly necessary for the comments to match what the code actually
does.

Also, I find this not very helpful:

+ * The collation of the partition key can differ from the collation of the
+ * underlying column, so we must store this separately.

If the comments about parttypcol and partcollation were clear enough
(and I think they could use some work to distinguish them better),
then this would be pretty much unnecessary -- clearly the only reason
to store two things is if they might be different from each other.

It might be more useful to somehow explain how parttypid and
partsupfunc are intended to be work/be used, but actually I don't
think any satisfactory explanation is possible. Either we have one
partition scheme per partopcintype -- in which case parttypid is
ill-defined because it could vary among relations with the same
PartitionScheme -- or we have on per parttypid -- in which case,
without some other change, partition-wise join will stop working
between relations with different parttypids but the same
partopcintype.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-02-23 14:49:24 Re: Online enabling of checksums
Previous Message Robert Haas 2018-02-23 14:05:12 Re: [HACKERS] advanced partition matching algorithm for partition-wise join