Re: [POC] hash partitioning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: amul sul <sulamul(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] hash partitioning
Date: 2017-05-19 17:05:50
Message-ID: CA+TgmoabcyyYPe_TRiHyXb+Aa2rQ+zVC9ZHHLASPHmmYbp4sig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 19, 2017 at 5:32 AM, amul sul <sulamul(at)gmail(dot)com> wrote:
> Updated patch attached. 0001-patch rebased against latest head.
> 0002-patch also incorporates code comments and error message changes
> as per Robert's & your suggestions. Thanks !

- if (spec->strategy != PARTITION_STRATEGY_LIST)
- elog(ERROR, "invalid strategy in partition bound spec");
+ Assert(spec->strategy == PARTITION_STRATEGY_LIST);

Let's just drop these hunks. I realize this is a response to a review
comment I made, but I take it back. If the existing code is already
doing it this way, there's no real need to revise it. The patch
doesn't even make it consistent anyway, since elsewhere you elog() for
a similar case. Perhaps elog() is best anyway.

- partitioning methods include range and list, where each partition is
- assigned a range of keys and a list of keys, respectively.
+ partitioning methods include hash, range and list, where each partition is
+ assigned a modulus and remainder of keys, a range of keys and a list of
+ keys, respectively.

I think this sentence has become too long and unwieldy, and is more
unclear than helpful. I'd just write "The currently supported
partitioning methods are list, range, and hash." The use of the word
include is actually wrong here, because it implies that there are more
not mentioned here, which is false.

- expression. If no btree operator class is specified when creating a
- partitioned table, the default btree operator class for the datatype will
- be used. If there is none, an error will be reported.
+ expression. List and range partitioning uses only btree operator class.
+ Hash partitioning uses only hash operator class. If no operator class is
+ specified when creating a partitioned table, the default operator class
+ for the datatype will be used. If there is none, an error will be
+ reported.
+ </para>

I suggest: If no operator class is specified when creating a
partitioned table, the default operator class of the appropriate type
(btree for list and range partitioning, hash for hash partitioning)
will be used. If there is none, an error will be reported.

+ <para>
+ Since hash partitiong operator class, provide only equality,
not ordering,
+ collation is not relevant in hash partition key column. An error will be
+ reported if collation is specified.

partitiong -> partitioning. Also, remove the comma after "operator
class" and change "not relevant in hash partition key column" to "not
relevant for hash partitioning". Also change "if collation is
specified" to "if a collation is specified".

+ Create a hash partitioned table:
+<programlisting>
+CREATE TABLE orders (
+ order_id bigint not null,
+ cust_id bigint not null,
+ status text
+) PARTITION BY HASH (order_id);
+</programlisting></para>

Move this down so it's just above the example of creating partitions.

+ * For range and list partitioned tables, datums is an array of datum-tuples
+ * with key->partnatts datums each.
+ * For hash partitioned tables, it is an array of datum-tuples with 2 datums,
+ * modulus and remainder, corresponding to a given partition.

Second line is very short; reflow as one paragraph.

* In case of range partitioning, it stores one entry per distinct range
* datum, which is the index of the partition for which a given datum
* is an upper bound.
+ * In the case of hash partitioning, the number of the entries in the indexes
+ * array is same as the greatest modulus amongst all partitions. For a given
+ * partition key datum-tuple, the index of the partition which would
accept that
+ * datum-tuple would be given by the entry pointed by remainder produced when
+ * hash value of the datum-tuple is divided by the greatest modulus.

Insert line break before the new text as a paragraph break.

+ char strategy; /* hash, list or range bounds? */

Might be clearer to just write /* hash, list, or range? */ or /*
bounds for hash, list, or range? */

+static uint32 compute_hash_value(PartitionKey key, Datum *values,
bool *isnull);
+

I think there should be a blank line after this but not before it.

I don't really see why hash partitioning needs to touch
partition_bounds_equal() at all. Why can't the existing logic work
for hash partitioning without change?

+ valid_bound = true;

valid_modulus, maybe?

- errmsg("data type %s has no default btree operator class",
- format_type_be(atttype)),
- errhint("You must specify a btree operator
class or define a default btree operator class for the data type.")));
+ errmsg("data type %s has no default %s operator class",
+ format_type_be(atttype), am_method),
+ errhint("You must specify a %s operator
class or define a default %s operator class for the data type.",
+ am_method, am_method)));

Let's use this existing wording from typecmds.c:

errmsg("data type %s has no default operator
class for access method \"%s\"",

and for the hint, maybe: You must specify an operator class or define
a default operator class for the data type. Leave out the %s, in
other words.

+ /*
+ * Hash operator classes provide only equality, not ordering.
+ * Collation, which is relevant for ordering and not for equality, is
+ * irrelevant for hash partitioning.
+ */
+ if (*strategy == PARTITION_STRATEGY_HASH && pelem->collation != NIL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use collation for hash partitioning"),
+ parser_errposition(pstate, pelem->location)));

This error message is not very informative, and it requires
propagating information about the partitioning type into parts of the
code that otherwise don't require it. I was waffling before on
whether to ERROR here; I think now I'm in favor of ignoring the
problem. The collation won't do any harm; it just won't affect the
behavior.

+ * Identify opclass to use. For list and range partitioning we use
+ * only btree operator class, which seems enough for those. For hash
+ * partitioning, we use hash operator class.

Strange wording. Suggest: Identify the appropriate operator class.
For list and range partitioning, we use a btree operator class; hash
partitioning uses a hash operator class.

+ FOR VALUES WITH '(' hash_partbound ')' /*TODO: syntax is
not finalised*/

Remove the comment.

+ foreach (lc, $5)
+ {
+ DefElem *opt = (DefElem *) lfirst(lc);
+
+ if (strcmp(opt->defname, "modulus") == 0)
+ n->modulus = defGetInt32(opt);
+ else if (strcmp(opt->defname, "remainder") == 0)
+ n->remainder = defGetInt32(opt);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized hash
partition bound specification \"%s\"",
+ opt->defname),
+ parser_errposition(opt->location)));
+ }

This logic doesn't complain if the same option is specified more than
once. I suggest adding a check for that, and also pushing this logic
out into a helper function that gets called here instead of including
it inline.

+ errmsg("hash partition modulus must be a positive
integer")));

modulus for hash partition

+ errmsg("hash partition remainder must be a
non-negative integer")));

remainder for hash partition

+ errmsg("hash partition modulus must be greater than remainder")));

modulus for hash partition must be greater than remainder

+-- values are hashed, row may map to different partitions, which result in

the row

+-- regression failure. To avoid this, let's create non-default hash function

create a non-default

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2017-05-19 17:31:11 Re: Preliminary results for proposed new pgindent implementation
Previous Message Cyril Auburtin 2017-05-19 16:50:55 Allowing dash character in LTREE