Re: [POC] hash partitioning

From: amul sul <sulamul(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] hash partitioning
Date: 2017-05-22 08:53:31
Message-ID: CAAJ_b95eMdLuiM2_gAZ=Bz3yoXg+OG1mHs6H4US5eRhVptPwRg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 19, 2017 at 10:35 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, May 19, 2017 at 5:32 AM, amul sul <sulamul(at)gmail(dot)com> wrote:
>> Updated patch attached. 0001-patch rebased against latest head.
>> 0002-patch also incorporates code comments and error message changes
>> as per Robert's & your suggestions. Thanks !
>
> - if (spec->strategy != PARTITION_STRATEGY_LIST)
> - elog(ERROR, "invalid strategy in partition bound spec");
> + Assert(spec->strategy == PARTITION_STRATEGY_LIST);
>
> Let's just drop these hunks. I realize this is a response to a review
> comment I made, but I take it back. If the existing code is already
> doing it this way, there's no real need to revise it. The patch
> doesn't even make it consistent anyway, since elsewhere you elog() for
> a similar case. Perhaps elog() is best anyway.
>
Done.

> - partitioning methods include range and list, where each partition is
> - assigned a range of keys and a list of keys, respectively.
> + partitioning methods include hash, range and list, where each partition is
> + assigned a modulus and remainder of keys, a range of keys and a list of
> + keys, respectively.
>
> I think this sentence has become too long and unwieldy, and is more
> unclear than helpful. I'd just write "The currently supported
> partitioning methods are list, range, and hash." The use of the word
> include is actually wrong here, because it implies that there are more
> not mentioned here, which is false.
>
Done.

> - expression. If no btree operator class is specified when creating a
> - partitioned table, the default btree operator class for the datatype will
> - be used. If there is none, an error will be reported.
> + expression. List and range partitioning uses only btree operator class.
> + Hash partitioning uses only hash operator class. If no operator class is
> + specified when creating a partitioned table, the default operator class
> + for the datatype will be used. If there is none, an error will be
> + reported.
> + </para>
>
> I suggest: If no operator class is specified when creating a
> partitioned table, the default operator class of the appropriate type
> (btree for list and range partitioning, hash for hash partitioning)
> will be used. If there is none, an error will be reported.
>
Done.

> + <para>
> + Since hash partitiong operator class, provide only equality,
> not ordering,
> + collation is not relevant in hash partition key column. An error will be
> + reported if collation is specified.
>
> partitiong -> partitioning. Also, remove the comma after "operator
> class" and change "not relevant in hash partition key column" to "not
> relevant for hash partitioning". Also change "if collation is
> specified" to "if a collation is specified".
>
Done.

> + Create a hash partitioned table:
> +<programlisting>
> +CREATE TABLE orders (
> + order_id bigint not null,
> + cust_id bigint not null,
> + status text
> +) PARTITION BY HASH (order_id);
> +</programlisting></para>
>
> Move this down so it's just above the example of creating partitions.
>
Done.

> + * For range and list partitioned tables, datums is an array of datum-tuples
> + * with key->partnatts datums each.
> + * For hash partitioned tables, it is an array of datum-tuples with 2 datums,
> + * modulus and remainder, corresponding to a given partition.
>
> Second line is very short; reflow as one paragraph.
>
Done

> * In case of range partitioning, it stores one entry per distinct range
> * datum, which is the index of the partition for which a given datum
> * is an upper bound.
> + * In the case of hash partitioning, the number of the entries in the indexes
> + * array is same as the greatest modulus amongst all partitions. For a given
> + * partition key datum-tuple, the index of the partition which would
> accept that
> + * datum-tuple would be given by the entry pointed by remainder produced when
> + * hash value of the datum-tuple is divided by the greatest modulus.
>
> Insert line break before the new text as a paragraph break.

Will wait for more inputs on Ashutosh's explanation upthread.

>
> + char strategy; /* hash, list or range bounds? */
>
> Might be clearer to just write /* hash, list, or range? */ or /*
> bounds for hash, list, or range? */
>

Done, used "hash, list, or range?"

>
> +static uint32 compute_hash_value(PartitionKey key, Datum *values,
> bool *isnull);
> +
>
> I think there should be a blank line after this but not before it.
>

Done.

> I don't really see why hash partitioning needs to touch
> partition_bounds_equal() at all. Why can't the existing logic work
> for hash partitioning without change?
>

Unlike list and range partition, ndatums does not represents size of
the indexes array, also dimension of datums array in not the same as
a key->partnatts.

> + valid_bound = true;
>
> valid_modulus, maybe?
>

Sure, added.

> - errmsg("data type %s has no default btree operator class",
> - format_type_be(atttype)),
> - errhint("You must specify a btree operator
> class or define a default btree operator class for the data type.")));
> + errmsg("data type %s has no default %s operator class",
> + format_type_be(atttype), am_method),
> + errhint("You must specify a %s operator
> class or define a default %s operator class for the data type.",
> + am_method, am_method)));
>
> Let's use this existing wording from typecmds.c:
>
> errmsg("data type %s has no default operator
> class for access method \"%s\"",
>
> and for the hint, maybe: You must specify an operator class or define
> a default operator class for the data type. Leave out the %s, in
> other words.
>

Done.

> + /*
> + * Hash operator classes provide only equality, not ordering.
> + * Collation, which is relevant for ordering and not for equality, is
> + * irrelevant for hash partitioning.
> + */
> + if (*strategy == PARTITION_STRATEGY_HASH && pelem->collation != NIL)
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot use collation for hash partitioning"),
> + parser_errposition(pstate, pelem->location)));
>
> This error message is not very informative, and it requires
> propagating information about the partitioning type into parts of the
> code that otherwise don't require it. I was waffling before on
> whether to ERROR here; I think now I'm in favor of ignoring the
> problem. The collation won't do any harm; it just won't affect the
> behavior.
>

Removed.

> + * Identify opclass to use. For list and range partitioning we use
> + * only btree operator class, which seems enough for those. For hash
> + * partitioning, we use hash operator class.
>
> Strange wording. Suggest: Identify the appropriate operator class.
> For list and range partitioning, we use a btree operator class; hash
> partitioning uses a hash operator class.
>

Done

> + FOR VALUES WITH '(' hash_partbound ')' /*TODO: syntax is
> not finalised*/
>
> Remove the comment.
>

Done.

> + foreach (lc, $5)
> + {
> + DefElem *opt = (DefElem *) lfirst(lc);
> +
> + if (strcmp(opt->defname, "modulus") == 0)
> + n->modulus = defGetInt32(opt);
> + else if (strcmp(opt->defname, "remainder") == 0)
> + n->remainder = defGetInt32(opt);
> + else
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("unrecognized hash
> partition bound specification \"%s\"",
> + opt->defname),
> + parser_errposition(opt->location)));
> + }
>
> This logic doesn't complain if the same option is specified more than
> once. I suggest adding a check for that, and also pushing this logic
> out into a helper function that gets called here instead of including
> it inline.
>

Added duplicate error.
About separate helper function, can't we have as it is, because, imo,
we might not going to use that elsewhere?

> + errmsg("hash partition modulus must be a positive
> integer")));
>
> modulus for hash partition
>
> + errmsg("hash partition remainder must be a
> non-negative integer")));
>
> remainder for hash partition
>
> + errmsg("hash partition modulus must be greater than remainder")));
>
> modulus for hash partition must be greater than remainder
>

Done. Similar changes in gram.y as well.

> +-- values are hashed, row may map to different partitions, which result in
>
> the row
>
> +-- regression failure. To avoid this, let's create non-default hash function
>
> create a non-default

Done.

Updated patch attached. Thanks a lot for review.

Regards,
Amul

Attachment Content-Type Size
0001-Cleanup_v4.patch application/octet-stream 4.0 KB
0002-hash-partitioning_another_design-v11.patch application/octet-stream 82.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tushar 2017-05-22 09:09:04 "create publication..all tables" ignore 'partition not supported' error
Previous Message Fabien COELHO 2017-05-22 07:40:45 Re: psql - add special variable to reflect the last query status