Re: [POC] hash partitioning

From: amul sul <sulamul(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] hash partitioning
Date: 2017-05-17 08:37:44
Message-ID: CAAJ_b97D5sJRZoeRcXkw=Ao+F95-d9b9h+dHEPUAp5UipHQPkA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 17, 2017 at 11:11 AM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> On Wed, May 17, 2017 at 12:04 AM, amul sul <sulamul(at)gmail(dot)com> wrote:
>> On Tue, May 16, 2017 at 10:00 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>>> On Tue, May 16, 2017 at 4:22 PM, amul sul <sulamul(at)gmail(dot)com> wrote:
>>>> v6 patch has bug in partition oid mapping and indexing, fixed in the
>>>> attached version.
>>>>
>>>> Now partition oids will be arranged in the ascending order of hash
>>>> partition bound (i.e. modulus and remainder sorting order)
>>>
>>> Thanks for the update patch. I have some more comments.
>>>
>>> ------------
>>> + if (spec->remainder < 0)
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>>> + errmsg("hash partition remainder must be less than modulus")));
>>>
>>> I think this error message is not correct, you might want to change it
>>> to "hash partition remainder must be non-negative integer"
>>>
>>
>> Fixed in the attached version; used "hash partition remainder must be
>> greater than or equal to 0" instead.
>
> I would suggest "non-zero positive", since that's what we are using in
> the documentation.
>

Understood, Fixed in the attached version.

>>
>>> -------
>>>
>>> + The table is partitioned by specifying remainder and modulus for each
>>> + partition. Each partition holds rows for which the hash value of
>>>
>>> Wouldn't it be better to say "modulus and remainder" instead of
>>> "remainder and modulus" then it will be consistent?
>>>
>>
>> You are correct, fixed in the attached version.
>>
>>> -------
>>> + An <command>UPDATE</> that causes a row to move from one partition to
>>> + another fails, because
>>>
>>> fails, because -> fails because
>>>
>>
>> This hunk is no longer exists in the attached patch, that was mistaken
>> copied, sorry about that.
>>
>>> -------
>>>
>>> Wouldn't it be a good idea to document how to increase the number of
>>> hash partitions, I think we can document it somewhere with an example,
>>> something like Robert explained upthread?
>>>
>>> create table foo (a integer, b text) partition by hash (a);
>>> create table foo1 partition of foo with (modulus 2, remainder 0);
>>> create table foo2 partition of foo with (modulus 2, remainder 1);
>>>
>>> You can detach foo1, create two new partitions with modulus 4 and
>>> remainders 0 and 2, and move the data over from the old partition
>>>
>>> I think it will be good information for a user to have? or it's
>>> already documented and I missed it?
>>>
>
> This is already part of documentation contained in the patch.
>
> Here are some more comments
> @@ -3296,6 +3311,14 @@ ALTER TABLE measurement ATTACH PARTITION
> measurement_y2008m02
> not the partitioned table.
> </para>
> </listitem>
> +
> + <listitem>
> + <para>
> + An <command>UPDATE</> that causes a row to move from one partition to
> + another fails, because the new value of the row fails to satisfy the
> + implicit partition constraint of the original partition.
> + </para>
> + </listitem>
> </itemizedlist>
> </para>
> </sect3>
> The description in this chunk is applicable to all the kinds of partitioning.
> Why should it be part of a patch implementing hash partitioning?
>

This was already addressed in the previous patch(v8).

> + Declarative partitioning only supports hash, list and range
> + partitioning, whereas table inheritance allows data to be
> + divided in a manner of the user's choosing. (Note, however,
> + that if constraint exclusion is unable to prune partitions
> + effectively, query performance will be very poor.)
> Looks like the line width is less than 80 characters.
>

Fixed in the attached version.

> In partition_bounds_equal(), please add comments explaining why is it safe to
> check just the indexes? May be we should add code under assertion to make sure
> that the datums are equal as well.

Added assert in the attached version.

> The comment could be something
> like, "If two partitioned tables have different greatest moduli, their
> partition schemes don't match. If they have same greatest moduli, and
> all remainders have different indexes, they all have same modulus
> specified and the partitions are ordered by remainders, thus indexes
> array will be an identity i.e. index[i] = i. If the partition
> corresponding to a given remainder exists, it will have same index
> entry for both partitioned tables or if it's missing it will be -1.
> Thus if indexes array matches, corresponding datums array matches. If
> there are multiple remainders corresponding to a given partition,
> their partitions are ordered by the lowest of the remainders, thus if
> indexes array matches, both of the tables have same indexes arrays, in
> both the tables remainders corresponding to multiple partitions all
> have same indexes and thus same modulus. Thus again if the indexes are
> same, datums are same.".
>

Thanks, added with minor modification.

> In the same function
> if (key->strategy == PARTITION_STRATEGY_HASH)
> {
> int greatest_modulus;
>
> /*
> * Compare greatest modulus of hash partition bound which
> * is the last element of datums array.
> */
> if (b1->datums[b1->ndatums - 1][0] != b2->datums[b2->ndatums - 1][0])
> return false;
>
> /* Compare indexes */
> greatest_modulus = DatumGetInt32(b1->datums[b1->ndatums - 1][0]);
> for (i = 0; i < greatest_modulus; i++)
> if (b1->indexes[i] != b2->indexes[i])
> return false;
> }
> if we return true from where this block ends, we will save one indenation level
> for rest of the code and also FWIW extra diffs in this patch because of this
> indentation change.
>

I still do believe having this code in the IF - ELSE block will be
better for longterm, rather having code clutter to avoid diff that
unpleasant for now.

> + /*
> + * Hash operator classes provide only equality, not ordering.
> + * Collation, which is relevant for ordering and not equality is
> + * irrelevant for hash partitioning.
> + */
> A comma is missing after "equality", and may be we need "for" before
> "equality".
> * Collation, which is relevant for ordering and not equality, is
>
> + * we use hash operator class. */
> */ should be on new line.
>

Fixed.

Regards,
Amul

Attachment Content-Type Size
0002-hash-partitioning_another_design-v9.patch application/octet-stream 82.3 KB
0001-Cleanup_v2.patch application/octet-stream 4.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ildus Kurbangaliev 2017-05-17 08:54:37 Re: Bug in ExecModifyTable function and trigger issues for foreign tables
Previous Message Thomas Munro 2017-05-17 07:42:03 Re: transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)