Re: [POC] hash partitioning

From: amul sul <sulamul(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Thom Brown <thom(at)linux(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] hash partitioning
Date: 2017-09-28 09:56:48
Message-ID: CAAJ_b97dGjD3ns2DhVMyLFN2EZaEmb22mq9LbEb_3RbeYvA7NQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 28, 2017 at 11:24 AM, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> On 2017/09/27 22:41, Jesper Pedersen wrote:
>> On 09/27/2017 03:05 AM, amul sul wrote:
>>>>> Attached rebased patch, thanks.
>>>>>
>>>>>
>>>> While reading through the patch I thought it would be better to keep
>>>> MODULUS and REMAINDER in caps, if CREATE TABLE was in caps too in order to
>>>> highlight that these are "keywords" for hash partition.
>>>>
>>>> Also updated some of the documentation.
>>>>
>>>>
>>> Thanks a lot for the patch, included in the attached version.
>>>
>>
>> Thank you.
>>
>> Based on [1] I have moved the patch to "Ready for Committer".
>
> Thanks a lot Amul for working on this. Like Jesper said, the patch looks
> pretty good overall. I was looking at the latest version with intent to
> study certain things about hash partitioning the way patch implements it,
> during which I noticed some things.
>

Thanks Amit for looking at the patch.

> + The modulus must be a positive integer, and the remainder must a
>
> must be a
>

Fixed in the attached version.

> + suppose you have a hash-partitioned table with 8 children, each of
> which
> + has modulus 8, but find it necessary to increase the number of
> partitions
> + to 16.
>

Fixed in the attached version.

> Might it be a good idea to say 8 "partitions" instead of "children" in the
> first sentence?
>
> + each modulus-8 partition until none remain. While this may still
> involve
> + a large amount of data movement at each step, it is still better than
> + having to create a whole new table and move all the data at once.
> + </para>
> +
>

Fixed in the attached version.

> I read the paragraph that ends with the above text and started wondering
> if the example to redistribute data in hash partitions by detaching and
> attaching with new modulus/remainder could be illustrated with an example?
> Maybe in the examples section of the ALTER TABLE page?
>

I think hint in the documentation is more than enough. There is N number of
ways of data redistribution, the document is not meant to explain all of those.

> + Since hash operator class provide only equality, not ordering,
> collation
>
> Either "Since hash operator classes provide" or "Since hash operator class
> provides"
>

Fixed in the attached version.

> Other than the above points, patch looks good.
>
>
> By the way, I noticed a couple of things about hash partition constraints:
>
> 1. In get_qual_for_hash(), using
> get_fn_expr_rettype(&key->partsupfunc[i]), which returns InvalidOid for
> the lack of fn_expr being set to non-NULL value, causes funcrettype of the
> FuncExpr being generated for hashing partition key columns to be set to
> InvalidOid, which I think is wrong. That is, the following if condition
> in get_fn_expr_rettype() is always satisfied:
>
> if (!flinfo || !flinfo->fn_expr)
> return InvalidOid;
>
> I think we could use get_func_rettype(&key->partsupfunc[i].fn_oid)
> instead. Attached patch
> hash-v21-set-funcexpr-funcrettype-correctly.patch, which applies on top
> v21 of your patch.
>

Thanks for the patch, included in the attached version.

> 2. It seems that the reason constraint exclusion doesn't work with hash
> partitions as implemented by the patch is that predtest.c:
> operator_predicate_proof() returns false even without looking into the
> hash partition constraint, which is of the following form:
>
> satisfies_hash_partition(<mod>, <rem>, <key1-exthash>,..)
>
> beccause the above constraint expression doesn't translate into a a binary
> opclause (an OpExpr), which operator_predicate_proof() knows how to work
> with. So, false is returned at the beginning of that function by the
> following code:
>
> if (!is_opclause(predicate))
> return false;
>
> For example,
>
> create table p (a int) partition by hash (a);
> create table p0 partition of p for values with (modulus 4, remainder 0);
> create table p1 partition of p for values with (modulus 4, remainder 1);
> \d+ p0
> <...>
> Partition constraint: satisfies_hash_partition(4, 0, hashint4extended(a,
> '8816678312871386367'::bigint))
>
> -- both p0 and p1 scanned
> explain select * from p where satisfies_hash_partition(4, 0,
> hashint4extended(a, '8816678312871386367'::bigint));
> QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------
> Append (cost=0.00..96.50 rows=1700 width=4)
> -> Seq Scan on p0 (cost=0.00..48.25 rows=850 width=4)
> Filter: satisfies_hash_partition(4, 0, hashint4extended(a,
> '8816678312871386367'::bigint))
> -> Seq Scan on p1 (cost=0.00..48.25 rows=850 width=4)
> Filter: satisfies_hash_partition(4, 0, hashint4extended(a,
> '8816678312871386367'::bigint))
> (5 rows)
>
> -- both p0 and p1 scanned
> explain select * from p where satisfies_hash_partition(4, 1,
> hashint4extended(a, '8816678312871386367'::bigint));
> QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------
> Append (cost=0.00..96.50 rows=1700 width=4)
> -> Seq Scan on p0 (cost=0.00..48.25 rows=850 width=4)
> Filter: satisfies_hash_partition(4, 1, hashint4extended(a,
> '8816678312871386367'::bigint))
> -> Seq Scan on p1 (cost=0.00..48.25 rows=850 width=4)
> Filter: satisfies_hash_partition(4, 1, hashint4extended(a,
> '8816678312871386367'::bigint))
> (5 rows)
>
>
> I looked into how satisfies_hash_partition() works and came up with an
> idea that I think will make constraint exclusion work. What if we emitted
> the hash partition constraint in the following form instead:
>
> hash_partition_mod(hash_partition_hash(key1-exthash, key2-exthash),
> <mod>) = <rem>
>
> With that form, constraint exclusion seems to work as illustrated below:
>
> \d+ p0
> <...>
> Partition constraint:
> (hash_partition_modulus(hash_partition_hash(hashint4extended(a,
> '8816678312871386367'::bigint)), 4) = 0)
>
> -- note only p0 is scanned
> explain select * from p where
> hash_partition_modulus(hash_partition_hash(hashint4extended(a,
> '8816678312871386367'::bigint)), 4) = 0;
> QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------------------
> Append (cost=0.00..61.00 rows=13 width=4)
> -> Seq Scan on p0 (cost=0.00..61.00 rows=13 width=4)
> Filter:
> (hash_partition_modulus(hash_partition_hash(hashint4extended(a,
> '8816678312871386367'::bigint)), 4) = 0)
> (3 rows)
>
> -- note only p1 is scanned
> explain select * from p where
> hash_partition_modulus(hash_partition_hash(hashint4extended(a,
> '8816678312871386367'::bigint)), 4) = 1;
> QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------------------
> Append (cost=0.00..61.00 rows=13 width=4)
> -> Seq Scan on p1 (cost=0.00..61.00 rows=13 width=4)
> Filter:
> (hash_partition_modulus(hash_partition_hash(hashint4extended(a,
> '8816678312871386367'::bigint)), 4) = 1)
> (3 rows)
>
> I tried to implement that in the attached
> hash-v21-hash-part-constraint.patch, which applies on top v21 of your
> patch (actually on top of
> hash-v21-set-funcexpr-funcrettype-correctly.patch, which I think should be
> applied anyway as it fixes a bug of the original patch).
>
> What do you think? Eventually, the new partition-pruning method [1] will
> make using constraint exclusion obsolete, but it might be a good idea to
> have it working if we can.
>

It does not really do the partition pruning via constraint exclusion and I don't
think anyone is going to use the remainder in the where condition to fetch
data and hash partitioning is not meant for that.

But I am sure that we could solve this problem using your and Beena's work
toward faster partition pruning[1] and Runtime Partition Pruning[2].

Will think on this changes if it is required for the pruning feature.

Regards,
Amul

1] https://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce79e8@lab.ntt.co.jp
2] https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com

Attachment Content-Type Size
0001-hash-partitioning_another_design-v22.patch application/octet-stream 86.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Khandekar 2017-09-28 10:18:11 Re: Parallel Append implementation
Previous Message Ashutosh Bapat 2017-09-28 09:42:08 Re: Partition-wise aggregation/grouping