Re: [POC] hash partitioning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: jesper(dot)pedersen(at)redhat(dot)com, amul sul <sulamul(at)gmail(dot)com>
Cc: Thom Brown <thom(at)linux(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] hash partitioning
Date: 2017-09-28 05:54:59
Message-ID: cdd10cdc-0c92-8294-db56-3e89b65604b6@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/09/27 22:41, Jesper Pedersen wrote:
> On 09/27/2017 03:05 AM, amul sul wrote:
>>>> Attached rebased patch, thanks.
>>>>
>>>>
>>> While reading through the patch I thought it would be better to keep
>>> MODULUS and REMAINDER in caps, if CREATE TABLE was in caps too in order to
>>> highlight that these are "keywords" for hash partition.
>>>
>>> Also updated some of the documentation.
>>>
>>>
>> Thanks a lot for the patch, included in the attached version.​
>>
>
> Thank you.
>
> Based on [1] I have moved the patch to "Ready for Committer".

Thanks a lot Amul for working on this. Like Jesper said, the patch looks
pretty good overall. I was looking at the latest version with intent to
study certain things about hash partitioning the way patch implements it,
during which I noticed some things.

+ The modulus must be a positive integer, and the remainder must a

must be a

+ suppose you have a hash-partitioned table with 8 children, each of
which
+ has modulus 8, but find it necessary to increase the number of
partitions
+ to 16.

Might it be a good idea to say 8 "partitions" instead of "children" in the
first sentence?

+ each modulus-8 partition until none remain. While this may still
involve
+ a large amount of data movement at each step, it is still better than
+ having to create a whole new table and move all the data at once.
+ </para>
+

I read the paragraph that ends with the above text and started wondering
if the example to redistribute data in hash partitions by detaching and
attaching with new modulus/remainder could be illustrated with an example?
Maybe in the examples section of the ALTER TABLE page?

+ Since hash operator class provide only equality, not ordering,
collation

Either "Since hash operator classes provide" or "Since hash operator class
provides"

Other than the above points, patch looks good.

By the way, I noticed a couple of things about hash partition constraints:

1. In get_qual_for_hash(), using
get_fn_expr_rettype(&key->partsupfunc[i]), which returns InvalidOid for
the lack of fn_expr being set to non-NULL value, causes funcrettype of the
FuncExpr being generated for hashing partition key columns to be set to
InvalidOid, which I think is wrong. That is, the following if condition
in get_fn_expr_rettype() is always satisfied:

if (!flinfo || !flinfo->fn_expr)
return InvalidOid;

I think we could use get_func_rettype(&key->partsupfunc[i].fn_oid)
instead. Attached patch
hash-v21-set-funcexpr-funcrettype-correctly.patch, which applies on top
v21 of your patch.

2. It seems that the reason constraint exclusion doesn't work with hash
partitions as implemented by the patch is that predtest.c:
operator_predicate_proof() returns false even without looking into the
hash partition constraint, which is of the following form:

satisfies_hash_partition(<mod>, <rem>, <key1-exthash>,..)

beccause the above constraint expression doesn't translate into a a binary
opclause (an OpExpr), which operator_predicate_proof() knows how to work
with. So, false is returned at the beginning of that function by the
following code:

if (!is_opclause(predicate))
return false;

For example,

create table p (a int) partition by hash (a);
create table p0 partition of p for values with (modulus 4, remainder 0);
create table p1 partition of p for values with (modulus 4, remainder 1);
\d+ p0
<...>
Partition constraint: satisfies_hash_partition(4, 0, hashint4extended(a,
'8816678312871386367'::bigint))

-- both p0 and p1 scanned
explain select * from p where satisfies_hash_partition(4, 0,
hashint4extended(a, '8816678312871386367'::bigint));
QUERY PLAN

----------------------------------------------------------------------------------------------------
Append (cost=0.00..96.50 rows=1700 width=4)
-> Seq Scan on p0 (cost=0.00..48.25 rows=850 width=4)
Filter: satisfies_hash_partition(4, 0, hashint4extended(a,
'8816678312871386367'::bigint))
-> Seq Scan on p1 (cost=0.00..48.25 rows=850 width=4)
Filter: satisfies_hash_partition(4, 0, hashint4extended(a,
'8816678312871386367'::bigint))
(5 rows)

-- both p0 and p1 scanned
explain select * from p where satisfies_hash_partition(4, 1,
hashint4extended(a, '8816678312871386367'::bigint));
QUERY PLAN

----------------------------------------------------------------------------------------------------
Append (cost=0.00..96.50 rows=1700 width=4)
-> Seq Scan on p0 (cost=0.00..48.25 rows=850 width=4)
Filter: satisfies_hash_partition(4, 1, hashint4extended(a,
'8816678312871386367'::bigint))
-> Seq Scan on p1 (cost=0.00..48.25 rows=850 width=4)
Filter: satisfies_hash_partition(4, 1, hashint4extended(a,
'8816678312871386367'::bigint))
(5 rows)

I looked into how satisfies_hash_partition() works and came up with an
idea that I think will make constraint exclusion work. What if we emitted
the hash partition constraint in the following form instead:

hash_partition_mod(hash_partition_hash(key1-exthash, key2-exthash),
<mod>) = <rem>

With that form, constraint exclusion seems to work as illustrated below:

\d+ p0
<...>
Partition constraint:
(hash_partition_modulus(hash_partition_hash(hashint4extended(a,
'8816678312871386367'::bigint)), 4) = 0)

-- note only p0 is scanned
explain select * from p where
hash_partition_modulus(hash_partition_hash(hashint4extended(a,
'8816678312871386367'::bigint)), 4) = 0;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
Append (cost=0.00..61.00 rows=13 width=4)
-> Seq Scan on p0 (cost=0.00..61.00 rows=13 width=4)
Filter:
(hash_partition_modulus(hash_partition_hash(hashint4extended(a,
'8816678312871386367'::bigint)), 4) = 0)
(3 rows)

-- note only p1 is scanned
explain select * from p where
hash_partition_modulus(hash_partition_hash(hashint4extended(a,
'8816678312871386367'::bigint)), 4) = 1;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
Append (cost=0.00..61.00 rows=13 width=4)
-> Seq Scan on p1 (cost=0.00..61.00 rows=13 width=4)
Filter:
(hash_partition_modulus(hash_partition_hash(hashint4extended(a,
'8816678312871386367'::bigint)), 4) = 1)
(3 rows)

I tried to implement that in the attached
hash-v21-hash-part-constraint.patch, which applies on top v21 of your
patch (actually on top of
hash-v21-set-funcexpr-funcrettype-correctly.patch, which I think should be
applied anyway as it fixes a bug of the original patch).

What do you think? Eventually, the new partition-pruning method [1] will
make using constraint exclusion obsolete, but it might be a good idea to
have it working if we can.

Thanks,
Amit

[1] https://commitfest.postgresql.org/14/1272/

Attachment Content-Type Size
hash-v21-hash-part-constraint-v1.patch text/plain 8.4 KB
hash-v21-set-funcexpr-funcrettype-correctly-v1.patch text/plain 892 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2017-09-28 05:59:41 Re: proposal - Default namespaces for XPath expressions (PostgreSQL 11)
Previous Message Noah Misch 2017-09-28 05:31:23 Re: Shaky coding for vacuuming partitioned relations