Re: [POC] hash partitioning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: amul sul <sulamul(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thom Brown <thom(at)linux(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] hash partitioning
Date: 2017-10-12 01:01:43
Message-ID: CA+TgmoYhB0+r--=bYkvexS7h+6P9GdoO8BN+vFWUuFxTUArZXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 10, 2017 at 7:07 AM, amul sul <sulamul(at)gmail(dot)com> wrote:
> How about the attached patch(0003)?
> Also, the dim variable is renamed to natts.

I'm not sure I believe this comment:

+ /*
+ * We arrange the partitions in the ascending order of their modulus
+ * and remainders. Also every modulus is factor of next larger
+ * modulus. This means that the index of a given partition is same as
+ * the remainder of that partition. Also entries at (remainder + N *
+ * modulus) positions in indexes array are all same for (modulus,
+ * remainder) specification for any partition. Thus datums array from
+ * both the given bounds are same, if and only if their indexes array
+ * will be same. So, it suffices to compare indexes array.
+ */

I am particularly not sure that I believe that the index of a
partition must be the same as the remainder. It doesn't seem like
that would be true when there is more than one modulus or when some
partitions are missing.

+ if (offset < 0)
+ {
+ next_modulus = DatumGetInt32(datums[0][0]);
+ valid_modulus = (next_modulus % spec->modulus) == 0;
+ }
+ else
+ {
+ prev_modulus = DatumGetInt32(datums[offset][0]);
+ valid_modulus = (spec->modulus % prev_modulus) == 0;
+
+ if (valid_modulus && (offset + 1) < ndatums)
+ {
+ next_modulus =
DatumGetInt32(datums[offset + 1][0]);
+ valid_modulus = (next_modulus %
spec->modulus) == 0;
+ }
+ }

I don't think this is quite right. It checks the new modulus against
prev_modulus whenever prev_modulus is defined, which is correct, but
it doesn't check the new modulus against the next_modulus except when
offset < 0. But actually that check needs to be performed, I think,
whenever the new modulus is less than the greatest modulus seen so
far.

+ * For a partitioned table defined as:
+ * CREATE TABLE simple_hash (a int, b char(10)) PARTITION BY HASH (a, b);
+ *
+ * CREATE TABLE p_p1 PARTITION OF simple_hash
+ * FOR VALUES WITH (MODULUS 2, REMAINDER 1);
+ * CREATE TABLE p_p2 PARTITION OF simple_hash
+ * FOR VALUES WITH (MODULUS 4, REMAINDER 2);
+ * CREATE TABLE p_p3 PARTITION OF simple_hash
+ * FOR VALUES WITH (MODULUS 8, REMAINDER 0);
+ * CREATE TABLE p_p4 PARTITION OF simple_hash
+ * FOR VALUES WITH (MODULUS 8, REMAINDER 4);
+ *
+ * This function will return one of the following in the form of an
+ * expression:
+ *
+ * for p_p1: satisfies_hash_partition(2, 1, hash_fn_1_extended(a, HASH_SEED),
+ * hash_fn_2_extended(b,
HASH_SEED))
+ * for p_p2: satisfies_hash_partition(4, 2, hash_fn_1_extended(a, HASH_SEED),
+ * hash_fn_2_extended(b,
HASH_SEED))
+ * for p_p3: satisfies_hash_partition(8, 0, hash_fn_1_extended(a, HASH_SEED),
+ * hash_fn_2_extended(b,
HASH_SEED))
+ * for p_p4: satisfies_hash_partition(8, 4, hash_fn_1_extended(a, HASH_SEED),
+ * hash_fn_2_extended(b,
HASH_SEED))

I think instead of this lengthy example you should try to explain the
general rule. Maybe something like: the partition constraint for a
hash partition is always a call to the built-in function
satisfies_hash_partition(). The first two arguments are the modulus
and remainder for the partition; the remaining arguments are the hash
values computed for each column of the partition key using the
extended hash function from the appropriate opclass.

+static uint64
+mix_hash_value(int nkeys, Datum *hash_array, bool *isnull)

It would be nice to use the hash_combine() facility Andres recently
added for this rather than having a way to do it that is specific to
hash partitioning, but that function only works for 32-bit hash
values. Maybe we can persuade Andres to add a hash_combine64...

+ * a hash operator class

Missing period at end.

+ if (strategy == PARTITION_STRATEGY_HASH)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+ errmsg("default hash partition is not supported")));

Maybe errmsg("a hash-partitioned table may not have a default partition")?

+/* Seed for the extended hash function */
+#define HASH_SEED UINT64CONST(0x7A5B22367996DCFF)

I suggest HASH_PARTITION_SEED -- this is too generic.

Have you checked how well the tests you've added cover the code you've
added? What code is not covered by the tests, and is there any way to
cover it?

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2017-10-12 01:23:06 Re: Slow synchronous logical replication
Previous Message Michael Paquier 2017-10-12 00:46:23 Re: 64-bit queryId?