Re: [POC] hash partitioning

From: amul sul <sulamul(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thom Brown <thom(at)linux(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] hash partitioning
Date: 2017-10-12 13:08:37
Message-ID: CAAJ_b948MUW38gEOzkFAgO-mdty4K=iJ7orf13WPL2y_m5_y8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 12, 2017 at 6:31 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Oct 10, 2017 at 7:07 AM, amul sul <sulamul(at)gmail(dot)com> wrote:
>> How about the attached patch(0003)?
>> Also, the dim variable is renamed to natts.
>
> I'm not sure I believe this comment:
>
> + /*
> + * We arrange the partitions in the ascending order of their modulus
> + * and remainders. Also every modulus is factor of next larger
> + * modulus. This means that the index of a given partition is same as
> + * the remainder of that partition. Also entries at (remainder + N *
> + * modulus) positions in indexes array are all same for (modulus,
> + * remainder) specification for any partition. Thus datums array from
> + * both the given bounds are same, if and only if their indexes array
> + * will be same. So, it suffices to compare indexes array.
> + */
>
> I am particularly not sure that I believe that the index of a
> partition must be the same as the remainder. It doesn't seem like
> that would be true when there is more than one modulus or when some
> partitions are missing.
>

Looks like an explanation by the comment is not good enough, will think on this.

Here are the links for the previous discussion:
1] https://postgr.es/m/CAFjFpRfHqSGBjNgJV2p%2BC4Yr5Qxvwygdsg4G_VQ6q9NTB-i3MA%40mail.gmail.com
2] https://postgr.es/m/CAFjFpRdeESKFkVGgmOdYvmD3d56-58c5VCBK0zDRjHrkq_VcNg%40mail.gmail.com

> + if (offset < 0)
> + {
> + next_modulus = DatumGetInt32(datums[0][0]);
> + valid_modulus = (next_modulus % spec->modulus) == 0;
> + }
> + else
> + {
> + prev_modulus = DatumGetInt32(datums[offset][0]);
> + valid_modulus = (spec->modulus % prev_modulus) == 0;
> +
> + if (valid_modulus && (offset + 1) < ndatums)
> + {
> + next_modulus =
> DatumGetInt32(datums[offset + 1][0]);
> + valid_modulus = (next_modulus %
> spec->modulus) == 0;
> + }
> + }
>
> I don't think this is quite right. It checks the new modulus against
> prev_modulus whenever prev_modulus is defined, which is correct, but
> it doesn't check the new modulus against the next_modulus except when
> offset < 0. But actually that check needs to be performed, I think,
> whenever the new modulus is less than the greatest modulus seen so
> far.
>
It does. See the "if (valid_modulus && (offset + 1) < ndatums)" block in the
else part of the snippet that you are referring.

For e.g new modulus 25 & 150 is not accepted for the hash partitioned bound with
modulus 10,50,200. Will cover this test as well.

> + * For a partitioned table defined as:
> + * CREATE TABLE simple_hash (a int, b char(10)) PARTITION BY HASH (a, b);
> + *
> + * CREATE TABLE p_p1 PARTITION OF simple_hash
> + * FOR VALUES WITH (MODULUS 2, REMAINDER 1);
> + * CREATE TABLE p_p2 PARTITION OF simple_hash
> + * FOR VALUES WITH (MODULUS 4, REMAINDER 2);
> + * CREATE TABLE p_p3 PARTITION OF simple_hash
> + * FOR VALUES WITH (MODULUS 8, REMAINDER 0);
> + * CREATE TABLE p_p4 PARTITION OF simple_hash
> + * FOR VALUES WITH (MODULUS 8, REMAINDER 4);
> + *
> + * This function will return one of the following in the form of an
> + * expression:
> + *
> + * for p_p1: satisfies_hash_partition(2, 1, hash_fn_1_extended(a, HASH_SEED),
> + * hash_fn_2_extended(b,
> HASH_SEED))
> + * for p_p2: satisfies_hash_partition(4, 2, hash_fn_1_extended(a, HASH_SEED),
> + * hash_fn_2_extended(b,
> HASH_SEED))
> + * for p_p3: satisfies_hash_partition(8, 0, hash_fn_1_extended(a, HASH_SEED),
> + * hash_fn_2_extended(b,
> HASH_SEED))
> + * for p_p4: satisfies_hash_partition(8, 4, hash_fn_1_extended(a, HASH_SEED),
> + * hash_fn_2_extended(b,
> HASH_SEED))
>
> I think instead of this lengthy example you should try to explain the
> general rule. Maybe something like: the partition constraint for a
> hash partition is always a call to the built-in function
> satisfies_hash_partition(). The first two arguments are the modulus
> and remainder for the partition; the remaining arguments are the hash
> values computed for each column of the partition key using the
> extended hash function from the appropriate opclass.
>
Okay will add this.

> +static uint64
> +mix_hash_value(int nkeys, Datum *hash_array, bool *isnull)
>
How about combining high 32 bits and the low 32 bits separately as shown below?

static inline uint64
hash_combine64(uint64 a, uint64 b)
{
return (((uint64) hash_combine((uint32) a >> 32, (uint32) b >> 32) << 32)
| hash_combine((unit32) a, (unit32) b));
}

> It would be nice to use the hash_combine() facility Andres recently
> added for this rather than having a way to do it that is specific to
> hash partitioning, but that function only works for 32-bit hash
> values. Maybe we can persuade Andres to add a hash_combine64...
>
> + * a hash operator class
>
> Missing period at end.
>
Okay will fix this.

> + if (strategy == PARTITION_STRATEGY_HASH)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> + errmsg("default hash partition is not supported")));
>
> Maybe errmsg("a hash-partitioned table may not have a default partition")?
>
Okay will add this.

> +/* Seed for the extended hash function */
> +#define HASH_SEED UINT64CONST(0x7A5B22367996DCFF)
>
> I suggest HASH_PARTITION_SEED -- this is too generic.
>
Okay will add this.

> Have you checked how well the tests you've added cover the code you've
> added? What code is not covered by the tests, and is there any way to
> cover it?
>
Will try to get gcov report for this patch.

Thanks for your review.

Regards,
Amul

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2017-10-12 13:14:10 Re: Parallel Bitmap Heap Scans segfaults due to (tbm->dsa==NULL) on PostgreSQL 10
Previous Message Tomas Vondra 2017-10-12 13:07:30 Re: Parallel Bitmap Heap Scans segfaults due to (tbm->dsa==NULL) on PostgreSQL 10