Re: [HACKERS] path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Langote <amitlangote09(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] path toward faster partition pruning
Date: 2018-04-11 06:04:24
Message-ID: d0b975f7-18a0-e95d-b1b6-f3d27450adf9@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the review.

On 2018/04/10 21:02, David Rowley wrote:
> On 10 April 2018 at 20:56, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> On 2018/04/10 13:27, Ashutosh Bapat wrote:
>>> On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
>>>> $$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
>>>> CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
>>>> OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
>>>> CREATE TABLE mchash (a int, b text, c jsonb)
>>>> PARTITION BY HASH (a test_int4_ops, b test_text_ops);
>>
>> Thanks for the idea. I think it makes sense and also agree that alternate
>> outputs approach is not perfectly reliable and maintainable.
>>
>>> +1.
>>
>> Attached find a patch that rewrites hash partition pruning tests that
>> away. It creates two hash operator classes, one for int4 and another for
>> text type and uses them to create hash partitioned table to be used in the
>> tests, like done in the existing tests in hash_part.sql. Since that makes
>> tests (hopefully) reliably return the same result always, I no longer see
>> the need to keep them in a separate partition_prune_hash.sql. The
>> reasoning behind having the separate file was to keep the alternative
>> output file small as David explained in [1].
>> [1]
>> https://www.postgresql.org/message-id/CAKJS1f-SON_hAekqoV4_WQwJBtJ_rvvSe68jRNhuYcXqQ8PoQg%40mail.gmail.com
>
> I had a quick look, but I'm still confused about why a function like
> hash_uint32_extended() is susceptible to varying results depending on
> CPU endianness but hash_combine64 is not.

It might as well be the combination of both that's sensitive to
endianness. I too am not sure exactly which part. They're are both used
in succession in compute_hash_value:

/*
* Compute hash for each datum value by calling respective
* datatype-specific hash functions of each partition key
* attribute.
*/
hash = FunctionCall2(&partsupfunc[i], values[i], seed);

/* Form a single 64-bit hash value */
rowHash = hash_combine64(rowHash, DatumGetUInt64(hash));

> Apart from that confusion, looking at the patch:
>
> +CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
> +$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
> +CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
> +OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
> +CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
> +$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;
>
>
> Why coalesce here? Maybe I've not thought of something, but coalesce
> only seems useful to me if there's > 1 argument. Plus the function is
> strict, so not sure it's really doing even if you added a default.

After reading Ashutosh's comment, I realized I didn't really mean to add
the STRICT to those function definitions. As these are not operators, but
support (hash) procedures, it's insignificant to the pruning code whether
they are STRICT or not, unlike clause operators where it is.

Also, I've adopted the coalesce-based hashing function from hash_part.sql,
albeit with unnecessary tweaks. I've not read anywhere about why the
coalesce was used in the first place, but it's insignificant for our
purpose here anyway.

> I know this one was there before, but I only just noticed it:
>
> +-- pruning should work if non-null values are provided for all the keys
> +explain (costs off) select * from hp where a is null and b is null;
>
> The comment is a bit misleading given the first test below it is
> testing for nulls. Maybe it can be changed to
>
> +-- pruning should work if values or is null clauses are provided for
> all partition keys.
I have adjusted the comments.

Updated patch attached.

Thanks,
Amit

Attachment Content-Type Size
v2-0001-Rewrite-hash-partition-pruning-tests-to-use-custo.patch text/plain 28.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-04-11 06:12:52 Re: [HACKERS] Runtime Partition Pruning
Previous Message Amit Langote 2018-04-11 05:47:53 Re: [HACKERS] path toward faster partition pruning