Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning
Date: 2018-01-17 03:32:16
Message-ID: CAKJS1f-zyX2k08rpahzDudwt1_FFZ4E_5jMNQJ8JZBr3pmG5bA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 16 January 2018 at 21:08, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> On 2018/01/12 12:30, David Rowley wrote:
>> 8. The code in get_partitions_from_ne_clauses() does perform quite a
>> few nested loops. I think a more simple way to would be to track the
>> offsets you've seen in a Bitmapset. This would save you having to
>> check for duplicates, as an offset can only contain a single datum.
>> You'd just need to build a couple of arrays after that, one to sum up
>> the offsets found per partition, and one for the total datums allowed
>> in the partition. If the numbers match then you can remove the
>> partition.
>>
>> I've written this and attached it to this email. It saves about 50
>> lines of code and should perform much better for complex cases, for
>> example, a large NOT IN list. This also implements #6.
>
> I liked your patch, so incorporated it, except, I feel slightly
> uncomfortable about the new name that you chose for the function because
> it sounds a bit generic.

You're right. I only renamed it because I inverted the meaning of the
function in the patch. It no longer did
"get_partitions_from_ne_clauses", it did the opposite and give the
partitions which can't match. Please feel free to think of a new
better name. Is "get_partitions_excluded_by_ne_clauses" too long?

>> I think you quite often use "the same" to mean "it". Can you change that?
>
> I guess that's just one of my many odd habits when writing English, all of
> which I'm trying to get rid of, but apparently with limited success. Must
> try harder. :)

Oops, on re-reading that it sounded as though I was asking you to
change some habit, but I just meant the comments. I understand there
will be places that use English where that's normal. It's just I don't
recall seeing that in PostgreSQL code before. American English is
pretty much the standard for the project, despite that not always
being strictly applied (e.g we have a command called ANALYSE which is
an alias for ANALYZE). I always try and do my best to spell words in
American English (which is not where I'm from), which for me stretches
about as far as putting 'z' in the place of some of my 's'es.

> I rewrote the above comment block as:
>
> * Try to compare the constant arguments of 'leftarg' and 'rightarg', in that
> * order, using the operator of 'op' and set *result to the result of this
> * comparison.
>
> Is that any better?

Sounds good.

>
>> 15. "the latter" is normally used when you're referring to the last
>> thing in a list which was just mentioned. In this case, leftarg_const
>> and rightarg_const is the list, so "the latter" should mean
>> rightarg_const, but I think you mean to compare them using the
>> operator.
>>
>> * If the leftarg_const and rightarg_const are both of the type expected
>> * by op's operator, then compare them using the latter.
>
> Rewrote it as:
>
> * We can compare leftarg_const and rightarg_const using op's operator
> * only if both are of the type expected by it.

I'd probably write "expected type." instead of "type expected by it."

>> 17. The following example will cause get_partitions_for_keys_hash to misbehave:
>>
>> create table hashp (a int, b int) partition by hash (a, b);
>> create table hashp1 partition of hashp for values with (modulus 4, remainder 0);
>> create table hashp2 partition of hashp for values with (modulus 4, remainder 1);
>> create table hashp3 partition of hashp for values with (modulus 4, remainder 3);
>> create table hashp4 partition of hashp for values with (modulus 4, remainder 2);
>> explain select * from hashp where a = 1 and a is null;
>>
>> The following code assumes that you'll never get a NULL test for a key
>> that has an equality test, and ends up trying to prune partitions
>> thinking we got compatible clauses for both partition keys.
>
> Yeah, I think this example helps spot a problem. I thought we'd never get
> to get_partitions_for_keys_hash() for the above query, because we
> should've been able to prove much earlier that the particular clause
> combination should be always false (a cannot be both non-null 1 and null).
> Now, because the planner itself doesn't substitute a constant-false for
> that, I taught classify_partition_bounding_keys() to do so. It would now
> return constfalse=true if it turns out that clauses in the input list lead
> to contradictory nullness condition for a given partition column.
>
>> memset(keyisnull, false, sizeof(keyisnull));
>> for (i = 0; i < partkey->partnatts; i++)
>> {
>> if (bms_is_member(i, keys->keyisnull))
>> {
>> keys->n_eqkeys++;
>> keyisnull[i] = true;
>> }
>> }
>>
>> /*
>> * Can only do pruning if we know all the keys and they're all equality
>> * keys including the nulls that we just counted above.
>> */
>> if (keys->n_eqkeys == partkey->partnatts)
>>
>> The above code will need to be made smarter. It'll likely crash if you
>> change "b" to a pass-by-ref type.
>
> Hmm, not sure why. It seems to work:

Yeah, works now because you've added new code to test for
contradictions in the quals, e.g a = 1 and a is null is now rejected
as constfalse.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2018-01-17 04:05:44 Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning
Previous Message Andrew Dunstan 2018-01-17 00:29:09 pgsql: Centralize json and jsonb handling of datetime types