Re: [HACKERS] path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] path toward faster partition pruning
Date: 2018-03-23 12:15:47
Message-ID: 115c4212-47a5-a914-7424-da4a16106412@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi David.

On 2018/03/23 16:38, David Rowley wrote:
> On 21 March 2018 at 00:07, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Attached is further revised version.
>
> Hi Amit,
>
> Thanks for sending the v38 patch.
>
> I started reviewing this, but I just ended up starting to hack at the
> patch instead. It's still got quite a bit of work to be done as I
> think, unfortunately, the cross type stuff is still pretty broken.
> There's not really any sort of checking done to make sure you've found
> a valid cross type hash or compare function in the code which results
> in errors like:
>
> create table hashp (a int, b numeric) partition by hash(a,b);
> create table hashp1 partition of hashp for values with (modulus 4, remainder 0);
> create table hashp2 partition of hashp for values with (modulus 4, remainder 1);
> create table hashp3 partition of hashp for values with (modulus 4, remainder 2);
> create table hashp4 partition of hashp for values with (modulus 4, remainder 3);
> explain select * from hashp where a = 3::smallint and b = 1.0;
> ERROR: cache lookup failed for function 0

Hmm yes. I had realized that while addressing Robert's related comment.

> I'm not really sure if this should be a matter of doing an if
> (!OidIsValid(new_supfuncid)) return false; I think the
> context->partsupfunc must be pretty broken in cases like:
>
> create table listp (a bigint) partition by list(a);
> create table listp1 partition of listp for values in(1);
> select * from listp where a <> 1::smallint and a <> 1::bigint;
>
> The current patch simply just remembers the last comparison function
> for comparing int8 to int4 and uses that one for the int8 to int2
> comparison too.
>
> Probably we need to cache the comparison function's Oid in with the
> Expr in the step and use the correct one each time. I'm unsure of how
> the fmgr info should be cached, but looks like it certainly cannot be
> cached in the context in an array per partition key. I've so far only
> thought some sort of hash table, but I'm sure there must be a much
> better way to do this.

Yeah, I realized that simply replacing the context->partsupfunc member is
not a solution.

In the updated patch (that is, after incorporating your changes), I have
moved this partsupfunc switching to the caller of partkey_datum_from_expr
instead of doing it there. New patch also checks that returned function
OID is valid, which if not we don't use the expression's value for pruning.

So now. we statically allocate a partsupfunc array on every invocation of
perform_pruning_base_step() or of get_partitions_excluded_by_ne_datums().
Considering run-time pruning, we may have to find some other place to
cache that.

> I started hacking it partition.c and ended up changing quite a few
> things. I changed get_partitions_for_keys into 3 separate functions,
> one for hash, list and range and tidied a few things up in that area.
> There were a few bugs, for example passing the wrong value for the
> size of the array into get_partitions_excluded_by_ne_datums.
>
> I also changed how the Bitmapsets are handled in the step functions
> and got rid of the Noop step type completely. I also got rid of the
> passing of the srcparts into these functions. I think Roberts idea is
> to process the steps in isolation and just combine the partitions
> matching each step.>
> It would be great if we could coordinate our efforts here. I'm posting
> this patch now just in case you're working or about to work on this.

Thanks a lot for making all those changes and sharing the patch. I've
incorporated in the attached latest version.

> In the meantime, I'll continue to drip feed cleanup patches. I'll try
> to start writing some comments too, once I figure a few things out...

Here is the updated version.

I'm still thinking about what to do about avoiding recursion when
performing combine steps [1] as Robert mentioned in his email.

Thanks,
Amit

Attachment Content-Type Size
v39-0001-Add-partsupfunc-to-PartitionSchemeData.patch text/plain 3.4 KB
v39-0002-Add-more-tests-for-partition-pruning.patch text/plain 16.3 KB
v39-0003-Faster-partition-pruning.patch text/plain 102.8 KB
v39-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch text/plain 24.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-03-23 12:19:27 Re: [HACKERS] path toward faster partition pruning
Previous Message Robert Haas 2018-03-23 12:02:33 Re: [HACKERS] Add support for tuple routing to foreign partitions