Re: Skip partition tuple routing with constant partition key

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Skip partition tuple routing with constant partition key
Date: 2021-05-20 12:22:52
Message-ID: CA+HiwqFWGNSy1zwcno=AuLLScCFV64P3BYt+EXqqwM1DYhQ16Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hou-san,

On Thu, May 20, 2021 at 7:35 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> From: Amit Langote <amitlangote09(at)gmail(dot)com>
> Sent: Wednesday, May 19, 2021 9:17 PM
> > I guess it would be nice if we could fit in a solution for the use case that houjz
> > mentioned as a special case. BTW, houjz, could you please check if a patch like
> > this one helps the case you mentioned?
>
> Thanks for the patch!
> I did some test on it(using the table you provided above):

Thanks a lot for doing that.

> 1): Test plain column in partition key.
> SQL: insert into foo select 1 from generate_series(1, 10000000);
>
> HEAD:
> Time: 5493.392 ms (00:05.493)
>
> AFTER PATCH(skip constant partition key)
> Time: 4198.421 ms (00:04.198)
>
> AFTER PATCH(cache the last partition)
> Time: 4484.492 ms (00:04.484)
>
> The test results of your patch in this case looks good.
> It can fit many more cases and the performance gain is nice.

Hmm yeah, not too bad.

> 2) Test expression in partition key
>
> create or replace function partition_func(i int) returns int as $$
> begin
> return i;
> end;
> $$ language plpgsql immutable parallel restricted;
> create unlogged table foo (a int) partition by range (partition_func(a));
>
> SQL: insert into foo select 1 from generate_series(1, 10000000);
>
> HEAD
> Time: 8595.120 ms (00:08.595)
>
> AFTER PATCH(skip constant partition key)
> Time: 4198.421 ms (00:04.198)
>
> AFTER PATCH(cache the last partition)
> Time: 12829.800 ms (00:12.830)
>
> If add a user defined function in the partition key, it seems have
> performance degradation after the patch.

Oops.

> I did some analysis on it, for the above testcase , ExecPartitionCheck
> executed three expression 1) key is null 2) key > low 3) key < top
> In this case, the "key" contains a funcexpr and the funcexpr will be executed
> three times for each row, so, it bring extra overhead which cause the performance degradation.
>
> IMO, improving the ExecPartitionCheck seems a better solution to it, we can
> Calculate the key value in advance and use the value to do the bound check.
> Thoughts ?

This one seems bit tough. ExecPartitionCheck() uses the generic
expression evaluation machinery like a black box, which means
execPartition.c can't really tweal/control the time spent evaluating
partition constraints. Given that, we may have to disable the caching
when key->partexprs != NIL, unless we can reasonably do what you are
suggesting.

> Besides, are we going to add a reloption or guc to control this cache behaviour if we more forward with this approach ?
> Because, If most of the rows to be inserted are routing to a different partition each time, then I think the extra ExecPartitionCheck
> will become the overhead. Maybe it's better to apply both two approaches(cache the last partition and skip constant partition key)
> which can achieve the best performance results.

A reloption will have to be a last resort is what I can say about this
at the moment.

--
Amit Langote
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2021-05-20 12:24:11 Re: Freenode woes
Previous Message Daniel Gustafsson 2021-05-20 12:15:52 Re: Force disable of SSL renegociation in the server