RE: Skip partition tuple routing with constant partition key

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Skip partition tuple routing with constant partition key
Date: 2021-05-20 10:35:40
Message-ID: OS0PR01MB5716B2AFF51ECB0775A77480942A9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Amit Langote <amitlangote09(at)gmail(dot)com>
Sent: Wednesday, May 19, 2021 9:17 PM
> I gave a shot to implementing your idea and ended up with the attached PoC
> patch, which does pass make check-world.
>
> I do see some speedup:
>
> -- creates a range-partitioned table with 1000 partitions create unlogged table
> foo (a int) partition by range (a); select 'create unlogged table foo_' || i || '
> partition of foo for values from (' || (i-1)*100000+1 || ') to (' || i*100000+1 || ');'
> from generate_series(1, 1000) i;
> \gexec
>
> -- generates a 100 million record file
> copy (select generate_series(1, 100000000)) to '/tmp/100m.csv' csv;
>
> Times for loading that file compare as follows:
>
> HEAD:
>
> postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000
> Time: 31813.964 ms (00:31.814)
> postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000
> Time: 31972.942 ms (00:31.973)
> postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000
> Time: 32049.046 ms (00:32.049)
>
> Patched:
>
> postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000
> Time: 26151.158 ms (00:26.151)
> postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000
> Time: 28161.082 ms (00:28.161)
> postgres=# copy foo from '/tmp/100m.csv' csv; COPY 100000000
> Time: 26700.908 ms (00:26.701)
>
> I guess it would be nice if we could fit in a solution for the use case that houjz
> mentioned as a special case. BTW, houjz, could you please check if a patch like
> this one helps the case you mentioned?

Thanks for the patch!
I did some test on it(using the table you provided above):

1): Test plain column in partition key.
SQL: insert into foo select 1 from generate_series(1, 10000000);

HEAD:
Time: 5493.392 ms (00:05.493)

AFTER PATCH(skip constant partition key)
Time: 4198.421 ms (00:04.198)

AFTER PATCH(cache the last partition)
Time: 4484.492 ms (00:04.484)

The test results of your patch in this case looks good.
It can fit many more cases and the performance gain is nice.

-----------
2) Test expression in partition key

create or replace function partition_func(i int) returns int as $$
begin
return i;
end;
$$ language plpgsql immutable parallel restricted;
create unlogged table foo (a int) partition by range (partition_func(a));

SQL: insert into foo select 1 from generate_series(1, 10000000);

HEAD
Time: 8595.120 ms (00:08.595)

AFTER PATCH(skip constant partition key)
Time: 4198.421 ms (00:04.198)

AFTER PATCH(cache the last partition)
Time: 12829.800 ms (00:12.830)

If add a user defined function in the partition key, it seems have
performance degradation after the patch.

I did some analysis on it, for the above testcase , ExecPartitionCheck
executed three expression 1) key is null 2) key > low 3) key < top
In this case, the "key" contains a funcexpr and the funcexpr will be executed
three times for each row, so, it bring extra overhead which cause the performance degradation.

IMO, improving the ExecPartitionCheck seems a better solution to it, we can
Calculate the key value in advance and use the value to do the bound check.
Thoughts ?

------------

Besides, are we going to add a reloption or guc to control this cache behaviour if we more forward with this approach ?
Because, If most of the rows to be inserted are routing to a different partition each time, then I think the extra ExecPartitionCheck
will become the overhead. Maybe it's better to apply both two approaches(cache the last partition and skip constant partition key)
which can achieve the best performance results.

Best regards,
houzj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-05-20 11:00:47 Force disable of SSL renegociation in the server
Previous Message David Rowley 2021-05-20 10:32:28 Re: Skip partition tuple routing with constant partition key