Re: Declarative partitioning - another take

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Declarative partitioning - another take
Date: 2016-11-29 11:24:27
Message-ID: 3c9f55c0-6353-052d-cd49-52f573972c55@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016/11/17 20:27, Amit Langote wrote:
> On 2016/11/16 4:21, Robert Haas wrote:
>> Have you done any performance testing on the tuple routing code?
>> Suppose we insert a million (or 10 million) tuples into an
>> unpartitioned table, a table with 10 partitions, a table with 100
>> partitions, a table with 1000 partitions, and a table that is
>> partitioned into 10 partitions each of which has 10 subpartitions.
>> Ideally, the partitioned cases would run almost as fast as the
>> unpartitioned case, but probably there will be some overhead.
>> However, it would be useful to know how much. Also, it would be
>> useful to set up the same cases with inheritance using a PL/pgsql ON
>> INSERT trigger for tuple routing and compare. Hopefully the tuple
>> routing code is far faster than a trigger, but we should make sure
>> that's the case and look for optimizations if not. Also, it would be
>> useful to know how much slower the tuple-mapping-required case is than
>> the no-tuple-mapping-required case.
>
> OK, I will share the performance results soon.

Sorry about the delay; here are some numbers with the following
partitioning schema:

# plain table
create table plain (a date, b int, c int);

# partitioned table
create table ptab (a date, b int, c int) partition by range (a, b);

Partitions (not the full commands):

ptab_00001 for values from ('2016-11-29', 1) to ('2016-11-29', 1000);
ptab_00002 for values from ('2016-11-29', 1000) to ('2016-11-29', 2000);
...
ptab_00005 for values from ('2016-11-29', 4000) to ('2016-11-29', 5000);

ptab_00006 for values from ('2016-11-30', 1) to ('2016-11-30', 1000);
...
...
ptab_NNNNN for values from ('20XX-XX-XX', 4000) to ('20XX-XX-XX', 5000);

# inheritance partitioned table
create table itab (a date, b int, c int);
create table itab_00001 (
check part_check check (a = '2016-11-29' and b >= 1 and b < 1000)
) inherits (itab);
...
create table itab_00005 (
check part_check check (a = '2016-11-29' and b >= 4000 and b < 5000)
) inherits (itab);
create table itab_0006 (
check part_check check (a = '2016-11-30' and b >= 1and b < 1000)
) inherits (itab);
...
...
create table itab_NNNNN (
check part_check check (a = '2016-11-29' and b >= 4000 and b < 5000)
) inherits (itab);

The BR trigger (on itab) procedure as follows:

CREATE OR REPLACE FUNCTION itab_ins_trig()
RETURNS TRIGGER AS $$
DECLARE
partno text;
BEGIN
SELECT to_char((NEW.a - '2016-11-29'::date) * 5 + NEW.b / 1000 + 1,
'fm00000') INTO partno;
EXECUTE 'INSERT INTO itab_' || partno || ' SELECT $1.*' USING NEW;
RETURN NULL;
END; $$ LANGUAGE plpgsql;

Note that the tuple-routing procedure above assumes a fixed-stride range
partitioning scheme (shown as tg-direct-map below). In other cases, the
simplest approach involves defining a if-else ladder, which I tried too
(shown as tg-if-else below), but reporting times only for up to 200
partitions at most (I'm sure there might be ways to be smarter there
somehow, but I didn't; the point here may only be to compare the new
tuple-routing code's overhead vs. trigger overhead in the traditional method).

# All times in seconds (on my modestly-powerful development VM)
#
# nrows = 10,000,000 generated using:
#
# INSERT INTO $tab
# SELECT '$last'::date - ((s.id % $maxsecs + 1)::bigint || 's')::interval,
# (random() * 5000)::int % 4999 + 1,
# case s.id % 10
# when 0 then 'a'
# when 1 then 'b'
# when 2 then 'c'
# ...
# when 9 then 'j'
# end
# FROM generate_series(1, $nrows) s(id)
# ORDER BY random();
#
# The first item in the select list is basically a date that won't fall
# outside the defined partitions.

Time for a plain table = 98.1 sec

#part parted tg-direct-map tg-if-else
===== ====== ============= ==========
10 114.3 1483.3 742.4
50 112.5 1476.6 2016.8
100 117.1 1498.4 5386.1
500 125.3 1475.5 --
1000 129.9 1474.4 --
5000 137.5 1491.4 --
10000 154.7 1480.9 --

Then for a 2-level partitioned table with each of the above partitions
partitioned by list (c), with 10 sub-partitions each as follows:

ptab_NNNNN_a for values in ('a');
ptab_NNNNN_b for values in ('b');
...
ptab_NNNNN_k for values in ('j');

I didn't include the times for inheritance table with a routing trigger in
this case, as it seems that the results would look something like the above:

Time for a plain table = 98.1 sec

#part (sub-)parted
===== ============
10 127.0
50 152.3
100 156.6
500 191.8
1000 187.3

Regarding tuple-mapping-required vs no-tuple-mapping-required, all cases
currently require tuple-mapping, because the decision is based on the
result of comparing parent and partition TupleDesc using
equalTupleDescs(), which fails so quickly because TupleDesc.tdtypeid are
not the same. Anyway, I simply commented out the tuple-mapping statement
in ExecInsert() to observe just slightly improved numbers as follows
(comparing with numbers in the table just above):

#part (sub-)parted
===== =================
10 113.9 (vs. 127.0)
100 135.7 (vs. 156.6)
500 182.1 (vs. 191.8)

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2016-11-29 11:24:57 Re: Parallel execution and prepared statements
Previous Message Amit Kapila 2016-11-29 11:19:55 Re: make default TABLESPACE belong to target table.