Re: Speeding up INSERTs and UPDATEs to partitioned tables

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Speeding up INSERTs and UPDATEs to partitioned tables
Date: 2018-07-26 16:19:34
Message-ID: CAKJS1f9ZA=1iJ6HRPkocNQ1kw5T9pcFECtViuVk3Vn5D-w1fJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 18 July 2018 at 20:29, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Let me know what you think of the code in the updated patch.

Thanks for sending the updated patch.

I looked over it tonight and made a number of changes:

1) Got rid of PARTITION_ROUTING_MAXSIZE. The code using this was
useless since the int would have wrapped long before it reached
UINT_MAX. There's no shortage of other code doubling the size of an
array by multiplying it by 2 unconditionally without considering
overflowing an int. Unsure why you considered this more risky.
2) Fixed a series of bugs regarding the size of the arrays in
PartitionTupleRouting. The map arrays and the partitions array could
differ in size despite your comment that claimed
child_parent_tupconv_maps was the same size as 'partitions' when
non-NULL. The map arrays being a different size than the partitions
array caused the following two cases to segfault. I've included two
cases as it was two seperate bugs that caused them.

-- case 1
drop table listp;
create table listp (a int, b int) partition by list (a);
create table listp1 partition of listp for values in (1);
create table listp2 partition of listp for values in (2);
create table listp3 partition of listp for values in (3);
create table listp4 partition of listp for values in (4);
create table listp5 partition of listp for values in (5);
create table listp6 partition of listp for values in (6);
create table listp7 partition of listp for values in (7);
create table listp8 partition of listp for values in (8);
create table listp9 (b int, a int);

alter table listp attach partition listp9 for values in(9);

insert into listp select generate_series(1,9);

-- case 2
drop table listp;
create table listp (a int, b int) partition by list (a);
create table listp1 (b int, a int);

alter table listp attach partition listp1 for values in(1);

create table listp1 partition of listp for values in (1);
create table listp2 partition of listp for values in (2);
create table listp3 partition of listp for values in (3);
create table listp4 partition of listp for values in (4);
create table listp5 partition of listp for values in (5);
create table listp6 partition of listp for values in (6);
create table listp7 partition of listp for values in (7);
create table listp8 partition of listp for values in (8);
create table listp9 partition of listp for values in (9);

insert into listp select generate_series(1,9);

3) Got rid of ExecUseUpdateResultRelForRouting. I started to change
this to remove references to UPDATE in order to make it more friendly
towards other possible future node types that it would get used for
(aka MERGE). In the end, I found that performance could regress when
in cases like:

drop table listp;
create table listp (a int) partition by list(a);
\o /dev/null
\timing off
select 'create table listp'||x::Text||' partition of listp for values
in('||x::Text||');' from generate_series(1,1000) x;
\gexec
\o
insert into listp select x from generate_series(1,999) x;
\timing on
update listp set a = a+1;

It's true that UPDATEs with a large number of subplans performance is
quite terrible today in the planner, but this code made the
performance of planning+execution a bit worse. If we get around to
fixing the inheritance planner then I think
ExecUseUpdateResultRelForRouting() could easily appear in profiles.

I ended up rewriting it to just get called once and build a hash table
by Oid storing a ResultRelInfo pointer. This also gets rid of the
slow nested loop in the cleanup operation inside
ExecCleanupTupleRouting().

4) Did some tuning work in ExecFindPartition() getting rid of a
redundant check after the loop completion. Also added some likely()
and unlikely() decorations around some conditions.

5) Updated some newly out-dated comments since your patch in execPartition.h.

6) Replaced the palloc0() in ExecSetupPartitionTupleRouting() with a
palloc() updating the few fields that were not initialised. This might
save a few TPS (at least once we get rid of the all partition locking)
in the single-row INSERT case, but I've not tested the performance of
this yet.

7) Also moved and edited some comments above
ExecSetupPartitionTupleRouting() that I felt explained a little too
much about some internal implementation details.

One thing that I thought of, but didn't do was just having
ExecFindPartition() return the ResultRelInfo. I think it would be much
nicer in both call sites to not have to check the ->partitions array
to get that. The copy.c call site would need a few modifications
around the detection code to see if the partition has changed, but it
all looks quite possible to change. I left this for now as I have
another patch which touches all that code that I feel is closer to
commit than this patch is.

I've attached a delta of the changes I made since your v2 delta and
also a complete updated patch.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
v2-0001-Speed-up-INSERT-and-UPDATE-on-partitioned-tables.patch application/octet-stream 54.7 KB
v2_insert_speedups_delta.patch application/octet-stream 33.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-07-26 16:28:23 Re: negative bitmapset member not allowed Error with partition pruning
Previous Message Bossart, Nathan 2018-07-26 15:40:07 Re: BUG #15182: Canceling authentication due to timeout aka Denial of Service Attack