Making "COPY partitioned_table FROM" faster

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Making "COPY partitioned_table FROM" faster
Date: 2018-06-21 09:13:42
Message-ID: CAKJS1f93DeHN+9RrD9jYn0iF_o89w2B+U8-Ao5V1kd8Cf7oSGQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I was looking at the COPY FROM performance gap between bulk loads with
partitioned tables vs non-partitioned tables. There's quite a gap!
Almost twice as slow in my test.

It seems to be mostly down to lack of usage of heap_multi_insert() for
the partitioned table case, which I guess is because we can only do
that into a single heap. I didn't really see any reason not do when
the partition for this tuple is the same as the one for the last
tuple. Such cases may well be quite common, especially so in time
series data stored in RANGE partitioned tables.

I've implemented this in the attached. Performance is much better
when the rows are located in the same partition. I've also tested the
worst case; when the partition changes on each row. That's now
slightly slower. Although, if we're worried about that we could
probably make the insert-method adaptive, and only enable
multi-inserts if the partition remains the same for X consecutive
tuples then have it revert back to single inserts when the partition
changed X times after X tuples, where X is some number above 1, say
10? I've not done that. I'm not sure it's worthwhile.

This patch seems fairly simple, only touching copy.c. I think it's a
good candidate for July's 'fest.

src/backend/commands/copy.c | 225 +++++++++++++++++++++++++++++++++++---------
1 file changed, 182 insertions(+), 43 deletions(-)

Benchmarks below:

Setup:

-- non-partitioned (control)
CREATE TABLE partbench_ (date TIMESTAMP NOT NULL, i1 INT NOT NULL, i2
INT NOT NULL, i3 INT NOT NULL, i4 INT NOT NULL, i5 INT NOT NULL);

-- 10k parts
CREATE TABLE partbench (date TIMESTAMP NOT NULL, i1 INT NOT NULL, i2
INT NOT NULL, i3 INT NOT NULL, i4 INT NOT NULL, i5 INT NOT NULL)
PARTITION BY RANGE (date);

\o /dev/null
select 'CREATE TABLE partbench' || x::text || ' PARTITION OF partbench
FOR VALUES FROM (''' || '2017-03-06'::date + (x::text || '
hours')::interval || ''') TO (''' || '2017-03-06'::date + ((x+1)::text
|| ' hours')::interval || ''');'
from generate_Series(0,9999) x;
\gexec
\o

Test:

-- Time loading of 1GB of data.
\timing on
copy partbench_ from program $$perl ~/partbench.pl$$ delimiter '|';
truncate table partbench_;
copy partbench from program $$perl ~/partbench.pl$$ delimiter '|';
truncate table partbench;
copy partbench from program $$perl ~/partbench_alternate.pl$$ delimiter '|';
truncate table partbench;

Unpatched:

postgres=# copy partbench_ from program $$perl ~/partbench.pl$$ delimiter '|';
COPY 17825782
Time: 22669.017 ms (00:22.669)
postgres=# truncate table partbench_;
postgres=# copy partbench from program $$perl ~/partbench.pl$$ delimiter '|';
COPY 17825782
Time: 44095.884 ms (00:44.096)
postgres=# truncate table partbench;
postgres=# copy partbench from program $$perl
~/partbench_alternate.pl$$ delimiter '|';
COPY 17825782
Time: 45129.004 ms (00:45.129)
postgres=# truncate table partbench;

Patched:

postgres=# copy partbench_ from program $$perl ~/partbench.pl$$ delimiter '|';
COPY 17825782
Time: 22701.290 ms (00:22.701)
postgres=# truncate table partbench_;
postgres=# copy partbench from program $$perl ~/partbench.pl$$ delimiter '|';
COPY 17825782
Time: 27721.054 ms (00:27.721)
postgres=# truncate table partbench;
postgres=# copy partbench from program $$perl
~/partbench_alternate.pl$$ delimiter '|';
COPY 17825782
Time: 46151.844 ms (00:46.152)
postgres=# truncate table partbench;

partbench.pl:
for (my $i=0; $i < 8912891; $i++) {
print "2018-04-26 15:00:00|1|2|3|4|5\n";
print "2018-04-26 15:00:00|1|2|3|4|5\n";
}

partbench_alternate.pl:
for (my $i=0; $i < 8912891; $i++) {
print "2018-04-25 15:00:00|1|2|3|4|5\n";
print "2018-04-26 15:00:00|1|2|3|4|5\n";
}

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
v1-0001-Allow-multi-inserts-during-COPY-into-a-partitione.patch application/octet-stream 14.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-06-21 09:48:50 Re: bug with expression index on partition
Previous Message Arseny Sher 2018-06-21 08:31:17 Re: Fix slot's xmin advancement and subxact's lost snapshots in decoding.