Table with large number of int columns, very slow COPY FROM

From: Alex Tokarev <dwalin(at)dwalin(dot)ru>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Table with large number of int columns, very slow COPY FROM
Date: 2017-12-08 04:21:45
Message-ID: D64F5359.2993D%dwalin@dwalin.ru
Lists: pgsql-hackers pgsql-performance

Hi,

I have a set of tables with a fairly large number of columns, mostly int with
a few bigints and short char/varchar columns. I've noticed that Postgres is
pretty slow at inserting data into such a table. I tried to tune every
possible setting: using unlogged tables, increasing shared_buffers, etc.; I
even placed the db cluster on ramfs and turned fsync off. The results are
pretty much the same, with the exception of unlogged tables, which improve
performance just a little bit.

I have made a minimal reproducible test case consisting of a table with
848 columns, inserting a partial dataset of 100,000 rows covering 240 columns.
On my dev VM the COPY FROM operation takes just shy of 3 seconds to complete,
which is entirely unexpected for such a small dataset.

Here's a tarball with the test schema and data:
http://nohuhu.org/copy_perf.tar.bz2; it's 338k compressed but expands to
~50mb. Here's the result of a profiling session with perf:
https://pastebin.com/pjv7JqxD
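For readers who would rather regenerate the test case than download the tarball, a small script along these lines can produce an equivalent schema and data file. This is a sketch, not the original generator: the table and column names (wide_tbl, c1..c848) and the value ranges are my own assumptions, chosen only to match the shape described above (848 int columns, COPY covering the first 240).

```python
# Sketch of a generator for the wide-table COPY test case described above.
# All names here (wide_tbl, c1..c848, the output file names) are hypothetical;
# only the column counts match the original report.
import random

NUM_COLS = 848    # total int columns in the table
DATA_COLS = 240   # columns actually populated by COPY FROM
NUM_ROWS = 1000   # kept small here; the original test used 100,000 rows

# DDL: an unlogged table with 848 int columns.
ddl = (
    "CREATE UNLOGGED TABLE wide_tbl (\n    "
    + ",\n    ".join(f"c{i} int" for i in range(1, NUM_COLS + 1))
    + "\n);\n"
)

# Data: tab-separated rows in COPY text format, first 240 columns only.
rows = "\n".join(
    "\t".join(str(random.randint(0, 1_000_000)) for _ in range(DATA_COLS))
    for _ in range(NUM_ROWS)
)

with open("wide_tbl.sql", "w") as f:
    f.write(ddl)
with open("wide_tbl.data", "w") as f:
    f.write(rows + "\n")
```

The load itself can then be timed from psql with \timing, copying into an explicit column list (c1 through c240) so the remaining columns default to NULL.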

--
Regards,
Alex.
