Re: large regression for parallel COPY

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: large regression for parallel COPY
Date: 2016-03-30 20:10:53
Message-ID: 20160330201053.GC13305@awork2.anarazel.de
Lists: pgsql-hackers

On 2016-03-30 15:50:21 -0400, Robert Haas wrote:
> On Thu, Mar 10, 2016 at 8:29 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Allow to trigger kernel writeback after a configurable number of writes.
>
> While testing out Dilip Kumar's relation extension patch today, I
> discovered (with some help from Andres) that this causes nasty
> regressions when doing parallel COPY on hydra (3.2.6-3.fc16.ppc64,
> lousy disk subsystem). What I did was (1) run pgbench -i -s 100, (2)
> copy the results to a file, (3) truncate and drop the indexes on the
> original table, and (4) try copying in one or more copies of the data
> from the file. Typical command line:
>
> time pgbench -n -f f -t 1 -c 4 -j 4 && psql -c "select
> pg_size_pretty(pg_relation_size('pgbench_accounts'));" && time psql -c
> checkpoint && psql -c "truncate pgbench_accounts; checkpoint;"
>
> With default settings against
> 96f8373cad5d6066baeb7a1c5a88f6f5c9661974, pgbench takes 9 to 9.5
> minutes and the subsequent checkpoint takes 9 seconds. After setting
> backend_flush_after=0, bgwriter_flush_after=0, it takes 1 minute and 11
> seconds and the subsequent checkpoint takes
> 11 seconds. With a single copy of the data (that is, -c 1 -j 1 but
> otherwise as above), it takes 28-29 seconds with default settings and
> 26-27 seconds with backend_flush_after=0, bgwriter_flush_after=0. So
> the difference is rather small with a straight-up COPY, but with 4
> copies running at the same time, it's near enough to an order of
> magnitude.
>
> Andres reports that on his machine, non-zero *_flush_after settings
> make things faster, not slower, so apparently this is
> hardware-dependent or kernel-dependent. Nevertheless, it seems to me
> that we should try to get some broader testing here to see which
> experience is typical.

Indeed. On SSDs I see about a 25-35% gain, on HDDs about 5%. If I
increase backend_flush_after to 64 (the same value bgwriter_flush_after
uses), I do get about a 15% gain on HDDs as well.
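
For anyone wanting to try the comparison themselves, the two
configurations discussed here boil down to something like the following
(a rough sketch only; the values are just the ones mentioned in this
thread, not recommendations, and you need a build that already has the
*_flush_after GUCs):

    # "flushing disabled" case:
    psql -c "alter system set backend_flush_after = 0;"
    psql -c "alter system set bgwriter_flush_after = 0;"
    psql -c "select pg_reload_conf();"

    # larger flush distance for backends (64 blocks, i.e. 512kB):
    psql -c "alter system set backend_flush_after = 64;"
    psql -c "select pg_reload_conf();"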

I wonder if the default value of backend_flush_after is too small for
some scenarios. My reasoning was that backend_flush_after should have a
*lower* default value than e.g. the checkpointer's or bgwriter's, because
many backends write concurrently, which increases the total amount of
unflushed dirty data. That's true for OLTP write workloads, but less so
for bulk loads.
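
To make broader testing easier, here's a rough sketch of the recipe
Robert describes above (file names, paths and the dropped index name are
placeholders; adjust to the actual setup):

    # (1) generate the data
    pgbench -i -s 100
    # (2) dump pgbench_accounts to a flat file
    psql -c "\copy pgbench_accounts to '/tmp/accounts.data'"
    # (3) empty the table and drop its index
    psql -c "truncate pgbench_accounts;"
    psql -c "alter table pgbench_accounts drop constraint pgbench_accounts_pkey;"
    # pgbench script 'f': each client loads one copy of the data
    echo "COPY pgbench_accounts FROM '/tmp/accounts.data';" > f
    # (4) load four copies concurrently, then check size and checkpoint time
    time pgbench -n -f f -t 1 -c 4 -j 4
    psql -c "select pg_size_pretty(pg_relation_size('pgbench_accounts'));"
    time psql -c checkpoint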

Andres
