Re: Pg_upgrade speed for many tables

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: Pg_upgrade speed for many tables
Date: 2012-11-12 18:29:39
Message-ID: CAMkU=1yw1aweSBjxmg7oHzzjGV1wXFQKtaZ3PQZUfmc2WQ7grQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 5, 2012 at 12:08 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Magnus reported that a customer with a million tables was finding
> pg_upgrade slow. I had never considered many table to be a problem, but
> decided to test it. I created a database with 2k tables like this:
>
> CREATE TABLE test1990 (x SERIAL);
>
> Running the git version of pg_upgrade on that took 203 seconds. Using
> synchronous_commit=off dropped the time to 78 seconds. This was tested
> on magnetic disks with a write-through cache. (No change on an SSD with
> a super-capacitor.)
>
> I don't see anything unsafe about having pg_upgrade use
> synchronous_commit=off. I could set it just for the pg_dump reload, but
> it seems safe to just use it always. We don't write to the old cluster,
> and if pg_upgrade fails, you have to re-initdb the new cluster anyway.
>
> Patch attached. I think it should be applied to 9.2 as well.

Is turning off synchronous_commit enough? What about turning off fsync?

When I'm doing a pg_upgrade with thousands of tables, the shutdown
checkpoint after restoring the dump to the new cluster takes a very
long time, as the writer drains its operation table by opening and
individually fsync-ing thousands of files. This takes about 40 ms per
file, which I assume is a combination of slow lap-top disk drive, and
a strange deal with ext4 which makes fsyncing a recently created file
very slow. But even with faster hdd, this would still be a problem
if it works the same way, with every file needing 4 rotations to be
fsynced and this happens in serial.

Worse, the shutdown only waits for the default of 60 seconds for the
shutdown to take place before it throws an error and the entire
pg_upgrade gives up. It seems to me that either the -t setting should
be increased, or should be an option to pg_upgrade.

My work around was to invoke a system-wide "sync" a couple seconds
after the 'pg_ctl stop' is initiated. Flushing the files wholesale
seems to work to make the checkpoint writer rapidly find it has
nothing to do when it tries to flush them retail.

Anyway, the reason I think turning fsync off might be reasonable is
that as soon as the new cluster is shut down, pg_upgrade starts
overwriting most of those just-fsynced file with other files from the
old cluster, and AFAICT makes no effort to fsync them. So until there
is a system-wide sync after the pg_upgrade finishes, your new cluster
is already in mortal danger anyway.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2012-11-12 19:12:38 Re: Pg_upgrade speed for many tables
Previous Message Alvaro Herrera 2012-11-12 17:55:47 Re: [PATCH] Patch to compute Max LSN of Data Pages