Re: Pg_upgrade speed for many tables

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: Pg_upgrade speed for many tables
Date: 2012-11-19 20:11:26
Message-ID: CAMkU=1wiOoSt3gPvqyv_1zehCYfRyjTwVPFbZO0Y6q8ZDAd=tw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 14, 2012 at 3:55 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Mon, Nov 12, 2012 at 10:29:39AM -0800, Jeff Janes wrote:
>>
>> Is turning off synchronous_commit enough? What about turning off fsync?
>
> I did some testing with the attached patch on a magnetic disk with no
> BBU that turns off fsync;

With which file system? I wouldn't expect you to see a benefit with
ext2 or ext3; it seems to be a peculiarity of ext4 that it inhibits
"group fsync" of new file creations and instead does each one serially.
Whether it is worth applying a fix that is only needed for that one
file system, I don't know. The trade-offs are not all that clear to
me yet.
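
One way to check whether a given file system shows this behaviour is to time create-then-fsync on a batch of new files. Below is a minimal sketch in Python. The function name, the file count, and the 8 kB file size are illustrative choices of mine, not anything pg_upgrade or the server does, and the numbers it prints will depend entirely on the disk and file system under test.

```python
import os
import sys
import tempfile
import time


def time_new_file_fsyncs(directory, count=100, size=8192):
    """Create `count` new files in `directory`, fsync each one in turn,
    and return the per-file fsync times in seconds."""
    payload = b"\0" * size
    timings = []
    for i in range(count):
        path = os.path.join(directory, "probe_%d" % i)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # serial fsync of a just-created file
            timings.append(time.perf_counter() - start)
        finally:
            os.close(fd)
        os.unlink(path)  # clean up the probe file
    return timings


if __name__ == "__main__":
    # Pass a directory on the file system you want to probe;
    # falls back to the system temp directory.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    with tempfile.TemporaryDirectory(dir=target) as d:
        t = time_new_file_fsyncs(d)
        print("mean fsync: %.2f ms over %d files" % (1000 * sum(t) / len(t), len(t)))
```

Running it once against a directory on ext3 and once against ext4 on the same drive would give a rough per-file comparison.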

> I got these results
>
> tables  sync_com=off  fsync=off
>      1         15.90      13.51
>   1000         26.09      24.56
>   2000         33.41      31.20
>   4000         57.39      57.74
>   8000        102.84     116.28
>  16000        189.43     207.84
>
> It shows fsync=off faster below 4k tables, and slower above 4k. Not
> sure why this is the case, but perhaps the buffering of the fsync is
> actually faster than doing a no-op fsync.

synchronous_commit=off turns off not only the fsync at each commit,
but also the write-to-kernel at each commit; so it is not surprising
that it is faster at large scale. I would specify both
synchronous_commit=off and fsync=off.
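
For anyone who wants to try the combination by hand, here is a small sketch in Python that builds a pg_ctl command line passing both settings to the server through pg_ctl's -o option. The helper name and the data directory path are hypothetical, and this is not a claim about how pg_upgrade itself starts the new cluster.

```python
import shlex


def pg_ctl_start_args(datadir, settings):
    """Build a `pg_ctl start` argument list that hands each setting to the
    postgres server as `-c name=value` via pg_ctl's -o option."""
    server_opts = " ".join(
        "-c %s" % shlex.quote("%s=%s" % (name, value)) for name, value in settings
    )
    return ["pg_ctl", "-w", "-D", datadir, "-o", server_opts, "start"]


if __name__ == "__main__":
    args = pg_ctl_start_args(
        "/path/to/new/cluster",  # hypothetical data directory
        [("synchronous_commit", "off"), ("fsync", "off")],
    )
    print(args)
```

The resulting list can be handed to subprocess.run() as is, since the server options travel as a single -o argument.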

>> When I'm doing a pg_upgrade with thousands of tables, the shutdown
>> checkpoint after restoring the dump to the new cluster takes a very
>> long time, as the writer drains its operation table by opening and
>> individually fsync-ing thousands of files. This takes about 40 ms per
>> file, which I assume is a combination of a slow laptop disk drive, and
>> a strange deal with ext4 which makes fsyncing a recently created file
>> very slow. But even with faster hdd, this would still be a problem
>> if it works the same way, with every file needing 4 rotations to be
>> fsynced and this happens in serial.
>
> Is this with the current code that does synchronous_commit=off? If not,
> can you test to see if this is still a problem?

Yes, it is with synchronous_commit=off. (Or if it wasn't originally,
it is now, with the same result.)

Applying your fsync patch does solve the problem for me on ext4.
Having the new cluster be on ext3 rather than ext4 also solves the
problem, without the need for a patch; but it would be nice to be more
friendly to ext4, which is popular even though not recommended.

>>
>> Anyway, the reason I think turning fsync off might be reasonable is
>> that as soon as the new cluster is shut down, pg_upgrade starts
>> overwriting most of those just-fsynced files with other files from the
>> old cluster, and AFAICT makes no effort to fsync them. So until there
>> is a system-wide sync after the pg_upgrade finishes, your new cluster
>> is already in mortal danger anyway.
>
> pg_upgrade does a cluster shutdown before overwriting those files.

Right. So as far as the cluster is concerned, those files have been
fsynced. But then the next step is to go behind the cluster's back and
replace those fsynced files with different files, which may or may not
have been fsynced. This is what makes me think the new cluster is in
mortal danger. Not only have the new files perhaps not been fsynced,
but the cluster is not even aware of this fact, so you can start it
up, and then shut it down, and it still won't bother to fsync them,
because as far as it is concerned they already have been.

Given that, how much extra danger would be added by having the new
cluster schema restore run with fsync=off?

In any event, I think the documentation should caution that the
upgrade should not be deemed to be a success until after a system-wide
sync has been done. Even if we use the link rather than copy method,
are we sure that that is safe if the directories recording those links
have not been fsynced?
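
As a rough illustration of what flushing both the files and the directories that record them could look like from outside the server, here is a sketch in Python. It assumes a Linux-like system where a directory can be opened read-only and fsynced; fsync_tree is a hypothetical helper, not part of pg_upgrade, and the final os.sync() is the system-wide sync mentioned above.

```python
import os
import sys


def _fsync_path(path):
    """Open `path` read-only and fsync it (works for directories on Linux)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def fsync_tree(root):
    """fsync every regular file and every directory under `root`.
    Returns the number of fsync calls made."""
    synced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                _fsync_path(path)
                synced += 1
        # fsync the directory so its entries (including new hard links)
        # are durable as well.
        _fsync_path(dirpath)
        synced += 1
    return synced


if __name__ == "__main__" and len(sys.argv) > 1:
    print("fsynced %d paths" % fsync_tree(sys.argv[1]))
    os.sync()  # system-wide sync as a final belt-and-braces step
```

Walking the tree covers only what lives under the given directory; anything stored elsewhere, such as tablespaces outside the data directory, would need the same treatment or the system-wide sync.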

Cheers,

Jeff
