Re: Excessive (and slow) fsync() within single transaction

From: Stephen Tyler <stephen(at)stephen-tyler(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Excessive (and slow) fsync() within single transaction
Date: 2009-12-10 10:21:36
Message-ID: 51549ea20912100221u1833c047j738b37830da9f8be@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Dec 9, 2009 at 12:57 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:

> You should turn on log_checkpoint in the postgresql.conf and confirm the
> slowdowns are happening around the same time as the checkpoint report gets
> written to the log files.

I have turned on log_checkpoints, and re-run the command. Checkpoints are
being written every 220 to 360 seconds. About 80% are "time" and 20%
"xlog". Here are a representative sample:

10/12/09 5:12:59 PM org.postgresql.postgres[445] LOG: checkpoint
starting: time
10/12/09 5:16:41 PM org.postgresql.postgres[445] LOG: checkpoint
complete: wrote 133795 buffers (25.5%); 0 transaction log file(s) added, 0
removed, 98 recycled; write=112.281 s, sync=108.809 s, total=221.166 s
10/12/09 5:17:32 PM org.postgresql.postgres[445] LOG: checkpoint
starting: xlog
10/12/09 5:23:45 PM org.postgresql.postgres[445] LOG: checkpoint
complete: wrote 292601 buffers (55.8%); 0 transaction log file(s) added, 0
removed, 76 recycled; write=162.919 s, sync=209.277 s, total=372.229 s
10/12/09 5:23:45 PM org.postgresql.postgres[445] LOG: checkpoint
starting: time
10/12/09 5:27:58 PM org.postgresql.postgres[445] LOG: checkpoint
complete: wrote 113330 buffers (21.6%); 0 transaction log file(s) added, 0
removed, 128 recycled; write=165.252 s, sync=87.588 s, total=253.285 s
10/12/09 5:28:45 PM org.postgresql.postgres[445] LOG: checkpoint
starting: time
10/12/09 5:34:11 PM org.postgresql.postgres[445] LOG: checkpoint
complete: wrote 206947 buffers (39.5%); 0 transaction log file(s) added, 0
removed, 65 recycled; write=152.953 s, sync=172.978 s, total=326.460 s

So a "typical" checkpoint is around 200K buffers (1.5GBytes, 40% of max),
taking around 150 seconds to write (10MByte/second, 1300 buffers/second),
and around 150 seconds to sync.

The system pauses/freezes are much more frequent. Roughly 20 to 60 seconds
long, interspersed with a brief pulse of activity. They align with the
lengthy fsync calls reported by DTrace. But there are many more fsync calls
being made. Most fsync calls return in a couple of milliseconds or
microseconds.

As I have previously mentioned, my system is using the default
"open_datasync" WAL sync method. The test_fsync utility reports a small
"open o_dsync, write" call returning in around 300 microseconds or less,
both on the SSD RAID 0 array and a 7200rpm SATA HDD. If that utility is to
be believed, "open o_dsync, write" is not reaching the disk platters by the
time it returns.

Why then does it take postgresql 200seconds to sync a checkpoint (or DTrace
report 20 to 60 second fsync calls)? The drives themselves have only
smallish 32MB buffers. I can write a 1.5GByte file (the size of the average
checkpoint) in only 10 seconds, if I do it from outside postgresql.

Stephen

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Dimitri Fontaine 2009-12-10 10:44:08 Re: [HACKERS] Installing PL/pgSQL by default
Previous Message dipti shah 2009-12-10 09:24:43 Re: Defining permissions for tables, schema etc..