Re: wal-size limited to 16MB - Performance issue for subsequent backup

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: jesper(at)krogh(dot)cc, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: wal-size limited to 16MB - Performance issue for subsequent backup
Date: 2014-10-22 00:43:20
Message-ID: 5446FDA8.5050102@2ndquadrant.com
Lists: pgsql-hackers

On 10/21/2014 03:03 AM, jesper(at)krogh(dot)cc wrote:

> That being said, along comes the backup, scheduled once a day, which tries
> to read off these WAL files. To the backup they look like "an awful lot of
> small files": our backup utilizes a single thread to read those files and
> levels off at reading through 30-40MB/s from a 21-drive RAID 50 of
> rotating drives, which is quite bad. That causes a daily incremental run
> to take on the order of 24h. Differentials, picking up larger deltas, and
> fulls are even worse.

What's the backup system?

151952 files should be a trivial matter for any backup system. I'm very
surprised you're seeing that kind of run time for 2TB of WAL, and I think
it's worth investigating just why the backup system is behaving this way.

What does 'filefrag' say about the WAL segments? Are they generally a
single extent each? If not, how many extents?
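
Something along these lines would show it (a rough sketch, assuming the
WAL directory is pg_xlog under the data directory and that you can read
the files):

  cd $PGDATA/pg_xlog
  # per-file extent counts; "1 extent found" means the segment is contiguous
  for f in 0000*; do filefrag "$f"; done
  # summarise the distribution of extent counts across segments
  for f in 0000*; do filefrag "$f"; done | awk '{print $2}' | sort -n | uniq -c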

It'd be useful to know the kernel version, file system, RAID controller,
whether you use LVM, and other relevant details. What's your RAID
array's stripe size?
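
For reference, most of that can be gathered quickly (device names here
are only illustrative):

  uname -r            # kernel version
  cat /proc/mounts    # filesystems and mount options
  lsblk               # block device / LVM layout
  cat /proc/mdstat    # only meaningful if you're on Linux software RAID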

> A short test like:
> find . -type f -ctime -1 | tail -n 50 | xargs cat | pipebench > /dev/null
> confirms the backup speed to be roughly the same as seen by the backup
> software.
> Another test on the same volume, doing:
> find . -type f -ctime -1 | tail -n 50 | xargs cat > largefile
> and then, once the fs no longer caches the file, running:
> cat largefile | pipebench > /dev/null
> confirms that the disk subsystem can do much better (150-200MB/s) on
> larger files.

OK, so a larger contiguously allocated file looks like it's probably
read faster. That doesn't mean there's any guarantee that a big WAL
segment would be allocated contiguously if there are lots of other
writes interspersed, but the FS will try.

(What does 'filefrag' say about your 'largefile'?)
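
As an aside, rather than waiting for the cache to age out you can drop
the page cache explicitly before the re-read test. A sketch (needs root):

  sync
  echo 3 > /proc/sys/vm/drop_caches   # drops page cache plus dentries/inodes
  cat largefile | pipebench > /dev/null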

I'm wondering if you're having issues related to a RAID stripe size that
is close to, or bigger than, your WAL segment size, so that each segment
is only being read from one disk or a couple of disks. If that's the case
you're probably not getting ideal write performance either.
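
If you're not sure what the chunk/stripe size is, Linux software RAID
reports it directly; for a hardware controller you'd need the vendor's
CLI. The md device name below is just illustrative:

  mdadm --detail /dev/md0 | grep -i chunk
  cat /sys/block/md0/md/chunk_size    # in bytes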

That said, I don't see any particular reason why readahead wouldn't give
you similar results from multiple smaller WAL segments that are allocated
contiguously, and they usually would be if they're created one after the
other. What are your readahead settings? (There are often several at
different levels; what exists depends on how your storage is configured:
use of LVM, use of SW RAID, etc.)
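
For the block-device level, something like this shows the current values
(device names again illustrative):

  blockdev --getra /dev/sda                 # readahead in 512-byte sectors
  blockdev --getra /dev/md0                 # the md/LVM device has its own value
  cat /sys/block/sda/queue/read_ahead_kb    # the same value, expressed in kB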

In my opinion RAID 50 and RAID 5 are generally pretty poor options for a
database file system in performance terms anyway, especially for
transaction logs. RAID 50 is also not wonderfully durable for arrays of
many large disks at modern disk sizes, even with low block error rates
and relatively low disk failure rates. I personally tend to consider two
parity disks the minimum acceptable for arrays of more than four or five
disks. I'd certainly want continuous archiving or streaming replication
in place if I were running RAID 50 on a big array.
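
As a rough back-of-the-envelope (purely illustrative numbers: 21 disks
laid out as three 7-disk RAID 5 legs, 4TB drives, and a quoted
unrecoverable read error rate of 1 in 10^14 bits): rebuilding one failed
disk means reading the six surviving disks in its leg, about
6 * 4TB * 8 = ~1.9 * 10^14 bits, so you'd expect on the order of one or
two unrecoverable read errors during the rebuild - exactly when you can
least afford them.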

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
