Re: SSD + RAID

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Laszlo Nagy <gandalf(at)shopzeus(dot)com>
Cc: Ivan Voras <ivoras(at)freebsd(dot)org>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: SSD + RAID
Date: 2009-11-19 04:22:29
Message-ID: C72A0805.175E2%scott@richrelevance.com


On 11/15/09 12:46 AM, "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au> wrote:
> Possible fixes for this are:
>
> - Don't let the drive lie about cache flush operations, ie disable write
> buffering.
>
> - Give Pg some way to find out, from the drive, when particular write
> operations have actually hit disk. AFAIK there's no such mechanism at
> present, and I don't think the drives are even capable of reporting this
> data. If they were, Pg would have to be capable of applying entries from
> the WAL "sparsely" to account for the way the drive's write cache
> commits changes out-of-order, and Pg would have to maintain a map of
> committed / uncommitted WAL records. Pg would need another map of
> tablespace blocks to WAL records to know, when a drive write cache
> commit notice came in, what record in what WAL archive was affected.
> It'd also require Pg to keep WAL archives for unbounded and possibly
> long periods of time, making disk space management for WAL much harder.
> So - "not easy" is a bit of an understatement here.

3: Have PG wait a half second (configurable) after the checkpoint fsync()
completes before deleting/overwriting any WAL segments. This would be a
trivial "feature" to add to a postgres release, I think. Actually, it
already exists!

Turn on WAL archiving, and have the archive_command script sleep() before it
returns; a WAL segment can't be recycled until it has been archived.
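
For illustration, here is a minimal sketch of such an archive script (the
paths, the delay, and the archive_wal.py name are placeholders I made up,
not anything that ships with postgres):

    #!/usr/bin/env python
    # Hypothetical archive_command helper: copy the finished WAL segment,
    # then sleep so postgres won't recycle the segment until the delay has
    # passed (and, with luck, the drive's write cache has drained).
    #
    # Assumed postgresql.conf settings (option names are standard, values made up):
    #   archive_mode    = on
    #   archive_command = '/usr/local/bin/archive_wal.py %p %f'
    import shutil
    import sys
    import time

    ARCHIVE_DIR = "/var/lib/pgsql/wal_archive"   # placeholder destination
    DELAY_SECONDS = 0.5                          # the "half second (configurable)"

    def main():
        wal_path, wal_name = sys.argv[1], sys.argv[2]   # %p and %f from postgres
        shutil.copy(wal_path, "%s/%s" % (ARCHIVE_DIR, wal_name))
        time.sleep(DELAY_SECONDS)   # segment is only recycled after we return 0
        return 0

    if __name__ == "__main__":
        sys.exit(main())

Whether a fixed sleep is actually long enough for a given drive's cache is
anyone's guess, of course.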

BTW, the information I have seen indicates that the write cache is 256K on
the Intel drives; the 32MB/64MB of other RAM is working memory for the drive's
block mapping / wear leveling algorithms (tracking 160GB of 4k blocks takes
space).

4: Yet another solution: The drives DO adhere to write barriers properly.
A filesystem that uses those barriers as part of fsync() would be fine as well.
So XFS without LVM or MD (or with newer versions of those that don't ignore
barriers) would work too.
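
A quick way to sanity-check that on a given setup (purely a sketch with
made-up paths and sizes; dedicated tools do this better): time a series of
small fsync()ed writes on the filesystem in question. If the sync rate is
implausibly high for the hardware, a flush or barrier is being dropped
somewhere in the stack.

    # Rough probe of fsync() behavior on a filesystem -- illustrative only.
    import os
    import time

    TEST_FILE = "/mnt/testfs/fsync_probe"   # placeholder path on the fs under test
    ITERATIONS = 1000

    fd = os.open(TEST_FILE, os.O_CREAT | os.O_WRONLY, 0o600)
    start = time.time()
    for i in range(ITERATIONS):
        os.lseek(fd, 0, os.SEEK_SET)
        os.write(fd, b"x" * 8192)   # one 8kB write, roughly WAL-page sized
        os.fsync(fd)                # should not return until the data is stable
    elapsed = time.time() - start
    os.close(fd)
    os.unlink(TEST_FILE)
    print("%d fsyncs in %.2fs -> %.0f syncs/sec" % (ITERATIONS, elapsed, ITERATIONS / elapsed))

Thousands of syncs per second from a single SATA spindle, for example, would
mean the data isn't actually reaching stable storage on each fsync().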

So, I think it may not be necessary to turn off write caching for the
non-xlog disks.

>
> You still need to turn off write caching.
>
> --
> Craig Ringer
>
