Re: swapcache-style cache?

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: swapcache-style cache?
Date: 2012-02-23 20:57:55
Lists: pgsql-hackers

On 02/22/2012 05:31 PM, james wrote:
> Has anyone considered managing a system like the DragonFLY swapcache for
> a DBMS like PostgreSQL?
> ie where the admin can assign drives with good random read behaviour
> (but perhaps also-ran random write) such as SSDs to provide a cache for
> blocks that were dirtied, with async write that hopefully writes them
> out before they are forcibly discarded.

We know that battery-backed write caches are extremely effective for
PostgreSQL writes.  I see most of these tiered storage ideas as acting
like a big one of those, which seems to hold for things like SAN storage
that have already adopted this sort of technique.  An SSD is quite large
relative to a typical BBWC.

There are a few reasons that doesn't always give the hoped-for win, though:

-Database writes have durability requirements that demand safe storage
more often than most other applications.  One of the reasons the
swapcache helps is that it aims to bundle writes into 64K chunks, which
is very SSD friendly.  The database may force writes out more often than
that.  The fact that all the DragonFly documentation uses Intel drives
that don't write reliably for its examples doesn't make me too
optimistic about that being a priority of the design.  The SSDs that
have safe, battery-backed write buffers >=64KB make that win go away.

-Ultimately all this data needs to make it out to real disk.  The funny
thing about caches is that no matter how big they are, you can easily
fill them up if doing something faster than the underlying storage can
handle.

-If you have something like a BBWC in front of traditional storage, as
well as a few gigabytes of operating system write buffering, that really
helps traditional storage a lot already.  Those two things do so much
write reordering that some of the random seek gap between spinning
disk and SSD shrinks.  And sequential throughput is usually not sped up
very much by an SSD, except at the high end (using lots of banks).
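
To put rough numbers on the first two points above, here is a minimal
sketch in Python.  It is a toy model, not a description of how swapcache
or PostgreSQL actually schedule I/O.  The function names, the 8K page
size, the sync intervals and the cache and throughput figures are all
assumptions chosen for illustration; only the 64K chunk size comes from
the discussion.

```python
def device_writes(write_sizes, chunk_bytes=64 * 1024, sync_every=None):
    """Count physical writes needed for a stream of logical writes.

    Toy model (hypothetical): logical writes accumulate in a buffer that
    is written out whenever a full chunk is available.  If sync_every is
    set, whatever is buffered is also forced out after every
    sync_every-th logical write, the way a durability requirement would.
    """
    writes = 0
    buffered = 0
    for i, size in enumerate(write_sizes, start=1):
        buffered += size
        # Full chunks can always go out as one device write each.
        while buffered >= chunk_bytes:
            writes += 1
            buffered -= chunk_bytes
        # A forced sync pushes out a partial chunk early.
        if sync_every is not None and i % sync_every == 0 and buffered > 0:
            writes += 1
            buffered = 0
    # Anything left over still has to reach storage eventually.
    if buffered > 0:
        writes += 1
    return writes


def seconds_until_full(cache_bytes, incoming_bps, drain_bps):
    """Time until a write cache fills, or None if it never does.

    The cache only grows while data arrives faster than the underlying
    storage drains it, so size buys time rather than throughput.
    """
    if incoming_bps <= drain_bps:
        return None
    return cache_bytes / (incoming_bps - drain_bps)


if __name__ == "__main__":
    pages = [8192] * 64           # 512K of 8K page writes (assumed workload)
    print(device_writes(pages))                 # bundled into 64K chunks
    print(device_writes(pages, sync_every=4))   # sync forced every 4 pages
    print(device_writes(pages, sync_every=1))   # sync forced on every page

    GB, MB = 1024 ** 3, 1024 ** 2
    # Assumed figures: 100GB cache, 300MB/s arriving, 100MB/s draining.
    print(seconds_until_full(100 * GB, 300 * MB, 100 * MB))
```

With these assumed figures, the same 512K goes out as 8 device writes
when fully bundled, 16 when a sync is forced every fourth page, and 64
when every page is synced.  The 100GB cache fills in 512 seconds, after
which writes proceed at the speed of the underlying storage again.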

One reaction to all this is to point out that it's sometimes easier to
add an SSD to a system than a BBWC.  That is true.  What benefits most
from this is the WAL writes, though, and since they're both sequential
and very high volume, they're really smacking into the worst case
scenario for SSD vs. spinning disk too.

> I'd been thinking that swapcache would help where the working set won't
> fit in RAM, also L2ARC on Solaris - but it seems to me that there is no
> reason not to allow the DBMS to manage the set-aside area itself where
> it is given either access to the raw device or to a pre-sized file on
> the device it can map in segments.

Well, you could argue that if we knew what to do with it, we'd have 
already built that logic into a superior usage of shared_buffers. 
Instead we punt a lot of this work toward the kernel, often usefully. 
Write cache reordering and read-ahead are the two biggest things storage 
does that we'd have to reinvent inside PostgreSQL if more direct disk 
I/O was attempted.

I don't think the idea of a swapcache is without merit; there are surely
some applications that will benefit from it.  It's got a lot of
potential as a way to absorb short-term bursts of write activity.  And 
there are some applications that could benefit from having a second tier 
of read cache, not as fast as RAM but larger and faster than real disk 
seeks.  In all of those potential win cases, though, I don't see why the 
OS couldn't just manage the whole thing for us.
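
As a rough illustration of the second-tier read cache case, the sketch
below computes an average read latency for a RAM / SSD / disk hierarchy.
The function name and every hit ratio and latency figure are
hypothetical, picked only to show the shape of the arithmetic; they are
not measurements.

```python
def effective_read_latency(ram_hit, ssd_hit, ram_us, ssd_us, disk_us):
    """Average read latency for a RAM -> SSD -> disk hierarchy.

    ram_hit is the fraction of reads served from RAM; ssd_hit is the
    fraction of the remaining RAM misses served from the SSD tier.
    Everything else falls through to disk.  Latencies share one unit.
    """
    for ratio in (ram_hit, ssd_hit):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("hit ratios must be between 0 and 1")
    ram_miss = 1.0 - ram_hit
    return (ram_hit * ram_us
            + ram_miss * ssd_hit * ssd_us
            + ram_miss * (1.0 - ssd_hit) * disk_us)


if __name__ == "__main__":
    # Assumed latencies in microseconds: RAM 1, SSD 100, disk seek 8000.
    without_tier = effective_read_latency(0.90, 0.0, 1, 100, 8000)
    with_tier = effective_read_latency(0.90, 0.8, 1, 100, 8000)
    print(without_tier, with_tier)
```

The gain depends almost entirely on how many RAM misses the middle tier
catches, which is the same regardless of whether the database or the OS
is the one managing that tier.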

Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support
