
Re: Fusion-io ioDrive

From:	"Scott Carey" <scott(at)richrelevance(dot)com>
To:	"Markus Wanner" <markus(at)bluegap(dot)ch>
Cc:	"Jonah H. Harris" <jonah(dot)harris(at)gmail(dot)com>, "Merlin Moncure" <mmoncure(at)gmail(dot)com>, "Jeffrey Baker" <jwbaker(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Fusion-io ioDrive
Date:	2008-07-08 16:38:39
Message-ID:	a1ec7d000807080938i33fc59b9x591f05cb40f7a25@mail.gmail.com
Lists:	pgsql-performance

Well, what does a revolution like this require of Postgres? That is the
question.

I have looked at the ioDrive, and it could significantly increase our DB
throughput compared with a RAID array.

Ideally, I would put a few key tables, the WAL, etc. on it. I'd also want all
the sort or hash overflow from work_mem to go to this device. Some of our
tables / indexes are heavily written to for short periods of time and then
written to infrequently later; these are partitioned by date. I would put
the fresh ones on such a device and then move them to the hard drives later.
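The date-partition aging described above could be scripted roughly as follows. This is a minimal Python sketch, not anything from this thread: the partition and index names, the `hdd_space` tablespace, the three-day "hot" window, and the `plan_partition_moves` helper are all hypothetical. It only generates statements, using PostgreSQL's `ALTER TABLE ... SET TABLESPACE` and `ALTER INDEX ... SET TABLESPACE`; index moves are emitted separately because moving a table does not move its indexes.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Hypothetical layout: partition name -> (partition date, names of its indexes).
Partitions = Dict[str, Tuple[date, List[str]]]


def plan_partition_moves(partitions: Partitions, today: date,
                         hot_days: int, slow_tablespace: str) -> List[str]:
    """Return SQL that moves partitions older than `hot_days` to `slow_tablespace`.

    Assumes all names are trusted, plain identifiers (no quoting is done).
    """
    cutoff = today - timedelta(days=hot_days)
    statements: List[str] = []
    # Oldest partitions first, so the coldest data leaves the fast device first.
    for name, (day, indexes) in sorted(partitions.items(), key=lambda kv: kv[1][0]):
        if day >= cutoff:
            continue  # still "fresh": leave it on the fast device
        statements.append(f"ALTER TABLE {name} SET TABLESPACE {slow_tablespace};")
        # ALTER TABLE ... SET TABLESPACE does not move indexes, so move each one.
        for index in indexes:
            statements.append(f"ALTER INDEX {index} SET TABLESPACE {slow_tablespace};")
    return statements


if __name__ == "__main__":
    parts: Partitions = {
        "events_20080701": (date(2008, 7, 1), ["events_20080701_ts_idx"]),
        "events_20080707": (date(2008, 7, 7), ["events_20080707_ts_idx"]),
    }
    for stmt in plan_partition_moves(parts, today=date(2008, 7, 8),
                                     hot_days=3, slow_tablespace="hdd_space"):
        print(stmt)
```

With a cutoff of July 5, only events_20080701 and its index are moved. Each move copies the underlying data files while holding a lock on the relation, so in practice these would be run during a quiet period.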

To take full advantage of this, we would need a few changes in Postgres:

#1 Per-tablespace optimizer tuning parameters (seq_page_cost,
random_page_cost, etc.). Arguably, this is already needed. Tablespaces on
such a solid-state device would have random and sequential access at equal
(low) cost. Any one-size-fits-all set of optimizer variables is bound to
cause performance issues when two tablespaces have dramatically different
performance profiles.
#2 Optimally, work_mem could be shrunk, and the optimizer would have to stop
preferring a sort + group_aggregate whenever it suspects that a hash_agg
would not fit in work_mem. With such a device, a disk-based hash_agg will
win over a sort (in memory or not) pretty much every time once the number
of rows to aggregate goes above a moderate threshold of a couple hundred
thousand or so.
In fact, I have several examples with 8.3.3 and a standard RAID array where
a hash_agg that spilled to disk (poor or purposely distorted statistics
cause this) was a lot faster than the sort that the optimizer wants to do
instead. Whatever mechanism calculates the cost of doing sorts or hashes
on disk will need to be tunable per tablespace.

I suppose both of the above may be one task -- I don't know enough about the
Postgres internals.

#3 Being able to move tables / indexes from one tablespace to another as
efficiently as possible.
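Items #1 and #2 can be pictured with a toy cost model in which each tablespace carries its own page costs. This is a hedged Python sketch, not PostgreSQL's planner: the `TablespaceCosts` class, the RAID and flash profiles, the `CPU_PER_ROW_OP` constant, and every formula below are assumptions chosen for illustration; only the field names are borrowed from PostgreSQL's `seq_page_cost` and `random_page_cost` settings.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TablespaceCosts:
    """Hypothetical per-tablespace I/O cost knobs (arbitrary units)."""
    seq_page_cost: float
    random_page_cost: float


# Assumed profiles: random reads cost 4x sequential on RAID, the same on flash.
RAID = TablespaceCosts(seq_page_cost=1.0, random_page_cost=4.0)
FLASH = TablespaceCosts(seq_page_cost=1.0, random_page_cost=1.0)
CPU_PER_ROW_OP = 0.0025  # assumed CPU cost of one comparison or hash probe


# Item #1: the scan choice depends on where the table lives.
def seq_scan_cost(table_pages: int, ts: TablespaceCosts) -> float:
    # Read every page of the table in order.
    return table_pages * ts.seq_page_cost


def index_scan_cost(matching_rows: int, ts: TablespaceCosts) -> float:
    # Pessimistic: one random heap-page fetch per matching row.
    return matching_rows * ts.random_page_cost


def choose_scan(table_pages: int, matching_rows: int, ts: TablespaceCosts) -> str:
    seq = seq_scan_cost(table_pages, ts)
    idx = index_scan_cost(matching_rows, ts)
    return "index" if idx < seq else "seq"


# Item #2: sort + group_aggregate vs. a hash_agg that spills to disk.
def sort_group_cost(rows: int, spill_pages: int, ts: TablespaceCosts) -> float:
    # ~n*log2(n) comparisons, plus writing and re-reading sorted runs
    # sequentially (a single merge pass is assumed).
    cpu = rows * math.log2(max(rows, 2)) * CPU_PER_ROW_OP
    io = 2 * spill_pages * ts.seq_page_cost
    return cpu + io


def spilled_hashagg_cost(rows: int, spill_pages: int, ts: TablespaceCosts) -> float:
    # One hash probe per row, plus writing and re-reading overflow
    # batches with (assumed) scattered, random I/O.
    cpu = rows * CPU_PER_ROW_OP
    io = 2 * spill_pages * ts.random_page_cost
    return cpu + io


def choose_agg(rows: int, spill_pages: int, ts: TablespaceCosts) -> str:
    sort = sort_group_cost(rows, spill_pages, ts)
    hashed = spilled_hashagg_cost(rows, spill_pages, ts)
    return "hash" if hashed < sort else "sort"


if __name__ == "__main__":
    for name, ts in (("RAID", RAID), ("flash", FLASH)):
        print(name,
              choose_scan(table_pages=10_000, matching_rows=5_000, ts=ts),
              choose_agg(rows=1_000_000, spill_pages=10_000, ts=ts))
```

With these made-up numbers the RAID profile picks a sequential scan and a sort, while the flash profile picks an index scan and the spilled hash aggregate, so a single set of cost settings would mislead the planner on one of the two devices.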

There are probably other enhancements that will help such a setup. These
were the first that came to mind.

On Tue, Jul 8, 2008 at 2:49 AM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:

> Hi,
>
> Jonah H. Harris wrote:
>
>> I'm not sure how those cards work, but my guess is that the CPU will
>> go 100% busy (with a near-zero I/O wait) on any sizable workload. In
>> this case, the current pgbench configuration being used is quite small
>> and probably won't resemble this.
>>
>
> I'm not sure how they work either, but why should they require more CPU
> cycles than any other PCIe SAS controller?
>
> I think they are taking a clever step by directly attaching the NAND chips
> to PCIe, instead of piping all the data through SAS or (S)ATA (and then
> through PCIe as well). And if the controller chip on the card isn't
> absolutely bogus, that certainly has the potential to reduce latency and
> improve throughput - compared to other SSDs.
>
> Or am I missing something?
>
> Regards
>
> Markus

In response to

Re: Fusion-io ioDrive at 2008-07-08 09:49:36 from Markus Wanner

Responses

Re: Fusion-io ioDrive at 2008-07-08 19:24:41 from Jeremy Harris
