Re: When does sequential performance matter in PG?

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Matthew Wakeling <matthew(at)flymine(dot)org>, henk de wit <henk53602(at)hotmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: When does sequential performance matter in PG?
Date: 2009-03-10 18:01:00
Message-ID: C5DBF8E9.3248%scott@richrelevance.com
Lists: pgsql-performance

On 3/10/09 6:28 AM, "Matthew Wakeling" <matthew(at)flymine(dot)org> wrote:

> On Tue, 10 Mar 2009, henk de wit wrote:
> > It is frequently said that for PostgreSQL the number 1 thing to pay
> > attention to when increasing performance is the amount of IOPS a storage
> > system is capable of. Now I wonder if there is any situation in which
> > sequential IO performance comes into play. E.g. perhaps during a
> > tablescan on a non-fragmented table, or during a backup or restore?
>
> Yes, up to a point. That point is when a single CPU can no longer handle
> the sequential transfer rate. Yes, there are some parallel restore
> possibilities which will get you further. Generally it only takes a few
> discs to max out a single CPU though.

This is not true if you have concurrent sequential scans. In that case an array can be tuned for total throughput under concurrent access. Single-thread sequential measurements are about as useful as single-thread random I/O measurements: not really a test of how the DB will behave, but a useful starting point for tuning.
I'm past the point where a single thread can keep up with the disk on a sequential scan. For the simplest select * queries, a single thread tops out at ~800MB/sec for me.
For queries with more complicated processing/filtering it's much less; 400MB/sec is usually a pretty good rate for a single thread.
However, our raw array does about 1200MB/sec, and it reaches roughly 75% of that with between 4 and 8 concurrent sequential scans. It took significant tuning and testing time to make sure this worked, and to balance it against random I/O requirements.
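
To put those figures side by side, here is a back-of-the-envelope sketch in Python. The function names are made up for illustration, and the "minimum scans" figure assumes each scan is limited only by its single-thread CPU rate, which is a simplification rather than something measured on the array:

```python
import math

def aggregate_scan_rate(raw_mb_s: float, efficiency: float) -> float:
    """Combined MB/sec delivered to all concurrent scans together."""
    return raw_mb_s * efficiency

def min_scans_to_saturate(aggregate_mb_s: float, per_thread_mb_s: float) -> int:
    """Lower bound on concurrent scans needed before CPU stops being the limit.

    Assumption: every scan runs at exactly per_thread_mb_s until the
    array's aggregate rate is reached.
    """
    return math.ceil(aggregate_mb_s / per_thread_mb_s)

RAW_ARRAY_MB_S = 1200.0   # raw array rate quoted above
EFFICIENCY = 0.75         # share of raw rate reached with concurrent scans
SIMPLE_QUERY_MB_S = 800.0     # single-thread rate, simple select *
COMPLEX_QUERY_MB_S = 400.0    # single-thread rate, heavier filtering

total = aggregate_scan_rate(RAW_ARRAY_MB_S, EFFICIENCY)
print(f"aggregate: {total:.0f} MB/sec")
print("min scans (simple): ", min_scans_to_saturate(total, SIMPLE_QUERY_MB_S))
print("min scans (complex):", min_scans_to_saturate(total, COMPLEX_QUERY_MB_S))
```

Under these assumptions one thread cannot consume what the array delivers, which is why the concurrent-scan numbers matter more than the single-stream one.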

Furthermore, higher sequential rates help your random IOPS when sequential access runs concurrently with random access. You can tune OS parameters (readahead in Linux, I/O scheduler types) to bias the array towards either random IOPS or sequential MB/sec throughput. Faster sequential disk access means a smaller share of time is spent on sequential I/O, which leaves more time for random I/O. It only goes so far, but it does help with mixed loads.
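
The time-share argument can be sketched as a simple model in Python. This is an illustration only: it assumes the array time-slices linearly between sequential and random work and ignores seek overhead, readahead and scheduler behaviour, and the 300MB/sec demand figure is hypothetical:

```python
def random_io_time_share(seq_demand_mb_s: float, seq_capacity_mb_s: float) -> float:
    """Fraction of array time left for random I/O.

    Assumption: serving seq_demand_mb_s of sequential reads occupies the
    array for demand / capacity of the time, and the rest is free.
    """
    if seq_capacity_mb_s <= 0:
        raise ValueError("sequential capacity must be positive")
    busy = min(seq_demand_mb_s / seq_capacity_mb_s, 1.0)
    return 1.0 - busy

SEQ_DEMAND_MB_S = 300.0  # hypothetical sequential load from table scans

for capacity in (600.0, 1200.0):
    share = random_io_time_share(SEQ_DEMAND_MB_S, capacity)
    print(f"{capacity:.0f} MB/sec array: {share:.0%} of time left for random I/O")
```

In this model, doubling the sequential rate from 600 to 1200MB/sec raises the time available for random I/O from 50% to 75% for the same sequential load. Real arrays will fall short of that because switching between access patterns is not free.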

Overall, it depends a lot on how important sequential scans are to your use case.
