Re: random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, "Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>, pgsql-performance(at)postgresql(dot)org
Subject: Re: random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1
Date: 2007-09-10 22:45:26
Message-ID: C30B1716.4236C%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Mark, Greg,

On 9/10/07 3:08 PM, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc> wrote:

> One suggestion: The plan is already in a tree. With some dependency analysis,
> I assume the tree could be executed in parallel (multiple threads or event
> triggered entry into a state machine), and I/O to fetch index pages or table
> pages could be scheduled in parallel. At this point, AIO becomes necessary to
> let the underlying system (and hardware with tagged queueing?) schedule which
> pages should be served best first.

Right now the pattern for index scan goes like this:

- Find qualifying TID in index
- Seek to TID location in relfile
- Acquire tuple from relfile, return

When the tuples are widely distributed in the table, as is the case with a
very selective predicate against an evenly distributed attribute on a
relation 2x larger than the I/O cache + bufcache, this pattern will result
in effectively "random I/O". In actual fact, the use of the in-memory
bitmap index will make the I/Os sequential, but sparse, which is another
version of "random" if the sequential I/Os are larger than the
gather/scatter I/O aggregation in the OS scheduler (say 1MB). This is a
very common circumstance for DSS / OLAP / DW workloads.

For plans that qualify with the above conditions, the executor will issue
blocking calls to lseek(), which will translate to a single disk actuator
moving to the needed location in seek_time, approximately 8ms. The
seek_time for a single query will not improve with the increase in number of
disks in an underlying RAID pool, so we can do about 1000/8 = 125 seeks per
second no matter what I/O subsystem we have.

If we implement AIO and allow for multiple pending I/Os used to prefetch
groups of qualifying tuples, basically a form of random readahead, we can
improve the throughput for any given query by taking advantage of multiple
disk actuators. This will work for RAID5, RAID10 and other disk pooling
mechanisms because the lseek() will be issued as parallel events. Note that
the same approach would also work to speed sequential access by overlapping
compute and I/O.

- Luke

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Luke Lonergan 2007-09-10 22:46:56 Re: random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1
Previous Message Mark Mielke 2007-09-10 22:22:06 Re: random_page_costs - are defaults of 4.0 realistic for SCSI RAID 1