Re: [PERFORMANCE] expanding to SAN: which portion best to move

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Willy-Bas Loos <willybas(at)gmail(dot)com>
Cc: Jesper Krogh <jesper(at)krogh(dot)cc>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORMANCE] expanding to SAN: which portion best to move
Date: 2011-05-13 19:04:43
Message-ID: BANLkTi=Oo4jRHpBY_74nJrcryr=BLDZR9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

On Wed, May 4, 2011 at 6:31 AM, Willy-Bas Loos <willybas(at)gmail(dot)com> wrote:
> I'm asking them for (real) benchmarks, thanks for the advice. (fio is not
> available for us now to do it myself, grmbl)
> It just occurred to me that it is not necessarily the case that reading the
> indexes causes a lot of random I/O (on the index itself).
> I mean, maybe the index is generally read sequentially and then, when
> retrieving the data, there is a lot of random I/O.
> if it's a long story, any tips for info about this (book or web site)?

If you don't do anything special, and if the query plan says "Index
Scan" rather than "Bitmap Index Scan", then both the index I/O and the
table I/O are likely to be fairly random. However there are a number
of cases in which you can expect the table I/O to be sequential:

- In some cases, you may happen to insert rows with an ordering that
matches the index. For example, if you have a table with not too many
updates and deletes, and an index on a serial column, then new rows
will have a higher value in that column than old rows, and will also
typically be physically after older rows in the file. Or you might be
inserting timestamped data from oldest to newest.
- If the planner chooses a Bitmap Index Scan, it effectively scans the
index to figure out which table blocks to read, and then reads those
table blocks in block number order, so that the I/O is sequential,
with skips.
- If you CLUSTER the table on a particular index, it will be
physically ordered to match the index's key ordering. As the table is
further modified the degree of clustering will gradually decline;
eventually you may wish to re-CLUSTER.

It's also worth keeping in mind that the index itself won't
necessarily be accessed in physically sequential order. The point of
the index is to emit the rows in key order, but if the table is
heavily updated, it won't necessarily be the case that a page
containing lower-valued keys physically precedes a page containing
higher-valued keys. I'm actually somewhat fuzzy on how this works,
and to what extent it's a problem in practice, but I am fairly sure it
can happen.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bob Pawley 2011-05-13 19:48:55 Re: Postgredac Dump
Previous Message Merlin Moncure 2011-05-13 18:50:40 Re: simple update query too long

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2011-05-13 19:20:37 Re: reducing random_page_cost from 4 to 2 to force index scan
Previous Message Kevin Grittner 2011-05-13 17:54:52 Re: reducing random_page_cost from 4 to 2 to force index scan