Re: [GENERAL] [PERFORMANCE] expanding to SAN: which portion best to move

From: Robert Klemme <shortcutter(at)googlemail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [GENERAL] [PERFORMANCE] expanding to SAN: which portion best to move
Date: 2011-06-10 10:02:50
Message-ID: BANLkTi=NwvR6jH-gHFyP8fbATp0GnYc7WA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

On Thu, Jun 9, 2011 at 7:44 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:

> **
> On 06/09/2011 07:43 AM, Willy-Bas Loos wrote:
>
> Well, after reading your article i have been reading some materail about it
> on the internet, stating that separating indexes from data for performance
> benefits is a myth.
> I found your comment " So then a single query will only ever access one of
> both at a time." very smart (no sarcasm there).
> I also found a thread<http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:901906930328>on AskTom that said mainly "the goal is to achieve even io." (that makes
> absolute sense)
>
>
> The idea that separating indexes and tables from one another via a
> tablespace is inherently good is a myth. Particularly nowadays, where the
> fastest part of a drive is nearly twice as fast as the slowest one in
> sequential transfers, and the ratio between sequential and random I/O is
> huge. Trying to get clever about breaking out a tablespace is unlikely to
> outsmart what you'd get if you just let the OS deal with that stuff.
>
> What is true is that when you have multiple tiers of storage speeds
> available, allocating the indexes and tables among them optimally is both
> difficult and potentially worthwhile. A customer of mine has two drive
> arrays, one of which is about 50% faster than the other; second was added as
> expansion once the first filled. Nowadays, both are 75% full, and I/O on
> each has to be carefully balanced. Making sure the heavily hit indexes are
> on the fast array, and that less critical things are not placed there, is
> the difference between that site staying up or going down.
>
> The hidden surprise in this problem for most people is the day they
> discover that *the* most popular indexes, the ones they figured had to go on
> the fastest storage around, are actually sitting in RAM all the time
> anyway. It's always fun and sad at the same time to watch someone spend a
> fortune on some small expensive storage solution, move their most
> performance critical data to it, and discover nothing changed. Some days
> that works great; others, it's no faster all, because that data was already
> in memory.
> <http://www.2ndQuadrant.com/books>
>

Adding a few more thoughts to this: it is important to understand the very
different nature of read and write IO. While write IO usually can be done
concurrently to different IO channels (devices) for read IO there are
typically dependencies, e.g. you need to read the index before you know
which part of the table you need to read. Thus both cannot be done
concurrently for a *single select* unless the whole query is partitioned and
executed in parallel (Oracle can do that for example). Even then each
parallel executor has this dependency between index and table data. That's
the same situation as with concurrent queries into the same table and
index. There are of course exceptions, e.g. during a sequential table scan
you can know beforehand which blocks need to be read next and fetch them
wile processing the current block(s). The buffering strategy also plays an
important role here.

Bottom line: one needs to look at each case individually, do the math and
ideally also measurements.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Willy-Bas Loos 2011-06-10 12:15:38 [PERFORM] change sample size for statistics
Previous Message Vikram A 2011-06-10 09:22:25 PGP encrypt/decrypt - Prereqistes

Browse pgsql-performance by date

  From Date Subject
Next Message tv 2011-06-10 11:22:58 Re: strange query plan with LIMIT
Previous Message Marti Raudsepp 2011-06-10 09:00:59 Re: 100% CPU Utilization when we run queries.