Re: RAID arrays and performance

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Matthew <matthew(at)flymine(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: RAID arrays and performance
Date: 2007-12-04 16:11:28
Message-ID: 47557C30.10809@mark.mielke.cc
Lists: pgsql-performance

Matthew wrote:
> On Tue, 4 Dec 2007, Mark Mielke wrote:
>
>>> The larger the set of requests, the closer the performance will scale to
>>> the number of discs
>>>
>> This assumes that you can know which pages to fetch ahead of time -
>> which you do not except for sequential read of a single table.
>>
> There are circumstances where it may be hard to locate all the pages ahead
> of time - that's probably when you're doing a nested loop join. However,
> if you're looking up in an index and get a thousand row hits in the index,
> then there you go. Page locations to load.
>
Sure.
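The idea Matthew describes can be sketched concretely. Once an index scan has produced the list of heap pages to visit, the kernel can be told about all of them up front so the OS (and a RAID array underneath) can service the reads concurrently instead of one seek at a time. A minimal sketch, assuming Linux and an 8 kB PostgreSQL page size; `prefetch_pages` is a hypothetical helper, not anything PostgreSQL itself exposes:

```python
import os

def prefetch_pages(fd, page_numbers, page_size=8192):
    """Hint the kernel to read the given heap pages ahead of time.

    Sketch of the prefetch idea from the thread: with the page
    locations known in advance, POSIX_FADV_WILLNEED lets the OS queue
    many reads at once, so multiple spindles can seek in parallel.
    """
    for page in page_numbers:
        os.posix_fadvise(fd, page * page_size, page_size,
                         os.POSIX_FADV_WILLNEED)
```

This is only a hint; the kernel is free to ignore it, but on a multi-disk array it is what allows the request queue to stay deep enough for every spindle to be busy.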
>> Please show one of your query plans and how you as a person would design
>> which pages to request reads for.
>>
> How about the query that "cluster <skrald(at)amossen(dot)dk>" was trying to get
> to run faster a few days ago? Tom Lane wrote about it:
>
> | Wouldn't help, because the accesses to "questions" are not the problem.
> | The query's spending nearly all its time in the scan of "posts", and
> | I'm wondering why --- doesn't seem like it should take 6400msec to fetch
> | 646 rows, unless perhaps the data is just horribly misordered relative
> | to the index.
>
> Which is exactly what's going on. The disc is having to seek 646 times
> fetching a single row each time, and that takes 6400ms. He obviously has a
> standard 5,400 or 7,200 rpm drive with a seek time around 10ms.
>
Your proposal would not necessarily improve his case unless he also
purchased additional disks, at which point his execution time may be
different. More speculation. :-)

It seems reasonable - but still a guess.
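The arithmetic behind the guess is easy to check. Assuming one full seek per row (the rows being scattered randomly on the platter) and the thread's ~10 ms seek time, 646 rows land right on the observed 6,400 ms; the same model shows what extra spindles could buy *if* the page addresses were known in advance:

```python
SEEK_TIME_MS = 10  # rough seek time for a 5,400/7,200 rpm drive (thread's assumption)

def random_fetch_time_ms(rows, disks=1, seek_ms=SEEK_TIME_MS):
    """Time to fetch `rows` randomly placed heap rows, assuming one
    seek per row and perfect parallelism across `disks` spindles."""
    return rows * seek_ms / disks

# Tom Lane's example: 646 rows on one disk -> roughly the observed 6,400 ms.
print(random_fetch_time_ms(646))            # 6460.0 ms
# Hypothetical: the same fetch spread across four spindles.
print(random_fetch_time_ms(646, disks=4))   # 1615.0 ms
```

The "perfect parallelism" assumption is the optimistic end of the scale; real arrays fall somewhere below it.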

> Or on a similar vein, fill a table with completely random values, say ten
> million rows with a column containing integer values ranging from zero to
> ten thousand. Create an index on that column, analyse it. Then pick a
> number between zero and ten thousand, and
>
> "SELECT * FROM table WHERE that_column = the_number_you_picked
This isn't a real use case. Optimizing for the worst-case scenario is
not always valuable.
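For what it's worth, Matthew's synthetic worst case is easy to put numbers on. With ten million rows spread uniformly over ten thousand distinct values, a random pick matches about a thousand rows; if each sits on its own randomly located heap page, a single disk pays one seek per row. A back-of-the-envelope sketch using the same ~10 ms seek figure as above:

```python
TOTAL_ROWS = 10_000_000
DISTINCT_VALUES = 10_000   # column values range over 0..10,000
SEEK_MS = 10               # seek time assumed earlier in the thread

matches = TOTAL_ROWS / DISTINCT_VALUES   # ~1,000 rows per picked value
# Worst case: every matching row on a different, randomly placed heap
# page, so the index scan pays one full seek per row on one disk.
print(matches * SEEK_MS / 1000)          # ~10 seconds
```

Which is exactly why it is a worst case: it maximizes seeks per useful row, and whether that pattern is worth optimizing for is the point of disagreement here.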

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>
