Re: Scaling SELECT:s with the number of disks on a stripe

From: Andrew - Supernews <andrew+nonews(at)supernews(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Scaling SELECT:s with the number of disks on a stripe
Date: 2007-03-31 22:55:48
Message-ID: slrnf0tpnk.2i67.andrew+nonews@atlantis.supernews.net
Lists: pgsql-performance

On 2007-03-30, Peter Schuller <peter(dot)schuller(at)infidyne(dot)com> wrote:
[...]
> Other than absolute performance, an important goal is to be able to
> scale fairly linearly with the number of underlying disk drives. We
> are fully willing to take a disk seek per item selected, as long as it
> scales.
>
> To this end I have been doing some benchmarking to see whether the
> plan is going to be feasible. On a 12-disk hardware stripe, insertion
> performance does scale somewhat with concurrent inserters. However, I
> am seeing surprising effects with SELECTs: a single selector
> generates the same amount of disk activity as two concurrent selectors
> (I was expecting roughly twice as much).
>
> The query is simple:
>
> SELECT * FROM test WHERE value = 'xxx' LIMIT 1000;

I tested this on a 14-way software RAID 10 on FreeBSD, using pg 8.1.6, and
couldn't reproduce anything like it. With one client I get about 200 disk
requests per second, scaling almost exactly linearly for the first 5 or so
clients, as expected. At 14 clients it was down to about 150 reqs/sec per
client, but the total throughput continued to increase with additional
concurrency up to about 60 clients, giving about 3600 reqs/sec (260 per
disk, which is about right for 10krpm SCSI disks under highly concurrent
random loads).

> So my first question is - why am I not seeing this scaling?

A good question. Have you tried testing the disks directly? e.g. create
some huge files, and run a few concurrent random readers on them? That
would test the array and the filesystem without involving postgres.
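
Something like the following would do as a first cut (a rough sketch in
Python, not a real benchmark tool; the file path, block size, client count
and duration are all placeholders to adjust for your setup):

#!/usr/bin/env python
# Rough concurrent random-read test: each worker seeks to a random
# block-aligned offset in a large pre-created file and reads 8 kB,
# and we report the total number of reads completed per second.
# Run it with CLIENTS=1, 2, 4, ... and watch how the total scales.
import os, random, threading, time

PATH = "/data/bigfile"   # placeholder: a file much larger than RAM
BLOCK = 8192             # matches pg's page size
CLIENTS = 4              # vary this between runs
DURATION = 30            # seconds

counts = [0] * CLIENTS

def reader(i):
    nblocks = os.path.getsize(PATH) // BLOCK
    fd = os.open(PATH, os.O_RDONLY)
    deadline = time.time() + DURATION
    while time.time() < deadline:
        os.lseek(fd, random.randrange(nblocks) * BLOCK, os.SEEK_SET)
        os.read(fd, BLOCK)
        counts[i] += 1
    os.close(fd)

threads = [threading.Thread(target=reader, args=(i,)) for i in range(CLIENTS)]
for t in threads: t.start()
for t in threads: t.join()
print("%d clients: %.0f reqs/sec total" % (CLIENTS, sum(counts) / float(DURATION)))

If the total here scales with the number of readers but postgres doesn't,
the problem is above the filesystem; if it doesn't scale either, look at
the array or the OS.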

> Secondly, I am seeing a query plan switch after a certain
> threshold. Observe:
[snip index scan changing to bitmapscan]

This is entirely expected. With the larger row count, it is more likely
(or so the planner estimates) that rows will need to be fetched from
adjacent or at least nearby blocks, thus a plan which fetches rows in
physical table order rather than index order would be expected to be
superior. The planner takes into account the estimated startup cost and
per-row cost when planning LIMIT queries; therefore it is no surprise
that for larger limits, it switches to a plan with a higher startup cost
but lower per-row cost.
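
To see why the switch happens where it does, here's a toy illustration
(Python, with invented cost numbers -- not taken from your plans): the
planner charges a LIMIT plan its startup cost plus only the fraction of
the run cost corresponding to the rows it expects to fetch, so a plan
with higher startup but lower per-row cost wins once the limit is large
enough.

# Toy model of LIMIT plan costing with invented numbers. The planner
# charges a LIMIT plan: startup + (limit / rows) * (total - startup),
# i.e. only the fraction of the run cost it expects to actually incur.
def limit_cost(startup, total, rows, limit):
    frac = min(1.0, float(limit) / rows)
    return startup + frac * (total - startup)

ROWS = 10000                     # estimated matching rows
index_scan  = (0.0,   40000.0)   # (startup, total): cheap start, dear per row
bitmap_scan = (500.0, 15000.0)   # pays to build the bitmap, cheap per row

for limit in (10, 100, 1000, 10000):
    ic = limit_cost(index_scan[0],  index_scan[1],  ROWS, limit)
    bc = limit_cost(bitmap_scan[0], bitmap_scan[1], ROWS, limit)
    print("LIMIT %5d: index %7.0f, bitmap %7.0f -> planner picks %s"
          % (limit, ic, bc, "index" if ic < bc else "bitmap"))

With those made-up numbers the crossover lands between LIMIT 100 and
LIMIT 1000, which is the same shape of behaviour you're describing.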

> The interesting part is that the latter query is entirely CPU bound
> (no disk I/O at all) for an extended period of time before even
> beginning to read data from disk.

Most likely your index is small enough that large parts of it will be
cached in RAM, so that the scan of the index to build the bitmap does
not need to hit the disk much if at all.
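
You can check that directly by comparing the index's on-disk size to your
RAM and shared_buffers. For example (Python with psycopg2 as one way to
run it; the dbname and the index name test_value_idx are guesses --
substitute your own):

# Compare the index's on-disk size to available memory; if it is much
# smaller than RAM it will be effectively fully cached after a warm-up.
import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()
cur.execute("SELECT pg_relation_size('test_value_idx')")
size = cur.fetchone()[0]
print("index size: %.1f MB" % (size / 1048576.0))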

> And when it *does* start performing
> disk I/O, the performance is about the same as for the other case. In
> other words, the change in query plan seems to do nothing but add
> overhead.

This is going to depend quite a bit on the physical ordering of the data
relative to the indexed values.

> What is the bitmap heap scan supposed to be doing that would increase
> performance above a "seek once per matching row" plan? I haven't been
> able to Google my way to what the intended benefit is of a heap scan
> vs. a plain index scan.

The bitmap scan visits the heap in heap order, rather than index order,
thus enabling it to take advantage of prefetch and other sequential-read
optimizations (in the underlying OS and disk subsystem, rather than in
pg itself).
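
In miniature (a toy model, nothing like pg's actual implementation):
given the same matching TIDs, the two plans touch heap blocks in very
different orders.

# Toy model of heap visit order; TIDs are (block, offset) pairs in
# index-key order, as an index scan would emit them.
tids = [(812, 3), (14, 1), (812, 7), (205, 2), (14, 9), (507, 4)]

# Plain index scan: heap pages are visited in index order, seeking
# back and forth, and blocks 812 and 14 are each fetched twice.
index_order = [blk for blk, _ in tids]

# Bitmap scan: gather all TIDs first, then visit each heap page once,
# in ascending block order -- one forward sweep that the OS and disk
# can prefetch and reorder effectively.
bitmap_order = sorted(set(blk for blk, _ in tids))

print("index scan visits:  %s" % index_order)   # [812, 14, 812, 205, 14, 507]
print("bitmap scan visits: %s" % bitmap_order)  # [14, 205, 507, 812]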

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
