Quick Links

Re: Scaling SELECT:s with the number of disks on a stripe

From:	"Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>
To:	"Peter Schuller" <peter(dot)schuller(at)infidyne(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Scaling SELECT:s with the number of disks on a stripe
Date:	2007-03-30 07:07:53
Message-ID:	CA896D7906BF224F8A6D74A1B7E54AB301750A12@JENMAIL01.ad.intershop.net
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hello Peter,

If you are dealing with timed data or similar, you may consider to
partition your table(s).

In order to deal with large data, I've built a "logical" partition
system,
whereas the target partition is defined by the date of my data (the date
is part of the filenames that I import...).

Instead of using the Postgres partitioning framework, I keep the tables
boundaries within a refererence table.
Then I've built a function that takes the different query parameters as
argument (column list, where clause...).
This functions retrieve the list of tables to query from my reference
table and build the final query, binding
the different subqueries from each partition with "UNION ALL".
It also requires an additional reference table that describes the table
columns (data type, behaviour , e.g. groupable,summable...)

This allowed me to replace many "delete" with "drop table" statements,
whis is probably the main advantage of the solution.

The biggest issue was the implementation time ;-) but I'm really happy
with the resulting performances.

HTH,

Marc

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Peter
Schuller
Sent: Friday, March 30, 2007 7:17 AM
To: pgsql-performance(at)postgresql(dot)org
Subject: [PERFORM] Scaling SELECT:s with the number of disks on a stripe

Hello,

I am looking to use PostgreSQL for storing some very simple flat data
mostly in a single table. The amount of data will be in the hundreds of
gigabytes range. Each row is on the order of 100-300 bytes in size; in
other words, small enough that I am expecting disk I/O to be seek bound
(even if PostgreSQL reads a full pg page at a time, since a page is
significantly smaller than the stripe size of the volume).

The only important performance characteristics are insertion/deletion
performance, and the performance of trivial SELECT queries whose WHERE
clause tests equality on one of the columns.

Other than absolute performance, an important goal is to be able to
scale fairly linearly with the number of underlying disk drives. We are
fully willing to take a disk seek per item selected, as long as it
scales.

To this end I have been doing some benchmarking to see whether the plan
is going to be feasable. On a 12 disk hardware stripe, insertion
performance does scale somewhat with concurrent inserters. However, I am
seeing surprising effects with SELECT:s: a single selecter generates the
same amount of disk activity as two concurrent selecters (I was easily
expecting about twice).

The query is simple:

SELECT * FROM test WHERE value = 'xxx' LIMIT 1000;

No ordering, no joins, no nothing. Selecting concurrently with two
different values of 'xxx' yields the same amount of disk activity (never
any significant CPU activity). Note that the total amount of data is too
large to fit in RAM (> 500 million rows), and the number of distinct
values in the value column is 10000. The column in the WHERE clause is
indexed.

So my first question is - why am I not seeing this scaling? The absolute
amount of disk activity with a single selecter is consistent with what I
would expect from a SINGLE disk, which is completely expected since I
never thought PostgreSQL would introduce disk I/O concurrency on its
own. But this means that adding additional readers doing random-access
reads *should* scale very well with 12 underlying disks in a stripe.

(Note that I have seen fairly similar results on other RAID variants
too, including software RAID5 (yes yes I know), in addition to the
hardware stripe.)

These tests have been done Linux 2.6.19.3 and PostgreSQL 8.1.

Secondly, I am seeing a query plan switch after a certain threshold.
Observe:

perftest=# explain select * from test where val='7433' limit 1000;
QUERY PLAN

------------------------------------------------------------------------
-----------------
Limit (cost=0.00..4016.50 rows=1000 width=143)
-> Index Scan using test_val_ix on test (cost=0.00..206620.88
rows=51443 width=143)
Index Cond: ((val)::text = '7433'::text)
(3 rows)

Now increasing to a limit of 10000:

perftest=# explain select * from test where val='7433' limit 10000;
QUERY PLAN

------------------------------------------------------------------------
--------------
Limit (cost=360.05..38393.36 rows=10000 width=143)
-> Bitmap Heap Scan on test (cost=360.05..196014.82 rows=51443
width=143)
Recheck Cond: ((val)::text = '7433'::text)
-> Bitmap Index Scan on test_val_ix (cost=0.00..360.05
rows=51443 width=0)
Index Cond: ((val)::text = '7433'::text)
(5 rows)

The interesting part is that the latter query is entirely CPU bound (no
disk I/O at all) for an extended period of time before even beginning to
read data from disk. And when it *does* start performing disk I/O, the
performance is about the same as for the other case. In other words, the
change in query plan seems to do nothing but add overhead.

What is the bitmap heap scan supposed to be doing that would increase
performance above a "seek once per matching row" plan? I haven't been
able to Google my way to what the intended benefit is of a heap scan vs.
a plain index scan.

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>'
Key retrieval: Send an E-Mail to getpgpkey(at)scode(dot)org
E-Mail: peter(dot)schuller(at)infidyne(dot)com Web: http://www.scode.org

In response to

Scaling SELECT:s with the number of disks on a stripe at 2007-03-30 05:16:45 from Peter Schuller

Responses

Re: Scaling SELECT:s with the number of disks on a stripe at 2007-04-02 10:34:44 from Peter Schuller

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Marc Mamin	2007-03-30 07:39:31	Re: Planner doing seqscan before indexed join
Previous Message	Vincenzo Romano	2007-03-30 06:36:16	Re: Weird performance drop