Re: Slow performance when querying millions of rows

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow performance when querying millions of rows
Date: 2011-06-29 00:03:23
Message-ID: 4E0A6BCB.1030807@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Dne 29.6.2011 01:26, Craig McIlwee napsal(a):
>> Dne 28.6.2011 23:28, Craig McIlwee napsal(a):
>> Are you sure those two queries are exactly the same? Because the daily
>> case output says the width is 50B, while the half-month case says it's
>> 75B. This might be why the sort/aggregate steps are switched, and that
>> increases the amount of data so that it has to be sorted on disk (which
>> is why the half-month is so much slower).
>>
>> Haven't you added some columns to the half-month query?
>
> The daily tables were created using CREATE TABLE AS from the half month
> tables, structure is the same with the exception of fill factor.
> Queries are identical except for the name of the master table that they
> select from.

Hm, I'm not sure where this width value comes from but I don't think
it's related to fillfactor.

>> Not much experience with PostgreSQL on Windows, but this looks good to
>> me. Not sure if RAID5 is a good choice, especially because of write
>> performance - this is probably one of the reasons why the disk sort is
>> so slow (in the half-month case).
>
> Yes, the data import is painfully slow but I hope to make up for that
> with the read performance later.

Generally you're right that RAID10 is going to be slower than RAID5 when
reading (and faster when writing) the data, but how big the gap is
really depends on the controller. It's not that big I guess - see for
example this:

http://www.kendalvandyke.com/2009/02/disk-performance-hands-on-part-5-raid.html

The first test shows that RAID10 is about 10% slower on reads but about
60% faster on writes.

BTW have you tuned the GUC settings for write (increasing checkpoint
segments may give much better write performance).

> The 0 volume is the only thing that will always be present, but those
> records do account for 10 to 15% of the data. I'll give this a shot,
> I'm really interested in seeing what impact this had. For some reason I
> was under the impression that partial indexes were used for text
> searches, so I completely overlooked this.

Or you might actually do two partitions for each day - one for volume=0
and the other one for volume!=0. Not sure if that is worth the effort.

One more thing to try in this case - it's not that important how many
rows suffice the condition, much more important is how many blocks need
to be read from the disk. If those 10% rows are distributed evenly
through the table (i.e. there's at least one in each 8kB block), the I/O
still needs to be done.

And it's very likely the case, as you've clustered the tables according
to the timestamp. Try to cluster the tables according to 'volume' and
check the difference.

regards
Tomas

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tomas Vondra 2011-06-29 00:10:48 Re: Slow performance when querying millions of rows
Previous Message Craig McIlwee 2011-06-28 23:50:50 Re: Slow performance when querying millions of rows