Re: Slow count(*) again...

From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 06:47:19
Message-ID: 201010122347.20542.neil.whelchel@gmail.com
Lists: pgsql-hackers pgsql-performance

On Sunday 10 October 2010 21:15:56 Neil Whelchel wrote:

> Right now, I am building a test machine with two dual core Intel processors
> and two 15KRPM mirrored hard drives, 1 GB ram. I am using a small amount of
> ram because I will be using small test tables. I may do testing in the
> future with more ram and bigger tables, but I think I can accomplish what
> we are all after with what I have. The machine will be limited to running
> the database server in test, init, bash, and ssh, no other processes will
> be running except for what is directly involved with testing. I will post
> exact specs when I post test results. I will create some test tables, and
> the same tables will be used in all tests. Suggestions for optimal
> Postgres and system configuration are welcome. I will try any suggested
> settings that I have time to test. -Neil-
>

OK, the test machine is up and running.
A few more details: the hard drives are SCSI Ultra-320, and the CPUs are 2.8
GHz with a 533 MHz FSB. I wanted to build a more memory-cramped machine to keep
the table-to-RAM ratio closer to the production machines, but for now all I
have are 1 GB DDR modules, and the machine requires pairs, so total memory is
2 GB. Swap is turned off.

The data I will be using is a couple of days of raw data from a production
system. The columns of interest are numeric and timestamp. I will use the
exact same data for all tests.

Table "public.log"
Column | Type | Modifiers
------------------+-----------------------------+------------------------
batch_id | integer |
t_stamp | timestamp without time zone | not null default now()
raw_data | numeric |
data_value | numeric |
data_value_delta | numeric |
journal_value | numeric |
journal_data | numeric |
machine_id | integer | not null
group_number | integer |
Indexes:
"log_idx" btree (group_number, batch_id)
"log_oid_idx" btree (oid)
"log_t_stamp" btree (t_stamp)

The initial test is on XFS with write barriers turned on, which makes for
very slow writes. The point of the first test is to get a baseline with
everything out-of-the-box. So, here are the numbers:

Insert the data into one table:
crash:~# time psql -U test test -q < log.sql
real 679m43.678s
user 1m4.948s
sys 13m1.893s
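
I suspect most of that wall time is the write barriers forcing cache flushes;
a later run without them should just be a different mount, something like:

crash:~# mount -o nobarrier /dev/sda3 /data

(the device and mount point here are made up for illustration; nobarrier is
only safe with a battery-backed write cache).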

crash:~# echo 3 > /proc/sys/vm/drop_caches
crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 0m11.812s
user 0m0.000s
sys 0m0.004s
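
(A side note: echo 3 > /proc/sys/vm/drop_caches only evicts clean pages, so
for a strictly cold-cache run one would sync first:

crash:~# sync; echo 3 > /proc/sys/vm/drop_caches

Right after the load it likely makes little difference, but it keeps the
cold-cache numbers honest.)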

crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 0m3.737s
user 0m0.000s
sys 0m0.000s

As can be seen here, the cache helps.
And the numbers are not all that bad, so let's throw a sabot into the gears:
crash:~# time psql -U test test -c "UPDATE log SET raw_data=raw_data+1"
UPDATE 10050886

real 14m13.802s
user 0m0.000s
sys 0m0.000s

crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 3m32.757s
user 0m0.000s
sys 0m0.000s

Just to be sure:
crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 2m38.631s
user 0m0.000s
sys 0m0.000s

It looks like the cache knocked about a minute off; still quite sad.
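
That slowdown makes sense under MVCC: the UPDATE wrote a new version of every
row, so the heap now holds roughly twice as many pages, and count(*) has to
scan all of them. A quick way to check the bloat (a sketch; relpages is only
refreshed by VACUUM/ANALYZE):

crash:~# psql -U test test -c "SELECT relpages, pg_size_pretty(pg_relation_size('log')) FROM pg_class WHERE relname = 'log';"
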
So, I shut down Postgres, ran xfs_fsr, and started Postgres again:
crash:~# echo 3 > /proc/sys/vm/drop_caches
crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 1m36.304s
user 0m0.000s
sys 0m0.000s

So it seems that defragmentation knocked another minute off.
Let's see how much the cache helps now:
crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 1m34.873s
user 0m0.000s
sys 0m0.000s

Not much... And we are a long way from the 3.7 seconds with a freshly inserted
table. Maybe the maid can help here.
crash:~# time psql -U test test -c "VACUUM log;"
VACUUM

real 22m31.931s
user 0m0.000s
sys 0m0.000s

crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 1m30.927s
user 0m0.000s
sys 0m0.000s

Nope...
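
That is expected in hindsight: plain VACUUM marks the dead row versions
reusable for future inserts, but it does not hand the pages back to the
filesystem, so count(*) still scans the bloated heap. Actually shrinking the
table would take a rewrite, something like:

crash:~# time psql -U test test -c "CLUSTER log USING log_t_stamp;"

(or VACUUM FULL; both take an exclusive lock on the table while they run).
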
So, possible conclusions are:
1. Even with VACUUM, table scan speed degrades as tables are updated.
2. Timing tests on a freshly INSERTed table give results that are not
real-world.
3. Filesystem defragmentation helps (some).
4. Cache only makes a small difference once a table has been UPDATEd.
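
And since this whole thread is about count(*) being slow: where an exact count
is not required, the planner's estimate is a cheap substitute (only as
accurate as the last ANALYZE):

crash:~# psql -U test test -c "SELECT reltuples::bigint AS approx_rows FROM pg_class WHERE relname = 'log';"

That returns instantly no matter how bloated the table gets.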

I am going to leave this configuration running for the next day or so. This
way I can try any suggestions and play with any more ideas that I have.
I will try these same tests on ext4 later, along with any good suggested
tests.
I will also try MySQL with the same data on both XFS and ext4.
-Neil-
