Re: So, is COUNT(*) fast now?

From: Kääriäinen Anssi <anssi(dot)kaariainen(at)thl(dot)fi>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: So, is COUNT(*) fast now?
Date: 2011-10-30 12:02:41
Message-ID: BC19EF15D84DC143A22D6A8F2590F0A788641330BE@EXMAIL.stakes.fi
Lists: pgsql-hackers

Quoting Robert Haas:
"""
I tried this on my MacBook Pro this morning, using pgbench -i -s 500
to create a database about 7.5GB in size, and then using "SELECT
sum(1) FROM pgbench_accounts" as a test query, on a build WITHOUT
--enable-cassert. This machine has 4GB of memory, and I set
shared_buffers = 400MB. (No, I'm not sure whether that's the optimal
setting for shared_buffers for this machine.)
"""

I did some tests comparing the case where the index-ordered tuples are in the
same physical order as the base table against the case where they are not. The
idea is to test what effect accessing the VM in random order, as opposed to
sequential order, has on performance. I suspect the test above accesses the VM
in order (the accounts table is effectively clustered on the index used in the
test), but I might be mistaken here.
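
One way to check that, assuming the pgbench database is still around and has
been analyzed, would be to look at the planner's correlation statistic for the
indexed column; a correlation close to 1.0 means the heap is physically in
index order:

select attname, correlation
from pg_stats
where tablename = 'pgbench_accounts' and attname = 'aid';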

My test setup is this:
drop table if exists test;
drop table if exists test2;
create unlogged table test /* with (fillfactor = 10) */
as select generate_series(0, 20*1000*1000) as id;
create index idx1 on test(id);
vacuum test;
create unlogged table test2 /* with (fillfactor = 10) */
as (select * from test order by random());
create index idx2 on test2(id);
vacuum test2;

Table size is around 600 MB, index size is around 350 MB, and the on-disk VM
size is 16 kB with the default fillfactor (the VM stores one visibility bit per
8 kB heap page, so the ~75,000 heap pages fit in two VM pages). With
fillfactor = 10, the VM size is 104 kB and the table size is around 6 GB; the
index size is the same.
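
For reference, those sizes can be checked with something like this (the 'vm'
fork argument to pg_relation_size gives the on-disk size of the visibility
map):

select pg_size_pretty(pg_relation_size('test')) as table_size,
       pg_size_pretty(pg_relation_size('idx1')) as index_size,
       pg_relation_size('test', 'vm') as vm_bytes;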

Results for the randomly ordered table:
# select count(*) from test2;
14822.045 ms
14826.253 ms
14815.450 ms

Results for the effectively clustered table:
# select count(*) from test;
11761.890 ms
11767.926 ms
11810.900 ms

Now, this test still has the benefit of fitting the VM easily into the L1 cache.

Next, I did an ugly hack to get the table size large enough that the VM will
trash the L1 cache, while still keeping the test setup creation time somewhat
reasonable. My hardware is old: 1 GB of memory, and the processor is a Genuine
Intel(R) CPU L2400 @ 1.66GHz. The L1 data cache size is 32 kB on my machine.

The hack is simply to set fillfactor to 10. The VM size is now 104 kB and the
table size is about 6.3 GB, while the index size is still the same as in the
test above.
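
A quick sanity check that the fillfactor hack really bloated the heap (relpages
should jump by roughly a factor of ten compared to the default-fillfactor run):

select relname, relpages, pg_size_pretty(pg_relation_size(oid)) as size
from pg_class
where relname in ('test', 'test2', 'idx1', 'idx2');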

Results for the randomly ordered table:
# select count(*) from test2;
21606.683 ms
21829.063 ms
21637.434 ms

Results for the effectively clustered table:
# select count(*) from test;
11714.663 ms
11449.264 ms
11658.534 ms

Now, the next step would be to trash the L2 cache (a 20 GB table should do
this on Sandy Bridge, where the per-core L2 cache is 256 kB). I don't have the
hardware to do that test. It is worth noting that the last-level cache is
shared between cores on Sandy Bridge, so it is likely that an index-only scan
of a large enough table would slow down other processes, too. Without tests
this is only FUD, though. The test would be to scan a 20 GB table's index
repeatedly in one process and see how that affects standard in-memory pgbench
results for other processes, then compare with doing the same using a
sequential scan process.
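
If someone with suitable hardware wants to try that, the scanning side could be
something as simple as the loop below (test3 here is a hypothetical ~20 GB
table built the same way as above), run in one session while a normal
in-memory pgbench workload runs in other sessions:

do $$
begin
  for i in 1..100 loop
    perform count(*) from test3;  -- test3 is the hypothetical ~20 GB table
  end loop;
end;
$$;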

Lessons learned (or what I learned, at least):
- Clustering is important for index-only scans. Picking a clustered index over
a non-clustered one can have a big performance effect (see the sketch after
this list).
- Index-only scans of large tables are going to be more expensive relative to a
sequential scan than the pgbench accounts test suggests. I assume the accounts
table is effectively clustered on the index used; I haven't verified this.
- There is the possibility that index-only scans will trash the caches for
other processes, too. Not tested, though.
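
As a concrete example of the first point, one could cluster the randomly
ordered table and re-vacuum it (CLUSTER rewrites the heap, so the visibility
map has to be rebuilt), which should bring its numbers close to the ordered
table's:

cluster test2 using idx2;
vacuum test2;
select count(*) from test2;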

I am sure these results will vary significantly based on hardware used. I
am also notorious for screwing up benchmarks, so verifying these results
is recommended.

You will need around 16GB of disk space for the fillfactor = 10 test. I would
also recommend having more than 1GB of memory; otherwise creating the test
setup can take some time...

- Anssi Kääriäinen
