Re: Need to run CLUSTER to keep performance

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Need to run CLUSTER to keep performance
Date: 2007-11-08 11:12:07
Message-ID: 4732EF07.3060807@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Rafael Martinez wrote:
> This is a question about something we have seen sometimes in the last
> months. It happens with tables with a large amount of updates/selects
> compared with the amount of inserts/deletes. The sizes of these tables
> are small and the amount of rows too.
>
> The 'problem' is that performance decrease during the day and the only
> thing that helps is to run CLUSTER on the table with problems. VACUUM
> ANALYZE does not help.
>
> Some information that can help to find out why this happens:
>
> - PostgreSQL version: 8.1.9
>
> ------------------------------------------------------------------------------
> scanorama=# SELECT pg_size_pretty(pg_relation_size('hosts'));
>
> pg_size_pretty
> ----------------
> 12 MB
> ------------------------------------------------------------------------------
> scanorama=# SELECT count(*) FROM hosts ;
>
> count
> -------
> 16402
> ------------------------------------------------------------------------------
> scanorama=# EXPLAIN ANALYZE SELECT * FROM hosts;
>
> Seq Scan on hosts (cost=0.00..2771.56 rows=66756 width=314) (actual
> time=0.008..2013.415 rows=16402 loops=1)
> Total runtime: 2048.486 ms
> ------------------------------------------------------------------------------
> scanorama=# VACUUM ANALYZE ;
> VACUUM
> ------------------------------------------------------------------------------
> scanorama=# EXPLAIN ANALYZE SELECT * FROM hosts;
>
> Seq Scan on hosts (cost=0.00..2718.57 rows=61357 width=314) (actual
> time=0.008..1676.283 rows=16402 loops=1)
> Total runtime: 1700.826 ms
> ------------------------------------------------------------------------------
> scanorama=# CLUSTER hosts_pkey ON hosts ;
> CLUSTER
> ------------------------------------------------------------------------------
> scanorama=# EXPLAIN ANALYZE SELECT * FROM hosts;
>
> Seq Scan on hosts (cost=0.00..680.02 rows=16402 width=314) (actual
> time=0.008..31.205 rows=16402 loops=1)
> Total runtime: 53.635 ms
> ------------------------------------------------------------------------------
> scanorama=# SELECT * from pg_stat_all_tables WHERE relname LIKE 'hosts';
> relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan |
> idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
> --------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------
> 105805 | public | hosts | 1996430 | 32360280252 | 2736391 |
> 3301856 | 948 | 1403325 | 737
>
> The information from pg_stat_all_tables is from the last 20 days.
> ------------------------------------------------------------------------------
> INFO: analyzing "public.hosts"
> INFO: "hosts": scanned 2536 of 2536 pages, containing 16410 live rows
> and 57042 dead rows; 16410 rows in sample, 16410 estimated total rows
> INFO: free space map contains 191299 pages in 786 relations
> DETAIL: A total of 174560 page slots are in use (including overhead).
> 174560 page slots are required to track all free space.
> Current limits are: 2000000 page slots, 4000 relations, using 12131 KB.
> ------------------------------------------------------------------------------
>
> The tables with this 'problem' are not big, so CLUSTER finnish very fast
> and it does not have an impact in the access because of locking. But we
> wonder why this happens.

2 seconds for seq scanning 12 MB worth of data sounds like a lot. Have
you increased shared_buffers from the default? Which operating system
are you using? Shared memory access is known to be slower on Windows.

On a small table like that you could run VACUUM every few minutes
without much impact on performance. That should keep the table size in
check.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tomáš Vondra 2007-11-08 11:20:22 Re: Need to run CLUSTER to keep performance
Previous Message Rafael Martinez 2007-11-08 10:36:36 Need to run CLUSTER to keep performance