Re: Need to run CLUSTER to keep performance

From: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Need to run CLUSTER to keep performance
Date: 2007-11-12 15:38:03
Message-ID: 4738735B.6030408@usit.uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane wrote:
> Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no> writes:
>> Heikki Linnakangas wrote:
>>> On a small table like that you could run VACUUM every few minutes
>>> without much impact on performance. That should keep the table size in
>>> check.
>
>> Ok, we run VACUUM ANALYZE only one time a day, every night.
>
> There's your problem.
>
> Reading between the lines I gather that you think an update is "free"
> in the sense of not creating a need for vacuum. It's not --- it's
> exactly equivalent to an insert + a delete, and it leaves behind a
> dead row that needs to be vacuumed. If you do a lot of updates, you
> need to vacuum.
>

Hello again

We have more information about this 'problem'.

Tom, we have many other tables which are much bigger and have larger
amount of updates/deletes and are working very well with our actual
vacuum configuration. We are aware of how important is to run vacuum
jobs and we think we have a good understanding of how/why vacuum works.

We think the problem we are seeing sometimes with these small tables is
another thing.

We increased the vacuum analyze jobs, as you all pointed, from one a day
to four every hour (we did not run cluster at all since we started with
this new configuration). We started with this after a fresh 'cluster' of
the table. This has been in production since last week and the
performance of this table only gets worst and worst.

After 4 days with the new maintenance jobs, it took more than 4 sec to
run a select on this table. After running a cluster we are down to
around 50ms. again.

I can not believe 4 vacuum jobs every hour is not enough for this table.
If we see the statistics, it has only ca.67000 updates/day, ca.43
deletes/day and ca.48 inserts/day. This is nothing compare with many of
the systems we are administrating.

What we see in common between these tables (we have seen this a couple
of times before) is:

- Small table size.
- Small amount of tuples in the table (almost constant).
- Large amount of updates compared to inserts/deletes and compared to
the amount of tuples in the table.

You that know the interns of postgres :), can you think of anything that
can be causing this behavior? Any more suggestions? do you need more data?

Thanks in advance :)

We are sending all data we had before the last cluster command and after
it.

----------------------------------------------------------------------
**** BEFORE CLUSTER ****
----------------------------------------------------------------------
INFO: vacuuming "public.hosts"
INFO: index "hosts_pkey" now contains 99933 row versions in 558 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "hosts": found 0 removable, 99933 nonremovable row versions in
3875 pages
DETAIL: 83623 dead row versions cannot be removed yet.
There were 12079 unused item pointers.
0 pages are entirely empty.
CPU 0.02s/0.03u sec elapsed 0.06 sec.
INFO: vacuuming "pg_toast.pg_toast_376272"
INFO: index "pg_toast_376272_index" now contains 133 row versions in 2
pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_376272": found 0 removable, 133 nonremovable row
versions in 65 pages
DETAIL: 2 dead row versions cannot be removed yet.
There were 127 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing "public.hosts"
INFO: "hosts": scanned 3875 of 3875 pages, containing 16310 live rows
and 83623 dead rows; 16310 rows in sample, 16310 estimated total rows

scanorama=# SELECT age(now(), pg_postmaster_start_time());
age
-------------------------
25 days 22:40:01.241036
(1 row)

scanorama=# SELECT pg_size_pretty(pg_relation_size('hosts'));
pg_size_pretty
----------------
30 MB
(1 row)

scanorama=# SELECT count(*) from hosts;
count
-------
16311
(1 row)

scanorama=# SELECT
relname,relpages,reltuples,reltoastrelid,reltoastidxid from pg_class
where relname = 'hosts';
relname | relpages | reltuples | reltoastrelid | reltoastidxid
---------+----------+-----------+---------------+---------------
hosts | 3875 | 100386 | 376276 | 0
(1 row)

scanorama=# SELECT * from pg_stat_all_tables where schemaname = 'public'
and relname = 'hosts';
relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan |
idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
--------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------
105805 | public | hosts | 2412159 | 39109243131 | 3244406 |
9870886 | 1208 | 1685525 | 1088
(1 row)

scanorama=# EXPLAIN ANALYZE SELECT * from hosts;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------
Seq Scan on hosts (cost=0.00..4878.86 rows=100386 width=314) (actual
time=0.025..4719.082 rows=16311 loops=1)
Total runtime: 4742.754 ms
(2 rows)

scanorama=# CLUSTER hosts_pkey ON hosts ;
CLUSTER

----------------------------------------------------------------------
**** AFTER CLUSTER ****
----------------------------------------------------------------------

scanorama=# VACUUM VERBOSE ANALYZE hosts;

INFO: vacuuming "public.hosts"
INFO: index "hosts_pkey" now contains 16321 row versions in 65 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "hosts": found 0 removable, 16321 nonremovable row versions in
514 pages
DETAIL: 10 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: vacuuming "pg_toast.pg_toast_383759"
INFO: index "pg_toast_383759_index" now contains 131 row versions in 2
pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_383759": found 0 removable, 131 nonremovable row
versions in 33 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing "public.hosts"
INFO: "hosts": scanned 514 of 514 pages, containing 16311 live rows and
10 dead rows; 16311 rows in sample, 16311 estimated total rows
VACUUM

scanorama=# SELECT pg_size_pretty(pg_relation_size('hosts'));
pg_size_pretty
----------------
4112 kB
(1 row)

scanorama=# SELECT count(*) from hosts;
count
-------
16311
(1 row)

scanorama=# SELECT
relname,relpages,reltuples,reltoastrelid,reltoastidxid from pg_class
where relname = 'hosts';
relname | relpages | reltuples | reltoastrelid | reltoastidxid
---------+----------+-----------+---------------+---------------
hosts | 514 | 16321 | 383763 | 0
(1 row)

scanorama=# SELECT * from pg_stat_all_tables where schemaname = 'public'
and relname = 'hosts';
relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan |
idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
--------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------
105805 | public | hosts | 2412669 | 39117480187 | 3244962 |
9887752 | 1208 | 1685857 | 1088
(1 row)

scanorama=# EXPLAIN ANALYZE SELECT * from hosts;
QUERY PLAN

------------------------------------------------------------------------------------------------------------
Seq Scan on hosts (cost=0.00..678.53 rows=16353 width=314) (actual
time=0.006..32.143 rows=16311 loops=1)
Total runtime: 57.408 ms
(2 rows)
----------------------------------------------------------------------

--
Rafael Martinez, <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Rafael Martinez 2007-11-12 16:11:59 Re: Need to run CLUSTER to keep performance
Previous Message Erik Jones 2007-11-12 15:22:04 Re: PostgreSQL vs MySQL, and FreeBSD