Auto ANALYZE criteria

From: Joe Miller <joe(dot)d(dot)miller(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Auto ANALYZE criteria
Date: 2010-09-20 20:38:37
Message-ID: AANLkTi=W02Lcn63qqxoP9jjQ0CJ8wZfCJkKV7fry7CP=@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

The autovacuum daemon currently uses the number of inserted and
updated tuples to determine if it should run VACUUM ANALYZE on a
table.  Why doesn’t it consider deleted tuples as well?

For example, I have a table which gets initially loaded with several
million records. A batch process grabs the records 100 at a time, does
some processing and deletes them from the table in the order of the
primary key.  Eventually, performance degrades because an autoanalyze
is never run. The planner decides that it should do a sequential scan
instead of an index scan because the stats don't reflect reality. See
example below.

I can set up a cron job to run the ANALYZE manually, but it seems like
the autovacuum daemon should be smart enough to figure this out on its
own. Deletes can have as big an impact on the stats as inserts and
updates.

Joe Miller

---------------------------

testdb=# \d test
Table "public.test"
Column | Type | Modifiers
--------+---------+-----------
id | integer | not null
data | bytea |
Indexes:
"test_pkey" PRIMARY KEY, btree (id)

testdb=# insert into public.test select s.a, gen_random_bytes(256)
from generate_series(1,10000000) as s(a);
INSERT 0 10000000

testdb=# SELECT *
FROM pg_stat_all_tables
WHERE schemaname='public' AND relname='test';
relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan |
idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd |
n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze
| last_autoanalyze
---------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+-------------+-----------------+--------------+------------------
5608158 | public | test | 1 | 0 | 0 |
0 | 10000000 | 0 | 0 | 0 |
0 | 0 | | | |
2010-09-20 10:46:37.283775-04
(1 row)

testdb=# explain analyze delete from public.test where id <= 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Index Scan using test_pkey on test (cost=0.00..71.63 rows=1000
width=6) (actual time=13.251..22.916 rows=100 loops=1)
Index Cond: (id <= 100)
Total runtime: 23.271 ms
(3 rows)

{ delete records ad nauseum }

testdb=# explain analyze delete from public.test where id <= 7978800;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..410106.17 rows=2538412 width=6) (actual
time=48771.772..49681.562 rows=100 loops=1)
Filter: (id <= 7978800)
Total runtime: 49682.006 ms
(3 rows)

testdb=# SELECT *
FROM pg_stat_all_tables
WHERE schemaname='public' AND relname='test';
relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan |
idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd |
n_live_tup | n_dead_tup | last_vacuum | last_autovacuum
| last_analyze | last_autoanalyze
---------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+-------------+-------------------------------+--------------+-------------------------------
5608158 | public | test | 1 | 0 | 54345 |
5433206 | 10000000 | 0 | 5433200 | 0 |
5459506 | 725300 | | 2010-09-20 14:45:54.757611-04 |
| 2010-09-20 10:46:37.283775-04

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message George Sexton 2010-09-20 21:54:32 Re: 3ware trivia overload
Previous Message Josh Kupershmidt 2010-09-20 19:24:22 Re: cleanup on pg_ system tables?