Planner estimates and VACUUM/autovacuum

From: Artur Zając CFI <azajac(at)cfi(dot)pl>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Planner estimates and VACUUM/autovacuum
Date: 2014-02-03 18:57:45
Message-ID: 033801cf2111$d3518750$79f495f0$@cfi.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

I am not sure if it is bug or not but I found some strange behaviour. Maybe
it is the same as described on
http://www.postgresql.org/message-id/14616.1244317490@sss.pgh.pa.us ?). If
yes - I'm sorry for the trouble, but I think that my example is more
obvious.

Tested on PostgreSQL 9.2.4 and 9.2.6.

Console 1:

BEGIN;

DECLARE a CURSOR FOR SELECT * FROM tab;

--- Keep cursor open for disallow full vacuum of tab

Console 2:

SELECT count(*) FROM tab;

---- Result: 3588;

select reltuples from pg_class where relname='table';

--- Result: 3588

UPDATE tab SET id=id;

UPDATE tab SET id=id;

UPDATE tab SET id=id;

VACUUM ANALYZE tab;

select reltuples from pg_class where relname='table';

--- Result: 3588

Now wait few seconds :)

select reltuples from pg_class where relname='table';

--- Result: 12560

VACUUM ANALYZE tab;

select reltuples from pg_class where relname='table';

--- Result: 3588

There is 3588 live records and 12560 live+dead records in table.

That is strange for me. VACUUM updates pg_class.reltuples differently (only
live roiws count) than autovacuum (live and dead rows). Why?

Also in planning:

explain SELECT id FROM tab;

QUERY PLAN

----------------------------------------------------------------------

Seq Scan on tab (cost=0.00..1074.60 rows=12560 width=4)

Estimation is done with the use of current pg_class.reltuples value. This
value includes dead rows count after autovacuum so estimation is bad,
especially in more complex planner tree, for example:

Explain SELECT a.id FROM tab AS a JOIN tab AS b USING (id);

QUERY PLAN

----------------------------------------------------------------------------
-----------------------------

Nested Loop (cost=0.00..6410.70 rows=12560 width=4)

-> Seq Scan on tab a (cost=0.00..1074.60 rows=12560 width=8)

-> Index Only Scan using tab_pkey on tab b (cost=0.00..0.41 rows=1
width=4)

Index Cond: (id = a.id)

PostgreSQL estimates 12560 records in query result. This is wrong estimation
if dead tuples are removed during seq scan or index scan (I suppose that it
is).

I don't think that AUTOVACUUM and VACUUM ANALYZE should behave differently
:(

--------------------------------------------------------------------------

Artur Zajac

Browse pgsql-performance by date

  From Date Subject
Next Message Artur Zając 2014-02-03 19:29:34 Planner estimates and VACUUM/autovacuum
Previous Message Robert Haas 2014-02-03 16:55:34 Re: [PERFORM] encouraging index-only scans