Estimation error in n_dead_tuples

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Estimation error in n_dead_tuples
Date: 2007-02-01 10:43:45
Message-ID: 20070201184722.5AD4.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Lists: pgsql-hackers

AFAICS, the statistics counter for the number of dead tuples
(n_dead_tuples) suffers from an estimation error.

VACUUM sends a message to the stats collector process when it has
finished sweeping a table. The stats collector receives the message and
sets the table's n_dead_tuples to zero. However, tuples can still be
updated or deleted while a concurrent vacuum is running through the
table, so there may be dead tuples that were created after the vacuum
started. It is therefore not always correct to set n_dead_tuples to zero
at the end of a vacuum, and the error gets worse the longer the vacuum
takes.
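
For reference, the collector's handler currently looks roughly like this
(paraphrased from pgstat_recv_vacuum() in src/backend/postmaster/pgstat.c;
the entry lookup and ANALYZE bookkeeping are trimmed, and exact field
names may differ):

/* paraphrased sketch of the current stats collector behavior */
static void
pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len)
{
    PgStat_StatTabEntry *tabentry;

    /* ... look up the stats entry for msg->m_tableoid ... */

    tabentry->n_live_tuples = msg->m_tuples;
    /* assumes the vacuum removed every dead tuple, including any
     * created while the vacuum itself was running */
    tabentry->n_dead_tuples = 0;
}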

Autovacuum looks at this stats information to decide when to vacuum.
The error in n_dead_tuples misleads it, and vacuum starvation can occur
because we always underestimate the number of dead tuples.
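
For reference, autovacuum's per-table trigger condition is essentially:

    vacuum threshold = autovacuum_vacuum_threshold
                     + autovacuum_vacuum_scale_factor * reltuples

    vacuum when:  n_dead_tuples > vacuum threshold

so an n_dead_tuples that is reset too low directly delays the next
vacuum of a busy table.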

I'm thinking of adding "the number of vacuumed tuples" to the message
sent by vacuum. The stats collector would then subtract that value from
n_dead_tuples instead of setting it to zero. This would also be needed
if we want to implement some kind of partial vacuum method.
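
Concretely, the handler would change to something like this (just a
sketch; m_vacuumed_tuples is a placeholder name for the new message
field):

/* sketch of the proposed behavior; m_vacuumed_tuples is hypothetical */
static void
pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len)
{
    PgStat_StatTabEntry *tabentry;

    /* ... look up the stats entry for msg->m_tableoid ... */

    tabentry->n_live_tuples = msg->m_tuples;
    /* subtract only what this vacuum actually removed, so dead tuples
     * created during the vacuum remain counted */
    tabentry->n_dead_tuples -= msg->m_vacuumed_tuples;
    if (tabentry->n_dead_tuples < 0)
        tabentry->n_dead_tuples = 0;   /* clamp; stats messages are lossy */
}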

Thoughts? If this is worth doing, I'd like to implement it.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
