Re: ANALYZE getting dead tuple count hopelessly wrong

From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Stuart Brooks" <stuartb(at)cat(dot)co(dot)za>
Cc: pgsql-general(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE getting dead tuple count hopelessly wrong
Date: 2008-03-31 12:34:26
Message-ID: 2e78013d0803310534j4462b1bepd4bd37f287e16bee@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Mon, Mar 31, 2008 at 1:33 PM, Stuart Brooks <stuartb(at)cat(dot)co(dot)za> wrote:
> I have a table with about 15 million rows which is constantly having
> tuples added to the head and deleted in blocks from the tail to maintain
> the size. The dead tuple count in pg_stat_user_tables tracks the deleted
> rows fairly accurately until an auto-ANALYZE is done in the background
> at which point the value it calculates is wrong by a factor of 2-3 times
> (calculated value is 30-50% of the correct value)

(copying -hackers)

Seems like the redirected-dead line pointers are playing spoil-sport here.
In this particular example, the deleted tuples may get truncated to
redirected-dead line pointers. Analyze would report them as empty
slots and not as dead tuples. So in the worst case, if all the deleted
tuples are already truncated to redirected-dead line pointers, analyze
may report "zero" dead tuple count.

This is a slightly tricky situation because in normal case we might want
to delay autovacuum to let subsequent UPDATEs in the page to reuse
the space released by the deleted tuples. But in this particular example,
delaying autovacuum is not a good thing because the relation would
just keep growing.

I think we should check for redirected-dead line pointers in analyze.c
and report them as dead tuples. The other longer term alternative
could be to track redirected-dead line pointers and give them some
weightage while deciding on autovacuum. We can also update the
FSM information of a page when its pruned/defragged so that the page
can also be used for subsequent INSERTs or non-HOT UPDATEs in
other pages. This might be easier said than done.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Teodor Sigaev 2008-03-31 12:36:28 Re: Fragments in tsearch2 headline
Previous Message Roberts, Jon 2008-03-31 12:21:54 Re: postgreSQL multithreading

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2008-03-31 12:48:28 WIP: CASE statement for PL/pgSQL
Previous Message Peter Eisentraut 2008-03-31 10:02:47 Re: [PATCHES] Minimum selectivity estimate for LIKE 'prefix%'