Skip site navigation (1) Skip section navigation (2)

BUG #14057: vacuum setting reltuples=0 for tables with >0 tuples

From: andrew(at)tao11(dot)riddles(dot)org(dot)uk
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #14057: vacuum setting reltuples=0 for tables with >0 tuples
Date: 2016-03-31 10:37:39
Message-ID: 20160331103739.8956.94469@wrigleys.postgresql.org (view raw, whole thread or download thread mbox)
Thread:
Lists: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      14057
Logged by:          Andrew Gierth
Email address:      andrew(at)tao11(dot)riddles(dot)org(dot)uk
PostgreSQL version: 9.4.5
Operating system:   any
Description:        

This is my analysis of an issue reported via IRC by
nicolas(dot)baccelli(at)gmail(dot)com(dot)

The original issue was bad query plans caused by strangely bad estimates,
which were traced to reltuples=0 (with relpages>0) values in pg_class. The
affected relations are very small (one page only, order of 10 to 100 rows).

Monitoring over time showed that these were being reset to 0 by autovacuum
(even though the tables involved are static). This was traced to
vacuum-for-wraparound, which is relevant since it means that the vacuum is
being performed with scan_all true. (The tables are targets of FKs, thus
many key-share locks which may require mxid cleanup.)

The problem then seems to be this:

If cleanup lock isn't acquired for the page when we try and lock it
conditionally, and scan_all is true, then we scan the page to see if it
needs freezing before blocking on the cleanup lock. If it does not, we skip
it, but scanned_pages is still incremented in this code path, even though we
did not update any of the tuple counts.

(We are assuming that the cleanup lock is frequently missed in this case
because the vulnerable tables are frequently used in joins.)

Accordingly, we get a new reltuples estimate of 0 (since scanned_pages is
not 0 the tuple counts are trusted and assumed to reflect the whole rel,
since it's only one page).

It looks like fixing this requires breaking scanned_pages out into at least
two separate counters, since we currently use it to track both whether we
can update stats, and whether we can update relfrozenxid.


Responses

pgsql-bugs by date

Next:From: Andres FreundDate: 2016-03-31 10:41:29
Subject: Re: BUG #14057: vacuum setting reltuples=0 for tables with >0 tuples
Previous:From: Srivathson KKDate: 2016-03-31 09:59:50
Subject: Re: BUG #14047: Problem while initdb - 0xC000001D

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group