Re: VACUUM/ANALYZE counting of in-doubt tuples

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: VACUUM/ANALYZE counting of in-doubt tuples
Date: 2007-11-19 19:16:32
Message-ID: 1195499792.4217.104.camel@ebony.site
Lists: pgsql-hackers

On Mon, 2007-11-19 at 13:33 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > On Mon, 2007-11-19 at 10:38 -0500, Tom Lane wrote:
> >> The race conditions are a lot more subtle than that. The stats
> >> collector cannot know when it receives a tabstat message after VACUUM
> >> starts whether VACUUM has/will see the tuples involved, or whether it
> >> will see them as committed or not. That would depend on whether VACUUM
> >> has yet reached the page(s) the tuples are in.
>
> > I think the before-and-after approach can be made to work:
>
> > VACUUM just needs to save the counter in memory, it doesn't need to
> > write that anywhere else.
>
> > VACUUM can force the flush of the tabstat file so that there is no race
> > condition, or at least a minimised one.
>
> I don't think you understood what I said at all. The race condition is
> not "before vs after VACUUM starts", it is "before vs after when VACUUM
> scans the page that the in-doubt tuple is in".

I thought we were looking for heuristics, not exact accuracy? I
understand the visibility issues.

Right now, the larger the table, the more likely it is that deletes
will start and complete while we are doing the VACUUM. So the larger
the table, the larger the error, assuming a constant workload.
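
As a rough illustration of that scaling, here is a toy model, not
anything measured. It assumes deletes arrive at a constant rate and are
spread uniformly over the table, that VACUUM scans pages at a constant
rate, and that a delete is missed only if it commits on a page VACUUM
has already passed. The function name and all the numbers are made up
for the example.

```python
def expected_missed_deletes(table_pages, pages_per_sec, deletes_per_sec):
    """Toy estimate of deletes that commit behind VACUUM's scan position.

    Hypothetical model: uniform deletes, constant scan rate. Nothing here
    reflects how the stats collector actually counts tuples.
    """
    vacuum_secs = table_pages / pages_per_sec
    # A delete committing at time t hits an already-scanned page with
    # probability t / vacuum_secs; averaged over the whole scan that is 1/2.
    return deletes_per_sec * vacuum_secs / 2.0


# Same workload, three table sizes: the estimate grows linearly with size.
for pages in (1_000, 10_000, 100_000):
    print(pages, expected_missed_deletes(pages, 1_000.0, 50.0))
```

Under these assumptions a table ten times larger misses ten times as
many deletes, which is the effect described above.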

Large VACUUMs are currently not sweeping up all the rows they could,
as Heikki's patch for re-evaluating xmin and the associated test results
showed. Those are the same rows that we are currently ignoring.

Carrying on ignoring them probably isn't the right thing to do, even if
the exactly right thing to do isn't fully knowable.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
