
From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum truncate exclusive lock round two
Date: 2012-10-25 13:04:39
Message-ID: 508938E7.8060906@Yahoo.com
Lists: pgsql-hackers

Stephen,

On 10/24/2012 10:46 PM, Stephen Frost wrote:
> Jan,
>
> * Jan Wieck (JanWieck(at)Yahoo(dot)com) wrote:
>> This problem has been discussed before. Those familiar with the
>> subject please skip the next paragraph.
>
> Apologies if this was already thought-of and ruled out for some reason,
> but...
>
>> Because all the scanning had been done in parallel to normal DB
>> activity, it needs to verify that all those blocks are still empty.
>
> Would it be possible to use the FSM to figure out if things have changed
> since the last scan..? Does that scan update the FSM, which would then
> be updated by another backend in the event that it decided to write
> something there? Or do we consider the FSM to be completely
> untrustworthy wrt this (and if so, I don't suppose there's any hope to
> using the visibility map...)?

I honestly don't know if we can trust the FSM enough when it comes to
throwing away heap pages. Can we?

>
> The notion of having to double-scan and the AccessExclusiveLock on the
> relation are telling me this work-around, while completely possible,
> isn't exactly ideal...

Under normal circumstances, with just a few pages to trim off the end,
this is no problem. Those pages were the last ones just scanned by this
very autovacuum, so they are found in the shared buffers anyway. All the
second scan does in that case is fetch each page once more from shared
buffers to be 100% sure that we are not truncating off new tuples. We
definitely need the AccessExclusiveLock to prevent someone from
extending the relation between our check of the relation size and the
truncate. Fetching 50 empty blocks from the buffer cache while at it
isn't a big deal, and that is what it normally looks like.
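To make the protocol concrete, here is a minimal sketch of the
verify-then-truncate logic described above. All names (ToyRelation,
vacuum_truncate, access_exclusive) are illustrative stand-ins, not
PostgreSQL's actual functions; the toy lock is a no-op placeholder for
the real AccessExclusiveLock.

```python
from contextlib import contextmanager

class ToyRelation:
    """Toy stand-in for a heap relation: a list of per-page tuple counts."""
    def __init__(self, tuple_counts):
        self.pages = list(tuple_counts)
    def nblocks(self):
        return len(self.pages)
    def page_is_empty(self, blk):
        return self.pages[blk] == 0
    def truncate(self, nblocks):
        del self.pages[nblocks:]

@contextmanager
def access_exclusive(rel):
    # Placeholder for AccessExclusiveLock: a real implementation would
    # block concurrent extension of the relation while held.
    yield

def vacuum_truncate(rel, candidate):
    """Re-verify the trailing pages under the lock, then truncate.

    `candidate` is the first block the earlier (unlocked) vacuum scan
    believed to be empty; everything from there to the end is re-checked
    because inserts may have landed while the scan ran without the lock.
    """
    with access_exclusive(rel):
        new_size = rel.nblocks()
        first_empty = new_size
        for blk in range(new_size - 1, candidate - 1, -1):
            if not rel.page_is_empty(blk):
                break           # a concurrent insert filled this page
            first_empty = blk
        if first_empty < new_size:
            rel.truncate(first_empty)
    return rel.nblocks()
```

For example, with pages `[5, 3, 0, 0, 0]` and candidate block 2, the
relation shrinks to 2 pages; if a concurrent insert had filled block 3,
only the final empty page would be trimmed.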

The problem case this patch is dealing with is rolling window tables
that have experienced some bloat. The typical example is a log table
that has new data constantly added and the oldest data constantly
purged. This data normally rotates through some blocks like a rolling
window. If for some reason (purging turned off, for example) the table
bloats by several GB and later shrinks back to its normal content, soon
all the used blocks are at the beginning of the heap and we find tens of
thousands of empty pages at the end. Only then does the second scan take
more than 1000ms, and autovacuum is at risk of getting killed while at it.
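A back-of-envelope calculation shows why the second scan blows past the
1000ms deadlock_timeout in this scenario. The per-page cost below is an
assumption for illustration, not a measurement; with pages that have
fallen out of the buffer cache the real cost is far worse.

```python
# PostgreSQL's default block size is 8kB.
PAGE_SIZE = 8192

# Suppose the table bloated by ~4 GB before shrinking back.
bloat_bytes = 4 * 1024**3
empty_pages = bloat_bytes // PAGE_SIZE   # 524288 trailing empty pages

# Assumed cost to fetch and verify one page, in microseconds.
per_page_us = 5

scan_ms = empty_pages * per_page_us / 1000
print(f"{empty_pages} pages, ~{scan_ms:.0f}ms second scan")
```

Even under this optimistic all-cached assumption the scan takes a
multiple of the 1000ms default deadlock_timeout, so any backend blocked
on the AccessExclusiveLock will trigger deadlock detection and cancel
the autovacuum worker before it finishes.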

Since we have experienced this problem several times now on our
production systems, something clearly needs to be done. But IMHO it
doesn't happen often enough to justify taking any risk here.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
