Re: Avoid endless futile table locks in vacuuming.

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoid endless futile table locks in vacuuming.
Date: 2015-12-29 20:55:20
Message-ID: CAMkU=1yhQUnAfduP2_AZJduC_gLXb8Z=mnV32z5+S-mnmQSV9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 28, 2015 at 2:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>>> If a partially-active table develops a slug of stable all-visible,
>>> non-empty pages at the end of it, then every autovacuum of that table
>>> will skip the end pages on the forward scan, think they might be
>>> truncatable, and take the access exclusive lock to do the truncation.
>>> And then immediately fail when it sees the last page is not empty.
>>> This can persist for an indefinite number of autovac rounds.
>>> The simple solution is to always scan the last page of a table, so it
>>> can be noticed that it is not empty and avoid the truncation attempt.
>
>> This seems like a reasonable proposal, but I find your implementation
>> unconvincing: there are two places in lazy_scan_heap() that pay attention
>> to scan_all, and you touched only one of them.
>
> After further investigation, there is another pre-existing bug: the short
> circuit path for pages not requiring freezing doesn't bother to update
> vacrelstats->nonempty_pages, causing the later logic to think that the
> page is potentially truncatable even if we fix the second check of
> scan_all! So this is pretty broken, and I almost think we should treat it
> as a back-patchable bug fix.
>
> In the attached proposed patch, I added another refinement, which is to
> not bother with forcing the last page to be scanned if we already know
> that we're not going to attempt truncation, because we already found a
> nonempty page too close to the end of the rel. I'm not quite sure this
> is worthwhile, but it takes very little added logic, and saving an I/O
> is probably worth the trouble.

If we are not doing a scan_all and we fail to acquire a clean-up lock on
the last block, and the last block reports that it needs freezing, then we
continue on to wait for the clean-up lock. But there is no need, we don't
really need to freeze the block, and we already know whether it has tuples
or not without the clean up lock. Couldn't we just set the flag based on
hastup, then 'continue'?

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2015-12-29 21:26:35 Re: Making tab-complete.c easier to maintain
Previous Message Bruce Momjian 2015-12-29 20:16:20 Re: dynloader.h missing in prebuilt package for Windows?