Re: optimizing vacuum truncation scans

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: optimizing vacuum truncation scans
Date: 2015-05-27 00:19:31
Message-ID: CAMkU=1yJzfpTXigsoWqx1PRzeTa7hYzGz3za76-8OarbOt_vrw@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 26, 2015 at 12:37 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Mon, Apr 20, 2015 at 10:18 AM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
> wrote:
>
>> On 4/20/15 1:50 AM, Jeff Janes wrote:
>>
>>>
>>> For that matter, why do we scan backwards anyway? The comments don't
>>> explain it, and we have nonempty_pages as a starting point, so why
>>> don't we just scan forward? I suspect that eons ago we didn't have
>>> that and just blindly reverse-scanned until we finally hit a
>>> non-empty buffer...
>>>
>>>
>>> nonempty_pages is not concurrency safe, as the pages could become used
>>> after vacuum passed them over but before the access exclusive lock was
>>> grabbed before the truncation scan. But maybe the combination of the
>>> two? If it is above nonempty_pages, then anyone who wrote into the page
>>> after vacuum passed it must have cleared the VM bit. And currently I
>>> think no one but vacuum ever sets VM bit back on, so once cleared it
>>> would stay cleared.
>>>
>>
>> Right.
>>
>> In any event nonempty_pages could be used to set the guess as to how
>>> many pages (if any) might be worth prefetching, as that is not needed
>>> for correctness.
>>>
>>
>> Yeah, but I think we'd do a LOT better with the VM idea, because we could
>> immediately truncate without scanning anything.
>
>
> Right now all the interlocks to make this work seem to be in place (only
> vacuum and startup can set visibility map bits, and only one vacuum can be
> in a table at a time). But as far as I can tell, those assumptions are not
> "baked in" and we have pondered loosening them before.
>
> For example, letting HOT cleanup mark a page as all-visible if it finds it
> to be such. Now in that specific case it would be OK, as HOT cleanup would
> not cause a page to become empty (or could it? If an insert on a table
> with no indexes was rolled back, and HOT cleanup found it and cleaned it
> up, the page could conceptually become empty--unless we add special code
> to prevent it), and so the page would have to be below nonempty_pages. But
> there may be other cases.
>
> And I know other people have mentioned making VACUUM concurrent (although
> I don't see the value in that myself).
>
> So doing it this way would be hard to beat (scanning a bitmap vs the table
> itself), but it would also introduce a modularity violation that I am not
> sure is worth it.
>
> Of course this could always be reverted if its requirements became a
> problem for a more important change (assuming, of course, that we detected
> the problem).
>
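The VM-based interlock described above (only vacuum sets the all-visible bit, and any write to a page clears it) can be sketched as a toy model. The function name and the flag arrays here are invented for illustration; this is not the real PostgreSQL code path, which works with the on-disk visibility map:

```python
# Toy model: decide the truncation point by consulting only the
# visibility map, never reading trailing heap pages. A trailing page
# that vacuum saw as empty, and whose all-visible bit is still set when
# checked under the access exclusive lock, cannot have been written to
# since vacuum passed it (any write clears the bit, and only vacuum
# sets it back), so it must still be empty and is safe to truncate.

def vm_truncation_point(n_pages, empty_at_vacuum, vm_bits):
    """empty_at_vacuum[i]: vacuum found page i empty during its scan.
    vm_bits[i]: page i's all-visible bit, re-read under the AEL."""
    blkno = n_pages
    while blkno > 0 and empty_at_vacuum[blkno - 1] and vm_bits[blkno - 1]:
        blkno -= 1
    return blkno

# Pages 2..4 were empty at vacuum time, but page 3 was written to
# afterwards (its VM bit got cleared), so only page 4 may be truncated.
print(vm_truncation_point(5,
                          [False, False, True, True, True],
                          [False, True,  True, False, True]))  # 4
```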

The fatal problem here is that nonempty_pages is unreliable. If vacuum
skips all-visible pages, it doesn't necessarily increment nonempty_pages
past the skipped pages. So if you just rely on nonempty_pages, you will
truncate away pages that were already all-visible but are not empty. If we
instead changed it so that it did increment nonempty_pages past the skipped
ones, then pages which were all empty and got marked as all-visible without
being truncated (say, because the lock could not be acquired, or because
there was a non-empty page after them which later became empty) would never
get truncated away.
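A toy model of the "fails low" half of that problem. The page flags and the scan loop here are invented stand-ins for lazy_scan_heap's real bookkeeping, but they show the shape of the failure:

```python
# Toy model of vacuum's scan, showing why nonempty_pages "fails low"
# when all-visible pages are skipped: a skipped page never advances the
# counter, even though it may contain live tuples.

def vacuum_scan(pages):
    """Return nonempty_pages: one past the last page vacuum actually
    inspected and found non-empty. Skipped pages do not advance it."""
    nonempty_pages = 0
    for blkno, page in enumerate(pages):
        if page["all_visible"]:
            continue  # skipped entirely: counter NOT advanced
        if not page["empty"]:
            nonempty_pages = blkno + 1
    return nonempty_pages

pages = [
    {"empty": False, "all_visible": False},
    {"empty": False, "all_visible": True},   # skipped, but NOT empty
    {"empty": True,  "all_visible": False},  # genuinely truncatable
]

n = vacuum_scan(pages)
print(n)  # 1 -- truncating to 1 block would discard non-empty page 1
```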

As it stands, it is not clear what purpose nonempty_pages serves. It is a
guardian value which doesn't seem to actually guard anything. At best it
saves inspecting one page (the page at the guardian value itself) only to
find that it is not empty. That hardly seems worthwhile.

We could adopt two nonempty_pages counters, one that fails low on skipped
all-visible pages and one that fails high on them, and then fast-truncate
down to the high one and do the current page-by-page scan between the low
and the high. That seems rather grotesque.
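The two-counter scheme might look like the following toy model. The counter names and the simplified backward scan are assumptions made for illustration, not proposed code:

```python
# Toy model of the two-counter idea: `low` never counts skipped
# all-visible pages (fails low), `high` always counts them as if they
# were non-empty (fails high). Everything at or beyond `high` can be
# truncated without reading it; only [low, high) needs the current
# page-by-page backward scan.

def vacuum_scan_two_counters(pages):
    low = high = 0
    for blkno, page in enumerate(pages):
        if page["all_visible"]:
            high = blkno + 1       # pessimistically treat as non-empty
            continue
        if not page["empty"]:
            low = high = blkno + 1
    return low, high

def truncation_target(pages, low, high):
    # Fast-truncate down to `high`, then scan backward page by page,
    # reading actual pages, only over the uncertain range [low, high).
    blkno = high
    while blkno > low and pages[blkno - 1]["empty"]:
        blkno -= 1
    return blkno

pages = [
    {"empty": False, "all_visible": False},
    {"empty": False, "all_visible": True},   # skipped; actually non-empty
    {"empty": True,  "all_visible": False},
    {"empty": True,  "all_visible": False},
]
low, high = vacuum_scan_two_counters(pages)
print(low, high, truncation_target(pages, low, high))  # 1 2 2
```

Pages 2 and 3 are dropped without being read; the backward scan then reads page 1, finds tuples, and stops, so the relation is correctly truncated to 2 blocks.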

Cheers,

Jeff
