Re: optimizing vacuum truncation scans

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: optimizing vacuum truncation scans
Date: 2015-04-20 17:18:24
Message-ID: 553534E0.5080209@BlueTreble.com
Lists: pgsql-hackers

On 4/20/15 1:50 AM, Jeff Janes wrote:
> Shouldn't completely empty pages be set as all-visible in the VM? If
> so, can't we just find the largest not-all-visible page and move
> forward from there, instead of moving backwards like we currently do?
>
>
> If the entire table is all-visible, we would be starting from the
> beginning, even though the beginning of the table still has read-only
> tuples present.

Except we'd use it in conjunction with nonempty_pages. IIRC, that's set
to the last page that vacuum saw data on. If any page after that got
written to after vacuum visited it, the VM bit would be cleared. So
after we acquire the exclusive lock, AFAICT it's safe to just scan the
VM starting with nonempty_pages.
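
To make that concrete, here is a rough sketch of the scan I have in mind. It is not lazy_vacuum code: the VM is modelled as a plain bool array, and BlockNum and vm_truncation_point are made-up names for illustration. It also assumes vacuum sets the all-visible bit on completely empty pages, which is the open question above.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int BlockNum;  /* hypothetical stand-in for a block number type */

/*
 * Decide how many pages to keep, looking only at the (toy) visibility map.
 *
 * all_visible[blk] is the VM bit for heap page blk.
 * rel_pages is the current relation length in pages.
 * nonempty_pages is one past the last page vacuum saw data on.
 *
 * Pages below nonempty_pages are always kept. Above that, a cleared bit
 * means someone may have written to the page after vacuum visited it, so
 * everything up to and including that page is kept as well.
 */
BlockNum
vm_truncation_point(const bool *all_visible, BlockNum rel_pages,
                    BlockNum nonempty_pages)
{
    BlockNum keep = nonempty_pages;

    assert(nonempty_pages <= rel_pages);

    for (BlockNum blk = nonempty_pages; blk < rel_pages; blk++)
    {
        if (!all_visible[blk])
            keep = blk + 1;     /* possibly written since vacuum; keep it */
    }
    return keep;
}
```

If nothing past nonempty_pages has a cleared bit, the function returns nonempty_pages and the tail can be dropped without reading a single heap page.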

> For that matter, why do we scan backwards anyway? The comments don't
> explain it, and we have nonempty_pages as a starting point, so why
> don't we just scan forward? I suspect that eons ago we didn't have
> that and just blindly reverse-scanned until we finally hit a
> non-empty buffer...
>
>
> nonempty_pages is not concurrency safe, as the pages could become used
> after vacuum passed them over but before the access exclusive lock was
> grabbed before the truncation scan. But maybe the combination of the
> two? If it is above nonempty_pages, then anyone who wrote into the page
> after vacuum passed it must have cleared the VM bit. And currently I
> think no one but vacuum ever sets VM bit back on, so once cleared it
> would stay cleared.

Right.
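
A toy model of that race, just to check the reasoning. Everything here (ToyRel, toy_vacuum, toy_insert, toy_pages_to_keep) is invented for illustration, and the vacuum step is heavily simplified: it marks every page all-visible, including empty ones, which is the assumption the whole idea rests on.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 8                    /* hypothetical tiny relation */

typedef struct
{
    int  tuples[NPAGES];            /* live tuples per heap page */
    bool all_visible[NPAGES];       /* one VM bit per heap page */
} ToyRel;

/* Simplified vacuum pass: set every VM bit and return nonempty_pages. */
int
toy_vacuum(ToyRel *rel)
{
    int nonempty_pages = 0;

    for (int blk = 0; blk < NPAGES; blk++)
    {
        rel->all_visible[blk] = true;
        if (rel->tuples[blk] > 0)
            nonempty_pages = blk + 1;
    }
    return nonempty_pages;
}

/* A writer that lands on a page after vacuum passed it clears the VM bit. */
void
toy_insert(ToyRel *rel, int blk)
{
    assert(blk >= 0 && blk < NPAGES);
    rel->tuples[blk]++;
    rel->all_visible[blk] = false;
}

/* With the exclusive lock held: pages to keep, consulting only the VM. */
int
toy_pages_to_keep(const ToyRel *rel, int nonempty_pages)
{
    int keep = nonempty_pages;

    for (int blk = nonempty_pages; blk < NPAGES; blk++)
    {
        if (!rel->all_visible[blk])
            keep = blk + 1;
    }
    return keep;
}
```

With tuples only on pages 0 and 1, toy_vacuum() returns 2. If a writer then calls toy_insert() on page 5 before the lock is taken, toy_pages_to_keep() returns 6, so the stale nonempty_pages value does not cause the new tuple to be truncated away.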

> In any event nonempty_pages could be used to set the guess as to how
> many pages (if any) might be worth prefetching, as that is not needed
> for correctness.

Yeah, but I think we'd do a LOT better with the VM idea, because we
could immediately truncate without scanning anything.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
