Re: optimizing vacuum truncation scans

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: optimizing vacuum truncation scans
Date: 2015-07-22 16:11:41
Message-ID: CAMkU=1yfq8vDvS8o+3ubNL6PjixLwN78T4PVjRY1Ef+cu44bKw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> > Attached is a patch that implements the vm scan for truncation. It
> > introduces a variable to hold the last blkno which was skipped during the
> > forward portion. Any blocks after both this blkno and after the last
> > inspected nonempty page (which the code is already tracking) must have been
> > observed to be empty by the current vacuum. Any other process rendering the
> > page nonempty are required to clear the vm bit, and no other process can set
> > the bit again during the vacuum's lifetime. So if the bit is still set, the
> > page is still empty without needing to inspect it.
>
> Urgh. So if we do this, that forever precludes having HOT pruning set
> the all-visible bit.

I wouldn't say forever, as it would be easy to revert the change if
something more important came along that conflicted with it. I don't think
this change would grow tentacles across the code that make it hard to
revert; you would just have to take the performance hit (and by that time,
maybe HDDs will truly be dead anyway and we won't care anymore). But yes,
that is definitely a downside. HOT pruning is one example, but one could
also envision having someone (bgwriter?) set vm bits on unindexed tables.
Or if we invent some efficient way to know that no expiring tids for a
certain block range are stored in indexes, other jobs could also set the vm
bit on indexed tables. Or parallel vacuums in the same table, not that I
really see a reason to have those.
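
To make the invariant concrete, here is a toy model of the backward
truncation scan described in the quoted text. It is only a sketch, not the
patch itself: the struct, its field names, and toy_count_nondeletable_pages
are made up for illustration, and plain bool arrays stand in for the
visibility map and for actually reading heap pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int BlockNumber;

/* Hypothetical summary of what one vacuum's forward pass recorded. */
typedef struct ForwardPassState
{
    BlockNumber nblocks;        /* relation length in blocks */
    BlockNumber nonempty_pages; /* one past the last page seen nonempty */
    BlockNumber first_trusted;  /* one past the last block skipped via the
                                 * vm during the forward pass; 0 if none */
} ForwardPassState;

/*
 * Walk backward from the end of the relation and return how many blocks
 * must be kept. vm_bit[] is the current state of the visibility map and
 * page_is_empty[] stands in for reading a page. *pages_read counts the
 * pages that really had to be inspected.
 */
static BlockNumber
toy_count_nondeletable_pages(const ForwardPassState *st,
                             const bool *vm_bit,
                             const bool *page_is_empty,
                             unsigned *pages_read)
{
    BlockNumber blkno = st->nblocks;

    *pages_read = 0;
    while (blkno > st->nonempty_pages)
    {
        BlockNumber cur = blkno - 1;

        /*
         * This vacuum saw the block empty, and the bit is still set, so
         * nobody has added a tuple since: no need to read the page.
         */
        if (cur >= st->first_trusted && vm_bit[cur])
        {
            blkno--;
            continue;
        }

        /* Skipped in the forward pass, or bit cleared: must look. */
        (*pages_read)++;
        if (!page_is_empty[cur])
            break;              /* keep everything up to and including cur */
        blkno--;
    }
    return blkno;
}
```

If anything other than this vacuum could set the bit during the scan (HOT
pruning, say), a set bit on a block past first_trusted would no longer prove
the page is empty, and the shortcut in the first branch would be unsafe.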

> At the least we'd better document that carefully
> so that nobody breaks it later. But I wonder if there isn't some
> better approach, because I would certainly rather that we didn't
> foreclose the possibility of doing something like that in the future.
>

But where do we document it (other than in-place)? README.HOT doesn't seem
sufficient, and there is no README.vm.

I guess we could add an "Assert(InRecovery || running_a_vacuum);" to
visibilitymap_set, with a comment there, except that I don't know how to
implement running_a_vacuum so that it covers manual vacs as well as
autovac. Perhaps assert that we hold a SHARE UPDATE EXCLUSIVE lock on the rel?
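
A rough sketch of the lock-based variant of that assertion follows. It is a
standalone toy, not PostgreSQL code: ToyRelation, vm_set_is_allowed and
toy_visibilitymap_set are hypothetical names, the InRecovery variable here is
a local stand-in, and the held lock is stored directly in the struct rather
than looked up in the lock manager. The lock mode names and their order
mirror PostgreSQL's table-level lock modes.

```c
#include <assert.h>
#include <stdbool.h>

/* Table-level lock modes, weakest to strongest. */
typedef enum LockMode
{
    NoLock = 0,
    AccessShareLock,
    RowShareLock,
    RowExclusiveLock,
    ShareUpdateExclusiveLock,
    ShareLock,
    ShareRowExclusiveLock,
    ExclusiveLock,
    AccessExclusiveLock
} LockMode;

/* Local stand-in for the recovery flag (assumption for this sketch). */
static bool InRecovery = false;

/* Hypothetical relation: the lock this backend holds, plus a tiny vm. */
typedef struct ToyRelation
{
    LockMode held_lock;
    bool     vm_bit[16];
} ToyRelation;

/*
 * SHARE UPDATE EXCLUSIVE conflicts with itself and with every stronger
 * mode, so holding it (or anything stronger) means no vacuum can be
 * running on the relation in another backend.
 */
static bool
vm_set_is_allowed(const ToyRelation *rel)
{
    return InRecovery || rel->held_lock >= ShareUpdateExclusiveLock;
}

static void
toy_visibilitymap_set(ToyRelation *rel, unsigned blkno)
{
    /* Truncation trusts set bits, so only a lock holder may set one. */
    assert(vm_set_is_allowed(rel));
    rel->vm_bit[blkno] = true;
}
```

This is weaker than a true running_a_vacuum check, since other commands take
the same lock, but it would catch a caller that holds only an ordinary DML
lock when it sets a bit.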

The advantage of the other approach, just forcing kernel read-ahead to work
for us, is that it doesn't impose any of these restrictions on future
development. The disadvantage is that I don't know how to auto-tune it, or
auto-disable it for SSD, and it will never be quite as efficient.
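
For comparison, one way the read-ahead approach could look is sketched below:
while walking backward, hint each aligned chunk of blocks in ascending order
before reading it, so the kernel sees a forward access pattern. Everything
here is an assumption for illustration: the chunk size of 32, the callback
types, and the toy table are all made up, and the hint callback only records
what it was asked to prefetch.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int BlockNumber;

/* Assumed chunk size; must be a power of two. Not auto-tuned. */
#define PREFETCH_CHUNK 32u

typedef void (*PrefetchHint) (BlockNumber start, BlockNumber nblocks, void *arg);
typedef int (*PageIsEmpty) (BlockNumber blkno, void *arg);

/*
 * Scan backward from nblocks down to nonempty_pages, issuing one hint per
 * aligned chunk before touching any block in it. Returns the number of
 * blocks that must be kept.
 */
static BlockNumber
scan_backward_with_hints(BlockNumber nblocks, BlockNumber nonempty_pages,
                         PrefetchHint hint, PageIsEmpty is_empty, void *arg)
{
    BlockNumber blkno = nblocks;
    BlockNumber hinted_down_to = nblocks;   /* blocks >= this were hinted */

    while (blkno > nonempty_pages)
    {
        BlockNumber cur = blkno - 1;

        if (cur < hinted_down_to)
        {
            /* Start of the aligned chunk containing cur, clamped. */
            BlockNumber start = cur & ~(PREFETCH_CHUNK - 1);

            if (start < nonempty_pages)
                start = nonempty_pages;
            hint(start, cur - start + 1, arg);
            hinted_down_to = start;
        }

        if (!is_empty(cur, arg))
            break;
        blkno--;
    }
    return blkno;
}

/* Toy table for trying it out: blocks >= first_empty are empty. */
typedef struct ToyTable
{
    BlockNumber first_empty;
    unsigned    hints;          /* number of hint calls made */
    BlockNumber hinted_blocks;  /* total blocks covered by hints */
} ToyTable;

static void
toy_hint(BlockNumber start, BlockNumber nblocks, void *arg)
{
    ToyTable *t = arg;

    (void) start;
    t->hints++;
    t->hinted_blocks += nblocks;
}

static int
toy_is_empty(BlockNumber blkno, void *arg)
{
    const ToyTable *t = arg;

    return blkno >= t->first_empty;
}
```

A real hint callback might call posix_fadvise with POSIX_FADV_WILLNEED on the
matching file range. Note that every block still gets read, and that the
chunk size is a fixed guess, which is exactly the tuning problem mentioned
above.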

Cheers,

Jeff
