Re: New vacuum option to do only freezing

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New vacuum option to do only freezing
Date: 2019-01-16 16:33:25
Message-ID: CA+Tgmoa796XctOC0JdEiYJUi-rX=CrGjM7C4=k_s0A1iCZb+WQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 16, 2019 at 3:30 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> As the above comment says, it's possible that the state of an
> INSERT_IN_PROGRESS tuple could be changed to 'dead' after
> heap_page_prune(). Since such a tuple is not truncated at this point, we
> record it and set it as UNUSED in lazy_vacuum_page(). I think that the
> DISABLE_INDEX_CLEANUP case is the same; we need to process them after
> they are recorded. Am I missing something?

I believe you are. Think about it this way. After the first pass
over the heap has been completed but before we've done anything to the
indexes, let alone started the second pass over the heap, somebody
could kill the vacuum process. Somebody could in fact yank the plug
out of the wall, stopping the entire server in its tracks. If they do
that, then lazy_vacuum_page() will never get executed. Yet, the heap
can't be in any kind of corrupted state at this point, right? We know
that the system is resilient against crashes, and killing a vacuum or
even the whole server midway through does not leave the system in any
kind of bad state. If it's fine for lazy_vacuum_page() to never be
reached in that case, it must also be fine for it never to be reached
if we ask for vacuum to stop cleanly before lazy_vacuum_page().

In the case of the particular comment to which you are referring, that
comment is part of lazy_scan_heap(), not lazy_vacuum_page(), so I
don't see how it bears on the question of whether we need to call
lazy_vacuum_page(). It's true that, at any point in time, an
in-progress transaction could abort. And if it does then some
insert-in-progress tuples could become dead. But if that happens,
then the next vacuum will remove them, just as it will remove any
tuples that become dead for that reason when vacuum isn't running in
the first place. You can't use that as a justification for needing a
second heap pass, because if it were, then you'd also need a THIRD
heap pass in case a transaction aborts after the second heap pass has
visited the pages, and a fourth heap pass in case a transaction aborts
after the third heap pass has visited the pages, etc. etc. forever.
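
The argument above can be illustrated with a toy model. This is only a
sketch under loud assumptions: it is not PostgreSQL code, the names
ItemState, first_heap_pass, second_heap_pass, toy_vacuum and
abort_inserter are hypothetical, indexes are ignored entirely, and the
first pass is simplified to a read-only scan (the real first pass also
prunes). It shows only that skipping the second pass leaves a valid page
whose dead tuples the next vacuum picks up.

```python
from enum import Enum


class ItemState(Enum):
    """Hypothetical, simplified tuple states for one heap page."""
    LIVE = "live"
    INSERT_IN_PROGRESS = "insert_in_progress"
    DEAD = "dead"
    UNUSED = "unused"


def first_heap_pass(page):
    """Toy stand-in for the first pass: record offsets of tuples dead right now."""
    return [i for i, state in enumerate(page) if state is ItemState.DEAD]


def second_heap_pass(page, dead_offsets):
    """Toy stand-in for the second pass: mark the recorded tuples unused."""
    for i in dead_offsets:
        page[i] = ItemState.UNUSED


def toy_vacuum(page, interrupted_after_first_pass=False):
    """Run one toy vacuum; optionally stop before the second pass."""
    dead_offsets = first_heap_pass(page)
    if interrupted_after_first_pass:
        # Crash, kill, or a clean early stop: in this model they are the same,
        # and the page is simply left as it was.
        return
    second_heap_pass(page, dead_offsets)


def abort_inserter(page):
    """Model an in-progress transaction aborting: its tuples become dead."""
    for i, state in enumerate(page):
        if state is ItemState.INSERT_IN_PROGRESS:
            page[i] = ItemState.DEAD


# Usage: an interrupted vacuum followed by an ordinary one.
page = [ItemState.LIVE, ItemState.DEAD, ItemState.INSERT_IN_PROGRESS]
toy_vacuum(page, interrupted_after_first_pass=True)  # second pass never runs
abort_inserter(page)                                 # inserter aborts later
toy_vacuum(page)                                     # next vacuum cleans up both
```

In this model the tuple that became dead after the first scan is treated
exactly like one that became dead while no vacuum was running: the next
vacuum removes it, and no extra pass is needed in the current one.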

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
