Re: lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples

From: Noah Misch <noah(at)leadboat(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples
Date: 2013-02-04 03:00:26
Message-ID: 20130204030026.GA27152@tornado.leadboat.com
Lists: pgsql-hackers

On Wed, Jan 30, 2013 at 11:37:41PM +0530, Pavan Deolasee wrote:
> On Wed, Jan 30, 2013 at 7:34 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > On Mon, Jan 28, 2013 at 07:24:04PM +0530, Pavan Deolasee wrote:
> >> On Wed, Jan 23, 2013 at 10:05 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> >>
> >> > You're the second commentator to be skittish about the patch's correctness, so
> >> > I won't argue against a conservatism-motivated bounce of the patch.
> >>
> >> Can you please rebase the patch against the latest head ? I see
> >> Alvaro's and Simon's recent changes has bit-rotten the patch.
> >
> > Attached.
>
> Thanks. Here are a few comments.
>
> 1. I saw you took care of the bug that I reported in the first email
> by allowing overwriting a LP_DEAD itemid during recovery. While this
> should be OK given that we had never seen reports of recovery trying
> to accidentally overwrite a LP_DEAD itemid, are you sure after this
> change we can't get into this situation post reachedConsistency ? If
> we ever do, I think the recovery would just fail.

Having pondered this again, I conclude that checking reachedConsistency is
wrong. As a simple example, during ongoing streaming replication, the startup
process will regularly be asked to overwrite LP_DEAD items. Those items were
LP_UNUSED on the master, but only a full-page image would transfer that fact
to the standby before something overwrites the item.
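
To make the rule concrete, here is a toy sketch of the redo-side overwrite check. This is not PostgreSQL code; the enum and function are invented for illustration (the real logic lives in PageAddItem() and itemid.h, where line pointers also carry offset/length bits):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified line-pointer states for illustration only. */
typedef enum { LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD } LpFlag;

/*
 * Sketch of the recovery-side rule discussed above: the master may have
 * set an item LP_UNUSED without WAL-logging that fact (only a full-page
 * image would carry it), so redo on the standby must be prepared to
 * overwrite items it still sees as LP_DEAD.
 */
static bool
redo_may_overwrite(LpFlag flag)
{
    /* Historically only LP_UNUSED was considered overwritable; during
     * recovery the standby can lag the master's pruning, so LP_DEAD
     * must be accepted as well. */
    return flag == LP_UNUSED || flag == LP_DEAD;
}
```

A normal or redirect item must still never be overwritten; only the unused/dead distinction is relaxed.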

That shows another flaw. VACUUM marking a page all-visible is WAL-logged, so
the standby will receive that change. But the standby may still have LP_DEAD
pointers on the affected page, where the master has LP_UNUSED. I'm not aware
of any severe consequences; no scan type would be fooled. Still, this
violates a current assumption of the system.

> 2. In lazy_scan_heap() after detecting a DEAD tuple you now try to
> confirm that the tuple must not require a freeze. Is that really safe
> ? I think this assumes that the HOT prune would have already truncated
> *every* DEAD tuple to LP_DEAD except an INSERT_IN_PROGRESS which may
> have turned into DEAD between two calls to HeapTupleSatisfiesVacuum().
> While this might be true, but we never tried hard to prove that before
> because it wasn't necessary. I remember Heikki raising this concern
> when I proposed setting the VM bit after HOT prune. So a proof of that
> would probably help.

Yes, the patch assumes all that. Since lazy_scan_heap() already refuses to
remove a heap-only or HOT-updated HEAPTUPLE_DEAD tuple, heap_page_prune() had
better not miss truncating those. Otherwise, we would pass such a tuple to
heap_freeze_tuple(), which assumes the tuple is not HEAPTUPLE_DEAD. The only
case where heap_page_prune() could today leave behind a HEAPTUPLE_DEAD tuple
without causing such problems is a simple tuple not participating in HOT. I
think it's clear from inspection that heap_page_prune() visits all such
tuples and that heap_prune_chain() truncates them when visited.
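
The invariant being argued is simple to state in miniature. The following is an invented toy model, not the actual prune code: after a prune pass has visited every item, no LP_NORMAL item may still be HEAPTUPLE_DEAD, whether it is a HOT member or a simple tuple:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of a heap item, for illustration only. */
typedef enum { LP_UNUSED, LP_NORMAL, LP_DEAD } LpFlag;
typedef enum { TUPLE_LIVE, TUPLE_DEAD } VacStatus;

typedef struct MiniItem
{
    LpFlag    flag;
    VacStatus status;     /* what HeapTupleSatisfiesVacuum() would say */
    bool      heap_only;  /* member of a HOT chain? */
} MiniItem;

/*
 * Sketch of the invariant: every dead LP_NORMAL item gets truncated to a
 * dead line pointer.  In PostgreSQL the real work is in heap_page_prune()
 * and heap_prune_chain(); HOT members collapse with their chain, simple
 * tuples are truncated directly, but either way the slot ends up LP_DEAD.
 */
static void
mini_prune(MiniItem *items, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (items[i].flag == LP_NORMAL && items[i].status == TUPLE_DEAD)
            items[i].flag = LP_DEAD;
    }
}
```

If that invariant holds, lazy_scan_heap() can safely assume any remaining LP_NORMAL tuple is not HEAPTUPLE_DEAD (modulo the in-progress-to-dead race discussed next).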

Since a tuple can move from HEAPTUPLE_INSERT_IN_PROGRESS to HEAPTUPLE_DEAD at
any instant here, heap_freeze_tuple() must not misbehave on such tuples.
Indeed, it does not; since the xmin was running at some point after we chose a
freeze horizon, the xmin will be above that horizon. Therefore, the
try_freeze dance I added ought to be unnecessary.
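
The horizon arithmetic behind that claim can be sketched as follows. This toy code ignores XID wraparound and uses invented names; the real comparisons go through TransactionIdPrecedes() and the cutoff computed by vacuum_set_xid_limits():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;   /* ignoring wraparound for this sketch */

/* heap_freeze_tuple() replaces xmin only when it precedes the cutoff. */
static bool
would_freeze(Xid xmin, Xid freeze_limit)
{
    return xmin < freeze_limit;
}
```

The freeze limit is derived by backing off from OldestXmin, so freeze_limit <= OldestXmin. Any xmin that was still running when OldestXmin was chosen satisfies xmin >= OldestXmin >= freeze_limit, so would_freeze() is false for it: a tuple that slips from HEAPTUPLE_INSERT_IN_PROGRESS to HEAPTUPLE_DEAD cannot be frozen by mistake.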

> 3. Converting LP_DEAD to LP_UNUSED in lazy_vacuum_page() while holding
> just a SHARE lock seems to be OK, but is a bit adventurous. I would
> rather just get an EX lock and do it.

Fair enough.

> Also, its probably more
> appropriate to just mark the buffer dirty instead of
> SetBufferCommitInfoNeedsSave().

There exist precedents for both styles.

> It may cause line pointer bloat and
> vacuum may not come to process this page again.

How would that happen?

> This will also be kind
> of unnecessary after the patch to set VM bit in the second phase gets
> in.

To the extent that pages tend to become all-visible after removing dead
material, yes. When a long-running transaction is holding back the
all-visible horizon, it will still matter.

> 4. Are changes in the variable names and logic around them in
> lazy_scan_heap() really required ? Just makes me a bit uncomfortable.
> See if we can minimize those changes or do it in a separate patch, if
> possible.

I assume you refer to the replacement of "all_visible" and "has_dead_tuples"
with "has_nonlive" and "has_too_recent". As ever, it would be hard to
honestly say that those changes were *required*. They serve the side goal of
fixing the bookkeeping behind one of the warnings, which, independent of this
patch, isn't firing as often as it should. I could split that out as an
independent patch.

> I haven't run tests with the patch yet. Will see if I can try a few. I
> share other's views on making these changes late in the cycle, but if
> we can reduce the foot-print of the patch, we should be OK.

Given those reactions and my shortness of time to solidly fix everything noted
above, let's table this patch. I'm marking it Returned with Feedback.

> I see the following (and similar) messages while applying the patch,
> but may be they are harmless.
>
> (Stripping trailing CRs from patch.)

I don't see anything like that.

Thanks,
nm
