Re: PageIsAllVisible()'s trustworthiness in Hot Standby

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PageIsAllVisible()'s trustworthiness in Hot Standby
Date: 2012-12-04 17:10:25
Message-ID: CABOikdNbwdJcgd70VJC6oKwQ0T-foRPe0+R0fmd2P5mOHMHtig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 4, 2012 at 8:03 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Dec 4, 2012 at 9:00 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
> wrote:
> > Youre right, it currently seems to be possible, there's no LSN interlock
> > prohibiting this as far as I can see.
>
> Yeah, there certainly isn't that. Now you could perhaps make an
> argument that no operation that can propagate a set bit from master to
> standby can arrive until after the standby's xmin horizon is
> sufficiently current, but that does feel a little fragile to me...
> even if it's true now, new WAL record types might break it, for
> example.
>
>
Hmm. Yeah, I do not have guts to prove that either. I'll probably write up
a comment for your consideration to explain why we don't trust
PD_ALL_VISIBLE in Hot standby for seq scans, but still trust VM for
index-only scans.

Another piece of code that caught my attention is this (sorry for digging
up topics that probably have been discussed to death before and I might not
have paid attention to then):

Whenever we clear PD_ALL_VISIBLE flag, we do that while holding the heap
page lock and also WAL log that action (in fact, piggy-back with
insert/update/delete). The only exception that I saw is in lazy_scan_heap()

916 else if (PageIsAllVisible(page) && has_dead_tuples)
917 {
918 elog(WARNING, "page containing dead tuples is marked as
all-visible in relation \"%s\" page %u",
919 relname, blkno);
920 PageClearAllVisible(page);
921 MarkBufferDirty(buf);
922 visibilitymap_clear(onerel, blkno, vmbuffer);
923 }

Obviously, this is not a case that we expect to occur. But if it does occur
for whatever reason, then even though we will correct the page flag on
master, this would never be populated to the standby. So the standby may
continue to have that tiny bit set on the page, at least until another
insert/update/delete happens on the page. Not sure if there is anything to
worry about, but looks a bit strange. I looked at the archive but can't see
any report of the warning, so may be we are safe after all.

Another side issue: The way we set visibility map bits in the VACUUM, we do
that in the first phase of the vacuum. Now the very fact that we have
selected the page for vacuuming, it will have one or more dead tuples. As
soon as we find a dead tuple in the page, we decide not to mark the page
all-visible and rightly so. But this page would not get marked all-visible
even if we remove all dead tuples in the second phase. Why don't we push
that work to the second phase or at least retry that again ? Of course, we
can't just do it that way without scanning the page again because new
tuples may get added or updated/deleted in the meanwhile. But that can be
addressed with some tricks - like tracking LSN of every such (which has
only dead and live tuples, but not recently dead or insert-in-progress or
delete-in-progress) and then comparing that LSN with the page LSN during
the second phase. If an update had happened in between, we will know that
and we won't deal with the visibility bits.

Another idea would be to track this intermediate state in the page header
itself, something like PD_ALL_VISIBLE_OR_DEAD. If an intermediate
update/delete/insert comes in, it will clear the bit. If we see the bit set
during the second phase of the vacuum, we will know that it contains only
dead and live tuples and the dead ones are being removed now. So we can
mark the page PD_ALL_VISIBLE.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2012-12-04 17:22:29 Re: Slow query: bitmap scan troubles
Previous Message Jan Wieck 2012-12-04 16:55:48 Re: autovacuum truncate exclusive lock round two