Re: "page is not marked all-visible" warning in regression tests

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: "page is not marked all-visible" warning in regression tests
Date: 2012-06-06 19:07:28
Message-ID: 201206062107.29300.andres@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, June 06, 2012 08:19:15 PM Robert Haas wrote:
> On Wed, Jun 6, 2012 at 1:46 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:
> > On a cursory lock it might just be a race condition in
> > vacuumlazy.c:lazy_scan_heap. If scan_all is set, which it has to be for
> > the warning to be visible, all_visible_according_to_vm is determined
> > before we loop over all blocks. At the point where one specific heap
> > block is actually read and locked that knowledge might be completely
> > outdated by any concurrent backend. Am I missing something?
>
> No, I think you're right. I think that warning is bogus. I added it
> in place of some older warning which no longer made sense, but I think
> this one doesn't make sense either.
Agreed.

It might be interesting to recheck the visibility and warn if its wrong. That
should be infrequent enough to bearable and it does check for an actually
dangerous case in a new code path.

> > I have to say the whole visibilitymap correctness and crash-safety seems
> > to be quite under documented, especially as it seems to be somewhat
> > intricate (to me). E.g. not having any note why visibilitymap_test
> > doesn't need locking. (I guess the theory is that a 1 byte read will
> > always be consistent. But how does that ensure other backends see an
> > up2date value?).
>
> It's definitely intricate, and it's very possible that we should have
> some more documentation. I am not sure exactly what and where, but
> feel free to suggest something.
I think some addition to the LOCKING part of visibilitymap.c's header
explaining some of what you wrote in your email might be a good start.
I would also suggest explictly mentioning that its ok to have the
visibilitymap and the page disagreeing about the visibility if its the
visibility map that think that the page contains invisible data but not the
other way round (I think that can currently happen).

visibilitymap_test() should explain that its results can be outdated if youre
not holding a buffer lock.

> visibilitymap_test() does have a comment saying that:
>
> /*
> * We don't need to lock the page, as we're only looking at a
> single bit.
> */
Oh. I conveniently skipped that comment in my brain ;)

> But that's a bit unsatisfying, because, as you say, it doesn't address
> the question of memory-ordering issues. I think that there's no
> situation in which it causes a problem to see the visibility map bit
> as unset when in reality it has just recently been set by some other
> back-end. It would be bad if someone did something like:
>
> if (visibilitymap_test(...))
> visibilitymap_clear();
>
> ...because then memory-ordering issues could cause us to accidentally
> fail to clear the bit. No one should be doing that, though; the
> relevant locking and conditional logic is built directly into
> visibilitymap_clear().
Then _test should document that... I don't think its impossible that we will
grow more uses of the visibilitymap logic.

> On the flip side, if someone sees the visibility map bit as set when
> it's actually just been cleared, that could cause a problem - most
> seriously, index-only scans could return wrong answers. For that to
> happen, someone would have to insert a heap tuple onto a previously
> all-visible page, clearing the visibility map bit, and then insert an
> index tuple; concurrently, some other backend would need to see the
> index tuple but not the fact that the visibility map bit had been
> cleared. I don't think that can happen: after inserting the heap
> tuple, the inserting backend would release buffer content lock, which
> acts as a full memory barrier; before reading the index tuple, the
> index-only-scanning backend would have to take the content lock on the
> index buffer, which also acts as a full memory barrier. So the
> inserter can't do the writes out of order, and the index-only-scanner
> can't do the reads out of order, so I think it's safe.... but we
> probably do need to explain that somewhere.
Hm. For a short while I thought there would be an issue with heap_delete and
IOS because the deleting transaction can commit without any barriers happening
on the IOS side. But that only seems to be possible with non MVCC snapshots
which are currently not allowed with index only scans.

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-06-06 19:08:05 Re: 9.3: load path to mitigate load penalty for checksums
Previous Message Ants Aasma 2012-06-06 18:54:15 Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile