Re: [HACKERS] A design for amcheck heapam verification

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] A design for amcheck heapam verification
Date: 2018-03-28 12:04:09
Message-ID: CABOikdPNVBWh1tDWK_T1ZLCyjftpj5tDiFmqqokucjk3tjMfxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 28, 2018 at 2:48 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Tue, Mar 27, 2018 at 6:48 AM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> > + * When index-to-heap verification is requested, a Bloom filter is used
> to
> > + * fingerprint all tuples in the target index, as the index is
> traversed to
> > + * verify its structure. A heap scan later verifies the presence in the
> > heap
> > + * of all index tuples fingerprinted within the Bloom filter.
> > + *
> >
> > Is that correct? Aren't we actually verifying the presence in the index
> of
> > all
> > heap tuples?
>
> I think that you could describe it either way. We're verifying the
> presence of heap tuples in the heap that ought to have been in the
> index (that is, most of those that were fingerprinted).
>

Hmm Ok. It still sounds backwards to me, but then English is not my first
or even second language. "A heap scan later verifies the presence in the
heap of all index tuples fingerprinted" sounds as if we expect to find all
fingerprinted index tuples in the heap. But in reality, we check if all
heap tuples have a fingerprinted index tuple.

>
> You're right - there is a narrow window for REPEATABLE READ and
> SERIALIZABLE transactions. This is a regression in v6, the version
> removed the TransactionXmin test.
>
> I am tempted to fix this by calling GetLatestSnapshot() instead of
> GetTransactionSnapshot(). However, that has a problem of its own -- it
> won't work in parallel mode, and we're dealing with parallel
> restricted functions, not parallel unsafe functions. I don't want to
> change that to fix such a narrow issue. IMV, a better fix is to treat
> this as a serialization failure. Attached patch, which applies on top
> of v7, shows what I mean.
>

Ok. I am happy with that.

>
> I think that this bug is practically impossible to hit, because we use
> the xmin from the pg_index tuple during "is index safe to use?"
> indcheckxmin/TransactionXmin checks (the patch that I attached adds a
> similar though distinct check), which raises another question for a
> REPEATABLE READ xact. That question is: How is a REPEATABLE READ
> transaction supposed to see the pg_index entry to get the index
> relation's oid to call a verification function in the first place?

Well pg_index entry will be visible and should be visible. Otherwise how do
you even maintain a newly created index. I am not sure, but I guess we take
fresh MVCC snapshots for catalog searches, even in RR transactions.

> My
> point is that there is no need for a more complicated solution than
> what I propose.
>

I agree on that.

>
> I don't think so. The way we compute OldestXmin for
> IndexBuildHeapRangeScan() is rather like a snapshot acquisition --
> GetOldestXmin() locks the proc array in shared mode. As you pointed
> out, the fact that it comes after everything else (fingerprinting)
> means that it must be equal to or later than what index scans saw,
> that allowed them to do the kill_prior_tuple() stuff (set LP_DEAD
> bits).
>
>
That's probably true.

>
> > Are there any interesting cases around INSERT_IN_PROGRESS/DELETE_IN_
> PROGRESS
> > tuples, especially if those tuples were inserted/deleted by our own
> > transaction? It probably worth thinking.
>

Anything here that you would like to check? I understand that you may see
such tuples only for catalog tables.

>
> IndexBuildHeapRangeScan() doesn't mention anything about CIC's heap
> ShareUpdateExclusiveLock (it just says SharedLock), because that lock
> strength doesn't have anything to do with IndexBuildHeapRangeScan()
> when it operates with an MVCC snapshot. I think that this means that
> this patch doesn't need to update comments within
> IndexBuildHeapRangeScan(). Perhaps that's a good idea, but it seems
> independent.
>

Ok, I agree. But note that we are now invoking that code
with AccessShareLock() whereas the existing callers either use ShareLock or
ShareUpdateExclusiveLock. That's probably does not matter, but it's a
change worth noting.

>
> > Is
> > there anything we can do to lessen that burden like telling other
> backends
> > to
> > ignore our xmin while computing OldestXmin (like vacuum does)?
>
> I don't think so. The transaction involved is still an ordinary user
> transaction.
>

While that's true, I am thinking if there is anything we can do to stop
this a consistency-checking utility to create other non-trivial side
effects on the state of the database, even if that means we can not check
all heap tuples. For example, can there be a way by which we allow
concurrent vacuum or HOT prune to continue to prune away dead tuples, even
if that means running bt_check_index() is some specialised way (such as not
allowing in a transaction block) and the heap scan might miss out some
tuples? I don't know answer to that, but given that sometimes bt_check_index()
may take hours if not days to finish, it seems worth thinking or at least
documenting the side-effects somewhere.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2018-03-28 12:14:04 Re: Oddity in COPY FROM handling of check constraints on partition tables
Previous Message Yugo Nagata 2018-03-28 11:26:48 Re: [HACKERS] [PATCH] Lockable views