Re: [HACKERS] A design for amcheck heapam verification

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] A design for amcheck heapam verification
Date: 2018-03-27 13:48:08
Message-ID: CABOikdMDr4r_MWwRSKhSZxNuQKi0JTHX6ANGWtAZMkZEDKBkEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 27, 2018 at 8:50 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Fri, Mar 23, 2018 at 7:13 AM, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
> wrote:
> > I've just flipped patch to WoA. But if above issues will be fixed I
> think that patch is ready for committer.
>
> Attached is v7, which has the small tweaks that you suggested.
>
> Thank you for the review. I hope that this can be committed shortly.
>
>
Sorry for coming a bit too late on this thread, but I started looking at
0002 patch.

*
+ * When index-to-heap verification is requested, a Bloom filter is used to
+ * fingerprint all tuples in the target index, as the index is traversed to
+ * verify its structure. A heap scan later verifies the presence in the
heap
+ * of all index tuples fingerprinted within the Bloom filter.
+ *

Is that correct? Aren't we actually verifying the presence in the index of
all
heap tuples?

@@ -116,37 +140,47 @@ static inline bool
invariant_leq_nontarget_offset(BtreeCheckState *state,
static Page palloc_btree_page(BtreeCheckState *state, BlockNumber
blocknum);

/*
- * bt_index_check(index regclass)
+ * bt_index_check(index regclass, heapallindexed boolean)

Can we come up with a better name for heapallindexed? May be
"check_heapindexed"?

+
+ /*
+ * Register our own snapshot in !readonly case, rather than asking
+ * IndexBuildHeapScan() to do this for us later. This needs to happen
+ * before index fingerprinting begins, so we can later be certain that
+ * index fingerprinting should have reached all tuples returned by
+ * IndexBuildHeapScan().
+ */
+ if (!state->readonly)
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ }
+

So this looks safe. In !readonly mode, we take the snapshot *before*
fingerprinting the index. Since we're using MVCC snapshot, any tuple which
is
visible to heapscan must be reachable via indexscan too. So we must find the
index entry for every heap tuple returned by the scan.

What I am not sure about is whether we can really examine an index which is
valid but whose indcheckxmin hasn't crossed our xmin horizon? Notice that
this
amcheck might be running in a transaction block, probably in a
repeatable-read
isolation level and hence GetTransactionSnapshot() might actually return a
snapshot which can't yet read the index consistently. In practice, this is
quite unlikely, but I think we should check for that case if we agree that
it
could be a problem.

The case with readonly mode is also interesting. Since we're scanning heap
with
SnapshotAny, heapscan may return us tuples which are RECENTLY_DEAD. So the
question is: can this happen?

- some concurrent index scan sees a heaptuple as DEAD and marks the index
entry as LP_DEAD
- our index fingerprinting sees index tuple as LP_DEAD
- our heap scan still sees the heaptuple as RECENTLY_DEAD

Now that looks improbable given that we compute OldestXmin after our index
fingerprinting was done i.e between step 2 and 3 and hence if a tuple looked
DEAD to some OldestXmin/RecentGlobalXmin computed before we computed our
OldestXmin, then surely our OldestXmin should also see the tuple DEAD. Or is
there a corner case that we're missing?

Are there any interesting cases around INSERT_IN_PROGRESS/DELETE_IN_PROGRESS
tuples, especially if those tuples were inserted/deleted by our own
transaction? It probably worth thinking.

Apart from that, comments in IndexBuildHeapRangeScan() claim that the
function
is called only with ShareLock and above, which is no longer true. We should
check if that has any side-effects. I can't think of any, but better to
verify
and update the comments to reflect new reality,

The partial indexes look fine since the non-interesting tuples never get
called
back.

One thing that worth documenting/noting is the fact that a !readonly check
will
run with a long-duration registered snapshot, thus holding OldestXmin back.
Is
there anything we can do to lessen that burden like telling other backends
to
ignore our xmin while computing OldestXmin (like vacuum does)?

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jonathan S. Katz 2018-03-27 13:51:34 PostgreSQL 11 Release Management Team & Feature Freeze
Previous Message Teodor Sigaev 2018-03-27 13:34:24 Re: PATCH: Exclude temp relations from base backup