Re: Improve search for missing parent downlinks in amcheck

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve search for missing parent downlinks in amcheck
Date: 2019-04-28 00:36:56
Message-ID: CAH2-Wzmv444fW0XCrXM=STS23p2XADufgEoH37SrO=v9HJv5_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 27, 2019 at 5:13 PM Alexander Korotkov
<a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> Yes, increasing of Bloom filter size also helps. But my intention was
> to make non-lossy check here.

I agree that that might be a good goal, but I am interested in knowing
if there is something naive about how the downlinkfilter Bloom filter
is sized. I think that we could probably do better than this without
much work:

/*
* Extra readonly downlink check.
*
* In readonly case, we know that there cannot be a concurrent
* page split or a concurrent page deletion, which gives us the
* opportunity to verify that every non-ignorable page had a
* downlink one level up. We must be tolerant of interrupted page
* splits and page deletions, though. This is taken care of in
* bt_downlink_missing_check().
*/
total_pages = (int64) state->rel->rd_rel->relpages;
state->downlinkfilter = bloom_create(total_pages, work_mem, seed);

Maybe we could use "smgrnblocks(index->rd_smgr, MAIN_FORKNUM))"
instead of relpages, for example.

> It wouldn't be really wasteful, because the same children were
> accessed recently. So, they are likely not yet evicted from
> shared_buffers. I think we can fit both checks into one loop over
> downlinks instead of two.

I see your point, but if we're going to treat this as a bug then I
would prefer a simple fix.

> Yes, we can do more checks with bloom filter. But assuming that we
> anyway iterate over children for each non-leaf page, can we do:
> 1) All the checks, which bt_downlink_check() does for now
> 2) Check there are no missing downlinks
> 3) Check hikeys
> in one pass, can't we?

We can expect every high key in a page to have a copy contained within
its parent, either as one of the keys, or as parent's own high key
(assuming no concurrent or interrupted page splits or page deletions).
This is true today, even though we truncate negative infinity items in
internal pages.

I think that the simple answer to your question is yes. It would be
more complicated that way, and the only extra check would be the check
of high keys against their parent, but overall this does seem
possible.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-04-28 00:56:51 Re: Race conditions with checkpointer and shutdown
Previous Message Alexander Korotkov 2019-04-28 00:12:54 Re: Improve search for missing parent downlinks in amcheck