Re: amcheck verification for GiST

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Alvaro Herrera from 2ndQuadrant <alvherre(at)alvh(dot)no-ip(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: amcheck verification for GiST
Date: 2019-09-11 23:10:20
Message-ID: CAH2-Wz=7KNKiohgmRHn29KCnr_2Tid1CuM7f_nX7KV_+vgoP_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Sep 8, 2019 at 1:21 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> Maybe we should PageGetItemIdCareful() to amcheck.c too?
> I think it can be reused later by GIN entry tree and to some extent by SP-GiST.
> SP-GiST uses redirect tuples, but do not store this information in line pointer.

Well, the details are slightly different in each case in at least one
way -- we use the size of the special area to determine the exact
bounds that it is safe for a tuple to appear in. The size of the
special area varies based on the access method. (Actually, pg_filedump
relies on this difference when inferring which access method a
particular page image is based on -- it starts out by looking at the
standard pd_special field that appears in page headers. So clearly
they're often different.)

> > My main concern now is the heavyweight lock strength needed by the new
> > function. I don't feel particularly qualified to sign off on the
> > concurrency aspects of the patch. Heikki's v6 used a ShareLock, like
> > bt_index_parent_check(), but you went back to an AccessShareLock,
> > Andrey. Why is this safe? I see that you do gist_refind_parent() in
> > your v9 a little differently to Heikki in his v6, which you seemed to
> > suggest made this safe in your e-mail on March 28, but I don't
> > understand that at all.
>
> Heikki's version was reading childblkno instead of parentblkno, thus never refinding parent tuple.

> When we suspect key consistency violation, we hold lock on page with some tuple. Then we take pin and lock on page that was parent for current some time before.
> For example of validity see gistfinishsplit(). Comments state "On entry, the caller must hold a lock on stack->buffer", line 1330 acquires LockBuffer(stack->parent->buffer, GIST_EXCLUSIVE);
> This function is used during inserts, but we are not going to modify data and place row locks, thus neither ROW EXCLUSIVE, not ROW SHARE is necessary.

But gistfinishsplit() is called when finishing a page split -- the
F_FOLLOW_RIGHT bit must be set on the leaf. Are you sure that you can
generalize from that, and safely relocate the parent?

I would be a lot more comfortable with this if Heikki weighed in. I am
also concerned about the correctness of this because of this paragraph
from the GiST README file:

"""
This page deletion algorithm works on a best-effort basis. It might fail to
find a downlink, if a concurrent page split moved it after the first stage.
In that case, we won't be able to remove all empty pages. That's OK, it's
not expected to happen very often, and hopefully the next VACUUM will clean
it up.
"""

Why is this not a problem for the new amcheck checks? Maybe this is a
very naive question. I don't claim to be a GiST expert.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-09-12 00:16:34 Re: [PATCH] Opclass parameters
Previous Message Alvaro Herrera 2019-09-11 22:15:24 Re: base backup client as auxiliary backend process