Re: xid wraparound danger due to INDEX_CLEANUP false

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Subject: Re: xid wraparound danger due to INDEX_CLEANUP false
Date: 2020-04-24 00:44:08
Message-ID: CAH2-Wz=TM1LjEFZoJ3OaBLLqzRFSU1wA_W_FC+hpBZUm+bWqQw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 22, 2020 at 9:05 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > Given hint bits it seems fairly hard to make that a reliable check.
>
> I don't follow. It doesn't have to be a perfect check. Detecting if
> there is *any* buffer lock held at all would be a big improvement.

It is true that the assumptions that heapam makes about what a buffer
pin will prevent (that a pin will prevent any kind of page
defragmentation) are not really compatible with marking pages as
undefined in lower level code like bufmgr.c. There are too many
exceptions for it to work like that.

The case I was really thinking about was the nbtree
_bt_drop_lock_and_maybe_pin() stuff, which is very confusing. The
confusing structure of the
BTScanPosIsPinned()/_bt_drop_lock_and_maybe_pin() code more or less
caused the skip scan patch to have numerous bugs involving code
holding a buffer pin, but not a buffer lock, at least when I last
looked at it a couple of months ago. The only thing having a pin on a
leaf page guarantees is that the TIDs from tuples on the page won't be
concurrently recycled by VACUUM. This is a very weak guarantee -- in
particular, it's much weaker than the guarantees around buffer pins
that apply in heapam. It's certainly not going to prevent any kind of
defragmentation of the page -- the page can even split, for example.
Any code that relies on holding a pin to prevent anything more than
that is broken, but possibly only in a subtle way. It's not like page
splits happen all that frequently.

Given that I was concerned about a fairly specific situation, a
specific solution seems like it might be the best way to structure the
extra checks. The attached rough patch shows the kind of approach that
might be practical in specific index access methods. This works on top
of the patch I posted yesterday. The idea is to mark the buffer's page
as a noaccess region within _bt_drop_lock_and_maybe_pin(), and then
mark it defined again at either of the two points that we might have
to relock (but not repin) the buffer to re-read the page. This doesn't
cause any regression test failures, so maybe there are no bugs like
this currently, but it still seems like it might be worth pursuing on
top of the buffer pin stuff.

--
Peter Geoghegan

Attachment Content-Type Size
0003-Mark-lockless-nbtree-leaf-page-memory-undefined.patch application/octet-stream 3.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-04-24 01:48:56 Re: [BUG] non archived WAL removed during production crash recovery
Previous Message Gareth Palmer 2020-04-24 00:04:33 Re: [PATCH] Implement INSERT SET syntax