Re: BUG #17949: Adding an index introduces serialisation anomalies.

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: artem(dot)anisimov(dot)255(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: BUG #17949: Adding an index introduces serialisation anomalies.
Date: 2023-06-26 00:25:14
Message-ID: CA+hUKGKmR57CgZAPRjeSQfXb_oxjSUx7673SqDRw9RF2FfCw=A@mail.gmail.com
Lists: pgsql-bugs

On Sun, Jun 25, 2023 at 1:59 AM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> I've managed to resolve it, or at least reduce the chances for the issue to
> appear, via semi-randomly adding more CheckForSerializableConflictIn /
> PredicateLock around the new sublist that has to be created in
> ginHeapTupleFastInsert. I haven't seen the reproducer failing with this
> changeset after running it multiple times for a couple of minutes, where on the
> main branch, with the two fixes from Thomas included, it was failing within a
> couple of seconds.

Ahh, right, thanks. I don't think we need to lock all those pages as
you showed, as this whole "fast" path is supposed to be covered by the
meta page (in other words, GIN is expected to have the highest
possible serialisation failure rate under SSI unless you turn fast
updates off). But there is an ordering bug with the existing
predicate lock in that code, which allows this to happen:

S1: CheckForSerializableConflictIn(meta)

S2: PredicateLockPage(meta)
S2: scan, find no tuples

S1: BufferLock(EXCLUSIVE)
S1: modify stuff...

CheckForSerializableConflictIn() was written with the assumption that
you're inserting a tuple (i.e. you have the page containing the tuple
locked), so you'll either conflict with a reader who already has a
predicate lock at that point OR you'll insert first and then the
reader will see your (invisible-to-snapshot) tuples, but here we're
doing some fancy footwork with a meta page, and we screwed up the
ordering and left a window where neither of those things happens.
Perhaps it was coded that way because there is a drop-then-reacquire
dance, but it's easy enough to move the check in both branches. Does
that make sense?
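
To make the window concrete, here is a toy model of the interleaving above. It is only a sketch: every name in it (ToyMetaPage, toy_writer_check and so on) is made up for illustration and none of it is PostgreSQL code. It assumes that once the check happens under the exclusive buffer lock, the reader's "predicate lock, then scan" step can only run entirely before or entirely after the writer's "check, then modify" step.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are PostgreSQL APIs. */
typedef struct {
    bool reader_has_predicate_lock; /* S2 has done its PredicateLockPage(meta) step */
    bool writer_has_modified;       /* S1 has changed the page */
} ToyMetaPage;

typedef struct {
    bool writer_saw_reader_lock;    /* S1's conflict check found S2's predicate lock */
    bool reader_saw_write;          /* S2's scan ran after S1's modification */
} ToyOutcome;

/* Stand-in for S1's conflict check: it only sees a lock that already exists. */
static void toy_writer_check(const ToyMetaPage *page, ToyOutcome *out)
{
    if (page->reader_has_predicate_lock)
        out->writer_saw_reader_lock = true;
}

/* Stand-in for S1's "modify stuff" step. */
static void toy_writer_modify(ToyMetaPage *page)
{
    page->writer_has_modified = true;
}

/* Stand-in for S2: take the predicate lock, then scan. */
static void toy_reader_lock_and_scan(ToyMetaPage *page, ToyOutcome *out)
{
    page->reader_has_predicate_lock = true;
    if (page->writer_has_modified)
        out->reader_saw_write = true;
}

/* The dependency is noticed if either side observed the other. */
static bool toy_dependency_noticed(ToyOutcome out)
{
    return out.writer_saw_reader_lock || out.reader_saw_write;
}

/* Ordering from the trace: S1 checks, S2 runs in the gap, S1 modifies. */
static ToyOutcome toy_check_before_buffer_lock(void)
{
    ToyMetaPage page = {false, false};
    ToyOutcome out = {false, false};

    toy_writer_check(&page, &out);          /* S1: check, no lock there yet */
    toy_reader_lock_and_scan(&page, &out);  /* S2: lock + scan, finds nothing */
    toy_writer_modify(&page);               /* S1: modify under exclusive lock */
    return out;
}

/*
 * Check moved under the exclusive buffer lock: in this model S2 can only
 * run before or after the whole check-and-modify step, never in between.
 */
static ToyOutcome toy_check_under_buffer_lock(bool reader_goes_first)
{
    ToyMetaPage page = {false, false};
    ToyOutcome out = {false, false};

    if (reader_goes_first)
        toy_reader_lock_and_scan(&page, &out);
    toy_writer_check(&page, &out);
    toy_writer_modify(&page);
    if (!reader_goes_first)
        toy_reader_lock_and_scan(&page, &out);
    return out;
}

/* Exercises the three interleavings the model allows. */
void toy_demo(void)
{
    assert(!toy_dependency_noticed(toy_check_before_buffer_lock()));
    assert(toy_dependency_noticed(toy_check_under_buffer_lock(true)));
    assert(toy_dependency_noticed(toy_check_under_buffer_lock(false)));
}
```

Calling toy_demo() from a main() runs all three cases. With the check ahead of the buffer lock neither side notices the other, while with the check under the lock one of the two always does, whichever session goes first.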

Attachment                                                         Content-Type   Size
v3-0001-Fix-race-in-SSI-interaction-with-empty-btrees.patch        text/x-patch   2.6 KB
v3-0002-Fix-race-in-SSI-interaction-with-bitmap-heap-scan.patch    text/x-patch   2.3 KB
v3-0003-Fix-race-in-SSI-interaction-with-gin-fast-path.patch       text/x-patch   2.9 KB
