Re: An out-of-date comment in nodeIndexonlyscan.c

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: An out-of-date comment in nodeIndexonlyscan.c
Date: 2021-06-14 06:11:51
Message-ID: CA+hUKGKbYe3=0tV4SgE5QOeSsTDFRZn3a7ptr2kFroXhPPSQEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 14, 2021 at 2:29 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> Have you also thought about deferrable unique / primary key constraints?
>
> It's possible to the uniqueness temporarily violated during a
> transaction when the unique constraint is deferred,

Oh, yeah, very good question. I think many scenarios involving
duplicates work correctly, but I think there is a danger like this:

create table t (i int primary key deferrable initially deferred, j int);
create unique index on t(j);
insert into t values (999, 999); -- avoid empty index
set enable_seqscan = off;

begin transaction isolation level serializable;
insert into t values (1, 2); -- create phantom row
select * from t where i = 1;
delete from t where j = 2; -- remove phantom row
SELECT locktype, relation::regclass, page, tuple FROM pg_locks WHERE
mode = 'SIReadLock';
commit;

master:

locktype | relation | page | tuple
----------+----------+------+-------
page | t_pkey | 1 |
page | t_j_idx | 1 |
(2 rows)

v3 patch:

locktype | relation | page | tuple
----------+----------+------+-------
(0 rows)

In fact the lock on t_pkey's page 1 was important: it represents the
search for a tuple with i = 1, other than our temporary phantom (only
allowed because constraint deferred). If someone else inserts a row
matching i = 1, the SSI system will not know that we tried to look for
it, because our own temporary phantom confused us.

> I think you'd just need to add a check to ensure that indimmediate is
> true around where you're checking the indisunique flag.

Yeah, that fixes the problem in this case at least. With v4:

locktype | relation | page | tuple
----------+----------+------+-------
page | t_pkey | 1 |
(1 row)

(It's expected that t_j_idx is not locked: the delete query benefits
from the optimisation when accessing the index on t(j)).

That test case is a little confusing, because at no point does it ever
actually create a duplicate, but I suspect the problem is avoided
without the deferred constraint because you'd either get a UCV on
insertion, or block anyone else from inserting until after you commit
(even after you delete the phantom), and I think that may avoid the
hazard. I could be confused about that. If I am wrong, then a
possible general solution may be to apply the optimisation only if we
find a match that wasn't written by this transaction, though I'm not
sure how given the current layering.

Attachment Content-Type Size
v4-0001-Use-tuple-level-SIREAD-locks-for-index-only-scans.patch text/x-patch 6.7 KB
v4-0002-WIP-Skip-SIREAD-locks-on-btree-pages-when-possibl.patch text/x-patch 32.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo NAGATA 2021-06-14 06:11:55 Fix around conn_duration in pgbench
Previous Message Drouvot, Bertrand 2021-06-14 05:41:12 Re: Minimal logical decoding on standbys