From: Rinat Shigapov <rinatshigapov(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Too coarse predicate locks granularity for B+ tree indexes
Date: 2023-02-07 10:23:54
Message-ID: CAJneKMX826bmGpTHyt9-oM3Qbfk1mX8-Kn1E7s2Gt2MNF=yRaQ@mail.gmail.com
Lists: pgsql-general pgsql-hackers
Hi,
TL;DR: this email describes a serialization failure that happens (as I
understand it) due to too-coarse predicate lock granularity on the primary
key index.
I have a concurrent test suite that runs 14 test cases. Each test case
operates on a disjoint set of records, does not retry transactions, and runs
under the 'serializable' isolation level. The test data is small and likely
fits within a single heap page.
When I finished the test suite, I was surprised that PostgreSQL 14.5
returned a serialization failure on every run. I was even more surprised
when I ran the suite against current CockroachDB and did not get
serialization failures. I was eventually able to reproduce RETRY_SERIALIZABLE
errors a couple of times on CockroachDB, but it required running the test
suite in a loop for more than half an hour.
I started to investigate the behavior on PostgreSQL with a simplified,
stripped-down version of the code and found a serialization failure between
two concurrent `update_user` operations.
The test defines the following `Users` table:
CREATE TABLE Users (
    id UUID,
    title VARCHAR(255),
    first_name VARCHAR(40),
    last_name VARCHAR(80) NOT NULL,
    email VARCHAR(255) NOT NULL,
    lower_email VARCHAR(255) GENERATED ALWAYS AS (lower(email)) STORED,
    marketing_optin BOOLEAN,
    mobile_phone VARCHAR(50),
    phone VARCHAR(50),
    phone_ext VARCHAR(40),
    is_contact BOOLEAN DEFAULT false NOT NULL,
    unlinked_link_ids UUID[],
    CONSTRAINT unique_user_email UNIQUE(lower_email),
    PRIMARY KEY (id)
);
Each concurrent `update_user` operation runs the following UPDATE query,
changing the user's email to a unique value:
UPDATE Users
SET
    title = CASE WHEN false = true THEN 'foo' ELSE title END,
    first_name = CASE WHEN false = true THEN 'foo' ELSE first_name END,
    last_name = CASE WHEN false = true THEN 'foo' ELSE last_name END,
    email = CASE WHEN true = true THEN 'email2' ELSE email END,
    marketing_optin = CASE WHEN false = true THEN true ELSE marketing_optin END,
    mobile_phone = CASE WHEN false = true THEN 'foo' ELSE mobile_phone END,
    phone = CASE WHEN false = true THEN 'foo' ELSE phone END,
    phone_ext = CASE WHEN false = true THEN 'foo' ELSE phone_ext END
WHERE id = '018629fd-7b28-743c-8647-b6321c166d46';
I use the following helper view to monitor locks:
CREATE VIEW locks_v AS
SELECT pid,
       virtualtransaction,
       locktype,
       CASE locktype
           WHEN 'relation' THEN relation::regclass::text
           WHEN 'virtualxid' THEN virtualxid::text
           WHEN 'transactionid' THEN transactionid::text
           WHEN 'tuple' THEN relation::regclass::text||':'||page::text||':'||tuple::text
           WHEN 'page' THEN relation::regclass::text||':'||page::text
       END AS lockid,
       mode,
       granted
FROM pg_locks;
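For instance, to watch just the SSI predicate locks while both transactions are
open, I can filter the view like this (a usage sketch; the lockid column then
directly shows the granularity):

```sql
-- Show only the SSI predicate locks, to see whether they are
-- relation-, page- or tuple-level:
SELECT * FROM locks_v WHERE mode = 'SIReadLock';
```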
When the test Users table has only a few records, the query uses a
sequential scan, and the serialization failure is reproducible without
inserting sleeps before the `update_user` transaction commits.
This is caused by relation-level predicate locks on the Users table:
select * from locks_v;
 pid  | virtualtransaction |   locktype    |      lockid       |       mode       | granted
------+--------------------+---------------+-------------------+------------------+---------
 3676 | 5/2444             | relation      | unique_user_email | RowExclusiveLock | t
 3676 | 5/2444             | relation      | users_pkey        | RowExclusiveLock | t
 3676 | 5/2444             | relation      | users             | RowExclusiveLock | t
 3676 | 5/2444             | virtualxid    | 5/2444            | ExclusiveLock    | t
 3737 | 4/13470            | relation      | pg_locks          | AccessShareLock  | t
 3737 | 4/13470            | relation      | locks_v           | AccessShareLock  | t
 3737 | 4/13470            | virtualxid    | 4/13470           | ExclusiveLock    | t
 3669 | 3/17334            | relation      | unique_user_email | RowExclusiveLock | t
 3669 | 3/17334            | relation      | users_pkey        | RowExclusiveLock | t
 3669 | 3/17334            | relation      | users             | RowExclusiveLock | t
 3669 | 3/17334            | virtualxid    | 3/17334           | ExclusiveLock    | t
 3676 | 5/2444             | transactionid | 6571              | ExclusiveLock    | t
 3669 | 3/17334            | transactionid | 6570              | ExclusiveLock    | t
 3676 | 5/2444             | relation      | users             | SIReadLock       | t
 3669 | 3/17334            | relation      | users             | SIReadLock       | t
(15 rows)
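For reference, a minimal interleaving that reproduces this sequential-scan
case in two psql sessions looks roughly like the following (the UUIDs are
placeholders, not my real test data):

```sql
-- S1:
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE Users SET email = 'email1'
 WHERE id = '00000000-0000-0000-0000-000000000001';

-- S2:
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE Users SET email = 'email2'
 WHERE id = '00000000-0000-0000-0000-000000000002';

-- S1:
COMMIT;

-- S2:
COMMIT;  -- one of the sessions can fail here with SQLSTATE 40001
```

Both UPDATEs scan the whole table, so each session takes a relation-level
SIReadLock that the other session's write conflicts with, even though the two
ids are disjoint.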
If I add ballast data to the Users table (1000 records), the cost-based
optimizer switches to an index scan, and it is hard to reproduce the issue
for two concurrent `update_user` operations without sleeps. After adding
long sleeps after the UPDATE query and before the commit, I could see
page-level predicate locks on the primary key index users_pkey:
select * from locks_v;
 pid | virtualtransaction |   locktype    |      lockid       |       mode       | granted
-----+--------------------+---------------+-------------------+------------------+---------
 371 | 6/523              | relation      | unique_user_email | RowExclusiveLock | t
 371 | 6/523              | relation      | users_pkey        | RowExclusiveLock | t
 371 | 6/523              | relation      | users             | RowExclusiveLock | t
 371 | 6/523              | virtualxid    | 6/523             | ExclusiveLock    | t
 381 | 14/215             | relation      | unique_user_email | RowExclusiveLock | t
 381 | 14/215             | relation      | users_pkey        | RowExclusiveLock | t
 381 | 14/215             | relation      | users             | RowExclusiveLock | t
 381 | 14/215             | virtualxid    | 14/215            | ExclusiveLock    | t
 350 | 4/885              | relation      | pg_locks          | AccessShareLock  | t
 350 | 4/885              | relation      | locks_v           | AccessShareLock  | t
 350 | 4/885              | virtualxid    | 4/885             | ExclusiveLock    | t
 371 | 6/523              | transactionid | 1439              | ExclusiveLock    | t
 381 | 14/215             | transactionid | 1431              | ExclusiveLock    | t
 381 | 14/215             | page          | users_pkey:5      | SIReadLock       | t
 371 | 6/523              | page          | users_pkey:5      | SIReadLock       | t
(15 rows)
With the sleeps in place, the serialization failure is reproduced on every run.
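The sleep-based reproduction script for each session is essentially the
following sketch (the sleep duration is arbitrary; pg_sleep just holds the
page-level SIReadLock long enough for the other session to run):

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE Users SET email = 'email2'
 WHERE id = '018629fd-7b28-743c-8647-b6321c166d46';
SELECT pg_sleep(30);  -- keep the users_pkey page SIReadLock while the peer commits
COMMIT;               -- one session fails with SQLSTATE 40001
```

Both transactions update rows whose index entries live on the same users_pkey
leaf page, so each one's page-level read predicate overlaps the other's write.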
I started to read more about the SSI implementation in PostgreSQL. The
article https://arxiv.org/pdf/1208.4179.pdf mentions that

> Currently, locks on B+-tree indexes are acquired at page granularity; we
> intend to refine this to next-key locking [16] in a future release.

[16] C. Mohan. ARIES/KVL: A key-value locking method for concurrency control
of multiaction transactions operating on B-tree indexes. In VLDB, pages
392–405, 1990.
My question follows:
Does the current PostgreSQL release support B+ tree index predicate locks
more granular than page-level locks?
With kindest regards, Rinat Shigapov