Re: Too coarse predicate locks granularity for B+ tree indexes

From: Rinat Shigapov <rinatshigapov(at)gmail(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Too coarse predicate locks granularity for B+ tree indexes
Date: 2023-02-07 11:08:26
Message-ID: CAJneKMWy+4SQ6_JTESEzRJakaVX2gfGSkH24zvDjqymZ3LK05A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Thank you for your prompt reply!

I've mentioned that I've generated ballast data to make the cost optimizer
to switch to page-level locks.

But my question is about more finer grained (less then page) predicate
locks for indices. With page-level locks I could still get serialization
failures if I add more queries (or emulate it with sleeps) to the
transaction with the UPDATE Users query.

Below I describe the problem again for psql-general:

I have a concurrent testsuite that runs 14 test cases. Each test case
operates on a disjoint set of records, doesn't retry transactions and is
run under 'serializable' isolation level. The test data is small and likely
fits within a single tuple page.

When I finished the test suite I was surprised that PostgreSQL 14.5 returns
serialization failure on every test suite run. I was even more surprised
when I tested the suite against the current CockroachDB and didn't get
serialization failures. Actually I was able to reproduce RETRY_SERIALIZABLE
errors a couple of times on CockroachDB but it required me to run the test
suite in a loop for more than a half hour.

I started to investigate the test behavior with PostgreSQL with more
simplified and shrinked code and found a serialization failure of two
concurrent `update_user` operations.

The test defines the following `Users` table:

CREATE TABLE Users (
> id UUID,
> title VARCHAR(255),
> first_name VARCHAR(40),
> last_name VARCHAR(80) NOT NULL,
> email VARCHAR(255) NOT NULL,
> lower_email VARCHAR(255) GENERATED ALWAYS AS (lower(email)) STORED,
> marketing_optin BOOLEAN,
> mobile_phone VARCHAR(50),
> phone VARCHAR(50),
> phone_ext VARCHAR(40),
> is_contact BOOLEAN DEFAULT false NOT NULL,
> unlinked_link_ids UUID[],

> CONSTRAINT unique_user_email UNIQUE(lower_email),
> PRIMARY KEY (id)
> );

Concurrent `update_user` operation run the UPDATE query to change user
email to a unique value

UPDATE Users
> SET
> title = CASE WHEN false= true THEN 'foo' ELSE title END,
> first_name = CASE WHEN false= true THEN 'foo' ELSE first_name END,
> last_name = CASE WHEN false= true THEN 'foo' ELSE last_name END,
> email = CASE WHEN true = true THEN 'email2' ELSE email END,
> marketing_optin = CASE WHEN false = true THEN true ELSE
> marketing_optin END,
> mobile_phone = CASE WHEN false = true THEN 'foo' ELSE mobile_phone END,
> phone = CASE WHEN false = true THEN 'foo' ELSE phone END,
> phone_ext = CASE WHEN false = true THEN 'foo' ELSE phone_ext END
> WHERE id = '018629fd-7b28-743c-8647-b6321c166d46';
>

I use the following helper view to monitor locks:

> CREATE VIEW locks_v AS
> SELECT pid,
> virtualtransaction,
> locktype,
> CASE locktype
> WHEN 'relation' THEN relation::regclass::text
> WHEN 'virtualxid' THEN virtualxid::text
> WHEN 'transactionid' THEN transactionid::text
> WHEN 'tuple' THEN
> relation::regclass::text||':'||page::text||':'||tuple::text
> WHEN 'page' THEN relation::regclass::text||':'||page::text
> END AS lockid,
> mode,
> granted
> FROM pg_locks;

When the test Users table has only a few records the query uses a
sequential scan the serialization failure is reproducible without inserting
sleeps before `update_user` transaction commit.

This is caused by relation level predicate locks on Users table:

> select * from locks_v;
> pid | virtualtransaction | locktype | lockid |
> mode | granted
>
> ------+--------------------+---------------+-------------------+------------------+---------
> 3676 | 5/2444 | relation | unique_user_email |
> RowExclusiveLock | t
> 3676 | 5/2444 | relation | users_pkey |
> RowExclusiveLock | t
> 3676 | 5/2444 | relation | users |
> RowExclusiveLock | t
> 3676 | 5/2444 | virtualxid | 5/2444 |
> ExclusiveLock | t
> 3737 | 4/13470 | relation | pg_locks |
> AccessShareLock | t
> 3737 | 4/13470 | relation | locks_v |
> AccessShareLock | t
> 3737 | 4/13470 | virtualxid | 4/13470 |
> ExclusiveLock | t
> 3669 | 3/17334 | relation | unique_user_email |
> RowExclusiveLock | t
> 3669 | 3/17334 | relation | users_pkey |
> RowExclusiveLock | t
> 3669 | 3/17334 | relation | users |
> RowExclusiveLock | t
> 3669 | 3/17334 | virtualxid | 3/17334 |
> ExclusiveLock | t
> 3676 | 5/2444 | transactionid | 6571 |
> ExclusiveLock | t
> 3669 | 3/17334 | transactionid | 6570 |
> ExclusiveLock | t
> 3676 | 5/2444 | relation | users |
> SIReadLock | t
> 3669 | 3/17334 | relation | users |
> SIReadLock | t
> (15 rows)
>

If I add ballast data to Users table (1000 records) the cost optimizer
switches to index scan and it's hard to reproduce the issue for two
concurrent `update_user` operations without sleeps. After adding long
sleeps after UPDATE query and before commit I could see page-level
predicates locks for the primary key index users_pkey:

select * from locks_v;
> pid | virtualtransaction | locktype | lockid | mode
> | granted
>
> -----+--------------------+---------------+-------------------+------------------+---------
> 371 | 6/523 | relation | unique_user_email |
> RowExclusiveLock | t
> 371 | 6/523 | relation | users_pkey |
> RowExclusiveLock | t
> 371 | 6/523 | relation | users |
> RowExclusiveLock | t
> 371 | 6/523 | virtualxid | 6/523 |
> ExclusiveLock | t
> 381 | 14/215 | relation | unique_user_email |
> RowExclusiveLock | t
> 381 | 14/215 | relation | users_pkey |
> RowExclusiveLock | t
> 381 | 14/215 | relation | users |
> RowExclusiveLock | t
> 381 | 14/215 | virtualxid | 14/215 |
> ExclusiveLock | t
> 350 | 4/885 | relation | pg_locks |
> AccessShareLock | t
> 350 | 4/885 | relation | locks_v |
> AccessShareLock | t
> 350 | 4/885 | virtualxid | 4/885 |
> ExclusiveLock | t
> 371 | 6/523 | transactionid | 1439 |
> ExclusiveLock | t
> 381 | 14/215 | transactionid | 1431 |
> ExclusiveLock | t
> 381 | 14/215 | page | users_pkey:5 | SIReadLock
> | t
> 371 | 6/523 | page | users_pkey:5 | SIReadLock
> | t
> (15 rows)
>

With sleeps the serialization failure is reproduced on each run.

I started to read more about SSI implementation in PostgreSQL. The article
https://arxiv.org/pdf/1208.4179.pdf mentions that

> Currently, locks on B+-tree indexes are acquired at page granularity; we
> intend to refine this to next-key locking [16] in a future release.
>
[16] C. Mohan. ARIES/KVL: A key-value locking method for concurrency
> control of multiaction transactions operating on B-tree indexes. In VLDB,
> pages 392–405, 1990.

My question follows:

Does the current PostgreSQL release support B+ tree index predicate locks
more granular then page-level locks?

With kindest regards, Rinat Shigapov

вт, 7 февр. 2023 г. в 16:29, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>:

> On Tue, 2023-02-07 at 16:23 +0600, Rinat Shigapov wrote:
> > I have a concurrent testsuite that runs 14 test cases. Each test case
> operates
> > on a disjoint set of records, doesn't retry transactions and is run under
> > 'serializable' isolation level. The test data is small and likely fits
> within
> > a single tuple page.
> >
> > When I finished the test suite I was surprised that PostgreSQL 14.5
> returns
> > serialization failure on every test suite run.
>
> This is no question for the hackers list; redirecting to general.
>
> That behavior sounds perfectly normal to me: if everything is in a single
> page, PostgreSQL probably won't use an index scan. With a sequential scan,
> the predicate lock will be on the whole table. So you should expect
> serialization failures. This is well documented.
>
> Perhaps you should use a more realistic test case with a reasonable
> amount of data.
>
> Yours,
> Laurenz Albe
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Kellerer 2023-02-07 11:10:07 Re: PostgreSQL
Previous Message Joseph Kennedy 2023-02-07 11:02:21 PostgreSQL

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2023-02-07 11:10:37 Re: Too coarse predicate locks granularity for B+ tree indexes
Previous Message Dean Rasheed 2023-02-07 10:56:52 Re: MERGE ... RETURNING