From: Rinat Shigapov <rinatshigapov(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Too coarse predicate locks granularity for B+ tree indexes
Date: 2023-02-07 10:23:54
Message-ID: CAJneKMX826bmGpTHyt9-oM3Qbfk1mX8-Kn1E7s2Gt2MNF=yRaQ@mail.gmail.com
Lists: pgsql-general pgsql-hackers
Hi,
TL;DR: this email describes a serialization failure that happens (as I
understand it) due to too-coarse predicate lock granularity on the primary
key index.
I have a concurrent test suite that runs 14 test cases. Each test case
operates on a disjoint set of records, does not retry transactions, and runs
under the 'serializable' isolation level. The test data is small and likely
fits within a single heap page.
When I finished the test suite, I was surprised that PostgreSQL 14.5
returned a serialization failure on every run. I was even more surprised
when I ran the suite against current CockroachDB and did not get
serialization failures. I was eventually able to reproduce RETRY_SERIALIZABLE
errors a couple of times on CockroachDB, but it required running the test
suite in a loop for more than half an hour.
I started to investigate the behavior on PostgreSQL with a simplified,
stripped-down version of the code and found a serialization failure between
two concurrent `update_user` operations.
The test defines the following `Users` table:
CREATE TABLE Users (
    id UUID,
    title VARCHAR(255),
    first_name VARCHAR(40),
    last_name VARCHAR(80) NOT NULL,
    email VARCHAR(255) NOT NULL,
    lower_email VARCHAR(255) GENERATED ALWAYS AS (lower(email)) STORED,
    marketing_optin BOOLEAN,
    mobile_phone VARCHAR(50),
    phone VARCHAR(50),
    phone_ext VARCHAR(40),
    is_contact BOOLEAN DEFAULT false NOT NULL,
    unlinked_link_ids UUID[],
    CONSTRAINT unique_user_email UNIQUE(lower_email),
    PRIMARY KEY (id)
);
Each concurrent `update_user` operation runs the following UPDATE query,
changing the user's email to a unique value:
UPDATE Users
SET
    title = CASE WHEN false = true THEN 'foo' ELSE title END,
    first_name = CASE WHEN false = true THEN 'foo' ELSE first_name END,
    last_name = CASE WHEN false = true THEN 'foo' ELSE last_name END,
    email = CASE WHEN true = true THEN 'email2' ELSE email END,
    marketing_optin = CASE WHEN false = true THEN true ELSE marketing_optin END,
    mobile_phone = CASE WHEN false = true THEN 'foo' ELSE mobile_phone END,
    phone = CASE WHEN false = true THEN 'foo' ELSE phone END,
    phone_ext = CASE WHEN false = true THEN 'foo' ELSE phone_ext END
WHERE id = '018629fd-7b28-743c-8647-b6321c166d46';
I use the following helper view to monitor locks:
CREATE VIEW locks_v AS
SELECT pid,
       virtualtransaction,
       locktype,
       CASE locktype
           WHEN 'relation' THEN relation::regclass::text
           WHEN 'virtualxid' THEN virtualxid::text
           WHEN 'transactionid' THEN transactionid::text
           WHEN 'tuple' THEN relation::regclass::text||':'||page::text||':'||tuple::text
           WHEN 'page' THEN relation::regclass::text||':'||page::text
       END AS lockid,
       mode,
       granted
FROM pg_locks;
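For instance, to watch just the SSI predicate locks while both transactions are
open, I can filter the view like this (a usage sketch; the lockid column then
directly shows the granularity):

```sql
-- Show only the SSI predicate locks, to see whether they are
-- relation-, page- or tuple-level:
SELECT * FROM locks_v WHERE mode = 'SIReadLock';
```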
When the test Users table has only a few records, the query uses a
sequential scan, and the serialization failure is reproducible without
inserting sleeps before the `update_user` transaction commits.
This is caused by relation-level predicate locks on the Users table:
select * from locks_v;
 pid  | virtualtransaction |   locktype    |      lockid       |       mode       | granted
------+--------------------+---------------+-------------------+------------------+---------
 3676 | 5/2444             | relation      | unique_user_email | RowExclusiveLock | t
 3676 | 5/2444             | relation      | users_pkey        | RowExclusiveLock | t
 3676 | 5/2444             | relation      | users             | RowExclusiveLock | t
 3676 | 5/2444             | virtualxid    | 5/2444            | ExclusiveLock    | t
 3737 | 4/13470            | relation      | pg_locks          | AccessShareLock  | t
 3737 | 4/13470            | relation      | locks_v           | AccessShareLock  | t
 3737 | 4/13470            | virtualxid    | 4/13470           | ExclusiveLock    | t
 3669 | 3/17334            | relation      | unique_user_email | RowExclusiveLock | t
 3669 | 3/17334            | relation      | users_pkey        | RowExclusiveLock | t
 3669 | 3/17334            | relation      | users             | RowExclusiveLock | t
 3669 | 3/17334            | virtualxid    | 3/17334           | ExclusiveLock    | t
 3676 | 5/2444             | transactionid | 6571              | ExclusiveLock    | t
 3669 | 3/17334            | transactionid | 6570              | ExclusiveLock    | t
 3676 | 5/2444             | relation      | users             | SIReadLock       | t
 3669 | 3/17334            | relation      | users             | SIReadLock       | t
(15 rows)
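For reference, a minimal interleaving that reproduces this sequential-scan
case in two psql sessions looks roughly like the following (the UUIDs are
placeholders, not my real test data):

```sql
-- S1:
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE Users SET email = 'email1'
 WHERE id = '00000000-0000-0000-0000-000000000001';

-- S2:
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE Users SET email = 'email2'
 WHERE id = '00000000-0000-0000-0000-000000000002';

-- S1:
COMMIT;

-- S2:
COMMIT;  -- one of the sessions can fail here with SQLSTATE 40001
```

Both UPDATEs scan the whole table, so each session takes a relation-level
SIReadLock that the other session's write conflicts with, even though the two
ids are disjoint.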
If I add ballast data to the Users table (1000 records), the cost-based
optimizer switches to an index scan, and it is hard to reproduce the issue
for two concurrent `update_user` operations without sleeps. After adding
long sleeps after the UPDATE query and before the commit, I could see
page-level predicate locks on the primary key index users_pkey:
select * from locks_v;
 pid | virtualtransaction |   locktype    |      lockid       |       mode       | granted
-----+--------------------+---------------+-------------------+------------------+---------
 371 | 6/523              | relation      | unique_user_email | RowExclusiveLock | t
 371 | 6/523              | relation      | users_pkey        | RowExclusiveLock | t
 371 | 6/523              | relation      | users             | RowExclusiveLock | t
 371 | 6/523              | virtualxid    | 6/523             | ExclusiveLock    | t
 381 | 14/215             | relation      | unique_user_email | RowExclusiveLock | t
 381 | 14/215             | relation      | users_pkey        | RowExclusiveLock | t
 381 | 14/215             | relation      | users             | RowExclusiveLock | t
 381 | 14/215             | virtualxid    | 14/215            | ExclusiveLock    | t
 350 | 4/885              | relation      | pg_locks          | AccessShareLock  | t
 350 | 4/885              | relation      | locks_v           | AccessShareLock  | t
 350 | 4/885              | virtualxid    | 4/885             | ExclusiveLock    | t
 371 | 6/523              | transactionid | 1439              | ExclusiveLock    | t
 381 | 14/215             | transactionid | 1431              | ExclusiveLock    | t
 381 | 14/215             | page          | users_pkey:5      | SIReadLock       | t
 371 | 6/523              | page          | users_pkey:5      | SIReadLock       | t
(15 rows)
With the sleeps in place, the serialization failure is reproduced on every run.
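The sleep-based reproduction script for each session is essentially the
following sketch (the sleep duration is arbitrary; pg_sleep just holds the
page-level SIReadLock long enough for the other session to run):

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE Users SET email = 'email2'
 WHERE id = '018629fd-7b28-743c-8647-b6321c166d46';
SELECT pg_sleep(30);  -- keep the users_pkey page SIReadLock while the peer commits
COMMIT;               -- one session fails with SQLSTATE 40001
```

Both transactions update rows whose index entries live on the same users_pkey
leaf page, so each one's page-level read predicate overlaps the other's write.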
I started to read more about the SSI implementation in PostgreSQL. The
article https://arxiv.org/pdf/1208.4179.pdf mentions that

> Currently, locks on B+-tree indexes are acquired at page granularity; we
> intend to refine this to next-key locking [16] in a future release.

[16] C. Mohan. ARIES/KVL: A key-value locking method for concurrency control
of multiaction transactions operating on B-tree indexes. In VLDB, pages
392–405, 1990.
My question follows:
Does the current PostgreSQL release support B+ tree index predicate locks
more granular than page-level locks?
With kindest regards, Rinat Shigapov