PostgreSql: Canceled on conflict out to old pivot

From: "Wirch, Eduard" <eduard(dot)w(at)smart-host(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: PostgreSql: Canceled on conflict out to old pivot
Date: 2023-11-28 05:41:31
Message-ID: CAPX+m-hdbj4b1Ufcpx+YAGj8BLF91gdi6zr0Pc3NKZTBDtKUrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

I posted this question already on pgsql-general, but it got no answers.
Maybe the topic is too technical? So I'm trying it here. Maybe a SSI
specialist is here on the list.

We have a PostgreSql 15 server serving around 30 databases, one schema each
with the same layout. Each database is used by one application instance.
The application consistently uses transactions with isolation level
serializable to access the database, optimizing by using explicit read only
transactions, where applicable. Once the server reaches 100% CPU load we
get an increased amount of serialize conflict errors. This is expected, due
to more concurrent access. But I fail to explain this kind of error:

ERROR: could not serialize access due to read/write dependencies among
transactions
Detail: Reason code: Canceled on identification as a pivot, with conflict
out to old committed transaction 61866959.

There is a variation of the error:

PSQLException: ERROR: could not serialize access due to read/write
dependencies among transactions
Detail: Reason code: Canceled on conflict out to old pivot 61940806.

We're logging the id, begin and end of every transaction. Transaction
61940806 was committed without errors. The transaction responsible for the
above error was started 40min later (and failed immediately). With 61866959
it is even more extreme: the first conflict error occurred 2.5h after
61866959 was committed.

The DB table access pattern is too complex to lay out here. There are like
20 tables that are read/written to. Transactions are usually short living.
The longest transaction that could occur is 1 min long. My understanding of
serializable isolation is that only overlapping transactions can conflict.
I can be pretty sure that in the above cases there is no single
transaction, which overlaps with 61940806 and with the failing transaction
40 min later. Such long running transactions would cause different types of
errors in our system ("out of shared memory", "You might need to increase
max_pred_locks_per_transaction").

Why does PostgreSql detect a conflict with a transaction which was
committed more than 1h before? Can there be a long dependency chain between
many short running transactions? Does the high load prevent Postgres from
doing some clean up?

Cheers,
Eduard

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ilya Kosmodemiansky 2023-11-28 06:14:38 Re: PostgreSql: Canceled on conflict out to old pivot
Previous Message Masahiko Sawada 2023-11-28 05:35:23 Re: [PoC] pg_upgrade: allow to upgrade publisher node