Re: PostgreSql: Canceled on conflict out to old pivot

From: "Wirch, Eduard" <eduard(dot)w(at)smart-host(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: PostgreSql: Canceled on conflict out to old pivot
Date: 2023-11-30 16:24:57
Message-ID: CAPX+m-jF52dHkve0et0WAScC2JkELChdmxVUNbTCrDM0oq903Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the detailed answer, Heikki.

> > The longest transaction that could occur is 1 min long.
> I hate to drill on this, but are you very sure about that? A transaction
in a different database?

Don't be sorry for that, drilling down is important. ;) It took me so long
to reply because I had to prepare the information carefully. You're right,
on that day I observed the behavior, there were indeed long running
transactions in different DBs! My understanding of serializable isolation
is that only transactions which can somehow affect each other can conflict.
It should be clear for PostgreSql, that transactions belonging to different
databases cannot affect each other. Why do they cause serializable
conflicts?

If you want something visual, I prepared a SO question with similar content
like this mail, but added an image of the tx flow:
https://stackoverflow.com/questions/77544821/postgresql-canceled-on-conflict-out-to-old-pivot

Cheers,
Eduard

Am Di., 28. Nov. 2023 um 09:53 Uhr schrieb Heikki Linnakangas <
hlinnaka(at)iki(dot)fi>:

> On 28/11/2023 07:41, Wirch, Eduard wrote:
> > ERROR: could not serialize access due to read/write dependencies among
> > transactions
> > Detail: Reason code: Canceled on identification as a pivot, with
> > conflict out to old committed transaction 61866959.
> >
> > There is a variation of the error:
> >
> > PSQLException: ERROR: could not serialize access due to read/write
> > dependencies among transactions
> > Detail: Reason code: Canceled on conflict out to old pivot 61940806.
>
> Both of these errors are coming from CheckForSerializableConflictOut(),
> and are indeed variations of the same kind of conflict.
>
> > We're logging the id, begin and end of every transaction. Transaction
> > 61940806 was committed without errors. The transaction responsible for
> > the above error was started 40min later (and failed immediately). With
> > 61866959 it is even more extreme: the first conflict error occurred 2.5h
> > after 61866959 was committed.
>
> Weird indeed. There is only one caller of
> CheckForSerializableConflictOut(), and it does this:
>
> > /*
> > * Find top level xid. Bail out if xid is too early to be a
> conflict, or
> > * if it's our own xid.
> > */
> > if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
> > return;
> > xid = SubTransGetTopmostTransaction(xid);
> > if (TransactionIdPrecedes(xid, TransactionXmin))
> > return;
> >
> > CheckForSerializableConflictOut(relation, xid, snapshot);
>
> That check with TransactionXmin is very clear: if 'xid' precedes the
> xmin of the current transaction, IOW if there were no transactions with
> 'xid' or older running when the current transcaction started,
> CheckForSerializableConflictOut() is not called.
>
> > The DB table access pattern is too complex to lay out here. There are
> > like 20 tables that are read/written to. Transactions are usually short
> > living. The longest transaction that could occur is 1 min long. My
> > understanding of serializable isolation is that only overlapping
> > transactions can conflict. I can be pretty sure that in the above cases
> > there is no single transaction, which overlaps with 61940806 and with
> > the failing transaction 40 min later.
>
> I hate to drill on this, but are you very sure about that? I don't see
> how this could happen if there are no long-running transactions. Maybe a
> forgotten two-phase commit transaction? A transaction in a different
> database? A developer who did "begin;" in psql and went for lunch?
>
> > Such long running transactions
> > would cause different types of errors in our system ("out of shared
> > memory", "You might need to increase max_pred_locks_per_transaction").
>
> I don't see why that would necessarily be the case, unless it's
> something very specific to your application.
>
> > Why does PostgreSql detect a conflict with a transaction which was
> > committed more than 1h before? Can there be a long dependency chain
> > between many short running transactions? Does the high load prevent
> > Postgres from doing some clean up?
>
> The dependencies don't chain like that, but there is a system of
> "summarizing" old transactions to limit the shared memory usage. When a
> transaction has dependencies on other transactions, we track those
> dependencies in shared memory. But if we run short on the space reserved
> for that, we summarize the dependencies, losing granularity. We lose
> information of which relations/pages/tuples the xid accessed and which
> transactions exactly it had a dependency on. That is safe, but can cause
> false positives.
>
> The amount of shared memory reserved for tracking the dependencies is
> determined by max_pred_locks_per_transaction, so you could try
> increasing that to reduce those false positives, even if you never get
> the "out of shared memory" error.
>
> --
> Heikki Linnakangas
> Neon (https://neon.tech)
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2023-11-30 16:25:21 Re: pg_upgrade and logical replication
Previous Message Alexander Lakhin 2023-11-30 16:00:01 Re: Random pg_upgrade test failure on drongo