Re: Documenting when to retry on serialization failure

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Documenting when to retry on serialization failure
Date: 2021-12-29 03:29:54
Message-ID: CA+hUKG+PTRwdakDZ3hR263PJb8CcxmozuXCgKPKYgeVx-dOSAA@mail.gmail.com

On Fri, Dec 10, 2021 at 1:43 AM Simon Riggs
<simon(dot)riggs(at)enterprisedb(dot)com> wrote:
> "Applications using this level must be prepared to retry transactions
> due to serialization failures."
> ...
> "When an application receives this error message, it should abort the
> current transaction and retry the whole transaction from the
> beginning."
>
> I note that the specific error codes this applies to are not
> documented, so let's discuss what the docs for that would look like.

+1 for naming the error.

> I had a conversation with Kevin Grittner about retry some years back
> and it seemed clear that the application should re-execute application
> logic from the beginning, rather than just slavishly re-execute the
> same SQL. But that is not documented either.

Right, the result of the first statement could cause the application
to do something completely different the second time through. I
personally think the best way for applications to deal with this
problem (and at least also deadlock, serialisation failure's
pessimistic cousin) is to represent transactions as blocks of code
that can be automatically retried, however that looks in your client
language. It might be that you pass a
function/closure/whatever-you-call-it to the transaction management
code so it can rerun it if necessary, or that a function is decorated
in some way that some magic infrastructure understands, but that's a
little tricky to write about in a general enough way for our manual.
(A survey of how this looks with various different libraries and tools
might make a neat conference talk though.) But isn't that exactly
what that existing sentence "... from the beginning" is trying to say,
especially with the following sentence ("The second time through...")?
Hmm, yeah, perhaps that next sentence could be clearer.
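As a rough illustration of the "transaction as a retryable block of code"
idea (a sketch in Python; none of these names come from any real driver,
and the error class is a stand-in for whatever your client library raises
for SQLSTATE 40001):

```python
class SerializationFailure(Exception):
    """Stand-in for a driver error carrying SQLSTATE 40001."""
    sqlstate = "40001"

def run_transaction(txn_body, max_attempts=5):
    """Rerun the whole block of application logic on serialization
    failure, so decisions based on earlier query results are made
    afresh on each attempt, not replayed from the first attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # The server already aborted the transaction; just run
            # the application logic again from the beginning.

# Simulated body: fails once with 40001, then succeeds.  A real body
# would re-read its inputs each time, so the second attempt might take
# a completely different path.
outcomes = iter([True, False])

def transfer():
    if next(outcomes):
        raise SerializationFailure()
    return "committed"

result = run_transaction(transfer)
print(result)  # -> committed
```

The point of passing the body as a function is exactly the "from the
beginning" semantics: the retry loop cannot accidentally replay stale SQL,
because all it has is the application logic itself.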

> Is *automatic* retry possible? In all cases? None? Or maybe Some?

I'm aware of a couple of concrete cases that confound attempts to
retry automatically: sometimes we report a unique constraint
violation or an exclusion constraint failure, when we have the
information required to diagnose a serialisation anomaly. In those
cases, we really should figure out how to spit out 40001 (otherwise
what is general purpose auto retry code supposed to do with UCV?). We
fixed a single-index variant of this problem in commit fcff8a57. I
have an idea for how this might be fixed for the multi-index UCV[1]
and exclusion constraint[2] variants of the problem, but haven't
actually tried yet.

If there are other things that stand in the way of reliable automated
retry (= a list of error codes a client library could look for) then
I'd love to have a list of them.
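For concreteness, the client-side classification such a list implies might
look like this sketch. The two retryable SQLSTATEs are real (40001 is
serialization_failure, 40P01 is deadlock_detected, 23505 is
unique_violation); the function name is illustrative:

```python
# SQLSTATEs that mean "abort and rerun the whole transaction".
RETRYABLE_SQLSTATES = {
    "40001",  # serialization_failure
    "40P01",  # deadlock_detected
}

def is_retryable(sqlstate: str) -> bool:
    return sqlstate in RETRYABLE_SQLSTATES

assert is_retryable("40001")
assert is_retryable("40P01")
# Today a unique-constraint violation is NOT safely retryable, even
# though some of them are serialization anomalies in disguise:
assert not is_retryable("23505")
```

That last line is exactly the problem with the UCV cases above: until they
are reported as 40001, generic retry code has no safe way to tell them
apart from a genuine duplicate key.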

> But what about the case of a single statement transaction? Can we just
> re-execute then? I guess if it didn't run anything other than
> IMMUTABLE functions then it should be OK, assuming the inputs
> themselves were immutable, which we've no way for the user to declare.
> Could we allow a user-defined auto_retry parameter?

I've wondered about that too, but so far it didn't seem worth the
effort, since application developers need another solution for
multi-statement retry anyway.

> We don't mention that a transaction might just repeatedly fail either.

According to the VLDB paper, the "safe retry" property (§ 5.4) means
that a retry won't abort for the same reason (due to a cycle with the
same set of other transactions as your last attempt), unless prepared
transactions are involved (§ 7.1). This means that the whole system
continues to make some kind of progress in the absence of 2PC, though
of course your transaction might or might not fail because of a cycle
with some other set of transactions. Maybe that is too technical for
our manual, which already provides the link to that paper, but it's
interesting to note that you can suffer from a stuck busy-work loop
until conflicting prepared xacts go away, with a naive
automatic-retry-forever system.
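One common mitigation for that stuck-loop hazard is to cap attempts and
back off between them, so a transaction blocked behind a conflicting
prepared xact fails loudly instead of burning CPU forever. A sketch
(names hypothetical; the delays here are tiny only for the demo):

```python
import random
import time

def run_with_backoff(txn_body, is_retryable_error, max_attempts=10,
                     base_delay=0.01, max_delay=1.0):
    """Retry with exponential backoff, jitter, and a hard attempt cap."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except Exception as e:
            if not is_retryable_error(e) or attempt == max_attempts:
                raise
            time.sleep(delay + random.uniform(0, delay))  # jittered wait
            delay = min(delay * 2, max_delay)

# Simulated body: fails twice with a "retryable" error, then succeeds.
fails = iter([True, True, False])

def body():
    if next(fails):
        raise RuntimeError("simulated 40001")
    return "done"

result = run_with_backoff(body, lambda e: isinstance(e, RuntimeError))
print(result)  # -> done
```

The cap turns "stuck busy-work loop until the prepared xacts go away"
into an error the application can surface, and the jittered backoff keeps
a herd of retrying clients from re-colliding in lockstep.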

[1] https://www.postgresql.org/message-id/flat/CAGPCyEZG76zjv7S31v_xPeLNRuzj-m%3DY2GOY7PEzu7vhB%3DyQog%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/CAMTXbE-sq9JoihvG-ccC70jpjMr%2BDWmnYUj%2BVdnFRFSRuaaLZQ%40mail.gmail.com
