Re: Do we need to handle orphaned prepared transactions in the server?

From: Hamid Akhtar <hamid(dot)akhtar(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Kellerer <shammat(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Do we need to handle orphaned prepared transactions in the server?
Date: 2020-02-19 15:04:50
Message-ID: CANugjhvNmiSc=RKf78Jq=aVopGkvnsVQJv7s03vk9h4bvD5Sjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

All,

Attached is version 1 of POC patch for notifying of orphaned
prepared transactions via warnings emitted to a client
application and/or log file. It applies to PostgreSQL branch
"master" on top of "e2e02191" commit.

I've tried to keep the patch as less invasive as I could with
minimal impact on vacuum processes, so the performance impact
and the changes are minimal in that area of PostgreSQL core.

- What's in this Patch:

This patch throws warnings when an autovacuum worker encounters
an orphaned prepared transaction. It also throws warnings to a
client when a vacuum command is issued. This patch also
introduces two new GUCs:

(1) max_age_prepared_xacts
- The age after creation of a prepared transaction after which it
will be considered an orphan.

(2) prepared_xacts_vacuum_warn_timeout
- The timeout period for an autovacuum (essentially any of its
worker) to check for orphaned prepared transactions and throw
warnings if any are found.

- What This Patch Does:

If the GUCs are enabled (set to a value higher than -1), an
autovacuum worker running in the background checks if the
timeout has expired. If so, it checks if there are any orphaned
prepared transactions (i.e. their age has exceeded
max_age_prepared_xacts). If it finds any, it throws a warning for
every such transaction. It also emits the total number of orphaned
prepared transactions if one or more are found.

When a vacuum command is issued from within a client, say psql,
in that case, we skip the vacuum timeout check and simply scan
for any orphaned prepared transactions. Warnings are emitted to
the client and log file if any are found.

- About the New GUCs:

= max_age_prepared_xacts:
Sets maximum age after which a prepared transaction is considered an
orphan. It applies when "prepared transactions" are enabled. The
age for a transaction is calculated from the time it was created to
the current time. If this value is specified without units, it is taken
as milliseconds. The default value is -1 which allows prepared
transactions to live forever.

= prepared_xacts_vacuum_warn_timeout:
Sets timeout after which vacuum starts throwing warnings for every
prepared transactions that has exceeded maximum age defined by
"max_age_prepared_xacts". If this value is specified without units,
it is taken as milliseconds. The default value of -1 will disable
this warning mechanism. Setting a too value could potentially fill
up log with orphaned prepared transaction warnings, so this
parameter must be set to a value that is reasonably large to not
fill up log file, but small enough to notify of long running and
potential orphaned prepared transactions. There is no additional
timer or worker introduced with this change. Whenever a vacuum
worker runs, it first checks for any orphaned prepared transactions.
So at best, this GUC serves as a guideline for a vacuum worker
if a warning should be thrown to log file or a client issuing
vacuum command.

- What this Patch Does Not Cover:

The warning is not thrown when user either runs vacuumdb or passes
individual relations to be vacuum. Specifically in case of vacuumdb,
it breaks down a vacuum command to an attribute-wise vacuum command.
So the vacuum command is indirectly run many times. Considering that
we want to emit warnings for every manual vacuum command, this simply
floods the terminal and log with orphaned prepared transactions
warnings. We could potentially handle that, but the overhead of
that seemed too much to me (and I've not invested any time looking
to fix that either). Hence, warnings are not thrown when user runs
vacuumdb and relation specific vacuum.

On Fri, Jan 31, 2020 at 7:02 PM Hamid Akhtar <hamid(dot)akhtar(at)gmail(dot)com> wrote:

>
>
> On Thu, Jan 30, 2020 at 8:28 AM Craig Ringer <craig(at)2ndquadrant(dot)com>
> wrote:
>
>> On Thu, 30 Jan 2020 at 02:04, Hamid Akhtar <hamid(dot)akhtar(at)gmail(dot)com>
>> wrote:
>> >
>> > So having seen the feedback on this thread, and I tend to agree with
>> most of what has been said here, I also agree that the server core isn't
>> really the ideal place to handle the orphan prepared transactions.
>> >
>> > Ideally, these must be handled by a transaction manager, however, I do
>> believe that we cannot let database suffer for failing of an external
>> software, and we did a similar change through introduction of idle in
>> transaction timeout behavior.
>>
>> The difference, IMO, is that idle-in-transaction aborts don't affect
>> anything we've promised to be durable.
>>
>> Once you PREPARE TRANSACTION the DB has made a promise that that txn
>> is durable. We don't have any consistent feedback channel to back to
>> applications and say "Hey, if you're not going to finish this up we
>> need to get rid of it soon, ok?". If a distributed transaction manager
>> gets consensus for commit and goes to COMMIT PREPARED a previously
>> prepared txn only to find that it has vanished, that's a major
>> problem, and one that may bring the entire DTM to a halt until the
>> admin can intervene.
>>
>> This isn't like idle-in-transaction aborts. It's closer to something
>> like uncommitting a previously committed transaction.
>>
>> I do think it'd make sense to ensure that the documentation clearly
>> highlights the impact of abandoned prepared xacts on server resource
>> retention and performance, preferably with pointers to appropriate
>> views. I haven't reviewed the docs to see how clear that is already.
>>
>
> Having seen the documentation, IMHO the document does contain enough
> information for users to understand what issues can be caused by these
> orphaned prepared transactions.
>
>
>>
>> I can also see an argument for a periodic log message (maybe from
>> vacuum?) warning when old prepared xacts hold xmin down. Including one
>> sent to the client application when an explicit VACUUM is executed.
>> (In fact, it'd make sense to generalise that for all xmin-retention).
>>
>
> I think that opens up the debate on what we really mean by "old" and
> whether that requires a syntax change when creating a prepared
> transactions as Thomas Kellerer suggested earlier?
>
> I agree that vacuum should periodically throw warnings for any prepared
> xacts that are holding xmin down.
>
> Generalising it for all xmin-retention is a fair idea IMHO, though that
> does increase the overall scope here. A vacuum process should (ideally)
> periodically throw out warnings for anything that is preventing it
> (including
> orphaned prepared transactions) from doing its routine work so that
> somebody can take necessary actions.
>
>
>> But I'm really not a fan of aborting such txns. If you operate with
>> some kind of broken global transaction manager that can forget or
>> abandon prepared xacts, then fix it, or adopt site-local periodic
>> cleanup tasks that understand your site's needs.
>>
>> --
>> Craig Ringer http://www.2ndQuadrant.com/
>> 2ndQuadrant - PostgreSQL Solutions for the Enterprise
>>
>
>
> --
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
> ADDR: 10318 WHALLEY BLVD, Surrey, BC
> CELL:+923335449950 EMAIL: mailto:hamid(dot)akhtar(at)highgo(dot)ca
> SKYPE: engineeredvirus
>

--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid(dot)akhtar(at)highgo(dot)ca
SKYPE: engineeredvirus

Attachment Content-Type Size
orphaned_pxact.v1.patch application/octet-stream 12.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2020-02-19 15:44:27 Re: Internal key management system
Previous Message Ildar Musin 2020-02-19 13:59:40 Constraint's NO INHERIT option is ignored in CREATE TABLE LIKE statement