Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly
Date: 2017-04-23 09:59:32
Message-ID: CANP8+j+j_5FjeuSz4OUF1=6FnKiMN5bA9xoZD8LUR-swrNx3kA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 23 April 2017 at 00:55, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Now that we've got consistent failure reports about the 009_twophase.pl
> recovery test, I set out to find out why it's failing. It looks to me
> like the reason is that this (twophase.c:2145):
>
> SubTransSetParent(xid, subxid, overwriteOK);
>
> ought to be this:
>
> SubTransSetParent(subxid, xid, overwriteOK);
>
> because the definition of SubTransSetParent is
>
> void
> SubTransSetParent(TransactionId xid, TransactionId parent, bool overwriteOK)
>
> not the other way 'round.
>
> While "git blame" blames this line on the recent commit 728bd991c,
> that just moved the call from somewhere else. AFAICS this has actually
> been wrong since StandbyRecoverPreparedTransactions was written,
> in 361bd1662 of 2010-04-13.

Thanks for finding that.

> It's not clear to me how much potential this has to create user data
> corruption, but it doesn't look good at first glance. Discuss.

Well, strangely for different reason I was looking at that particular
call a couple of days back. I didn't spot that issue, but I was
thinking why we even make that call at that time. My conclusion was
that it could be optimized away mostly, since it isn't needed very
often, but its not really worth optimizing.

SubTransSetParent() only matters for the case where we are half-way
through a commit with a large commit. Since we do atomic updates of
commit and subcommmit when on same page, this problem would only
matter when top level xid and other subxids were separated across
multiple pages and the issue would only affect a read only query
checking visibility during the commit for that prepared transaction.
Furthermore, the nature of the corruption is that we set the xid to
point to the subxid; since we never mark a top-level commit as
subcommitted, subtrans would never be consulted and so the corruption
would not lead to any incorrect answer even during the window of
exposure. So it looks to me like this error shouldn't cause a problem.

We can fix that, but...

> Also, when I fix that, it gets further but still crashes at the same
> Assert in SubTransSetParent. The proximate cause this time seems to be
> that RecoverPreparedTransactions's calculation of overwriteOK is wrong:
> it's computing that as "false", but in reality the subtrans link in
> question has already been set.

Not sure about that, investigating.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2017-04-23 10:05:44 Re: StandbyRecoverPreparedTransactions recovers subtrans links incorrectly
Previous Message Michael Banck 2017-04-23 09:14:21 Re: A note about debugging TAP failures