Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Christophe Pettus <xof(at)thebuild(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Date: 2013-11-20 10:48:50
Message-ID: 528C9392.8000004@vmware.com
Lists: pgsql-hackers

On 19.11.2013 16:22, Andres Freund wrote:
> On 2013-11-19 15:20:01 +0100, Andres Freund wrote:
>> Imo something like the attached patch should be done. The description I came
>> up with is:
>>
>> Fix Hot-Standby initialization of clog and subtrans.

Looks ok for a back-patchable fix.

It's a bit bizarre that the ExtendSUBTRANS loop in
ProcArrayApplyRecoveryInfo looks different from the one in
RecordKnownAssignedTransactionIds, but both look correct to me.

In master, it'd be nice to do some further cleanup. Some gripes:

In ProcArrayApplyXidAssignment, I wonder if it would be best to just
remove the (standbyState == STANDBY_INITIALIZED) check altogether. The
KnownAssignedXidsRemoveTree() that follows is harmless if there is
nothing in the known-assigned-xids array (xact_redo_commit does it in
STANDBY_INITIALIZED state too). The other thing that's done after that
check is updating lastOverflowedXid, and AFAICS it would be harmless to
update that, too. It will be overwritten by the
ProcArrayApplyRecoveryInfo() call that comes later.
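
To make that argument concrete, here is a minimal toy model in C. It is not the real procarray code: the ToyStandbyState struct and the toy_* functions are hypothetical stand-ins, and plain integer comparison replaces the wraparound-aware xid comparison. It only illustrates the two claims above: removing a tree from an empty known-assigned-xids array changes nothing, and whatever lastOverflowedXid is set to gets overwritten by the later recovery-info step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define TOY_MAX_KNOWN 16

/* Toy stand-in for the shared hot-standby state; not the real layout. */
typedef struct ToyStandbyState
{
    TransactionId known[TOY_MAX_KNOWN]; /* known-assigned xids */
    size_t        nknown;
    TransactionId lastOverflowedXid;
} ToyStandbyState;

/* Remove one xid if present; returns 1 if it was removed, 0 otherwise. */
static int
toy_remove_xid(ToyStandbyState *st, TransactionId xid)
{
    size_t i;

    assert(st != NULL);
    for (i = 0; i < st->nknown; i++)
    {
        if (st->known[i] == xid)
        {
            /* Fill the hole with the last entry; order does not matter here. */
            st->known[i] = st->known[st->nknown - 1];
            st->nknown--;
            return 1;
        }
    }
    return 0;
}

/* Remove a top-level xid and its subxids; returns how many were removed. */
static int
toy_remove_tree(ToyStandbyState *st, TransactionId top,
                const TransactionId *subxids, int nsubxids)
{
    int removed = toy_remove_xid(st, top);
    int i;

    for (i = 0; i < nsubxids; i++)
        removed += toy_remove_xid(st, subxids[i]);
    return removed;
}

/* Apply an assignment record with no standby-state check at all. */
static int
toy_apply_xid_assignment(ToyStandbyState *st, TransactionId top,
                         const TransactionId *subxids, int nsubxids)
{
    TransactionId max_xid = top;
    int removed;
    int i;

    for (i = 0; i < nsubxids; i++)
        if (subxids[i] > max_xid)   /* toy comparison, ignores wraparound */
            max_xid = subxids[i];

    removed = toy_remove_tree(st, top, subxids, nsubxids);

    if (max_xid > st->lastOverflowedXid)
        st->lastOverflowedXid = max_xid;
    return removed;
}

/* Later recovery-info step: overwrites lastOverflowedXid unconditionally. */
static void
toy_apply_recovery_info(ToyStandbyState *st, TransactionId overflowed_xid)
{
    assert(st != NULL);
    st->lastOverflowedXid = overflowed_xid;
}
```

Calling toy_apply_xid_assignment() on a zero-initialized state removes nothing and leaves nknown at 0; a following toy_apply_recovery_info() then replaces lastOverflowedXid regardless of what the assignment step stored.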

Clog, subtrans and multixact are all handled differently. Extensions of
clog and multixact are WAL-logged, but extensions of subtrans are not.
They all have a Startup function, but each serves a slightly different
purpose. StartupCLOG() only sets latest_page_number, but StartupSUBTRANS()
and StartupMultiXact() zero out the current page. For CLOG, the TrimCLOG()
function does that. Why is clog handled differently from multixact?

StartupCLOG() and StartupMultiXact() set latest_page_number, but
StartupSUBTRANS() does not. Is that a problem for subtrans? StartupCLOG()
and StartupMultiXact() are called at different stages in hot standby -
StartupCLOG() is called at the beginning of recovery, but
StartupMultiXact() is only called at end of recovery. Why the
discrepancy, does latest_page_number need to be set during hot standby
or not?

I think we should bite the bullet, and WAL-log the extension of
subtrans, too. Then make the startup and extension procedure for all the
SLRUs the same.
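
As a rough illustration of what "the same procedure" could look like, here is a toy C model. Everything in it is an assumption made for the sketch: ToySlru, ToyWal and the toy_* functions are hypothetical names, not the real SLRU or WAL APIs, and the page size is a tiny made-up value. The idea shown is only this: startup just records the current page, every extension zeroes the new page and logs a record, and replaying those records on a standby reproduces the same pages.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;

#define TOY_XIDS_PER_PAGE 4   /* toy value; real pages hold far more entries */
#define TOY_NUM_PAGES     8
#define TOY_MAX_WAL       32

/* Toy SLRU: a handful of in-memory pages plus the latest page number. */
typedef struct ToySlru
{
    TransactionId entries[TOY_NUM_PAGES][TOY_XIDS_PER_PAGE];
    int           page_valid[TOY_NUM_PAGES];
    int           latest_page_number;
} ToySlru;

/* Toy WAL: just a list of "zero this page" records. */
typedef struct ToyWal
{
    int zero_page_records[TOY_MAX_WAL];
    int nrecords;
} ToyWal;

static int
toy_xid_to_page(TransactionId xid)
{
    return (int) ((xid / TOY_XIDS_PER_PAGE) % TOY_NUM_PAGES);
}

/* Shared by normal running and redo: initialize a page and make it current. */
static void
toy_zero_page(ToySlru *slru, int pageno)
{
    memset(slru->entries[pageno], 0, sizeof(slru->entries[pageno]));
    slru->page_valid[pageno] = 1;
    slru->latest_page_number = pageno;
}

/* Uniform startup: only remember which page is current. */
static void
toy_slru_startup(ToySlru *slru, TransactionId next_xid)
{
    memset(slru, 0, sizeof(*slru));
    slru->latest_page_number = toy_xid_to_page(next_xid);
}

/*
 * Uniform extension: when xid is the first one on its page, zero the page
 * and log that fact. Returns 1 if a page was added, 0 otherwise.
 */
static int
toy_slru_extend(ToySlru *slru, ToyWal *wal, TransactionId xid)
{
    int pageno;

    if (xid % TOY_XIDS_PER_PAGE != 0)
        return 0;

    pageno = toy_xid_to_page(xid);
    toy_zero_page(slru, pageno);

    assert(wal->nrecords < TOY_MAX_WAL);
    wal->zero_page_records[wal->nrecords++] = pageno;
    return 1;
}

/* Redo on a standby: replay every logged page extension in order. */
static void
toy_slru_redo(ToySlru *slru, const ToyWal *wal)
{
    int i;

    for (i = 0; i < wal->nrecords; i++)
        toy_zero_page(slru, wal->zero_page_records[i]);
}
```

With both sides started at the same next xid, extending the primary through a range of xids and then running toy_slru_redo() on the standby leaves the two with the same valid pages and the same latest_page_number, without the standby having to guess when a page needs zeroing.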

- Heikki
