Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Christophe Pettus <xof(at)thebuild(dot)com>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Date: 2013-11-19 14:20:01
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 2013-11-18 23:15:59 +0100, Andres Freund wrote:
> Afaics it's likely a combination/interaction of bugs and fixes between:
> * the initial HS code
> * 5a031a5556ff83b8a9646892715d7fef415b83c3
> * f44eedc3f0f347a856eea8590730769125964597

Yes, the combination of those is guilty.

Man, this is (to a good part my) bad.

> But that'd mean nobody noticed it during 9.3's beta...

It's fairly hard to reproduce artificially since a) there have to be
enough transactions starting and committing from the start of the
checkpoint the standby is starting from to the point it does
LogStandbySnapshot() to cross a 32768 boundary b) hint bits often save
the game by not accessing clog at all anymore and thus not noticing the
I've reproduced the issue by having an INSERT ONLY table that's never
read from. It's helpful to disable autovacuum.

Imo something the attached patch should be done. The description I came
up with is:

Fix Hot-Standby initialization of clog and subtrans.

These bugs can cause data loss on standbys started with hot_standby=on
at the moment they start to accept read only queries by marking
committed transactions as uncommited. The likelihood of such
corruptions is small unless the primary has a high transaction rate.

5a031a5556ff83b8a9646892715d7fef415b83c3 fixed bugs in HS's startup
logic by maintaining less state until at least
STANDBY_SNAPSHOT_PENDING state was reached, missing the fact that both
clog and subtrans are written to before that. This only failed to fail
in common cases because the usage of ExtendCLOG in procarray.c was
superflous since clog extensions are actually WAL logged.

f44eedc3f0f347a856eea8590730769125964597/I then tried to fix the
missing extensions of pg_subtrans due to the former commit's changes -
which are not WAL logged - by performing the extensions when switching
to a state > STANDBY_INITIALIZED and not performing xid assignments
before that - again missing the fact that ExtendCLOG is unneccessary -
but screwed up twice: Once because latestObservedXid wasn't updated
anymore in that state due to the earlier commit and once by having an
off-by-one error in the loop performing extensions.
This means that whenever a CLOG_XACTS_PER_PAGE (32768 with default
settings) boundary was crossed between the start of the checkpoint
recovery started from and the first xl_running_xact record old
transactions commit bits in pg_clog could be overwritten if they
started and committed in that window.

Fix this mess by not performing ExtendCLOG() in HS at all anymore
since it's unneeded and evidently dangerous and by performing subtrans
extensions even before reaching STANDBY_SNAPSHOT_PENDING.

Imo this warrants and expedited point release :(


Andres Freund

Andres Freund
PostgreSQL Development, 24x7 Support, Training & Services

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-11-19 14:22:18 Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Previous Message Greg Stark 2013-11-19 14:16:04 Re: Replication Node Identifiers and crashsafe Apply Progress