Re: PITR potentially broken in 9.2

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: PITR potentially broken in 9.2
Date: 2012-12-05 02:27:34
Message-ID: 22653.1354674454@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> But the key is, the database was not actually consistent at that
>> point, and so opening hot standby was a dangerous thing to do.
>>
>> The bug that allowed the database to open early (the original topic if
>> this email chain) was masking this secondary issue.

> Could you check whether the attached patch fixes the behaviour?

Yeah, I had just come to the same conclusion: the bug is not with
Heikki's fix, but with the pause logic. The comment says that it
shouldn't pause unless users can connect to un-pause, but the actual
implementation of that test is several bricks shy of a load.

Your patch is better, but it's still missing a bet: what we need to be
sure of is not merely that we *could* have told the postmaster to start
hot standby, but that we *actually have done so*. Otherwise, it's
flow-of-control-dependent whether it works or not; we could fail if the
main loop hasn't gotten to CheckRecoveryConsistency since the conditions
became true. Looking at the code in CheckRecoveryConsistency, testing
LocalHotStandbyActive seems appropriate.

I also thought it was pretty dangerous that this absolutely critical
test was not placed in recoveryPausesHere() itself, rather than relying
on the call sites to remember to do it.

So the upshot is that I propose a patch more like the attached.

This is not a regression because the pause logic is broken this same
way since 9.1. So I no longer think that we need a rewrap.

regards, tom lane

Attachment Content-Type Size
fix-pause-logic.patch text/x-patch 2.1 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andres Freund 2012-12-05 02:37:17 Re: PITR potentially broken in 9.2
Previous Message Andres Freund 2012-12-05 02:15:38 Re: PITR potentially broken in 9.2

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2012-12-05 02:37:17 Re: PITR potentially broken in 9.2
Previous Message Andres Freund 2012-12-05 02:15:38 Re: PITR potentially broken in 9.2