Re: BUG #7710: Xid epoch is not updated properly during checkpoint

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: tarvip(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7710: Xid epoch is not updated properly during checkpoint
Date: 2012-12-02 12:51:57
Message-ID: CA+U5nM+qw3b74FrkY7z2bK73Q6svtCHLiGb3ZrHwXwLRp_nuDQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 1 December 2012 22:56, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> tarvip(at)gmail(dot)com writes:
>> [ txid_current can show a bogus value near XID wraparound ]
>> This happens only if wal_level=hot_standby.
>
> I believe what is happening here is
>
> (1) CreateCheckPoint sets up checkPoint.nextXid and
> checkPoint.nextXidEpoch, near xlog.c line 7070 in HEAD. At this point,
> nextXid is still a bit less than the wrap point.
>
> (2) After performing the checkpoint, at line 7113, CreateCheckPoint
> calls LogStandbySnapshot() which "helpfully" updates checkPoint.nextXid
> to the latest value. Which by now has wrapped around. But it doesn't
> fix checkPoint.nextXidEpoch, so the checkpoint that gets written out has
> effectively lost the epoch bump that should have happened.
>
> While we could add some more logic to try to correct the epoch value
> in this scenario, I think it's a much better idea to just stop having
> LogStandbySnapshot update the nextXid. That seems to me to be useless
> complication. I also quite dislike the fact that we're effectively
> redefining the checkpoint nextXid from being taken before the main
> body of the checkpoint to being taken afterwards, but *only* in
> XLogStandbyInfoActive mode. If that inconsistency isn't already causing
> bugs (besides this one) today, it'll probably cause them in the future.

I agree that the coding looks weird and agree it shouldn't be there.
The meaning of the checkpoint values should not differ because
wal_level has changed.

> So barring objections, I'm going to remove LogStandbySnapshot's behavior
> of returning the updated nextXid.

Removing it may cause other bugs, but if so, those other bugs need to
be solved in the right way, not by having a too-far-forwards nextxid
on the checkpoint record. Having said that, I can't see any bugs that
would be caused by this.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Simon Riggs 2012-12-02 15:08:35 Re: BUG #7710: Xid epoch is not updated properly during checkpoint
Previous Message Simon Riggs 2012-12-02 10:12:53 Re: BUG #7710: Xid epoch is not updated properly during checkpoint