From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: snapshot too old issues, first around wraparound and then more.
Date: 2020-04-01 15:39:47
Message-ID: 20200401153947.lchk4yf3g63rw7wm@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-04-01 11:15:14 -0400, Robert Haas wrote:
> On Wed, Apr 1, 2020 at 2:40 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I added some debug output to print the mapping before/after changes by
> > MaintainOldSnapshotTimeMapping() (note that I used timestamps relative
> > to the server start in minutes/seconds to make it easier to interpret).
> >
> > And the output turns out to be something like:
> >
> > WARNING: old snapshot mapping at "before update" with head ts: 7, current entries: 8 max entries: 15, offset: 0
> > entry 0 (ring 0): min 7: xid 582921233
> > entry 1 (ring 1): min 8: xid 654154155
> > entry 2 (ring 2): min 9: xid 661972949
> > entry 3 (ring 3): min 10: xid 666899382
> > entry 4 (ring 4): min 11: xid 644169619
> > entry 5 (ring 5): min 12: xid 644169619
> > entry 6 (ring 6): min 13: xid 644169619
> > entry 7 (ring 7): min 14: xid 644169619
> >
> > WARNING: head 420 s: updating existing bucket 4 for sec 660 with xmin 666899382
> >
> > WARNING: old snapshot mapping at "after update" with head ts: 7, current entries: 8 max entries: 15, offset: 0
> > entry 0 (ring 0): min 7: xid 582921233
> > entry 1 (ring 1): min 8: xid 654154155
> > entry 2 (ring 2): min 9: xid 661972949
> > entry 3 (ring 3): min 10: xid 666899382
> > entry 4 (ring 4): min 11: xid 666899382
> > entry 5 (ring 5): min 12: xid 644169619
> > entry 6 (ring 6): min 13: xid 644169619
> > entry 7 (ring 7): min 14: xid 644169619
> >
> > It's pretty obvious that the xids don't make a ton of sense, I think:
> > they're not monotonically ordered, and the same values occur multiple
> > times, despite new xids constantly being assigned. Also, despite the
> > ringbuffer supposedly having 15 entries (that's old_snapshot_threshold
> > = 5min + the 10 we always add), and the workload having run for 14min,
> > we only have 8 entries.
>
> The function header comment for MaintainOldSnapshotTimeMapping could
> hardly be more vague, as it's little more than a restatement of the
> function name. However, it looks to me like the idea is that this
> function might get called multiple times for the same or similar
> values of whenTaken. I suppose that's the point of this code:

Right. We enforce whenTaken to be monotonic
(cf. GetSnapshotCurrentTimestamp()), but since
GetSnapshotCurrentTimestamp() reduces the granularity of the timestamp
to one minute (the AlignTimestampToMinuteBoundary() call), it's
obviously possible to end up in the same bin as a previous call.
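
For reference, the alignment is just integer rounding to a minute
boundary; a self-contained sketch of the arithmetic (from memory, not
the verbatim snapmgr.c source):

#include <stdint.h>

typedef int64_t TimestampTz;	/* microseconds, as in PostgreSQL */

#define USECS_PER_MINUTE	((TimestampTz) 60 * 1000000)

/*
 * Round a timestamp up to the next minute boundary, so that every
 * snapshot taken within the same minute maps to the same bucket.
 */
static TimestampTz
AlignTimestampToMinuteBoundary(TimestampTz ts)
{
	TimestampTz rounded = ts + (USECS_PER_MINUTE - 1);

	return rounded - (rounded % USECS_PER_MINUTE);
}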

> What I interpret this to be doing is saying - if we got a new call to
> this function with a rounded-to-the-minute timestamp that we've seen
> previously and for which we still have an entry, and if the XID passed
> to this function is newer than the one passed by the previous call,
> then advance the xid_by_minute[] bucket to the newer value. Now that
> begs the question - what does this XID actually represent? The
> comments don't seem to answer that question, not even the comments for
> OldSnapshotControlData, which say that we should "Keep one xid per
> minute for old snapshot error handling." but don't say which XIDs we
> should keep or how they'll be used. However, the only call to
> MaintainOldSnapshotTimeMapping() is in GetSnapshotData(). It appears
> that we call this function each time a new snapshot is taken and pass
> the current time (modulo some fiddling) and snapshot xmin. Given that,
> one would expect that any updates to the map would be tight races,
> i.e. a bunch of processes that all took their snapshots right around
> the same time would all update the same map entry in quick succession,
> with the newest value winning.

Right.
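
To make that concrete, the update path amounts to roughly the
following, paraphrased around the OldSnapshotControlData fields (not
the verbatim snapmgr.c source):

/*
 * ts has already been aligned to a minute boundary; check whether it
 * falls within the range of buckets we already have.
 */
if (ts >= oldSnapshotControl->head_timestamp &&
	ts <= (oldSnapshotControl->head_timestamp +
		   (oldSnapshotControl->count_used - 1) * USECS_PER_MINUTE))
{
	/* Existing bucket: offset from head, wrapped around the ring. */
	int		bucket = (oldSnapshotControl->head_offset
					  + ((ts - oldSnapshotControl->head_timestamp)
						 / USECS_PER_MINUTE))
		% OLD_SNAPSHOT_TIME_MAP_ENTRIES;

	/* Only ever advance the stored xmin, never move it back. */
	if (TransactionIdPrecedes(oldSnapshotControl->xid_by_minute[bucket],
							  xmin))
		oldSnapshotControl->xid_by_minute[bucket] = xmin;
}

I.e. for a given minute the bucket just tracks the newest xmin seen.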

> And that makes the debugging output which I quoted from your message
> above really confusing. At this point, the "head timestamp" is 7
> minutes after this facility started up. The first entry we have is
> for minute 7, and the last is for minute 14. But the one we're
> updating is for minute 11. How the heck can that happen?

If I understand what you're referencing correctly, I think that is
because, due to the bug, the "need a new bucket" branch doesn't just
extend the mapping by one bucket; in common cases it extends it by
many, basically filling buckets "into the future".

The advance = ... variable in that branch will not always be 1, even
when we continually call Maintain*. Here's some debug output showing
that (slightly modified from the patch I previously sent):

WARNING: old snapshot mapping at "before update" with head ts: 1, current entries: 2 max entries: 15, offset: 0
entry 0 (ring 0): min 1: xid 1089371384
entry 1 (ring 1): min 2: xid 1099553206

WARNING: head 1 min: filling 2 buckets starting at 0 for whenTaken 3 min, with xmin 1109840204

WARNING: old snapshot mapping at "after update" with head ts: 3, current entries: 4 max entries: 15, offset: 0
entry 0 (ring 0): min 3: xid 1089371384
entry 1 (ring 1): min 4: xid 1099553206
entry 2 (ring 2): min 5: xid 1109840204
entry 3 (ring 3): min 6: xid 1109840204

Note how the two new buckets have the same xid, and how we're inserting
for "whenTaken 3 min", but we've filled the mapping up to minute 6.

I don't think the calculation of the 'advance' variable is correct as
is, even if we ignore the wrong setting of the head_timestamp variable.
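
For concreteness, that branch is shaped roughly like this (again a
paraphrase, not the verbatim snapmgr.c source):

/* We need a new bucket, but it might not be the very next one. */
int		advance = (ts - oldSnapshotControl->head_timestamp)
	/ USECS_PER_MINUTE;

oldSnapshotControl->head_timestamp = ts;

for (int i = 0; i < advance; i++)
{
	if (oldSnapshotControl->count_used == OLD_SNAPSHOT_TIME_MAP_ENTRIES)
	{
		/* Ring full: overwrite the old head slot and rotate. */
		int		old_head = oldSnapshotControl->head_offset;

		oldSnapshotControl->head_offset =
			(old_head + 1) % OLD_SNAPSHOT_TIME_MAP_ENTRIES;
		oldSnapshotControl->xid_by_minute[old_head] = xmin;
	}
	else
	{
		/* Extend the mapping into the next unused slot. */
		int		new_index = (oldSnapshotControl->head_offset
							 + oldSnapshotControl->count_used)
			% OLD_SNAPSHOT_TIME_MAP_ENTRIES;

		oldSnapshotControl->count_used++;
		oldSnapshotControl->xid_by_minute[new_index] = xmin;
	}
}

In the example above head_timestamp was 1 min and ts was 3 min, so
advance comes out as 2 and the loop installs the same xmin into two new
buckets, while head_timestamp is overwritten with the brand-new
timestamp even though head_offset still points at the oldest entry,
relabeling the old entries as minutes 3 and 4.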

Greetings,

Andres Freund
