Re: snapshot too old issues, first around wraparound and then more.

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: snapshot too old issues, first around wraparound and then more.
Date: 2020-04-01 15:15:14
Message-ID: CA+TgmoYTt1AYEhN3=esoQzXMW38L1rVLzZxpgFw_hXqTZNv7nQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 1, 2020 at 2:40 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I added some debug output to print the mapping before/after changes by
> MaintainOldSnapshotTimeMapping() (note that I used timestamps relative
> to the server start in minutes/seconds to make it easier to interpret).
>
> And the output turns out to be something like:
>
> WARNING: old snapshot mapping at "before update" with head ts: 7, current entries: 8 max entries: 15, offset: 0
> entry 0 (ring 0): min 7: xid 582921233
> entry 1 (ring 1): min 8: xid 654154155
> entry 2 (ring 2): min 9: xid 661972949
> entry 3 (ring 3): min 10: xid 666899382
> entry 4 (ring 4): min 11: xid 644169619
> entry 5 (ring 5): min 12: xid 644169619
> entry 6 (ring 6): min 13: xid 644169619
> entry 7 (ring 7): min 14: xid 644169619
>
> WARNING: head 420 s: updating existing bucket 4 for sec 660 with xmin 666899382
>
> WARNING: old snapshot mapping at "after update" with head ts: 7, current entries: 8 max entries: 15, offset: 0
> entry 0 (ring 0): min 7: xid 582921233
> entry 1 (ring 1): min 8: xid 654154155
> entry 2 (ring 2): min 9: xid 661972949
> entry 3 (ring 3): min 10: xid 666899382
> entry 4 (ring 4): min 11: xid 666899382
> entry 5 (ring 5): min 12: xid 644169619
> entry 6 (ring 6): min 13: xid 644169619
> entry 7 (ring 7): min 14: xid 644169619
>
> It's pretty obvious that the xids don't make a ton of sense, I think:
> They're not monotonically ordered. The same values exist multiple times,
> despite xids being constantly used. Also, despite the ringbuffer
> supposedly having 15 entries (that's snapshot_too_old = 5min + the 10 we
> always add), and the workload having run for 14min, we only have 8
> entries.

The function header comment for MaintainOldSnapshotTimeMapping could
hardly be more vague, as it's little more than a restatement of the
function name. However, it looks to me like the idea is that this
function might get called multiple times for the same or similar
values of whenTaken. I suppose that's the point of this code:

else if (ts <= (oldSnapshotControl->head_timestamp +
((oldSnapshotControl->count_used - 1)
* USECS_PER_MINUTE)))
{
/* existing mapping; advance xid if possible */
int bucket = (oldSnapshotControl->head_offset
+ ((ts - oldSnapshotControl->head_timestamp)
/ USECS_PER_MINUTE))
% OLD_SNAPSHOT_TIME_MAP_ENTRIES;

if (TransactionIdPrecedes(oldSnapshotControl->xid_by_minute[bucket],
xmin))
oldSnapshotControl->xid_by_minute[bucket] = xmin;
}

What I interpret this to be doing is saying - if we got a new call to
this function with a rounded-to-the-minute timestamp that we've seen
previously and for which we still have an entry, and if the XID passed
to this function is newer than the one passed by the previous call,
then advance the xid_by_minute[] bucket to the newer value. Now that
begs the question - what does this XID actually represent? The
comments don't seem to answer that question, not even the comments for
OldSnapshotControlData, which say that we should "Keep one xid per
minute for old snapshot error handling." but don't say which XIDs we
should keep or how they'll be used. However, the only call to
MaintainOldSnapshotTimeMapping() is in GetSnapshotData(). It appears
that we call this function each time a new snapshot is taken and pass
the current time (modulo some fiddling) and snapshot xmin. Given that,
one would expect that any updates to the map would be tight races,
i.e. a bunch of processes that all took their snapshots right around
the same time would all update the same map entry in quick succession,
with the newest value winning.

And that make the debugging output which I quoted from your message
above really confusing. At this point, the "head timestamp" is 7
minutes after this facility started up. The first we entry we have is
for minute 7, and the last is for minute 14. But the one we're
updating is for minute 11. How the heck can that happen? I might
suspect that you'd stopped a process inside GetSnapshotData() with a
debugger, but that can't explain it either, because GetSnapshotData()
gets the xmin first and only afterwards gets the timestamp - so if
you'd stopped for ~3 minutes it just before the call to
MaintainOldSnapshotTimeMapping(), it would've been updating the map
with an *old* XID. In reality, though, it changed the XID from
644169619 to 666899382, advancing over 22 million XIDs. I don't
understand what's going on there. How is this function getting called
with a 4-minute old value of whenTaken?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2020-04-01 15:17:11 Re: Verify true root on replicas with amcheck
Previous Message Andres Freund 2020-04-01 15:09:14 Re: snapshot too old issues, first around wraparound and then more.