Re: snapshot too old issues, first around wraparound and then more.

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Greg Stark <stark(at)mit(dot)edu>, Peter Geoghegan <pg(at)bowt(dot)ie>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: snapshot too old issues, first around wraparound and then more.
Date: 2021-06-16 18:27:57
Message-ID: 20210616182757.h3myscpshwlzdj3m@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-06-16 13:04:07 -0400, Tom Lane wrote:
> Yeah, I think this scenario of a few transactions with old snapshots
> and the rest with very new ones could be improved greatly if we exposed
> more info about backends' snapshot state than just "oldest xmin". But
> that might be expensive to do.

I think it'd be pretty doable now. The snapshot scalability changes
separated out information needed to do vacuuming / pruning (i.e. xmin)
from the information needed to build a snapshot (xid, flags, subxids
etc). Because xmin is no longer frequently accessed from other
backends, it is no longer important to touch it as rarely as possible.
From the cross-backend POV I think it'd be practically free to
track a backend's xmax now.
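
Concretely, and purely as a sketch (the xmax field and its maintenance
are made up): PGPROC->xmin exists today and, after the snapshot
scalability work, is essentially only read by horizon computations, so a
per-backend xmax could sit right next to it and be advertised the same
way.

/* Sketch only, not tested; the xmax field is hypothetical. */
struct PGPROC
{
    /* ... */
    TransactionId xmin;     /* exists today: oldest snapshot xmin */
    TransactionId xmax;     /* hypothetical: newest snapshot xmax the
                             * backend's snapshots still rely on */
    /* ... */
};

/* e.g. in GetSnapshotData(), next to the existing xmin advertisement */
if (!TransactionIdIsValid(MyProc->xmin))
    MyProc->xmin = TransactionXmin = xmin;
MyProc->xmax = xmax;        /* hypothetical; would also need
                             * SnapshotResetXmin()-style bookkeeping */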

It's not quite as obvious that it'd be essentially free to track a
backend's xmax across all the snapshots it uses. I think we'd basically
need a second pairingheap in snapmgr.c to track the "most advanced"
xmax? That's *probably* fine, but I'm not 100% sure - Heikki wrote a
faster heap implementation for snapmgr.c for a reason, I assume.
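
For illustration, roughly what I have in mind, modeled on the existing
xmin_cmp() / RegisteredSnapshots code in snapmgr.c (the ph_xmax_node
field and the second heap are made up; additions/removals would have to
happen wherever snapshots go into RegisteredSnapshots today):

/*
 * Sketch only.  Assumes a hypothetical second pairingheap_node field
 * "ph_xmax_node" in SnapshotData, next to the existing ph_node.
 */
static int
xmax_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg)
{
    const SnapshotData *asnap =
        pairingheap_const_container(SnapshotData, ph_xmax_node, a);
    const SnapshotData *bsnap =
        pairingheap_const_container(SnapshotData, ph_xmax_node, b);

    /* newest xmax wins, so pairingheap_first() yields the most advanced */
    if (TransactionIdFollows(asnap->xmax, bsnap->xmax))
        return 1;
    else if (TransactionIdPrecedes(asnap->xmax, bsnap->xmax))
        return -1;
    else
        return 0;
}

/* hypothetical sibling of RegisteredSnapshots */
static pairingheap RegisteredSnapshotsByXmax = {&xmax_cmp, NULL, NULL};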

I think the hard part of this would be much more on the pruning / vacuum
side of things. There are two difficulties:

1) Keeping it cheap to determine whether a tuple can be vacuumed,
particularly while doing on-access pruning. This likely means that
we'd only assemble the information to do visibility determination for
rows above the "dead for everybody" horizon when encountering a
sufficiently old tuple. And then we need a decent data structure for
checking whether an xid is in one of the "not needed" xid ranges.

This seems solvable.
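
   Just to illustrate the kind of structure I mean (standalone toy C,
   not proposed code; real xids would need wraparound-aware comparisons
   along the lines of TransactionIdPrecedes(), which using 64-bit xids
   here sidesteps):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit xids so plain comparisons are wraparound-safe in this toy. */
typedef uint64_t FullXid;

/* A half-open [start, end) range of xids no snapshot needs to see. */
typedef struct XidRange
{
    FullXid start;
    FullXid end;
} XidRange;

/*
 * Binary-search a sorted, non-overlapping array of "not needed" ranges.
 * O(log n) per check, so cheap enough to call from on-access pruning.
 */
static bool
xid_in_unneeded_range(const XidRange *ranges, size_t nranges, FullXid xid)
{
    size_t lo = 0;
    size_t hi = nranges;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (xid < ranges[mid].start)
            hi = mid;
        else if (xid >= ranges[mid].end)
            lo = mid + 1;
        else
            return true;        /* start <= xid < end */
    }
    return false;
}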

2) Modeling when it is safe to remove row versions. It is easy to remove
a tuple that was inserted and deleted within one "not needed" xid
range, but it's far less obvious when it is safe to remove row
versions where prior/later row versions are outside of such a gap.

Consider e.g. an update chain where the oldest snapshot can see one
row version, then there is a chain of rows that could be vacuumed
except for the old snapshot, and then there's a live version. If the
old session updates the row version that is visible to it, it needs
to be able to follow the xid chain.

This seems hard to solve in general.

It perhaps is sufficiently effective to remove row version chains
entirely within one removable xid range. And it'd probably be doable to
also address the case where a chain is larger than one range, as long
as all the relevant row versions are within one page: We can fix up
the ctids of older still visible row versions to point to the
successor of pruned row versions.

But I have a hard time seeing a realistic approach to removing chains
that span xid ranges and multiple pages. The locking and efficiency
issues seem substantial.
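
   To make the simpler single-range, single-page case above a bit more
   concrete, a standalone toy model (deliberately nothing like the real
   heap tuple / line pointer representation, and the removability test
   here is a placeholder for the real, much subtler question): versions
   whose whole lifetime falls inside one removable xid range get spliced
   out, and the older, still-visible version is repointed at the
   surviving successor - the moral equivalent of the ctid fixup.

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t FullXid;

/* Toy stand-in for an update chain on a single page. */
typedef struct RowVersion
{
    FullXid xmin;               /* inserting xid */
    FullXid xmax;               /* deleting/updating xid, 0 if live */
    struct RowVersion *next;    /* stand-in for the t_ctid forward link */
} RowVersion;

/*
 * Splice out versions that were created and deleted entirely within the
 * removable range [range_start, range_end), relinking the previous
 * surviving version to each pruned version's successor.
 */
static void
prune_chain(RowVersion *oldest_visible, FullXid range_start, FullXid range_end)
{
    RowVersion *prev = oldest_visible;
    RowVersion *cur = oldest_visible->next;

    while (cur != NULL)
    {
        RowVersion *next = cur->next;
        bool        removable = (cur->xmax != 0 &&
                                 cur->xmin >= range_start &&
                                 cur->xmax < range_end);

        if (removable)
        {
            free(cur);          /* toy stand-in for marking the slot dead */
            prev->next = next;  /* the "ctid fixup" to the successor */
        }
        else
            prev = cur;

        cur = next;
    }
}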

Greetings,

Andres
