Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <

From: Andres Freund <andres(at)anarazel(dot)de>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "pgsql-committers(at)postgresql(dot)org" <pgsql-committers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
Date: 2016-05-24 21:56:16
Message-ID: 20160524215616.y4xvze5f5tfv64gn@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On 2016-05-24 16:09:27 -0500, Kevin Grittner wrote:
> On Tue, May 24, 2016 at 3:54 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > what about e.g. concurrent index builds? E.g. IndexBuildHeapRangeScan() doesn't
> > seem to contain any checks against outdated blocks
>
> Why would it? We're talking about blocks where there were dead
> tuples, with the transaction which updated or deleted the tuples
> being complete, where those dead tuples were vacuumed away. These
> should not appear in the index, should they?

(it appears I had switched around the concerns for concurrent and
!concurrent in my head. But I think both might be negatively affected.)

For concurrent indexes we'll actually use a normal mvcc snapshot, which
means heap_getnext() in IndexBuildHeapRangeScan() and
validate_index_heapscan() will error out when encountering removed
tuples. Which means it'll be hard to ever perform a concurrent reindex
where an individual phase takes more than old_snapshot_threshold -
problematic from my POV, given that old_snapshot_threshold cannot be
changed at runtime.

For normal index scans the following appears to be problematic:
case HEAPTUPLE_RECENTLY_DEAD:
/*
* If tuple is recently deleted then we must index it
* anyway to preserve MVCC semantics. (Pre-existing
* transactions could try to use the index after we finish
* building it, and may need to see such tuples.)
*
* However, if it was HOT-updated then we must only index
* the live tuple at the end of the HOT-chain. Since this
* breaks semantics for pre-existing snapshots, mark the
* index as unusable for them.
*/

afaics that detection is broken if we advance the xmin horizon more
aggressively than the snapshot. The LSNs of the created index pages will
be new, and we'll thus not necessarily error out when requried. At the
very least we'd have to set indexInfo->ii_BrokenHotChain in that case.
As far as I can see normal index builds will allow concurrent hot prunes
and everything; since those only require page-level exclusive locks.

So for !concurrent builds we might end up with a corrupt index.

I think many of the places relying on heap scans with !mvcc snapshots
are problematic in that way. Outdated reads will not be detected by
TestForOldSnapshot() (given the (snapshot)->satisfies ==
HeapTupleSatisfiesMVCC condition therein), and rely on the snapshot to
be actually working. E.g. CLUSTER/ VACUUM FULL rely on accurate
HEAPTUPLE_RECENTLY_DEAD
switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
{
...
case HEAPTUPLE_RECENTLY_DEAD:
tups_recently_dead += 1;
/* fall through */
case HEAPTUPLE_LIVE:
/* Live or recently dead, must copy it */
isdead = false;
break;

If an old session with >= repeatable read accesses a clustered table
(after the cluster committed), they'll now not see any errors, because
all the LSNs look new.

> >>> Is there anything preventing this from becoming a problem?
> >>
> >> The fundamental approach that the error can only appear on
> >> user-facing scans, not internal reads and positioning.
> >
> >> Unless there is some code path that uses a scan where the snapshot
> >> age is checked, this cannot be a problem. I don't see any such
> >> path, but if you do, please let me know, and I'll fix things.
> >
> > That scares me. Not throwing an error, and not being broken are entirely
> > different things.
>
> Absolutely, but let's not start pointing at random chunks of code
> and asking why an error isn't thrown there without showing that you
> get bad results otherwise, or at least some plausible argument why
> you might.

This attitude is disturbing. You've evidently not at all looked at the
snapshot issues around analyze before; even though snapshot too old at
the very least introduces a behavioural issue there (if not a bug at
least in the case of expression based stuff). I've asked you about
principled defenses, and your answer was "we don't error out". Now I
point to another place, and you respond with a relatively strong
dismissal. Independent of me being right or wrong, that seems to be the
completely wrong way round.

Andres

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Stephen Frost 2016-05-25 00:10:22 pgsql: Qualify table usage in dumpTable() and use regclass
Previous Message Kevin Grittner 2016-05-24 21:51:50 Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2016-05-24 22:00:28 Re: Speaking of breaking compatibility...standard_conforming_strings
Previous Message Kevin Grittner 2016-05-24 21:51:50 Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <