Re: snapshot too old issues, first around wraparound and then more.

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: snapshot too old issues, first around wraparound and then more.
Date: 2021-06-16 00:20:46
Message-ID: 20210616002046.hllne3z7t6leh2h6@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-06-15 15:17:05 -0400, Robert Haas wrote:
> But that's not clear to me. I'm not clear exactly how many
> problems we know about and need to fix in order to keep the feature,
> and I'm also not clear how deep the hole goes. Like, if we need to get
> a certain number of specific bugs fixed, I might be willing to do
> that. If we need to commit to a major rewrite of the current
> implementation, that's more than I can do. But I guess I don't
> understand exactly how bad the current problems are. Reviewing
> complaints from Andres from this thread:

One important complaint I think your (useful!) list missed is that there are
missing *read side* checks that demonstrably lead to wrong query results:
https://www.postgresql.org/message-id/CAH2-Wz%3DFQ9rbBKkt1nXvz27kmd4A8i1%2B7dcLTNqpCYibxX83VQ%40mail.gmail.com
and that it's currently very hard to figure out where they need to be, because
there's no clearly explained model of what needs to be checked and what doesn't.
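
(For concreteness, a read-side check here means a TestForOldSnapshot() call
made after a page has been read under an MVCC snapshot - sketched from memory
and simplified from bufmgr.h, so don't take the details as exact:

    static inline void
    TestForOldSnapshot(Snapshot snapshot, Relation relation, Page page)
    {
        if (old_snapshot_threshold >= 0 &&
            snapshot != NULL &&
            snapshot->snapshot_type == SNAPSHOT_MVCC &&
            !XLogRecPtrIsInvalid(snapshot->lsn) &&
            PageGetLSN(page) > snapshot->lsn)
            TestForOldSnapshot_impl(snapshot, relation, page);  /* may raise "snapshot too old" */
    }

The callers are scattered over the access methods - e.g. the heap seqscan does
something like TestForOldSnapshot(snapshot, scan->rs_base.rs_rd, page) after
each page it reads - and there's no written-down rule for which page reads
need such a call and which don't.)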

> > * I am fairly sure that it can cause crashes (or even data corruption),
> > because it assumes that DML never needs to check for old snapshots
> > (with no meaningful justification). Leading heap_update/delete to
> > assume the page header is a tuple.
>
> I don't understand the issue here, really. I assume there might be a
> wrong word here, because assuming that the page header is a tuple
> doesn't sound like a thing that would actually happen.

I suspect what I was thinking of is that a tuple could get pruned away due to
s_t_o, which would leave an LP_DEAD item around. As heap_update/delete neither
checks s_t_o nor balks at targeting LP_DEAD items, we'd use the offset from
the LP_DEAD item. ItemIdSetDead() sets lp_off to 0 - which would mean that
the page header is interpreted as a tuple. Right?
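
To spell that out (macros quoted approximately from itemid.h / bufpage.h, and
the heap_update() lines from memory - a sketch, not the exact code):

    /* pruning marks the line pointer dead and zeroes its offset */
    #define ItemIdSetDead(itemId) \
    ( \
        (itemId)->lp_flags = LP_DEAD, \
        (itemId)->lp_off = 0, \
        (itemId)->lp_len = 0 \
    )

    /* heap_update()/heap_delete() then do, roughly: */
    lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
    Assert(ItemIdIsNormal(lp));     /* compiled out in production builds */
    oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);

PageGetItem() is essentially page + ItemIdGetOffset(lp), i.e. page + 0 here,
so t_data would point at the PageHeaderData rather than at a tuple.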

> I think one of the key problems for this feature is figuring out
> whether you've got snapshot-too-old checks in all the right places. I
> think what is being alleged here is that heap_update() and
> heap_delete() need them, and that it's not good enough to rely on the
> scan that found the tuple to be updated or deleted having already
> performed those checks. It is not clear to me whether that is true, or
> how it could cause crashes. Andres may have explained this to me at
> some point, but if he did I have unfortunately forgotten.

I don't think it is sufficient to rely on the scan. That works only as long as
the page with the to-be-modified tuple is pinned (since that'd prevent pruning
/ vacuuming from working on the page), but I am fairly sure that there are
plans where the target tuple is not pinned from the point it was scanned until
it is modified. In that case it is entirely possible that the u/d target can
be pruned away due to s_t_o between the scan checking s_t_o and the u/d
executing.
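
To illustrate why the scan-time check doesn't carry over, the
heap_update()/heap_delete() preamble looks roughly like this (again a sketch
from memory, not the exact code):

    block = ItemPointerGetBlockNumber(otid);
    buffer = ReadBuffer(relation, block);   /* fresh pin; pruning may have run meanwhile */
    page = BufferGetPage(buffer);
    LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
    lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));

Nothing on that path rejects an LP_DEAD item or rechecks s_t_o, so whatever
the scan verified earlier doesn't help once the pin has been dropped in
between.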

> My general point here is that I would like to know whether we have a
> finite number of reasonably localized bugs or a three-ring disaster
> that is unrecoverable no matter what we do. Andres seems to think it
> is the latter

Correct. I think there are numerous architectural issues with the way the
feature is implemented right now, and that it'd be a substantial project to
address them.

> For the record, and to Peter's point, I think it's reasonable to set
> v15 feature freeze as a drop-dead date for getting this feature into
> acceptable shape, but I would like to try to nail down what we think
> "acceptable" means in this context.

I think the absolute minimum would be to have
- actually working tests
- a halfway thorough code review of the feature
- added documentation explaining where exactly s_t_o tests need to be
- bugfixes obviously

If I were to work on the feature, I cannot imagine being sufficiently confident
that the feature works as long as the xid->time mapping granularity is a
minute. It's just not possible to write reasonable tests with the granularity
being that coarse. Or even to do manual tests of it - I'm not that patient. But
I "can accept" it if somebody actually doing the work differs on this.

Greetings,

Andres Freund
