Re: RFC: Making TRUNCATE more "MVCC-safe"

From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RFC: Making TRUNCATE more "MVCC-safe"
Date: 2012-03-06 10:43:57
Message-ID: 20120306104357.GB15988@tornado.leadboat.com
Lists: pgsql-hackers

On Mon, Mar 05, 2012 at 03:46:16PM -0500, Robert Haas wrote:
> On Mon, Mar 5, 2012 at 2:22 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > I can see this strategy applying to many relation-pertinent system catalogs.
> > Do you foresee applications to non-relation catalogs?
>
> Well, in theory, we have similar issues if, say, a query uses a
> function that didn't exist at the time the snapshot was taken; the
> actual results the user sees may not be consistent with any serial
> execution schedule. And the same could be true for any other SQL
> object. It's unclear that those cases are as compelling as this one,
> but then again it's unclear that no one will ever want to fix them,
> either. For example, suppose we have a view v over a table t that
> calls a function f. Somebody alters f to give different results and,
> in the same transaction, modifies the contents of t (but no DDL).
> This doesn't strike me as a terribly unlikely scenario; the change to
> t could well be envisioned as a compensating transaction. But now if
> somebody uses the new definition of f against the old contents of t,
> the user may fail to get what they were hoping for out of bundling
> those changes together in one transaction.

Good example.
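
To make sure I read the scenario correctly, here is a toy model of it in
Python. It is only a sketch: the Database class and its methods are
hypothetical, and the only thing it assumes is that the rows of t follow the
reader's snapshot while the definition of f is always the latest one.

```python
# Toy model: table contents are versioned per commit, the function catalog
# is not. Every name here is hypothetical, not a PostgreSQL API.

class Database:
    def __init__(self):
        self.t_versions = [(0, [1, 2, 3])]  # (commit xid, rows of t)
        self.f = lambda x: x * 2            # current f, seen by every reader
        self.next_xid = 1

    def snapshot(self):
        # A snapshot sees every commit whose xid is below this value.
        return self.next_xid

    def commit_change(self, new_f, new_rows):
        # One transaction replaces f and rewrites t together.
        xid = self.next_xid
        self.f = new_f
        self.t_versions.append((xid, new_rows))
        self.next_xid += 1

    def rows_visible_to(self, snap):
        visible = [rows for xid, rows in self.t_versions if xid < snap]
        return visible[-1]

    def query_view(self, snap):
        # View v: SELECT f(x) FROM t. Rows follow the snapshot, f does not.
        return [self.f(x) for x in self.rows_visible_to(snap)]


db = Database()
old_snap = db.snapshot()
before = db.query_view(old_snap)               # [2, 4, 6]: old f, old t

# Compensating change: f stops doubling, t is doubled instead.
db.commit_change(lambda x: x, [2, 4, 6])

mixed = db.query_view(old_snap)                # [1, 2, 3]: new f, old t
after = db.query_view(db.snapshot())           # [2, 4, 6]: new f, new t
```

Both consistent states give the same answer; only the reader holding the old
snapshot sees a result that matches neither.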

> Now, maybe we're never going to fix those kinds of anomalies anyway,
> but if we go with this architecture, then I think the chances of it
> ever being palatable to try are pretty low.

Why?

> >> But it's not quite the
> >> same as the xmin of the row itself, because some updates might be
> >> judged not to matter. There could also be intermediate cases where
> >> updates are invalidating for some purposes but not others. I think
> >> we'd better get our hands around more of the problem space before we
> >> start trying to engineer solutions.
> >
> > I'm not seeing that problem. Any operation that would update some xmin
> > horizon should set it to the greater of its current value and the value the
> > operation needs for its own correctness. If you have something in mind that
> > needs more, could you elaborate?

Simon's point about xmin vs. xid probably leads to an example. One value is
fine for TRUNCATE, because only the most recent TRUNCATE matters. Not all DDL
is so simple.
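
For the single-value case, the rule I described above amounts to something
like the following. This is a rough Python sketch, not code from any patch:
RelationHorizon and its method names are made up for illustration, and the
check in usable_by assumes that a reader whose snapshot predates the horizon
is simply refused.

```python
# Hypothetical per-relation horizon; none of these names exist in PostgreSQL.

class RelationHorizon:
    def __init__(self):
        self.valid_xmin = 0

    def advance(self, needed_xmin):
        # Keep the greater of the current value and what this operation
        # needs, so the horizon never moves backward.
        self.valid_xmin = max(self.valid_xmin, needed_xmin)

    def usable_by(self, snapshot_xmin):
        # Assumption: snapshots older than the horizon must not read
        # the relation.
        return snapshot_xmin >= self.valid_xmin


h = RelationHorizon()
h.advance(100)   # e.g. a TRUNCATE that needs horizon 100
h.advance(40)    # a later operation with a smaller requirement changes nothing
```

One number suffices here because a later, larger requirement subsumes every
earlier one. DDL whose effects matter to some readers and not to others would
not fit this shape.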

> Well, consider something like CLUSTER. It's perfectly OK for CLUSTER
> to operate on a table that has been truncated since CLUSTER's snapshot
> was taken, and no serialization anomaly is created that would not have
> already existed as a result of the non-MVCC-safe TRUNCATE. On the
> other hand, if CLUSTER operates on a table that was created since
> CLUSTER's snapshot was taken, then you have a bona fide serialization
> anomaly.

Core CLUSTER does not use any MVCC snapshot. We do push one for the benefit
of functions called during the reindex phase, but it does not appear that you
speak of that snapshot. Could you elaborate on this example?

> Maybe not a very important one, but does that prove that
> there's no significant problem of this type in general, or just
> nobody's thought through all the cases yet? After all, the issues
> with CREATE TABLE/TRUNCATE vs. a concurrent SELECT have been around
> for a very long time, and we're only just getting around to looking at
> them, so I don't have much confidence that there aren't other cases
> floating around out there.

Granted.

Thanks,
nm
