Re: vacuum on table1 skips rows because of a query on table2

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Virender Singla <virender(dot)cse(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: vacuum on table1 skips rows because of a query on table2
Date: 2019-10-28 21:54:56
Message-ID: 27211.1572299696@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sat, Oct 26, 2019 at 1:44 PM Virender Singla <virender(dot)cse(at)gmail(dot)com> wrote:
>> If long-running transaction is "read committed", then we are sure that any new query coming
>> (even on same table1 as vacuum table) will need snapshot on point of time query start and not the time transaction
>> starts (but still why read committed transaction on table2 cause vacuum on table1 to skip rows).

> I wish that this argument were completely correct, but it isn't,
> [ for lots of reasons ]

On top of the problems Robert enumerated, there's another fairly serious
one, which is that "global xmin" is not just the minimal XID that's
running. Rather, it's the minimum XID that was running when any active
snapshot was taken. Thus, even if you could prove that some long-running
transaction isn't going to touch the table you wish to vacuum, that
fact in itself won't move your estimate of the relevant xmin very much:
that transaction's own XID is holding back the xmins of every other
transaction --- and not only the ones open now, but ones that will
start in future, which you certainly can't predict anything about.

Thus, to decide whether tuples newer than the long-running transaction's
XID are safe to remove, you'd have to figure out what the other
transactions' snapshots would look like if that transaction weren't there
... and you don't have that information. The model we use of exposing
only "xmin", and not any more-detailed info about the contents of other
transactions' snapshots, really isn't adequate to allow this sort of
analysis. You could imagine exposing more info, but that carries more
costs --- costs that would be paid whether or not VACUUM ever gets any
benefit from it.

> Even given all that, I'm moderately confident that something like this
> would benefit a lot of people. However, it would probably hurt some
> people too, either because the overhead of figuring out that the
> current query won't lock any more relations, so that we can advertise
> that fact in shared memory, or because of the increased overhead of
> figuring out the xmin horizon for a table to be vacuumed.

Yeah, the whole thing is a delicate tradeoff between the cost of
tracking/advertising transaction state and the value of being able
to remove tuples sooner. Maybe we can move that tradeoff, but it'd
require a whole lot of pretty fundamental rework.

> On the other hand, for users with giant tables, even more
> aggressive approaches might pay off - e.g. recompute the xmin horizon
> every 1GB or so, because it might have advanced, and the effort to
> recheck that might pay off by allowing us to vacuum more stuff sooner.

Hmm, that's an interesting idea. It wouldn't take a lot of work
to try it.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-10-28 22:00:21 Re: vacuum on table1 skips rows because of a query on table2
Previous Message Laurenz Albe 2019-10-28 21:41:57 Re: vacuum on table1 skips rows because of a query on table2