Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions
Date: 2018-01-19 17:54:03
Message-ID: CA+TgmoZP0SxEfKW1Pn=ackUj+KdWCxs7PumMAhSYJeZ+_61_GQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 26, 2017 at 9:21 AM, Nikhil Sontakke
<nikhils(at)2ndquadrant(dot)com> wrote:
> The main issue here is that HeapTupleSatisfiesVacuum *assumes* that
> rows belonging to an aborted transaction are not visible to anyone
> else.

One problem here is that if a transaction aborts, it might have done
so after inserting or updating a tuple in the heap and before inserting
new index entries for the tuple, or after inserting only some of the
necessary new index entries. Therefore, even if you prevent pruning,
a snapshot from the point of view of the aborted transaction may be
inconsistent. Similarly, if it aborts during a DDL operation, it may
have made some but not all of the catalog changes involved, so that
for example pg_class and pg_attribute could be inconsistent with each
other or various pg_attribute rows could even be inconsistent among
themselves. If you have a view of the catalog where these problems
exist, you can't rely on, for example, being able to build a relcache
entry without error. It is possible that you can avoid these problems
if your snapshot is always using a command ID value that was reached
prior to the error, although I'm not 100% sure that idea has no holes.
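
To make the command-ID idea concrete, here is a toy Python model, not
PostgreSQL code. The names HeapTuple, seq_scan and index_scan are
hypothetical, and the visibility rule is deliberately simplified to
"inserted by an earlier command of the same transaction". It assumes a
transaction whose command 0 completed normally and whose command 1
inserted a heap tuple but aborted before adding the index entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeapTuple:
    key: str
    cmin: int  # command ID (within the transaction) that inserted the tuple


def visible(tup, curcid):
    # Simplified own-transaction rule: a tuple is visible only if it was
    # inserted by a command that ran before the snapshot's command ID.
    return tup.cmin < curcid


def seq_scan(heap, curcid):
    return {t.key for t in heap if visible(t, curcid)}


def index_scan(index, heap, curcid):
    # Index entries carry no visibility information of their own; they
    # only point at heap tuples, which are then checked for visibility.
    return {heap[i].key for i in index if visible(heap[i], curcid)}


# Command 0 inserts "a" and its index entry. Command 1 inserts "b" into
# the heap, then the transaction aborts before the index entry is added.
heap = [HeapTuple("a", cmin=0), HeapTuple("b", cmin=1)]
index = [0]  # positions in `heap` that have an index entry

for curcid in (1, 2):
    agree = seq_scan(heap, curcid) == index_scan(index, heap, curcid)
    print(f"curcid={curcid}: heap and index agree: {agree}")
```

In this model the two scans agree at curcid 1, which was reached before
the failing command, and disagree at curcid 2. It says nothing about
whether the real system has further holes of the kind mentioned above.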

Another problem is that CTID chains may be broken. Suppose that a
transaction T1, using CID 1, does a HOT update of tuple A1 producing a
new version A2. Then, later on, when the CID counter is at least 2, it
aborts. A snapshot taken from the point of view of T1 at CID 2 should
see A2. That will work fine most of the time. However, if
transaction T2 comes along after T1 aborts and before logical decoding
gets there and does its own HOT update of tuple A1 producing a new
version A3, then tuple A2 is inaccessible through the indexes even if
it still exists in the heap page. I think this problem is basically
unsolvable and likely means that this whole approach needs to be
abandoned.
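
The scenario can be sketched with a toy Python model of a single heap
page. This is an illustration only: Version, hot_update and chain_from
are hypothetical names, and the "next" field stands in for the forward
link that an update stores in the old tuple version. The model assumes
index entries lead only to the root version A1, so other versions are
found by following forward links.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    name: str
    xmin: str                   # transaction that created this version
    next: Optional[str] = None  # stand-in for the forward (CTID) link


def hot_update(page, old, new, xid):
    # Create the new version on the same page and point the old
    # version's forward link at it, overwriting any previous link.
    page[new] = Version(new, xmin=xid)
    page[old].next = new


def chain_from(page, root):
    # Follow forward links from the root, as an index lookup would.
    names, cur = [], root
    while cur is not None:
        names.append(cur)
        cur = page[cur].next
    return names


page = {"A1": Version("A1", xmin="T0")}

hot_update(page, "A1", "A2", "T1")   # T1 updates A1, producing A2
before = chain_from(page, "A1")      # chain while T1's link is intact

# T1 aborts. T2 then updates A1 itself, replacing the link to A2.
hot_update(page, "A1", "A3", "T2")
after = chain_from(page, "A1")

print(before, after, "A2" in page)
```

After T2's update, A2 is still present in the page dictionary but is no
longer reachable from A1, which is the situation described above.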

One other issue to consider is that the tuple freezing code assumes
that any tuple that does not get removed when a page is pruned is OK
to freeze. Commit 9c2f0a6c3cc8bb85b78191579760dbe9fb7814ec was
necessary to repair a case where that assumption was violated. You
might want to consider carefully whether there's any chance that this
patch could introduce a similar problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
