Re: should there be a hard-limit on the number of transactions pending undo?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: should there be a hard-limit on the number of transactions pending undo?
Date: 2019-07-29 20:03:49
Message-ID: CA+TgmoaETDxWKo+_7SyzFLsu4qwpfnMoTZF6=6+iGiQ=haFphA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 29, 2019 at 3:39 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I'm not saying you can't handle it. But that necessitates "write
> amplification", in the sense that you must now create new index tuples
> even for indexes where the indexed columns were not logically altered.
> Isn't zheap supposed to fix that problem, at least at in version 2 or
> version 3? I also think that stable heap TIDs make index-only scans a
> lot easier and more effective.

I think there's a cost-benefit analysis here. You're completely
correct that inserting new index tuples causes write amplification
and, yeah, that's bad. On the other hand, row forwarding has its own
costs. If a row ends up persistently moved to someplace else, then
every subsequent access to that row has an extra level of indirection.
If it ends up split between two places, every read of that row incurs
two reads. The "someplace else" where moved rows or ends of split rows
are stored has to be skipped by sequential scans, which is complex and
possibly inefficient if it breaks up a sequential I/O pattern. Those
things are bad, too.

It's a little difficult to compare the kinds of badness. My thought
is that in the short run, the redirect strategy probably wins, because
there could be and likely are a bunch of indexes and it's cheaper to
just insert one redirect. But in the long term, the redirect thing
seems like a loser, because you have to keep following it. That
(perhaps naive) analysis is why zheap doesn't try to maintain TID
stability. Instead it wants to do in-place updates (no new TID) as
often as possible, but the fallback strategy is simply to do a
non-in-place update (new TID) rather than a redirect.

> I think that indexes (or at least B-Tree indexes) will ideally almost
> always have tuples that are the latest versions with zheap. The
> exception is tuples whose ghost bit is set, whose visibility varies
> based on the MVCC snapshot in use. But the instant that the
> deleting/updating xact commits it becomes legal to recycle the old
> heap TID. We don't need to go back to the index to permanently zap the
> tuple whose ghost bit we already set, because there is an undo pointer
> in the same leaf page, so nobody is in danger of getting confused and
> following the now-recycled heap TID.

I haven't run across the "ghost bit" terminology before. Is there a
good place to read about the technique you're assuming here? A major
question is how you handle inserted rows, that are new now and thus
not yet visible to everyone, but which will later become all-visible.
One idea is: if the undo pointer is new enough that a write
transaction which modified the page could still be in-flight, check
the undo log to ascertain visibility of index tuples. If not, then
any potentially-deleted index tuples are in fact deleted, and any
others are all-visible. With this design, you don't set the ghost bit
on new tuples, but are still able to stop following the undo pointers
for them after a while.

To put that another way, there seems to be pretty clearly a need for a
bit, but what does the bit mean? It could mean "please check the undo
log," in which case it'd have to be set on insert, eventually cleared,
and then reset on delete, but I think that's likely to suck. I think
therefore that the bit should mean
is-deleted-but-not-necessarily-all-visible-yet, which avoids that
problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2019-07-29 20:09:52 Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Previous Message Sehrope Sarkuni 2019-07-29 20:03:46 Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)