Re: pgsql: Compute XID horizon for page level index vacuum on primary.

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-committers <pgsql-committers(at)lists(dot)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Compute XID horizon for page level index vacuum on primary.
Date: 2019-03-29 09:37:11
Message-ID: CANP8+jKUF-DE9um44zEfxOY277byx3b4NMFrY+gwj=+=Yc=0wQ@mail.gmail.com
Lists: pgsql-committers pgsql-hackers

On Wed, 27 Mar 2019 at 00:06, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Compute XID horizon for page level index vacuum on primary.
>
> Previously the xid horizon was only computed during WAL replay.

This commit message was quite confusing. It took me a while to realize that
this relates to btree index deletes, and that what you mean is that we are
calculating the latestRemovedXid for index entries. That is related to, but
not the same thing as, the horizon itself. Now I understand what "was
computed only during WAL replay" means: it seemed obvious to me that the xmin
horizon was calculated regularly on the master, but, as you say, the
latestRemovedXid was not.
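For readers following the thread, here is a minimal sketch of what
"calculating the latestRemovedXid for index entries" amounts to, under a
simplified model: given the xmax of each dead heap tuple referenced by the
index entries being deleted, take the logically latest one. The names
`compute_latest_removed_xid` and `xid_follows` are hypothetical; the
comparison is modeled on PostgreSQL's wraparound-aware `TransactionIdFollows()`,
and the real heap code handles several cases (for example, tuples whose
inserting transaction aborted) that this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's TransactionId (a 32-bit counter). */
typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * "id1 logically follows id2" for normal XIDs, modeled on
 * TransactionIdFollows(): the signed 32-bit difference handles wraparound.
 */
static int
xid_follows(TransactionId id1, TransactionId id2)
{
    int32_t diff = (int32_t) (id1 - id2);

    return diff > 0;
}

/*
 * Hypothetical helper: given the xmax of each dead heap tuple pointed to by
 * the index entries about to be deleted, return the latest one.  Replaying
 * the deletion on a standby conflicts with queries whose snapshots might
 * still see any of those tuples, so this value is what the conflict check
 * needs.  Returns InvalidTransactionId when there are no tuples.
 */
static TransactionId
compute_latest_removed_xid(const TransactionId *xmaxes, size_t n)
{
    TransactionId latest = InvalidTransactionId;

    for (size_t i = 0; i < n; i++)
    {
        TransactionId xmax = xmaxes[i];

        assert(xmax >= FirstNormalTransactionId); /* normal XIDs only */
        if (latest == InvalidTransactionId || xid_follows(xmax, latest))
            latest = xmax;
    }
    return latest;
}
```

Note that with wraparound, an XID such as 5 can logically follow 0xFFFFFFF0,
which is why a plain `>` comparison would be wrong here.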

Now that I understand it, I'm happy that you've moved this from redo into
mainline. You've also optimized it, which is important, since performance was
the original objection and the reason it was placed in redo. I can see you've
removed duplicate code in hash indexes as well, which is good.

The term "xid horizon" was used only once in the code in PG11. That usage
appears to be a typo, since in many other places the term "xmin horizon" is
used to mean the point at which we can finally mark tuples as dead. Now we
have some new, undocumented APIs that use the term "xid horizon", while other
code still refers to "xmin horizon", and neither term is defined. I'm hoping
you'll do some later cleanup of that to avoid confusion.

While trying to understand this, I saw that there is an even better way to
optimize it. Since we are removing dead index tuples, we could alter the
killed index tuple interface so that it returns the xmax of the tuple being
marked as killed, rather than just a boolean saying it is dead. Indexes could
then mark killed tuples with the xmax that killed them, rather than with just
a hint bit. This is possible because the index tuples are dead and cannot be
used to follow the htid to the heap. The htid is therefore redundant, so the
block number of the tid could be overwritten with the xmax, zeroing the
itemid. Each killed item we mark with its xmax means one fewer heap fetch
when we delete the page; it's possible we could optimize those fetches away
completely by doing this.
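The proposal can be sketched roughly as follows. This is a hypothetical
illustration, not PostgreSQL code: `SketchItemPointer` loosely mirrors the
layout of `ItemPointerData` (a 32-bit block number stored as two 16-bit
halves, plus a 16-bit offset number), the `killed` field stands in for the
LP_DEAD hint, and "zeroing the itemid" is read here as setting the offset
number to 0 (`InvalidOffsetNumber`). All function names are invented for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint16_t OffsetNumber;

#define InvalidTransactionId ((TransactionId) 0)
#define InvalidOffsetNumber  ((OffsetNumber) 0)

/* Hypothetical stand-in for ItemPointerData. */
typedef struct
{
    uint16_t     bi_hi;     /* high half of block number */
    uint16_t     bi_lo;     /* low half of block number */
    OffsetNumber ip_posid;  /* heap offsets start at 1, so 0 is never live */
} SketchItemPointer;

/* Hypothetical index entry: heap TID plus a killed hint. */
typedef struct
{
    SketchItemPointer htid;
    int               killed;
} SketchIndexEntry;

static int
xid_follows(TransactionId id1, TransactionId id2)
{
    return (int32_t) (id1 - id2) > 0;   /* wraparound-aware comparison */
}

/* Mark an entry killed and overwrite its (now useless) TID with the xmax. */
static void
mark_killed_with_xmax(SketchIndexEntry *entry, TransactionId xmax)
{
    assert(xmax != InvalidTransactionId);
    entry->killed = 1;
    entry->htid.bi_hi = (uint16_t) (xmax >> 16);
    entry->htid.bi_lo = (uint16_t) (xmax & 0xFFFF);
    entry->htid.ip_posid = InvalidOffsetNumber;   /* marker: TID holds xmax */
}

static int
has_stored_xmax(const SketchIndexEntry *entry)
{
    return entry->killed && entry->htid.ip_posid == InvalidOffsetNumber;
}

static TransactionId
stored_xmax(const SketchIndexEntry *entry)
{
    return ((TransactionId) entry->htid.bi_hi << 16) | entry->htid.bi_lo;
}

/*
 * At page-deletion time: take the latest xmax among killed entries that
 * carry one, and count the killed entries that would still need a heap
 * fetch because they were killed with only the hint.
 */
static TransactionId
latest_removed_from_killed(const SketchIndexEntry *entries, size_t n,
                           size_t *heap_fetches)
{
    TransactionId latest = InvalidTransactionId;

    *heap_fetches = 0;
    for (size_t i = 0; i < n; i++)
    {
        if (!entries[i].killed)
            continue;
        if (has_stored_xmax(&entries[i]))
        {
            TransactionId xmax = stored_xmax(&entries[i]);

            if (latest == InvalidTransactionId || xid_follows(xmax, latest))
                latest = xmax;
        }
        else
            (*heap_fetches)++;
    }
    return latest;
}
```

The sketch only counts the remaining heap fetches; a real implementation
would also fold the xmax values found by those fetches into the result. The
marker works because a live heap TID never has offset number 0.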

Since this point in the code is clearly going to be a performance issue, it
seems like something we should do now.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services