Re: pgsql: Compute XID horizon for page level index vacuum on primary.

From: Andres Freund <andres(at)anarazel(dot)de>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-committers <pgsql-committers(at)lists(dot)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Compute XID horizon for page level index vacuum on primary.
Date: 2019-03-29 15:29:06
Message-ID: 20190329152906.356icmttz7yjgsyh@alap3.anarazel.de
Lists: pgsql-committers pgsql-hackers

Hi,

On 2019-03-29 09:37:11 +0000, Simon Riggs wrote:
> This commit message was quite confusing. It took me a while to realize this
> relates to btree index deletes and that what you mean is that we are
> calculating the latestRemovedXid for index entries. That is related to, but
> not the same thing as, the horizon itself.

Well, it's the page level horizon...

> While trying to understand this, I see there is an even better way to
> optimize this. Since we are removing dead index tuples, we could alter the
> killed index tuple interface so that it returns the xmax of the tuple being
> marked as killed, rather than just a boolean to say it is dead.

Wouldn't that quite possibly result in additional and unnecessary
conflicts? Right now the page level horizon is computed when the page is
actually reused, rather than when an item is marked as deleted. Because
of that delay, the computed horizons are commonly very "old", leading to
lower rates of recovery conflicts.

> Indexes can then mark the killed tuples with the xmax that killed them
> rather than just a hint bit. This is possible since the index tuples
> are dead and cannot be used to follow the htid to the heap, so the
> htid is redundant and the block number of the tid could be
> overwritten with the xmax, zeroing the itemid. Each killed item we
> mark with its xmax means one less heap fetch we need to perform when
> we delete the page - it's possible we optimize that away completely by
> doing this.

That's far from a trivial feature imo. It seems quite possible that we'd
end up with increased overhead, because the current logic can get away
with only doing hint bit style writes - but would that still be true if
we started actually replacing the item pointers? I don't see any
guarantee they couldn't cross a page boundary etc., so I think we'd need
to do WAL logging during index searches, which seems prohibitively
expensive.
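
To make the tradeoff concrete, here's a standalone sketch of the two
kinds of writes being compared. The layouts are simplified stand-ins
loosely modeled on ItemIdData and ItemPointerData, not the real
definitions:

/*
 * Standalone sketch, not PostgreSQL code.  Contrasts today's
 * hint-bit-style kill with the proposed "stash the xmax in the dead
 * tuple's heap TID" scheme.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

#define LP_DEAD 3

typedef struct
{
	unsigned	lp_off:15,		/* offset of the tuple on the page */
				lp_flags:2,		/* state of the line pointer */
				lp_len:15;		/* length of the tuple */
} LinePointer;					/* simplified ItemIdData stand-in */

typedef struct
{
	uint16_t	bi_hi;			/* high bits of heap block number */
	uint16_t	bi_lo;			/* low bits of heap block number */
	uint16_t	posid;			/* offset within the heap block */
} HeapTid;						/* simplified ItemPointerData stand-in */

/* Today's scheme: a single-flag, hint-bit-style change. */
static void
mark_item_dead(LinePointer *lp)
{
	lp->lp_flags = LP_DEAD;
}

/*
 * Proposed scheme: additionally overwrite the now-useless block number
 * of the dead tuple's heap TID with the killing xmax.  This is a
 * multi-byte change to tuple contents - the part that would need
 * torn-write / WAL protection during what is otherwise a read-only
 * index search.
 */
static void
mark_item_dead_with_xmax(LinePointer *lp, HeapTid *tid, TransactionId xmax)
{
	lp->lp_flags = LP_DEAD;
	tid->bi_hi = (uint16_t) (xmax >> 16);
	tid->bi_lo = (uint16_t) (xmax & 0xFFFF);
}

int
main(void)
{
	LinePointer lp = {128, 0, 16};
	HeapTid		tid = {0, 42, 7};

	mark_item_dead(&lp);
	mark_item_dead_with_xmax(&lp, &tid, 123456);
	printf("stored xmax = %u\n",
		   (unsigned) (((uint32_t) tid.bi_hi << 16) | tid.bi_lo));
	return 0;
}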

And I'm also doubtful it's worth it because:

> Since this point of the code is clearly going to be a performance issue,
> it seems like something we should do now.

I've tried quite a bit to find a workload where this matters, but after
avoiding redundant buffer accesses by sorting, and adding prefetching, I
was unable to do so. What workload do you see where this would really be
bad? Without the performance optimization I'd found a very minor
regression when forcing the heap visits to happen in a pretty random
order, but after sorting that went away. I'm sure it's possible to find
a case on overloaded rotational disks where you'd see a small
regression, but I don't think it'd be particularly bad.
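
For reference, the optimization I'm referring to is essentially this
(standalone sketch with illustrative types and names, not the code from
the commit): sort the heap TIDs of the to-be-removed index items by
block before visiting the heap, so each heap block is read once and
accesses are sequential:

/* Standalone sketch, not PostgreSQL code. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct
{
	uint32_t	block;			/* heap block number */
	uint16_t	offset;			/* line pointer within the block */
} Tid;							/* hypothetical, simplified */

static int
tid_cmp(const void *a, const void *b)
{
	const Tid  *ta = a;
	const Tid  *tb = b;

	if (ta->block != tb->block)
		return (ta->block < tb->block) ? -1 : 1;
	if (ta->offset != tb->offset)
		return (ta->offset < tb->offset) ? -1 : 1;
	return 0;
}

int
main(void)
{
	Tid			tids[] = {{907, 3}, {12, 1}, {907, 8}, {12, 5}, {544, 2}};
	int			ntids = sizeof(tids) / sizeof(tids[0]);
	uint32_t	last_block = UINT32_MAX;

	qsort(tids, ntids, sizeof(Tid), tid_cmp);

	for (int i = 0; i < ntids; i++)
	{
		if (tids[i].block != last_block)
		{
			/* real code would also issue prefetches a few blocks ahead */
			printf("read heap block %u\n", (unsigned) tids[i].block);
			last_block = tids[i].block;
		}
		printf("  check xmax of item at offset %u\n",
			   (unsigned) tids[i].offset);
	}
	return 0;
}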

Greetings,

Andres Freund
