Re: pgsql: Compute XID horizon for page level index vacuum on primary.

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-committers <pgsql-committers(at)lists(dot)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Compute XID horizon for page level index vacuum on primary.
Date: 2019-04-01 06:59:19
Message-ID: CANP8+jLEy2vMFcSG8P=vFT4F5Loiw4OkD-uZ65cwmDa_C+fK7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Fri, 29 Mar 2019 at 16:32, Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2019-03-29 16:20:54 +0000, Simon Riggs wrote:
> > On Fri, 29 Mar 2019 at 16:12, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> >
> > > On 2019-03-29 15:58:14 +0000, Simon Riggs wrote:
> > > > On Fri, 29 Mar 2019 at 15:29, Andres Freund <andres(at)anarazel(dot)de>
> wrote:
> > > > > That's far from a trivial feature imo. It seems quite possible that
> > > we'd
> > > > > end up with increased overhead, because the current logic can get
> away
> > > > > with only doing hint bit style writes - but would that be true if
> we
> > > > > started actually replacing the item pointers? Because I don't see
> any
> > > > > guarantee they couldn't cross a page boundary etc? So I think we'd
> need
> > > > > to do WAL logging during index searches, which seems prohibitively
> > > > > expensive.
> > > > >
> > > >
> > > > Don't see that.
> > > >
> > > > I was talking about reusing the first 4 bytes of an index tuple's
> > > > ItemPointerData,
> > > > which is the first field of an index tuple. Index tuples are
> MAXALIGNed,
> > > so
> > > > I can't see how that would ever cross a page boundary.
> > >
> > > They're 8 bytes, and MAXALIGN often is 4 bytes:
> > >
> >
> > xids are 4 bytes, so we're good.
>
> I literally went on to explain why that's not sufficient? You can't
> *just* replace the block number with an xid. You *also* need to set a
> flag denoting that it's an xid and dead now. Which can't fit in the same
> 4 bytes. You either have to set it in the IndexTuple's t_info, or or in
> the ItemIdData's lp_flags. And both can be on a different sectors. If
> the flag persists, and the xid doesn't you're going to interpret a block
> number as an xid - not great; but even worse, if the xid survives, but
> the flag doesn't, you're going to access the xid as a block - definitely
> not ok, because you're going to return wrong results.
>

Yes, I agree, I was thinking the same thing after my last comment, but was
afk. The idea requires the atomic update of at least 4 bytes plus at least
1 bit and so would require at least 8byte MAXALIGN to be useful. Your other
points suggesting that multiple things all need to be set accurately for
this to work aren't correct. The idea was that we would write a hint that
would avoid later I/O, so the overall idea is still viable.

Anyway, thinking some more, I think the whole idea of generating
lastRemovedXid is moot and there are better ways in the future of doing
this that would avoid a performance issue altogether. Clearly not PG12.

The main issue relates to the potential overhead of moving this to the
master. I agree its the right thing to do, but we should have some way of
checking it is not a performance issue in practice.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-committers by date

  From Date Subject
Next Message Peter Eisentraut 2019-04-01 08:49:27 pgsql: Catch syntax error in generated column definition
Previous Message Michael Paquier 2019-04-01 05:22:04 pgsql: Fix thinko in allocation call during MVC list deserialization

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-04-01 07:06:10 Re: Feature: triggers on materialized views
Previous Message Michael Paquier 2019-04-01 06:57:52 Re: Progress reporting for pg_verify_checksums