Re: xid wraparound danger due to INDEX_CLEANUP false

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: xid wraparound danger due to INDEX_CLEANUP false
Date: 2020-11-20 23:03:21
Message-ID: CAH2-WznVuvO-kGqqR-1xghasp=+b6UeVbwdD8m6ELPn-9sz3EA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 20, 2020 at 2:17 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > Does that make sense?
>
> I *think* so. For me the point is that the index never has a right to
> insist on being vacuumed, but it can offer an opinion on how helpful
> it would be.

Right, that might be the single most important point. It's a somewhat
more bottom-up direction for VACUUM that is still fundamentally
top-down, because top-down control is still necessary.

Opportunistic heap pruning is usually very effective, so today what
realistically accumulates in heap pages is 4 byte line pointers. The
corresponding "bloatum" in index pages is an index tuple + line
pointer (at least 16 bytes + 4 bytes). Meaning that we accumulate
*at least* 20 bytes of index bloat for each 4 bytes in the table. And, indexes
care about *where* items go, making the problem even worse. So in the
absence of index tuple LP_DEAD setting/deletion (or bottom-up index
deletion in Postgres 14), the problem in indexes is probably at least
5x worse.
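
To make the arithmetic above concrete, here is a rough sketch of the
lower bound. The byte sizes come from the paragraph above; the function
names, and the assumption that every dead heap item leaves one entry in
each index, are illustrative only and not taken from PostgreSQL source.

```python
# Lower-bound sizes quoted in the paragraph above.
HEAP_LP_BYTES = 4            # line pointer left behind in the heap page
INDEX_TUPLE_MIN_BYTES = 16   # smallest index tuple ("at least 16 bytes")
INDEX_LP_BYTES = 4           # line pointer in the index page


def index_bloat_ratio(index_tuple_bytes=INDEX_TUPLE_MIN_BYTES):
    """Index bytes accumulated per heap byte for one dead item."""
    return (index_tuple_bytes + INDEX_LP_BYTES) / HEAP_LP_BYTES


def min_bloat_bytes(dead_items, n_indexes=1):
    """Return (heap_bytes, index_bytes) as a lower bound.

    Assumption (not from the source): each dead heap item has exactly
    one corresponding entry in each of n_indexes indexes.
    """
    heap_bytes = dead_items * HEAP_LP_BYTES
    per_index = INDEX_TUPLE_MIN_BYTES + INDEX_LP_BYTES
    index_bytes = dead_items * n_indexes * per_index
    return heap_bytes, index_bytes
```

With the minimum sizes, index_bloat_ratio() gives 5.0, which is where
the "at least 5x" figure comes from. Larger index tuples only push the
ratio up, and this ignores the separate point that indexes care about
*where* items go.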

The precise extent to which this is true will vary. It's a mistake to
try to reason about it at a high level, because there is just too much
variation for that approach to work. We should just give index access
methods *some* say. Sometimes this allows index vacuuming to be very
lazy, other times it allows index vacuuming to be very eager. Often
this variation exists among indexes on the same table.

Of course, vacuumlazy.c is still responsible for not letting the
accumulation of LP_DEAD heap line pointers get out of hand (without
allowing index TIDs to point to the wrong thing due to dangerous TID
recycling issues/bugs). The accumulation of LP_DEAD heap line pointers
will often take a very long time to get out of hand. But when it does
finally get out of hand, index access methods don't get to veto being
vacuumed. Because this isn't actually about their needs anymore.

Actually, the index access methods never truly veto anything. They
merely give some abstract signal about how urgent it is to them (like
the 0.0 - 1.0 thing). This difference actually matters. One index
among many on a table saying "meh, I guess I could benefit from some
index vacuuming if it's no real trouble to you, vacuumlazy.c" rather
than saying "it's absolutely unnecessary, don't waste CPU cycles,
vacuumlazy.c" may actually shift how vacuumlazy.c processes the heap
(at least occasionally). Maybe the high level VACUUM operation decides
that it is worth taking care of everything all at once -- if all the
indexes together either say "meh" or "now would be a good time", and
vacuumlazy.c then notices that the accumulation of LP_DEAD line
pointers is *also* becoming a problem (it's also a "meh" situation),
then it can be *more* ambitious. It can do a traditional VACUUM early.
Which might still make sense.
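
A minimal sketch of the decision described above, just to pin down the
shape of the idea. Nothing here is an existing PostgreSQL API: the
function, the Plan type, the heap_pressure input and all three
thresholds are hypothetical, and the only thing taken from the
discussion is that each index reports an urgency between 0.0 and 1.0.

```python
from dataclasses import dataclass
from typing import Dict, List

# All thresholds are made-up values for illustration.
HEAP_HARD_LIMIT = 0.9  # LP_DEAD accumulation is out of hand
MEH = 0.3              # "I could benefit if it's no real trouble"
EAGER = 0.7            # "now would be a good time"


@dataclass
class Plan:
    indexes_to_vacuum: List[str]
    reclaim_heap_line_pointers: bool
    reason: str


def plan_vacuum(index_urgency: Dict[str, float], heap_pressure: float) -> Plan:
    """Combine per-index opinions with the heap's own needs."""
    for name, urgency in index_urgency.items():
        if not 0.0 <= urgency <= 1.0:
            raise ValueError(f"urgency for {name} must be in [0.0, 1.0]")
    all_indexes = sorted(index_urgency)

    # The heap's needs win outright: indexes offer opinions, never a veto.
    if heap_pressure >= HEAP_HARD_LIMIT:
        return Plan(all_indexes, True, "heap forced")

    # Everybody is at least "meh", including the heap: be more ambitious
    # and do a traditional VACUUM early.
    if heap_pressure >= MEH and all(u >= MEH for u in index_urgency.values()):
        return Plan(all_indexes, True, "combined")

    # Otherwise stay lazy overall and vacuum only the eager indexes.
    eager = sorted(n for n, u in index_urgency.items() if u >= EAGER)
    return Plan(eager, False, "per-index")
```

The ordering of the checks is the point: the hard limit is tested first
so that no combination of index opinions can block it, while the middle
case is where a lukewarm signal from one index can still tip the whole
operation toward doing everything at once.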

This also means that vacuumlazy.c would ideally think about this as an
optimization problem. It may be lazy or eager for the whole table,
just as it may be lazy or eager for individual indexes. (Though the
eagerness/laziness dynamic is probably much more noticeable with
indexes in practice.)

--
Peter Geoghegan
