Re: decoupling table and index vacuum

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: decoupling table and index vacuum
Date: 2021-04-22 11:50:48
Message-ID: CAFiTN-u-xtwsVP5BODa9pxDfmDBx2e_Dep0D81XRbbJt8p7bUQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 21, 2021 at 8:51 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> Now, the reason for this is that when we discover dead TIDs, we only
> record them in memory, not on disk. So, as soon as VACUUM ends, we
> lose all knowledge of which TIDs were dead and must rediscover
> them. Suppose we didn't do this, and instead had a "dead TID" fork for
> each table.

Interesting idea.
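To make sure I follow the proposal, here is a minimal conceptual sketch. All the names (DeadTidFork, index_ptr, and so on) are invented for illustration and are not from any patch; the real thing would be an on-disk format, not a Python list. The idea as I understand it: an append-only per-table log of dead TIDs, with one progress pointer per index recording how far that index's vacuuming has advanced, so that heap line pointers become reclaimable once every index has moved past them:

```python
class DeadTidFork:
    """Hypothetical model of a persistent per-table 'dead TID fork'."""

    def __init__(self, index_names):
        self.log = []  # append-only list of (block, offset) TIDs
        # Oldest log entry each index has NOT yet removed.
        self.index_ptr = {name: 0 for name in index_names}

    def record_dead(self, tid):
        """Heap pass: persist a dead TID instead of forgetting it at VACUUM end."""
        self.log.append(tid)

    def index_vacuumed_through(self, index_name, upto):
        """An index vacuum has removed entries log[ptr:upto] from this index."""
        self.index_ptr[index_name] = upto

    def reclaimable(self):
        """Heap line pointers for log[:min_ptr] can be marked unused:
        every index has already forgotten them."""
        return self.log[:min(self.index_ptr.values())]
```

Under this model, heap vacuuming, index vacuuming, and line-pointer reclamation become three independently schedulable activities that communicate only through the fork.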

However, you only need
> to force it for indexes that haven't been vacuumed recently enough for
> some other reason, rather than every index. If you have a target of
> reclaiming 30,000 TIDs, you can just pick the indexes where there are
> fewer than 30,000 dead TIDs behind their oldest-entry pointers and
> force vacuuming only of those.

How do we decide this target? I mean, at any given point, how do we
decide the threshold of dead TIDs at which we want to trigger the
index vacuuming?
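For what it's worth, my reading of the selection rule is roughly the following (a sketch with invented names, assuming the per-index oldest-entry pointers described above): an index whose pointer is still below the target position lags behind and blocks reclamation of the first N entries, so it is exactly those indexes that must be force-vacuumed:

```python
def indexes_to_force(index_ptr, target_tids):
    """Return the indexes that must be vacuumed before the first
    `target_tids` entries of the dead-TID log become reclaimable.

    index_ptr maps index name -> position of its oldest unprocessed
    log entry.  An index with ptr < target_tids still contains some of
    the first target_tids dead TIDs, so it blocks their reclamation;
    indexes already past the target can be skipped entirely.
    """
    return sorted(name for name, ptr in index_ptr.items()
                  if ptr < target_tids)
```

So with a target of reclaiming 30,000 TIDs, an index that was vacuumed recently for its own reasons (say its pointer is at 50,000) is skipped, while the laggards are forced.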

> One rather serious objection to this whole line of attack is that we'd
> ideally like VACUUM to reclaim disk space without using any more, in
> case a shortage of disk space is the motivation for running VACUUM in
> the first place. A related objection is that when it's acceptable to do
> everything all at once, as we currently do, the I/O overhead could be
> avoided. I think
> we'd probably have to retain a code path that buffers the dead TIDs in
> memory to account, at least, for the low-on-disk-space case, and maybe
> that can also be used to avoid I/O in some other cases, too. I haven't
> thought through all the details here. It seems to me that the actual
> I/O avoidance is probably not all that much - each dead TID is much
> smaller than the deleted tuple that gave rise to it, and typically you
> don't delete all the tuples at once - but it might be material in some
> cases, and it's definitely material if you don't have enough disk
> space left for it to complete without error.

Is it a good idea to always perform I/O after collecting the dead
TIDs, or should there be an option the user can configure so that it
aggressively vacuums all the indexes and this I/O overhead is avoided
completely?
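On the size point, a rough back-of-envelope supports the "not all that much" claim. A TID (ItemPointerData) is 6 bytes in memory, 4-byte block number plus 2-byte offset; the on-disk fork format might differ, so treat this as an estimate only:

```python
TID_BYTES = 6  # sizeof(ItemPointerData): 4-byte block number + 2-byte offset

def dead_tid_fork_bytes(n_dead_tids):
    """Extra bytes written if every dead TID is persisted to the fork."""
    return n_dead_tids * TID_BYTES

# For 10 million dead tuples the fork costs about 57 MB of writes,
# while the deleted tuples themselves (at, say, ~100 bytes each)
# occupied roughly 1 GB -- small relative to the churn, but not zero,
# and it matters when the disk is nearly full.
print(dead_tid_fork_bytes(10_000_000) / (1024 * 1024))
```

So the in-memory path seems essential mainly for the low-on-disk-space case, as you say, rather than for I/O savings in general.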

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
