Re: New vacuum option to do only freezing

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New vacuum option to do only freezing
Date: 2018-11-05 08:03:02
Message-ID: CAD21AoBjqJMZzLh8Ls9=xmxPvWNHEv1ccg=5QTryqo9zpn_Ehw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 2, 2018 at 1:32 AM Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
>
> Hi,
>
> On 10/1/18, 5:23 AM, "Masahiko Sawada" <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > Attached patch adds a new option FREEZE_ONLY to VACUUM command. This
> > option is same as FREEZE option except for it disables reclaiming dead
> > tuples. That is, with this option vacuum does pruning HOT chain,
> > freezing live tuples and maintaining both visibility map and freespace
> > map but does not collect dead tuples and invoke neither heap vacuum
> > nor index vacuum. This option will be useful if user wants to prevent
> > XID wraparound a table as quick as possible, especially when table is
> > quite large and is about to XID wraparound. I think this usecase was
> > mentioned in some threads but I couldn't find them.
>

Thank you for the comment!

> I've thought about this a bit myself. One of the reasons VACUUM can
> take so long is because of all the index scans needed. If you're in a
> potential XID wraparound situation and just need a quick way out, it
> would be nice to have a way to do the minimum amount of work necessary
> to reclaim transaction IDs. At a high level, I think there are some
> improvements to this design we should consider.
>
> 1. Create a separate FREEZE command instead of adding a new VACUUM
> option
>
> The first line of the VACUUM documentation reads, "VACUUM reclaims
> storage occupied by dead tuples," which is something that we would
> explicitly not be doing with FREEZE_ONLY.

No. Actually FREEZE_ONLY option (maybe will be changed its name) could
reclaim dead tuples by HOT-purning. If a page have HOT-updated chains
the FREEZE_ONLY prunes them and reclaim disk space occupied.

> I think it makes sense to
> reuse many of the VACUUM code paths to implement this feature, but
> from a user perspective, it should be separate.

I'm concernced that since the existing users already have recognized
that vacuuming and freezing are closely related they would get
confused more if we have a similar purpose feature with different
name.

>
> 2. We should reclaim transaction IDs from dead tuples as well
>
> Unless we also have a way to freeze XMAX like we do XMIN, I doubt this
> feature will be useful for the imminent-XID-wraparound use-case. In
> short, we won't be able to advance relfrozenxid and relminmxid beyond
> the oldest XMAX value for the relation.
> IIUC the idea of freezing> XMAX doesn't really exist yet. Either the XMAX is aborted/invalid and
> can be reset to InvalidTransactionId, or it is committed and the tuple
> can be removed if it beyond the freezing threshold. So, we probably
> also want to look into adding a way to freeze XMAX, either by setting
> it to FrozenTransactionId or by setting the hint bits to
> (HEAP_XMAX_COMMITTED | HEAP_XMIN_INVALID) as is done for XMIN.

That's a good point. If the oldest xmax is close to the old
relfrozenxid we will not be able to advance relfrozenxid enough.
However, since dead tuples are vacuumed by autovacuum periodically I
think that we can advance relfrozenxid enough in common case. There is
possible that we eventually need to do vacuum with removing dead
tuples after done FREEZE_ONLY but it would be a rare case. Thought?

>
> Looking closer, I see that the phrase "freezing XMAX" is currently
> used to refer to setting it to InvalidTransactionId if it is aborted
> or invalid (e.g. lock-only).
>
> > Currently this patch just adds the new option to VACUUM command but it
> > might be good to make autovacuum use it when emergency vacuum is
> > required.
>
> This also seems like a valid use-case, but it should definitely be
> done as a separate effort after this feature has been committed.

Agreed.

>
> > This is a performance-test result for FREEZE option and FREEZE_ONLY
> > option. I've tested them on the table which is about 3.8GB table
> > without indexes and randomly modified.
> >
> > * FREEZE
> > ...
> > Time: 50301.262 ms (00:50.301)
> >
> > * FREEZE_ONLY
> > ...
> > Time: 44589.794 ms (00:44.590)
>
> I'd be curious to see the improvements you get when there are several
> indexes on the relation. The ability to skip the index scans is
> likely how this feature will really help speed things up.
>

I've tested performance of FREEZE option and FREEZE_ONLY option using
a 3GB table having 3 indexes. Before do vacuum I modified 1 % of data
on the table.

* FREEZE
Time: 78677.211 ms (01:18.677)
Time: 86958.452 ms (01:26.958)
Time: 78351.190 ms (01:18.351)

* FREEZE_ONLY
Time: 19913.863 ms (00:19.914)
Time: 18917.379 ms (00:18.917)
Time: 20048.541 ms (00:20.049)

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2018-11-05 08:08:24 Re: New vacuum option to do only freezing
Previous Message Michael Paquier 2018-11-05 07:42:23 Re: pg_dumpall --exclude-database option