Re: Behaviour when autovacuum is canceled

From: Martín Fernández <fmartin91(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Behaviour when autovacuum is canceled
Date: 2018-09-14 01:40:19
Message-ID: 5b9b0f243f8b8f4e0c000004@polymail.io
Lists: pgsql-general

Tom,

Thanks for the detailed explanation. I can start mapping your explanation to the source code I've been reading :)

We are in the process of tuning our autovacuum settings (on some tables) so that we can stop relying on crontabs that perform manual vacuums.

By making these changes we are going to rely more heavily on autovacuum, which raised the concern of "lost work" when autovacuum cancels itself under lock contention. I'm guessing that we might be overthinking this and that the canceling is not going to happen as frequently as we think it will.

Martín

On Thu, Sep 13th, 2018 at 9:21 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Martín Fernández <fmartin91(at)gmail(dot)com> writes:
> > We basically started a VACUUM on a given table, waited for one index to
> > process (captured cleaned rows count) and cancel the VACUUM. When we run
> > another VACUUM on the same table the dead rows removed from the first
> > index was a number slightly higher than the value logged on the first
> > VACUUM. This behaviour made us feel that the work done to clean dead
> > tuples on the first index was performed again.
>
> The unit of work that doesn't have to be repeated if VACUUM is canceled
> is:
>
> 1. Scan a bunch of heap pages to identify dead tuples;
> 2. Scan *all* the table's indexes to remove the corresponding index
> entries;
> 3. Rescan those heap pages to actually remove the tuples.
>
> It sounds like you canceled partway through phase 2.
>
> The actual size of this unit of work is the number of dead-tuple TIDs
> that will fit in maintenance_work_mem (at six or eight bytes apiece,
> I forget whether it's aligned...). Normally, people make
> maintenance_work_mem big so that they can reduce the number of index
> scan cycles needed to complete vacuuming a table. But if you're
> concerned about reducing the amount of work lost to a cancel,
> you might try *reducing* maintenance_work_mem. This will make
> vacuum slower overall (more index scans), but you have a better
> chance that it will manage to actually remove some tuples before
> getting canceled.
>
> Or you could look at fixing the access patterns that are causing
> so many autovacuum cancels.
>
> regards, tom lane

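The trade-off described in the quoted message can be put into rough numbers. The following is only a back-of-the-envelope sketch, not how PostgreSQL itself accounts for memory: the function names are hypothetical, and the default of 6 bytes per dead-tuple TID is an assumption (the quoted message says six or eight), so it is left as a parameter.

```python
# Rough estimate of VACUUM index-scan cycles versus work lost to a cancel.
# Assumption: each dead-tuple TID costs `bytes_per_tid` bytes of
# maintenance_work_mem (6 by default here; pass 8 to test the other guess).


def tids_per_cycle(maintenance_work_mem_bytes: int, bytes_per_tid: int = 6) -> int:
    """How many dead-tuple TIDs fit in memory for one unit of work."""
    if maintenance_work_mem_bytes <= 0 or bytes_per_tid <= 0:
        raise ValueError("memory size and TID size must be positive")
    return maintenance_work_mem_bytes // bytes_per_tid


def index_scan_cycles(dead_tuples: int, maintenance_work_mem_bytes: int,
                      bytes_per_tid: int = 6) -> int:
    """Number of heap-scan / index-scan / heap-rescan cycles needed."""
    per_cycle = tids_per_cycle(maintenance_work_mem_bytes, bytes_per_tid)
    if per_cycle == 0:
        raise ValueError("memory too small to hold a single TID")
    # Integer ceiling division: a partially filled last batch is still a cycle.
    return -(-dead_tuples // per_cycle)


def max_tuples_lost_on_cancel(dead_tuples: int, maintenance_work_mem_bytes: int,
                              bytes_per_tid: int = 6) -> int:
    """Upper bound on dead tuples whose cleanup must be redone after a cancel:
    at most one in-progress cycle's worth."""
    return min(dead_tuples, tids_per_cycle(maintenance_work_mem_bytes, bytes_per_tid))


MB = 1024 * 1024
GB = 1024 * MB
```

Under these assumptions, a table with 50 million dead tuples needs one cycle with 1 GB of memory but five cycles with 64 MB; in exchange, a cancel with 64 MB puts at most about 11 million tuples' worth of index cleanup at risk instead of all 50 million.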