Re: Too many autovacuum workers spawned during forced auto-vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Too many autovacuum workers spawned during forced auto-vacuum
Date: 2017-01-16 10:24:22
Message-ID: CAD21AoA6r_U0cXpmkFwFZmxkde+t06oT4c7DN=S1bVaeGq2zrg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 16, 2017 at 1:50 PM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> On 13 January 2017 at 19:15, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>> I think this is the same problem as reported in
>> https://www.postgresql.org/message-id/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=YPyFPQ@mail.gmail.com
>
> Ah yes, this is the same problem. Not sure why I didn't land on that
> thread when I tried to search pgsql-hackers using relevant keywords.
>>
>>> === Fix ===
>> [...]
>>> Instead, the attached patch (prevent_useless_vacuums.patch) prevents
>>> the repeated cycle by noting that there's no point in doing whatever
>>> vac_update_datfrozenxid() does, if we didn't find anything to vacuum
>>> and there's already another worker vacuuming the same database. Note
>>> that it uses wi_tableoid field to check concurrency. It does not use
>>> wi_dboid field to check for already-processing worker, because using
>>> this field might cause each of the workers to think that there is some
>>> other worker vacuuming, and eventually no one vacuums. We have to be
>>> certain that the other worker has already taken a table to vacuum.
>>
>> Hmm, it seems reasonable to skip the end action if we didn't do any
>> cleanup after all. This would normally give enough time between vacuum
>> attempts for the first worker to make further progress and avoid causing
>> a storm. I'm not really sure that it fixes the problem completely, but
>> perhaps it's enough.
>
> I had thought about this : if we didn't clean up anything, skip the
> end action unconditionally without checking if there was any
> concurrent worker. But then thought it is better to skip only if we
> know there is another worker doing the same job, because :
> a) there might be some reason we are just calling
> vac_update_datfrozenxid() without any condition. But I am not sure
> whether it was intentionally kept like that. Didn't get any leads from
> the history.
> b) there's no harm in updating datfrozenxid if there was no other
> worker. In this case, we *know* that there was indeed nothing to be
> cleaned up. So the next time this database won't be chosen again, so
> there's no harm just calling this function.
>

Since an autovacuum worker wakes up the autovacuum launcher right after
it is launched, the launcher could try to spawn worker processes very
frequently if you have a database containing a very large table that
has just passed autovacuum_freeze_max_age.

autovacuum.c:L1605
    /* wake up the launcher */
    if (AutoVacuumShmem->av_launcherpid != 0)
        kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);

I think we should deal with this case as well.
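
To make the condition from the quoted patch description concrete, here is
a rough, self-contained model of it. This is only a sketch and not the
code in prevent_useless_vacuums.patch: WorkerSlot,
other_worker_is_vacuuming() and should_skip_end_action() are hypothetical
names, and only the wi_dboid / wi_tableoid fields come from the
discussion above. It assumes a worker skips the end action only when it
vacuumed nothing and another worker in the same database has already
taken a table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical, simplified stand-in for a worker's shared-memory entry. */
typedef struct WorkerSlot
{
    Oid wi_dboid;       /* database the worker is attached to */
    Oid wi_tableoid;    /* table being vacuumed, InvalidOid if none yet */
} WorkerSlot;

/*
 * Return true if some worker other than 'self' is attached to the same
 * database and has already claimed a table.  Matching on wi_dboid alone
 * is not enough: every worker would then see "someone else" and nobody
 * would vacuum.
 */
static bool
other_worker_is_vacuuming(const WorkerSlot *slots, size_t nslots, size_t self)
{
    assert(self < nslots);

    for (size_t i = 0; i < nslots; i++)
    {
        if (i == self)
            continue;
        if (slots[i].wi_dboid == slots[self].wi_dboid &&
            slots[i].wi_tableoid != InvalidOid)
            return true;
    }
    return false;
}

/*
 * Skip the end action (the vac_update_datfrozenxid() call in the real
 * code) only if this worker found nothing to vacuum and another worker
 * is already busy in the same database.
 */
static bool
should_skip_end_action(const WorkerSlot *slots, size_t nslots, size_t self,
                       bool did_vacuum)
{
    return !did_vacuum && other_worker_is_vacuuming(slots, nslots, self);
}
```

With this shape, a worker that is alone in its database still performs
the end action, which matches point (b) quoted above.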

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
