Re: Berserk Autovacuum (let's save next Mandrill)

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Date: 2019-09-13 03:18:05
Message-ID: CAD21AoAap+b+o+LMjHbOb2hn3CrjCGrVYHPv7TpG8G07dCk+VQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 10, 2019 at 8:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jul 23, 2019 at 1:53 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> >
> > To invoke autovacuum even on insert-only tables we would need check
> > the number of inserted tuples since last vacuum. I think we can keep
> > track of the number of inserted tuples since last vacuum to the stats
> > collector and add the threshold to invoke vacuum with INDEX_CLEANUP =
> > false. If an autovacuum worker confirms that the number of inserted
> > tuples exceeds the threshold it invokes vacuum with INDEX_CLEANUP =
> > false. However if the number of dead tuples also exceeds the
> > autovacuum thresholds (autovacuum_vacuum_threshold and
> > autovacuum_vacuum_scale_factor) it should invoke vacuum with
> > INDEX_CLEANUP = true. Therefore new threshold makes sense only when
> > it's lower than the autovacuum thresholds.
> >
> > I guess we can have one new GUC parameter to control scale factor.
> > Since only relatively large tables will require this feature we might
> > not need the threshold based the number of tuples.
> >
>
> Generally speaking, having more guc's for autovacuum and that too
> which are in some way dependent on existing guc's sounds bit scary,
> but OTOH whatever you wrote makes sense and can help the scenarios
> which this thread is trying to deal with. Have you given any thought
> to what Alvaro mentioned up-thread "certain tables would have some
> sort of partial scan that sets the visibility map. There's no reason
> to invoke the whole vacuuming machinery. I don't think this is
> limited to append-only tables, but
> rather those are just the ones that are affected the most."?
>

Speaking of partial scan I've considered before that we could use WAL
to find which pages have garbage much and not all-visible pages. We
can vacuum only a particular part of table that is most effective of
garbage collection instead of whole tables. I've shared some results
of that at PGCon and it's still in PoC state.

Also, to address the issue of updating VM of mostly-append-only tables
I considered some possible solutions:

1. Using INDEX_CLEANUP = false and TRUNCATE = false vacuum does hot
pruning, vacuuming table and updating VM. In addition to updating VM
we need to do other two operations but since the mostly-insert-only
tables would have less garbage the hot pruning and vacuuming table
should be light workload. This is what I proposed on up-thread.
2. This may have already been discussed before but we could update
VM when hot pruning during SELECT operation. Since this affects SELECT
performance it should be enabled on only particular tables by user
request.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-09-13 03:23:12 Re: Create collation reporting the ICU locale display name
Previous Message Michael Paquier 2019-09-13 03:17:46 Re: psql - improve test coverage from 41% to 88%