Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.

From: David Gould <daveg(at)sonic(dot)net>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
Date: 2015-10-31 06:19:52
Message-ID: 20151030231952.70eb5887@engels
Lists: pgsql-bugs

On Fri, 30 Oct 2015 21:49:00 -0700
Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Fri, Oct 30, 2015 at 8:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> >> David Gould wrote:
> >>> Anyway, they are not actually vacuuming. They are waiting on the
> >>> VacuumScheduleLock. And requesting fresh snapshots from the
> >>> stats_collector.
> >
> >> Oh, I see. Interesting. Proposals welcome. I especially dislike the
> >> ("very_expensive") pgstat check.
> >
> > Couldn't we simply move that out of the locked stanza? That is, if no
> > other worker is working on the table, claim it, and release the lock
> > immediately. Then do the "very expensive" check. If that fails, we
> > have to re-take the lock to un-claim the table, but that sounds OK.
>
>
> The attached patch does that. In a system with 4 CPUs and that had
> 100,000 tables, with a big chunk of them in need of vacuuming, and
> with 30 worker processes, this increased the throughput by a factor of
> 40. Presumably it will do even better with more CPUs.
>
> It is still horribly inefficient, but 40 times less so.

That is a good result for such a small change.
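
To make the quoted idea concrete, here is a minimal model of "claim under the lock, recheck outside it, un-claim if the recheck fails". It is a sketch only, not the patch and not PostgreSQL code: `schedule_lock`, `claimed`, `try_vacuum`, `needs_vacuum` and `do_vacuum` are hypothetical names chosen for illustration, and dropping the claim once the work is done is an assumption of the model.

```python
import threading

schedule_lock = threading.Lock()   # hypothetical stand-in for the schedule lock
claimed = {}                       # table oid -> id of the worker holding the claim


def try_vacuum(worker_id, table_oid, needs_vacuum, do_vacuum):
    """Claim under the lock, recheck outside it, un-claim if the recheck fails."""
    # Short critical section: only inspect and update the claim table.
    with schedule_lock:
        if table_oid in claimed:
            return False           # another worker already has this table
        claimed[table_oid] = worker_id

    try:
        # The expensive stats recheck runs with the lock released, so other
        # workers can claim other tables in the meantime.
        if not needs_vacuum(table_oid):
            return False
        do_vacuum(table_oid)
        return True
    finally:
        # Re-take the lock briefly to drop the claim, whether the recheck
        # failed or the work finished (an assumption of this model).
        with schedule_lock:
            del claimed[table_oid]
```

The point of the arrangement is that the lock is held only for the dictionary lookup and update, so a slow recheck by one worker no longer stalls every other worker waiting to claim a table.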

The attached patch against REL9_5_STABLE goes a little further. It
claims the table under the lock, but also addresses the problem of all the
workers racing to redo the same table by enforcing an ordering on all the
workers. No worker can claim a table with an oid smaller than the highest
oid claimed by any worker. That is, instead of racing to the same table,
workers leapfrog over each other.
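
The ordering rule can be modelled in a few lines. This is a sketch under stated assumptions, not the code in the attached patch: `claim_lock`, `highest_claimed_oid` and `claim_next` are hypothetical names, each worker is assumed to hold its candidate table oids in a sorted list, and the shared value is treated as a simple high-water mark for one pass (resetting it between passes is not modelled).

```python
import bisect
import threading

claim_lock = threading.Lock()      # hypothetical stand-in for the schedule lock
highest_claimed_oid = 0            # highest table oid claimed so far by any worker


def claim_next(sorted_table_oids):
    """Claim the smallest oid above the shared high-water mark, or return None."""
    global highest_claimed_oid
    with claim_lock:
        # Skip every candidate at or below the highest oid already claimed.
        i = bisect.bisect_right(sorted_table_oids, highest_claimed_oid)
        if i == len(sorted_table_oids):
            return None            # nothing left above the mark for this worker
        highest_claimed_oid = sorted_table_oids[i]
        return highest_claimed_oid
```

With candidates 10, 20, 30 and 40, successive calls from any mix of workers hand out 10, then 20, then 30, then 40, so two workers never head for the same table and each claim is a single comparison under the lock.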

In theory the recheck of the stats could be eliminated, although this patch
does not do that. It does eliminate the special handling of stats snapshots
for autovacuum workers, which cuts back somewhat on the excess rewriting of
the stats file.

I'll send numbers shortly, but as I recall it is over 100 times better than
the original.

-dg

--
David Gould 510 282 0869 daveg(at)sonic(dot)net
If simplicity worked, the world would be overrun with insects.

Attachment: autovacuum_worker_contention.diff (text/x-patch, 12.9 KB)
