Re: cost based vacuum (parallel)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: cost based vacuum (parallel)
Date: 2019-11-04 18:28:29
Message-ID: 20191104182829.57bkz64qn5k3uwc3@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2019-11-04 12:59:02 -0500, Jeff Janes wrote:
> On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > For parallel vacuum [1], we were discussing what is the best way to
> > divide the cost among parallel workers but we didn't get many inputs
> > apart from people who are very actively involved in patch development.
> > I feel that we need some more inputs before we finalize anything, so
> > starting a new thread.
> >
>
> Maybe I just don't have experience in the type of system that parallel
> vacuum is needed for, but if there is any meaningful IO throttling which is
> active, then what is the point of doing the vacuum in parallel in the first
> place?

I am wondering the same - but to be fair, it's pretty easy to run into
cases where VACUUM is CPU bound. E.g. because most pages are in
shared_buffers and, compared to the size of the indexes, the number of
tids that need to be pruned is fairly small (also [1]). That means a lot
of pages need to be scanned, without a whole lot of IO going on. The
problem is that the defaults for vacuum throttling still apply in that
case, and I've never seen anybody tune vacuum_cost_page_hit = 0,
vacuum_cost_page_dirty = 0 or the like (on the contrary, the latter is
currently the highest of the page costs). Nor do we reduce
vacuum_cost_page_dirty for unlogged tables.
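
To make that concrete: today the only way to take the in-memory page
accounting out of the picture is to zero those costs yourself. A rough
sketch, not a recommendation as-is, against the v12 defaults of
page_hit = 1, page_miss = 10, page_dirty = 20, cost_limit = 200:

    ALTER SYSTEM SET vacuum_cost_page_hit = 0;
    ALTER SYSTEM SET vacuum_cost_page_dirty = 0;
    SELECT pg_reload_conf();  -- both GUCs take effect without a restart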

So while it doesn't seem unreasonable to want to use cost limiting to
protect against vacuum unexpectedly causing too much IO, read IO in
particular, I'm doubtful it has much practical relevance today.
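
For a rough sense of the scale involved (ignoring the hit/dirty
components, assuming 8kB blocks and the autovacuum defaults): a vacuum
doing nothing but page misses is allowed vacuum_cost_limit /
vacuum_cost_page_miss = 200 / 10 = 20 reads per delay cycle, i.e.

    20 pages * 8kB / 20ms ≈  8 MB/s  (pre-v12 autovacuum_vacuum_cost_delay)
    20 pages * 8kB /  2ms ≈ 80 MB/s  (the 2ms default since v12)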

I'm wondering how much of the benefit of parallel vacuum really is just
to work around vacuum ringbuffers often massively hurting performance
(see e.g. [2]). Surely not all, but I'd be very unsurprised if it were a
large fraction.

Greetings,

Andres Freund

[1] I don't think the patch addresses this - IIUC it only runs index
    vacuums in parallel - but it's very easy to end up CPU bottlenecked
    when vacuuming a busily updated table. heap_hot_prune can be really
    expensive, especially with longer update chains (I think it may even
    have an O(n^2) worst case).
[2] https://www.postgresql.org/message-id/20160406105716.fhk2eparljthpzp6%40alap3.anarazel.de
