Re: cost based vacuum (parallel)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: cost based vacuum (parallel)
Date: 2019-11-05 09:10:42
Message-ID: CAA4eK1LyNGQ3-26WphEOQdPswE8r+r04kwZOcEHXAgwp4FEmmA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 4, 2019 at 11:58 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2019-11-04 12:59:02 -0500, Jeff Janes wrote:
> > On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > > For parallel vacuum [1], we were discussing what is the best way to
> > > divide the cost among parallel workers but we didn't get many inputs
> > > apart from people who are very actively involved in patch development.
> > > I feel that we need some more inputs before we finalize anything, so
> > > starting a new thread.
> > >
> >
> > Maybe I just don't have experience in the type of system that parallel
> > vacuum is needed for, but if there is any meaningful IO throttling which is
> > active, then what is the point of doing the vacuum in parallel in the first
> > place?
>
> I am wondering the same - but to be fair, it's pretty easy to run into
> cases where VACUUM is CPU bound. E.g. because most pages are in
> shared_buffers, and compared to the size of the indexes the number of
> tids that need to be pruned is fairly small (also [1]). That means a lot
> of pages need to be scanned, without a whole lot of IO going on. The
> problem with that is just that the defaults for vacuum throttling will
> also apply here; I've never seen anybody tune vacuum_cost_page_hit = 0,
> vacuum_cost_page_dirty = 0 or such (in contrast, the latter is the highest
> cost currently). Nor do we reduce the cost of vacuum_cost_page_dirty
> for unlogged tables.
>
> So while it doesn't seem unreasonable to want to use cost limiting to
> protect against vacuum unexpectedly causing too much, especially read,
> IO, I'm doubtful it has current practical relevance.
>
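
To put rough numbers on the point about the throttling defaults, here is a
minimal sketch of the cost accounting. It assumes the default cost settings
as of PostgreSQL 12 (page hit = 1, page miss = 10, page dirty = 20, cost
limit = 200) and a deliberately simplified model in which vacuum naps for
the full delay each time the accumulated cost reaches the limit. The helper
name estimated_nap_seconds is hypothetical and is not part of PostgreSQL.

```python
# Default cost settings as of PostgreSQL 12 (assumed here; check your own
# configuration, since releases and installations differ).
VACUUM_COST_PAGE_HIT = 1     # page found in shared_buffers
VACUUM_COST_PAGE_MISS = 10   # page that had to be read into shared_buffers
VACUUM_COST_PAGE_DIRTY = 20  # page dirtied by vacuum, charged on top of hit/miss
VACUUM_COST_LIMIT = 200      # accumulated cost that triggers a nap


def estimated_nap_seconds(hits, misses, dirtied, cost_delay_ms,
                          cost_limit=VACUUM_COST_LIMIT):
    """Rough total nap time for one vacuum pass (hypothetical helper).

    Simplified model: one nap of cost_delay_ms each time the accumulated
    cost reaches cost_limit. The real implementation differs in details.
    """
    if cost_delay_ms <= 0:
        return 0.0  # a delay of 0 disables cost-based throttling

    total_cost = (hits * VACUUM_COST_PAGE_HIT
                  + misses * VACUUM_COST_PAGE_MISS
                  + dirtied * VACUUM_COST_PAGE_DIRTY)
    naps = total_cost // cost_limit
    return naps * cost_delay_ms / 1000.0


if __name__ == "__main__":
    # A scan of 1,000,000 pages that are all already in shared_buffers.
    print(estimated_nap_seconds(1_000_000, 0, 0, cost_delay_ms=2))
```

Under these assumptions, a scan that touches only cached pages still naps
for about 10 seconds per million pages at a 2 ms delay, even though it does
no read I/O at all.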

IIUC, you mean that parallel vacuum is of little practical use when
I/O throttling is enabled for the operation; is that right?

> I'm wondering how much of the benefit of parallel vacuum really is just
> to work around vacuum ringbuffers often massively hurting performance
> (see e.g. [2]).
>

Yeah, that is worth checking, but if anything, I think parallel vacuum
will improve performance even further with larger ring buffers, as that
will make vacuum more CPU bound.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
