Re: cost based vacuum (parallel)

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: cost based vacuum (parallel)
Date: 2019-11-04 20:12:05
Message-ID: 20191104201205.GH6962@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Andres Freund (andres(at)anarazel(dot)de) wrote:
> On 2019-11-04 14:33:41 -0500, Stephen Frost wrote:
> > * Andres Freund (andres(at)anarazel(dot)de) wrote:
> > > On 2019-11-04 14:06:19 -0500, Stephen Frost wrote:
> > > > With parallelization across indexes, you could have a situation where
> > > > the individual indexes are on different tablespaces with independent
> > > > i/o, therefore the parallelization ends up giving you an increase in i/o
> > > > throughput, not just additional CPU time.
> > >
> > > How's that related to IO throttling being active or not?
> >
> > You might find that you have to throttle the IO down when operating
> > exclusively against one IO channel, but if you have multiple IO channels
> > then the acceptable IO utilization could be higher as it would be
> > spread across the different IO channels.
> >
> > In other words, the overall i/o allowance for a given operation might be
> > able to be higher if it's spread across multiple i/o channels, as it
> > wouldn't completely consume the i/o resources of any of them, whereas
> > with a higher allowance and a single i/o channel, there would likely be
> > an impact to other operations.
> >
> > As for if this is really relevant only when it comes to parallel
> > operations is a bit of an interesting question- these considerations
> > might not require actual parallel operations as a single process might
> > be able to go through multiple indexes concurrently and still hit the
> > i/o limit that was set for it overall across the tablespaces. I don't
> > know that it would actually be interesting or useful to spend the effort
> > to make that work though, so, from a practical perspective, it's
> > probably only interesting to think about this when talking about
> > parallel vacuum.
>
> But you could just apply different budgets for different tablespaces?

Yes, that would be one approach to addressing this, though it would
change the existing meaning of those cost parameters. I'm not sure
whether we'd consider that an issue or not: if we only do this in the
case of a parallel vacuum then it's probably fine, but I'm less sure it
would be alright to change that behavior on an upgrade.
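To make the idea concrete, here's a minimal, entirely hypothetical
sketch (in Python, not PostgreSQL's actual C cost-accounting code) of
what tracking the vacuum cost balance per tablespace might look like;
the constants only loosely mirror the real vacuum_cost_limit and
vacuum_cost_delay settings, and the tablespace names are made up:

```python
# Hypothetical sketch, not PostgreSQL source: cost-based vacuum
# throttling with one balance per tablespace instead of a single
# global balance.

VACUUM_COST_LIMIT = 200      # cost credits accumulated before delaying
VACUUM_COST_DELAY_MS = 2     # nap length once the limit is reached

class PerTablespaceThrottle:
    """Accumulate vacuum I/O cost separately for each tablespace."""

    def __init__(self):
        self.balance = {}    # tablespace name -> accumulated cost
        self.delays = {}     # tablespace name -> delays taken

    def charge(self, tablespace, cost):
        self.balance[tablespace] = self.balance.get(tablespace, 0) + cost
        if self.balance[tablespace] >= VACUUM_COST_LIMIT:
            # A real backend would pg_usleep(VACUUM_COST_DELAY_MS) here;
            # we just count the delay and reset this tablespace's balance.
            self.delays[tablespace] = self.delays.get(tablespace, 0) + 1
            self.balance[tablespace] = 0

throttle = PerTablespaceThrottle()
# Heavy page reads against one tablespace and light reads against
# another are charged independently, so activity on the busy i/o
# channel never delays work on the idle one.
for _ in range(50):
    throttle.charge("ts_fast_ssd", 10)   # 500 cost total -> 2 delays
    throttle.charge("ts_slow_hdd", 2)    # 100 cost total -> 0 delays
print(throttle.delays)                   # {'ts_fast_ssd': 2}
```

With today's single shared balance, the charges against both
tablespaces would feed one counter, so the lightly loaded i/o channel
gets throttled along with the busy one; splitting the balance is what
lets each channel run up to its own allowance.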

> That's quite doable independent of parallelism, as we don't have tables
> or indexes spanning more than one tablespace. True, you could then make
> the processing of an individual vacuum faster by allowing to utilize
> multiple tablespace budgets at the same time.

Yes, it's possible to do this independent of parallelism, but what I
was trying to get at above is that it might not be worth the effort.
When it comes to parallel vacuum though, I'm not sure you can punt on
this question, since you'll naturally end up spanning multiple
tablespaces concurrently, at least when the heap and indexes are spread
across multiple tablespaces and you're operating against more than one
of those relations at a time. (I admit I'm not 100% sure that's
actually what happens with this proposed patch set; if it isn't, then
this isn't really an issue, though that would be pretty unfortunate, as
you then can't leverage multiple i/o channels concurrently, and Jeff's
question about why you'd be doing a parallel vacuum with IO throttling
becomes a pretty good one.)

Thanks,

Stephen
