From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL insert delay settings
Date: 2019-02-20 21:43:35
Message-ID: 20190220214335.GY6197@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
> On 2/19/19 8:40 PM, Andres Freund wrote:
> > On 2019-02-19 20:34:25 +0100, Tomas Vondra wrote:
> >> On 2/19/19 8:22 PM, Andres Freund wrote:
> >>> And my main point is that even if you implement a proper bytes/sec limit
> >>> ONLY for WAL, the behaviour of VACUUM rate limiting doesn't get
> >>> meaningfully more confusing than right now.
> >>
> >> So, why not modify autovacuum to also use this approach? I wonder if
> >> the situation there is more complicated because of multiple workers
> >> sharing the same budget ...
> >
> > I think the main reason is that implementing a scheme like this for WAL
> > rate limiting isn't a small task, but it'd be aided by the fact that
> > it'd probably not be on by default, and that there'd not be any
> > regressions because the behaviour didn't exist before. In contrast,
> > people are extremely sensitive to autovacuum behaviour changes, even if
> > it's to improve autovacuum. I think it makes more sense to build the
> > logic in a lower profile case first, and then migrate autovacuum over to
> > it. Even leaving the maturity issue aside, reducing the scope of the
> > project into more bite sized chunks seems to increase the likelihood of
> > getting anything substantial done.
>
> Maybe.

I concur with that 'maybe'. :)

> I guess the main thing I'm advocating for here is to aim for a unified
> throttling approach, not multiple disparate approaches interacting in
> ways that are hard to understand/predict.

Yes, agreed.

> The time-based approach you described looks fine, and it's kinda what I
> was imagining (and not unlike the checkpoint throttling). I don't think
> it'd be that hard to tweak autovacuum to use it too, but I admit I have
> not thought about it particularly hard and there's stuff like per-table
> settings which might make it more complex.

When reading Andres' proposal, I was strongly reminded of how checkpoint
throttling is handled, and wondered if there might be some way to reuse
or generalize that existing code/technique and make it available for
WAL and, more or less, every other bulk operation (CREATE INDEX,
REINDEX, CLUSTER, VACUUM, ...).
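To make that a bit more concrete, here's a rough sketch of the kind of
generalized, time-based budget such a shared mechanism might use. This
is purely illustrative, not existing PostgreSQL code; IoBudget,
io_budget_charge and the rest are made-up names, not any existing API.
Each IO channel gets a bytes/sec budget that refills with elapsed time,
bulk operations charge their writes against it, and they sleep when
they've overspent:

    #include <stdint.h>
    #include <time.h>
    #include <unistd.h>

    typedef struct IoBudget
    {
        double bytes_per_sec;        /* configured capacity for this channel */
        double available;            /* bytes we're currently allowed to write */
        struct timespec last_refill;
    } IoBudget;

    static double
    elapsed_secs(const struct timespec *a, const struct timespec *b)
    {
        return (b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1e9;
    }

    /*
     * Charge nbytes of bulk IO against the channel's budget, sleeping long
     * enough to repay any deficit so that long-term throughput stays at or
     * below bytes_per_sec.
     */
    static void
    io_budget_charge(IoBudget *budget, uint64_t nbytes)
    {
        struct timespec now;

        clock_gettime(CLOCK_MONOTONIC, &now);

        /* Refill in proportion to elapsed time, capping bursts at ~1 second. */
        budget->available += elapsed_secs(&budget->last_refill, &now) *
                             budget->bytes_per_sec;
        if (budget->available > budget->bytes_per_sec)
            budget->available = budget->bytes_per_sec;
        budget->last_refill = now;

        budget->available -= (double) nbytes;
        if (budget->available < 0)
        {
            /* Sleep long enough for the deficit to be repaid. */
            double wait_secs = -budget->available / budget->bytes_per_sec;

            usleep((useconds_t) (wait_secs * 1e6));
        }
    }

The same structure would work whether the channel is WAL or a
tablespace, which is what would make it attractive as a single
mechanism for all of the bulk operations above.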

> So maybe doing it for WAL first makes sense. But I still think we need
> to have at least a rough idea how it interacts with the existing
> throttling and how it will work in the end.

Well, it seems like Andres explained how it'll work with the existing
throttling, no? As for how all of this will work in the end, that's a
good question but also a rather difficult one to answer, I suspect.

Just to share a few additional thoughts after pondering this for a
while: the comment Andres made up-thread really struck a chord. We
don't necessarily want to throttle anything; what we'd really rather do
is *prioritize* things, whereby foreground work (regular queries and
such) has a higher priority than background/bulk work (VACUUM, REINDEX,
etc) but we otherwise use the system to its full capacity. We don't
actually want to throttle a VACUUM run any more than a CREATE INDEX; we
just don't want either of them to hurt the performance of the regular
queries that are happening.

The other thought I had was that doing things on a per-table basis, in
particular, isn't really addressing the resource question appropriately.
WAL is relatively straightforward, being a resource independent of the
IO for the heap/indexes, so getting an idea from the admin of how much
capacity they have for WAL makes sense. When it comes to IO capacity
for the heap/indexes, that really comes down to the underlying storage
system/channel, which in a properly set up environment would actually
be a tablespace (imv anyway).

Wrapping this up: it seems unlikely that we're going to get a
priority-based system in place any time particularly soon, but I do
think it's worthy of some serious consideration and discussion about
how we might be able to get there. On the other hand, if we can provide
a way for the admin to say "these are my IO channels (per-tablespace
values, plus a value for WAL), here's what their capacity is, and
here's how much buffer for foreground work I want to keep (again, per
IO channel), so, PG, please arrange to not use more than 'capacity
minus buffer' worth of resources for background/bulk tasks (per IO
channel)", then we can at least help them address the issue of
foreground tasks being stalled or delayed by background/bulk work. This
approach means they won't be utilizing the system to its full capacity,
but they'll know that, and they'll know it's because, for them, low
latency for foreground tasks matters more.
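Just to illustrate the shape of that (a purely hypothetical sketch:
none of these GUCs or tablespace options exist today, and the names and
units are invented), it could end up looking something like:

    # postgresql.conf: the WAL IO channel
    wal_io_capacity          = '200MB/s'   # what the admin says the WAL device can do
    wal_io_foreground_buffer = '50MB/s'    # reserved for foreground work
                                           # => background/bulk WAL IO limited to 150MB/s

    -- per-tablespace IO channels (hypothetical options)
    ALTER TABLESPACE fast_ssd
      SET (io_capacity = '800MB/s', io_foreground_buffer = '200MB/s');
    -- background/bulk IO against fast_ssd limited to 600MB/s

The derived 'capacity minus buffer' limits would then feed per-channel
budgets along the lines of the sketch above.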

Thanks!

Stephen
