Re: WAL Rate Limiting

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL Rate Limiting
Date: 2014-02-20 14:43:53
Message-ID: CA+U5nMJo-m8SakCuYtE+OMgBJZLwJdh66wCkLPaoOTjCvmi=XA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 19 February 2014 13:28, Greg Stark <stark(at)mit(dot)edu> wrote:
> On Mon, Jan 20, 2014 at 5:37 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>> Agreed; that was the original plan, but implementation delays
>> prevented the whole vision/discussion/implementation. Requirements
>> from various areas include WAL rate limiting for replication, I/O rate
>> limiting, hard CPU and I/O limits for security and mixed workload
>> coexistence.
>>
>> I'd still like to get something on this in 9.4 that alleviates the
>> replication issues, leaving wider changes for later releases.
>
> My first reaction was that we should just have a generic I/O resource
> throttling. I was only convinced this was a reasonable idea by the
> replication use case. It would help me to understand the specific
> situations where replication breaks down due to WAL bandwidth
> starvation.

The various bulk operations mentioned in the OP can dump GBs of data
into WAL. That causes replication to immediately fall behind, which
slows everything down if you are using synchronous replication. This
has been discussed before on list over last few years (and Heroku
staff have been involved in that, fyi).

General I/O limiting is a nice-to-have but what is suggested here is
an essential aspect of running bulk commands when using replication,
especially synchronous replication.

The use cases for I/O limiting and replication limiting are different
in both their effects and their criticality.

The design choice of making the limit only apply to bulk ops is
because that is where the main problem lies. Rate limiting will cause
a loss of performance in the main line for non-bulk operations, so
adding tests there will not be valuable.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-02-20 14:51:47 Re: Selecting large tables gets killed
Previous Message Greg Stark 2014-02-20 14:29:41 Re: WAL Rate Limiting