Re: Cost limited statements RFC

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost limited statements RFC
Date: 2013-06-09 00:37:21
Message-ID: CA+TgmoZVY=zsBbY8ERem=VG_XKOYQLFhhDD5XqjZ9JFr5GUcLA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jun 8, 2013 at 4:43 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> I don't know what two independent settings would look like. Say you keep two
> independent counters, where each can trigger a sleep, and the triggering of
> that sleep clears only its own counter. Now you still have a limit on the
> linear combination, it is just that summation has moved to a different
> location. You have two independent streams of sleeps, but they add up to
> the same amount of sleeping as a single stream based on a summed counter.
>
> Or if one sleep clears both counters (the one that triggered it and the
> other one), I don't think that that is what I would call independent either.
> Or at least not if it has no memory. The intuitive meaning of independent
> would require that it keep track of which of the two counters was
> "controlling" over the last few seconds. Am I overthinking this?

Yep. Suppose the user has a read limit of 64 MB/s and a dirty limit
of 4MB/s. That means that, each second, we can read 8192 buffers and
dirty 512 buffers. If we sleep for 20 ms (1/50th of a second), that
"covers" 163 buffer reads and 10 buffer writes, so we just reduce the
accumulated counters by those amounts (minimum zero).
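
To make that concrete, here is a minimal sketch in C. Everything in
it (the names, the usleep()-based nap, the choice to sleep once
either counter reaches one nap's worth of budget) is my own
illustration of the scheme, not actual PostgreSQL code:

/*
 * Sketch of the two-counter scheme described above.  Reads and
 * dirties accumulate separately; a single 20 ms nap pays down both
 * counters by the amount it "covers", clamped at zero.
 */
#include <unistd.h>

#define BUF_SZ          8192                    /* 8 kB buffers */
#define READ_LIMIT      (64 * 1024 * 1024)      /* 64 MB/s */
#define DIRTY_LIMIT     (4 * 1024 * 1024)       /* 4 MB/s */
#define NAP_USEC        20000                   /* 20 ms sleep */

/* Per-second budgets: 8192 buffer reads, 512 buffer dirties. */
#define READS_PER_SEC   (READ_LIMIT / BUF_SZ)
#define DIRTIES_PER_SEC (DIRTY_LIMIT / BUF_SZ)

/* What one 20 ms nap covers: 163 reads, 10 dirties. */
#define READS_PER_NAP   (READS_PER_SEC * NAP_USEC / 1000000)
#define DIRTIES_PER_NAP (DIRTIES_PER_SEC * NAP_USEC / 1000000)

static int  reads_pending;      /* buffers read, not yet paid for */
static int  dirties_pending;    /* buffers dirtied, not yet paid for */

static void
maybe_throttle(void)
{
    if (reads_pending >= READS_PER_NAP ||
        dirties_pending >= DIRTIES_PER_NAP)
    {
        usleep(NAP_USEC);

        /* One nap credits both counters, minimum zero. */
        reads_pending = (reads_pending > READS_PER_NAP) ?
            reads_pending - READS_PER_NAP : 0;
        dirties_pending = (dirties_pending > DIRTIES_PER_NAP) ?
            dirties_pending - DIRTIES_PER_NAP : 0;
    }
}

/* Charge points, called from the buffer-access paths: */
void count_buffer_read(void)  { reads_pending++;   maybe_throttle(); }
void count_buffer_dirty(void) { dirties_pending++; maybe_throttle(); }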

> Also, in all the anecdotes I've been hearing about autovacuum causing
> problems from too much IO, in which people can identify the specific
> problem, it has always been the write pressure, not the read, that caused
> the problem. Should the default be to have the read limit be inactive and
> rely on the dirty-limit to do the throttling?

The main time I think you're going to hit the read limit is during
anti-wraparound vacuums. That problem may be gone in 9.4, if Heikki
writes that patch we were discussing just recently. But at the
moment, we'll do periodic rescans of relations that are already
all-frozen, and that's potentially expensive.

So I'm not particularly skeptical about the need to throttle reads. I
suspect many people don't need it, but there are probably some who do,
at least for anti-wraparound cases - especially on EC2, where the
limit on I/O is often the GigE card. What I *am* skeptical about is
the notion that people need the precise value of the write limit to
depend on how many of the pages read are being found in shared_buffers
versus not. That's essentially what the present system is
accomplishing - at a great cost in user-visible complexity.
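
For contrast, the present system folds reads and dirties into a
single balance. A simplified model, assuming the 9.x-era defaults as
I remember them (vacuum_cost_page_hit = 1, vacuum_cost_page_miss =
10, vacuum_cost_page_dirty = 20, vacuum_cost_limit = 200, a 20 ms
delay); the real code also scales the sleep length, so treat this as
an approximation rather than the actual source:

/*
 * Simplified model of the existing single-balance throttling.  Every
 * page touched adds its cost to one shared counter, so heavy reading
 * (misses) delays writing and vice versa: the limit applies to a
 * linear combination of the two, as discussed above.
 */
#include <unistd.h>

static int cost_balance;

static void
charge_page(int cost)           /* 1 = hit, 10 = miss, 20 = dirty */
{
    cost_balance += cost;
    if (cost_balance >= 200)    /* vacuum_cost_limit */
    {
        usleep(20 * 1000);      /* ~ autovacuum_vacuum_cost_delay */
        cost_balance = 0;       /* one sleep clears the whole balance */
    }
}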

Basically, I think that anti-wraparound vacuums may need either read
throttling or write throttling depending on whether the data is
already frozen; and regular vacuums probably only need
write-throttling. But I have neither firsthand experience nor any
empirical reason to presume that the write limit needs to be lower
when the read rate is high.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
