Re: Allow a per-tablespace effective_io_concurrency setting

From: Greg Stark <stark(at)mit(dot)edu>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
Subject: Re: Allow a per-tablespace effective_io_concurrency setting
Date: 2015-09-02 18:49:13
Message-ID: CAM-w4HNFSi9HAvRXrvPhYf7awhzZq1ojOtcbK064VxC5U5t1jA@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On 2 Sep 2015 14:54, "Andres Freund" <andres(at)anarazel(dot)de> wrote:
>
>
> > + /*----------
> > + * The user-visible GUC parameter is the number of drives (spindles),
> > + * which we need to translate to a number-of-pages-to-prefetch target.
> > + * The target value is stashed in *extra and then assigned to the actual
> > + * variable by assign_effective_io_concurrency.
> > + *
> > + * The expected number of prefetch pages needed to keep N drives busy is:
> > + *
> > + * drives | I/O requests
> > + * -------+----------------
> > + * 1 | 1
> > + * 2 | 2/1 + 2/2 = 3
> > + * 3 | 3/1 + 3/2 + 3/3 = 5 1/2
> > + * 4 | 4/1 + 4/2 + 4/3 + 4/4 = 8 1/3
> > + * n | n * H(n)
>
> I know you just moved this code. But: I don't buy this formula. Like at
> all. Doesn't queuing and reordering entirely invalidate the logic here?

I can take the blame for this formula.

It's called the "Coupon Collector's Problem": if you get a random
coupon from a set of n possible coupons, how many random coupons would
you have to collect before you expect to have at least one of each?

This computation model assumes we have no information about which
spindle each block will hit. That's basically true for bitmap heap
scans in most cases, because the whole idea of a bitmap heap scan is
to pick a sparse set of blocks, and there's no reason the blocks
being read will have any regularity that causes them all to fall on
the same spindles. If in fact you're reading a fairly dense set, a
bitmap heap scan is probably a waste of time and simply reading
sequentially would be just as fast or even faster.

We talked about this quite a bit back then, and there was no dispute
that the aim is to provide GUCs that are meaningful to the DBA, who
can actually measure them. They know how many spindles they have.
They do not know what the optimal prefetch depth is, and the only way
to determine it would be to experiment with Postgres. Worse, I think
the above formula works for essentially random I/O, but for more
predictable I/O it might be possible to use a different formula. If
we made the GUC something low-level like "how many blocks to
prefetch", we would be left in the dark about how to handle that
different access pattern.

I did speak to a device-mapper developer, and he suggested that the
kernel could help out with an API. He suggested something of the form
"how many blocks do I have to read before the end of the current
device". I wasn't sure exactly what we would do with something like
that, but it would be better than just guessing how many I/O
operations we need to issue to keep all the spindles busy.
