Re: Re: Anyone have experience benchmarking very high effective_io_concurrency on NVME's?

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>, Chris Travers <chris(dot)travers(at)adjust(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Anyone have experience benchmarking very high effective_io_concurrency on NVME's?
Date: 2017-10-31 17:47:06
Message-ID: cee22456-614b-9654-9082-cf64f0f569f5@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 10/31/2017 04:48 PM, Greg Stark wrote:
> On 31 October 2017 at 07:05, Chris Travers
> <chris(dot)travers(at)adjust(dot)com> wrote:
>> Hi;
>>
>> After Andres's excellent talk at PGConf we tried benchmarking
>> effective_io_concurrency on some of our servers and found that those
>> which have a number of NVME storage volumes could not fill the I/O
>> queue even at the maximum setting (1000).
>
> And was the system still i/o bound? If the cpu was 100% busy then
> perhaps Postgres just can't keep up with the I/O system. It would
> depend on workload though, if you start many very large sequential
> scans you may be able to push the i/o system harder.
>
> Keep in mind effective_io_concurrency only really affects bitmap
> index scans (and to a small degree index scans). It works by issuing
> posix_fadvise() calls for upcoming buffers one by one. That gets
> multiple spindles active but it's not really going to scale to many
> thousands of prefetches (and effective_io_concurrency of 1000
> actually means 7485 prefetches). At some point those i/o are going
> to start completing before Postgres even has a chance to start
> processing the data.
>
Yeah, initiating the prefetches is not expensive, but it's not free
either. So there's a trade-off between time spent on prefetching and
processing the data.

I believe this can actually be illustrated using Amdahl's law - the I/O
is the parallel part, and processing the data is the serial part. And no
matter what you do, the device only has so much bandwidth, which caps the
maximum possible speedup (compared to the "no prefetch" case).
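
As a rough illustration (with made-up numbers): if 20% of the scan is
spent processing the data (serial) and 80% is I/O that prefetching can
overlap (parallel), then

    speedup <= 1 / (0.2 + 0.8/N)

which approaches 5x as you keep more requests (N) in flight - and no
effective_io_concurrency setting gets you past that cap.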

Furthermore, the device does not wait for all the I/O requests to be
submitted - it won't wait for 1000 requests and then go "OMG! There's a
lot of work to do!" It starts processing the requests as they arrive,
and some of them will complete before you're done with submitting the
rest, so you'll never see all the requests in the queue at once.

And of course, iostat and other tools only give you "average queue
length", which is mostly determined by the average throughput.

In my experience (on all types of storage, including SSDs and NVMe),
performance improves quickly and significantly once you start increasing
the value (say, to 8 or 16, maybe 64). Then the gains become much more
modest - not because the device could not handle more, but because the
prefetch/processing ratio has reached its optimal value.

But all this is per-process; if you run multiple backends (particularly
ones doing bitmap index scans), I'm sure you'll see the queues getting
fuller.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
