Re: Re: Anyone have experience benchmarking very high effective_io_concurrency on NVME's?

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Chris Travers <chris(dot)travers(at)adjust(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Anyone have experience benchmarking very high effective_io_concurrency on NVME's?
Date: 2017-11-01 04:06:49
Message-ID: CAMsr+YEbWaDqL9Lj5-EGXMtFY7eN-PGRSbwCQfRBmMSdsg=GWg@mail.gmail.com
Lists: pgsql-hackers

On 1 November 2017 at 11:49, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Right. It'd probably be good to be a bit more adaptive here. But it's
> hard to do with posix_fadvise - we'd need an operation that actually
> notifies us of IO completion. If we were using, say, asynchronous
> direct IO, we could initiate the request and regularly check how many
> blocks ahead of the current window are already completed and adjust the
> queue based on that, rather than just firing off fadvises and hoping for
> the best.
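
For readers following along, the adaptive idea in the quoted text could be sketched roughly as below. This is only an illustration, not anything from PostgreSQL: the function name, the doubling/decrement policy, and the bounds are all assumptions made up for the example, and it presumes some I/O interface that can report how many of the requested blocks ahead have already completed (which posix_fadvise cannot do).

```python
def adjust_prefetch_distance(distance, completed_ahead,
                             min_distance=1, max_distance=256):
    """Return a new prefetch distance (hypothetical policy).

    distance        -- how many blocks ahead we currently keep requested
    completed_ahead -- how many of those blocks have already finished I/O
    """
    if completed_ahead == 0:
        # Nothing ahead is ready yet, so the reader is about to wait on
        # I/O: assume the queue is too shallow and double it.
        distance *= 2
    elif completed_ahead >= distance:
        # Everything requested is already done: assume the queue is at
        # least as deep as needed and back off gently by one block.
        distance -= 1
    # Otherwise some, but not all, requests are complete: leave it alone.

    # Clamp to the configured bounds.
    return max(min_distance, min(distance, max_distance))
```

A caller would invoke this periodically while scanning and use the returned value as the number of blocks to keep in flight; the thresholds here are arbitrary and would need benchmarking on real hardware.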

In case it's of interest, I did some looking into using Linux's AIO
support in Pg a while ago, when chasing some issues around fsync
retries and handling of I/O errors.

It was a pretty serious dead end; it was clear that fsync support in
AIO is not only incomplete but inconsistent across kernel versions,
let alone other platforms.

But I see your name in the relevant threads, so you know that. To save
others the time, see:

* https://lwn.net/Articles/724198/
* https://lwn.net/Articles/671649/

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
