Re: lseek/read/write overhead becomes visible at scale ..

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Tobias Oberstein <tobias(dot)oberstein(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: lseek/read/write overhead becomes visible at scale ..
Date:	2017-01-24 18:59:45
Message-ID:	20170124185945.zcyfs4pn65knfhq3@alap3.anarazel.de
Lists:	pgsql-hackers

On 2017-01-24 15:36:13 -0300, Alvaro Herrera wrote:
> Tobias Oberstein wrote:
>
> > I am benchmarking IOPS, and while doing so, it becomes apparent that at
> > these scales it does matter _how_ IO is done.
> >
> > The most efficient way is libaio. I get 9.7 million/sec IOPS with low CPU
> > load. Using any synchronous IO engine is slower and produces higher load.
> >
> > I do understand that switching to libaio isn't going to fly for PG
> > (completely different approach).
>
> Maybe it is possible to write a new f_smgr implementation (parallel to
> md.c) that uses libaio. There is no "seek" in that interface, at least,
> though the interface does assume that the implementation is blocking.

For it to be beneficial we'd need to redesign the IO stack above that so
much that it'd be basically unrecognizable (since we'd need to
actually use async IO for it to be beneficial). Using libaio IIRC still
requires O_DIRECT, so we'd have to take more care with ordering of
writeback etc. too - we got closer with 9.6, but we're still far away
from it. Besides that, it's also not always that clear when AIO would be
beneficial, since a lot of the synchronous IO is actually synchronous
for a reason.

Andres
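
[Editorial illustration, not part of the original message.] The subject
line refers to the cost of issuing lseek() before each read() or write().
As a minimal sketch of that distinction, assuming a POSIX system, the
Python below reads the same block two ways: with lseek() followed by
read() (two system calls), and with a single pread() that takes the
offset as an argument. The function names and BLOCK_SIZE are hypothetical
and chosen only for this example; this is not PostgreSQL code. Both
variants are still blocking, so this says nothing about the async IO
redesign discussed above.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # illustrative block size (assumption for this sketch)


def read_block_seek(fd, block_no):
    """Read one block with two system calls: lseek(), then read()."""
    os.lseek(fd, block_no * BLOCK_SIZE, os.SEEK_SET)
    return os.read(fd, BLOCK_SIZE)


def read_block_pread(fd, block_no):
    """Read one block with a single pread(); the file offset is not moved."""
    return os.pread(fd, BLOCK_SIZE, block_no * BLOCK_SIZE)


def demo():
    """Write four distinct blocks to a temp file and read block 2 both ways."""
    with tempfile.TemporaryFile() as f:
        for i in range(4):
            f.write(bytes([i]) * BLOCK_SIZE)
        f.flush()  # make sure the data is visible through the raw descriptor
        fd = f.fileno()
        via_seek = read_block_seek(fd, 2)
        via_pread = read_block_pread(fd, 2)
        return via_seek == via_pread, via_pread[:1]


if __name__ == "__main__":
    print(demo())
```

Both functions return the same bytes; the difference is only in how many
system calls are needed per block and whether the shared file offset is
modified.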

In response to

Re: lseek/read/write overhead becomes visible at scale .. at 2017-01-24 18:36:13 from Alvaro Herrera
