Re: lseek/read/write overhead becomes visible at scale ..

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tobias Oberstein <tobias(dot)oberstein(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lseek/read/write overhead becomes visible at scale ..
Date: 2017-06-22 16:50:48
Message-ID: 20170622165048.lwuje4f5wqw57mke@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017-06-22 12:43:16 -0400, Robert Haas wrote:
> On Wed, Jan 25, 2017 at 2:52 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > You'll, depending on your workload, still have a lot of lseeks even if
> > we were to use pread/pwrite because we do lseek(SEEK_END) to get file
> > sizes.
>
> I'm pretty convinced that the lseek overhead that we're incurring
> right now is excessive.

No argument there.

> I mean, the Linux kernel guys fixed lseek to
> scale better more or less specifically because of PostgreSQL, which
> indicates that we're hitting it harder than most people.[1] And, more
> concretely, I've seen strace -c output where the time spent in lseek
> is far ahead of any other system call -- so if lseek overhead is
> negligible, then all of our system call overhead taken together is
> negligible, too.

That'll partially be because syscalls is where the kernel "prefers" to
switch between processes, and lseek is the most frequent one.

> Having said that, it's probably not a big percentage of our runtime
> right now -- on normal workloads, it's probably some number of tenths
> of one percent. But I'm not sure that's a good reason to ignore it.
> The more we CPU-optimize other things (say, expression evaluation!)
> the more significant the things that remain will become. And we've
> certainly made performance fixes to save far fewer cycles than we're
> talking about here[2].

Well, there's some complexity / simplicity tradeoffs as everywhere ;)

> I'm no longer very sure fixing this is a very simple thing to do,
> partly because of the use of lseek to get the file size which you note
> above, and partly because of the possibility that this may, for
> example, break read-ahead, as Tom worried about previously[3]. But I
> think dismissing this as not-really-a-problem is the wrong approach.

I suspect this'll become a larger problem once we fix a few other
issues. Right now I've a hard time measuring this, but if we'd keep
file sizes cached in shared memory, and we'd use direct IO, then we'd
potentially be able to have high enough IO throughput for this to
matter. At the moment 8kb memcpy's (instead of DMA into user buffer) is
nearly always going to dwarf the overhead of the lseek().

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan de Visser 2017-06-22 16:52:13 Re: SQL MERGE patches for PostgreSQL Versions
Previous Message Robert Haas 2017-06-22 16:50:28 Re: Misplacement of function declaration in contrib/postgres_fdw/postgres_fdw.h