Re: Improve lseek scalability v3

From: Andres Freund <andres(at)anarazel(dot)de>
To: Benjamin LaHaise <bcrl(at)kvack(dot)org>
Cc: Matthew Wilcox <matthew(at)wil(dot)cx>, Andi Kleen <andi(at)firstfloor(dot)org>, viro(at)zeniv(dot)linux(dot)org(dot)uk, linux-fsdevel(at)vger(dot)kernel(dot)org, linux-kernel(at)vger(dot)kernel(dot)org, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improve lseek scalability v3
Date: 2011-09-16 21:02:38
Message-ID: 201109162302.38780.andres@anarazel.de
Lists: pgsql-hackers

On Friday, September 16, 2011 10:08:17 PM Benjamin LaHaise wrote:
> On Fri, Sep 16, 2011 at 07:27:33PM +0200, Andres Freund wrote:
> > many tuples does the table have. Those statistics are only updated every
> > now and then though.
> > So it uses those old stats to check how many tuples are normally stored
> > on a page and then uses that to extrapolate the number of tuples from
> > the current nr of pages (which is computed by lseek(SEEK_END) over the
> > 1GB segments of a table).
> >
> > I am not sure how interested you are in the relevant postgres internals?
>
> For such tables, can't Postgres track the size of the file internally? I'm
> assuming it's keeping file descriptors open on the tables it manages, in
> which case when it writes to a file to extend it, the internally stored
> size could be updated. Not making a syscall at all would scale far better
> than even a modified lseek() will perform.

Yes, it tracks the fds internally. The problem is that postgres is process
based, so those tables are not reachable by all processes. It could start
tracking the sizes in shared memory, but the synchronization overhead for that
would likely be more expensive than the syscall overhead. (Given that the
fd sets are possibly, and realistically, disjoint between the individual
backends, you would have to reserve enough shared memory for a fully separate
set of fds for each process, which would complicate efficient lookup.)

Also with fstat() instead of lseek() there was no bottleneck anymore, so I
don't think the benefits would warrant that.
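
To make the estimate discussed above concrete, here is a minimal sketch in C.
It is not the actual postgres code: the function names are made up for
illustration, and BLCKSZ is assumed to be 8192 (the default block size). It
shows the two ways of getting a segment's size in pages, lseek(SEEK_END) and
fstat(), and the extrapolation of a tuple count from old statistics.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Assumed page size for this sketch (the default postgres block size). */
#define BLCKSZ 8192

/* Hypothetical helper: pages in one segment file, found by seeking to the
 * end. Note that this moves the file offset. Returns -1 on error. */
long pages_via_lseek(int fd)
{
    off_t end = lseek(fd, 0, SEEK_END);

    if (end < 0)
        return -1;
    return (long) (end / BLCKSZ);
}

/* Hypothetical helper: the same number obtained with fstat(), which does
 * not touch the file offset. Returns -1 on error. */
long pages_via_fstat(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return (long) (st.st_size / BLCKSZ);
}

/* Extrapolate the current tuple count from the tuples-per-page density
 * recorded when the statistics were last updated. */
double estimate_tuples(double old_tuples, long old_pages, long cur_pages)
{
    if (old_pages <= 0)
        return 0.0;     /* no usable density in the old stats */
    return (old_tuples / (double) old_pages) * (double) cur_pages;
}
```

For a table spanning several 1GB segments, the per-segment page counts would
be summed before calling estimate_tuples().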

Greetings,

Andres
