Re: Prereading using posix_fadvise (was Re: Commitfest patches)

From: "Zeugswetter Andreas OSB SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Prereading using posix_fadvise (was Re: Commitfest patches)
Date: 2008-03-28 15:13:38
Message-ID: E1539E0ED7043848906A8FF995BDA57902EB1583@m0143.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Heikki wrote:
> It seems that the worst case for this patch is a scan on a table that
> doesn't fit in shared_buffers, but is fully cached in the OS cache. In

> that case, the posix_fadvise calls would be a certain waste of time.

I think this is a misunderstanding, the fadvise is not issued to read
the
whole table and is not issued for table scans at all (and if it were it
would
only advise for the next N pages).

So it has nothing to do with table size. The fadvise calls need to be
(and are)
limited by what can be used in the near future, and not for the whole
statement.

e.g. N next level index pages that are relevant, or N relevant heap
pages one
index leaf page points at. Maybe in the index case N does not need to be
limited,
since we have a natural limit on how many pointers fit on one page.

In general I think separate reader processes (or threads :-) that
directly read
into the bufferpool would be a more portable and efficient
implementation.
E.g. it could use ScatterGather IO. So I think we should look, that the
fadvise
solution is not obstruing that path, but I think it does not.

Gregory wrote:
>> A more invasive form of this patch would be to assign and pin a
buffer when
>> the preread is done. That would men subsequently we would have a
pinned buffer
>> ready to go and not need to go back to the buffer manager a second
time. We
>> would instead just "complete" the i/o by issuing a normal read call.

I guess you would rather need to mark the buffer for use for this page,
but let any backend that needs it first, pin it and issue the read.
I think the fadviser should not pin it in advance, since he cannot
guarantee to
actually read the page [soon]. Rather remember the buffer and later
check and pin
it for the read. Else you might be blocking the buffer.
But I think doing something like this might be good since it avoids
issuing duplicate
fadvises.

Andreas

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2008-03-28 15:16:08 Re: advancing snapshot's xmin
Previous Message Tom Lane 2008-03-28 15:10:03 Re: segfault in locking code