Re: Prereading using posix_fadvise (was Re: Commitfest patches)

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Zeugswetter Andreas OSB SD <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>, Gregory Stark <stark(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Prereading using posix_fadvise (was Re: Commitfest patches)
Date: 2008-03-28 15:41:58
Message-ID: 200803281541.m2SFfwk06208@momjian.us
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> > So it has nothing to do with table size. The fadvise calls need to be
> > (and are)
> > limited by what can be used in the near future, and not for the whole
> > statement.
>
> Right, I was sloppy. Instead of table size, what matters is the amount
> of data the scan needs to access. The point remains that if the data is
> already in OS cache, the posix_fadvise calls are a waste of time,
> regardless of how many pages ahead you advise.

I now understand what posix_fadvise() is allowing us to do.
posix_fadvise(POSIX_FADV_WILLNEED) allows us to tell the kernel we will
need a certain block in the future --- this seems much cheaper than a
background reader.
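
For illustration, a minimal sketch of such a hint for a single block. The
helper name advise_block and the 8192-byte BLOCK_SIZE are assumptions made
for this example (8192 is PostgreSQL's default block size); they are not
taken from any patch discussed in this thread.

```c
#define _XOPEN_SOURCE 600   /* expose posix_fadvise() */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

#define BLOCK_SIZE 8192     /* assumed block size for this sketch */

/*
 * Hypothetical helper: hint to the kernel that block `blockno` of the
 * file open on `fd` will be read soon.  Returns 0 on success; on failure
 * posix_fadvise() returns the error number directly (it does not set
 * errno and return -1).
 */
int advise_block(int fd, off_t blockno)
{
    assert(blockno >= 0);
    return posix_fadvise(fd, blockno * BLOCK_SIZE, BLOCK_SIZE,
                         POSIX_FADV_WILLNEED);
}
```

The call only passes a hint; the data is still fetched later with an
ordinary read, which is why nothing like a background reader process is
needed.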

We know we will need the blocks, and telling the kernel can't hurt,
except that there is overhead in making the call itself. Has anyone
measured how much overhead there is? I would be interested in a test
program that reads the same page over and over again from the kernel,
with and without a posix_fadvise() call.
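
A rough sketch of what such a test could look like. The function name
time_repeated_reads, the 8192-byte PAGE_SZ, and the use of pread() with
clock_gettime() are choices made for this example only; no measurements
are implied.

```c
#define _XOPEN_SOURCE 600   /* expose posix_fadvise(), pread(), clock_gettime() */
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

#define PAGE_SZ 8192        /* assumed page size for this sketch */

/*
 * Hypothetical benchmark helper: read page 0 of `fd` `iterations` times,
 * optionally issuing a POSIX_FADV_WILLNEED hint before each read.
 * Returns elapsed wall-clock seconds, or -1.0 if a read fails.
 */
double time_repeated_reads(int fd, long iterations, int use_fadvise)
{
    char buf[PAGE_SZ];
    struct timespec start, end;

    assert(iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++) {
        if (use_fadvise)
            (void) posix_fadvise(fd, 0, PAGE_SZ, POSIX_FADV_WILLNEED);
        if (pread(fd, buf, PAGE_SZ, 0) < 0)
            return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling it twice on the same file, once with use_fadvise = 0 and once
with use_fadvise = 1, and dividing the difference by the iteration count
would give an estimate of the per-call overhead when the page is already
cached.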

Should we consider only telling the kernel X pages ahead, meaning that,
with X = 6, when we are on page 10 we tell it about page 16?
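
One way to picture a fixed look-ahead, as a sketch only. The names
block_to_prefetch and scan_with_prefetch, the `distance` parameter (the
X above), and the 8192-byte BLOCK_SIZE are assumptions for this example
and do not describe how any actual patch does it.

```c
#define _XOPEN_SOURCE 600   /* expose posix_fadvise() and pread() */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192     /* assumed block size for this sketch */

/*
 * Block to advise while reading block `current`, or -1 if the target
 * would fall past the end of an `nblocks`-block file.
 */
long block_to_prefetch(long current, long distance, long nblocks)
{
    long target = current + distance;
    return (target < nblocks) ? target : -1;
}

/*
 * Read blocks 0..nblocks-1 in order, advising the kernel about the block
 * `distance` ahead before each read.  Returns the number of blocks read,
 * or -1 on a read error.
 */
long scan_with_prefetch(int fd, long nblocks, long distance)
{
    char buf[BLOCK_SIZE];
    long nread = 0;

    assert(distance > 0);
    for (long cur = 0; cur < nblocks; cur++) {
        long target = block_to_prefetch(cur, distance, nblocks);

        if (target >= 0)
            (void) posix_fadvise(fd, (off_t) target * BLOCK_SIZE,
                                 BLOCK_SIZE, POSIX_FADV_WILLNEED);
        if (pread(fd, buf, BLOCK_SIZE, (off_t) cur * BLOCK_SIZE) < 0)
            return -1;
        nread++;
    }
    return nread;
}
```

With distance = 6, reading block 10 advises block 16, matching the
example above. This sketch never advises the first `distance` blocks; a
real implementation would presumably prime the window before the loop.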

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +