Re: Prereading using posix_fadvise

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Zeugswetter Andreas OSB SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Prereading using posix_fadvise
Date: 2008-03-28 16:00:18
Message-ID: 87zlsiyh2l.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:

> Right, I was sloppy. Instead of table size, what matters is the amount of data
> the scan needs to access. The point remains that if the data is already in OS
> cache, the posix_fadvise calls are a waste of time, regardless of how many
> pages ahead you advise.

I'm not sure that's really fair to the scan. A typical bitmap index scan will
probably access far fewer pages than fit in cache, and prereading them will
usually help unless you're repeatedly reading the same pages. The usual case
is successive bitmap scans touching different pages each time.
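To be concrete, the kind of thing I have in mind looks roughly like this.
This is just a sketch, not the patch's actual code; the fd, the block list,
and the BLCKSZ constant stand in for the real scan state:

/*
 * Sketch only: issue POSIX_FADV_WILLNEED hints for the blocks a bitmap
 * heap scan is about to read, so the kernel can start fetching them while
 * we're still processing the current page.
 */
#define _XOPEN_SOURCE 600       /* for posix_fadvise */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

#define BLCKSZ 8192

void
preread_blocks(int fd, const unsigned int *blocknums, int nblocks)
{
    int     i;

    for (i = 0; i < nblocks; i++)
    {
        off_t   offset = (off_t) blocknums[i] * BLCKSZ;
        int     rc;

        /*
         * Purely advisory: if the page is already in the OS cache this is
         * a no-op, which is why it's wasted effort on a fully cached table.
         */
        rc = posix_fadvise(fd, offset, BLCKSZ, POSIX_FADV_WILLNEED);
        if (rc != 0)
            fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));
    }
}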

I'm picturing something like our mail archive search. Each search retrieves
just 10 hits or so, but there's no reason to think that, just because those 10
hits would fit in effective_cache_size, they'll actually be found there.

It might make more sense to compare against the table size: if the table fits
in cache then any random bunch of its pages is likely to be cached somewhere.
Except we have no idea how much of the database this table represents. If the
user has a schema like the ones we've seen posted to the lists many times,
with hundreds of tables, then deductions based on the size of a single table
will be questionable.

I'm also leery of scaling back the prereading on systems with a large
effective_cache_size, since those are precisely the systems likely to have
RAID arrays and to benefit most from prefetching on the queries where it
applies.

I feel like we're probably stuck optimizing for the case where real I/O is
happening and hoping that the filesystem cache handles the rest efficiently.
If the entire database fits in cache then the user already has to adjust
random_page_cost, and should probably also set preread_pages to 0 (or
effective_spindle_count to 1, depending on which version of the patch you're
reading).
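
In other words, on a fully cached database you'd end up with something like
the following (parameter names depend on the patch version, and the values
here are only illustrative):

random_page_cost = 1.0            # random reads cost about the same as sequential ones
preread_pages = 0                 # earlier patch versions: disable prefetching
#effective_spindle_count = 1      # later versions: the equivalent knob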

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!
