Re: posix_fadvise v22

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: posix_fadvise v22
Date: 2009-01-11 20:04:17
Message-ID: 87y6xhmwxq.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> "Robert Haas" <robertmhaas(at)gmail(dot)com> writes:
>> OK, here's an update of Greg's patch with the runtime configure test
>> ripped out, some minor documentation tweaks, and a few unnecessary
>> whitespace diff hunks quashed. I think this is about ready for
>> committer review.
>
> I've started to look through this, and the only part I seriously don't
> like is the nbtsearch.c changes. I've got three complaints about that:
>
> * Doing it inside the index AMs is wrong, or at the very least forces
> us to do it over for each index AM (which the patch fails to do).

ok

> * As coded, it generates prefetch bursts that are much too large and too
> widely spaced to be effective, not to mention that they entirely
> ignore the effective_io_concurrency control knob as well as the order
> in which the pages will actually be needed. I wonder now whether
> Robert's inability to see any benefit came because he was testing
> indexscans and not bitmap scans.

Well experiments showed that it was very effective, even more so than for
bitmap scans. So much so that it nearly obliterated bitmap scans' advantage
over index scans.

I originally thought like you that all that logic was integral to the thing
but eventually came around to think the opposite. That logic is to overcome a
fundamental problem with bitmap scans -- that there's no logical group to
prefetch and a potentially unbounded stream of pages. Index scans just don't
have that problem so they don't need that extra logic.

> * It's only accidental that it's not kicking in during a bitmap
> indexscan and bollixing up the much-more-carefully-written
> nodeBitmapHeapscan prefetch logic.

ok.

> What I intend to do over the next day or so is commit the prefetch
> infrastructure and the bitmap scan prefetch logic, but I'm bouncing the
> indexscan part back for rework. I think that it should be implemented
> in or near index_getnext() and pay attention to
> effective_io_concurrency.

So to do that I would have a few questions.

a) ISTM necessary to keep a dynahash of previously prefetched pointers around
to avoid repeatedly prefetching the same pages. That could get quite large
though. Or do you think it would be fine to visit the buffer cache,
essentially using that as the hash, for every index pointer?

b) How would index_getnext keep two read pointers like bitmap heap scans? I
think this would require adding a new index AM api similar to your
tuplestore api where the caller can maintain multiple read pointers in the
scan. And I'm not sure how that would work with mark/restore.

c) I would be afraid that pushing the index scan to reach for the next index
leaf page prematurely (and not just a async prefetch, but a blocking read)
would cause extra random i/o which would slow down the critical path of
reading the current index tuples. So I think we would still want to pause
when we hit the end of the current leaf page. That would require some form
of feedback in the index am api as well.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-01-11 20:41:02 Re: posix_fadvise v22
Previous Message Devrim GÜNDÜZ 2009-01-11 20:01:57 Re: foreign_data test fails with non-C locale