Set hint bits upon eviction from BufMgr

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Set hint bits upon eviction from BufMgr
Date: 2011-03-25 14:52:29
Lists: pgsql-hackers

Maybe I'm being overly simplistic or incorrect here, but I was
thinking that there might be a route to reducing hint bit impact on
the main sufferers of the feature without adding too much pain in the
general case. I'm unfortunately convinced there is no getting rid of
them -- in fact their utility will become even more apparent as
storage gets faster and the pendulum of optimization swings back to
the CPU.

My idea is to reserve a bit in the page header, say PD_ALL_SAME_XMIN,
that indicates all the tuples are from the same transaction. Set it
when the first inserted tuple hits the page, and unset it when a tuple
with another xmin is added or any tuple is touched or deleted. The
point here is to set up a cheap check at the page level that we can
make when a page is getting evicted from the bufmgr. If the bit is
set, we grab the xmin of the first tuple on the page and test it for
visibility (assuming the hint bit is not already set). If we get a
thumbs up on the transaction, we can walk the page and set all tuple
hints as part of the page evict/sync process. We don't worry about
logging/crash safety on the 'all same' hint because it's only
interesting to this bufmgr check (it can even be cleared when page is

Without this bit, the only way to set hint bits during bufmgr
eviction is to do a visibility check on every tuple, which would
probably be prohibitively expensive. Since OLTP environments would
rarely see this bit, they would not have to pay for the check.
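
To make the idea concrete, here is a rough sketch in C of the flag
maintenance and the eviction-time check described above. It is a toy
model, not PostgreSQL code: ToyPage, ToyTuple, HINT_XMIN_COMMITTED,
the toy_* functions, and the rule that only xid 100 has committed are
all made up for illustration. Only the PD_ALL_SAME_XMIN name comes
from the proposal, and its bit value here is arbitrary. Locking and
WAL are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;

#define PD_ALL_SAME_XMIN     0x0001  /* proposed page flag; value is arbitrary */
#define HINT_XMIN_COMMITTED  0x0001  /* toy per-tuple hint bit (hypothetical) */
#define TOY_MAX_TUPLES       8

typedef struct ToyTuple { Xid xmin; uint16_t hints; } ToyTuple;

typedef struct ToyPage {
    uint16_t flags;
    size_t   ntuples;
    ToyTuple tuples[TOY_MAX_TUPLES];
} ToyPage;

/* Callback answering "did this transaction commit?" */
typedef bool (*ToyXidCommittedFn)(Xid xid);

/* Stand-in for a real commit-status lookup: pretend only xid 100 committed. */
bool toy_xid_is_committed(Xid xid) { return xid == 100; }

/* Insert a tuple: the first insert sets the flag, a different xmin clears it. */
bool toy_page_insert(ToyPage *page, Xid xmin)
{
    assert(page != NULL);
    if (page->ntuples >= TOY_MAX_TUPLES)
        return false;
    if (page->ntuples == 0)
        page->flags |= PD_ALL_SAME_XMIN;
    else if (page->tuples[0].xmin != xmin)
        page->flags &= (uint16_t) ~PD_ALL_SAME_XMIN;
    page->tuples[page->ntuples].xmin = xmin;
    page->tuples[page->ntuples].hints = 0;
    page->ntuples++;
    return true;
}

/* Any update or delete on the page simply drops the flag. */
void toy_page_touch(ToyPage *page)
{
    assert(page != NULL);
    page->flags &= (uint16_t) ~PD_ALL_SAME_XMIN;
}

/*
 * Eviction-time check: one flag test and one commit lookup per page.
 * Returns the number of tuples whose hint bit was set.
 */
size_t toy_set_hints_on_evict(ToyPage *page, ToyXidCommittedFn committed)
{
    assert(page != NULL && committed != NULL);
    if (!(page->flags & PD_ALL_SAME_XMIN) || page->ntuples == 0)
        return 0;                       /* mixed page: skip, no per-tuple cost */
    if (page->tuples[0].hints & HINT_XMIN_COMMITTED)
        return 0;                       /* first tuple already hinted */
    if (!committed(page->tuples[0].xmin))
        return 0;                       /* transaction not known committed */
    for (size_t i = 0; i < page->ntuples; i++)
        page->tuples[i].hints |= HINT_XMIN_COMMITTED;
    return page->ntuples;
}
```

A buffer manager would call something like toy_set_hints_on_evict()
just before writing a dirty page out. On a page with mixed xmins the
call costs a single flag test, which is the reason OLTP workloads
should not pay for it.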

Also, we can maybe tweak the bufmgr to prefer not to evict pages with
this bit set if it's known they are not yet written out to primary
storage. Maybe this is impossible or not logical...just thinking out
loud. Anyways, if this actually works, shared buffers can start to
play a role of mitigating hint bit i/o as long as the transaction
resolves before pages start jumping out into storage. If you couple
this with a facility to do bulk loads that break up transactions on
regular intervals, you have a good shot at getting all your hint bits
written out properly in large load situations.

You might be able to do similar tricks with deletes -- I haven't
thought about that. Also there might be some interplay with vacuum or
some other deal breaker -- curious to see if I have something worth
further thought here.


