The one remaining issue I'd like to address in the new FSM
implementation is the fact that the FSM is currently not updated at all
in WAL recovery. The old FSM wasn't updated on WAL recovery either, and
was in fact completely thrown away if the system wasn't shut down
cleanly. The difference is that after recovery, we used to start with no
FSM information at all, and all inserts would have to extend the
relations until the next vacuum, while now the inserts use the old data
in the FSM. In case of a PITR recovery or warm stand-by, the FSM
information would come from the last base backup, which could be *very* old.
The first inserter after the recovery might have to visit a lot of pages
that the FSM claimed had free space, but didn't in reality, before
finding a suitable target. In the absolutely worst case, where the table
was almost empty when the base backup was taken, but is now full, it
might have to visit every single heap page. That's not good.

So we should try to update the FSM during recovery as well. It doesn't
need to be very accurate, as the FSM information isn't accurate anyway,
but we should try to avoid the worst case scenarios.

The attached patch is my first attempt at that. Arbitrarily, if after a
heap insert/update there's less than 20% of free space on the page, the
FSM is updated. Compared to updating it every time, that saves a lot of
overhead, while doing a pretty good job at marking full pages as full in
the FSM. My first thought was to update the FSM if there isn't enough
room on the page for a new tuple of the same size as the one just
inserted; that would be pretty close to the logic we have during normal
operation, where the FSM is updated when the tuple that we're about to
insert doesn't fit on the page. But because we don't know the fillfactor
during recovery, I don't think we can do that reliably.
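
To make the 20% rule concrete, here is a rough sketch in C. It is not the code from the patch: the names sketch_should_update_fsm, SKETCH_PAGE_SIZE and SKETCH_FSM_THRESHOLD_PCT are made up for illustration, and the 8192-byte page is just the default block size assumed here. It only shows the decision itself, i.e. whether redo of a heap insert/update would bother to touch the FSM for the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants for illustration; not taken from the patch. */
#define SKETCH_PAGE_SIZE          8192  /* assumed default block size */
#define SKETCH_FSM_THRESHOLD_PCT  20    /* the arbitrary 20% cutoff */

/*
 * Decide whether the FSM should be updated after replaying a heap
 * insert/update, given the free space left on the page afterwards.
 *
 * Returns true when less than 20% of the page is free, so nearly-full
 * pages get marked as full, while pages with plenty of room are left
 * alone to save overhead.
 */
bool
sketch_should_update_fsm(size_t free_bytes)
{
    assert(free_bytes <= SKETCH_PAGE_SIZE);

    /* Integer form of: free_bytes / page_size < 20 / 100 */
    return free_bytes * 100 <
           (size_t) SKETCH_PAGE_SIZE * SKETCH_FSM_THRESHOLD_PCT;
}
```

Note that the check needs nothing but the page's free space, which is why it works during recovery where the fillfactor isn't known. With the assumed 8192-byte page, the cutoff falls between 1638 and 1639 free bytes.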
One issue with this patch is that it doesn't update the FSM at all when
pages are restored from full page images. It would require fetching the
page and checking the free space on it, or peeking into the size of the
backup block data, and I'm not sure if it's worth the extra code to do that.