Tom Lane wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
> > Tom argued that following the tuple chain is cheap enough, and might
> > even be cheaper than what we have now, that we don't need to prune just
> > for the purpose of keeping the chains short. To which I pointed out that
> > currently, without HOT, we mark index tuples pointing to dead tuples as
> > killed to avoid following them in the future, so HOT without pruning is
> > not cheaper than what we have now.
> That hack only works in plain indexscans, though, not bitmapped scans.
> Anyway, I remain unconvinced that the chains would normally get very
> long in the first place, if we could prune when updating.
> The we-already-pinned-the-page problem is a bit nasty but may not be
As I understand it, there are two HOT features:
Single-chain pruning, which trims HOT chains but doesn't reuse the freed space.
Defragmentation, which prunes the entire page, reuses space,
and handles deleted rows, etc.
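To make the difference concrete, here is a toy model in C. ToyPage, toy_prune_chain() and toy_defragment() are hypothetical names, and the page is reduced to a few counters. It only encodes the distinction as I described it above, not the patch's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one heap page: only byte counts and the length of a
 * single HOT chain are tracked, not PostgreSQL's real page layout. */
typedef struct ToyPage
{
    size_t free_bytes;  /* space a new tuple can use right now */
    size_t dead_bytes;  /* space still held by dead tuple versions */
    int    chain_len;   /* versions in the one HOT chain modeled here */
} ToyPage;

/* Single-chain pruning: the chain is cut back to its newest version,
 * so later lookups are shorter, but the dead versions' bytes are not
 * handed back, so free_bytes does not grow. */
static void
toy_prune_chain(ToyPage *page)
{
    assert(page->chain_len >= 1);   /* toy assumes a non-empty chain */
    page->chain_len = 1;
}

/* Defragmentation: the whole page is pruned and compacted, so the
 * space held by dead versions becomes usable free space. */
static void
toy_defragment(ToyPage *page)
{
    assert(page->chain_len >= 1);
    page->chain_len = 1;
    page->free_bytes += page->dead_bytes;
    page->dead_bytes = 0;
}
```

Only the second operation changes what a new row version can use.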
Defragmentation is the HOT feature we really want. Single-chain
pruning is nice because it speeds things up, but it isn't
necessary. The fact that the entire chain is on the same page makes me
think that we could just leave single-chain pruning for 8.4 if
I think that keeping the chain on the same page more often, via
defragmentation, and having a single index entry for the whole chain is
going to be a win, and that the lack of marked-dead index entries for
parts of the chain isn't going to be a problem. My guess is that
marked-dead index entries were a win only when a chain spanned several
pages, which is never the case for HOT chains.
FYI, I saw this comment in the patch:
+ * If the free space left in the page is less than the average FSM
+ * request size (or a percentage of it), prune all the tuples or
+ * tuple chains in the page. Since the operation requires exclusive
+ * access to the page and needs to be WAL logged, we want to do as
+ * much as possible. At the same time, since the function may be
+ * called from a critical path, we want it to be as fast as
+ * possible.
+ * Disregard the free space if PAGE_PRUNE_DEFRAG_FORCE option is set.
+ * XXX The value of 120% is an ad-hoc choice and we may want to
+ * tweak it if required:
+ * XXX The average request size for a relation is currently
+ * initialized to a small value such as 256. So for a table with
+ * large size tuples, during initial few UPDATEs we may not prune
+ * a page even if the free space available is less than the new
+ * tuple size - resulting in unnecessary extension of the relation.
+ * Add a temporary hack to prune the page if the free space goes
+ * below a certain percentage of the block size (set to 12.5% here)
So this is how the system decides whether to defragment the whole page.
The defragmentation function is heap_page_prune_defrag(). Its big
downside is that it has to get a lock to survey the page, and it often
has to guess whether to act at all, because it has no idea whether free
space is actually needed on this page.
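Here is my reading of the trigger that comment describes, as a sketch. toy_should_defrag() and TOY_BLCKSZ are hypothetical names; the 120% and 12.5% figures come from the comment, but combining the three conditions with a plain OR is my assumption, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BLCKSZ 8192   /* PostgreSQL's default block size */

/* Hypothetical reading of the defragmentation trigger in the comment. */
static bool
toy_should_defrag(size_t free_space, size_t avg_fsm_request, bool force)
{
    assert(free_space <= TOY_BLCKSZ);

    /* PAGE_PRUNE_DEFRAG_FORCE: ignore the free space entirely */
    if (force)
        return true;

    /* free space below 120% of the average FSM request size;
     * integer arithmetic: free < 1.2 * avg  <=>  free * 10 < avg * 12 */
    if (free_space * 10 < avg_fsm_request * 12)
        return true;

    /* the "temporary hack": less than 12.5% (1/8) of the block is free */
    if (free_space < TOY_BLCKSZ / 8)
        return true;

    return false;
}
```

With the average request size at its initial 256 bytes, a page with 1000 bytes free does not trip the 120% rule (1000 is well above 307), but it does fall under the 12.5% floor of 1024 bytes. That is the large-tuple case the hack is meant to catch.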
In summary, I feel we have the HOT mechanics down well, but the open
issue is _when_ to activate each operation.
(Can someone measure the time it takes to follow a chain that fills an
entire page (the worst case) vs. reading a single tuple on the page?)
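For whoever runs that test, this is roughly the work being compared, in a toy model: a chain is a linked list of item indexes on one page, and a lookup walks it until it finds a visible version. ToyTuple and these functions are hypothetical, not the real heap code, so they only show that the cost grows with chain length while staying on a single page; a real measurement has to use the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* One item on a toy page; a HOT chain links items on the same page. */
typedef struct ToyTuple
{
    int  next;     /* index of the next (newer) version, or -1 */
    bool visible;  /* visible to the scanning snapshot? */
} ToyTuple;

/* Worst case from the question: n chained versions, only the newest visible. */
static void
toy_build_chain(ToyTuple *items, int n)
{
    for (int i = 0; i < n; i++)
    {
        items[i].next = (i + 1 < n) ? i + 1 : -1;
        items[i].visible = (i == n - 1);
    }
}

/* Walk from the root item to the first visible version. Returns its index,
 * or -1 if none is visible; *hops counts the items examined. */
static int
toy_follow_chain(const ToyTuple *items, int nitems, int root, int *hops)
{
    int cur = root;

    *hops = 0;
    while (cur != -1)
    {
        assert(cur >= 0 && cur < nitems);
        (*hops)++;
        if (items[cur].visible)
            return cur;
        cur = items[cur].next;
    }
    return -1;
}
```

Wrapping many calls to toy_follow_chain() in clock_gettime() would give a rough per-hop cost, but only for this toy.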
In an ideal world, we would prune single chains only when they were long
enough to hurt performance, and would defragment only when a new row
would not fit on the page. Other than those two cases, we don't care
how much dead space there is on a page.
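That ideal policy is simple enough to write down. Both functions are hypothetical, and TOY_MAX_CHAIN_LEN is an invented placeholder for whatever threshold the timing test above would justify. Note that the amount of dead space is not an input to either decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placeholder: chain length at which walking starts to hurt. */
#define TOY_MAX_CHAIN_LEN 8

/* Prune a single chain only once it is long enough to cost something. */
static bool
toy_ideal_prune_chain(int chain_len)
{
    assert(chain_len >= 1);
    return chain_len > TOY_MAX_CHAIN_LEN;
}

/* Defragment only when the incoming row version does not fit. */
static bool
toy_ideal_defrag(size_t free_bytes, size_t new_tuple_bytes)
{
    return new_tuple_bytes > free_bytes;
}
```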
However, there are two complications. First, we can't be sure we
can defragment when we need to, because we might not get the lock.
Second, we only try to put a row on a page when we are updating a row
on that page. If the page is 90% dead but no rows on it are being
updated, no one will try to add a row to it, because the FSM thinks it
is full. That might be OK; it might not.
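A sketch of how an UPDATE would place the new row version given those two complications. All names are hypothetical, and whether the lock was obtained is passed in as a plain boolean rather than modeled. toy_fsm_would_offer() shows why a mostly-dead page is never offered to other inserts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
    TOY_USE_PAGE,         /* fits already: put the new version here */
    TOY_DEFRAG_THEN_USE,  /* got the lock and dead space covers the gap */
    TOY_GO_ELSEWHERE      /* fall back to another page */
} ToyPlacement;

/* Placement of a new row version created by an UPDATE of a row on this page. */
static ToyPlacement
toy_place_update(size_t free_bytes, size_t dead_bytes,
                 size_t new_tuple_bytes, bool got_cleanup_lock)
{
    assert(new_tuple_bytes > 0);
    if (new_tuple_bytes <= free_bytes)
        return TOY_USE_PAGE;
    /* Complication 1: defragmenting needs a lock we may not get. */
    if (got_cleanup_lock && new_tuple_bytes <= free_bytes + dead_bytes)
        return TOY_DEFRAG_THEN_USE;
    return TOY_GO_ELSEWHERE;
}

/* Complication 2: the FSM only knows about free_bytes, so dead space
 * on a page nobody updates is invisible to other inserters. */
static bool
toy_fsm_would_offer(size_t free_bytes, size_t request_bytes)
{
    return request_bytes <= free_bytes;
}
```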
Another issue: my guess is that it will take 2-3 weeks to get HOT
applied, which means we won't go to beta before October 1.
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us