I think the most important points are still that:
1. The WAL write should be happening asynchronously (if that is possible)
2. There should be an option do not perform these compactions if the page is only touched by reads.
(Assuming that when most of the databaseresides in the cache these optimizations are less important.)
----- Original Message -----
From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: lars <lhofhansl(at)yahoo(dot)com>; Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>; Ivan Voras <ivoras(at)freebsd(dot)org>; pgsql-performance(at)postgresql(dot)org
Sent: Wednesday, July 27, 2011 7:15 AM
Subject: Re: [PERFORM] UPDATEDs slowing SELECTs in a fully cached database
On Wed, Jul 13, 2011 at 10:52 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
There are couple of other (very important) things that HOT does, but
probably its not advertised a lot. Even for non-HOT updates (which
means either indexed columns were changed or page ran out of free
space) or deletes, HOT prunes those tuples and instead mark the line
pointer as DEAD. The page is defragmented and dead space is recovered.
Each such dead tuple now only consumes two bytes in the page until
vacuum removes the dead line pointers. Thats the reason why OP is
seeing the behavior even when index columns are being updated.
We made a few adjustments to ensure that a page is not pruned too
early. So we track the oldest XID that did any updates/deletes to the
page and attempt pruning only when the RecentXmin is past the XID. We
also mark the page as "full" if some previous update did not find
enough free space to do in-block update and use that hint to decide if
we should attempt to prune the page. Finally, we prune only if we get
the cleanup lock without blocking.
What might be worth looking at this condition in pruneheap.c:
* We prune when a previous UPDATE failed to find enough space on the page
* for a new tuple version, or when free space falls below the relation's
* fill-factor target (but not less than 10%).
* Checking free space here is questionable since we aren't holding any
* lock on the buffer; in the worst case we could get a bogus answer. It's
* unlikely to be *seriously* wrong, though, since reading either pd_lower
* or pd_upper is probably atomic. Avoiding taking a lock seems more
* important than sometimes getting a wrong answer in what is after all
* just a heuristic estimate.
minfree = RelationGetTargetPageFreeSpace(relation,
minfree = Max(minfree, BLCKSZ / 10);
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
So if the free space in a page falls below the fill-factor or 10% of
the block size, we would try to prune the page. We probably need to
revisit this area and see if we need to tune HOT ever better. One
option could be to see how much space we are going to free and carry
out the operation only if its significant enough to justify the cost.
I know we had done several benchmarking tests while HOT development,
but the tuning mechanism still may not be perfect for all kinds of
work loads and it would probably never be.
In response to
pgsql-performance by date
|Next:||From: Tom Lane||Date: 2011-07-29 17:40:27|
|Subject: Re: Performance penalty when using WITH |
|Previous:||From: Kevin Grittner||Date: 2011-07-29 15:25:38|
|Subject: Re: [PERFORM] Queries related to checkpoints|