Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Zdenek Kotala wrote:
>> Second idea is that new FSM creates heavy defragmented data and index
>> scan needs to jump from one page to another too often.
> Hmm. That's remotely plausible, I suppose. The old FSM only kept track
> of pages with more than avg. request size of free space, but the new FSM
> tracks even the smallest free spots. Is there tables in that workload
> that are inserted to, with very varying row widths?
I'm not sure I buy that either. But after thinking a bit about how
search_avail() works, it occurs to me that it's not doing what the old
code did and that might contribute to contention. The old FSM did a
cyclic search through the pages it knew about, so as long as there were
plenty of pages with "enough" free space, different backends would
always get pointed to different pages. But consider what the algorithm
is now. (For simplicity, consider only the behavior on a leaf FSM page.)
* Starting from the "next" slot, bubble up to parent nodes until finding
a parent showing enough space.
* Descend to the *leftmost* leaf child of that parent that has enough
* Point "next" to the slot after that, and return that page.
What this means is that if we start with "next" pointing at a page
without enough space (quite likely considering that we now index all
pages not only those with free space), then it is highly possible that
the search will end on a page *before* where next was. The most trivial
case is that we have an even-numbered page with a lot of free space and
its odd-numbered successor has none --- in this case, far from spreading
out the backends, all comers will be handed back that same page! (Until
someone reports that it's full.) In general it seems that this behavior
will tend to concentrate the returned pages in a small area rather than
allowing them to range over the whole FSM page as was intended.
So the bottom line is that the "next" addition doesn't actually work and
needs to be rethought. It might be possible to salvage it by paying
attention to "next" during the descent phase and preferentially trying
to descend to the right of "next"; but I'm not quite sure how to make
that work efficiently, and even less sure how to wrap around cleanly
when the starting value of "next" is near the last slot on the page.
regards, tom lane
In response to
pgsql-hackers by date
|Next:||From: Tom Lane||Date: 2008-09-20 00:02:59|
|Subject: Re: PostgreSQL future ideas |
|Previous:||From: Gevik Babakhani||Date: 2008-09-19 23:03:50|
|Subject: Re: PostgreSQL future ideas|