Re: Free-space-map management thoughts

From: Stephen Marshall <smarshall(at)wsi(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org, Stephen Marshall <smarshall(at)wsicorp(dot)com>
Subject: Re: Free-space-map management thoughts
Date: 2003-02-27 13:49:15
Message-ID: 3E5E175B.2030404@wsi.com
Lists: pgsql-hackers

Tom,

I'm happy to see your attention turning back to the FSM. I like the
design, but I do have a few suggestions, particularly about how to
handle oversubscription of the FSM.

1. When the FSM is oversubscribed and one is trying to decide which
pages to keep, remember that page info is stored in groups of
CHUNK_SIZE pages, where CHUNK_SIZE is currently 32. Thus, if you need to
store info for 1 page, you have already committed space in the FSM for
CHUNK_SIZE pages, so you might as well fill that chunk with valid page
information.

Such logic is needed not just for optimization, but also to prevent the
oversubscription logic from trying to use more chunks than the FSM has.
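
To illustrate the kind of rounding I have in mind (just a sketch; CHUNK_SIZE
here stands in for whatever the real constant is called):

    #define CHUNK_SIZE 32    /* pages of info per allocated chunk (assumed) */

    /*
     * Round a requested page count up to a whole number of chunks.  Keeping
     * info for even 1 page costs a full chunk, so we may as well keep
     * CHUNK_SIZE pages' worth.  The caller must still ensure the total
     * number of chunks handed out never exceeds what the FSM actually has.
     */
    static int
    pages_to_keep(int wanted_pages)
    {
        int     nchunks = (wanted_pages + CHUNK_SIZE - 1) / CHUNK_SIZE;

        return nchunks * CHUNK_SIZE;
    }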

2. The histogram concept is a neat idea, but I think some reorganization
of the page information might make it unnecessary. Currently the FSM's
page entries are sorted by BlockNumber. That ordering was particularly
useful for adding information about a single page, but since that
interface is no longer to be supported, perhaps the decision to sort by
BlockNumber should also be revisited.

If we sort the page info by available space, we could then use binary
search to find space thresholds when we are handling oversubscription.
I think this would be both faster and more exact than the histogram
approach.
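
For example, with the entries kept in descending order of available space,
a binary search like the one below tells us how many stored pages can
satisfy a given request. The struct and function names are purely
illustrative, not the real ones:

    typedef struct
    {
        BlockNumber blkno;      /* heap block number */
        Size        avail;      /* free bytes on that page */
    } FSMPageEntry;             /* placeholder name, not the real struct */

    /*
     * entries[] is sorted by avail, largest first.  Return the number of
     * entries with avail >= need, i.e. the index of the first entry that
     * is too small.  Under a block-number ordering this would take a
     * full scan.
     */
    static int
    count_pages_with_space(const FSMPageEntry *entries, int nentries, Size need)
    {
        int     lo = 0;
        int     hi = nentries;

        while (lo < hi)
        {
            int     mid = (lo + hi) / 2;

            if (entries[mid].avail >= need)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;
    }

Handling oversubscription then reduces to keeping the first target_count
entries of that same ordering; the effective space threshold is simply
entries[target_count - 1].avail.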

Sorting by available space would make the smgr code a bit less
efficient, as we would not be able to use binary search to skip to the
min block number provided in MultiRecordFreeSpace. However, lazy
vacuuming would be more efficient: it starts with pages ordered by
available space, and currently sorts the page info by block number just
before calling MultiRecordFreeSpace.

Am I missing something that requires the FSM to be ordered by block number?

Yours,
Steve Marshall
-----
Tom Lane wrote:

>I've been thinking about improving the algorithm that the free space map
>(FSM) uses to decide what to store when it's not got enough shared
>memory to keep track of everything. The present design uses a dynamically
>adjusted threshold for each relation, throwing away pages whose free
>space amount is less than the threshold. This unfortunately seems not
>to work as well as I'd hoped when I wrote it :-(. In particular it
>cannot cope effectively with situations where many pages have exactly
>the same amount of free space --- it degenerates to all-or-nothing.
>This problem has become acute now that btree indexes use the FSM to
>keep track of free pages: by definition, all those pages have exactly
>the same amount of free space.
>
>I had some off-list discussions last fall with Steve Marshall, who
>was trying to improve the thresholding algorithm to work better, but
>what he ended up with seemed to me to have a lot of ad-hoc logic and
>magic numbers in it. So I wasn't real satisfied with that.
>
>This is a request for comments about the following redesign:
>
>1. Continue to keep track of the average request size seen by
>GetPageWithFreeSpace(), but make it a pure request-size average; don't
>muck it up with thresholding adjustments. Add an entry point to make
>the average request size for a relation available to callers. VACUUM
>can use this as a starting point for its internal decisions about which
>pages are even worth reporting to the FSM.
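
To make sure I read step 1 correctly, here is the sort of pure running
average I picture; the struct and field names below are my own invention,
not a claim about the current code:

    /* hypothetical per-relation FSM bookkeeping */
    typedef struct
    {
        Size        sumRequests;    /* sum of request sizes seen */
        int         numRequests;    /* number of requests seen */
    } FSMRelStats;

    static void
    note_request_size(FSMRelStats *stats, Size requested)
    {
        stats->sumRequests += requested;
        stats->numRequests++;
    }

    /* what VACUUM would call to get its starting point */
    static Size
    get_avg_request_size(const FSMRelStats *stats)
    {
        if (stats->numRequests == 0)
            return BLCKSZ / 32;     /* arbitrary default when we know nothing */
        return stats->sumRequests / stats->numRequests;
    }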
>
>2. Eliminate "retail" addition of FSM entries. RecordFreeSpace() isn't
>being used anyway, and we can restrict RecordAndGetPageWithFreeSpace()
>to only update existing entries, not make new ones. Only wholesale
>replacement of a relation's FSM data, via MultiRecordFreeSpace(), need
>be supported as a way of adding page entries to FSM.
>
>3. With the above two changes, the numbers of pages passed to the FSM
>by MultiRecordFreeSpace() calls become useful statistics in themselves.
>We can keep track of those numbers in the FSM's per-relation statistics
>independently of the number of pages actually stored in FSM.
>
>4. In particular, the sum of the MultiRecordFreeSpace (henceforth MRFS)
>page counts gives us the total number of pages we would *like* to keep
>track of; the ratio of this number to the actual allocated max_fsm_pages
>is our "oversubscription ratio". (One thing we can do in passing is
>make VACUUM VERBOSE print these numbers, so that people finally have
>some intelligent way of adjusting their FSM parameters.)
>
>5. When FSM space is oversubscribed, we can divide each relation's MRFS
>requested page count by the oversubscription ratio to arrive at an exact
>target page count for each relation. This page count is stable as long
>as the space-allocation picture isn't changing much, which is a big
>improvement over the existing inherently history-dependent thresholding
>algorithm.
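
Checking my understanding of the arithmetic in steps 4 and 5 with made-up
numbers: if max_fsm_pages = 10000 and the MRFS requests sum to 25000 pages,
the oversubscription ratio is 2.5, so a relation that asked to remember
1000 pages gets a target of 400. In code, something like this (names
invented for illustration):

    static int
    target_pages_for_rel(double rel_requested, double total_requested,
                         double max_fsm_pages)
    {
        double  oversub = total_requested / max_fsm_pages;

        if (oversub <= 1.0)
            return (int) rel_requested;         /* not oversubscribed */
        return (int) (rel_requested / oversub);
    }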
>
>6. A reasonably effective way to reduce a relation's stored page list
>to the target (or select out the pages to actually remember from an MRFS
>request) is as follows:
> * Prescan the page data to compute a histogram of available-space
> values, with maybe 32 bins.
> * Find the histogram bin whose inclusion would make us exceed the target
> page count. Set thresholdL = its lower edge value, thresholdU = its
> upper edge value, and bincount = target page count minus sum of counts
> in higher histogram bins.
> * Scan pages. Keep all those >= thresholdU; drop those < thresholdL;
> of the pages between the thresholds, keep the first bincount entries.
> This approach will drop some pages with more free space while keeping
> some with less --- but the difference between those dropped and those
> kept is no more than the bin width.
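
Putting step 6 into code as I understand it, to make sure we are picturing
the same thing. All names are invented, avail[] is the free space per
candidate page, keep[] is the output, and this is not meant as a patch:

    #define HISTO_BINS 32

    static void
    select_pages_by_histogram(const Size *avail, bool *keep,
                              int npages, int target)
    {
        int     counts[HISTO_BINS] = {0};
        int     binwidth = BLCKSZ / HISTO_BINS;
        int     above = 0;          /* pages counted in bins above the cutoff */
        int     bin;
        int     bincount;
        Size    thresholdL;
        Size    thresholdU;
        int     i;

        /* prescan: histogram of available-space values */
        for (i = 0; i < npages; i++)
        {
            int     b = avail[i] / binwidth;

            if (b >= HISTO_BINS)
                b = HISTO_BINS - 1;
            counts[b]++;
        }

        /* find the bin whose inclusion would make us exceed the target */
        for (bin = HISTO_BINS - 1; bin > 0; bin--)
        {
            if (above + counts[bin] > target)
                break;
            above += counts[bin];
        }
        thresholdL = (Size) bin * binwidth;
        thresholdU = (Size) (bin + 1) * binwidth;
        bincount = target - above;  /* in-between pages we may still keep */

        /* keep all >= thresholdU, drop all < thresholdL,
         * keep the first bincount of the pages in between */
        for (i = 0; i < npages; i++)
        {
            if (avail[i] >= thresholdU)
                keep[i] = true;
            else if (avail[i] < thresholdL)
                keep[i] = false;
            else if (bincount > 0)
            {
                keep[i] = true;
                bincount--;
            }
            else
                keep[i] = false;
        }
    }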
>
>This algorithm still requires some heuristics around the edges --- in
>particular, in step 5 we probably don't want a purely linear division of
>available space, but should allow some minimum number of pages for each
>relation before divvying up the leftover space. But it doesn't seem to
>need nearly as much ad-hoc tuning as the thresholding approach does.
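
One concrete shape that minimum-allocation heuristic could take, just to
make the idea explicit (the constant and all names are invented): give
every relation a small floor, then split whatever remains in proportion
to the requests.

    #define FSM_MIN_PAGES_PER_REL 16    /* invented floor, not a proposal */

    static int
    rel_target_with_floor(int requested, int total_requested,
                          int nrels, int max_fsm_pages)
    {
        int     base = (requested < FSM_MIN_PAGES_PER_REL) ?
                            requested : FSM_MIN_PAGES_PER_REL;
        int     leftover = max_fsm_pages - nrels * FSM_MIN_PAGES_PER_REL;
        int     target;

        if (leftover <= 0 || total_requested <= 0)
            return base;                /* nothing beyond the floor to divide */

        /* proportional share of the leftover, capped at the request */
        target = base + (int) ((double) leftover * requested / total_requested);
        return (target > requested) ? requested : target;
    }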
>
>One thing this does not do that the original code tries to do is give
>preference to more-heavily-used relations --- in this approach, each
>table's share of FSM is determined only by its number of pages with
>free space. However, one could argue that that is an indirect measure
>of activity, since it certainly reflects past deletion/update activity.
>So it may be okay not to give any explicit preference to active tables.
>
>Comments anyone?
>
> regards, tom lane
>
>PS: Another idea I'm toying with is to dump out the FSM contents at
>postmaster shutdown and reload them at restart, so that the FSM doesn't
>have to start from ground zero on every restart cycle. But that's
>independent of the management algorithm...
>
>
