Tom Lane wrote:
> ... and not guaranteeing to reset theaccess pattern on failure, either.
Good catch, I thought I had that covered but apparently not.
> I think we've got to get rid of the global variable and make the access
> pattern be a parameter to StrategyGetBuffer, instead. Which in turn
> suggests a variant ReadBuffer, maybe ReadBufferWithStrategy(), at the
> next level up. This would serve directly for the heapscan case, and
> for the main heap accesses in vacuum/analyze, but it might be a bit
> of a pain to make index vacuuming pay attention to the strategy.
I thought of that as well, but settled on the global variable for that
same reason, the UnpinBuffer problem, and the fact that we'd end up with
quite many different ReadBuffer variants: ReadBuffer, ReadOrZeroBuffer,
ReleaseAndReadBufferWithStrategy (if we care about that).
Index vacuum code for all the indexams could be changed to use the
ReadBufferWithStrategy if we took that approach, though there's quite a
many of those calls in total. We could use a hybrid approach, where you
could use ReadBufferWithStrategy to give the strategy on page-by-page
basis, or use SetStrategyHintVacuum/SetAccessPattern to set it for all
> The other case that the patch addresses is COPY TO REL, which we could
> handle if we were willing to pass a strategy parameter down through
> heap_insert and RelationGetBufferForTuple ... is it worth the trouble?
> I don't recall anyone having presented measurements to show COPY TO REL
> being helped by this patch.
The theory was that it'll reduce the cache-spoiling of COPY, just like
it does for selects. I can run tests on that.
> I don't especially care for the ring data structure being global inside
> freelist.c, either. What I'm inclined to do is have the "access
> strategy parameter" actually be a pointer to some struct type, with NULL
> representing the default strategy. At the beginning of a bulk operation
> we'd create an access strategy object, which would be internally the
> current Ring data structure, but it'd be opaque to code outside
> freelist.c. This'd mean that a query using two large seqscans would use
> two rings not one, but that's not necessarily bad; it would in any case
> make it a lot easier to generalize the strategy concept in future to
> support more than one kind of non-default policy being active in
> different parts of a query.
I thought about the two large scans scenario but came to the conclusion
that it's not possible to have two large scans active at the same time
in a single backend. You could have a query like "SELECT * FROM
bigtable_1, bigtable_2" with a cartesian product, or something with a
function that scans a large table, but even then one of the scans would
progress veery slowly compared to the other scan, and using more than
one ring buffer would make no difference.
I agree on the generalization argument, though.
> A point I have not figured out how to deal with is that in the patch as
> given, UnpinBuffer needs to know the strategy; and getting it that info
> would make the changes far more invasive. But the patch's behavior here
> is pretty risky anyway, since the strategy global at the time of unpin
> might have nothing to do with how it was set when the buffer was
> acquired. What I'm tempted to do is remove the special case there and
> adjust buffer acquisition instead (maybe make it decrement the
> usage_count when re-using a buffer from the ring).
It's assumed that the same strategy is used when unpinning, which is
true for the current usage (and apparently needs to be documented).
Normally buffers that are in the ring are recycled quickly, in which
case the usage_count will always be increased from 0->1 in UnpinBuffer,
regardless of the check. The check is there to avoid inflating the
usage_count on buffers that happened to be already in shared_buffers. It
might not be that important for the selects, but that's what keeps COPY
from bumping the usage_count all the way up to 5.
If we figure out another way to deal with the COPY usage_count, maybe we
could remove the check altogether. I've been thinking of changing COPY
anyway so that it always kept the last page it inserted to pinned, to
avoid the traffic to buffer manager on each tuple.
> Comments? I'm still playing with the ideas at this point...
Let me know if you want me to make changes and submit a new version.
In response to
pgsql-patches by date
|Next:||From: Heikki Linnakangas||Date: 2007-05-28 20:25:31|
|Subject: Re: Logging checkpoints and other slowdown causes|
|Previous:||From: Tom Lane||Date: 2007-05-28 19:38:42|
|Subject: Re: boolean <=> text explicit casts |