On Thu, Feb 23, 2012 at 3:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, Feb 22, 2012 at 10:02 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Feb 22, 2012 at 2:23 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> The industry accepted description for non-sequential access is "random
>>> access" whether or not the function that describes the movement is
>>> entirely random. To argue otherwise is merely hairsplitting.
>> I don't think so.
> PostgreSQL already uses a parameter called "random_page_cost" to
> describe non-sequential access. Perhaps that is wrong and we need a
> third parameter?
>> For example, a bitmap index scan contrives to speed
>> things up by arranging for the table I/O to happen in ascending block
>> number order, with skips, rather than in random order, as a plain
>> index scan would do, and that seems to be a pretty effective
>> technique. Except to the extent that it interferes with the kernel's
>> ability to do readahead, it really can't be to read blocks 1, 2, 3, 4,
>> and 5 than to read blocks 1, 2, 4, and 5. Not reading block 3 can't
>> require more effort than reading it.
> By that argument, ANALYZE never could run longer than VACUUM ANALYZE,
> so you disagree with Tom and I and you can't explain Pavel's
> cost_bitmap_heap_scan() uses "random_page_cost" to evaluate the cost
> of accessing blocks, even though the author knew the access was in
> ascending block number order. Why was that?
> Note that the cost_bitmap_heap_scan() cost can be > than
> cost-seqscan() for certain parameter values.
I think all three of us are saying more or less the same thing in
slightly different words, so I'd rather not have an argument about
this one. But you're right: I can't explain Pavel's results, unless
doing ANALYZE before VACUUM is causing skip-block reads that defeat
the kernel's read-ahead detection. I think it's fairly self-evident
that reading a fixed-size subset of the pages in ascending order can't
*in general* be more expensive than reading all of an arbitrarily
large table, and so I believe we're all in agreement that the behavior
he observed is unusual. As to the cost estimation stuff, we use
random_page_cost as an approximation: there may be a head seek
involved, but to do better we'd have to estimate the likely length of
the seek based on the number of blocks skipped, something we currently
view as irrelevant, and it's not clear that it would improve the
quality of the estimate very much - there are other, probably larger
sources of error, such as the fact that the sequential logical block
number doesn't imply sequential physical position on the platter,
since the OS often fragments the file, especially (I think) on
The Enterprise PostgreSQL Company
In response to
pgsql-hackers by date
|Next:||From: Alvaro Herrera||Date: 2012-02-23 15:04:14|
|Subject: Re: foreign key locks, 2nd attempt|
|Previous:||From: Tom Lane||Date: 2012-02-23 14:41:42|
|Subject: Re: overriding current_timestamp |