Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD
Date: 2011-05-27 15:13:42
Message-ID: BANLkTinOg-q-r7Zfvsdgs+h7OruK5nAV3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 27, 2011 at 7:41 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 27.05.2011 16:52, Pavan Deolasee wrote:
>>
>> On closer inspection, I realized that we have
>> deliberately put in this hook to ensure that we use visibility maps
>> only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
>> sequential pages to take advantage of possible OS seq scan
>> optimizations.
>
> That, and the fact that if you skip any page, you can't advance
> relfrozenxid.

Hmm. For a significantly large table, wouldn't it be the case that we
would most likely skip one page somewhere ? Would it be better that we
instead do a full scan every once in a while instead of relying on a
not-so-well-understood heuristic ?

>
>> My statistical skills are limited, but wouldn't that mean that for a
>> fairly well distributed write activity across a large table, if there
>> are even 3-4% update/deletes, we would most likely hit a
>> not-all-visible page for every 32 pages scanned ? That would mean that
>> almost entire relation will be scanned even if the visibility map
>> tells us that only 3-4% pages require scanning ?  And the probability
>> will increase with the increase in the percentage of updated/deleted
>> tuples. Given that the likelihood of anyone calling VACUUM (manually
>> or through autovac settings) on a table which has less than 3-4%
>> updates/deletes is very low, I am worried that might be loosing all
>> advantages of visibility maps for a fairly common use case.
>
> Well, as with normal queries, it's usually faster to just seqscan the whole
> table if you need to access more than a few percent of the pages, because
> sequential I/O is so much faster than random I/O. The visibility map really
> only helps if all the updates are limited to some part of the table.

The vacuum scan is not a complete random scan. So I am not sure how
effective a complete seq scan be. May be we need to run some tests to
measure that too before we choose one over the other.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2011-05-27 15:19:18 Re: dblink crash on PPC
Previous Message Greg Stark 2011-05-27 15:10:19 Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD