Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD
Date: 2011-05-27 14:11:17
Message-ID: 4DDFB105.9010806@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 27.05.2011 16:52, Pavan Deolasee wrote:
> On closer inspection, I realized that we have
> deliberately put in this hook to ensure that we use visibility maps
> only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
> sequential pages to take advantage of possible OS seq scan
> optimizations.

That, and the fact that if you skip any page, you can't advance
relfrozenxid.

> My statistical skills are limited, but wouldn't that mean that for a
> fairly well distributed write activity across a large table, if there
> are even 3-4% update/deletes, we would most likely hit a
> not-all-visible page for every 32 pages scanned ? That would mean that
> almost entire relation will be scanned even if the visibility map
> tells us that only 3-4% pages require scanning ? And the probability
> will increase with the increase in the percentage of updated/deleted
> tuples. Given that the likelihood of anyone calling VACUUM (manually
> or through autovac settings) on a table which has less than 3-4%
> updates/deletes is very low, I am worried that might be loosing all
> advantages of visibility maps for a fairly common use case.

Well, as with normal queries, it's usually faster to just seqscan the
whole table if you need to access more than a few percent of the pages,
because sequential I/O is so much faster than random I/O. The visibility
map really only helps if all the updates are limited to some part of the
table. For example, if you only recent records are updated frequently,
and old ones are almost never touched.

> Do we have any numbers to prove what we have today is good ? Sorry, I
> may not have followed the discussions very closely in the past and not
> sure if this has been debated/tested already.

I think that number was chosen quite arbitrary. When you consider
updating relfrozenxid, it's a bit difficult to decide what the optimal
value would be; if you decide to skip pages you might have to perform an
extra anti-wraparound somewhere down the line.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-05-27 14:34:53 Re: What is the best and easiest implementation to reliably wait for the completion of startup?
Previous Message Cédric Villemain 2011-05-27 14:07:25 Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD