Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD
Date: 2011-06-03 19:16:29
Message-ID: 201106031916.p53JGTC27199@momjian.us
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> On 27.05.2011 16:52, Pavan Deolasee wrote:
> > On closer inspection, I realized that we have
> > deliberately put in this hook to ensure that we use visibility maps
> > only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
> > sequential pages to take advantage of possible OS seq scan
> > optimizations.
>
> That, and the fact that if you skip any page, you can't advance
> relfrozenxid.
>
> > My statistical skills are limited, but wouldn't that mean that for a
> > fairly well-distributed write activity across a large table, if there
> > are even 3-4% updates/deletes, we would most likely hit a
> > not-all-visible page for every 32 pages scanned? That would mean that
> > almost the entire relation will be scanned even if the visibility map
> > tells us that only 3-4% of pages require scanning. And the probability
> > will increase with the percentage of updated/deleted tuples. Given
> > that the likelihood of anyone calling VACUUM (manually or through
> > autovac settings) on a table which has less than 3-4%
> > updates/deletes is very low, I am worried that we might be losing all
> > advantages of visibility maps for a fairly common use case.
>
> Well, as with normal queries, it's usually faster to just seqscan the
> whole table if you need to access more than a few percent of the pages,
> because sequential I/O is so much faster than random I/O. The visibility
> map really only helps if all the updates are limited to some part of the
> table: for example, if only recent records are updated frequently and
> old ones are almost never touched.
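To put rough numbers on Pavan's estimate and Heikki's point about clustered updates, here is a small Python model. It is a sketch under stated assumptions, not the vacuum code: it assumes SKIP_PAGES_THRESHOLD = 32 (the figure Pavan uses) and a simplified streak rule in which a page is skipped only once the current run of consecutive all-visible pages, including that page, has reached the threshold. The helper names (pages_read, scattered_map, clustered_map) are hypothetical.

```python
import random

# Threshold taken from the "every 32 pages" figure in the thread.
SKIP_PAGES_THRESHOLD = 32


def pages_read(all_visible, threshold=SKIP_PAGES_THRESHOLD):
    """Count heap pages read under a simplified, hypothetical streak rule.

    Assumption (not the actual lazy vacuum code): a page is skipped only
    once the run of consecutive all-visible pages ending at it has reached
    `threshold`; every other page is read.
    """
    streak = 0
    read = 0
    for visible in all_visible:
        if visible:
            streak += 1
            if streak >= threshold:
                continue  # long enough all-visible run: skip this page
        else:
            streak = 0  # a page needing vacuum breaks the run
        read += 1
    return read


def scattered_map(n_pages, dirty_fraction, seed=0):
    """All-visible flags with the dirty pages spread uniformly at random."""
    rng = random.Random(seed)
    n_dirty = round(n_pages * dirty_fraction)
    dirty = set(rng.sample(range(n_pages), n_dirty))
    return [i not in dirty for i in range(n_pages)]


def clustered_map(n_pages, dirty_fraction):
    """All-visible flags with every dirty page at the end of the table."""
    n_dirty = round(n_pages * dirty_fraction)
    return [i < n_pages - n_dirty for i in range(n_pages)]


if __name__ == "__main__":
    n = 100_000
    for frac in (0.03, 0.04):
        scattered = pages_read(scattered_map(n, frac)) / n
        clustered = pages_read(clustered_map(n, frac)) / n
        # Closed-form estimate for independent dirty pages: 1 - (1 - p)**T
        estimate = 1 - (1 - frac) ** SKIP_PAGES_THRESHOLD
        print(f"{frac:.0%} dirty: scattered reads {scattered:.1%} "
              f"(estimate {estimate:.1%}), clustered reads {clustered:.1%}")
```

Under this model, with independent dirty pages of probability p a page can be skipped only if the 32 pages ending at it are all-visible, so roughly 1 - (1 - p)^32 of the table is still read: about 62% for p = 0.03 and about 73% for p = 0.04. Concentrating the same 3% of dirty pages at the end of a 100,000-page table instead reads only 31 + 3,000 pages, about 3% of the table.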

I realize we just read the pages from the kernel to maintain sequential
I/O, but do we actually read the contents of the page if we know it
doesn't need vacuuming? If so, do we need to?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
