Re: really lazy vacuums?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: Jim Nasby <jim(at)nasby(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: really lazy vacuums?
Date: 2011-03-17 14:02:07
Message-ID: AANLkTinNX1Kx8TuLMiHW6foXyrs8Uqb=X=_3i2Vq2F7M@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 17, 2011 at 4:17 AM, Jesper Krogh <jesper(at)krogh(dot)cc> wrote:
> Is it obvious that the visibillity map bits should track complete
> pages and not individual tuples? If the visibillity map tracks at
> page-level the benefit would fall on "slim tables" where you squeeze
> 200 tuples into each page and having an update rate of 1% would
> lower the likelyhood even more. (it may be that for slim tables the
> index-only-scans are not as benefitial as to wide tables).

I'm not sure exactly what MaxHeapTuplesPerPage works out to be, but
say it's 200. If you track visibility info per tuple rather than per
page, then the size of the visibility map is going to expand by a
factor of 200. That might decrease contention, but otherwise it's a
bad thing - the whole point of having the visibility map in the first
place is that it's much, much smaller than the heap. If it were the
same size as the heap, we could just read the heap. What the map
attempts to accomplish is to allow us, by reading a small number of
pages, to check whether the tuples we're thinking of reading are
likely to be all-visible without actually looking at them.

> In collaboration with a vacuuming discussion, I dont know if it
> is there allready but how about "opportunistic vacuuming". Say
> you have a page what due to changes in one of the tuples are
> being written out, will it, while being written out anyway get the
> other tuples on the page vacuumed?

The really lazy kind of vacuuming I'm talking about could be done this
way. Regular vacuuming cannot, because you can't actually prune a
dead tuple until you've scanned all the indexes for references to that
CTID. The obvious place to do this would be the background writer: if
the page is dirty anyway and we're about to evict it, we could decide
to (1) set hint bits, (2) set visibility map bits, (3) freeze tuples
that need freezing, and (4) identify dead tuples and reclaim the space
they use, but not the associated line pointer.

Sadly, I'm not sure this would help much. If we have, say, a 4MB
relation, you're not even going to notice it when vacuum comes along
and does its thing. Even a vacuum freeze is chump change. The
problem is with big relations, like say 1GB+. Processing the whole
table at once can have a material impact on system performance, so
it'd be nice to do some work incrementally. But it's likely that
doing it opportunistically as you evict things from shared buffers is
only going to help here and there. Even if you optimistically assumed
that we could opportunistically do 10% of the vacuuming that way,
that's still not much of a dent. And I think it'd probably be less
than that in most real-world cases. A further problem is that the
background writer is already a pretty busy process, and giving it more
things to do isn't very appealing from the standpoint of overall
system performance.

> It actually dont have to hook into the process directly to benefit
> the IO-usage, if it just can get the opportunity to do it before
> the page gets evicted from the OS-cache, then it would save a
> second read on that page, but it seems way harder to do something
> sane around that assumption.

Yeah.

> Really lazy vacuums would "only" benefit "really static tables" ?  where
> vacuuming is not that big a problem in the first place.

I think the benefit would be tables that are either quite large (where
the ability to do this incrementally would be an advantage over
regular VACUUM) or insert-only (where we currently have no way to get
PD_ALL_VISIBLE bits set without user intervention).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-03-17 14:03:43 Re: volatile markings to silence compilers
Previous Message Alvaro Herrera 2011-03-17 13:46:32 Re: Rectifying wrong Date outputs