Re: Have vacuum emit a warning when it runs out of maintenance_work_mem

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-patches(at)postgresql(dot)org, Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>, Jim Nasby <decibel(at)decibel(dot)org>
Subject: Re: Have vacuum emit a warning when it runs out of maintenance_work_mem
Date: 2008-03-11 18:05:29
Message-ID: 200803111805.m2BI5Ta08146@momjian.us
Lists: pgsql-patches
Added to TODO for VACUUM:

* Consider a more compact data representation for dead rows

>   http://archives.postgresql.org/pgsql-patches/2007-05/msg00143.php


---------------------------------------------------------------------------

Tom Lane wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> >>> Or we could switch to a more compact representation of the dead tuples, 
> >>> and not need such a big maintenance_work_mem in the first place.
> 
> > One idea is to use a compressed bitmap like in the bitmap index patch, 
> > and a tree of block numbers or ranges to allow random access to it.
> 
> I thought a bit about that but it doesn't seem tremendously appealing,
> at least not as the only representation, because it's not more compact
> for small numbers of dead tuples per page.  (And we don't have the "out"
> of switching to lossy representation.)
> 
> Here's a design sketch that works if we are willing to limit VACUUM's
> usable maintenance_work_mem to 4GB:
> 
> 1. Store an array of page numbers plus offsets into a second working
> array of uint16 (the offsets are 32 bits, whence the 4GB limitation).
> This requires 8 bytes per page-with-dead-tuples, and since it will be
> built in order as a byproduct of our scanning order, it can be
> binary-searched on the page number.
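
A minimal sketch of such a per-page entry (illustrative C only, not
PostgreSQL source; the field names and the byte-offset interpretation of
"offset" are assumptions):

#include <stdint.h>

typedef uint32_t BlockNumber;   /* stand-in for PostgreSQL's BlockNumber */

typedef struct PageEntry
{
    BlockNumber blkno;      /* heap page number; ascending in scan order,
                             * so the array can be binary-searched */
    uint32_t    dataoff;    /* byte offset into the uint16 working array;
                             * always even, so the low-order bit is free
                             * to tag the representation (see point 2) */
} PageEntry;                /* 8 bytes per page-with-dead-tuples */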
> 
> 2. The offset part of the per-page entry points at a segment of the
> uint16 array belonging to this page.  It can have one of 2 formats.
> For a small number of dead tuples on the page, we just store an
> array of line numbers.  For a larger number, we store a bitmap
> showing the positions of dead tuples.  While we scan a page, we
> accumulate dead tuple line numbers in a small working array, and
> then either copy those to the large array or build a bitmap from
> them, depending on which will be smaller.  Since the offsets into
> the uint16 array will always be even, we can usurp the low-order
> bit of the pointer word to distinguish which representation is
> stored.
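
One way point 2 could look in code, as a sketch only: it assumes the
stored offset is a byte offset whose low bit carries the format tag, and
the function and macro names are made up for illustration.

#include <stdint.h>
#include <string.h>

#define LINEARRAY_FORMAT  0     /* low bit clear: array of line numbers */
#define BITMAP_FORMAT     1     /* low bit set: bitmap of dead positions */

/*
 * Copy one page's accumulated dead line numbers into the big uint16 area
 * at byte offset destoff, choosing whichever representation is smaller,
 * and return the offset with the format tag in its low-order bit.
 */
static uint32_t
store_page_tuples(char *area, uint32_t destoff,
                  const uint16_t *deadoffs, int ndead, int maxoff)
{
    int     array_bytes = ndead * (int) sizeof(uint16_t);
    int     bitmap_bytes = ((maxoff + 15) / 16) * (int) sizeof(uint16_t);

    if (array_bytes <= bitmap_bytes)
    {
        /* few dead tuples: store the line numbers verbatim */
        memcpy(area + destoff, deadoffs, array_bytes);
        return destoff | LINEARRAY_FORMAT;
    }
    else
    {
        /* many dead tuples: a bitmap over the page's line pointers wins */
        uint16_t   *bitmap = (uint16_t *) (area + destoff);
        int         i;

        memset(bitmap, 0, bitmap_bytes);
        for (i = 0; i < ndead; i++)
        {
            int     bit = deadoffs[i] - 1;  /* line numbers are 1-based */

            bitmap[bit / 16] |= (uint16_t) (1 << (bit % 16));
        }
        return destoff | BITMAP_FORMAT;
    }
}

The caller is assumed to have obtained destoff by reserving the smaller of
the two sizes from the top of the work area, as in the allocation scheme
described further down.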
> 
> 3. You might think we need to spend an additional word storing
> how many line numbers or bitmap words there are per page, but
> we can save that space by comparing offsets of adjacent entries
> in the per-page array, since we know they're stored adjacently.
> 
> I envision the per-page array as being built upwards from the bottom of
> a single large maintenance_work_mem-sized array, and the uint16 array
> data as being filled from the top down, and whenever the two pointers
> are in danger of crossing, we stop and do an index vacuum cycle, just
> like in the current logic.  This lets us avoid having to guess in
> advance how much per-page versus per-tuple space we will need.  Note
> this means the end of a page entry's uint16 data is determined by
> looking at the prior page entry's offset instead of the next one,
> but that seems no big problem.
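
A sketch of that single work area, again with illustrative
(non-PostgreSQL) names; the "stop and do an index vacuum cycle" decision
is left to the caller:

#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct DeadTupleArea
{
    char   *base;       /* one maintenance_work_mem-sized allocation */
    size_t  size;       /* total size in bytes */
    size_t  low;        /* next free byte for page entries (grows up) */
    size_t  high;       /* lowest byte of uint16 data stored so far
                         * (filled from the top down) */
} DeadTupleArea;

static void
area_init(DeadTupleArea *a, size_t maintenance_work_mem)
{
    a->base = malloc(maintenance_work_mem);     /* error check omitted */
    a->size = maintenance_work_mem;
    a->low = 0;
    a->high = maintenance_work_mem;
}

/*
 * Would adding one 8-byte page entry plus nbytes_data of uint16 data make
 * the two cursors cross?  If so, the caller stops scanning and runs an
 * index vacuum cycle, just as the current logic does when its dead-tuple
 * array fills up.
 */
static bool
area_would_overflow(const DeadTupleArea *a, size_t nbytes_data)
{
    return a->low + 8 /* one page entry */ + nbytes_data > a->high;
}

/* Reserve one page's worth of uint16 data; the returned offset stays even
 * as long as the total area size is even, which is assumed here. */
static size_t
area_reserve_data(DeadTupleArea *a, size_t nbytes_data)
{
    a->high -= nbytes_data;
    return a->high;
}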
> 
> So the lookup process involves a binary search on the page number only,
> and then either a scan of the tuple line numbers or a single bitmap
> probe.  (We could make the scan be a binary search, but since that
> representation will only be used with small numbers of tuples, it'd
> probably not be any faster than a simple search loop.)  AFAICS that
> ought to be as fast or faster than the current lookup methodology;
> significantly faster where there are many dead tuples per page.
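
Put together, a lookup could look roughly like this self-contained sketch
(not PostgreSQL source); it also shows how point 3 recovers a segment's
length from the neighbouring entry's offset instead of storing it:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* stand-in for PostgreSQL's BlockNumber */

typedef struct PageEntry        /* same illustrative layout as above */
{
    BlockNumber blkno;
    uint32_t    dataoff;        /* byte offset into the uint16 area; low
                                 * bit set means bitmap, clear means array
                                 * of line numbers */
} PageEntry;

/*
 * Is (blkno, lineno) among the dead tuples?  "area" is the shared work
 * buffer and area_top is the byte offset of its end, which bounds the
 * first entry's segment because the uint16 data was filled top-down.
 */
static bool
tuple_is_dead(const PageEntry *entries, int nentries,
              const char *area, size_t area_top,
              BlockNumber blkno, uint16_t lineno)
{
    int     lo = 0;
    int     hi = nentries - 1;
    int     match = -1;

    /* binary search on the page number only */
    while (lo <= hi)
    {
        int     mid = lo + (hi - lo) / 2;

        if (entries[mid].blkno == blkno)
        {
            match = mid;
            break;
        }
        else if (entries[mid].blkno < blkno)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    if (match < 0)
        return false;           /* page has no dead tuples at all */

    {
        size_t  start = entries[match].dataoff & ~(uint32_t) 1;
        size_t  end = (match == 0) ? area_top :
            (entries[match - 1].dataoff & ~(uint32_t) 1);
        const uint16_t *seg = (const uint16_t *) (area + start);
        size_t  nwords = (end - start) / sizeof(uint16_t);
        size_t  i;

        if (entries[match].dataoff & 1)
        {
            /* bitmap probe: one bit per line pointer, 1-based line numbers */
            unsigned int bit = (unsigned int) (lineno - 1);

            if (bit / 16 >= nwords)
                return false;
            return (seg[bit / 16] & (1U << (bit % 16))) != 0;
        }

        /* small array of dead line numbers: simple scan loop */
        for (i = 0; i < nwords; i++)
        {
            if (seg[i] == lineno)
                return true;
        }
        return false;
    }
}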
> 
> The space requirements are:
> 
> 	No dead tuples on page		0 bytes  (same as now)
> 	1 dead tuple on page		10 bytes (vs 6 now)
> 	2 dead tuples			12 bytes (same as now)
> 	3 dead tuples			14 bytes (vs 18 now)
> 
> and so on, except that for typical table densities of say 100 tuples per
> page, we will switch over to the bitmap representation at 6 or so dead
> tuples per page, and so the requirement will never go beyond about 20
> bytes per page whereas the current method could get as bad as 600 bytes
> for an all-dead page.
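
A rough reconstruction of those figures, assuming the 8-byte page entry
sketched above, 2-byte line numbers, a 6-byte item pointer in the current
representation, and roughly 100 line pointers per page:

    array format:   8 + 2*n bytes for n dead tuples on the page
    bitmap format:  8 + 2*ceil(100/16) = 8 + 14 = 22 bytes, independent of n
    crossover:      2*n exceeds 14 somewhere around 6-8 dead tuples per page
    current method: 6*n bytes, so 6*100 = 600 bytes for an all-dead page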
> 
> What this says is that it's not worth changing if you expect low
> dead-tuple densities (which IIRC was the assumption I made when I
> designed the current representation).  In a table with an average of
> less than 1 dead tuple per page, this way is a loser.  OTOH, with very
> low dead-tuple densities it may hardly matter, since you're going to
> have to scan many GB of heap before you fill maintenance_work_mem
> anyway.
> 
> If you assume that a more typical scenario is vacuuming after 10%
> or so tuple "churn", then there would be 10 or so dead tuples per
> page, which makes this a good win: about 20 vs about 60 bytes per
> page, with the win going up the longer vacuum is delayed.
> 
> HOT would take away some of the win, probably, but I'm not sure
> how much.
> 
> Comments?  Can anyone invent a better data structure?
> 
> 			regards, tom lane
> 

-- 
  Bruce Momjian  <bruce(at)momjian(dot)us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
