Re: [HACKERS] Resurrecting per-page cleaner for btree

From: Jim Nasby <jnasby(at)pervasive(dot)com>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org, teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp
Subject: Re: [HACKERS] Resurrecting per-page cleaner for btree
Date: 2006-07-27 15:34:59
Message-ID: 5F6080C2-F248-40E4-9122-EF076950B9F9@pervasive.com
Lists: pgsql-hackers, pgsql-patches

On Jul 26, 2006, at 4:29 PM, Hannu Krosing wrote:
>>> Well the desire for it comes from a very well established need for dealing
>>> with extremely large tables with relatively small hot spots. The basic problem
>>> being that currently the cost of vacuum is proportional to the size of the
>>> table rather than the amount of dead space. There's no link between those
>>> variables (at least in one direction) and any time they're far out of whack it
>>> means excruciating pain for the DBA.
>>
>> I thought the suggested solution for that was the dead space map. That
>> way vacuum can ignore parts of the table that haven't changed...
>
> It can ignore parts of the *table* but still has to scan full *indexes*.

Even if we stopped right there it would still be a huge win in many  
(most?) cases. How often do the indexes on a table comprise even 50%  
of the table's size? If indexes are 10% of the table's size, and you  
only scan 10% of the table that's dead, you've gone from scanning  
1.10*X pages (where X is the number of pages in the table) to 0.20*X  
pages. For large tables, that's a drastic improvement. Even in the  
50% case, you've gone from 1.50*X to 0.60*X (assuming 10% dead space). And  
I'm actually ignoring that we currently scan the heap twice; if you  
throw that into the mix the argument for doing this is even stronger.
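
To make that arithmetic easy to check, here is a rough sketch of the page-count model used above. The function name and parameters are made up for illustration and do not correspond to anything in the PostgreSQL source. It assumes vacuum always reads every index page, reads only the dead fraction of the heap when a dead space map is available, and (as a simplification) that every heap pass shrinks by the same fraction.

```python
def pages_scanned(index_fraction, dead_fraction=1.0, heap_passes=1):
    """Pages read by one vacuum, in units of X (the number of heap pages).

    index_fraction: total index size relative to the heap (0.10 means 10%).
    dead_fraction:  share of the heap that must be visited; 1.0 models
                    today's full scan, 0.10 models a dead space map that
                    limits the scan to the 10% of pages with dead space.
    heap_passes:    how many times the heap is read (hypothetical knob;
                    assumes each pass covers the same fraction).
    """
    heap_pages = heap_passes * dead_fraction
    index_pages = index_fraction  # indexes are still scanned in full
    return heap_pages + index_pages


if __name__ == "__main__":
    for index_fraction in (0.10, 0.50):
        full = pages_scanned(index_fraction)
        partial = pages_scanned(index_fraction, dead_fraction=0.10)
        print(f"indexes at {index_fraction:.0%} of heap: "
              f"{full:.2f}*X -> {partial:.2f}*X")
```

Passing heap_passes=2 models the double heap scan; under the same simplification the gap between the full and partial scans only widens.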

While it would be ideal to eliminate the need to scan the indexes,  
I'd certainly rather have the 'half-solution' of not scanning the  
entire heap and still scanning the indexes over what we have today.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby(at)pervasive(dot)com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


