Re: [HACKERS] Resurrecting per-page cleaner for btree

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Hannu Krosing <hannu(at)skype(dot)net>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org, teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp
Subject:	Re: [HACKERS] Resurrecting per-page cleaner for btree
Date:	2006-07-28 17:11:48
Message-ID:	20060728171147.GP66525@pervasive.com
Lists:	pgsql-hackers pgsql-patches

On Thu, Jul 27, 2006 at 05:24:35PM -0400, Greg Stark wrote:
>
> Jim Nasby <jnasby(at)pervasive(dot)com> writes:
>
> > Even if we stopped right there it would still be a huge win in many (most?)
> > cases. How often do the indexes on a table comprise even 50% of the table's
> > size?
>
> I would say they're usually roughly comparable actually. It depends on how
> wide your table is of course but the wider your table rows the more indexes
> you're likely to have on the table too.

I think the number of fields in a table will correlate with the number
of indexes, but I don't think the width of those fields matters.

> > Even in the 50% case, you've gone from 1.5X to .6X
>
> Sure, and a 3x speedup is nothing to sneeze at, that would be a great
> improvement to vacuum. But it's still just a linear speedup and doesn't
> address the algorithmic problem.
>
> The fundamental problem is we have a process that's O(m) where m is the total
> space taken by a table and its indexes. The actual amount of space it has to
> reclaim is n. Other than n<m there's basically no relationship between these
> figures. As long as that's the case vacuum may as well be O(n^2) or O(n!).
>
> We frequently assume -- and often it's a valid assumption -- that these
> figures are roughly proportional. Hence all the talk about databases reaching
> a "steady-state" where the amount of dead space is constantly being reclaimed
> at more or less the same speed it's being generated. But there are also plenty
> of use cases where a complete vacuum pass takes thousands of times longer than
> the i/o it took to generate those dead tuples. Currently Postgres just isn't
> that great a tool for those use cases.

This is exactly why I'm suggesting that we stop waiting for the perfect
vacuum, one that hits only the exact tuples it needs to in both the heap
and the indexes, and at least take the giant leap forward of only
hitting heap tuples/pages that we know are dead. Even if indexes are the
same size as the heap, you've still gained nearly a 2x speed improvement
(depending on how long you wait to vacuum).
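
To make the arithmetic concrete, here is a rough sketch of the cost model
behind those figures. It is a toy model, not PostgreSQL code: the function
names are made up for illustration, cost is counted only in pages read, and
the 10% "dirty" fraction is an assumption, picked because it is one reading
that reproduces the "1.5X to .6X" figure quoted above. Indexes are still
scanned in full in both cases.

```python
def vacuum_pages_read(heap_pages, index_pages, dirty_fraction=1.0):
    """Pages read by one vacuum pass in a toy cost model.

    dirty_fraction is the share of heap pages known to hold dead tuples
    (an assumed input); 1.0 models a full heap scan. Indexes are always
    scanned in full in this model.
    """
    if not 0.0 <= dirty_fraction <= 1.0:
        raise ValueError("dirty_fraction must be between 0 and 1")
    return heap_pages * dirty_fraction + index_pages


def speedup(heap_pages, index_pages, dirty_fraction):
    """Ratio of a full-scan pass to a pass that reads only dirty heap pages."""
    full = vacuum_pages_read(heap_pages, index_pages)
    targeted = vacuum_pages_read(heap_pages, index_pages, dirty_fraction)
    return full / targeted


if __name__ == "__main__":
    # Indexes at 50% of the heap, 10% of heap pages dirty (assumed).
    print(speedup(1000, 500, 0.1))
    # Indexes the same size as the heap, same assumed dirty fraction.
    print(speedup(1000, 1000, 0.1))
```

Under these assumptions the first case reads 600 pages instead of 1500,
and the second reads 1100 instead of 2000, which is where "nearly 2x"
comes from. The gain shrinks as the dirty fraction grows, hence the
dependence on how long you wait to vacuum.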

> Unfortunately while I'm convinced of the problem I'm equally unconvinced of
> the solution. I tried to solve online index builds using retail index lookups
> in a very similar way to what's being discussed here. And ran into the same
> problems. I eventually decided that while it could be made to work that way it
> would be far too much code, far too unsafe, and far too invasive in the index
> access methods to be the right approach.
>
> Our existing method works with minimal help from the index access methods
> which allows for an enormous degree of freedom in the index design. To be
> able to support retail vacuum you would have to force index method
> implementors to keep information in a way that allowed them to look up a
> particular value/tid efficiently which would limit the kinds of indexes you
> could support drastically.
>
> --
> greg

--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Re: [HACKERS] Resurrecting per-page cleaner for btree at 2006-07-27 21:24:35 from Greg Stark
