Re: Reviewing new index types (was Re: [PATCHES] Updatedbitmap indexpatch)

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reviewing new index types (was Re: [PATCHES] Updatedbitmap indexpatch)
Date: 2007-07-23 22:11:55
Message-ID: 1185228715.4284.433.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

On Mon, 2007-07-23 at 17:19 -0400, Tom Lane wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > ... BMI is not useful at all
> > for PKs, whilst GIT is specifically designed to handle them.
>
> This seems a strange statement, because GIT doesn't look particularly
> efficient for unique indexes AFAICS. In the worst case you'd have to
> look individually at each tuple on a heap page to check for uniqueness
> conflict (no binary search, because you couldn't assume they are
> ordered).

That is one of a few heuristics about the patch that need some active
discussion, so I'm glad you asked.

The main use case is nearly-unique, so for cases where we have a
Master:Detail relationship, e.g. Order:OrderItem. The Order index is a
PK, with the OrderItem index as a nearly unique key. The index is not
brilliant for the Order index, but is good for the OrderItem index.

Heikki designed the grouping so that there is a state change between
non-grouped and non-grouped (normal) index entries. By default the patch
uses a threshold of non-grouped -> grouped at N=2 index entries and then
no limit on the number of rows/block. Currently you can tune N, but we
might also envisage setting a limit on the width of the range of values
to limit the number of tids stored in a grouped index entry. That could
control the uniqueness overhead.

On an I/O bound workload the space saving on the index outweighs the CPU
loss from uniqueness checking. When I/O is not an issue then
unfortunately there is a CPU overhead.

For GIT it would appear that the summary is that it gives a slight loss
on medium sized PK indexes and an increasing win as index size
increases. We struggled to come up with ways of making it Just Work with
as few parameters as possible.

> > B-TREE INDEXES (Integers)
>
> > Rows/value Best time Size in blocks
> > 10000000 49s 21899
> > 1000000 49s 21899
> > 100000 49s 21899
> > 10000 47s 21899
> > 1000 43s 21899
> > 100 38s 21899
> > 10 38s 21899
> > 1 33s 21899
>
> Surely the GIT code failed to kick in at all here? That's just about
> exactly the index size I'd expect for 10 million integers with the
> existing btree code (at least when MAXALIGN=4).

That was the b-tree test, i.e. the control. The GIT patch has bitrot, so
not able to test just yet.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Wong 2007-07-24 00:04:48 Re: Why so many out-of-disk-space failures on buildfarm machines?
Previous Message Andrew Dunstan 2007-07-23 21:36:07 Re: COPYable logs

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2007-07-23 22:59:57 Re: Async Commit, v21 (now: v22)
Previous Message Andrew Dunstan 2007-07-23 21:36:07 Re: COPYable logs