Re: Bitmapscan changes

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Subject: Re: Bitmapscan changes
Date: 2007-03-12 18:38:41
Message-ID: 45F59E31.5040700@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>> Tom Lane wrote:
>>> In the second place, this seems to
>>> forever kill the idea of indexscans that don't visit the heap --- not
>>> that we have any near-term prospect of doing that, but I know a lot of
>>> people remain interested in the idea.
>
>> I'm certainly interested in that. It's not really needed for clustered
>> indexes, though. A well-clustered index is roughly one level shallower,
>> and the heap effectively is the leaf-level, therefore the amount of I/O
>> you need to fetch the index tuple + heap tuple, is roughly the same that
>> as fetching just the index tuple from a normal b-tree index.
>
> That argument ignores the fact that the heap entries are likely to be
> much wider than the index entries, due to having other columns in them.

True, that's the "roughly" part. It does indeed depend on your schema.
As a data point, here's the index sizes (in pages) of a 140 warehouse
TPC-C database:

index name normal grouped % of normal size
--------------------------------------
i_customer 31984 29250 91.5%
i_orders 11519 11386 98.8%
pk_customer 11519 1346 11.6%
pk_district 6 2
pk_item 276 10 3.6%
pk_new_order 3458 42 1.2%
pk_order_line 153632 2993 1.9%
pk_orders 11519 191 1.7%
pk_stock 38389 2815 7.3%
pk_warehouse 8 2

The customer table is an example of pretty wide table, there's only ~12
tuples per page. pk_customer is still benefiting a lot. i_customer and
i_orders are not benefiting because the tables are not in the index
order. The orders-related indexes are seeing the most benefit, they
don't have many columns.

>> I think we could still come up with some safe condiitions when we could
>> enable it by default, though.
>
> At this point I'm feeling unconvinced that we want it at all. It's
> sounding like a large increase in complexity (both implementation-wise
> and in terms of API ugliness) for a fairly narrow use-case --- just how
> much territory is going to be left for this between HOT and bitmap indexes?

I don't see how HOT is overlapping with clustered indexes. On the
contrary, it makes clustered indexes work better, because it reduces the
amount of index inserts needed and helps to keep a table clustered.

The use cases for bitmap indexes and clustered indexes do overlap
somewhat. But clustered indexes have an edge because:
- there's no requirement of having only a small number of distinct values
- they support uniqueness checks
- you can efficiently have a mixture of grouped and non-grouped tuples,
if your table is only partly clustered

In general, clustered indexes are more suited for OLTP work than bitmap
indexes.

> I particularly dislike the idea of having the index AM reaching directly
> into the heap --- we should be trying to get rid of that, not add more
> cases.

I agree. The right way would be to add support for partial ordering and
candidate matches to the indexam API, and move all the sorting etc.
ugliness out of the indexam. That's been on my TODO since the beginning.

If you're still not convinced that we want this at all, how would you
feel about the another approach I described? The one where the
in-heap-page order is stored in the index tuples, so there's no need for
sorting, at the cost of losing part of the I/O benefit.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2007-03-12 18:40:17 Re: Bug: Buffer cache is not scan resistant
Previous Message Jonathan Scher 2007-03-12 18:33:00 To connect a debbuger...

Browse pgsql-patches by date

  From Date Subject
Next Message Alvaro Herrera 2007-03-12 19:32:35 Re: Packed varlena patch update
Previous Message Tom Lane 2007-03-12 17:56:50 Re: Bitmapscan changes