Re: On-disk bitmap index patch

From: "Jie Zhang" <jzhang(at)greenplum(dot)com>
To: "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc: "Hannu Krosing" <hannu(at)skype(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 01:49:12
Message-ID: C0EAC6A8.8FC4%jzhang@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/24/06 6:04 PM, "Bruce Momjian" <bruce(at)momjian(dot)us> wrote:

> Jie Zhang wrote:
>>> IIRC they quoted the cardinality of 10000 as something that is still
>>> faster than btree for several usecases.
>>>
>>> And also for AND-s of several indexes, where indexes are BIG, your btree
>>> indexes may be almost as big as tables but the resulting set of pages is
>>> small.
>>
>> Yeah, Hannu points it out very well -- the bitmap index works very well when
>> columns have low cardinalities and AND operations will produce small number
>> of results.
>
> What operations on columns of low cardinality produce a small number of
> results? That seems contradictory.

Let's see an example. Table 'T' includes two columns, 'p' and 's'. The
column 'p' has 100 distinct values, say p1-p100 and the column 'status' has
20 distinct values, say s1-s20. The query

'select * from order where priority=p1 and status=s1'

may produce small number of results. Also, if these related rows are
clustered together, that would be even better.

>
>> Also, the bitmap index is very small in low cardinality cases, where the
>> btree tends to take up at least 10 times more space.
>
> Also, are adding/changing rows is more expensive with bitmaps than
> btrees?

Inserting a row will only affect the last word (at most last several words)
of a bitmap vector, so this should not be very expensive: 3-4 IOs. When a
row is updated and the new row is inserted in the middle of the heap,
currently the code will update the bit in the place -- where the bit should
be. Searching for the page which includes the bit to be updated is not very
efficient now, but this can be fixed. Currently, we have to scan the pages
for a bitmap vector one by one until we hit the right page. Since the bitmap
vector is compressed, updating a bit in the middle may cause its page to
overflow. In this case, we create a new page to accommodate those extra
bits, and insert this new page right after the original page.

Overall, inserting a row or updating a row can be done efficiently. But it
is true that the bitmap index does not perform well if there are lots of
inserts and updates, especially updates.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2006-07-25 01:49:33 Re: pgstattuple extension for indexes
Previous Message Bruce Momjian 2006-07-25 01:45:27 Re: RESET CONNECTION?