Re: Indexing a Bit String column

From: George Oakman <oakmang(at)hotmail(dot)com>
To: <stark(at)enterprisedb(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Indexing a Bit String column
Date: 2009-02-24 16:39:44
Message-ID: COL115-W67D046C12CED696B40262BAFAF0@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Hi,

Thanks for the info. I new it would have been too easy! :)

Sorry, I made a mistake earlier, my queries will actually be more like

SELECT * FROM myTable WHERE myBitStringCol & B'101' = B'101';

This doesn't make much difference for the indexing problem, but it may help address the very good point you raised regarding the predicted number of results, and therefore the justification of needing an index at all. In practice, my BitStrings will be 1024 bits long - both A and B in "WHERE A & B = B" will be 1024 bits long.

Assuming that bits are independents and randomly distributed in the database, am I right in thinking that a typical search is expected to return N*0.5^n, where N is the total number of rows in the table and n is the number of bits set to 1 in B?

If this calculation is correct, even if 1% of the bits are set to 1 in B, then this would give N*0.5^10 results, i.e. roughly 0.1% of the database.

So it looks like I'll need the index in the end!

I am actually new to PgSQL - I would be very grateful if you could point me to resources/tutorials to help me build an index extension in C?

Many thanks for your help.

George.

> To: oakmang(at)hotmail(dot)com
> CC: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] Indexing a Bit String column
> From: stark(at)enterprisedb(dot)com
> Date: Tue, 24 Feb 2009 15:35:58 +0000
>
>
> George Oakman <oakmang(at)hotmail(dot)com> writes:
>
> > Is it all I need to do? Will PgSQL know how to index properly a Bit String
> > column? Should I build the index using a special method, e.g.
> > CREATE INDEX myBitStringCol_idx ON myTable USING gist(myBitStringCol);
>
> No, the default will be to build a btree index which won't help these types of
> queries at all.
>
> You would want a GIST index if there was a built-in GIST opclass for these
> kinds of queries, but sadly there isn't. You could add one fairly easily but
> it would require C code. I think it would be a valuable addition to Postgres
> if you do write one.
>
> Note that something like "WHERE myBitStringCol & B'101'" might be selecting
> too much of your table to make an index useful anyways. If each bit is set in
> half the table then you're talking about selecting 3/4 of the table in which
> case a full table scan would be more efficient than any index.
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
> Ask me about EnterpriseDB's On-Demand Production Tuning
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

_________________________________________________________________
Windows Live Messenger just got better .Video display pics, contact updates & more.
http://www.download.live.com/messenger

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Thom Brown 2009-02-24 16:55:45 Warm standby failover mechanism
Previous Message Tom Lane 2009-02-24 15:46:38 Re: Indexing a Bit String column