Re: visualizing B-tree index coverage

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: "TJ O'Donnell" <tjo(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org, DCorbit(at)connx(dot)com
Subject: Re: visualizing B-tree index coverage
Date: 2005-01-26 06:53:47
Message-ID: Pine.GSO.4.62.0501260951440.6363@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Excuse me for bothering but what kind of search engine you
developed. Does it looks like sets comparing ?

Oleg

On Tue, 25 Jan 2005, TJ O'Donnell wrote:

> Since I'm using a multi-column index, I can greatly influence
> the nature of the index created, depending on which columns I use
> and how many. I'm searching for an optimal set
> of columns that creates an index that, for sure does not have
> every value the same, nor only two values. Instead, I want to see
> how well I've spread the index out over the data (if that phrasing makes sense).
>
> More specifically, I have character data representing molecular structures.
> I've written (rather slow) search functions. I can create any number of
> columns that "fingerprint" each structure, e.g. # Carbon atoms, # N atoms,
> # single bonds, etc. I expect my fingerprints will not be unique (fingerprint may
> be a poor analogy), but rather will classify similar structures together.
> I create a multi-column index on these counts and
> get about 2-3 times speedup using 13 columns right now.
> For example:
>
> select count(smiles) from structure where oe_matches(smiles,'c1ccccc1CC(=O)NC') about 15 sec.
>
> select count(smiles) from structure where
> (_c, _n, _o, _s, _p, _halo,
> _arom_c, _arom_n, _arom_o, _arom_s,
> _atoms, _single_bonds, _other_bonds) >=
> ( 3,1,1,0,0,0, 6,0,0,0, 11,4,7 )
> and oe_matches(smiles,'c1ccccc1CC(=O)NC') about 6 seconds
> when the (_c, etc.) is a multi-column index.
>
> The data isn't inherently structured in any way that invites some particular number of columns
> for indexing. I don't want to use too many, nor too few columns. I also
> want to optimize the nature(which atom types, bond types, etc.)
> of the count columns. While I could do this
> and use the speedup as the measure of success, I think
> that if my B-tree were "covering" the data well, I would get the best results.
> Covering means finding that optimal situation where there is not one index for all rows
> and also not a unique index for every row - something inbetween would be ideal,
> or is that basically a wrong idea?
>
> TJ
>
>
>
>> Useful explanation of PostgreSQL index format:
>> http://www.faqs.org/docs/ppbook/c13329.htm
>>
>> I think you are aiming for the wrong thing.
>> The worst possible index is one with every value the same.
>> The second worst (still basically useless) is one with only two values. The greater the
>> differentiation of the data, the more workload is
>> reduced on a search.
>>
>> Since it isn't a straight binary tree, I don't think that having highly dissimilar data in the
>> index should be a problem.
>>
>> Do you have data or experience that shows otherwise?
>>
>> -----Original Message-----
>> From: pgsql-general-owner(at)postgresql(dot)org
>> [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of TJ O'Donnell Sent: Tuesday, January 25,
>> 2005 2:19 PM
>> To: pgsql-general(at)postgresql(dot)org
>> Cc: tjo(at)acm(dot)org
>> Subject: [GENERAL] visualizing B-tree index coverage
>>
>> Does anyone know of a tools that allows one to visualize
>> the tree created by a multi-column B-tree index?
>> A picture of a tree with branches, showing how "branchy" the
>> tree is would be great.
>> I'm wondering how well I've "clustered" the data in my table
>> using the multi-column index. In other words, do my
>> multi-columns sufficiently but not overly discriminate rows from each other?
>> Do I have too many with the same index? (not enough branches)
>> Do I have a unique index for each row? (way too many branches)
>>
>> Thanks,
>> TJ
>>
>>
>>
>> ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to
>> increase your free space map settings
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Michael Fuhr 2005-01-26 07:08:36 Re: text field constraint advice
Previous Message Robby Russell 2005-01-26 06:28:17 Re: Good PostgreSQL Based Shopping Cart Software ... ?