Re: Indices types, what to use. Btree, Hash, Gin or Gist

From: Mohamed <mohamed5432154321(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Indices types, what to use. Btree, Hash, Gin or Gist
Date: 2009-02-01 17:00:02
Message-ID: 861fed220902010900t782d2b3by6d3f47da060ea540@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

My Gmail(bloody gmail!) has been auto-replying to the last messager (Scott)
so I think we have been having a private discussion on this topic. Here is
an update on our discussion.
ME :

When it comes to the boolean, the content is about 70-30%. I find it strange
though that an index on a 50-50% isn't that useful. With an index the DB can
skip 50% of the table so it should be useful, but perhaps the intersection
of sets is expensive for the DB?
Could an index in fact possibly slow down queries? Or will the DB ignore
using the index in such cases?

Most of my matches contains simple matches, and I don't see the use of
partial indexes, but I get the idea.
You want to make a bigger difference between the set contained/matched
against to get them "more" "unique".

I am now reading about fulltext search and its my next step. So I am a bit
interested in the Gin/Gist. But I will revive this thread once I am more
familiar with fulltext, currently reading up on the topic..

SCOTT :

> When it comes to the boolean, the content is about 70-30%. I find it
strange
> though that an index on a 50-50% isn't that useful. With an index the DB
can
> skip 50% of the table so it should be useful, but perhaps the intersection
> of sets is expensive for the DB?

If the values are randomly mixed, and you can fit at least a couple of
rows in each 8k block, then using an index on a 50/50 mix is a total
loser, because you're gonna have to read every single block anyway.
If you can fit 10 rows in a single block, then 10% of one value means
it's a loser too, because, again, you're gonna have to hit every block
anyway.

With the mix you list, 70/30, it means that if you can fit 3 rows in
one block, and they're randomly distributed, you'll have to hit every
block anyway, and an index on bool won't help.

Keep in mind random table accesses are about 4 to 10 times more
expensive than sequential scans, and you have to add in the random
access time of the index as well.

> Could an index in fact possibly slow down queries? Or will the DB ignore
> using the index in such cases?

The db should ignore it for select queries unless the statistics are
wrong. However, indexes ALWAYS cost on insert / update / delete.

> I am now reading about fulltext search and its my next step. So I am a bit
> interested in the Gin/Gist. But I will revive this thread once I am more
> familiar with fulltext, currently reading up on the topic..

If you're searching a lot on text, full text search is a great way to go.

Be sure and look up the pg_stat type tables. There's tons of useful
info in them about how your database is actually being accessed.
pg_stat_user_indexes and pg_stat_user_tables are both very useful.

ME:

Thanks, I am new to PostgreSQL and just an SQL scholar really. I am using
pgAdmin now, is there a way of looking at those stats from there or is it
just from the command line ?

Would you say it's safe to index all columns that are searched for in a
relation? I have indexed perhaps 10 columns (of 15) and some are like the
boolean one. But I am thinking that they will only be used if the DB finds
them useful so I am over-indexing.. is this ok? I fin d updates and
insertions pretty fast anyway so I am not worried about that aspect unless I
am wrong !? :O

SCOTT :

This is one of those "it really depends" types of questions. If the
database is mostly read from, and the updates aren't slowed down too
much by the many indexes, then sure, go ahead and add the indexes. It
won't generally slow down the select queries running all the time.
After a month or so check the pg_stat_user_indexes table to see which
non-unique indexes aren't being used and drop them.

Thank you Scott for your private help :)

/ Moe

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Martijn van Oosterhout 2009-02-01 17:12:44 Re: Indices types, what to use. Btree, Hash, Gin or Gist
Previous Message Scott Marlowe 2009-02-01 16:44:41 Re: PGSQL or other DB?