Re: [SQL] index on more-than-one columns?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: s-fery(at)kkt(dot)sote(dot)hu
Cc: PostgreSQL list <pgsql-sql(at)postgreSQL(dot)org>
Subject: Re: [SQL] index on more-than-one columns?
Date: 1999-08-17 21:04:29
Message-ID: 23231.934923869@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Engard Ferenc <fery(at)pons(dot)sote(dot)hu> writes:
> Is there any advantage to use index on more columns against to use
> more separate indexes, e.g. creating index on t1 (a,b) instead of
> index on t1 (a) plus index on t1 (b)?

If those are your two choices then the two separate indexes are
definitely more flexible, since the index on (a,b) can't be used
as a standalone index on b for queries that only involve b.

In general I'd say that a multicolumn index is a pretty specialized
beast, and probably only worth its keep if you frequently do sorts with
that order and/or multikey joins against another table with a similar
index.

For example: let's say you are trying to optimize for queries like

SELECT ... FROM a, b WHERE a.f1 = b.f1 AND a.f2 = b.f2;

If you have indexes on (f1,f2) for both tables then a two-column
mergejoin can be used --- that is, we scan both tables in the order
of the indexes. (BTW, 6.5 only manages to do this if both columns are
the same data type, but that will be fixed in 6.6.) Although that
looks cool, it's probably not a huge win compared to the one-column
merge you could do with indexes on either f1 or f2. It'd only be a
big win if neither f1 or f2 have a lot of distinct values by themselves
but f1 * f2 does.

Also, although an index on (f1,f2) can be used as an index on f1
alone, it's going to be bigger and therefore slower to access than
the single-column index. So, just adding on columns that you "might
need some of the time" isn't a win.

Finally, remember that every index costs you time whenever the table
is modified, because the index has to be updated too. So it's not
a win to make a whole lot of indexes without specific reasons for
each one, unless the table is queried a *lot* more often than it
is modified.

I'd probably suggest making one or two indexes on the individual
columns that are used most frequently in WHERE conditions, and not
going beyond that except when you have a specific frequently-used
query that you need to optimize.

regards, tom lane

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Nathan Angelacos 1999-08-17 23:36:27 CASE statement causes unknown node type 723
Previous Message John Ridout 1999-08-17 15:45:52 RE: [SQL] datediff function