Re: surprising query optimisation

From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Chris Withers <chris(at)withers(dot)org>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: surprising query optimisation
Date: 2018-11-30 22:10:46
Message-ID: 9a952022-82b1-a88d-07fb-72c47bc0f875@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 01/12/2018 04:33, Stephen Frost wrote:
> Greetings,
>
> * Chris Withers (chris(at)withers(dot)org) wrote:
>> On 28/11/2018 22:49, Stephen Frost wrote:
>>> * Chris Withers (chris(at)withers(dot)org) wrote:
>>>> We have an app that deals with a lot of queries, and we've been slowly
>>>> seeing performance issues emerge. We take a lot of free form queries from
>>>> users and stumbled upon a very surprising optimisation.
>>>>
>>>> So, we have a 'state' column which is a 3 character string column with an
>>>> index on it. Despite being a string, this column is only used to store one
>>>> of three values: 'NEW', 'ACK', or 'RSV'.
>>> Sounds like a horrible field to have an index on.
>> That's counter-intuitive for me. What leads you to say this and what would
>> you do/recommend instead?
> For this, specifically, it's because you end up with exactly what you
> have: a large index with tons of duplicate values. Indexes are
> particularly good when you have high-cardinality fields. Now, if you
> have a skewed index, where there's one popular value and a few much less
> popular ones, then that's where you really want a partial index (as I
> suggest earlier) so that queries against the non-popular value(s) is
> able to use the index and the index is much smaller.
>
> Of course, for this to work you need to set up the partial index
> correctly and make sure that your queries end up using it.
>
> Thanks!
>
> Stephen

An index simply tells pg which block to look at (assuming that the index
itself is not sufficient to satisfy the query), so if using an index
would still require that pg look at most blocks, then it cheaper to just
scan the whole table - rather than load the index and still look at all
blocks that contain the table data.  I've oversimplified slightly.

An index is best used when using it results in fewer blocks being read
from disk.

Also the use of RAM is better when the size of the index is kept small. 
For example having an index on sex for nurses is a waste of time because
most nurses are female.  However, a partial index (as suggested by
Stephen, for your query) that includes only males could be useful if you
have queries looking for male nurses (assumes male nurses are a very
small fraction of nurses such that most data blocks don't have rows for
males nurses, and the planner knows this).

I once optimised a very complex set queries that made extensive use of
indexes.  However, with the knowledge I have today, I would have most
likely had fewer and smaller indexes.  As I now realize, that some of my
indexes were probably counter productive, especially as I'd given no
thought to how much RAM would be required to host the data plus
indexes!  Fortunately, while the objective was to run all those queries
within 24 hours, they actually only took a couple of hours.

Chris, I would strongly suggest, you read up on the excellent
documentation pg has about indexes, but don't expect to understand it
all at one sitting...

Cheers,
Gavin

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ron 2018-12-01 11:24:45 Unused indexes
Previous Message Adrian Klaver 2018-11-30 21:55:42 Re: Dump table using pg_dump vs pg_restore -f