Re: increase index performance

From: Thomas Finneid <tfinneid(at)fcon(dot)no>
To: pgsql-performance(at)postgresql(dot)org
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>
Subject: Re: increase index performance
Date: 2009-05-13 07:25:18
Message-ID: 4A0A75DE.3040006@fcon.no
Lists: pgsql-performance


First off, is there a way to pre-filter some of this data, via a view,
temporary table, partial/partitioned indexes or something?
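As a sketch of the pre-filtering idea (table and column names are assumed from the contrived example below, and the constants are made up; adjust to the real schema), a partial index bakes the filter in at build time so only the matching rows are indexed at all:

```sql
-- Hypothetical partial index: if queries mostly target one known
-- (house_id, floor_id) combination, index only those rows.
-- Table name "positions" and the constants 5 and 100 are assumptions.
CREATE INDEX idx_city_street_partial
    ON positions (city_id, street_id)
    WHERE house_id = 5 AND floor_id = 100;
```

This only helps when the predicate values are known in advance; for arbitrary house/floor values a full multicolumn index is still needed.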

Secondly, one of the problems seems to be the size of the data and its
index. How can I calculate how much space a particular part of the index
needs in memory? Maybe I could rearrange things a bit so it fits better
inside pages and so on.
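A rough back-of-envelope for the b-tree size can be done by hand (a sketch, assuming 8 kB pages, the default 90% leaf fillfactor, an 8-byte index tuple header and a 4-byte line pointer per entry; real sizes vary with alignment, page splits and bloat, and tools like pgstattuple give actual numbers):

```python
# Rough estimate of a PostgreSQL b-tree leaf-level size.
# Assumptions: 8192-byte pages, 24-byte page header, 90% fillfactor,
# 8-byte IndexTuple header + MAXALIGNed key data + 4-byte line pointer.

def estimate_btree_size_bytes(n_rows, key_bytes, page_size=8192, fillfactor=0.90):
    aligned_key = (key_bytes + 7) // 8 * 8     # MAXALIGN key data to 8 bytes
    entry = 8 + aligned_key + 4                # header + key + line pointer
    usable = int((page_size - 24) * fillfactor)
    entries_per_page = usable // entry
    pages = -(-n_rows // entries_per_page)     # ceiling division
    return pages * page_size

# Example: 85 cities * 2000 streets * 10 houses * 500 floors rows,
# four int4 key columns (16 bytes of raw key data).
rows = 85 * 2000 * 10 * 500
print(estimate_btree_size_bytes(rows, key_bytes=16) / 1024**3)  # roughly 25 GB
```

This is leaf pages only; inner pages add a few percent on top.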

Thirdly, I was a bit unclear, and this was the best example I could think
of (my client probably doesn't want me to talk about this at all... hence
the contrived example):

85 city_ids,
2000 street_ids per city,
10 house_ids per street,
500 floor_ids per house

Now it should have the correct data distribution and the correct
cardinality.

In this particular query I am interested in all streets in a city that
have the specific house id and the specific floor id.

By specifying
city_id, house_id and floor_id

I should get all street_ids that match.

The example you gave, Greg, assumed I wanted to follow cardinality, but I
need to skip the second field in order to get the right query. Pulling
the data based on the first two fields, city and street, would just give
me data for a single street, when I want it for all streets.
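In index terms, that query shape can be sketched like this (a sketch only; table name and constants are assumptions from the contrived example). Putting the equality-filtered columns first and street_id last means the leading columns of the index match the WHERE clause exactly:

```sql
-- Sketch: index ordered to match the query that skips street_id.
-- "positions" and the literal values 42, 7, 3 are made up for illustration.
CREATE INDEX idx_city_house_floor
    ON positions (city_id, house_id, floor_id, street_id);

SELECT street_id
  FROM positions
 WHERE city_id = 42
   AND house_id = 7
   AND floor_id = 3;
```

With the original column order (city_id, street_id, house_id, floor_id), the planner can only use the city_id prefix for this query and has to scan or filter across all streets.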

Greg Smith wrote:
> On Tue, 12 May 2009, Thomas Finneid wrote:
>
>> on a database with 260 GB of data and an index size of 109 GB on
>> separate RAID disks. There are
>> 85 city_ids,
>> 2000 street_ids per city,
>> 20 house_ids per street per city,
>> 5 floor_ids per house per street per city
>
> You should test what happens if you reduce the index to just being
> (city_id,street_id). Having all the fields in there makes the index
> larger, and it may end up being faster to just pull all of the ~100 data
> rows for a particular (city_id,street_id) using the smaller index and
> then filter out just the ones you need. Having a smaller index to
> traverse also means that you'll be more likely to keep all the index
> blocks in the buffer cache moving forward.
>
> A second level improvement there is to then CLUSTER on the smaller
> index, which increases the odds you'll get all of the rows you need by
> fetching only a small number of data pages.
>
> --
> * Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
