Re: Massive performance issues

From: Sebastian Hennebrueder <usenet(at)laliluna(dot)de>
To: Matthew Sackman <matthew(at)lshift(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Massive performance issues
Date: 2005-09-01 20:39:13
Message-ID: 431766F1.6080501@laliluna.de
Lists: pgsql-performance

Matthew Sackman schrieb:

>Hi,
>
>I'm having performance issues with a table consisting of 2,043,133 rows. The
>schema is:
>
>\d address
> Table "public.address"
> Column | Type | Modifiers
>----------------------+------------------------+-----------
> postcode_top | character varying(2) | not null
> postcode_middle | character varying(4) | not null
> postcode_bottom | character varying(7) | not null
> postcode | character varying(10) | not null
> property_type | character varying(15) | not null
> sale_type | character varying(10) | not null
> flat_extra | character varying(100) | not null
> number | character varying(100) | not null
> street | character varying(100) | not null
> locality_1 | character varying(100) | not null
> locality_2 | character varying(100) | not null
> city | character varying(100) | not null
> county | character varying(100) | not null
>Indexes:
> "address_city_index" btree (city)
> "address_county_index" btree (county)
> "address_locality_1_index" btree (locality_1)
> "address_locality_2_index" btree (locality_2)
> "address_pc_bottom_index" btree (postcode_bottom)
> "address_pc_middle_index" btree (postcode_middle)
> "address_pc_top_index" btree (postcode_top)
> "address_pc_top_middle_bottom_index" btree (postcode_top,
> postcode_middle, postcode_bottom)
> "address_pc_top_middle_index" btree (postcode_top, postcode_middle)
> "address_postcode_index" btree (postcode)
> "address_property_type_index" btree (property_type)
> "address_street_index" btree (street)
> "street_prefix" btree (lower("substring"((street)::text, 1, 1)))
>
>This is with PostgreSQL 7.4 running on Linux 2.6.11 with a 3GHz P4 and a
>SATA hard drive.
>
>Queries such as:
>
>select locality_2 from address where locality_2 = 'Manchester';
>
>are taking 14 seconds to complete, and this is only 2 years' worth of
>data - we will have up to 15 years (so over 15 million rows).
>
>Interestingly, doing:
>explain select locality_2 from address where locality_2 = 'Manchester';
>gives
> QUERY PLAN
>----------------------------------------------------------------
> Seq Scan on address (cost=0.00..80677.16 rows=27923 width=12)
> Filter: ((locality_2)::text = 'Manchester'::text)
>
>but:
>explain select locality_1 from address where locality_1 = 'Manchester';
>gives
> QUERY PLAN
>----------------------------------------------------------------
> Index Scan using address_locality_1_index on address
>(cost=0.00..69882.18 rows=17708 width=13)
> Index Cond: ((locality_1)::text = 'Manchester'::text)
>
>Sadly, using the index makes things worse, the query taking 17 seconds.
>
>locality_1 has 16650 distinct values and locality_2 has 1156 distinct
>values.
>
>Whilst the locality_2 query is in progress, both the disk and the CPU
>are maxed out with the disk constantly reading at 60MB/s and the CPU
>rarely dropping under 100% load.
>
>With the locality_1 query in progress, the CPU is maxed out but the disk
>is reading at just 3MB/s.
>
>Obviously, to me, this is a problem; I need these queries to complete in
>under a second. Is this unreasonable? What can I do to make this "go
>faster"? I've considered normalising the table, but I can't work out
>whether the slowness is in dereferencing the pointers from the index
>into the table or in scanning the index in the first place. And
>normalising the table is going to cause much pain when inserting values,
>and I'm not entirely sure I see why normalising it should cause a
>massive performance improvement.
>
>
>
Just an idea: if you do not want to adapt your application to use a
normalized database, you could push the data into normalized tables using
triggers.
Example:
Add a table city with columns id and name,
and add a column city_id to your main table.
You then have redundant data in your main table (locality_1 and
city_id), but you can query the small city table when searching for
'Man%'.

--
Best Regards / Viele Grüße

Sebastian Hennebrueder

----

http://www.laliluna.de

Tutorials for JSP, JavaServer Faces, Struts, Hibernate and EJB

Get support, education and consulting for these technologies.
