My server is oddly very slow

From: adrien ducos <aducos(at)hbs-research(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: My server is oddly very slow
Date: 2011-05-04 16:11:23
Message-ID: 4DC17AAB.1080206@hbs-research.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

I have 2 servers 1 for development (called dev) 1 for production (called
prod). The server for development is very basic (a dual core 2.1ghtz on
a basic 300MB hard drive with 2GB of memory (this server is not
dedicated to postgres, it also uses php, mapserver and so on).

I assigned the memory as follow
max co 40
shared buffers 258MB
work_mem 8MB
maitainance_work_mem: 32MB
wal_buffers 8MB
check_point_seg 16
autovacuum on
effective-cache-size 744MB

Dev is installed on centos 4 32bits (on a virtual server openvz)

My production server is a QUAD CORE XEON E5310 (1,6Ghtz 3GB of RAM and
the hard drive is a very fast SAN with fiberchanel: the copy of files on
the SAN is about 3 times faster than on the development hard drive. This
is a fully dedicated server for postgres.

I assigned the memory as follow
max co 40
shared buffers 512MB
work_mem 16MB
maitainance_work_mem: 128MB
wal_buffers 8MB
check_point_seg 16
autovacuum on
effective-cache-size 1536MB

prod is installed on Red Hat Enterprise Linux ES release 4 (Nahant
Update 6) 32bits (not on a virtual server)
Those 2 distribution are very similar, and I use the same proprams RPM
on both systems

The version of both databases is postgres 8.4.1, I installed the same
database (in fact prod comes form a dump of dev once every week) The
dump is very big about 16GB compressed. Once it is dumped, I make a full
analyze on the database before I use it.

Now according to the hardware, I would suspect dev should be slower than
prod, but almose all queries are slower on prod, some times 10 times slower.

Today I made a query with an explain analyse (same query on both
servers) and I got for dev:

1.
"Limit (cost=396689.94..399801.92 rows=2000 width=512) (actual
time=8696.467..10028.179 rows=2000 loops=1)"
2.
" -> GroupAggregate (cost=396689.94..424470.56 rows=17854
width=512) (actual time=8696.463..10024.632 rows=2000 loops=1)"
3.
" Filter: (sum(effectif_max) >= 10)"
4.
" -> Sort (cost=396689.94..397088.34 rows=159360
width=512) (actual time=8693.396..8745.436 rows=8344 loops=1)"
5.
" Sort Key: implantation_company_id, virtual_building_id"
6.
" Sort Method: external merge Disk: 125928kB"
7.
" -> Bitmap Heap Scan on gen_establishment_search
(cost=8156.66..346425.20 rows=159360 width=512) (actual
time=2545.813..6688.078 rows=152200 loops=1)"
8.
" Recheck Cond: (gis_departement_id =
'75'::bpchar)"
9.
" Filter: (((telephone IS NOT NULL) OR
(telephone_siege IS NOT NULL)) AND (NOT etablissement_masque) AND
(ref_zone_permission_id = ANY ('{2,1}'::integer[])) AND
(created_by_user_group = 1) AND (ref_establishment_type_id = ANY
('{1,2,3}'::integer[])))"
10.
" -> Bitmap Index Scan on
gen_establishment_search_gis_departement_id (cost=0.00..8116.82
rows=498680 width=0) (actual time=2435.952..2435.952 rows=502304
loops=1)"
11.
" Index Cond: (gis_departement_id =
'75'::bpchar)"
12.
"Total runtime: 10109.194 ms"

-> so 10 seconds

and for prod

1.
Limit (cost=397787.02..400793.19 rows=2000 width=513) (actual
time=94188.493..97457.042 rows=2000 loops=1)
2.
-> GroupAggregate (cost=397787.02..425288.95 rows=18297
width=513) (actual time=94188.489..97452.268 rows=2000 loops=1)
3.
Filter: (sum(effectif_max) >= 10)
4.
-> Sort (cost=397787.02..398179.69 rows=157069
width=513) (actual time=94187.660..94530.617 rows=8312 loops=1)
5.
Sort Key: implantation_company_id, virtual_building_id
6.
Sort Method: external merge Disk: 124192kB
7.
-> Bitmap Heap Scan on gen_establishment_search
(cost=8231.64..347992.15 rows=157069 width=513) (actual
time=836.994..91818.746 rows=149851 loops=1)
8.
Recheck Cond: (gis_departement_id = '75'::bpchar)
9.
Filter: (((telephone IS NOT NULL) OR
(telephone_siege IS NOT NULL)) AND (NOT etablissement_masque) AND
(ref_zone_permission_id = ANY ('{2,1}'::int
10.
eger[])) AND (created_by_user_group = 1) AND
(ref_establishment_type_id = ANY ('{1,2,3}'::integer[])))
11.
-> Bitmap Index Scan on
gen_establishment_search_gis_departement_id (cost=0.00..8192.37
rows=502886 width=0) (actual time=699.530..699.530 rows=
12.
507213 loops=1)
13.
Index Cond: (gis_departement_id =
'75'::bpchar)
14.
Total runtime: 97496.574 ms

-> 90 seconds

for some other queries is is worst. None of the queries I have tested
are faster on prod.

So I checked the memory on prod during my query execution:
vmstat
procs -----------memory---------- ---swap-- -----io---- --system--
----cpu----
r b swpd free buff cache si so bi bo in cs us sy
id wa
0 1 1280 23320 6356 2975956 0 0 29 45 1 0 2
0 95 3

The swap is not used but something is a bit odd: the cache is using 98%
of the RAM

if I do
SHOW shared_buffers;
I get

shared_buffers
----------------
512MB

as I would expect from my configuration.

I've got out of idea, any idea?

--

Logo_HBS_mail.jpg
Adrien DUCOS
Analyste développeur
aducos(at)hbs-research(dot)com <mailto:aducos(at)hbs-research(dot)com>
www.hbs-research.com <http://www.hbs-research.com/>
+33 (0)9 70 44 64 10
24 rue de l'Est
75020 Paris

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Mark Stosberg 2011-05-04 16:48:44 Re: best practice for moving millions of rows to child table when setting up partitioning?
Previous Message Viktor Bojović 2011-05-04 14:03:20 Re: unknown type