
Re: Serious performance problem

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: "Tille, Andreas" <TilleA(at)rki(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Serious performance problem
Date: 2001-10-31 10:25:00
Message-ID: 3BDFD17C.C78B35E@tm.ee
Lists: pgsql-hackers
"Tille, Andreas" wrote:
> 
> Hello,
> 
> I discussed a problem concerning the speed of PostgreSQL compared to
> MS SQL Server at some length on the pgsql-general list.  The thread
> starts with message
> 
>     http://fts.postgresql.org/db/mw/msg.html?mid=1035557
> 
> Now I tried a snapshot of version 7.2 and got a speed increase of about
> a factor of 2.  But sorry, this is really not enough.  The very simple
> test I pointed to in my mail is still much too slow, and the issue would
> probably doom the whole project, which is supposed to be a complete open
> source solution but might otherwise end up on MS software after all.
> I've come under heavy pressure from my employer, who was talking about
> the nice world of MS .NET (while he uses MS SQL exclusively).  To make
> the thing clear: the issue is the database of infectious diseases in
> Germany run by the Robert Koch-Institute.  So the beast could be of some
> importance for increasing the acceptance of PostgreSQL and open source
> in the field of medicine, which is generally known for the amount of
> money involved in it.  So I really hope that some skilled programmers
> will be able to find a good way to solve the performance issue, perhaps
> by just profiling the simple query
> 
>    SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz FROM Hauptdaten_Fall WHERE (((Hauptdaten_Fall.IstAktuell)=20))
> GROUP BY Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie;
> 
> to the data set I put on
> 
>    http://www.foodborne-net.de/~tillea/ifsg/ifsgtest.dump.bz2
> 
> If this takes less than half a second on a modern PC, I could
> continue to try more realistic queries.


I tried some more optimization of the query on my work computer
(AMD Athlon 850, 512 MB RAM, PostgreSQL 7.1.3 with default memory settings).


SELECT MeldeKategorie, 
       Count(ID) AS Anz
  FROM Hauptdaten_Fall 
 WHERE IstAktuell=20
GROUP BY MeldeKategorie 
ORDER BY MeldeKategorie;

real    0m9.675s

create index i1 on Hauptdaten_Fall(IstAktuell,MeldeKategorie);
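Column order in this composite index is what makes it work: with IstAktuell
first, the equality filter selects one contiguous index range, and within that
range the entries are already sorted by MeldeKategorie, so the GROUP BY/ORDER BY
need no extra sort.  A hypothetical counter-example (i1_reversed is my name,
not part of the schema):

```sql
-- Reversing the columns scatters the IstAktuell=20 entries across the
-- whole index, so it cannot serve the filter as one contiguous range:
create index i1_reversed on Hauptdaten_Fall(MeldeKategorie,IstAktuell);
```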

----------------------------
set enable_seqscan = off;
SELECT MeldeKategorie, 
       Count(ID) AS Anz
  FROM Hauptdaten_Fall 
 WHERE IstAktuell=20
GROUP BY MeldeKategorie 
ORDER BY MeldeKategorie;

Aggregate  (cost=4497.30..4510.18 rows=258 width=16)
  ->  Group  (cost=4497.30..4503.74 rows=2575 width=16)
        ->  Sort  (cost=4497.30..4497.30 rows=2575 width=16)
        ->  Index Scan using i1 on hauptdaten_fall  (cost=0.00..4351.40 rows=2575 width=16)

real    0m7.131s

---------------------------

set enable_seqscan = off;
SELECT MeldeKategorie, 
       Count(ID) AS Anz
  FROM Hauptdaten_Fall 
 WHERE IstAktuell=20
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;

Aggregate  (cost=4497.30..4510.18 rows=258 width=16)
  ->  Group  (cost=4497.30..4503.74 rows=2575 width=16)
        ->  Index Scan using i1 on hauptdaten_fall  (cost=0.00..4351.40 rows=2575 width=16)

real    0m3.223s

-- same after doing

cluster i1 on Hauptdaten_Fall;

real    0m1.590s - 0m1.600s
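For context: CLUSTER rewrites the table in index order, so the index scan
afterwards touches far fewer heap pages.  The order is not maintained under
later inserts and updates, so the rewrite has to be repeated as part of
maintenance, roughly:

```sql
cluster i1 on Hauptdaten_Fall;  -- rewrite the heap in i1 order (exclusive lock)
vacuum analyze;                 -- refresh planner statistics afterwards
```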



select count(*) from Hauptdaten_Fall;

real    0m0.630s

---------------------------

The following query is marginally (about 0.1 sec) faster, though the 
plan looks the same down to cost estimates.

SET ENABLE_SEQSCAN = OFF;
SELECT MeldeKategorie, 
       Count(*) AS Anz
  FROM (select IstAktuell, MeldeKategorie from Hauptdaten_Fall where IstAktuell=20) sub
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;

Aggregate  (cost=0.00..4370.72 rows=258 width=16)
  ->  Group  (cost=0.00..4364.28 rows=2575 width=16)
        ->  Index Scan using i1 on hauptdaten_fall  (cost=0.00..4351.40 rows=2575 width=16)

real    0m1.438s - 1.506s
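Part of the difference may also come from Count(*) replacing Count(ID):
count(ID) has to test every ID for NULL, while count(*) just counts rows.
Assuming ID is declared NOT NULL, the two are equivalent:

```sql
-- Equivalent results when ID is NOT NULL; count(*) skips the per-row null test.
SELECT Count(*)  FROM Hauptdaten_Fall WHERE IstAktuell=20;
SELECT Count(ID) FROM Hauptdaten_Fall WHERE IstAktuell=20;
```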

---------------------------

Now I make the dataset bigger while keeping the number of rows
returned by the query the same:

insert into hauptdaten_fall (istaktuell, meldekategorie)
select istaktuell + 20, meldekategorie 
from  hauptdaten_fall ;

INSERT 0 257530

insert into hauptdaten_fall (istaktuell, meldekategorie)
select istaktuell + 40, meldekategorie 
from  hauptdaten_fall ;

INSERT 0 515060
ifsgtest=# select count(*) from hauptdaten_fall;
  count  
---------
 1030120
(1 row)
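A more generic way to inflate the test data by any factor would be a cross
join with a series, sketched here for newer PostgreSQL versions
(generate_series is not available in 7.1):

```sql
-- Adds 3 shifted copies of the original rows, i.e. 4x the data in one statement.
insert into hauptdaten_fall (istaktuell, meldekategorie)
select istaktuell + 20 * n, meldekategorie
from   hauptdaten_fall, generate_series(1, 3) as n;
```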

cluster i1 on Hauptdaten_Fall;
vacuum analyze;


-- The query time is still the same 1.44 - 1.5 sec

SET ENABLE_SEQSCAN = OFF;
SELECT MeldeKategorie, 
       Count(*) AS Anz
  FROM (select IstAktuell, MeldeKategorie from Hauptdaten_Fall where IstAktuell=20) sub
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;

Aggregate  (cost=0.00..4370.72 rows=258 width=16)
  ->  Group  (cost=0.00..4364.28 rows=2575 width=16)
        ->  Index Scan using i1 on hauptdaten_fall  (cost=0.00..4351.40 rows=2575 width=16)

real    0m1.438s - 1.506s

----------------------------

Now back to the original data distribution, just 4 times bigger:

ifsgtest=# update hauptdaten_fall
ifsgtest-# set istaktuell = case when istaktuell % 20 = 0 then 20 else
10 end
ifsgtest-# ;
UPDATE 1030120
ifsgtest=# vacuum analyze;
VACUUM

SET ENABLE_SEQSCAN = OFF;
SELECT MeldeKategorie, 
       Count(*) AS Anz
  FROM (select IstAktuell, MeldeKategorie from Hauptdaten_Fall where IstAktuell=20) sub
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;

real    0m6.077s - 0m6.606s

and after clustering:
cluster i1 on Hauptdaten_Fall;

real    0m5.683s - 0m5.750s

So the query time grows linearly with table size here.
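One further idea, untested against this data set: if queries almost always
filter on IstAktuell = 20, a partial index keeps only those entries, making it
much smaller to scan (partial indexes require 7.2; i1_current is my name for
it):

```sql
-- Sketch only: index just the rows the hot query touches.
create index i1_current on Hauptdaten_Fall(MeldeKategorie)
where IstAktuell = 20;
```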

----------------------------

Hannu
