Re: Serious performance problem

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: "Tille, Andreas" <TilleA(at)rki(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Serious performance problem
Date: 2001-10-31 10:25:00
Message-ID: 3BDFD17C.C78B35E@tm.ee
Lists: pgsql-hackers

"Tille, Andreas" wrote:
>
> Hello,
>
> I discussed a problem concerning the speed of PostgreSQL compared to
> MS SQL Server at length on the pgsql-general list. The thread starts with
> the message
>
> http://fts.postgresql.org/db/mw/msg.html?mid=1035557
>
> Now I tried a snapshot of version 7.2 and got a speed increase of about
> a factor of 2. But sorry, this is really not enough. The very simple
> test I pointed to in my mail is still much too slow, and the issue would
> probably spoil the whole project, which should be a complete open source
> solution and might otherwise end up with M$ stuff. I've come under heavy
> pressure from my employer, who was talking about the nice world of
> MS .NET (while he is using MS-SQL exclusively). To make things clear,
> the issue is the gal database of infectious diseases in Germany, run by
> the Robert Koch-Institute. So this beast could be of some importance for
> increasing the acceptance of PostgreSQL and Open Source in the field of
> medicine, which is generally known for the money that is involved in it.
> So I really hope that some skilled programmers would be able to find a
> good way to solve the performance issue, perhaps by just profiling the
> simple query
>
> SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz FROM Hauptdaten_Fall WHERE (((Hauptdaten_Fall.IstAktuell)=20))
> GROUP BY Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie;
>
> to the data set I put on
>
> http://www.foodborne-net.de/~tillea/ifsg/ifsgtest.dump.bz2
>
> If this took less than half a second on a modern PC, I could
> continue to try more realistic queries.

I tried some more optimizations of the query on my work computer
(AMD Athlon 850, 512 MB RAM, PostgreSQL 7.1.3 with default memory settings).

SELECT MeldeKategorie,
Count(ID) AS Anz
FROM Hauptdaten_Fall
WHERE IstAktuell=20
GROUP BY MeldeKategorie
ORDER BY MeldeKategorie;

real 0m9.675s
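
As an aside, all timings in this mail are with the default memory settings.
For reference, the two knobs I would normally raise first; the values below
are purely illustrative assumptions and were not used for any timing here:

# postgresql.conf (7.1.x), illustrative values only
shared_buffers = 4096        # shared buffer cache, in 8 kB pages (~32 MB)
sort_mem = 8192              # per-sort working memory, in kB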

create index i1 on Hauptdaten_Fall(IstAktuell,MeldeKategorie);
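
For completeness, a minimal sketch of refreshing the planner statistics and
re-checking the plan after building the index (the plans quoted below are
plain EXPLAIN output of the respective queries):

VACUUM ANALYZE Hauptdaten_Fall;

EXPLAIN
SELECT MeldeKategorie,
       Count(ID) AS Anz
FROM Hauptdaten_Fall
WHERE IstAktuell = 20
GROUP BY MeldeKategorie
ORDER BY MeldeKategorie;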

----------------------------
set enable_seqscan = off;
SELECT MeldeKategorie,
Count(ID) AS Anz
FROM Hauptdaten_Fall
WHERE IstAktuell=20
GROUP BY MeldeKategorie
ORDER BY MeldeKategorie;

Aggregate  (cost=4497.30..4510.18 rows=258 width=16)
  ->  Group  (cost=4497.30..4503.74 rows=2575 width=16)
        ->  Sort  (cost=4497.30..4497.30 rows=2575 width=16)
              ->  Index Scan using i1 on hauptdaten_fall  (cost=0.00..4351.40 rows=2575 width=16)

real 0m7.131s

---------------------------

set enable_seqscan = off;
SELECT MeldeKategorie,
Count(ID) AS Anz
FROM Hauptdaten_Fall
WHERE IstAktuell=20
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;

Aggregate  (cost=4497.30..4510.18 rows=258 width=16)
  ->  Group  (cost=4497.30..4503.74 rows=2575 width=16)
        ->  Index Scan using i1 on hauptdaten_fall  (cost=0.00..4351.40 rows=2575 width=16)

real 0m3.223s

-- Adding IstAktuell to the GROUP BY/ORDER BY makes them match the index
-- order exactly, so the separate Sort step disappears from the plan.
-- The same query again, after physically reordering the table by the index:

cluster i1 on Hauptdaten_Fall;

real 0m1.590s - 0m1.600s

-- For comparison, a plain count over the whole table:

select count(*) from Hauptdaten_Fall;

real 0m0.630s

---------------------------

The following query is marginally (about 0.1 sec) faster, though the plan
shape is the same (it also uses Count(*) instead of Count(ID), which may
account for the difference).

SET ENABLE_SEQSCAN = OFF;
SELECT MeldeKategorie,
Count(*) AS Anz
FROM (select IstAktuell,MeldeKategorie from Hauptdaten_Fall where
IstAktuell=20) sub
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;

Aggregate  (cost=0.00..4370.72 rows=258 width=16)
  ->  Group  (cost=0.00..4364.28 rows=2575 width=16)
        ->  Index Scan using i1 on hauptdaten_fall  (cost=0.00..4351.40 rows=2575 width=16)

real 0m1.438s - 0m1.506s

---------------------------

Now I make the dataset four times bigger while keeping the number of rows
returned by the query the same:

insert into hauptdaten_fall (istaktuell, meldekategorie)
select istaktuell + 20, meldekategorie
from hauptdaten_fall ;

INSERT 0 257530

insert into hauptdaten_fall (istaktuell, meldekategorie)
select istaktuell + 40, meldekategorie
from hauptdaten_fall ;

INSERT 0 515060
ifsgtest=# select count(*) from hauptdaten_fall;
count
---------
1030120
(1 row)

cluster i1 on Hauptdaten_Fall;
vacuum analyze;

-- The query time is still the same, about 1.44 - 1.5 sec:

SET ENABLE_SEQSCAN = OFF;
SELECT MeldeKategorie,
Count(*) AS Anz
FROM (select IstAktuell,MeldeKategorie from Hauptdaten_Fall where
IstAktuell=20) sub
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;

Aggregate  (cost=0.00..4370.72 rows=258 width=16)
  ->  Group  (cost=0.00..4364.28 rows=2575 width=16)
        ->  Index Scan using i1 on hauptdaten_fall  (cost=0.00..4351.40 rows=2575 width=16)

real 0m1.438s - 0m1.506s

----------------------------

Now back to the original data distribution, just four times bigger:

ifsgtest=# update hauptdaten_fall
ifsgtest-# set istaktuell = case when istaktuell % 20 = 0 then 20 else
10 end
ifsgtest-# ;
UPDATE 1030120
ifsgtest=# vacuum analyze;
VACUUM
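
As a quick sanity check of the resulting value distribution (output omitted;
not part of any timing):

select istaktuell, count(*) from hauptdaten_fall group by istaktuell;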

SET ENABLE_SEQSCAN = OFF;
SELECT MeldeKategorie,
Count(*) AS Anz
FROM (select IstAktuell,MeldeKategorie from Hauptdaten_Fall where
IstAktuell=20) sub
GROUP BY IstAktuell,MeldeKategorie
ORDER BY IstAktuell,MeldeKategorie;

real 0m6.077s - 0m6.606s

and after clustering again:

cluster i1 on Hauptdaten_Fall;

real 0m5.683s - 0m5.750s

So query time grows roughly linearly with the number of matching rows here:
quadrupling the matching rows takes the query from about 1.5 sec to about
5.7 - 6.6 sec, while quadrupling the table without adding matching rows
(the previous test) left it unchanged.

----------------------------

Hannu
