Re: Serious performance problem

From: "Tille, Andreas" <TilleA(at)rki(dot)de>
To:
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Serious performance problem
Date: 2001-11-01 15:02:16
Message-ID: Pine.LNX.4.33.0111011541480.21752-100000@wr-linux02.rki.ivbb.bund.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 31 Oct 2001, Hannu Krosing wrote:

> I tried some more on optimizing the query on my work computer
> (AMD ATHLON 850, 512MB, PostgreSQL 7.1.3 with default memory settings)
>
>
> SELECT MeldeKategorie,
> Count(ID) AS Anz
> FROM Hauptdaten_Fall
> WHERE IstAktuell=20
> GROUP BY MeldeKategorie
> ORDER BY MeldeKategorie;
>
> real 0m9.675s
>
> create index i1 on Hauptdaten_Fall(IstAktuell,MeldeKategorie);
>
> ----------------------------
> set enable_seqscan = off;
> SELECT MeldeKategorie,
> Count(ID) AS Anz
> FROM Hauptdaten_Fall
> WHERE IstAktuell=20
> GROUP BY MeldeKategorie
> ORDER BY MeldeKategorie;
>
> Aggregate (cost=4497.30..4510.18 rows=258 width=16)
> -> Group (cost=4497.30..4503.74 rows=2575 width=16)
> -> Sort (cost=4497.30..4497.30 rows=2575 width=16)
> -> Index Scan using i1 on hauptdaten_fall
> (cost=0.00..4351.40 rows=2575 width=16)
>
> real 0m7.131s
>
> ---------------------------
>
> set enable_seqscan = off;
> SELECT MeldeKategorie,
> Count(ID) AS Anz
> FROM Hauptdaten_Fall
> WHERE IstAktuell=20
> GROUP BY IstAktuell,MeldeKategorie
> ORDER BY IstAktuell,MeldeKategorie;
>
> Aggregate (cost=4497.30..4510.18 rows=258 width=16)
> -> Group (cost=4497.30..4503.74 rows=2575 width=16)
> -> Index Scan using i1 on hauptdaten_fall (cost=0.00..4351.40
> rows=2575 width=16)
>
> real 0m3.223s
Hmmm, could you please explain the theory behind that for quite a
beginner like me (perhaps on -general if you feel it apropriate)

The change in the second select is that you included IstAktuell in the
GROUP BY/ORDER BY clause and this gives a speed increas by factor 2.
It seems that the "Sort" can be left out in this case if I look at the
plan, but why that? The WHERE clause should select just all IstAktuell=20
data sets and so the GROUP BY/ORDER BY clauses should every time have the
same work - as for my humble understanding.

>
> -- same after doing
>
> cluster i1 on Hauptdaten_Fall;
>
> real 1.590 -- 1.600
That´s also interesting. In reality the table Hauptdaten_Fall has many fields
with many indices. If I understand things right it makes no sense to have
more than one clustered index, right? A further speed increase of factor two
would be welcome. Could I expect this if I would find out the "sensitive"
index of my table for certain tasks? Or is my understanging wrong and it
makes sense to cluster more than one index. Unfortunately clustering the
index of a huge table takes some time. Could I speed this up by some
tricks?

> select count(*) from Hauptdaten_Fall;
>
> real 0m0.630s
>
> ---------------------------
>
> The following query is marginally (about 0.1 sec) faster, though the
> plan looks the same down to cost estimates.
>
> SET ENABLE_SEQSCAN = OFF;
> SELECT MeldeKategorie,
> Count(*) AS Anz
> FROM (select IstAktuell,MeldeKategorie from Hauptdaten_Fall where
> IstAktuell=20) sub
> GROUP BY IstAktuell,MeldeKategorie
> ORDER BY IstAktuell,MeldeKategorie;
>
> Aggregate (cost=0.00..4370.72 rows=258 width=16)
> -> Group (cost=0.00..4364.28 rows=2575 width=16)
> -> Index Scan using i1 on hauptdaten_fall (cost=0.00..4351.40
> rows=2575 width=16)
>
> real 0m1.438s - 1.506s
Hmm, perhaps this is nearly nothing or is there any theory that a
count(*) is faster than a count(<fieldname>)?

> ...
> real 0m6.077 -- 6.606s
>
> and after clustering:
> cluster i1 on Hauptdaten_Fall;
>
> real 0m5.683 - 5.750s
>
> so it's linear growth here
This is what my colleague was afraid of: We would have linear growth
compared to the log(n) growth which is to be expected on MS SQL server
(for this certain type of queries and for sure up to a far limit of
data where other constraints could get influence, but we are far from
this limit). This would not convince him :-(.

Kind regards

Andreas.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mr. Shannon Aldinger 2001-11-01 15:09:20 Posgresql 7.2b1 crashes
Previous Message Marc G. Fournier 2001-11-01 14:56:17 Server going down for several hours ...