Re: Serious performance problem

From: Antonio Fiol Bonnín <fiol(at)w3ping(dot)com>
To: Brent Verner <brent(at)rcfile(dot)org>
Cc: "Tille, Andreas" <TilleA(at)rki(dot)de>, "Claus, Hermann" <ClausH(at)rki(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Serious performance problem
Date: 2001-10-30 13:53:29
Message-ID: 3BDEB0D9.A277E82C@w3ping.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

AFAIK, sorting is necessary even when you have CLUSTERed a table using an index.

Somewhere on the docs I read sth like "CLUSTER reorders the table on disk so that entries
closer on the index are closer on the disk" (obviously written in better English ;-)

But if you INSERT a single row later, it will NOT get inserted to the right place. So
SORT is still necessary.

MAYBE, but I am not sure at all, the sort may take place in less "real" time than in case
the table was not CLUSTERed, as the table is "nearly" sorted.

Hackers, is the sorting algorithm capable of exiting at the very moment the table is
sorted, or are some extra passes always calculated?

Good luck!

Antonio

Brent Verner wrote:

> On 30 Oct 2001 at 11:44 (+0100), Tille, Andreas wrote:
> | On Mon, 29 Oct 2001, Vsevolod Lobko wrote:
> |
> | > Seems that problem is very simple :))
> | > MSSql can do queries from indexes, without using actual table at all.
> | > Postgresql doesn't.
> | >
> | > So mssql avoids sequental scanning of big table, and simply does scan of
> | > index which is already in needed order and has very much less size.
> | I forewarded this information to my colleague and he replied the following
> | (im translating from German into English):
> |
> | hc> I expected this problem. But what is the purpose of an index: Not
> | hc> to look into the table itself. Moreover this means that the expense
> | hc> grows linear with the table size - no good prospect at all (the
> | hc> good thing is it is not exponential :-)).
> | I have to explain that we are in the *beginning* of production process.
> | We expect a lot more of data.
> |
> | hc> In case of real index usage the expense grows only with log(n).
> | hc> No matter about the better philosophy of database servers, MS-SQL-Server
> | hc> has a consequent index usage and so it is very fast at many queries.
> | hc> When performing a query to a field without index, I get a slow
> | hc> table scan. This is like measuring the speed of the harddisk and
> | hc> the cleverness of the cache.
> |
> | The consequence for my problem is now: If it is technically possible
> | to implement index scans without table lookups please implement it. If
> | not we just have to look for another database engine which does so,
> | because our applictaion really need the speed on this type of queries.
> | I repeat from my initial posting: The choice of the server for our
> | application could have importance for many projects in the field of
> | medicine in Germany. I really hope that there is a reasonable solution
> | which perhaps could give a balance between safety and speed. For
> | example I can assure in my application that the index, once created
> | will be valid, because I just want to read in a new set of data once
> | a day (from the MS-SQL Server which collects data over the day). So
> | I could recreate all indices after the import and the database is
> | readonly until the next cron job. Is there any chance to speed up
> | those applications?
>
> CREATE INDEX idx_meldekategorie_hauptdaten_f
> ON hauptdaten_fall(meldekategorie);
> CLUSTER idx_meldekategorie_hauptdaten_f ON hauptdaten_fall;
>
> Aggregate (cost=5006.02..5018.90 rows=258 width=16)
> -> Group (cost=5006.02..5012.46 rows=2575 width=16)
> -> Sort (cost=5006.02..5006.02 rows=2575 width=16)
> -> Seq Scan on hauptdaten_fall (cost=0.00..4860.12 rows=2575 width=16)
>
> This looks much nicer, but is still quite slow. I'm quite sure the
> slowness is in the sort(), since all queries that don't sort, return
> quickly. I hoped the clustered index would speed up the sort, but
> that is not the case.
>
> It _seems_ a simple optimization would be to not (re)sort the tuples
> when using a clustered index.
>
> if( the_column_to_order_by_is_clustered ){
> if( order_by_is_DESC )
> // reverse the tuples to handle
> }
>
> I haven't looked at the code to see if this is even feasible, but I
> do imagine there is enough info available to avoid an unnecessary
> sort on the CLUSTERED index. The only problem I see with this is
> if the CLUSTERed index is not kept in a CLUSTERed state as more
> records are added to this table.
>
> brent
>
> --
> "Develop your talent, man, and leave the world something. Records are
> really gifts from people. To think that an artist would love you enough
> to share his music with anyone is a beautiful thing." -- Duane Allman
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2001-10-30 13:57:55 Re: pgsql-committers?
Previous Message Marc G. Fournier 2001-10-30 13:29:24 Re: pgsql-committers?