Re: slow queries on large syslog table

From: Antonio Fiol Bonnín <fiol(at)w3ping(dot)com>
To: colm ennis <colm(dot)ennis(at)eircom(dot)net>
Cc: PostgreSQL General Mailing list <pgsql-general(at)postgresql(dot)org>
Subject: Re: slow queries on large syslog table
Date: 2001-12-13 18:13:34
Message-ID: 3C18EFCE.9060706@w3ping.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Are the rows estimations "real"?

colm ennis wrote:

>hi antonio,
>
>thanks for your advice.
>
>ive tried a lot of different index combinations, with extremely variable
>results,
>for instance :
> query - SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN
>(23,3)) AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
> matching messages - 19
>
>with original indexes :
> query time(s) - 225
> explain - Limit (cost=0.00..34559.46 rows=1000 width=24)
> -> Index Scan Backward using syslog_table_stimestamp_index on
>syslog_table (cost=0.00..577149.86 rows=16700 width=24)
>
>with antonios index :
> create index syslog_table_stimestamp_shostid_sciscomsgid_index on
>syslog_table (stimestamp, shostid, sciscomsgid);
> query time(s) - 174
> explain - Limit (cost=0.00..34329.14 rows=1000 width=24)
> -> Index Scan Backward using syslog_table_st_sh_sc_index on syslog_table
>(cost=0.00..580639.57 rows=16914 width=24)
>
>with NO! index :
> query time(s) - 77
> explain - Limit (cost=73979.79..73979.79 rows=1000 width=24)
> -> Sort (cost=73979.79..73979.79 rows=16905 width=24)
> -> Seq Scan on syslog_table (cost=0.00..72591.62 rows=16905
>width=24)
>
>i got similarily confusing results from other queries.
>
>it occured to me that that the index antonio suggests is going to be huge
>because
>of the per second timestamp.
>
>i thought about what you said about deciding how I would search for data.
>
>for the query above :
> SELECT stimestamp,shostid,smsg FROM syslog_table WHERE (shostid IN (23,3))
>AND (sciscomsgid IN (41,32,70)) ORDER BY stimestamp DESC LIMIT 1000
>i would lookup shostid and sciscomsgid in a combined index, and then load
>the
>indexed rows in syslog_table. there are about 80 different shostids and
>sciscomsgids, so i guess the max size of this index would be 1600 rows.
>
>for other queries i would use an shostid index or sciscomsgid if just one of
>these
>fields appeared in the select, avoiding the stimestamp at all cost because
>its index
>will be huge.
>
>so to test if this was any good i created the combined index :
> create index syslog_table_sh_sc_index on syslog_table (shostid,
>sciscomsgid);
>but using explain found it isnt being used? :
> explain - Limit (cost=74018.18..74018.18 rows=1000 width=24)
> -> Sort (cost=74018.18..74018.18 rows=16914 width=24)
> -> Seq Scan on syslog_table (cost=0.00..72629.33 rows=16914
>width=24)
>
>i still dont understand how to use indexes to increase the speed of queries.
>
>thanks for your help so far but i still feel lost,
>
>colm ennis
>
>-----Original Message-----
>From: Antonio Fiol Bonnin [mailto:fiol(at)w3ping(dot)com]
>Sent: 13 December 2001 15:29
>To: colm ennis; PostgreSQL General Mailing list
>Subject: Re: [GENERAL] slow queries on large syslog table
>
>
>>
>>
>>ive also created a web interface for selecting syslogs based on optional
>>combinations of timestamp, hostname and ciscomsg.
>>
>
>Combinations is the *magic* word.
>
>>to speed queries i created some indexes on syslog_table :
>> create index syslog_table_stimestamp_index on syslog_table (stimestamp);
>> create index syslog_table_shostid_index on syslog_table (shostid);
>> create index syslog_table_sciscomsgid_index on syslog_table (sciscomsgid);
>> create index syslog_table_shostid_sciscomsgid_index on syslog_table
>>(shostid,sciscomsgid);
>>
>Most of them are of no use. That's what you observed...
>
>I bet you will get much better perfs with:
>
>create index syslog_table_stimestamp_shostid_sciscomsg_index on syslog_table
>(stimestamt, shostid, sciscomsg);
>
>You can try other combinations, but the one I suggested should be of use
>in case you use all three on the query, (or even if you use only the
>first, or the first two, though not sure about this last part, in
>parentheses).
>
>
>Believe me. Creating an index on a large table is of no use, unless it
>is the right one. As a rule of thumb, include in the index as many of
>the SELECTIVE columns present in the WHERE clause as you can.
>
>I am not sure of the selectivity of your columns (never used that
>particular structure). There should be some information about that on
>some of the system tables. However, I do not know in which, or how to
>get that info.
>
>As a second rule of thumb, think how YOU would search for the data you
>need if it was written on a paper book, and especially, how you would
>like to find the book ordered. For example, if you were to look the
>address corresponding to a phone number, you would like to find the data
>ordered by phone number, and not by name. Name is not of any use to you.
>
>For selectivity, think of finding the phone numbers of all people that
>live at number 5, but of any street, and whose first name is Peter.
>
>Neither "Peter" nor "5" are REALLY useful informations to perform your
>search. And even, Peter is more useful than 5.
>
>HTH,
>
>Antonio
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 6: Have you searched our list archives?
>
>http://archives.postgresql.org
>
>.
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jason Earl 2001-12-13 18:25:45 Re: Money reformatting
Previous Message colm ennis 2001-12-13 17:57:55 Re: slow queries on large syslog table