Re: Performance issues when the number of records are around 10 Million

From: venu madhav <venutaurus539(at)gmail(dot)com>
To: Shrirang Chitnis <Shrirang(dot)Chitnis(at)hovservices(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Performance issues when the number of records are around 10 Million
Date: 2010-05-12 06:42:33
Message-ID: AANLkTimH6EzAourWQThioZWMcGN3dA123kumX23QDNe-@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, May 12, 2010 at 3:22 AM, Shrirang Chitnis <
Shrirang(dot)Chitnis(at)hovservices(dot)com> wrote:

> Venu,
>
> For starters,
>
> 1) You have used the e.cid twice in ORDER BY clause.
>
[Venu] Actually the second cid acts as a secondary sort order if any other
column in the table is used for sorting. In the query since the primary
sorting key was also cid, we are seeing it twice. I can remove it.

> 2) If you want last twenty records in the table matching the criteria of
> timestamp, why do you need the offset?
>
[Venu] It is part of an UI application where a user can ask for date
between any dates. It has the options to browse through the data retrieved
between those intervals.

> 3) Do you have indexes on sig_id, signature and timestamp fields?
>
[Venu] Yes, I do have indexes on those three.

> If you do not get a good response after that, please post the EXPLAIN
> ANALYZE for the query.
>
snort=# EXPLAIN ANALYZE select e.cid, timestamp, s.sig_class,
s.sig_priority, s.sig_name, e.sniff_ip, e.sniff_channel, s.sig_config,
e.wifi_addr_1, e.wifi_addr_2, e.view_status, bssid FROM event e, signature
s WHERE s.sig_id = e.signature AND e.timestamp >= '1270449180' AND
e.timestamp < '1273473180' ORDER BY e.cid DESC, e.cid DESC limit 21 offset
10539780;
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=7885743.98..7885743.98 rows=1 width=287) (actual
time=1462193.060..1462193.083 rows=14 loops=1)
-> Sort (cost=7859399.66..7885743.98 rows=10537727 width=287) (actual
time=1349648.207..1456496.334 rows=10539794 loops=1)
Sort Key: e.cid
-> Hash Join (cost=2.44..645448.31 rows=10537727 width=287)
(actual time=0.182..139745.001 rows=10539794 loops=1)
Hash Cond: ("outer".signature = "inner".sig_id)
-> Seq Scan on event e (cost=0.00..487379.97 rows=10537727
width=104) (actual time=0.012..121595.257 rows=10539794 loops=1)
Filter: (("timestamp" >= 1270449180::bigint) AND
("timestamp" < 1273473180::bigint))
-> Hash (cost=2.35..2.35 rows=35 width=191) (actual
time=0.097..0.097 rows=36 loops=1)
-> Seq Scan on signature s (cost=0.00..2.35 rows=35
width=191) (actual time=0.005..0.045 rows=36 loops=1)
*Total runtime: 1463829.145 ms*
(10 rows)
Thank you,
Venu Madhav.

>
> Thanks,
>
> Shrirang Chitnis
> Sr. Manager, Applications Development
> HOV Services
> Office: (866) 808-0935 Ext: 39210
> shrirang(dot)chitnis(at)hovservices(dot)com
> www.hovservices.com
>
>
> The information contained in this message, including any attachments, is
> attorney privileged and/or confidential information intended only for the
> use of the individual or entity named as addressee. The review,
> dissemination, distribution or copying of this communication by or to anyone
> other than the intended addressee is strictly prohibited. If you have
> received this communication in error, please immediately notify the sender
> by replying to the message and destroy all copies of the original message.
>
> From: pgsql-performance-owner(at)postgresql(dot)org [mailto:
> pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of venu madhav
> Sent: Tuesday, May 11, 2010 2:18 PM
> To: pgsql-performance(at)postgresql(dot)org
> Subject: [PERFORM] Performance issues when the number of records are around
> 10 Million
>
> Hi all,
> In my database application, I've a table whose records can reach 10M
> and insertions can happen at a faster rate like 100 insertions per second in
> the peak times. I configured postgres to do auto vacuum on hourly basis. I
> have frontend GUI application in CGI which displays the data from the
> database. When I try to get the last twenty records from the database, it
> takes around 10-15 mins to complete the operation.This is the query which
> is used:
>
> select e.cid, timestamp, s.sig_class, s.sig_priority, s.sig_name,
> e.sniff_ip, e.sniff_channel, s.sig_config, e.wifi_addr_1,
> e.wifi_addr_2, e.view_status, bssid FROM event e, signature s WHERE
> s.sig_id = e.signature AND e.timestamp >= '1270449180' AND e.timestamp <
> '1273473180' ORDER BY e.cid DESC, e.cid DESC limit 21 offset 10539780;
>
> Can any one suggest me a better solution to improve the performance.
>
> Please let me know if you've any further queries.
>
>
> Thank you,
> Venu
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message venu madhav 2010-05-12 06:59:11 Re: Performance issues when the number of records are around 10 Million
Previous Message venu madhav 2010-05-12 05:45:53 Re: Performance issues when the number of records are around 10 Million