Re: Performance issues when the number of records are around 10 Million

From: venu madhav <venutaurus539(at)gmail(dot)com>
To: Jorge Montero <jorge_montero(at)homedecorators(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance issues when the number of records are around 10 Million
Date: 2010-05-12 05:45:53
Message-ID: AANLkTiniCCNp1s6ZHXvQ90zR3ocQSZcLm0JPP8KTArs2@mail.gmail.com
Lists: pgsql-performance

On Wed, May 12, 2010 at 3:17 AM, Jorge Montero <jorge_montero(at)homedecorators(dot)com> wrote:

> First, are you sure you are getting autovacuum to run hourly? Autovacuum
> will only vacuum when certain configuration thresholds are reached. You can
> set it to only check for those thresholds every so often, but no vacuuming
> or analyzing will be done unless they are hit, regardless of how often
> autovacuum checks the tables. Whenever you are dealing with time series, the
> default thresholds are often insufficient, particularly when you are mostly
> interested in the last few records of a large table.
>
[Venu] Yes, autovacuum is running every hour; I can see it in the log
messages. All of the autovacuum configuration parameters are disabled except
the ones that make it run every hour. This application runs on an embedded
box, so I can't change the parameters, as they affect the other applications
running on it. Can you please explain what you mean by the default
parameters?

> What are your autovacuum configuration parameters?
>
[Venu] Except for these, all the others are disabled.

#---------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#---------------------------------------------------------------------------

autovacuum = on                  # enable autovacuum subprocess?
autovacuum_naptime = 3600        # time between autovacuum runs, in secs
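
For context, autovacuum only vacuums a table once its dead tuples exceed
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples,
so with the 8.1 defaults (a base of 1000 tuples plus 40% of the table, if I
recall them correctly) millions of rows on a 10M-row table must be dead
before anything happens. A minimal sketch of a per-table override on 8.1,
which uses the pg_autovacuum system catalog; the numbers are illustrative
assumptions, not recommendations:

INSERT INTO pg_autovacuum
       (vacrelid, enabled,
        vac_base_thresh, vac_scale_factor,  -- vacuum after base + factor*reltuples dead rows
        anl_base_thresh, anl_scale_factor,  -- analyze threshold, same formula
        vac_cost_delay, vac_cost_limit)     -- -1 keeps the global cost settings
VALUES ('public.event'::regclass, true,
        1000, 0.05,
        500, 0.02,
        -1, -1);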

> When were the two tables last autovacuumed and analyzed, according to
> pg_stat_user_tables?
>
[Venu] This is the content of pg_stat_user_tables for the two tables I am
using in that query.

 relid | schemaname |  relname  | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
-------+------------+-----------+----------+--------------+----------+---------------+-----------+-----------+-----------
 41188 | public     | event     |      117 |   1201705723 |      998 |          2824 |        28 |         0 |         0
 41209 | public     | signature |      153 |         5365 |        2 |            72 |         1 |         0 |         0
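
As a quick cross-check (a hedged aside; as far as I know 8.1's
pg_stat_user_tables has no last_vacuum/last_autovacuum columns, which only
arrived in 8.2), pg_class shows what the planner currently believes about
the two tables. reltuples and relpages are refreshed only by VACUUM and
ANALYZE, so badly stale values here would suggest autovacuum is not
actually touching them:

SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname IN ('event', 'signature');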

> Could you post the output of explain analyze of your query?
>
snort=# EXPLAIN ANALYZE select e.cid, timestamp, s.sig_class, s.sig_priority,
            s.sig_name, e.sniff_ip, e.sniff_channel, s.sig_config,
            e.wifi_addr_1, e.wifi_addr_2, e.view_status, bssid
        FROM event e, signature s
        WHERE s.sig_id = e.signature
          AND e.timestamp >= '1270449180' AND e.timestamp < '1273473180'
        ORDER BY e.cid DESC, e.cid DESC
        limit 21 offset 10539780;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=7885743.98..7885743.98 rows=1 width=287) (actual time=1462193.060..1462193.083 rows=14 loops=1)
   ->  Sort  (cost=7859399.66..7885743.98 rows=10537727 width=287) (actual time=1349648.207..1456496.334 rows=10539794 loops=1)
         Sort Key: e.cid
         ->  Hash Join  (cost=2.44..645448.31 rows=10537727 width=287) (actual time=0.182..139745.001 rows=10539794 loops=1)
               Hash Cond: ("outer".signature = "inner".sig_id)
               ->  Seq Scan on event e  (cost=0.00..487379.97 rows=10537727 width=104) (actual time=0.012..121595.257 rows=10539794 loops=1)
                     Filter: (("timestamp" >= 1270449180::bigint) AND ("timestamp" < 1273473180::bigint))
               ->  Hash  (cost=2.35..2.35 rows=35 width=191) (actual time=0.097..0.097 rows=36 loops=1)
                     ->  Seq Scan on signature s  (cost=0.00..2.35 rows=35 width=191) (actual time=0.005..0.045 rows=36 loops=1)
 Total runtime: 1463829.145 ms
(10 rows)
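
The plan shows where the time goes: the sort materializes all ~10.5M
matching rows only so the limit can skip past an OFFSET of 10,539,780.
Since the goal is the newest twenty records, a sketch of an alternative,
assuming cid increases with insertion order and event(cid) is not already
indexed, would be to let an index satisfy the ORDER BY and drop the OFFSET
entirely:

CREATE INDEX event_cid_idx ON event (cid);  -- hypothetical name; skip if one already exists

SELECT e.cid, e.timestamp, s.sig_class, s.sig_priority, s.sig_name,
       e.sniff_ip, e.sniff_channel, s.sig_config,
       e.wifi_addr_1, e.wifi_addr_2, e.view_status, e.bssid
FROM event e
JOIN signature s ON s.sig_id = e.signature
WHERE e.timestamp >= 1270449180
  AND e.timestamp < 1273473180
ORDER BY e.cid DESC
LIMIT 21;  -- fetch the newest page directly; no OFFSET left to wade through

With the index in place, the planner can walk event_cid_idx backwards and
stop after 21 rows instead of sorting the whole range.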

> Which default statistics collection parameters do you use? Have you changed
> them specifically for the tables you are using?
>
[Venu] These are the statistics collection parameters:

# - Query/Index Statistics Collector -

stats_start_collector = on
stats_command_string = on
#stats_block_level = off
stats_row_level = on
#stats_reset_on_server_start = off

Please let me know if you are referring to something else.
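
It is possible the question was about the planner statistics gathered by
ANALYZE rather than the runtime collector above. A hedged sketch of raising
the per-column sample for the filtered column (default_statistics_target
defaults to 10 on 8.1), in case that is what was meant:

ALTER TABLE event ALTER COLUMN "timestamp" SET STATISTICS 100;  -- larger histogram for the range filter
ANALYZE event;  -- rebuild planner statistics with the new target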

> Which version of Postgres are you running? Which OS?
>
[Venu] PostgreSQL version 8.1, and the operating system is CentOS 5.1.

Thank you,
Venu

>
>
>
> >>> venu madhav <venutaurus539(at)gmail(dot)com> 05/11/10 3:47 AM >>>
>
> Hi all,
> In my database application, I have a table whose record count can reach 10M,
> and insertions can happen at a fast rate, around 100 insertions per second at
> peak times. I configured postgres to do autovacuum on an hourly basis. I have
> a frontend GUI application in CGI which displays the data from the database.
> When I try to get the last twenty records from the database, it takes around
> 10-15 mins to complete the operation. This is the query which is used:
>
> select e.cid, timestamp, s.sig_class, s.sig_priority, s.sig_name,
> e.sniff_ip, e.sniff_channel, s.sig_config, e.wifi_addr_1,
> e.wifi_addr_2, e.view_status, bssid FROM event e, signature s WHERE
> s.sig_id = e.signature AND e.timestamp >= '1270449180' AND e.timestamp <
> '1273473180' ORDER BY e.cid DESC, e.cid DESC limit 21 offset 10539780;
>
> Can anyone suggest a better solution to improve the performance?
>
> Please let me know if you have any further queries.
>
>
> Thank you,
> Venu
>
