Re: Query performance discontinuity

From: Mike Nielsen <miken(at)bigpond(dot)net(dot)au>
To: Postgresql performance <pgsql-performance(at)postgresql(dot)org>
Cc: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: Query performance discontinuity
Date: 2002-11-12 04:10:31
Message-ID: 1037074231.29703.234.camel@CPE-144-132-182-167
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Stephan, Tom & Josh:

Here's the result I get from changing the <> to a > in the tstart
condition (no improvement):

pganalysis=> explain analyze select * from ps2 where tstart> '2000-1-1
pganalysis'> 00:00:00' and time_stamp > '2000-1-1 00:00:00' order by
pganalysis-> tstart,time_stamp limit 59628;
NOTICE: QUERY PLAN:

Limit (cost=160313.27..160313.27 rows=59628 width=179) (actual
time=42269.87..42709.82 rows=59628 loops=1)
-> Sort (cost=160313.27..160313.27 rows=327895 width=179) (actual
time=42269.86..42643.74 rows=59629 loops=1)
-> Seq Scan on ps2 (cost=0.00..13783.40 rows=327895 width=179)
(actual time=0.15..15211.49 rows=327960 loops=1)
Total runtime: 43232.53 msec

EXPLAIN

Setting enable_seqscan=off produced a good result:

pganalysis=> explain analyze select * from ps2 where tstart> '2000-1-1
pganalysis'> 00:00:00' and time_stamp > '2000-1-1 00:00:00' order by
pganalysis-> tstart,time_stamp limit 59628;
NOTICE: QUERY PLAN:

Limit (cost=0.00..160322.87 rows=59628 width=179) (actual
time=40.39..2222.06 rows=59628 loops=1)
-> Index Scan using ps2_idx on ps2 (cost=0.00..881616.45 rows=327895
width=179) (actual time=40.38..2151.38 rows=59629 loops=1)
Total runtime: 2262.23 msec

EXPLAIN

The ps2 table is in time_stamp order, but the tstarts aren't quite as
good -- they're mostly there, but they're computed by subtracting a
(stochastic) value from time_stamp.

I haven't tinkered with sort_mem yet, but will once I've got this little
project wrapped up (1 or 2 days to go!).

This, by the way, is pg log data that I've parsed up so I can do some
performance number-crunching for a client of mine. Is there a better
way to get comprehensive, per-query read, write and cache hit data? As
you can imagine, with millions of records, my client-side perl script
for the parsing is slow. I've tried my hand at writing an aggregate
function on the server side using lex and yacc, but decided that, at
least for this project, I'd rather let the machine do the head-banging
-- I can tokenize the raw syslog data (loaded into another pg table)
into an intermediate result in an aggregate function, and my parser
works on the token strings, but the perl script finished before I could
go any further...

In the off chance, however, that I get invited to more of this kind of
work, it would be really handy to be able to get the data without all
this hassle! Any clues would be gratefully received.

Regards,

Mike

On Tue, 2002-11-12 at 04:44, Mike Nielsen wrote:
pgsql-performers,

Just out of curiosity, anybody with any ideas on what happens to this
query when the limit is 59626? It's as though 59626 = infinity?

pganalysis=> explain analyze select * from ps2 where tstart<> '2000-1-1
00:00:00' and time_stamp > '2000-1-1 00:00:00' order by
tstart,time_stamp limit 59624;
NOTICE: QUERY PLAN:

Limit (cost=0.00..160328.37 rows=59624 width=179) (actual
time=14.52..2225.16 rows=59624 loops=1)
-> Index Scan using ps2_idx on ps2 (cost=0.00..881812.85 rows=327935
width=179) (actual time=14.51..2154.59 rows=59625 loops=1)
Total runtime: 2265.93 msec

EXPLAIN
pganalysis=> explain analyze select * from ps2 where tstart<> '2000-1-1
00:00:00' and time_stamp > '2000-1-1 00:00:00' order by
tstart,time_stamp limit 59625;
NOTICE: QUERY PLAN:

Limit (cost=0.00..160331.06 rows=59625 width=179) (actual
time=0.45..2212.19 rows=59625 loops=1)
-> Index Scan using ps2_idx on ps2 (cost=0.00..881812.85 rows=327935
width=179) (actual time=0.45..2140.87 rows=59626 loops=1)
Total runtime: 2254.50 msec

EXPLAIN
pganalysis=> explain analyze select * from ps2 where tstart<> '2000-1-1
00:00:00' and time_stamp > '2000-1-1 00:00:00' order by
tstart,time_stamp limit 59626;
NOTICE: QUERY PLAN:

Limit (cost=160332.32..160332.32 rows=59626 width=179) (actual
time=37359.41..37535.85 rows=59626 loops=1)
-> Sort (cost=160332.32..160332.32 rows=327935 width=179) (actual
time=37359.40..37471.07 rows=59627 loops=1)
-> Seq Scan on ps2 (cost=0.00..13783.52 rows=327935 width=179)
(actual time=0.26..12433.00 rows=327960 loops=1)
Total runtime: 38477.39 msec

EXPLAIN
pganalysis=> explain analyze select * from ps2 where tstart<> '2000-1-1
00:00:00' and time_stamp > '2000-1-1 00:00:00' order by
tstart,time_stamp limit 59627;
NOTICE: QUERY PLAN:

Limit (cost=160332.32..160332.32 rows=59627 width=179) (actual
time=38084.85..38260.88 rows=59627 loops=1)
-> Sort (cost=160332.32..160332.32 rows=327935 width=179) (actual
time=38084.83..38194.63 rows=59628 loops=1)
-> Seq Scan on ps2 (cost=0.00..13783.52 rows=327935 width=179)
(actual time=0.15..12174.74 rows=327960 loops=1)
Total runtime: 38611.83 msec

EXPLAIN

pganalysis=> \d ps2
Table "ps2"
Column | Type | Modifiers
-------------+--------------------------+-----------
host | character varying(255) |
pid | integer |
line | integer |
time_stamp | timestamp with time zone |
seq | integer |
cpu_sys | real |
cpu_elapsed | real |
cpu_user | real |
cpu_syst | real |
cpu_usert | real |
mssp | integer |
sigp | integer |
msrt | integer |
msst | integer |
sigt | integer |
msrp | integer |
swap | integer |
swat | integer |
recp | integer |
rect | integer |
pgfp | integer |
pgft | integer |
icsp | integer |
vcst | integer |
icst | integer |
vcsp | integer |
fsbop | integer |
fsbos | integer |
fsbip | integer |
fsbis | integer |
dread | integer |
dwrit | integer |
sbhr | real |
sread | integer |
swrit | integer |
lbhr | real |
lread | integer |
lwrit | integer |
dbuser | character(8) |
tstart | timestamp with time zone |
Indexes: ps2_idx

pganalysis=> \d ps2_idx
Index "ps2_idx"
Column | Type
------------+--------------------------
tstart | timestamp with time zone
time_stamp | timestamp with time zone
btree

pganalysis=>

psql (PostgreSQL) 7.2
contains support for: readline, history, multibyte


Platform: Celeron 1.3GHz, 512MB 40GB IDE hard disk, Linux 2.4.8-26mdk
kernel

Regards,

Mike



---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Rod Taylor 2002-11-12 04:51:19 Re: Query performance discontinuity
Previous Message Tom Lane 2002-11-12 01:39:01 Re: Query performance discontinuity