From: | Edmund Dengler <edmundd(at)eSentire(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Performance of query (fwd) |
Date: | 2003-06-11 15:48:46 |
Message-ID: | Pine.BSO.4.44.0306111125510.15310-100000@cyclops4.esentire.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Ok, tried it with a constant. No improvements. The estimator is still very
high.
Regards,
Ed
explain analyze
select
event.sid, event.cid, event.timestamp, signature.sig_name, iphdr.ip_src, iphdr.ip_dst,
icmphdr.icmp_type, icmphdr.icmp_code, icmphdr.icmp_csum, icmphdr.icmp_id, icmphdr.icmp_seq,
udphdr.udp_sport, udphdr.udp_dport, udphdr.udp_len, udphdr.udp_csum,
tcphdr.tcp_sport, tcphdr.tcp_dport, tcphdr.tcp_seq, tcphdr.tcp_ack, tcphdr.tcp_off,
tcphdr.tcp_res, tcphdr.tcp_flags, tcphdr.tcp_win, tcphdr.tcp_csum, tcphdr.tcp_urp,
sensor.hostname, sensor.interface, data.data_payload
from (
select * from event
where timestamp > '2003-06-11 9:22 EDT'::timestamp
and exists (select 1 from hack_pull_sid where sid = event.sid)
) as event
left join signature on signature.sig_id = event.signature
left join iphdr on iphdr.sid = event.sid and iphdr.cid = event.cid
left join icmphdr on icmphdr.sid = event.sid and icmphdr.cid = event.cid
left join udphdr on udphdr.sid = event.sid and udphdr.cid = event.cid
left join tcphdr on tcphdr.sid = event.sid and tcphdr.cid = event.cid
left join sensor on sensor.sid = event.sid
left join data on data.sid = event.sid and data.cid = event.cid
order by
timestamp desc, event.sid desc, event.cid desc
;
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1710629.85..1711222.69 rows=237136 width=853) (actual time=308403.45..308403.61 rows=139 loops=1)
Sort Key: public.event."timestamp", public.event.sid, public.event.cid
-> Merge Join (cost=1470394.12..1550828.17 rows=237136 width=853) (actual time=299837.82..308402.64 rows=139 loops=1)
Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
-> Merge Join (cost=727702.20..801483.40 rows=237136 width=258) (actual time=45098.40..50164.32 rows=139loops=1)
Merge Cond: ("outer".sid = "inner".sid)
-> Merge Join (cost=727700.17..797923.97 rows=237136 width=226) (actual time=45077.16..50140.32 rows=139 loops=1)
Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
-> Merge Join (cost=727700.17..735968.33 rows=237136 width=172) (actual time=25446.85..26861.14 rows=139 loops=1)
Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
-> Merge Join (cost=727700.17..730447.45 rows=237136 width=144) (actual time=23709.07..24201.98 rows=139 loops=1)
Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
-> Sort (cost=700380.22..700973.06 rows=237136 width=116) (actual time=17821.61..17821.81 rows=139 loops=1)
Sort Key: public.event.sid, public.event.cid
-> Merge Join (cost=590252.70..671029.48 rows=237136 width=116) (actual time=11569.28..17821.25 rows=139 loops=1)
Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid))
-> Index Scan using iphdr_pkey on iphdr (cost=0.00..73422.89 rows=1432042 width=28) (actual time=18.79..16195.02 rows=730166 loops=1)
-> Sort (cost=590252.70..590845.54 rows=237136 width=88) (actual time=52.35..52.55 rows=139 loops=1)
Sort Key: public.event.sid, public.event.cid
-> Hash Join (cost=185.07..562558.03 rows=237136 width=88) (actual time=34.28..51.89 rows=139 loops=1)
Hash Cond: ("outer".signature = "inner".sig_id)
-> Index Scan using timestamp_idx on event (cost=0.00..557630.23 rows=237136 width=24) (actual time=0.22..16.58 rows=139 loops=1)
Index Cond: ("timestamp" > ('2003-06-11 09:22:00'::timestamp without time zone)::timestamp with time zone)
Filter: (subplan)
SubPlan
-> Seq Scan on hack_pull_sid (cost=0.00..1.15 rows=1 width=0) (actual time=0.01..0.01 rows=0 loops=953)
Filter: (sid = $0)
-> Hash (cost=167.26..167.26 rows=7126 width=64) (actualtime=33.91..33.91 rows=0 loops=1)
-> Seq Scan on signature (cost=0.00..167.26 rows=7126 width=64) (actual time=0.04..17.74 rows=7131 loops=1)
-> Sort (cost=27319.94..27897.87 rows=231171 width=28) (actual time=5671.28..6088.28 rows=130551 loops=1)
Sort Key: icmphdr.sid, icmphdr.cid
-> Seq Scan on icmphdr (cost=0.00..3784.71 rows=231171 width=28) (actual time=3.87..1278.01 rows=232877 loops=1)
-> Index Scan using udphdr_pkey on udphdr (cost=0.00..4411.44 rows=115228 width=28) (actual time=79.77..2532.43 rows=56946 loops=1)
-> Index Scan using tcphdr_pkey on tcphdr (cost=0.00..56586.06 rows=1139141 width=54) (actual time=89.40..22157.07 rows=504556 loops=1)
-> Sort (cost=2.04..2.11 rows=30 width=32) (actual time=21.18..21.40 rows=158 loops=1)
Sort Key: sensor.sid
-> Seq Scan on sensor (cost=0.00..1.30 rows=30 width=32) (actual time=20.98..21.05 rows=30 loops=1)
-> Sort (cost=742691.92..745059.80 rows=947154 width=595) (actual time=246938.33..257244.21 rows=444980 loops=1)
Sort Key: data.sid, data.cid
-> Seq Scan on data (cost=0.00..51821.54 rows=947154 width=595) (actual time=9.68..198574.76 rows=955002 loops=1)
Total runtime: 308903.04 msec
(41 rows)
On Tue, 10 Jun 2003, Tom Lane wrote:
> Edmund Dengler <edmundd(at)eSentire(dot)com> writes:
> > select * from event
> > where timestamp > (select now() - '2 hours'::interval)
> > and exists (select 1 from hack_pull_sid where sid = event.sid)
>
> > (note: <hack_pull_sid> is a table of SIDs I am interested in so that I
> > avoid the issues with IN)
>
> I think you're creating more issues than you're avoiding. With the
> above query, the planner has little chance of guessing how many rows
> will be retrieved from "event" ... and indeed the EXPLAIN output shows
> that its guess is off by more than a factor of 1000:
>
> > -> Index Scan using timestamp_idx on event (cost=0.00..558165.62 rows=237893 width=24) (actual time=0.18..3.05 rows=129 loops=1)
>
> With a misestimate of that magnitude at the core of the query, it's
> unsurprising that all the other plan choices are bad too.
>
> But actually I suspect the easiest point of attack is not the EXISTS
> subquery, but the timestamp comparison. Can you get your application
> to supply a simple literal constant to compare to the timestamp, viz
> '2003-06-10 21:44' rather than now()-'2 hours'? The former gives the
> planner something to compare to its statistics, the latter doesn't.
>
> Oh ... you have done an ANALYZE on event reasonably recently, no?
>
> regards, tom lane
>
From | Date | Subject | |
---|---|---|---|
Next Message | listrec | 2003-06-11 15:50:58 | Re: Index not being used in MAX function (7.2.3) |
Previous Message | Chris Gamache | 2003-06-11 15:44:21 | Re: Index not being used in MAX function (7.2.3) |