Re: Inner join vs where-clause subquery

From: "Jeremy Haile" <jhaile(at)fastmail(dot)fm>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Inner join vs where-clause subquery
Date: 2006-12-19 19:34:50
Message-ID: 1166556890.1882.281227661@webmail.messagingengine.com
Lists: pgsql-performance

Here's the query and explain analyze using the result of the sub-query
substituted:

QUERY
explain analyze select min(nlogid) as start_nlogid,
max(nlogid) as end_nlogid,
min(dtCreateDate) as start_transaction_timestamp,
max(dtCreateDate) as end_transaction_timestamp
from activity_log_facts
where nlogid > 478287801
and dtCreateDate < '2006-12-18 9:10'

EXPLAIN ANALYZE
Aggregate  (cost=657.37..657.38 rows=1 width=12) (actual time=0.018..0.019 rows=1 loops=1)
  ->  Index Scan using activity_log_facts_nlogid_idx on activity_log_facts  (cost=0.00..652.64 rows=472 width=12) (actual time=0.014..0.014 rows=0 loops=1)
        Index Cond: (nlogid > 478287801)
        Filter: (dtcreatedate < '2006-12-18 09:10:00'::timestamp without time zone)
Total runtime: 0.076 ms

Sorry if the reason should be obvious, but I'm not the best at
interpreting EXPLAIN output. Why is this plan so much simpler than the
plan for the original query (the one with the subquery)?
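For anyone comparing the two side by side, the difference comes down to what the planner knows at plan time. With a literal, it can use the column statistics on nlogid directly; with a subquery, the comparison value is unknown when the plan is chosen, so it falls back to a generic selectivity guess (roughly one third of the table for an inequality), which explains the rows=1539298 estimates in the earlier plan. A hedged sketch of the comparison, assuming a psql session against the same database:

```sql
-- Literal bound: the planner consults the nlogid histogram and
-- estimates only a handful of matching rows (472 in the plan above).
EXPLAIN ANALYZE
SELECT min(nlogid), max(nlogid)
FROM activity_log_facts
WHERE nlogid > 478287801
  AND dtCreateDate < '2006-12-18 9:10';

-- Subquery bound: the value is not known at plan time, so the planner
-- applies a default inequality selectivity and vastly overestimates
-- the matching rows, which can steer it toward a much worse plan.
EXPLAIN ANALYZE
SELECT min(nlogid), max(nlogid)
FROM activity_log_facts
WHERE nlogid > (SELECT max(a.end_nlogid)
                FROM activity_log_import_history a)
  AND dtCreateDate < '2006-12-18 9:10';
```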

On Tue, 19 Dec 2006 18:23:06 +0000, "Richard Huxton" <dev(at)archonet(dot)com>
said:
> Jeremy Haile wrote:
> > Here is the explain analyze output:
>
> Well, the row estimates are about as far out as you can get:
>
> > -> Index Scan using activity_log_facts_pkey on
> > activity_log_facts (cost=0.00..1831613.82 rows=1539298
> > width=12) (actual time=0.050..0.050 rows=0 loops=1)
>
> > -> Index Scan Backward using activity_log_facts_pkey on
> > activity_log_facts (cost=0.00..1831613.82 rows=1539298
> > width=12) (actual time=0.004..0.004 rows=0 loops=1)
>
> > -> Index Scan using activity_log_facts_dtcreatedate_idx on
> > activity_log_facts (cost=0.00..5406927.50 rows=1539298
> > width=12) (actual time=100221.953..100221.953 rows=0 loops=1)
>
> > -> Index Scan Backward using
> > activity_log_facts_dtcreatedate_idx on activity_log_facts
> > (cost=0.00..5406927.50 rows=1539298 width=12) (actual
> > time=56367.364..56367.364 rows=0 loops=1)
>
> Hmm - it's using the indexes on dtCreateDate and nlogid which seems
> broadly sensible, and then plans to limit the results for min()/max().
> However, it's clearly wrong about how many rows will satisfy
> nlogid > (select max(a.end_nlogid) from activity_log_import_history a)
>
> >>> select min(nlogid) as start_nlogid,
> >>> max(nlogid) as end_nlogid,
> >>> min(dtCreateDate) as start_transaction_timestamp,
> >>> max(dtCreateDate) as end_transaction_timestamp
> >>> from activity_log_facts
> >>> where nlogid > ( select max(a.end_nlogid) from
> >>> activity_log_import_history a)
> >>> and dtCreateDate < '2006-12-18 9:10'
>
> If you run explain on the other forms of your query, I'd guess it's much
> more accurate. There's a simple way to see if that is the issue. Run the
> sub-query and substitute the actual value returned into the query above.
> Then, try the same but with a prepared query. If it's down to nlogid
> estimates then the first should be fast and the second slow.
>
> --
> Richard Huxton
> Archonet Ltd
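The second half of Richard's test (the prepared query) could be sketched like this; the statement name is made up for illustration. The point is that a prepared statement is planned before the parameter values are supplied, so it should get the same generic row estimate as the subquery form and, if the estimate is the problem, run just as slowly:

```sql
-- Hypothetical prepared-statement form of the same query. Planned
-- once with $1/$2 unknown, so the planner cannot use the nlogid
-- statistics and must fall back to its default selectivity guess.
PREPARE recent_activity(bigint, timestamp) AS
  SELECT min(nlogid)       AS start_nlogid,
         max(nlogid)       AS end_nlogid,
         min(dtCreateDate) AS start_transaction_timestamp,
         max(dtCreateDate) AS end_transaction_timestamp
  FROM activity_log_facts
  WHERE nlogid > $1
    AND dtCreateDate < $2;

EXPLAIN ANALYZE EXECUTE recent_activity(478287801, '2006-12-18 9:10');
```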
