From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Ernie <ernie(dot)cline(at)ipgdirect(dot)com> |
Cc: | "postgres" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: performance help |
Date: | 2000-07-28 18:17:22 |
Message-ID: | 581.964808242@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Ernie <ernie(dot)cline(at)ipgdirect(dot)com> writes:
> This query is very fast.
>
> cw=# SELECT distinct n.news_id, headline, link, to_char(created,
> 'mm-dd-yyyyhh24:mi'),
> cw-# created FROM news_article_summary n, news_cat nc WHERE n.news_id =
> nc.news_id AND
> cw-# created > CURRENT_TIMESTAMP-30 AND nc.code_id
> cw-#
> in(14,227,326,321,327,4301,5179,5100,585,5175,567,5185,568,5381,5109,554,5621,5
> 462,
> cw(# 597,5324,5117,569,5142,570,5327,571,5167,5481,5145) ORDER BY created desc;
>
> Unique (cost=60322.14..60559.66 rows=1900 width=48)
> -> Sort (cost=60322.14..60322.14 rows=19001 width=48)
> -> Nested Loop (cost=0.00..58651.80 rows=19001 width=48)
> -> Seq Scan on news_article_summary n (cost=0.00..416.14
> rows=1898 width=36)
> -> Index Scan using news_cat_news_id on news_cat nc
> (cost=0.00..30.53 rows=10 width=12)
>
> And here is the query I'm having problems with (slow)
> cw=# SELECT distinct n.news_id, headline, link,
> cw-# to_char(created, 'mm-dd-yyyy hh24:mi'),
> cw-# created FROM news_article_summary n, news_cat nc
> cw-# WHERE n.news_id = nc.news_id AND created > CURRENT_TIMESTAMP-30 AND
> cw-# nc.code_id in(4261,4182) ORDER BY created desc;
>
> Unique (cost=35162.58..35181.27 rows=150 width=48)
> -> Sort (cost=35162.58..35162.58 rows=1496 width=48)
> -> Nested Loop (cost=0.00..35083.71 rows=1496 width=48)
> -> Index Scan using news_cat_code_id, news_cat_code_id on
> news_cat nc (cost=0.00..55.31 rows=79 width=12)
> -> Seq Scan on news_article_summary n (cost=0.00..416.14
> rows=1898 width=36)
The difference evidently is which table is scanned as the outside of the
nested loop. The first plan says "scan news_article_summary
sequentially, and for each row that passes the WHERE clauses that
mention only that table, probe into news_cat_news_id for the row(s)
that match by news_id; then check the remaining WHERE clauses on the
combined row(s)." The second plan says "probe into news_cat_news_id
for the row(s) that have the requested code_id values, and for each
one scan news_article_summary sequentially to locate matching rows."
If there are a lot of matches for nc.code_id in(4261,4182) then the
second plan would scan news_article_summary many times, which'd account
for it being slow.
Are news_id and code_id unique columns? I am guessing from context
that at least news_id might be. The planner does not seem to know
that, judging from its row-count guesses. Perhaps all you need is
a VACUUM ANALYZE so that the planner becomes aware the column is
unique.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Erich | 2000-07-28 20:25:56 | PG vs. Oracle for larger databases |
Previous Message | Tom Lane | 2000-07-28 17:36:35 | Re: pg_dump & performance degradation |