Re: performance help

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ernie <ernie(dot)cline(at)ipgdirect(dot)com>
Cc: "postgres" <pgsql-general(at)postgresql(dot)org>
Subject: Re: performance help
Date: 2000-07-28 18:17:22
Message-ID: 581.964808242@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Ernie <ernie(dot)cline(at)ipgdirect(dot)com> writes:
> This query is very fast.
>
> cw=# SELECT distinct n.news_id, headline, link, to_char(created,
> 'mm-dd-yyyyhh24:mi'),
> cw-# created FROM news_article_summary n, news_cat nc WHERE n.news_id =
> nc.news_id AND
> cw-# created > CURRENT_TIMESTAMP-30 AND nc.code_id
> cw-#
> in(14,227,326,321,327,4301,5179,5100,585,5175,567,5185,568,5381,5109,554,5621,5
> 462,
> cw(# 597,5324,5117,569,5142,570,5327,571,5167,5481,5145) ORDER BY created desc;
>
> Unique (cost=60322.14..60559.66 rows=1900 width=48)
> -> Sort (cost=60322.14..60322.14 rows=19001 width=48)
> -> Nested Loop (cost=0.00..58651.80 rows=19001 width=48)
> -> Seq Scan on news_article_summary n (cost=0.00..416.14
> rows=1898 width=36)
> -> Index Scan using news_cat_news_id on news_cat nc
> (cost=0.00..30.53 rows=10 width=12)
>
> And here is the query I'm having problems with (slow)
> cw=# SELECT distinct n.news_id, headline, link,
> cw-# to_char(created, 'mm-dd-yyyy hh24:mi'),
> cw-# created FROM news_article_summary n, news_cat nc
> cw-# WHERE n.news_id = nc.news_id AND created > CURRENT_TIMESTAMP-30 AND
> cw-# nc.code_id in(4261,4182) ORDER BY created desc;
>
> Unique (cost=35162.58..35181.27 rows=150 width=48)
> -> Sort (cost=35162.58..35162.58 rows=1496 width=48)
> -> Nested Loop (cost=0.00..35083.71 rows=1496 width=48)
> -> Index Scan using news_cat_code_id, news_cat_code_id on
> news_cat nc (cost=0.00..55.31 rows=79 width=12)
> -> Seq Scan on news_article_summary n (cost=0.00..416.14
> rows=1898 width=36)

The difference evidently is which table is scanned as the outside of the
nested loop. The first plan says "scan news_article_summary
sequentially, and for each row that passes the WHERE clauses that
mention only that table, probe into news_cat_news_id for the row(s)
that match by news_id; then check the remaining WHERE clauses on the
combined row(s)." The second plan says "probe into news_cat_news_id
for the row(s) that have the requested code_id values, and for each
one scan news_article_summary sequentially to locate matching rows."

If there are a lot of matches for nc.code_id in(4261,4182) then the
second plan would scan news_article_summary many times, which'd account
for it being slow.

Are news_id and code_id unique columns? I am guessing from context
that at least news_id might be. The planner does not seem to know
that, judging from its row-count guesses. Perhaps all you need is
a VACUUM ANALYZE so that the planner becomes aware the column is
unique.

regards, tom lane

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Erich 2000-07-28 20:25:56 PG vs. Oracle for larger databases
Previous Message Tom Lane 2000-07-28 17:36:35 Re: pg_dump & performance degradation