Re: CTE query plan ignores selective index

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Patrick Krecker <patrick(at)judicata(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: CTE query plan ignores selective index
Date: 2014-12-02 22:07:03
Message-ID: 1727.1417558023@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Patrick Krecker <patrick(at)judicata(dot)com> writes:
> -> Nested Loop (cost=0.57..6148400.28 rows=1 width=24)
> (actual time=0.063..4.054 rows=88 loops=9)
> -> WorkTable Scan on path path_1 (cost=0.00..33.40
> rows=1670 width=16) (actual time=0.000..0.006 rows=112 loops=9)
> -> Index Scan using component_document_id on
> component t_1 (cost=0.57..3681.65 rows=1 width=16) (actual
> time=0.023..0.036 rows=1 loops=1007)
> Index Cond: (document_id = path_1.document_id)
> Filter: (path_1.parent_id = internal_id)
> Rows Removed by Filter: 237

> I would think that it has decided that the document_id index is not
> very selective for the given mix of rows, however I checked the
> statistics for the table and I found that n_distinct for document_id
> is 101559 (the true value is 162545). The value of pg_class.reltuples
> for the table is 96055600, which is very close to the true value
> 94613537.

> In the first query, it appears to me that postgres thinks the index
> scan is much more expensive than it really is. However, given the
> accurate statistics, I can't see how.

I think the problem is that it doesn't have any stats for the output of
path_1, so it's probably falling back on some rather generic assumptions
about how many component rows will match each of the two join conditions.
That causes it to think that the indexscan will reject a lot of rows at
the filter step and therefore be expensive. Possibly that could be
improved, but it won't happen overnight.

The most expeditious way to fix this would likely be to provide an
index on component(document_id, internal_id). The planner should
then think an indexscan on that is cheap, regardless of whether the
check on internal_id is really doing much of anything.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2014-12-04 19:10:54 Re: issue in postgresql 9.1.3 in using arrow key in Solaris platform
Previous Message Patrick Krecker 2014-12-02 20:59:23 CTE query plan ignores selective index