BUG #7629: Suboptimal query plan when index search is possible and an additional search operator is given.

From: dmigowski(at)ikoffice(dot)de
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #7629: Suboptimal query plan when index search is possible and an additional search operator is given.
Date: 2012-10-30 11:49:21
Message-ID: E1TTAJt-0004X5-Gw@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 7629
Logged by: Daniel Migowski
Email address: dmigowski(at)ikoffice(dot)de
PostgreSQL version: 9.1.2
Operating system: Debian Linux 5.0
Description:

Hi!

PostgreSQL chooses to seq scan a table when I give an additional sort
operator, and I don't know why. This is the small example:

I have a table "delivery". There is a functional, partial index on the table
I'd like to use for querying:

CREATE INDEX idx_delivery_naturalnumber
ON delivery(prep_natural_sort(number))
WHERE recordstatus < 'X';

The field "number" is not null, btw. Then I do a query

select d.id, d.number
from Delivery d
where d.recordStatus<'X'
and (d.tenant_id=1 or d.tenant_id=2)
order by prep_natural_sort(d.number) ASC
limit 1

The where-clause matches the partial index, and the order by clause matches
the index fields, so no problem here.

"Limit (cost=0.00..0.34 rows=1 width=12)"
" Output: id, number, (prep_natural_sort((number)::text))"
" -> Index Scan using idx_delivery_naturalnumber on public.delivery d
(cost=0.00..46655.30 rows=135680 width=12)"
" Output: id, number, prep_natural_sort((number)::text)"
" Filter: ((d.tenant_id = 1) OR (d.tenant_id = 2))"

However, when I also want to order by id:

select d.id, d.number
from Delivery d
where d.recordStatus<'X'
and (d.tenant_id=1 or d.tenant_id=2)
order by prep_natural_sort(d.number) ASC,
d.id ASC
limit 1

it does a sequential scan.

"Limit (cost=43443.23..43443.24 rows=1 width=12)"
" Output: id, number, (prep_natural_sort((number)::text))"
" -> Sort (cost=43443.23..43782.43 rows=135680 width=12)"
" Output: id, number, (prep_natural_sort((number)::text))"
" Sort Key: (prep_natural_sort((d.number)::text)), d.id"
" -> Seq Scan on public.delivery d (cost=0.00..42764.83
rows=135680 width=12)"
" Output: id, number, prep_natural_sort((number)::text)"
" Filter: ((d.recordstatus < 'X'::recordstatus) AND
((d.tenant_id = 1) OR (d.tenant_id = 2)))"

This is a bit stupid, also because the relevant data could be fetched very
fast by the first order-by expression, and then the results could be ordered
again, which is then much faster than doing a full sequential scan on the
data. Do you think its worth to optimize for this case? What are the
problems? If the index scan is considered fast in the first case, why not do
it also in the second case? The only difference would be that one had to
fetch a few more rows in the index scan potentially, because the secondary
sort would need to impose an order on the first rows returned from the
index.

select VERSION();

"PostgreSQL 9.1.2 on i686-pc-linux-gnu, compiled by gcc (Debian 4.3.2-1.1)
4.3.2, 32-bit"

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message r d 2012-10-30 13:29:09 fuzzystrmatch module buggy? observations
Previous Message gtsal 2012-10-30 10:54:53 BUG #7628: Installation of PostgreSQL 9.2.1 on Windows 7 may take too long