Order by behaviour

From: Carlos Benkendorf <carlosbenkendorf(at)yahoo(dot)com(dot)br>
To: pgsql-performance(at)postgresql(dot)org
Subject: Order by behaviour
Date: 2005-12-23 12:34:39
Message-ID: 20051223123439.69718.qmail@web35507.mail.mud.yahoo.com
Lists: pgsql-performance

Hi,

We have more than 200 customers running 8.0.3, and two weeks ago we started a project to migrate them to 8.1.1. After the first migration to 8.1.1 we had to go back to 8.0.3 because some applications were not working correctly.

Our users told me that records were no longer being returned in the correct order, so I started logging and saw that the SELECT statements were being issued without an ORDER BY clause. It seemed a simple problem to solve.

I told the programmers that they should add an ORDER BY clause if they need the rows in a certain order. They told me they could not do that because it costs too much: the response time is longer than without ORDER BY. I disagreed, because there is an index with the same column order that the ORDER BY needs. Before starting a fight, we decided to run EXPLAIN ANALYZE on both kinds of SELECT and find out who was right. To my surprise, the SELECT with ORDER BY really was more expensive than the one without it. I will not bet any more... ;-)

For some implementation reason, 8.0.3 returns the rows in the correct order even without the ORDER BY, but in 8.1.1 the implementation has probably changed and the rows no longer come back in that order.

We need 8.1 for other reasons, but this ORDER BY behavior has stopped the migration project.

Some friends on the list tried to help us, and I made some configuration changes, such as increasing work_mem and changing the primary key columns from numeric types to smallint/integer/bigint, but even so the runtime and costs are far from those of the selects without the ORDER BY clause.

What I cannot understand is why the planner does not use the same retrieval method with the ORDER BY clause as it does without it. The rows are retrieved in the correct order by both methods, but one (without ORDER BY) is much cheaper than the other (with ORDER BY). Shouldn't the planner choose that one?

Can someone explain to me why the planner does not choose the method it uses for the selects without the ORDER BY clause, instead of using a sort that is much more expensive?

Without order by:
explain analyze
SELECT * FROM iparq.ARRIPT
where
(ANOCALC = 2005
and CADASTRO = 19
and CODVENCTO = 00
and PARCELA >= 00 )
or
(ANOCALC = 2005
and CADASTRO = 19
and CODVENCTO > 00 )
or
(ANOCALC = 2005
and CADASTRO > 19 )
or
(ANOCALC > 2005 );
Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on arript (cost=0.00..122255.35 rows=146602 width=897) (actual time=9.303..1609.987 rows=167710 loops=1)
  Index Cond: (((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto = 0::numeric) AND (parcela >= 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto > 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro > 19::numeric)) OR (anocalc > 2005::numeric))
Total runtime: 1712.456 ms
(3 rows)


With order by:
explain analyze
SELECT * FROM iparq.ARRIPT
where
(ANOCALC = 2005
and CADASTRO = 19
and CODVENCTO = 00
and PARCELA >= 00 )
or
(ANOCALC = 2005
and CADASTRO = 19
and CODVENCTO > 00 )
or
(ANOCALC = 2005
and CADASTRO > 19 )
or
(ANOCALC > 2005 )
order by ANOCALC asc, CADASTRO asc, CODVENCTO asc, PARCELA asc;
Sort (cost=201296.59..201663.10 rows=146602 width=897) (actual time=9752.555..10342.363 rows=167710 loops=1)
  Sort Key: anocalc, cadastro, codvencto, parcela
  -> Index Scan using pk_arript, pk_arript, pk_arript, pk_arript on arript (cost=0.00..122255.35 rows=146602 width=897) (actual time=0.402..1425.085 rows=167710 loops=1)
        Index Cond: (((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto = 0::numeric) AND (parcela >= 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro = 19::numeric) AND (codvencto > 0::numeric)) OR ((anocalc = 2005::numeric) AND (cadastro > 19::numeric)) OR (anocalc > 2005::numeric))
Total runtime: 10568.290 ms
(5 rows)
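
A possible variant to try, given only as a sketch and not run against this table: every branch of the WHERE clause already implies ANOCALC >= 2005, so that condition can be stated once on its own without changing the result. The assumption, which would have to be checked with explain analyze on both 8.0.3 and 8.1.1, is that a single range condition on the leading column of pk_arript gives the planner the chance to read the index in key order and skip the sort.

```sql
-- Sketch only: same result set as the queries above, because each OR
-- branch already requires ANOCALC >= 2005. Whether the planner really
-- avoids the Sort step with this form is an assumption to verify.
explain analyze
SELECT * FROM iparq.ARRIPT
where
ANOCALC >= 2005            -- redundant range on the leading index column
and (
(ANOCALC = 2005
and CADASTRO = 19
and CODVENCTO = 00
and PARCELA >= 00 )
or
(ANOCALC = 2005
and CADASTRO = 19
and CODVENCTO > 00 )
or
(ANOCALC = 2005
and CADASTRO > 19 )
or
(ANOCALC > 2005 )
)
order by ANOCALC asc, CADASTRO asc, CODVENCTO asc, PARCELA asc;
```

If the plan still shows a Sort node above the index scan, this rewrite has not helped and the original form is just as good.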

Table definition:
                 Table "iparq.arript"
      Column       |         Type          | Modifiers
-------------------+-----------------------+-----------
 anocalc           | numeric(4,0)          | not null
 cadastro          | numeric(8,0)          | not null
 codvencto         | numeric(2,0)          | not null
 parcela           | numeric(2,0)          | not null
 inscimob          | character varying(18) | not null
 codvencto2        | numeric(2,0)          | not null
 parcela2          | numeric(2,0)          | not null
 codpropr          | numeric(10,0)         | not null
 dtaven            | numeric(8,0)          | not null
 anocalc2          | numeric(4,0)          |
...
...
Indexes:
    "pk_arript" PRIMARY KEY, btree (anocalc, cadastro, codvencto, parcela)
    "iarchave04" UNIQUE, btree (cadastro, anocalc, codvencto, parcela)
    "iarchave02" btree (inscimob, anocalc, codvencto2, parcela2)
    "iarchave03" btree (codpropr, dtaven)
    "iarchave05" btree (anocalc, inscimob, codvencto2, parcela2)

Best regards and thank you very much in advance,

Carlos Benkendorf


