Re: Optimizer choosing smaller index instead of right one

From: Ian Ribas <ian(dot)ribas(at)digitro(dot)com(dot)br>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Optimizer choosing smaller index instead of right one
Date: 2004-04-30 17:08:08
Message-ID: 409287F8.5040209@digitro.com.br
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Effectively, I hadn't seen the cast being done on the field
dt_finalizacao! The query actually runs inside a pgplsql procedure, so
the parameters are unknown constants (variables), but the situation got
much much better when I forced a cast on the constant part of the
condition (as follows), so allowing the planner to correctly choose an
index :

explain analyze SELECT dt_finalizacao
FROM cham_servico
WHERE id_chave_grupo = '7458' AND cod_bxs = 1
AND dt_finalizacao > CAST(to_timestamp ('2004-04-20','YYYY-MM-DD') AS
TIMESTAMP(0) WITHOUT TIME ZONE)
AND dt_finalizacao < CAST(now() AS TIMESTAMP(0) WITHOUT TIME ZONE)
ORDER BY dt_finalizacao ASC LIMIT 1;

QUERY
PLAN

------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..364.98 rows=1 width=8) (actual time=52.73..52.90
rows=1 loops=1)
-> Index Scan using xie1cham_servico on cham_servico
(cost=0.00..9756.63 rows=27 width=8) (actual time=52.73..52.89 rows=2
loops=1)
Index Cond: ((dt_finalizacao >
(to_timestamp('2004-04-20'::text, 'YYYY-MM-DD'::text))::timestamp(0)
without time zone) AND (dt_finalizacao < (now())::timestamp(0) without
time zone))
Filter: ((id_chave_grupo = '7458'::character varying) AND
(cod_bxs = 1))
Total runtime: 53.11 msec
(5 rows)

Unfortunately, I really must use timestamp without time zone for some
other unrelated reasons ...

Thank you very much for your attention

Best wishes,
Ian.

Tom Lane wrote:

>Ian Ribas <ian(dot)ribas(at)digitro(dot)com(dot)br> writes:
>
>
>>The index I created to try to optimize it is "xie3cham_servico", and
>>has all the three columns of the table that are used in the query. But
>>the index being used is actually xie2cham_servico (as can be seen
>>below) which has only two columns, one of them completely useless for
>>the query! I'm probably missing some setting, but I couldn't figure
>>out which.
>>
>>
>
>The optimizer's cost equations do say that a smaller index should be
>preferred over a larger one if the number of rows obtainable from the
>index is comparable (which I trust you will agree is reasonable).
>I think the real issue here is a datatype conflict. Look closely
>at the filter conditions:
>
>
>
>> -> Index Scan using xie2cham_servico on cham_servico (cost=0.00..15663.15 rows=26 width=8) (actual time=13604.37..13980.16 rows=3173 loops=1)
>> Index Cond: (id_chave_grupo = '7458'::character varying)
>> Filter: ((cod_bxs = 1) AND ((dt_finalizacao)::timestamp with time zone > to_timestamp('2004-04-20'::text, 'YYYY-MM-DD'::text)) AND ((dt_finalizacao)::timestamp with time zone < now()))
>>
>>
>
>The two comparisons on dt_finalizacao are unindexable because they
>coerce it to a different datatype; the "<" and ">" are the operators on
>timestamp with time zone and so they are not relevant to an index on
>timestamp without time zone.
>
>Very likely your column really ought to be of type timestamp with time
>zone. If you are convinced it should be timestamp without time zone
>then you need to fix the values you are comparing to.
>
>I'm also a bit disturbed by the fact that the rows estimates are off by
>more than two orders of magnitude (26 est vs 3173 actual). It would
>perhaps help if you wrote the boundary date as a simple constant:
>
> ... AND dt_finalizacao > '2004-04-20' AND ...
>
>so that the planner had some chance of comparing the constant to its
>statistics about the distribution of dt_finalizacao. to_timestamp is
>not a constant-foldable function (because it depends on the current
>TimeZone setting) and so the query as you've written it looks like
>
> ... AND dt_finalizacao > unpredictable_expression AND ...
>
>as far as the planner knows. (I'm not sure this will help though;
>it may be that the big problem is that the three columns are not
>independent, which is something the planner doesn't know because it has
>no cross-column statistics.)
>
> regards, tom lane
>
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message CS Wagner 2004-04-30 17:17:16 Load Balancing and Backup
Previous Message Kris Jurka 2004-04-30 17:07:28 Re: "Idle in Transaction" and hung connections