Re: BUG #16251: ::text type casting of a constant breaks query performance

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: imyfess(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16251: ::text type casting of a constant breaks query performance
Date: 2020-02-08 07:28:07
Message-ID: CAFj8pRC_a9xy1x3WNsqpEOy9+Lz0HjKjhqNE1iVn0y=Wg9npBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

so 8. 2. 2020 v 8:09 odesílatel Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
napsal:

>
>
> so 8. 2. 2020 v 7:44 odesílatel PG Bug reporting form <
> noreply(at)postgresql(dot)org> napsal:
>
>> The following bug has been logged on the website:
>>
>> Bug reference: 16251
>> Logged by: Dima Pavlov
>> Email address: imyfess(at)gmail(dot)com
>> PostgreSQL version: 12.1
>> Operating system: Windows 10
>> Description:
>>
>> Test table and indexes (PostgreSQL 12.1):
>>
>> CREATE TABLE t (dt timestamp with time zone);
>> CREATE INDEX ind ON t USING btree (dt);
>>
>> INSERT
>> INTO t(dt)
>> SELECT
>> (
>> timestamp '2020-01-01 00:00:00' +
>> random() * (
>> timestamp '2020-02-29 00:00:00' -
>> timestamp '2020-01-01 00:00:00'
>> )
>> )
>> FROM generate_series(1, 10000)
>>
>> -------------------------------------
>>
>> In the first query, everything is ok, appropriate index "ind" is used:
>>
>> explain (analyze, buffers)
>> SELECT *
>> FROM t
>> WHERE
>> ('2020-02-08')::date IS NULL
>> OR
>> dt > '2020-02-08'
>> ORDER BY dt
>> LIMIT 1
>>
>> "Limit (cost=0.29..0.37 rows=1 width=8) (actual time=0.186..0.188 rows=1
>> loops=1)"
>> " Buffers: shared hit=3"
>> " -> Index Only Scan using ind on t (cost=0.29..303.75 rows=3627
>> width=8)
>> (actual time=0.184..0.184 rows=1 loops=1)"
>> " Index Cond: (dt > '2020-02-08 00:00:00+05'::timestamp with time
>> zone)"
>> " Heap Fetches: 1"
>> " Buffers: shared hit=3"
>> "Planning Time: 2.365 ms"
>> "Execution Time: 0.239 ms"
>>
>> -----------------------------------------------
>>
>> With '::text' type casting of '2020-02-08' (which is already text) query
>> permofance is very low
>>
>> explain (analyze, buffers)
>> SELECT *
>> FROM t
>> WHERE
>> ('2020-02-08'::text)::date IS NULL
>> OR
>> dt > '2020-02-08'
>> ORDER BY dt
>> LIMIT 1
>>
>> "Limit (cost=0.29..0.44 rows=1 width=8) (actual time=45.306..45.307
>> rows=1
>> loops=1)"
>> " Buffers: shared hit=6232"
>> " -> Index Only Scan using ind on t (cost=0.29..561.28 rows=3658
>> width=8)
>> (actual time=45.304..45.304 rows=1 loops=1)"
>> " Filter: ((('2020-02-08'::cstring)::date IS NULL) OR (dt >
>> '2020-02-08 00:00:00+05'::timestamp with time zone))"
>> " Rows Removed by Filter: 6367"
>> " Heap Fetches: 6368"
>> " Buffers: shared hit=6232"
>> "Planning Time: 0.348 ms"
>> "Execution Time: 45.343 ms"
>>
>
> This is not a bug, but just feature.
>
> The '2020-02-08' is not text type - it is 'unknown' type - and then is
> just directly transformed to date. I think so cast from text to date is not
> maybe immutable, and it can stops some optimizations.
>
>
> postgres=# explain analyze select * from foo where '2020-02-01'::date is
> null or a > '2020-01-01';
>
> ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │ QUERY PLAN
> │
>
> ╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
> │ Index Only Scan using foo_a_idx on foo (cost=0.42..9.12 rows=40
> width=8) (actual time=0.083..0.125 rows=37 loops=1) │
> │ Index Cond: (a > '2020-01-01 00:00:00+01'::timestamp with time zone)
> │
> │ Heap Fetches: 37
> │
> │ Planning Time: 0.279 ms
> │
> │ Execution Time: 0.189 ms
> │
>
> └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> (5 rows)
>
> postgres=# explain analyze select * from foo where
> '2020-02-01'::text::date is null or a > '2020-01-01';
>
> ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │ QUERY PLAN
> │
>
> ╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
> │ Seq Scan on foo (cost=0.00..2193.00 rows=540 width=8) (actual
> time=0.050..71.671 rows=37 loops=1) │
> │ Filter: ((('2020-02-01'::cstring)::date IS NULL) OR (a > '2020-01-01
> 00:00:00+01'::timestamp with time zone)) │
> │ Rows Removed by Filter: 99963
> │
> │ Planning Time: 0.299 ms
> │
> │ Execution Time: 71.714 ms
> │
>
> └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> (5 rows)
>
> The problem is in casting from Date to Timestamptz - the related function
> "timestamptz" is marked as "stable" - and then probably the all expression
> is marked as "stable", what block to use this expression as index condition.
>

I was wrong - problem is in probably cast from text to date.

> I don't know the context, but the expression "'2020-02-01'::date is null
> or a > '2020-01-01'" has not too much sense, so just don't do this.
> Postgres is not too smart and don't try to reduce some useless part of
> expressions.
>

More times Postgres is more sensitive on data types due special rules of
casting.

> Or if you need it, then use UNION and separate this expression to two
> independent expressions
>
> postgres=# explain analyze select * from foo where
> '2020-02-01'::text::date is null union select * from foo where a >
> '2020-01-01';
>
> ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │ QUERY PLAN
> │
>
> ╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
> │ HashAggregate (cost=3202.82..4203.22 rows=100040 width=8) (actual
> time=0.175..1.635 rows=37 loops=1) │
> │ Group Key: foo.a
> │
> │ -> Append (cost=0.01..2952.72 rows=100040 width=8) (actual
> time=0.061..0.120 rows=37 loops=1) │
> │ -> Result (cost=0.01..1443.01 rows=100000 width=8) (actual
> time=0.016..0.017 rows=0 loops=1) │
> │ One-Time Filter: (('2020-02-01'::cstring)::date IS NULL)
> │
> │ -> Seq Scan on foo (cost=0.01..1443.01 rows=100000
> width=8) (never executed) │
> │ -> Index Only Scan using foo_a_idx on foo foo_1
> (cost=0.42..9.12 rows=40 width=8) (actual time=0.043..0.090 rows=37
> loops=1) │
> │ Index Cond: (a > '2020-01-01 00:00:00+01'::timestamp with
> time zone) │
> │ Heap Fetches: 37
> │
> │ Planning Time: 0.437 ms
> │
> │ Execution Time: 6.690 ms
> │
>
> └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> (11 rows)
>
>
>
> Regards
>
> Pavel
>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Marco Lechner privat 2020-02-08 09:00:58 Re: BUG #16245: proj63 from rhel-7 repo depends on sqlite33, but is not available
Previous Message Pavel Stehule 2020-02-08 07:09:11 Re: BUG #16251: ::text type casting of a constant breaks query performance