Re: BUG #17540: Prepared statement: PG switches to a generic query plan which is consistently much slower

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, william(dot)duclot(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17540: Prepared statement: PG switches to a generic query plan which is consistently much slower
Date: 2022-07-07 04:36:56
Message-ID: 20220707043656.6tug2hfauxbizmi3@awork3.anarazel.de
Lists: pgsql-bugs pgsql-hackers

Hi,

On 2022-07-06 23:13:18 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > I think the cost for the slow plan being so much cheaper can almost be
> > qualified as bug.
> > The slow plan seems pretty nonsensical to me. ISTM that something in the
> > costing there is at least almost broken.
>
> I think this is probably an instance of the known problem that a generic
> plan is made without knowledge of the actual parameter values, and that
> can lead us to make statistical assumptions that are not valid for the
> actual values, but nonetheless make one plan look cheaper than another
> even though the opposite is true given the actual values. In essence,
> comparing the cost estimate for the generic plan to the cost estimate
> for a custom plan is not really logically valid, because those estimates
> are founded on different statistics. I don't know how to fix that :-(.

I think there's something more fundamentally wrong - somehow we end up
assuming > 50% selectivity on both the min and the max initplan, for the same
condition! And afaics (although it's a bit hard to tell with the precision
EXPLAIN prints floating point values at) we don't charge cpu_operator_cost /
cpu_tuple_cost. And this is on a table where we can know, despite not knowing
the parameter value, that the column being compared has a correlation of 1.
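
A quick way to double check that claim (not something from the original
report, just how I'd verify it): after the vacuum/analyze in the reproducer
below, pg_stats should show correlation ~1 for both columns, since the ids
and timestamps are generated in the same order.

-- Sketch: inspect the stats the planner has for the two columns.
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'relation_tuple_transaction';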

In this case the whole generic plan part seems like a red herring. The generic
plan is *awful* and would still be awful if the value were known but happened
to fall somewhere around the middle of the value range.

Here are the OP's tables + query, but without the prepared statement part:

CREATE TABLE relation_tuple_transaction (
    id BIGSERIAL NOT NULL UNIQUE,
    timestamp TIMESTAMP WITHOUT TIME ZONE NOT NULL UNIQUE,
    CONSTRAINT pk_rttx PRIMARY KEY (id)
);
CREATE INDEX ix_relation_tuple_transaction_by_timestamp on relation_tuple_transaction(timestamp);
INSERT INTO relation_tuple_transaction(timestamp) SELECT * FROM generate_series
    ( now() - interval '3 days'
    , now()
    , '1 second'::interval) dd
;
vacuum freeze analyze;
EXPLAIN ANALYZE SELECT MIN(id), MAX(id) from relation_tuple_transaction WHERE timestamp >= (now() - interval '1.5 days');

postgres[631148][1]=# EXPLAIN ANALYZE SELECT MIN(id), MAX(id) from relation_tuple_transaction WHERE timestamp >= (now() - interval '1.5 days');;

 Result  (cost=1.01..1.02 rows=1 width=16) (actual time=113.379..113.381 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.42..0.50 rows=1 width=8) (actual time=113.347..113.348 rows=1 loops=1)
           ->  Index Scan using pk_rttx on relation_tuple_transaction  (cost=0.42..10741.45 rows=127009 width=8) (actual time=113.345..113.345 rows=1 loops=1)
                 Index Cond: (id IS NOT NULL)
                 Filter: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
                 Rows Removed by Filter: 129746
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.42..0.50 rows=1 width=8) (actual time=0.024..0.024 rows=1 loops=1)
           ->  Index Scan Backward using pk_rttx on relation_tuple_transaction relation_tuple_transaction_1  (cost=0.42..10741.45 rows=127009 width=8) (actual time=0.023..0.023 rows=1 loops=1)
                 Index Cond: (id IS NOT NULL)
                 Filter: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
 Planning Time: 0.370 ms
 Execution Time: 113.441 ms
(14 rows)

We're pretty much by definition scanning half the table via the index scans,
and yet end up with a total cost of 1.02 (yes, I'm aware that the paths are
costed separately).
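
For completeness, a rough sketch of how one could look at the generic plan
itself; the statement name and forcing it via plan_cache_mode (available
since v12) are my additions rather than anything from the OP's report:

-- Sketch only: statement name and the plan_cache_mode override are
-- illustrative, not taken from the original report.
PREPARE minmax(timestamp) AS
  SELECT MIN(id), MAX(id)
  FROM relation_tuple_transaction
  WHERE timestamp >= $1;

SET plan_cache_mode = force_generic_plan;
EXPLAIN ANALYZE EXECUTE minmax((now() - interval '1.5 days')::timestamp);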

FWIW, manually writing the min/max as ORDER BY timestamp ASC/DESC LIMIT 1
queries yields a *vastly* better plan:

EXPLAIN ANALYZE SELECT (SELECT id FROM relation_tuple_transaction WHERE timestamp >= (now() - interval '1.5 days') ORDER BY timestamp ASC LIMIT 1), (SELECT id FROM relation_tuple_transaction WHERE timestamp >= (now() - interval '1.5 days') ORDER BY timestamp DESC LIMIT 1);

 Result  (cost=0.92..0.93 rows=1 width=16) (actual time=0.110..0.111 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.42..0.46 rows=1 width=16) (actual time=0.079..0.079 rows=1 loops=1)
           ->  Index Scan using ix_relation_tuple_transaction_by_timestamp on relation_tuple_transaction  (cost=0.42..4405.46 rows=129602 width=16) (actual time=0.077..0.078 rows=1 loops=1)
                 Index Cond: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.42..0.46 rows=1 width=16) (actual time=0.028..0.028 rows=1 loops=1)
           ->  Index Scan Backward using ix_relation_tuple_transaction_by_timestamp on relation_tuple_transaction relation_tuple_transaction_1  (cost=0.42..4405.46 rows=129602 width=16) (actual time=0.027..0.027 rows=1 loops=1)
                 Index Cond: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
 Planning Time: 0.270 ms
 Execution Time: 0.159 ms
(11 rows)

And it stays sane even if you add a (redundantly evaluated) AND id IS NOT NULL.

EXPLAIN SELECT id FROM relation_tuple_transaction WHERE timestamp >= (now() - interval '1.5 days') AND id IS NOT NULL ORDER BY timestamp ASC LIMIT 1;
QUERY PLAN
 Limit  (cost=0.42..0.46 rows=1 width=16)
   ->  Index Scan using ix_relation_tuple_transaction_by_timestamp on relation_tuple_transaction  (cost=0.42..4405.46 rows=129602 width=16)
         Index Cond: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
         Filter: (id IS NOT NULL)
(4 rows)

EXPLAIN SELECT min(id) FROM relation_tuple_transaction WHERE timestamp >= (now() - interval '1.5 days');
QUERY PLAN
 Result  (cost=0.50..0.51 rows=1 width=8)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.42..0.50 rows=1 width=8)
           ->  Index Scan using pk_rttx on relation_tuple_transaction  (cost=0.42..10741.45 rows=129602 width=8)
                 Index Cond: (id IS NOT NULL)
                 Filter: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
(6 rows)

Greetings,

Andres Freund
