a wrong index choose when statistics is out of date

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: a wrong index choose when statistics is out of date
Date: 2024-03-03 07:01:23
Message-ID: 878r2zeqsp.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


The issue can be reproduced with the following steps:

create table x_events (.., created_at timestamp, a int, b int);

create index idx_1 on t(created_at, a);
create index idx_2 on t(created_at, b);

query:
select * from t where create_at = current_timestamp and b = 1;

index (created_at, a) rather than (created_at, b) may be chosen for the
above query if the statistics think "create_at = current_timestamp" has
no rows, then both index are OK, actually it is true just because
statistics is out of date.

I just run into this again recently and have two new idea this time,
I'd like gather some feedback on this.

1. We can let the user define the column as the value is increased day by
day. the syntax may be:

ALTER TABLE x_events ALTER COLUMN created_at ALWAYS_INCREASED.

then when a query like 'create_at op const', the statistics module can
treat it as 'created_at = $1'. so the missing statistics doesn't make
difference. Then I think the above issue can be avoided.

This is different from letting user using a PreparedStmt directly
because it is possible that we always choose a custom plan, the
easiest way to make this happen is we do a planning time partition
prune.

2. Use some AI approach to forecast the data it doesn't gather yet. The
training stage may happen at analyze stage, take the above case for
example, it may get a model like 'there are 100 rows per second for
the time during 9:00 to 18:00 and there are 2 rows per seconds for
other time range.

For now, I think option 1 may be easier to happen.

--
Best Regards
Andy Fan

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2024-03-03 07:51:38 RE: Synchronizing slots from primary to standby
Previous Message Jelte Fennema-Nio 2024-03-03 06:59:15 Re: Make query cancellation keys longer