Re: postgres chooses objectively wrong index

From: Alexey Ermakov <alexius(dot)work(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: postgres chooses objectively wrong index
Date: 2026-03-18 20:12:13
Message-ID: 37499019-dab7-4a6b-933a-d78f88d2664c@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 2026-03-19 00:38, Merlin Moncure wrote:
> On Tue, Mar 17, 2026 at 11:27 PM Alexey Ermakov
> <alexius(dot)work(at)gmail(dot)com> wrote:
>
> On 2026-03-18 04:52, Merlin Moncure wrote:
>>
>> Hello. I think planner doesn't have information about
>> distribution of *async.task_execution_state(task)* unless
>> it's part of any full index. I would try to give that with
>> extended statistics (postgresql 14+):
>>
>> create statistics (mcv) task_task_execution_state_stat on ((async.task_execution_state(task))) from async.task;
>> analyze async.task;
>>
>> If that won't help - please show distribution from
>> pg_stats_ext view for extended statistic above.
>>
>>
>> This unfortunately fails, probably because the table type
>> includes system columns (despite not using them).
>> orchestrator_service_user(at)orchestrator=> create statistics
>>  task_stats (mcv) on (async.task_execution_state(task)) from
>> async.task;
>> ERROR:  statistics creation on system columns is not supported
>>
>> This would require some refactoring to fix.
>
> Interesting... In that case functional index should help (as it
> also makes statistic for the planner):
>
> create index concurrently on task_task_execution_state_idx async.task using btree ((async.task_execution_state(task)));
>
> analyze async.task;
>
>
> This can't help performance, as the index...
> CREATE INDEX ON async.task(concurrency_pool, priority, entered)
> WHERE async.task_execution_state(task) = 'READY';
>
> ...is very precisely configured to provide exactly what's needed; I
> need tasks for that exact pool in that exact order if and only if
> ready.  The partial predicate is designed to keep the index nice and
> small since only a small portion of tasks would be eligible at any
> specific time.

The index I suggested was not intended to be used by such queries, it's
only a way to provide statistics for the planner as `create statistics`
on expression is not working here.

It might not be enough and require additional columns (in that case it
will be replacement for your index) or perhaps elevated statistics
target. It could even make things worse but I'm sure you have ways to
test that before putting it on an important database.

It should help if planner underestimate number of rows but even then
total estimation won't be perfect when we have 2 conditions on state
that obviously statistically dependent. Combining both conditions on
application side would work much better if that is possible...

What would also might help there - output of `explain (analyze,
buffers)` of a query that really had a bad plan and executes in seconds
with sizes of indexes. And same output but with `set enable_sort = off`
to see plan that supposed to be better. And just in case number of
live/dead tuples in that table from pg_stat_user_tables.

--

Alexey Ermakov

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2026-03-18 22:42:29 Re: postgres chooses objectively wrong index
Previous Message Merlin Moncure 2026-03-18 18:38:07 Re: postgres chooses objectively wrong index