Re: postgres chooses objectively wrong index

From: Alexey Ermakov <alexius(dot)work(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: postgres chooses objectively wrong index
Date: 2026-03-18 05:27:26
Message-ID: 381e7d47-8021-458d-9a64-0cdb80b004f0@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 2026-03-18 04:52, Merlin Moncure wrote:
> On Tue, Mar 17, 2026 at 4:16 PM Alexey Ermakov
> <alexius(dot)work(at)gmail(dot)com> wrote:
>
> On 2026-03-18 03:01, Merlin Moncure wrote:
>> I've been maintaining an airflow style orchestrator in pl/pgsql,
>> and it's revealed a performance issue I just can't solve.  There
>> is a table, task, which may normally contain billions of rows,
>> but only a tiny portion is interesting for specific reasons—a
>> common pattern in task-type systems.
>>
>> ...
>>
>> I'm wondering if there are other tricks that might apply here,
>> for example, multi column index statistics...curious if anyone
>> has thoughts on that.
>>
>> Any suggestions?
>>
>> merlin
>>
> Hello. I think planner doesn't have information about distribution
> of *async.task_execution_state(task)* unless it's part of any full
> index. I would try to give that with extended statistics
> (postgresql 14+):
>
> create statistics (mcv) task_task_execution_state_stat on ((async.task_execution_state(task))) from async.task;
> analyze async.task;
>
> If that won't help - please show distribution from pg_stats_ext
> view for extended statistic above.
>
>
> This unfortunately fails, probably because the table type includes
> system columns (despite not using them).
> orchestrator_service_user(at)orchestrator=> create statistics  task_stats
> (mcv) on (async.task_execution_state(task)) from async.task;
> ERROR:  statistics creation on system columns is not supported
>
> This would require some refactoring to fix.

Interesting... In that case functional index should help (as it also
makes statistic for the planner):

create index concurrently on task_task_execution_state_idx async.task using btree ((async.task_execution_state(task)));

analyze async.task;

Perhaps multicolumn index will also help for queries but hard to say
without knowing distributions. We could check state distribution info
after index creation and analyze with query like this:

select * from pg_stats where tablename = 'task_task_execution_state_idx' \gx

--

Alexey Ermakov

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Pavel Stehule 2026-03-18 06:35:27 Re: proposal: schema variables
Previous Message Merlin Moncure 2026-03-17 22:52:42 Re: postgres chooses objectively wrong index