| From: | Alexey Ermakov <alexius(dot)work(at)gmail(dot)com> |
|---|---|
| To: | Merlin Moncure <mmoncure(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | pgsql-performance(at)lists(dot)postgresql(dot)org |
| Subject: | Re: postgres chooses objectively wrong index |
| Date: | 2026-03-18 05:27:26 |
| Message-ID: | 381e7d47-8021-458d-9a64-0cdb80b004f0@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-performance |
On 2026-03-18 04:52, Merlin Moncure wrote:
> On Tue, Mar 17, 2026 at 4:16 PM Alexey Ermakov
> <alexius(dot)work(at)gmail(dot)com> wrote:
>
> On 2026-03-18 03:01, Merlin Moncure wrote:
>> I've been maintaining an airflow style orchestrator in pl/pgsql,
>> and it's revealed a performance issue I just can't solve. There
>> is a table, task, which may normally contain billions of rows,
>> but only a tiny portion is interesting for specific reasons—a
>> common pattern in task-type systems.
>>
>> ...
>>
>> I'm wondering if there are other tricks that might apply here,
>> for example, multi column index statistics...curious if anyone
>> has thoughts on that.
>>
>> Any suggestions?
>>
>> merlin
>>
> Hello. I think planner doesn't have information about distribution
> of *async.task_execution_state(task)* unless it's part of any full
> index. I would try to give that with extended statistics
> (postgresql 14+):
>
> create statistics (mcv) task_task_execution_state_stat on ((async.task_execution_state(task))) from async.task;
> analyze async.task;
>
> If that won't help - please show distribution from pg_stats_ext
> view for extended statistic above.
>
>
> This unfortunately fails, probably because the table type includes
> system columns (despite not using them).
> orchestrator_service_user(at)orchestrator=> create statistics task_stats
> (mcv) on (async.task_execution_state(task)) from async.task;
> ERROR: statistics creation on system columns is not supported
>
> This would require some refactoring to fix.
Interesting... In that case functional index should help (as it also
makes statistic for the planner):
create index concurrently on task_task_execution_state_idx async.task using btree ((async.task_execution_state(task)));
analyze async.task;
Perhaps multicolumn index will also help for queries but hard to say
without knowing distributions. We could check state distribution info
after index creation and analyze with query like this:
select * from pg_stats where tablename = 'task_task_execution_state_idx' \gx
--
Alexey Ermakov
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Pavel Stehule | 2026-03-18 06:35:27 | Re: proposal: schema variables |
| Previous Message | Merlin Moncure | 2026-03-17 22:52:42 | Re: postgres chooses objectively wrong index |