| From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
|---|---|
| To: | Alexey Ermakov <alexius(dot)work(at)gmail(dot)com> |
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)lists(dot)postgresql(dot)org |
| Subject: | Re: postgres chooses objectively wrong index |
| Date: | 2026-03-18 18:38:07 |
| Message-ID: | CAHyXU0zhdKYDvt_iON5kVSa4AXa=p_vLP1JMs3u0FFA29rP-aQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-performance |
On Tue, Mar 17, 2026 at 11:27 PM Alexey Ermakov <alexius(dot)work(at)gmail(dot)com>
wrote:
> On 2026-03-18 04:52, Merlin Moncure wrote:
>
> Hello. I think planner doesn't have information about distribution of
>> *async.task_execution_state(task)* unless it's part of any full index. I
>> would try to give that with extended statistics (postgresql 14+):
>>
>> create statistics (mcv) task_task_execution_state_stat on ((async.task_execution_state(task))) from async.task;
>> analyze async.task;
>>
>> If that won't help - please show distribution from pg_stats_ext view for
>> extended statistic above.
>>
>
> This unfortunately fails, probably because the table type includes system
> columns (despite not using them).
>
> orchestrator_service_user(at)orchestrator=> create statistics task_stats
> (mcv) on (async.task_execution_state(task)) from async.task;
> ERROR: statistics creation on system columns is not supported
>
> This would require some refactoring to fix.
>
> Interesting... In that case functional index should help (as it also makes
> statistic for the planner):
>
> create index concurrently on task_task_execution_state_idx async.task using btree ((async.task_execution_state(task)));
>
> analyze async.task;
>
>
This can't help performance, as the index...
CREATE INDEX ON async.task(concurrency_pool, priority, entered)
WHERE async.task_execution_state(task) = 'READY';
...is very precisely configured to provide exactly what's needed; I need
tasks for that exact pool in that exact order if and only if ready. The
partial predicate is designed to keep the index nice and small since only a
small portion of tasks would be eligible at any specific time.
@Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> I'm pretty sure you were following me, but my
abstraction earlier was a bit off;
Simplified, the query is:
> SELECT * FROM foo WHERE a=? AND b=K ORDER BY c, d LIMIT N;
> CREATE INDEX ON foo(a,b,c) WHERE b=K;
Should have been:
SELECT * FROM foo WHERE a=? AND d=K ORDER BY b, c LIMIT N;
CREATE INDEX ON foo(a,b,c) WHERE d=K;
Point being, the index match in on (=, order, order). If a contains any
less than 100% of the total records, and N is small relative to table size,
this ought to be the ideal index for just about any case, the exact match
on partial qual is just gravy.
I think the planner is not giving enough bonus for an exact match versus an
inexact match on partial index mathcing, (A=A should be better than A
IN(A,B,C)), and it's unclear why the planner things bitmap heap + sort is
outperforming a raw read off the index base on marginal estimated row
counts. Lowering random_page_cost definitely biases the plan I like,
but it skews both estimates.
@Alexey Ermakov <alexius(dot)work(at)gmail(dot)com>
If you're interested in more context, see:
pgasync <https://github.com/merlinm/pgasync>
pgflow <https://github.com/merlinm/pgflow>
graph example <https://imgur.com/a/LZNpTC1>
merlin
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Alexey Ermakov | 2026-03-18 20:12:13 | Re: postgres chooses objectively wrong index |
| Previous Message | Pavel Stehule | 2026-03-18 06:35:27 | Re: proposal: schema variables |