Partial index plan/cardinality costing

From: James Coleman <jtc331(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Partial index plan/cardinality costing
Date: 2018-09-07 16:17:25
Message-ID: CAAaqYe8OdYVsNRMHuO865Y7KLaZWpggaiQ=dpY+2Uq94KOnMjQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

I have the following tables:
- m(pk bigserial primary key, status text): with a single row
- s(pk bigserial primary key, status text, action_at date, m_fk bigint):
* 80% of the data has action_at between the current date and 1 year ago
and status of E or C
* 20% of the data has action_at between 5 days ago and 25 days into the
future and status of P, PD, or A

I have two partial indexes:
- s_pk_action_at on s(pk, action_at) where status in ('P', 'PD', 'A')
- s_action_at_pk on s(action_at, pk) where status in ('P', 'PD', 'A')

With the query:
SELECT s.pk FROM s
INNER JOIN m ON m.pk = s.m_fk
WHERE
s.status IN ('A', 'PD', 'P')
AND (action_at <= '2018-09-06')
AND s.status IN ('A', 'P')
AND m.status = 'A';

I generally expect the index s_action_at_pk to always be preferred over
s_pk_action_at. And on stock Postgres it does in fact use that index (with
a bitmap index scan).

We like to set random_page_cost = 2 since we use fast SSDs only. With that
change Postgres strongly prefers the index s_pk_action_at unless I both
disable the other index and turn off bitmap heap scans.

I'm attaching the following plans:
- base_plan.txt: default costs; both indexes available
- base_plan_rpc2.txt: random_page_cost = 2; both indexes available
- inddisabled_plan_rpc2.txt: random_page_cost = 2; only s_action_at_pk
available
- inddisabled_bhsoff_plan_rpc2.txt: random_page_cost = 2; enable_bitmapscan
= false; only s_action_at_pk available

A couple of questions:
- How is s_pk_action_at ever efficient to scan? Given that the highest
cardinality (primary key) column is first, wouldn't an index scan
effectively have to scan the entire index?
- Why does index scan on s_action_at_pk reads over 2x as many blocks as the
bitmap heap scan with the same index?
- Would you expect Postgres to generally always prefer using the
s_action_at_pk index over the s_pk_action_at index for this query? I
realize changing the random page cost is part of what's driving this, but I
still can't imagine reading the full s_pk_action_at index (assuming that's
what it is doing) could ever be more valuable.

As a side note, the planner is very bad at understanding a query that
happens (I realize you wouldn't write this by hand, but ORMs) when you have
a where clause like:
s.status IN ('A', 'PD', 'P') AND s.status IN ('A', 'P')
the row estimates are significantly different from a where clause with only:
s.status IN ('A', 'P')
even though semantically those are identical.

Attachment Content-Type Size
inddisabled_bhsoff_plan_rpc2.txt text/plain 963 bytes
base_plan.txt text/plain 1.2 KB
base_plan_rpc2.txt text/plain 981 bytes
inddisabled_plan_rpc2.txt text/plain 1.2 KB
query.sql application/sql 538 bytes
setup.sql application/sql 963 bytes

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-09-07 16:27:28 Re: pgsql: Refactor dlopen() support
Previous Message Maksim Milyutin 2018-09-07 15:59:34 Re: Hint to set owner for tablespace directory

Browse pgsql-performance by date

  From Date Subject
Next Message padusuma 2018-09-07 17:04:02 Performance of INSERT into temporary tables using psqlODBC driver
Previous Message Felix Geisendörfer 2018-09-07 15:56:30 GIN Index has O(N^2) complexity for array overlap operator?