Re: Partial index plan/cardinality costing

From: James Coleman <jtc331(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Partial index plan/cardinality costing
Date: 2018-10-08 22:05:19
Message-ID: CAAaqYe80kA2yqmqyWQ-zggKxnMPEb4JaRGzpDvofsmxm4+QYsw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Bump, and curious if anyone on hackers has any ideas here: of particular
interest is why the (pk, created_at) index can possibly be more valuable
than the (created_at, pk) variant since the former effectively implies
having to scan the entire index.
On Fri, Sep 7, 2018 at 12:17 PM James Coleman <jtc331(at)gmail(dot)com> wrote:

> I have the following tables:
> - m(pk bigserial primary key, status text): with a single row
> - s(pk bigserial primary key, status text, action_at date, m_fk bigint):
> * 80% of the data has action_at between the current date and 1 year ago
> and status of E or C
> * 20% of the data has action_at between 5 days ago and 25 days into the
> future and status of P, PD, or A
>
> I have two partial indexes:
> - s_pk_action_at on s(pk, action_at) where status in ('P', 'PD', 'A')
> - s_action_at_pk on s(action_at, pk) where status in ('P', 'PD', 'A')
>
> With the query:
> SELECT s.pk FROM s
> INNER JOIN m ON m.pk = s.m_fk
> WHERE
> s.status IN ('A', 'PD', 'P')
> AND (action_at <= '2018-09-06')
> AND s.status IN ('A', 'P')
> AND m.status = 'A';
>
> I generally expect the index s_action_at_pk to always be preferred over
> s_pk_action_at. And on stock Postgres it does in fact use that index (with
> a bitmap index scan).
>
> We like to set random_page_cost = 2 since we use fast SSDs only. With that
> change Postgres strongly prefers the index s_pk_action_at unless I both
> disable the other index and turn off bitmap heap scans.
>
> I'm attaching the following plans:
> - base_plan.txt: default costs; both indexes available
> - base_plan_rpc2.txt: random_page_cost = 2; both indexes available
> - inddisabled_plan_rpc2.txt: random_page_cost = 2; only s_action_at_pk
> available
> - inddisabled_bhsoff_plan_rpc2.txt: random_page_cost = 2;
> enable_bitmapscan = false; only s_action_at_pk available
>
> A couple of questions:
> - How is s_pk_action_at ever efficient to scan? Given that the highest
> cardinality (primary key) column is first, wouldn't an index scan
> effectively have to scan the entire index?
> - Why does index scan on s_action_at_pk reads over 2x as many blocks as
> the bitmap heap scan with the same index?
> - Would you expect Postgres to generally always prefer using the
> s_action_at_pk index over the s_pk_action_at index for this query? I
> realize changing the random page cost is part of what's driving this, but I
> still can't imagine reading the full s_pk_action_at index (assuming that's
> what it is doing) could ever be more valuable.
>
> As a side note, the planner is very bad at understanding a query that
> happens (I realize you wouldn't write this by hand, but ORMs) when you have
> a where clause like:
> s.status IN ('A', 'PD', 'P') AND s.status IN ('A', 'P')
> the row estimates are significantly different from a where clause with
> only:
> s.status IN ('A', 'P')
> even though semantically those are identical.
>
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2018-10-08 23:54:19 Re: Relax transactional restrictions on ALTER ENUM ... ADD TYPE (redux)
Previous Message Peter Geoghegan 2018-10-08 20:44:20 Re: [HACKERS] [WIP] Zipfian distribution in pgbench

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2018-10-09 00:49:44 Re: Partial index plan/cardinality costing
Previous Message Paul McGarry 2018-10-08 21:34:51 Re: Why the index is not used ?