Re: Query planner skipping index depending on DISTINCT parameter order (2)

From: Dilyan Palauzov <dilyan(dot)palauzov(at)aegee(dot)org>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Query planner skipping index depending on DISTINCT parameter order (2)
Date: 2017-09-17 00:30:11
Message-ID: a68737c6-601f-6846-2ba0-3795c942c81f@aegee.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello Tomas,

thanks for your answer.

While adding an ORDER BY uses the index, it does not help, as the user
has to take care of the order of provided columns. Whether the care is
is taken in DISTINCT ON or in ORDER BY is secondary.

The behaviour behind DISTINCT and indexes surprises me, as the query
planner does reorder the columns for SELECT to determine the most
suitable index.

My proposal to reflect this:

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -1069,7 +1069,10 @@ SELECT DISTINCT ON (location) location, time, report
<para>
Currently, <literal>FOR NO KEY UPDATE</>, <literal>FOR UPDATE</>,
<literal>FOR SHARE</> and <literal>FOR KEY SHARE</> cannot be
- specified with <literal>DISTINCT</literal>.
+ specified with <literal>DISTINCT</literal>. Contrary to
+ <literal>SELECT</>, which reorders its parameters to find a best,
+ matching index, DISTINCT ON constructs an expression, e.g. from the
+ provided rows, and checks then if an index can serve the expression.
</para>
</refsect2>

Please ignore my email from Friday, I have forgotten an ON after DISTINCT.

Greetings
Dilyan

On 09/16/2017 10:23 AM, Tomas Vondra wrote:
> Hello Dilyan,
>
> You're right - we're currently not able to use the index if it's not
> consistent with the DISTINCT ordering. That is, if you have index on
> (a,b) and DISTINCT ON (b,a) we fail to leverage the index.
>
> The reason for this simple - if you look at create_distinct_paths [1],
> which is where the Unique path comes from, you'll see it iterates over
> all paths and compares the ordering using pathkeys_is_contained [2].
>
> That however only ensures the path matches the expected Unique ordering
> (determined by the column list in DISTINCT ON clause), we don't try to
> re-shuffle the columns in any way at this point.
>
> So this is more a missing optimization than a bug, I'd guess. But it
> seems worthwhile and possibly not extremely difficult to implement, so I
> may look into it - but that's PG11 at the earliest.
>
> But, looking at the code in create_distinct_paths, ISTM you can easily
> convince the planner to use the index by simply adding a matching ORDER
> BY clause. That is
>
> SELECT DISTINCT ON(token, id) token FROM bayes_token
> ORDER BY id, token;
>
> should be able to use the index on (id,token).
>
>
> [1]
> https://github.com/postgres/postgres/blob/master/src/backend/optimizer/plan/planner.c#L4725
>
> [2]
> https://github.com/postgres/postgres/blob/master/src/backend/optimizer/plan/planner.c#L4811
>
>
> regards
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tomas Vondra 2017-09-17 16:39:27 Re: Query planner skipping index depending on DISTINCT parameter order (2)
Previous Message Bruce Momjian 2017-09-16 21:02:06 Re: BUG #14798: postgres user superuser changed