Re: Query planner skipping index depending on DISTINCT parameter order (2)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dilyan Palauzov <dilyan(dot)palauzov(at)aegee(dot)org>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Query planner skipping index depending on DISTINCT parameter order (2)
Date: 2017-09-19 14:10:23
Message-ID: ecf623e7-04f5-e20a-4301-8e056c398eba@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 09/17/2017 07:15 PM, Tom Lane wrote:
> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>> On 09/17/2017 02:30 AM, Dilyan Palauzov wrote:
>>> The behaviour behind DISTINCT and indexes surprises me, as the
>>> query planner does reorder the columns for SELECT to determine the
>>> most suitable index.
>
>> Well, I agree it's somewhat reasonable optimization. The thing is, the
>> planner/optimizer does not start with all features on day 1, it gets
>> improved over time. And no one implemented this bit yet.
>
> For the DISTINCT ON case, the user-visible semantics are actually pretty
> tightly tied to ORDER BY, so that it would not be very reasonable to
> consider any other orderings than the given column order anyway.
>

Tied in what sense? In the docs we explicitly say this:

https://www.postgresql.org/docs/10/static/sql-select.html#sql-distinct

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of
each set of rows where the given expressions evaluate to equal. The
DISTINCT ON expressions are interpreted using the same rules as for
ORDER BY (see above). Note that the “first row” of each set is
unpredictable unless ORDER BY is used to ensure that the desired row
appears first.

which in my understanding is that while we use the same rules as ORDER
BY, we don't guarantee any particular ordering (i.e. which row we keep)
unless an explicit ORDER BY clause is used.

So if an ORDER BY is not specified, why couldn't we pick an arbitrary
ordering matching based on available indexes?

> For plain DISTINCT, yeah we could consider other orderings ... but
> we're rather unlikely to find an index that matches all the output
> columns, regardless of what order they're in. So it's just not that
> exciting.
>

Not necessarily. For example if we get the incremental sort in, we might
pick from a much wider set of indexes.

> IOW, somebody might get around to this someday, but don't hold your
> breath; there's lots of higher-value fruit to be reaching for.
>

Sure. But the perceived value really depends on the user - what's
worthless for one user may be quite valuable for another one.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2017-09-19 15:35:42 Re: BUG #14819: postgres_fwd could not load library
Previous Message fte 2017-09-19 09:14:02 BUG #14819: postgres_fwd could not load library