Re: FETCH FIRST clause WITH TIES option

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Surafel Temesgen <surafel3000(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FETCH FIRST clause WITH TIES option
Date: 2018-10-30 01:12:16
Message-ID: 73660663-d1c6-12cc-200f-315616560a93@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/29/2018 05:47 PM, Andrew Gierth wrote:
>>>>>> "Tomas" == Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>
> >> I still think that this is the wrong approach. Implementing WITH
> >> TIES and PERCENT together using an implicit window function call
> >> kills two birds with one very small stone (the only executor change
> >> needed would be teaching LIMIT to be able to stop on a boolean
> >> condition), with maximum reuse of existing facilities.
>
> Tomas> Hmmm, maybe. How would that work, exactly? Wouldn't that mean
> Tomas> extra overhead (the window functions are hardly free) and
> Tomas> limitations?
>
> They're not free, but then this case probably shouldn't be treated as a
> particularly hot code path.
>
> The basic idea is this: extend the Limit node to be able to stop on a
> boolean expression, such that the first false value ends the result.
>
> Then FETCH FIRST N WITH TIES becomes "stop when the expression
> rank() over (order by ...) <= N
> is no longer true" (where the ... is the existing top level order by)
>
> and FETCH FIRST N PERCENT is the same but with percent_rank (or
> cume_dist, I'd have to check the exact semantics)
>
> rank() doesn't need to read ahead in the input. percent_rank of course
> does, but it's clearly impossible to implement PERCENT without doing
> that.
>

OK. I was under the impression the window function would need to see all
the input rows, but that seems not to be the case in general.

> Also, one could consider extending LIMIT to support arbitrary
> expressions, with some syntax like LIMIT WHEN (condition) which would be
> the general form.
>

Hmmm. I can't really recall needing such thing, but of course - that
does not prove it'd not be a good thing.

> Tomas> FWIW, I doubt the patch can be much smaller/simpler - a
> Tomas> significant part of the new stuff is in gram.y and node read/out
> Tomas> infrastructure, the changes to LIMIT node are fairly minimal.
>
> It's not a matter of patch size as such, but reuse of mechanisms rather
> than adding new ones. Also doing WITH TIES as a special case in the
> Limit node doesn't make adding PERCENT any easier at all, whereas the
> above gets it basically for free.
>

Makes sense, I guess. At first I was thinking that this certainly does
not add more new mechanisms than allowing LIMIT to terminate on boolean
expression. But you're right about the WITH PERCENT part - using window
functions would make adding this much simpler.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-10-30 01:33:18 Re: partition tree inspection functions
Previous Message Amit Langote 2018-10-30 00:53:54 Re: Speeding up INSERTs and UPDATEs to partitioned tables