Re: FETCH FIRST clause WITH TIES option

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Surafel Temesgen <surafel3000(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, andrew(at)tao11(dot)riddles(dot)org(dot)uk, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FETCH FIRST clause WITH TIES option
Date: 2019-01-02 15:19:37
Message-ID: e5c5129c-2518-4a5d-fc28-0a080a08526f@2ndquadrant.com
Lists: pgsql-hackers

On 1/2/19 11:51 AM, Surafel Temesgen wrote:
>
>
> On Tue, Jan 1, 2019 at 8:38 PM Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> >
> >     The attached patch includes all the comments given by Tomas, and I
> >     checked the SQL standard regarding LIMIT and this feature
> >
>
>     Unfortunately, it seems the "limit" regression tests fail - the
>     output does not match the expected results. It seems as if the
>     WITH TIES code affects the ordering of the results within the
>     group. See the attached file.
>
>
> Yes, the reason is that the order of the returned rows is not always
> the same. I removed the other columns from the result set to get a
> constant result
>

Thanks, that seems reasonable.

After looking at the "FETCH FIRST ... PERCENT" patch, I wonder if this
patch should tweak the cardinality estimates in some way. Currently,
the estimate is the same as for plain LIMIT, i.e. the requested number
of rows. But let's say there are very few, very large groups - WITH
TIES will then naturally produce more rows than requested.

As an example, let's say the subplan produces 1M rows, and there are
1000 groups (when split according to the ORDER BY clause). Consider a
query with "FETCH FIRST 10 ROWS WITH TIES". AFAICS the current estimate
will be 10, but in fact we know that it's likely to produce ~1000 rows
(because that's the average group size).

So I think the patch should tweak the estimate to be

limitCount + (avgGroupSize/2);

or perhaps

Max(avgGroupSize, limitCount + (avgGroupSize/2))

The avgGroupSize/2 is there because we don't know where within a group
the limit falls - on average it's the middle, so about half of that
group still gets emitted as ties.
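
To make that concrete, here's a minimal sketch of the kind of
adjustment I mean. The function name and parameters are just made up
for illustration - this is not code from the patch - and num_groups
would presumably come from something like estimate_num_groups():

/* Hypothetical sketch only - not code from the patch. */
#define Max(x, y)  ((x) > (y) ? (x) : (y))  /* as in PostgreSQL's c.h */

static double
estimate_with_ties_rows(double limit_count, double subplan_rows,
                        double num_groups)
{
    /* average number of rows per ORDER BY group */
    double  avg_group_size = subplan_rows / num_groups;

    /*
     * The limit boundary may fall anywhere within a group, so on
     * average about half a group gets emitted as ties on top of the
     * requested rows - but never fewer than one whole group.
     */
    return Max(avg_group_size, limit_count + (avg_group_size / 2));
}

Plugging in the example above (limit_count = 10, subplan_rows =
1000000, num_groups = 1000) gives Max(1000, 10 + 500) = 1000, i.e. the
~1000 rows we expect.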

Of course, using average group size like this is rather crude, but it's
about the best thing we can do. In principle, increasing the cardinality
estimate is the right thing to do.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
