Re: FETCH FIRST clause PERCENT option

From: Surafel Temesgen <surafel3000(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Ryan Lambert <ryan(at)rustprooflabs(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Subject: Re: FETCH FIRST clause PERCENT option
Date: 2019-09-06 11:16:22
Message-ID: CALAY4q9YMMY0VyKgrLhcw-bJOD3hNL-wq-Z_8z4rCMnqe-ozAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tom,
On Fri, Sep 6, 2019 at 1:26 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Surafel Temesgen <surafel3000(at)gmail(dot)com> writes:
> > [ percent-incremental-v8.patch ]
>
> I took a quick look through this.
>
> * Why is this adding new functionality in tuplestore.c? Especially
> functionality to get out a different tuple representation than what was
> put in?

If am not mistaken tuplestore store MinimalTuple(tuple without transaction
status information) for minimal memory usage. it change what we put in into
MinimalTuple so we need the same format for getting out. The other way I
can think of doing it is allocating new MinimalTuple slot every time to get
the tuple which have resource like. The operation is very similar to
RunFromStore without the ability of reusing once allocated slot.

> I can't see a reason why nodeLimit should need that, and for
> sure I don't see a reason why it would need it for this feature if it
> didn't before.
>

Only this feature needs tuple storage

> * The business with query.limitCount having different data types
> depending on LimitOption seems entirely bletcherous and bug-inducing.
> (There are live bugs of that sort in what you have now: notably, that
> kluge in adjust_limit_rows_costs will crash and burn if float8 is
> pass-by-reference. Breaking the Datum abstraction is bad practice.)
> I wonder if we could get away with making limitCount be float8 all the
> time. This would mean that you'd lose exactness for counts above 2^53
> (or so, depending on whether float8 is IEEE or not), but I doubt that
> anybody would ever notice.
>
> * Names like "PERCENTAGE" seem mighty generic to be exposing as globally
> known symbols; also you're disregarding a fairly widespread PG
> convention that members of an enum should have names derived from the
> enum type's name. So I'd be inclined to make the values of enum
> LimitOption be LIMIT_OPTION_COUNT and LIMIT_OPTION_PERCENT. (I don't
> especially like EXACT_NUMBER, first because it's going to be quite long
> with the prefix, and second because it suggests that maybe there's an
> INEXACT_NUMBER alternative.)
>
> * The "void *limitOption" business in gram.y also seems pretty ugly;
> it'd be better for that to be enum LimitOption. I gather you want
> the null to indicate "no option given", but likely it'd be better
> to invent a LIMIT_OPTION_DEFAULT enum alternative for that purpose.
>
>
i like to fix the above three more because of scaling

* An alternative to the three points above is just to separate
> Query.limitCount (an int8) from Query.limitPercent (a float8), with the
> understanding that at most one of the two can be non-null, and use
> similar representations at other steps of the processing chain. Then
> you don't need enum limitOption at all. That might not scale especially
> nicely to additional variants, but trying to overload limitCount even
> further isn't going to be nice either.
>
> * The proposed change in grouping_planner() seems like a complete hack.
> Why would you not change the way that limit_tuples was computed earlier,
> instead?
>
That is, it seems like the part of the planner to teach about
> this feature is preprocess_limit (and limit_needed), not someplace else.
> It is surely not sane that only this one planner decision would react to
> a percentage-based limit.
>
> * I didn't really study the changes in nodeLimit.c, but doing
> "tuplestore_rescan" in ExecReScanLimit is surely just wrong. You
> probably want to delete and recreate the tuplestore, instead, since
> whatever data you already collected is of no further use. Maybe, in
> the case where no rescan of the child node is needed, you could re-use
> the data already collected; but that would require a bunch of additional
> logic. I'm inclined to think that v1 of the patch shouldn't concern
> itself with that sort of optimization.
>
>
ok

> * I don't see any delta in ruleutils.c, but surely that needs to know
> about this feature so that it can correctly print views using it.
>

ok

regards
Surafel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Surafel Temesgen 2019-09-06 11:52:53 Re: FETCH FIRST clause WITH TIES option
Previous Message Esteban Zimanyi 2019-09-06 10:50:33 Re: Specifying attribute slot for storing/reading statistics