Re: FETCH FIRST clause PERCENT option

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Surafel Temesgen <surafel3000(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Ryan Lambert <ryan(at)rustprooflabs(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Subject: Re: FETCH FIRST clause PERCENT option
Date: 2019-09-05 22:26:29
Message-ID: 3277.1567722389@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Surafel Temesgen <surafel3000(at)gmail(dot)com> writes:
> [ percent-incremental-v8.patch ]

I took a quick look through this.

* Why is this adding new functionality in tuplestore.c? Especially
functionality to get out a different tuple representation than what was
put in? I can't see a reason why nodeLimit should need that, and for
sure I don't see a reason why it would need it for this feature if it
didn't before.

* The business with query.limitCount having different data types
depending on LimitOption seems entirely bletcherous and bug-inducing.
(There are live bugs of that sort in what you have now: notably, that
kluge in adjust_limit_rows_costs will crash and burn if float8 is
pass-by-reference. Breaking the Datum abstraction is bad practice.)
I wonder if we could get away with making limitCount be float8 all the
time. This would mean that you'd lose exactness for counts above 2^53
(or so, depending on whether float8 is IEEE or not), but I doubt that
anybody would ever notice.

* Names like "PERCENTAGE" seem mighty generic to be exposing as globally
known symbols; also you're disregarding a fairly widespread PG
convention that members of an enum should have names derived from the
enum type's name. So I'd be inclined to make the values of enum
LimitOption be LIMIT_OPTION_COUNT and LIMIT_OPTION_PERCENT. (I don't
especially like EXACT_NUMBER, first because it's going to be quite long
with the prefix, and second because it suggests that maybe there's an
INEXACT_NUMBER alternative.)

* The "void *limitOption" business in gram.y also seems pretty ugly;
it'd be better for that to be enum LimitOption. I gather you want
the null to indicate "no option given", but likely it'd be better
to invent a LIMIT_OPTION_DEFAULT enum alternative for that purpose.

* An alternative to the three points above is just to separate
Query.limitCount (an int8) from Query.limitPercent (a float8), with the
understanding that at most one of the two can be non-null, and use
similar representations at other steps of the processing chain. Then
you don't need enum limitOption at all. That might not scale especially
nicely to additional variants, but trying to overload limitCount even
further isn't going to be nice either.

* The proposed change in grouping_planner() seems like a complete hack.
Why would you not change the way that limit_tuples was computed earlier,
instead? That is, it seems like the part of the planner to teach about
this feature is preprocess_limit (and limit_needed), not someplace else.
It is surely not sane that only this one planner decision would react to
a percentage-based limit.

* I didn't really study the changes in nodeLimit.c, but doing
"tuplestore_rescan" in ExecReScanLimit is surely just wrong. You
probably want to delete and recreate the tuplestore, instead, since
whatever data you already collected is of no further use. Maybe, in
the case where no rescan of the child node is needed, you could re-use
the data already collected; but that would require a bunch of additional
logic. I'm inclined to think that v1 of the patch shouldn't concern
itself with that sort of optimization.

* I don't see any delta in ruleutils.c, but surely that needs to know
about this feature so that it can correctly print views using it.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera from 2ndQuadrant 2019-09-05 22:51:58 Re: FETCH FIRST clause WITH TIES option
Previous Message Alvaro Herrera from 2ndQuadrant 2019-09-05 22:21:22 Re: libpq debug log