Re: FETCH FIRST clause PERCENT option

From: Surafel Temesgen <surafel3000(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Ryan Lambert <ryan(at)rustprooflabs(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Subject: Re: FETCH FIRST clause PERCENT option
Date: 2019-08-22 07:40:30
Message-ID: CALAY4q-P3g+U_Kk=qJ5DVqdT5UX8w+4azVv0OuWwhPdw+rTgeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 20, 2019 at 9:10 AM Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
wrote:

> Hi,
>
> At Wed, 7 Aug 2019 10:20:09 +0300, Surafel Temesgen <surafel3000(at)gmail(dot)com>
> wrote in <
> CALAY4q98xbVHtZ4yj9DcCMG2-s1_JuRr7fYaNfW+bKmr22OiVQ(at)mail(dot)gmail(dot)com>
> > Hi
> > On Wed, Aug 7, 2019 at 6:11 AM Kyotaro Horiguchi <
> horikyota(dot)ntt(at)gmail(dot)com>
> > wrote:
> >
> > >
> > > I have some comments.
> > >
> > > This patch uses distinct parameters for exact number and
> > > percentage. On the othe hand planner has a notion of
> > > tuple_fraction covering the both. The same handling is also used
> > > for tuple number estimation. Using the same scheme will make
> > > things far simpler. See the comment of grouping_planner().
> > >
> > >
> > Its because of data type difference .In planner the data type is the same
>
> I meant that, with the usage of tuple_fraction, one variable can
> represent both the absolute limit and relative limit. This
> simplifies parse structs.
>
In grouping_planner the patch set tuple bound to -1 in create_ordered_paths
because it iterate until the end in percentage. Did that answer your
question?

>
>
> > > In executor part, addition to LimiteState.position, this patch
> > > uses LimiteState.backwardPosition to count how many tuples we're
> > > back from the end of the current tuplestore. But things will get
> > > simpler by just asking the tuplestore for the number of holding
> > > tuples.
> > >
> > >
> > backwardPosition hold how many tuple returned in backward scan
>
> I meant that we don't need to hold the number in estate.
>

I did it this way because I didn't find an API in tuplestore to do this

>
> > > + slot = node->subSlot;
> > >
> > > Why is this needed? The variable is properly set before use and
> > > the assignment is bogus in the first place.
> > >
> > >
> > its because Tuplestore needs initialized slot.
>
> I meant that the initilized slot is overwritten before first use.
>
> > > The new code block in LIMIT_RESCAN in ExecLimit is useless since
> > > it is exatctly the same with existing code block. Why didn't you
> > > use the existing if block?
> > >
> >
> > But they test different scenario
>
> I meant that the two different scenario can share the code block.
>

Sorry for not clarifying will .One have to be check before offsetting and
the other is after offsetting

>
> > > > if (node->limitOption == PERCENTAGE)
> > > + {
> > > + node->perExactCount = ceil(node->percent *
> > > node->position / 100.0);
> > > +
> > > +
> > >
> > > node->position is the number of tuples returned to upper node so
> > > far (not the number of tuples this node has read from the lower
> > > node so far). I don't understand what the expression means.
> > >
> >
> > node->position hold the number of tuples this node has read from the
> lower
> > node so far. see LIMIT_RESCAN state
>
> Reallly? node->position is incremented when
> tuplestore_gettupleslot_heaptuple() succeeded and reutnrs the
> tuple to the caller immediately...
>
> > > + if (node->perExactCount == node->perExactCount +
> 1)
> > > + node->perExactCount++;
> > >
> > > What? The condition never be true. As the result, the following
> > > if block below won't run.
> > >
> >
> > it became true according to the number of tuple returned in from the
> lower
> > node so far
> > and percentage specification.
>
> Mmm. How do you think X can be equal to (X + 1)?
>

Oops my bad .The attached patch remove the need of doing that

>
> > > > /*
> > > + * Return the tuple up to the number of exact count in
> > > OFFSET
> > > + * clause without percentage value consideration.
> > > + */
> > > + if (node->perExactCount > 0)
> > > + {
> > > +
> > >
> > >
> > >
> > >
> > > + /*
> > > + * We may needed this tuple in backward scan so put it
> into
> > > + * tuplestore.
> > > + */
> > > + if (node->limitOption == PERCENTAGE)
> > > + {
> > > + tuplestore_puttupleslot(node->tupleStore, slot);
> > > + tuplestore_advance(node->tupleStore, true);
> > > + }
> > >
> > > "needed"->"need" ? The comment says that this is needed for
> > > backward scan, but it is actually required even in forward
> > > scan. More to the point, tuplestore_advance lacks comment.
> >
> >
> > ok
> >
> >
> > >
> > > Anyway, the code in LIMIT_RESCAN is broken in some ways. For
> > > example, if it is used as the inner scan of a NestLoop, the
> > > tuplestore accumulates tuples by every scan. You will see that
> > > the tuplestore stores up to 1000 tuples (10 times of the inner)
> > > by the following query.
> > >
> >
> > It this because in percentage we scan the whole table
>
> It's useless and rather harmful that the tuplestore holds
> indefinite number of duplicate set of the whole tuples from the
> lower node. We must reuse tuples already stored in the tuplestore
> or clear it before the next round.
>
>

i agree with this optimization but it don't have to be in first version

regards
Surafel

Attachment Content-Type Size
percent-incremental-v8.patch text/x-patch 47.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-08-22 07:45:58 Refactoring of connection with password prompt loop for frontends
Previous Message Konstantin Knizhnik 2019-08-22 07:06:56 Re: Optimization of vacuum for logical replication