Re: FETCH FIRST clause PERCENT option

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Surafel Temesgen <surafel3000(at)gmail(dot)com>
Cc: vik(dot)fearing(at)2ndquadrant(dot)com, Mark Dilger <hornschnorter(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, andrew(at)tao11(dot)riddles(dot)org(dot)uk
Subject: Re: FETCH FIRST clause PERCENT option
Date: 2019-01-04 14:27:38
Message-ID: b6a3bdfa-d5c8-534c-c1e8-fdabee061b04@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/4/19 7:40 AM, Surafel Temesgen wrote:
>
>
> On Thu, Jan 3, 2019 at 4:51 PM Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com <mailto:tomas(dot)vondra(at)2ndquadrant(dot)com>> wrote:
>
>
> On 1/3/19 1:00 PM, Surafel Temesgen wrote:
> > Hi
> >
> > On Tue, Jan 1, 2019 at 10:08 PM Tomas Vondra
> > <tomas(dot)vondra(at)2ndquadrant(dot)com
> <mailto:tomas(dot)vondra(at)2ndquadrant(dot)com>
> <mailto:tomas(dot)vondra(at)2ndquadrant(dot)com
> <mailto:tomas(dot)vondra(at)2ndquadrant(dot)com>>> wrote:
>
> >     The execution part of the patch seems to be working correctly,
> but I
> >     think there's an improvement - we don't need to execute the
> outer plan
> >     to completion before emitting the first row. For example,
> let's say the
> >     outer plan produces 10000 rows in total and we're supposed to
> return the
> >     first 1% of those rows. We can emit the first row after
> fetching the
> >     first 100 rows, we don't have to wait for fetching all 10k rows.
> >
> >
> > but total rows count is not given how can we determine safe to
> return row
> >
>
> But you know how many rows were fetched from the outer plan, and this
> number only grows grows. So the number of rows returned by FETCH FIRST
> ... PERCENT also only grows. For example with 10% of rows, you know that
> once you reach 100 rows you should emit ~10 rows, with 200 rows you know
> you should emit ~20 rows, etc. So you may track how many rows we're
> supposed to return / returned so far, and emit them early.
>
>
>
>
> yes that is clear but i don't find it easy to put that in formula. may
> be someone with good mathematics will help
>

What formula? All the math remains exactly the same, you just need to
update the number of rows to return and track how many rows are already
returned.

I haven't tried doing that, but AFAICS you'd need to tweak how/when
node->count is computed - instead of computing it only once it needs to
be updated after fetching each row from the subplan.

Furthermore, you'll need to stash the subplan rows somewhere (into a
tuplestore probably), and whenever the node->count value increments,
you'll need to grab a row from the tuplestore and return that (i.e.
tweak node->position and set node->subSlot).

I hope that makes sense. The one thing I'm not quite sure about is
whether tuplestore allows adding and getting rows at the same time.

Does that make sense?

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2019-01-04 14:32:16 Re: Fast path for empty relids in check_outerjoin_delay()
Previous Message Tomas Vondra 2019-01-04 14:12:05 Re: Delay locking partitions during query execution