Re: Re: parallel distinct union and aggregate support patch

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: "bucoo(at)sohu(dot)com" <bucoo(at)sohu(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: parallel distinct union and aggregate support patch
Date: 2020-10-27 14:22:50
Message-ID: CAFiTN-vm+kNZbUu_FsNND3wL4LtNv=jK1D-msB5ALG=QUVG1YA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 27, 2020 at 3:27 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Fri, Oct 23, 2020 at 11:58 AM bucoo(at)sohu(dot)com <bucoo(at)sohu(dot)com> wrote:
> >
> > > Interesting idea. So IIUC, whenever a worker is scanning the tuple it
> > > will directly put it into the respective batch (shared tuple store),
> > > based on the hash of the grouping column, and once all the workers are
> > > done preparing the batches, each worker will pick those batches one
> > > by one, perform the sort and finish the aggregation. I think there is
> > > scope for improvement: instead of directly putting the tuple into the
> > > batch, what if the worker does the partial aggregation and then
> > > places the partially aggregated rows in the shared tuple store based
> > > on the hash value, and then the workers pick them batch by batch? By
> > > doing it this way, we can avoid doing large sorts. This
> > > approach can also be used with hash aggregate, I mean the
> > > partially aggregated data from the hash aggregate can be put into the
> > > respective batch.
> >
> > Good idea. Batch sort is suitable when the aggregate result has many rows;
> > with a large aggregate result, partial aggregation may run out of memory,
> > and all aggregate functions must support partial aggregation (with batch sort this is unnecessary).
> >
> > Actually, I wrote a batch hash store for hash aggregate (for PG11) along the lines of this idea,
> > but it does not write partial aggregations to the shared tuple store; it writes the original tuple and hash value
> > to the shared tuple store. However, it does not support parallel grouping sets.
> > I am trying to write parallel hash aggregate support using a batch shared tuple store for PG14,
> > and it needs to support parallel grouping sets hash aggregate.
>
> I was trying to look into this patch to understand the logic in more
> detail. Actually, there are no comments at all so it's really hard to
> understand what the code is trying to do.
>
> I was reading the function below, which is the main entry point for
> the batch sort.
>
> +static TupleTableSlot *ExecBatchSortPrepare(PlanState *pstate)
> +{
> ...
> +    for (;;)
> +    {
> ...
> +        tuplesort_puttupleslot(state->batches[hash%node->numBatches], slot);
> +    }
> +
> +    for (i=node->numBatches;i>0;)
> +        tuplesort_performsort(state->batches[--i]);
> +build_already_done_:
> +    if (parallel)
> +    {
> +        for (i=node->numBatches;i>0;)
> +        {
> +            --i;
> +            if (state->batches[i])
> +            {
> +                tuplesort_end(state->batches[i]);
> +                state->batches[i] = NULL;
> +            }
> +        }
>
> I did not understand this part: once each worker has performed
> its local batch-wise sort, why are we clearing the batches? I mean
> individual workers have their own batches, so eventually they are supposed
> to get merged. Can you explain this part? It would also be better
> if you could add comments.

I think I got this. IIUC, each worker initializes the shared
sort and performs the batch-wise sorting, and then we wait on a
barrier so that all the workers can finish their sorting. Once
that is done, the workers coordinate, pick the batches one by one,
and perform the final merge for each batch.
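
To make my reading of the flow concrete, here is a minimal single-process
sketch in C. It is not the patch's code. The names (Run, hash_key,
worker_prepare, merge_batch) and the constants are all hypothetical, plain
qsort() stands in for tuplesort, and the barrier is only modelled by calling
worker_prepare() for every worker before any call to merge_batch().

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_BATCHES 4   /* assumed batch count (numBatches in the patch) */
#define NUM_WORKERS 3   /* assumed number of participating workers */
#define MAX_ROWS    64  /* illustrative per-run capacity */

/* One worker's locally sorted run for one batch. */
typedef struct Run
{
    int keys[MAX_ROWS];
    int n;
} Run;

/* Stands in for the shared sort state: runs[worker][batch]. */
static Run runs[NUM_WORKERS][NUM_BATCHES];

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Hypothetical hash of the grouping key (multiplicative hashing). */
static uint32_t
hash_key(int key)
{
    return (uint32_t) key * 2654435761u;
}

/*
 * Phase 1, run by every worker: route each scanned key to the batch
 * chosen by its hash, then sort each local batch.
 */
static void
worker_prepare(int worker, const int *keys, int nkeys)
{
    for (int i = 0; i < nkeys; i++)
    {
        Run *r = &runs[worker][hash_key(keys[i]) % NUM_BATCHES];

        if (r->n < MAX_ROWS)    /* sketch only: drop rows past capacity */
            r->keys[r->n++] = keys[i];
    }
    for (int b = 0; b < NUM_BATCHES; b++)
        qsort(runs[worker][b].keys, runs[worker][b].n, sizeof(int), cmp_int);
}

/*
 * Phase 2, after all workers have finished phase 1: whoever picks a
 * batch merges every worker's sorted run for it. Returns the row count.
 */
static int
merge_batch(int batch, int *out)
{
    int pos[NUM_WORKERS] = {0};
    int n = 0;

    for (;;)
    {
        int best = -1;

        /* Pick the worker whose next unread key is the smallest. */
        for (int w = 0; w < NUM_WORKERS; w++)
        {
            const Run *r = &runs[w][batch];

            if (pos[w] < r->n &&
                (best < 0 ||
                 r->keys[pos[w]] < runs[best][batch].keys[pos[best]]))
                best = w;
        }
        if (best < 0)
            break;              /* every run is exhausted */
        out[n++] = runs[best][batch].keys[pos[best]++];
    }
    return n;
}
```

Because equal keys always hash to the same batch, each merged batch comes out
sorted with all duplicates of a key adjacent, so the aggregation can be
finished one batch at a time without one large sort over the whole input.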

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
