Re: parallel distinct union and aggregate support patch

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: "bucoo(at)sohu(dot)com" <bucoo(at)sohu(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel distinct union and aggregate support patch
Date: 2020-10-22 09:08:03
Message-ID: CAFiTN-s85CsefWxZnm=X7bh+unMdUng4XBOx7Zgpd1HFGd2fXA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 19, 2020 at 8:19 PM bucoo(at)sohu(dot)com <bucoo(at)sohu(dot)com> wrote:
>
> Hi hackers,
> I wrote a patch to support parallel distinct, union, and aggregate using batch sort.
> Steps:
> 1. Generate a hash value from the group clause values, and use the hash value modulo the number of batches to choose the batch to save the tuple to.
> 2. At the end of the outer plan, wait for all other workers to finish writing to the batches.
> 3. Each worker gets a unique batch number and calls tuplesort_performsort() to finish sorting that batch.
> 4. Return the rows for this batch.
> 5. If not all batches are done, go to step 3.
>
> The BatchSort plan makes sure that the same tuples (by group clause) are returned in the same range, so the Unique (or GroupAggregate) plan can work.
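
To make the quoted steps concrete, here is a minimal single-process sketch in Python, not the patch's C code. It assumes a fixed number of batches and models each batch as an in-memory list; the function name batch_sort_distinct and the num_batches parameter are hypothetical, and list.sort() stands in for tuplesort_performsort().

```python
from itertools import groupby


def batch_sort_distinct(values, num_batches=4):
    """Hypothetical model of the batch-sort steps for DISTINCT."""
    # Step 1: route each value to a batch by hash modulo the batch count,
    # so equal values always land in the same batch.
    batches = [[] for _ in range(num_batches)]
    for v in values:
        batches[hash(v) % num_batches].append(v)

    # Steps 3-5: sort each batch on its own, then drop adjacent duplicates
    # (the job a Unique node would do on top of the sorted batch).
    result = []
    for batch in batches:
        batch.sort()
        result.extend(k for k, _ in groupby(batch))
    return result
```

Because equal values always hash to the same batch, removing duplicates inside each sorted batch is enough; no global sort across batches is needed.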

Interesting idea. So IIUC, whenever a worker scans a tuple, it
puts it directly into the respective batch (a shared tuple store)
based on the hash of the grouping column, and once all the workers
are done preparing the batches, each worker picks those batches one
by one, performs the sort, and finishes the aggregation. I think
there is scope for improvement: instead of putting the tuple directly
into the batch, what if the worker does partial aggregation and then
places the partially aggregated rows in the shared tuple store based
on the hash value, and then the workers pick up the batches one by
one? Doing it this way, we can avoid large sorts. This approach can
also be used with hash aggregate; I mean, the data partially
aggregated by the hash aggregate can be put into the respective
batches.
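
As a rough illustration of the partial-aggregation idea, here is a small Python sketch. It is only a model under assumptions: SUM is used as the example aggregate, each input chunk stands for one worker's share of the scan, and the names partial_aggregate and parallel_sum are hypothetical, not anything in the patch.

```python
from collections import defaultdict


def partial_aggregate(chunk):
    """One worker pre-aggregates its own rows (hash-aggregate style)."""
    partial = defaultdict(int)
    for key, value in chunk:
        partial[key] += value
    return partial


def parallel_sum(chunks, num_batches=4):
    """Hypothetical model: batch the partial states, then combine them."""
    batches = [[] for _ in range(num_batches)]
    for chunk in chunks:
        # Only one partial state per key per worker reaches the batch,
        # instead of every input row.
        for key, state in partial_aggregate(chunk).items():
            batches[hash(key) % num_batches].append((key, state))

    result = {}
    for batch in batches:
        batch.sort()  # a much smaller sort than over the raw rows
        for key, state in batch:
            result[key] = result.get(key, 0) + state
    return result
```

Each batch then holds at most one row per key per worker, so the per-batch sort is bounded by the number of distinct keys times the number of workers rather than by the input size.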

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
