Re: parallel distinct union and aggregate support patch

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: "bucoo(at)sohu(dot)com" <bucoo(at)sohu(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel distinct union and aggregate support patch
Date: 2020-10-21 04:27:46
Message-ID: CA+hUKGLOpX8cx6z=U4HXi-q-Ydu2-Bq6YKkdeYePYf89bmuk_w@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 20, 2020 at 3:49 AM bucoo(at)sohu(dot)com <bucoo(at)sohu(dot)com> wrote:
> I wrote a path to support parallel DISTINCT, UNION and aggregation using batch sort.
> Steps:
> 1. Generate a hash value from the group clause values, and use that hash value modulo the number of batches to choose the batch the tuple is saved to.
> 2. At the end of the outer plan, wait for all other workers to finish writing to the batches.
> 3. Each worker gets a unique batch number and calls tuplesort_performsort() to finish sorting that batch.
> 4. Return the rows for this batch.
> 5. If not all batches are finished, go to step 3.
>
> The BatchSort plan makes sure tuples with the same group clause values are returned in the same range, so a Unique (or GroupAggregate) plan can work.

Hi!

Interesting work! In the past a few people have speculated about a
Parallel Repartition operator that could partition tuples a bit like
this, so that each process gets a different set of partitions. Here
you combine that with a sort. By doing both things in one node, you
avoid a lot of overheads (writing into a tuplestore once in the
repartitioning node, and then once again in the sort node, with tuples
being copied one-by-one between the two nodes).
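As a rough illustration of the quoted steps, here is a single-process Python simulation; it is not the patch's code. Assumptions: `stable_hash` is a hypothetical stand-in for PostgreSQL's per-type hash functions, batches are plain lists rather than tuplesorts, and workers claim batches round-robin instead of through a shared counter (the claiming order does not affect correctness).

```python
import zlib
from typing import Callable, Dict, Hashable, List, TypeVar

Row = TypeVar("Row")


def stable_hash(key: Hashable) -> int:
    # Hypothetical stand-in for PostgreSQL's hash support functions;
    # crc32 of repr() is deterministic across runs.
    return zlib.crc32(repr(key).encode())


def batch_sort(rows: List[Row], key_fn: Callable[[Row], Hashable],
               nbatches: int, nworkers: int) -> Dict[int, List[Row]]:
    # Steps 1-2: route every row to batch hash(key) % nbatches.
    # Building every batch before sorting plays the role of the
    # "wait for all workers to finish writing" barrier.
    batches: List[List[Row]] = [[] for _ in range(nbatches)]
    for row in rows:
        batches[stable_hash(key_fn(row)) % nbatches].append(row)

    # Steps 3-5: each batch is claimed by exactly one worker, sorted,
    # and appended to that worker's output stream.
    output: Dict[int, List[Row]] = {w: [] for w in range(nworkers)}
    for batch_no, batch in enumerate(batches):
        output[batch_no % nworkers].extend(sorted(batch, key=key_fn))
    return output


def unique_consecutive(rows, key_fn):
    # What a Unique node relies on: equal keys adjacent within one stream.
    out = []
    for row in rows:
        if not out or key_fn(out[-1]) != key_fn(row):
            out.append(row)
    return out


rows = [5, 3, 5, 1, 3, 9, 1, 7, 5]
per_worker = batch_sort(rows, lambda r: r, nbatches=4, nworkers=2)
distinct = [r for stream in per_worker.values()
            for r in unique_consecutive(stream, lambda r: r)]
print(sorted(distinct))  # [1, 3, 5, 7, 9]
```

Because every key lands in exactly one batch and every batch is sorted by one worker, per-worker deduplication already yields a globally distinct result, whichever worker ends up owning which batch.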

If I understood correctly, the tuples emitted by Parallel Batch Sort
in each process are ordered by (hash(key, ...) % npartitions, key,
...), but the path is claiming to be ordered by (key, ...), no?
That's enough for Unique and Aggregate to give the correct answer,
because they really only require equal keys to be consecutive (and in
the same process), but maybe some other plan could break?
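To make the ordering question concrete, here is a small Python check; it is not taken from the patch. It assumes a hypothetical identity `toy_hash` (so the order is easy to follow by hand) and a single process that owns every partition.

```python
from typing import List


def toy_hash(key: int) -> int:
    # Hypothetical identity hash; a real hash function scrambles
    # the emitted order even further.
    return key


def emitted_order(keys: List[int], npartitions: int) -> List[int]:
    # Rows come out ordered by (hash(key) % npartitions, key),
    # not by key alone.
    return sorted(keys, key=lambda k: (toy_hash(k) % npartitions, k))


def is_sorted(xs: List[int]) -> bool:
    return all(a <= b for a, b in zip(xs, xs[1:]))


def equal_keys_consecutive(xs: List[int]) -> bool:
    # True if each distinct key occurs as a single contiguous run.
    seen = set()
    for i, x in enumerate(xs):
        if i > 0 and xs[i - 1] == x:
            continue  # still inside the current run
        if x in seen:
            return False  # key reappears after a different key
        seen.add(x)
    return True


out = emitted_order([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], npartitions=2)
print(out)                          # [2, 4, 6, 1, 1, 3, 3, 5, 5, 9]
print(is_sorted(out))               # False: not ordered by key
print(equal_keys_consecutive(out))  # True: enough for Unique/GroupAggregate
```

A consumer that trusted a claimed ordering by key, for example by taking the first row as the minimum, would get 2 here instead of 1.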
