Re: Allow parallel DISTINCT

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Allow parallel DISTINCT
Date: 2021-08-17 20:56:23
Message-ID: CALNJ-vQg3=YwoJyo=bkDY3=AQi6LZSBX+W=V0QTP5ErTF9tE6Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 17, 2021 at 1:47 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Wed, 18 Aug 2021 at 02:42, Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
> > Since create_partial_distinct_paths() calls
> > create_final_distinct_paths(), I wonder if numDistinctRows can be passed to
> > create_final_distinct_paths() so that the latter doesn't need to call
> > estimate_num_groups().
>
> That can't be done. The two calls to estimate_num_groups() are passing
> in a different number of input rows. In
> create_partial_distinct_paths() the number of rows is the number of
> expected input rows from a partial path. In
> create_final_distinct_paths() when called to complete the final
> distinct step, that's the number of distinct values multiplied by the
> number of workers.
>
> It might be more possible to do something like cache the value of
> distinctExprs, but I just don't feel the need. If there are partial
> paths in the input_rel then it's most likely that planning time is not
> going to dominate much between planning and execution. Also, if we
> were to calculate the value of distinctExprs in create_distinct_paths
> always, then we might end up calculating it for nothing as
> create_final_distinct_paths() does not always need it. I don't feel
> the need to clutter up the code by doing any lazy calculating of it
> either.
>
> David
>
Hi,
Thanks for your explanation.

The patch is good from my point of view.
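
For readers following the thread, the point David makes above (the two estimate_num_groups() calls see different input row counts) can be illustrated with a toy model. This is only a sketch under stated assumptions: toy_estimate_num_groups() and toy_distinct_estimates() are hypothetical helpers, not PostgreSQL functions, the "never more groups than input rows" clamp is a deliberate simplification of the real estimator, and the even split of rows across workers is assumed for illustration.

```python
def toy_estimate_num_groups(input_rows: float, ndistinct: float) -> float:
    """Hypothetical stand-in for the planner's group estimate.

    Assumption: the estimate is the column's distinct-value count,
    clamped so it never exceeds the number of input rows.
    """
    return float(max(1.0, min(ndistinct, input_rows)))


def toy_distinct_estimates(total_rows: float, ndistinct: float, workers: int):
    """Return (partial_groups, final_groups) for a two-phase DISTINCT."""
    # Partial step: each worker sees only its share of the rows
    # (assumed here to be an even split).
    partial_input_rows = total_rows / workers
    partial_groups = toy_estimate_num_groups(partial_input_rows, ndistinct)

    # Final step: the input is every worker's distinct output combined,
    # i.e. the per-worker distinct count multiplied by the worker count.
    final_input_rows = partial_groups * workers
    final_groups = toy_estimate_num_groups(final_input_rows, ndistinct)
    return partial_groups, final_groups


if __name__ == "__main__":
    partial, final = toy_distinct_estimates(1_000_000, 300_000, 4)
    print("partial estimate per worker:", partial)
    print("final estimate:", final)
```

With these made-up numbers the partial step is capped by the 250,000 rows each worker reads, while the final step is capped by the 300,000 distinct values, so the value computed in the partial step cannot simply be reused for the final step.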
