Re: Allow parallel DISTINCT

From:	David Rowley <dgrowleyml(at)gmail(dot)com>
To:	Zhihong Yu <zyu(at)yugabyte(dot)com>
Cc:	PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Allow parallel DISTINCT
Date:	2021-08-17 20:47:25
Message-ID:	CAApHDvpu8bo6GXzMU_oUzRA-bSkX8Rphab+CT96K_NLMCK0w6w@mail.gmail.com
Lists:	pgsql-hackers

On Wed, 18 Aug 2021 at 02:42, Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
> Since create_partial_distinct_paths() calls create_final_distinct_paths(), I wonder if numDistinctRows can be passed to create_final_distinct_paths() so that the latter doesn't need to call estimate_num_groups().

That can't be done. The two calls to estimate_num_groups() pass in a
different number of input rows. In create_partial_distinct_paths() it
is the number of rows expected from a partial path. In
create_final_distinct_paths(), when called to complete the final
distinct step, it is the number of distinct values multiplied by the
number of workers.
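
To illustrate the point with made-up numbers, here is a rough sketch in
Python. toy_estimate_num_groups() is a hypothetical stand-in that simply
caps the group count at the input row count; it is not how
estimate_num_groups() actually works, and the row counts, distinct count
and worker count are all assumptions chosen for the example.

```python
def toy_estimate_num_groups(input_rows: float, ndistinct: float) -> float:
    """Hypothetical stand-in for an estimator: the number of groups can
    exceed neither the input row count nor the number of distinct values."""
    return min(input_rows, ndistinct)


# Assumed figures, for illustration only.
total_rows = 1_000_000   # rows in the underlying relation
ndistinct = 500_000      # distinct values in the DISTINCT column(s)
workers = 4              # parallel workers

# Partial step: each worker only sees its share of the rows.
partial_input_rows = total_rows / workers
partial_groups = toy_estimate_num_groups(partial_input_rows, ndistinct)

# Final step: the input is what all workers emit, i.e. the per-worker
# distinct estimate multiplied by the number of workers.
final_input_rows = partial_groups * workers
final_groups = toy_estimate_num_groups(final_input_rows, ndistinct)

print("partial step:", partial_input_rows, "rows in,", partial_groups, "groups")
print("final step:  ", final_input_rows, "rows in,", final_groups, "groups")
```

Because the two steps feed a different row count into the estimator, the
estimate from the partial step cannot simply be reused for the final
step.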

It might be more feasible to cache the value of distinctExprs, but I
just don't feel the need. If there are partial paths in the input_rel
then planning time is unlikely to be significant compared with
execution time. Also, if we were to always calculate distinctExprs in
create_distinct_paths(), we might end up calculating it for nothing,
as create_final_distinct_paths() does not always need it. I don't feel
the need to clutter up the code by calculating it lazily either.

David

In response to

Re: Allow parallel DISTINCT at 2021-08-17 14:47:40 from Zhihong Yu

Responses

Re: Allow parallel DISTINCT at 2021-08-17 20:56:23 from Zhihong Yu
