Re: BUG #16887: Group by is faster than distinct

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: liuxy(at)gatech(dot)edu, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16887: Group by is faster than distinct
Date: 2021-02-23 05:28:18
Message-ID: CAApHDvrAgN4APYrsoMGoAhps6zsa2SEom5QW+O-ZqEpjggm-6w@mail.gmail.com
Lists: pgsql-bugs

On Tue, 23 Feb 2021 at 10:26, PG Bug reporting form
<noreply(at)postgresql(dot)org> wrote:
> Actual Behavior
> We executed both queries on the TPC-H benchmark of scale factor 5: the first
> query takes over 20 seconds, while the second query only takes 6.5 seconds.
> We think the time difference results from different plans selected.
> Specifically, in the first (slow) query, the optimizer decides to not
> parallelize the SCAN and GROUP operations.

> Expected Behavior
> Since these two queries are semantically equivalent, we were hoping that
> PostgreSQL will evaluate them in roughly the same amount of time.

It makes sense that you'd expect this. However, we don't currently
generate parallel plans for DISTINCT queries, so this is more
something that's yet to be implemented than a bug.
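
To illustrate the equivalence the report relies on, here is a minimal
sketch showing that a DISTINCT query can be written as a GROUP BY over
the same columns and return the same rows. The table and column names
are hypothetical (the queries from the original report are not quoted
in this message), and SQLite is used only as a convenient stand-in
engine, so it says nothing about PostgreSQL's planner or timings.

```python
import sqlite3

# Hypothetical stand-in table; not the TPC-H schema or the reporter's queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (custkey INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "O"), (1, "O"), (2, "F"), (2, "O"), (1, "F"), (2, "F")],
)

# Form 1: duplicate elimination with DISTINCT.
distinct_rows = conn.execute(
    "SELECT DISTINCT custkey, status FROM orders"
).fetchall()

# Form 2: the same duplicate elimination written as GROUP BY on every
# selected column, with no aggregate functions.
group_by_rows = conn.execute(
    "SELECT custkey, status FROM orders GROUP BY custkey, status"
).fetchall()

# Row order is unspecified without ORDER BY, so compare as sorted lists.
same_result = sorted(distinct_rows) == sorted(group_by_rows)
print(same_result)
```

On PostgreSQL, comparing EXPLAIN output for the two forms should show
whether the planner chose a parallel plan for either one.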

When parallel aggregates were added in 9.6, it was quite late in the
cycle and I narrowed the scope to just GROUP BY. DISTINCT was left
behind. I tried to pick that up again several years ago, but I was
encouraged to drop it in favour of other work.

David
