Re: cost_sort vs cost_agg

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Subject: Re: cost_sort vs cost_agg
Date: 2021-02-08 08:04:46
Message-ID: CAKU4AWoiHnE8B7BTJ9JCarJv7_b+bgr5Le=yYnuk48HtJ6swSg@mail.gmail.com
Lists: pgsql-hackers

Thank you Ashutosh.

On Fri, Jan 15, 2021 at 7:18 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
wrote:

> On Thu, Jan 14, 2021 at 7:12 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> >
> > Currently cost_sort doesn't consider the number of columns to sort, which
> > means the cost of SELECT * FROM t ORDER BY a; equals the cost of SELECT *
> > FROM t ORDER BY a, b; which is obviously wrong. The impact of this shows up
> > when we choose the plan for SELECT DISTINCT * FROM t ORDER BY c between:
> >
> >  Sort
> >    Sort Key: c
> >    ->  HashAggregate
> >          Group Key: c, a, b, d, e, f, g, h, i, j, k, l, m, n
> >
> > and
> >
> >  Unique
> >    ->  Sort
> >          Sort Key: c, a, b, d, e, f, g, h, i, j, k, l, m, n
> >
> >
> > Since "Sort (c)" has the same cost as "Sort (c, a, b, d, e, f, g, h, i, j, k,
> > l, m, n)", and a Unique node on sorted input is usually cheaper than
> > HashAggregate, the latter plan will usually win, which might be bad in many
> > places.
>
> I can imagine that HashAggregate + Sort will perform better if there
> are very few distinct rows but otherwise, Unique on top of Sort would
> be a better strategy since it doesn't need two operations.
>
>
Thanks for the hint. I will consider the number of distinct rows as a factor
in the next patch.
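
To make the point about distinct rows concrete, here is a rough sketch in
Python rather than planner code. It assumes a simplified in-memory sort cost
of 2 * cpu_operator_cost * N * log2(N), which is only the general shape of
what cost_sort charges for comparisons; input cost, disk spills, and the cost
of the HashAggregate and Unique nodes themselves are ignored. The function
names and the num_sort_keys parameter are hypothetical and exist only for
illustration.

```python
import math

CPU_OPERATOR_COST = 0.0025  # PostgreSQL's default cpu_operator_cost


def sort_comparison_cost(tuples, num_sort_keys=1):
    """Simplified in-memory sort cost: 2 * cpu_operator_cost * N * log2(N).

    num_sort_keys is accepted but deliberately unused, mirroring the
    behaviour discussed in this thread (an assumption of this sketch).
    """
    if tuples < 2:
        return 0.0
    return 2.0 * CPU_OPERATOR_COST * tuples * math.log2(tuples)


def sort_input_rows(total_rows, distinct_rows):
    """Number of rows the Sort node has to process in each plan shape."""
    return {
        "HashAggregate + Sort": distinct_rows,  # Sort runs after deduplication
        "Sort + Unique": total_rows,            # Sort runs on the full input
    }


if __name__ == "__main__":
    total = 1_000_000
    for distinct in (10, 1_000, total):
        rows = sort_input_rows(total, distinct)
        costs = {plan: sort_comparison_cost(n, num_sort_keys=14)
                 for plan, n in rows.items()}
        print(distinct, costs)
```

In this toy model the Sort above HashAggregate becomes almost free when there
are very few distinct rows, while the Sort under Unique always pays for the
full input, regardless of how many sort keys there are.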

> >
> > The optimizer chose HashAggregate with my patch, but it takes 6s. After
> > setting enable_hashagg = off, it takes 2s.
>
> This example actually shows that using Unique is better than
> HashAggregate + Sort. Maybe you want to try with some data which has
> very few distinct rows.
>
>

--
Best Regards
Andy Fan (https://www.aliyun.com/)
