Re: Fix incorrect start up costs for WindowAgg paths (bug #17862)

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix incorrect start up costs for WindowAgg paths (bug #17862)
Date: 2023-04-13 00:16:21
Message-ID: CAKU4AWo0K+85q-R8Hs_r9n47CV6_m6-GG_hCFCJcqhmBCbRDhw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 13, 2023 at 6:09 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> .On Thu, 13 Apr 2023 at 02:28, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> > The concept of startup_tuples for a WindowAgg looks good to me, but I
> > can't follow up with the below line:
> >
> > + return clamp_row_est(partition_tuples * DEFAULT_INEQ_SEL);
> >
> > # select count(*) over() from tenk1 limit 1;
> > count
> > -------
> > 10000 --> We need to scan all the tuples.
> >
> > Should we just return clamp_row_est(partition_tuples)?
>
> For the case you've shown, it will. It's handled by this code:
>
> if (wc->orderClause == NIL)
> return clamp_row_est(partition_tuples);
>
> My fault. I should have real debugging to double check my
understanding, surely I will next time.

It would take something like the following to hit the code you're
> concerned about:
>
> explain select count(*) over(order by unique1 rows between unbounded
> preceding and 10*random() following) from tenk1;
> QUERY PLAN
>
> --------------------------------------------------------------------------------------------
> WindowAgg (cost=140.23..420.29 rows=10000 width=12)
> -> Index Only Scan using tenk1_unique1 on tenk1
> (cost=0.29..270.29 rows=10000 width=4)
> (2 rows)
>
> You can see the startup cost is about 33% of the total cost for that,
> which is from the DEFAULT_INEQ_SEL. I'm not exactly set on that
> having to be DEFAULT_INEQ_SEL, but I'm not really sure what we could
> put that's better. I don't really follow why assuming all rows are
> required is better. That'll just mean we favour cheap startup plans
> less, but there might be a case where a cheap startup plan is
> favourable. I was opting for a happy medium when I thought to use
> DEFAULT_INEQ_SEL.
>

That looks reasonable to me. My suggestion came from my misreading
before, It was a bit late in my time zone when writing. Thanks for the
detailed explanation!

>
> I also see I might need to do a bit more work on this as the following
> is not handled correctly:
>
> select count(*) over(rows between unbounded preceding and 10
> following) from tenk1;
>
> it's assuming all rows due to lack of ORDER BY, but it seems like it
> should be 10 rows due to the 10 FOLLOWING end bound.
>
>
True to me.

--
Best Regards
Andy Fan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-04-13 00:16:27 Clean up hba.c of code freeing regexps
Previous Message Daniel Gustafsson 2023-04-12 23:25:05 Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert