Re: Fix incorrect start up costs for WindowAgg paths (bug #17862)

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix incorrect start up costs for WindowAgg paths (bug #17862)
Date: 2023-04-12 22:09:39
Message-ID: CAApHDvqFTr=vORoBFgK-=7M0SpCT2n+Y3QZJJPUB5g_SC8ijjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

.On Thu, 13 Apr 2023 at 02:28, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> The concept of startup_tuples for a WindowAgg looks good to me, but I
> can't follow up with the below line:
>
> + return clamp_row_est(partition_tuples * DEFAULT_INEQ_SEL);
>
> # select count(*) over() from tenk1 limit 1;
> count
> -------
> 10000 --> We need to scan all the tuples.
>
> Should we just return clamp_row_est(partition_tuples)?

For the case you've shown, it will. It's handled by this code:

if (wc->orderClause == NIL)
return clamp_row_est(partition_tuples);

It would take something like the following to hit the code you're
concerned about:

explain select count(*) over(order by unique1 rows between unbounded
preceding and 10*random() following) from tenk1;
QUERY PLAN
--------------------------------------------------------------------------------------------
WindowAgg (cost=140.23..420.29 rows=10000 width=12)
-> Index Only Scan using tenk1_unique1 on tenk1
(cost=0.29..270.29 rows=10000 width=4)
(2 rows)

You can see the startup cost is about 33% of the total cost for that,
which is from the DEFAULT_INEQ_SEL. I'm not exactly set on that
having to be DEFAULT_INEQ_SEL, but I'm not really sure what we could
put that's better. I don't really follow why assuming all rows are
required is better. That'll just mean we favour cheap startup plans
less, but there might be a case where a cheap startup plan is
favourable. I was opting for a happy medium when I thought to use
DEFAULT_INEQ_SEL.

I also see I might need to do a bit more work on this as the following
is not handled correctly:

select count(*) over(rows between unbounded preceding and 10
following) from tenk1;

it's assuming all rows due to lack of ORDER BY, but it seems like it
should be 10 rows due to the 10 FOLLOWING end bound.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-04-12 22:23:48 Re: Add LZ4 compression in pg_dump
Previous Message Michael Paquier 2023-04-12 21:49:16 Re: [PATCH] Allow Postgres to pick an unused port to listen