Re: explain analyze rows=%.0f

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Guillaume Lelarge <guillaume(at)lelarge(dot)info>, Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Subject: Re: explain analyze rows=%.0f
Date: 2025-04-09 19:40:42
Message-ID: e049a590-36cb-495c-943d-79017e4b248a@tantorlabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 31.03.2025 23:59, Ilia Evdokimov wrote:
>
> We definitely shouldn’t remove the row counts < 1 check, since there
> are many places in the planner where we divide by rows. This mechanism
> was added specifically to prevent division by zero. Also, allowing
> rows estimates below 1 can sometimes make the planner overly
> optimistic, leading it to prefer cheaper-looking plans that may not
> perform well in practice. For example, choosing a Nested Loop instead
> of a more appropriate Hash Join.
>
> Allowing fractional rows > 1 might help improve planner accuracy in
> some cases, but this needs further study to fully understand the impact.
>

I've been investigating whether it's worth removing rounding in row
estimates - and I believe it is.

[ v1-0001-Always-use-two-fractional-digits-for-estimated-rows_SRC.patch ]

Currently, we round most row estimates using rint() inside
clamp_row_est(). However, this function is also used for rounding tuples
and page counts. These should remain integral, but row estimates can and
should remain fractional for better precision. To address this, I
introduced a new function clamp_tuple_est() which retains the existing
rounding behavior (via rint()), while clamp_tuple_est() no longer
rounds. I use clamp_tuple_est() only for row estimates and
clamp_tuple_est() for tuples and pages elsewhere.

After removing rounding, two small issues needed fixing. First, there
was a check rows > 1 in the cost estimation for Nested Loop joins, which
is no longer reliable for values like 1.3. I updated it to rows >= 2 to
retain the original behavior. This can be refined further, but, in my
opinion, it's a practical compromise. Second, there is still a call to
rint() in cost of mergejoin which likely should be removed too - though
I haven’t included that here yet.

Also, if we're no longer rounding estimates, EXPLAIN should display them
with two decimal digits, just like it already does for actual rows.

[ v1-0002-Always-use-two-fractional-digits-for-estimated-rows_TESTS.patch ]

So, what does this change improve? Here are some of the observed plan
improvements:

* Previously, a Parallel Aggregate was chosen. With slightly more
accurate estimation, the planner switches to FinalizeAggregate,
which can be more efficient in distributed plans.
* In certain nested joins with constant subqueries, the planner
previously inserted an unnecessary Materialize. With improved
estimates, it now skips that step, reducing memory usage and latency.
* When the estimated number of iterations becomes non-integer but
still justifies caching, the planner adds Memoize instead of
re-running a function like generate_series(). This can speed up
execution significantly.
* In one case involving partitioned tables and filter conditions like
t1.b = 0, the planner now chooses an index-based nested loop join
instead of a hash join. This results in a more efficient plan with
fewer memory and CPU costs.

I know this patch still needs documentation updates to describe the new
estimation display and behaviors. But before that, I’d like to gather
feedback: does community agree that more precise estimates and
fractional values are better than always rounding?

If anyone would like to see the EXPLAIN ANALYZE VERBOSE output of
changes in regression tests, I’d be happy to share them.

All feedback and suggestions welcome!

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachment Content-Type Size
v1-0001-Always-use-two-fractional-digits-for-estimated-rows_SRC.patch text/x-patch 17.1 KB
v1-0002-Always-use-two-fractional-digits-for-estimated-rows_TESTS.patch text/x-patch 61.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ilia Evdokimov 2025-04-09 19:47:03 Re: Sample rate added to pg_stat_statements
Previous Message Sami Imseih 2025-04-09 19:39:32 Re: n_ins_since_vacuum stats for aborted transactions