Re: Improve rowcount estimate for UNNEST(column)

From: Paul Jungwirth <pj(at)illuminatedcomputing(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improve rowcount estimate for UNNEST(column)
Date: 2023-12-07 06:32:07
Message-ID: 166ae46a-9f77-4a72-987c-54fc3e63b3c6@illuminatedcomputing.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

On 11/26/23 12:22, Tom Lane wrote:
> Yes, this regression test is entirely unacceptable; the numbers will
> not be stable enough. Even aside from the different-settings issue,
> you can't rely on ANALYZE deriving exactly the same stats every time.
> Usually what we try to do is devise a query where the plan shape
> changes because of the better estimate.

Here is a patch with an improved test. With the old "10" estimate we get a Merge Join, but now that
the planner can see there are only ~4 elements per array, we get a Nested Loop.

It was actually hard to get a new plan, since all our regress tables' arrays have around 5 elements
average, which isn't so far from 10. Adding a table with 1- or 2- element arrays would be more
dramatic. So I resorted to tuning the query with `WHERE seqno <= 50`. Hopefully that's not cheating
too much.

I thought about also adding a test where the old code *underestimates*, but then I'd have to add a
new table with big arrays. If it's worth it let me know.

On 11/26/23 23:05, jian he wrote:
> I found using table array_op_test test more convincing.

True, arrtest is pretty small. The new test uses array_op_test instead.

On 11/29/23 20:35, jian he wrote:
> I did a minor change. change estimate_array_length return type to double

I'm not sure I want to change estimate_array_length from returning ints to returning doubles, since
it's called in many places. I can see how that might give plans that are more accurate yet, so maybe
it's worth it? It feels like it ought to be a separate patch to me, but if others want me to include
it here please let me know.

I did add clamp_row_est since it's essentially free and maybe gives some safety.

Rebased onto d43bd090a8.

Yours,

--
Paul ~{:-)
pj(at)illuminatedcomputing(dot)com

Attachment Content-Type Size
v2-0001-Use-statitics-for-estimating-UNNEST-column-rows.patch text/x-patch 8.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2023-12-07 06:33:13 Re: catalog access with reset GUCs during parallel worker startup
Previous Message David Rowley 2023-12-07 06:29:51 Re: catalog access with reset GUCs during parallel worker startup