Re: using extended statistics to improve join estimates

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: using extended statistics to improve join estimates
Date: 2024-04-02 08:23:45
Message-ID: 87cyr89nk5.fsf@163.com
Lists: pgsql-hackers


> On Wed, Mar 02, 2022 at 11:38:21AM -0600, Justin Pryzby wrote:
>> Rebased over 269b532ae and muted compiler warnings.

Thank you Justin for the rebase!

Hello Tomas,

Thanks for the patch! Before I review the patch at the code level, I want
to explain my understanding of it first.

Before this patch, we already use MCV information in eqjoinsel: it
combines the MCV lists on both sides to figure out the mcv_freq and then
treats the rest of the values as equally distributed. But this doesn't
work for MCVs in extended statistics, and this patch fills that gap.
Besides that, since extended statistics involve more than one column, if
one or more of those columns are compared with a Const in a RestrictInfo,
we can use that information to filter the MCV items we are interested
in. That's really cool.
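
To make the description above concrete, here is a minimal Python sketch
of the idea. It is not the eqjoinsel code and not the patch: the function
names, the dict/tuple representation of an MCV list, and the simplified
handling of the non-MCV remainder are all assumptions made for
illustration only.

```python
from collections import defaultdict


def matched_mcv_freq(mcv1, mcv2):
    """Sum f1 * f2 over the values that appear in both MCV lists.

    Each MCV list is modeled as a dict: value -> frequency (hypothetical
    representation, not the pg_statistic layout).
    """
    return sum(f1 * mcv2[v] for v, f1 in mcv1.items() if v in mcv2)


def join_selectivity(mcv1, nd1, mcv2, nd2):
    """Simplified equality-join selectivity estimate.

    The MCV-to-MCV matches are counted exactly; everything outside the
    MCV lists is assumed to be uniformly spread over the remaining
    distinct values. This is coarser than what eqjoinsel really does.
    """
    matched = matched_mcv_freq(mcv1, mcv2)
    rest1 = max(0.0, 1.0 - sum(mcv1.values()))
    rest2 = max(0.0, 1.0 - sum(mcv2.values()))
    # Distinct values not covered by the MCV lists (at least 1).
    other_nd = max(nd1 - len(mcv1), nd2 - len(mcv2), 1)
    return matched + rest1 * rest2 / other_nd


def filter_mcv_by_const(items, join_col, const_col, const_val):
    """Reduce a multi-column MCV list using a "column = Const" clause.

    items is a list of (values_tuple, frequency). Items whose const_col
    does not equal const_val are dropped; the remaining frequencies are
    summed per join-column value and left unnormalized.
    """
    out = defaultdict(float)
    for values, freq in items:
        if values[const_col] == const_val:
            out[values[join_col]] += freq
    return dict(out)
```

The result of filter_mcv_by_const() has the same shape as a
single-column MCV list, so in this sketch it can be passed to
matched_mcv_freq() for the join column.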

I did some more testing. All of the cases are inner joins so far, and all
of them work amazingly well. I am surprised this patch didn't draw more
attention. I will test more after I go through the code.

As for the code level, I am reviewing it in a top-down manner and am
about 40% done. Here are some findings, just FYI. For efficiency, I
provide each piece of feedback as an individual commit; after all, I want
to make sure my comments are practical, and coding and testing is a good
way to achieve that. I tried to make each of them as small as possible so
that you can accept or reject them conveniently.

0001 is your patch; I just rebased it against the current master. 0006
is not closely related to the current patch, and I think it can be
committed separately if you are OK with that.

Hope this kind of review is helpful.

--
Best Regards
Andy Fan

Attachment                                                       Content-Type  Size
v1-0001-Estimate-joins-using-extended-statistics.patch           text/x-diff   67.6 KB
v1-0002-Remove-estimiatedcluases-and-varRelid-arguments.patch    text/x-diff   6.1 KB
v1-0003-Remove-SpecialJoinInfo-sjinfo-argument.patch             text/x-diff   4.4 KB
v1-0004-Remove-joinType-argument.patch                           text/x-diff   2.7 KB
v1-0005-use-the-pre-calculated-RestrictInfo-left-right_re.patch  text/x-diff   2.9 KB
v1-0006-Fast-path-for-general-clauselist_selectivity.patch       text/x-diff   996 bytes
v1-0007-bms_is_empty-is-more-effective-than-bms_num_membe.patch  text/x-diff   832 bytes
v1-0008-a-branch-of-updates-around-JoinPairInfo.patch            text/x-diff   8.0 KB
