Re: Optimize join selectivity estimation by not reading MCV stats for unique join attributes

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: David Geier <geidav(dot)pg(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optimize join selectivity estimation by not reading MCV stats for unique join attributes
Date: 2022-11-18 13:00:06
Message-ID: 53950f93-8cce-4440-52f1-5879653c6803@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/18/22 09:54, David Geier wrote:
> Thanks everyone for the great feedback and suggestions.
>
>>
>>> Yes, it is.  Using zero flag would short-cut get_attstatsslot() to just
>>> return whether the slot type exists without loading it.  Do you think we
>>> need to emphasize this use case in the comments for 'flags'?
>> Perhaps, it's not really obvious now.
>
> Comment added.
>
>
>> I wonder whether we need to also check statistic_proc_security_check()
>>> when determining if MCVs exists in both sides.
>> Yeah, I thought about hoisting the statistic_proc_security_check
>> tests up into get_mcv_stats.  I don't think it's a great idea
>> though.  Again, it'd complicate untangling this if we ever
>> generalize the use of MCVs in this function.  Also, I don't
>> think we should be micro-optimizing the case where the security
>> check doesn't pass --- if it doesn't, you're going to be hurting
>> from bad plans a lot more than you are from some wasted cycles
>> here.
>
> Sounds reasonable.
>
> Attached is v2 of the patch.
> This is basically Tom's version plus a comment for the flags of
> get_attstatslot() as suggested by Richard.
>

Seems fine. I wonder if we could/could introduce a new constant for 0,
similar to ATTSTATSSLOT_NUMBERS/ATTSTATSSLOT_VALUES, instead of using a
magic constant. Say, ATTSTATSSLOT_NONE or ATTSTATSSLOT_CHECK.

> I couldn't come up with any reasonable way of writing an automated test
> for that.
> Any ideas?
>

I don't think you can write a test for this, because there is no change
to behavior that can be observed by the user. If one side has no MCV,
the only difference is whether we try to load the other MCV or not.
There's no impact on estimates, because we won't use it.

IMO the best thing we can do is check coverage, that the new code is
exercised in regression tests. And I think that's fine.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2022-11-18 13:07:22 Re: Add index scan progress to pg_stat_progress_vacuum
Previous Message David Geier 2022-11-18 12:43:31 Re: CREATE UNLOGGED TABLE seq faults when debug_discard_caches=1