| From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> | 
|---|---|
| To: | "Jamison, Kirk" <k(dot)jamison(at)jp(dot)fujitsu(dot)com> | 
| Cc: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> | 
| Subject: | Re: Multivariate MCV list vs. statistics target | 
| Date: | 2019-07-26 22:05:52 | 
| Message-ID: | 20190726220552.kafvalbd2o3l7xw6@development | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On Fri, Jul 26, 2019 at 07:03:41AM +0000, Jamison, Kirk wrote:
>On Sat, July 20, 2019 8:12 AM (GMT+9), Tomas Vondra wrote:
>
>> >+		/* XXX What if the target is set to 0? Reset the statistic?
>> */
>> >
>> >This also makes me wonder. I haven't looked deeply into the code, but
>> >since 0 is a valid value, I believe it should reset the stats.
>>
>> I agree resetting the stats after setting the target to 0 seems quite
>> reasonable. But that's not what we do for attribute stats, because in that
>> case we simply skip the attribute during the future ANALYZE runs - we don't
>> reset the stats, we keep the existing stats. So I've done the same thing here,
>> and I've removed the XXX comment.
>>
>> If we want to change that, I'd do it in a separate patch for both the regular
>> and extended stats.
>
>Hi, Tomas
>
>Sorry for my late reply.
>You're right. I have no strong opinion whether we'd want to change that behavior.
>I've also confirmed the change in the patch where setting statistics target 0
>skips the statistics.
>
OK, thanks.
>Maybe only some minor nitpicks in the source code comments below:
>1. "it's" should be "its":
>> +		 * Compute statistic target, based on what's set for the statistic
>> +		 * object itself, and for it's attributes.
>
>2. Consistency whether you'd use either "statistic " or "statisticS ".
>Ex. statistic target vs statisticS target, statistics object vs statistic object, etc.
>
>> Attached is v4 of the patch, with a couple more improvements:
>>
>> 1) I've renamed the if_not_exists flag to missing_ok, because that's more
>> consistent with the "IF EXISTS" clause in the grammar (the old flag was kinda
>> the exact opposite), and I've added a NOTICE about the skip.
>
>+	bool		missing_ok;      /* do nothing if statistics does not exist */
>
>Confirmed. So we ignore if statistic does not exist, and skip the error.
>Maybe to make it consistent with other data structures in parsernodes.h,
>you can change the comment of missing_ok to:
>/* skip error if statistics object does not exist */
>
Thanks, I've fixed all those places in the attached v5.
>> 2) I've renamed ComputeExtStatsTarget to ComputeExtStatsRows, because that's
>> what the function was doing anyway (computing sample size).
>>
>> 3) I've added a couple of regression tests to stats_ext.sql
>>
>> Aside from that, I've cleaned up a couple of places and improved a bunch of
>> comments. Nothing huge.
>
>I have a question though regarding ComputeExtStatisticsRows.
>I'm just curious with the value 300 when computing sample size.
>Where did this value come from?
>
>+	/* compute sample size based on the statistic target */
>+	return (300 * result);
>
>Overall, the patch is almost already in good shape for commit.
>I'll wait for the next update.
>
That's how we compute number of rows to sample, based on the statistics
target. See std_typanalyze() in analyze.c, which also cites the paper
where this comes from.
regards
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
| Attachment | Content-Type | Size | 
|---|---|---|
| alter-statistics-v5.patch | text/plain | 30.5 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Alvaro Herrera | 2019-07-26 22:16:56 | Re: Attached partition not considering altered column properties of root partition. | 
| Previous Message | Alvaro Herrera | 2019-07-26 22:05:38 | Re: TopoSort() fix |