RE: Multivariate MCV list vs. statistics target

From: "Jamison, Kirk" <k(dot)jamison(at)jp(dot)fujitsu(dot)com>
To: 'Tomas Vondra' <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Multivariate MCV list vs. statistics target
Date: 2019-07-29 01:53:08
Message-ID: D09B13F772D2274BB348A310EE3027C6517E5D@g01jpexmbkw24
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Saturday, July 27, 2019 7:06 AM(GMT+9), Tomas Vondra wrote:
> On Fri, Jul 26, 2019 at 07:03:41AM +0000, Jamison, Kirk wrote:
> >On Sat, July 20, 2019 8:12 AM (GMT+9), Tomas Vondra wrote:
> >
> >> >+ /* XXX What if the target is set to 0? Reset the statistic?
> >> */
> >> >
> >> >This also makes me wonder. I haven't looked deeply into the code,
> >> >but since 0 is a valid value, I believe it should reset the stats.
> >>
> >> I agree resetting the stats after setting the target to 0 seems quite
> >> reasonable. But that's not what we do for attribute stats, because in
> >> that case we simply skip the attribute during the future ANALYZE runs
> >> - we don't reset the stats, we keep the existing stats. So I've done
> >> the same thing here, and I've removed the XXX comment.
> >>
> >> If we want to change that, I'd do it in a separate patch for both the
> >> regular and extended stats.
> >
> >Hi, Tomas
> >
> >Sorry for my late reply.
> >You're right. I have no strong opinion whether we'd want to change that
> behavior.
> >I've also confirmed the change in the patch where setting statistics
> >target 0 skips the statistics.
> >
>
> OK, thanks.
>
> >Maybe only some minor nitpicks in the source code comments below:
> >1. "it's" should be "its":
> >> + * Compute statistic target, based on what's set for the
> statistic
> >> + * object itself, and for it's attributes.
> >
> >2. Consistency whether you'd use either "statistic " or "statisticS ".
> >Ex. statistic target vs statisticS target, statistics object vs statistic
> object, etc.
> >
> >> Attached is v4 of the patch, with a couple more improvements:
> >>
> >> 1) I've renamed the if_not_exists flag to missing_ok, because that's
> >> more consistent with the "IF EXISTS" clause in the grammar (the old
> >> flag was kinda the exact opposite), and I've added a NOTICE about the skip.
> >
> >+ bool missing_ok; /* do nothing if statistics does
> not exist */
> >
> >Confirmed. So we ignore if statistic does not exist, and skip the error.
> >Maybe to make it consistent with other data structures in
> >parsernodes.h, you can change the comment of missing_ok to:
> >/* skip error if statistics object does not exist */
> >
>
> Thanks, I've fixed all those places in the attached v5.

I've confirmed the fix.

> >> 2) I've renamed ComputeExtStatsTarget to ComputeExtStatsRows, because
> >> that's what the function was doing anyway (computing sample size).
> >>
> >> 3) I've added a couple of regression tests to stats_ext.sql
> >>
> >> Aside from that, I've cleaned up a couple of places and improved a
> >> bunch of comments. Nothing huge.
> >
> >I have a question though regarding ComputeExtStatisticsRows.
> >I'm just curious with the value 300 when computing sample size.
> >Where did this value come from?
> >
> >+ /* compute sample size based on the statistic target */
> >+ return (300 * result);
> >
> >Overall, the patch is almost already in good shape for commit.
> >I'll wait for the next update.
> >
>
> That's how we compute number of rows to sample, based on the statistics target.
> See std_typanalyze() in analyze.c, which also cites the paper where this comes
> from.
Noted. Found it. Thank you for the reference.

There's just a small whitespace (extra space) below after running git diff --check.
>src/bin/pg_dump/pg_dump.c:7226: trailing whitespace.
>+
It would be better to post an updated patch,
but other than that, I've confirmed the fixes.
The patch passed the make-world and regression tests as well.
I've marked this as "ready for committer".

Regards,
Kirk Jamison

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-07-29 01:57:47 Re: COPY command on a table column marked as GENERATED ALWAYS
Previous Message Michael Paquier 2019-07-29 01:51:52 Re: [PATCH] minor bugfix for pg_basebackup (9.6 ~ )