Re: More stable query plans via more predictable column statistics

From: "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: More stable query plans via more predictable column statistics
Date: 2016-03-09 16:22:11
Message-ID: CACACo5RA9D+GZ3AjNLqTLRJtO4_L+Z8vAoiTXD60T2_DjLjePQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 9, 2016 at 1:33 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> Hi,
>
> On Wed, 2016-03-09 at 11:23 +0100, Shulgin, Oleksandr wrote:
> > On Tue, Mar 8, 2016 at 8:16 PM, Alvaro Herrera
> > <alvherre(at)2ndquadrant(dot)com> wrote:
> >
> > Also, I can't quite figure out why the "else" now in line 2131
> > is now "else if track_cnt != 0". What happens if track_cnt is
> > zero?
> > The comment above the "if" block doesn't provide any guidance.
> >
> >
> > It is there only to avoid potentially dividing zero by zero when
> > calculating avgcount (which will not be used after that anyway). I
> > agree it deserves a comment.
>
> That definitely deserves a comment. It's not immediately clear why
> (track_cnt != 0) would prevent division by zero in the code. The only
> way such error could happen is if ndistinct==0, because that's the
> actual denominator. Which means this
>
> ndistinct = ndistinct * sample_cnt
>
> would have to evaluate to 0. But ndistinct==0 can't happen as we're in
> the (nonnull_cnt > 0) branch, and that guarantees (standistinct != 0).
>
> Thus the only possibility seems to be (nonnull_cnt==toowide_cnt). Why
> not to use this condition instead?
>

Yes, I now recall that my actual concern was that sample_cnt may calculate
to 0 due to the latest condition above, but that also implies track_cnt ==
0, and then we have a for loop there which will not run at all due to this,
so I figured we can avoid calculating avgcount and running the loop
altogether with that check. I'm not opposed to changing the condition if
that makes the code easier to understand (or dropping it altogether if
calculating 0/0 is believed to be harmless anyway).

--
Alex

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-03-09 16:28:14 Re: More stable query plans via more predictable column statistics
Previous Message Tom Lane 2016-03-09 16:15:16 Re: More stable query plans via more predictable column statistics